Locales With Non-UTF-8 Character Sets - International Language Environments Guide for Oracle® Solaris 11.4

Locales With Non-UTF-8 Character Sets

To avoid conversion issues, Oracle Solaris locales use UTF-8, an encoding form of the Unicode character set that is described in UTF-8 Overview. Every supported language has a UTF-8 locale, which is the preferred and supported form.

For historical, technical, and legal reasons, non-UTF-8 locales are also available in Oracle Solaris – the C locale, legacy single-byte (8-bit) ISO locales for EMEA languages, and traditional locales for APAC languages.

Single-byte character sets were popular in the past because they use just one byte (8 bits) to represent each character. However, because a single-byte set can hold at most 256 characters, different languages have to use different character sets. This causes several problems: a file created in one character set is often unreadable when interpreted in another, a multilanguage document is difficult to represent, and many languages have more characters than a single byte can represent. For such languages, for example Chinese, various traditional multibyte character sets were created.
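The following minimal Python sketch illustrates these problems. The sample strings and the choice of the ISO 8859-1, ISO 8859-5, and Big5 character sets are assumptions made for illustration only; they are not taken from this guide.

```python
# Illustration only: sample text and charset choices are assumptions.
text = "café"

# In a single-byte character set, each character occupies exactly one byte.
latin1_bytes = text.encode("iso-8859-1")

# The same bytes read under a different single-byte set (Cyrillic) yield
# different text: the ASCII part survives, the accented letter does not.
misread = latin1_bytes.decode("iso-8859-5")

# A Chinese character has no code point in a 256-character set at all.
han = "中"
try:
    han.encode("iso-8859-1")
    fits_in_single_byte = True
except UnicodeEncodeError:
    fits_in_single_byte = False

# A traditional multibyte character set can represent it ...
big5_bytes = han.encode("big5")

# ... while UTF-8 represents both languages in one document.
mixed = text + han
roundtrip = mixed.encode("utf-8").decode("utf-8")

print(len(latin1_bytes), misread == text, fits_in_single_byte)
print(len(big5_bytes), roundtrip == mixed)
```

The mismatch in the second step is silent: decoding succeeds without error and simply produces the wrong character, which is why files exchanged between single-byte locales can be corrupted without notice.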

The non-UTF-8 locales, also called legacy or traditional locales, have limited support in Oracle Solaris 11.4. Localization that exists for a UTF-8 locale might not be available in the non-UTF-8 variant of the same locale.

The legacy locales are not installed by default by the Oracle Solaris installer. You can install them by using nlsadm(8). For more information, see Working with Languages and Locales in this book.
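Because a legacy locale might not be present on a given system, an application can check for it before relying on it. The following Python sketch is one possible way to do this with the standard locale module. The helper name locale_available and the locale name cs_CZ.ISO8859-2 are hypothetical examples, not names defined by this guide.

```python
import locale


def locale_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE.

    Hypothetical helper for illustration. The previous LC_CTYPE setting
    is restored before returning, so the process locale is unchanged.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the locale is not installed or the name is unknown.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


# Example legacy locale name (an assumption); the result depends on
# which locales are installed on the system.
if locale_available("cs_CZ.ISO8859-2"):
    print("legacy locale is installed")
else:
    print("legacy locale is not installed; a UTF-8 locale is preferred")
```

A check of this kind reports only whether the locale definition is installed. It does not indicate whether localized messages exist for that locale.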

Locale facets also need to be set correctly. For more information, see Locale Facets.