International Language Environments Guide for Oracle® Solaris 11.2

Updated: July 2014
 
 

Locales With Non-UTF-8 Character Sets

To avoid conversion issues, Oracle Solaris locales use UTF-8, an encoding form of the Unicode character set that is described in UTF-8 Overview. Every supported language has a UTF-8 locale, which is the preferred and supported form.

For historical, technical, and legal reasons, non-UTF-8 locales are also available in Oracle Solaris: the C locale, legacy single-byte (8-bit) ISO locales for EMEA languages, and traditional locales for APAC languages.

Single-byte character sets were popular in the past because they use just one byte (8 bits) to represent each character. However, because such a set can hold at most 256 characters, different languages have to use different character sets. This causes several problems: a file created in one character set is often unreadable in another, a multilanguage document is difficult to represent, and many languages have more characters than a single byte can represent. For these languages, such as Chinese, various traditional multibyte character sets were created.
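The following sketch illustrates why a file created in one single-byte character set can be unreadable in another. It decodes the same byte value (0xE4) under three different ISO 8859 character sets. The helper name decode_e4 is hypothetical, and the character set names are written as GNU iconv accepts them, which is an assumption. The names on a particular Oracle Solaris system might differ, so check the output of iconv -l first.

```shell
# Hypothetical helper: interpret the single byte 0xE4 (octal 344) as a
# character in the given character set and print it encoded as UTF-8.
decode_e4() {
    printf '\344' | iconv -f "$1" -t UTF-8
}

# The same byte maps to a different character in each single-byte set.
# Character set names are assumptions; adjust them to match `iconv -l`.
for charset in ISO-8859-1 ISO-8859-5 ISO-8859-7; do
    printf '%s: %s\n' "$charset" "$(decode_e4 "$charset")"
done
```

If iconv recognizes these names, the three lines should show a Latin letter (ä), a Cyrillic letter (ф), and a Greek letter (δ), even though the input byte is identical in each case.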

The non-UTF-8 locales, also called legacy or traditional locales, have limited support in Oracle Solaris 11. These limited-support locales are not installed by default and are not available in the GDM login dialog. Localization that exists for a UTF-8 locale might not be available in the corresponding non-UTF-8 locale. Some of the limited-support locales might be removed from future Oracle Solaris releases.

The legacy locales are not installed by the Oracle Solaris installer. To enable these locales, you must install the system/locale/extra package manually. For example:

# pkg install system/locale/extra
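After the package is installed, one way to check which non-UTF-8 locales are present is to filter the output of locale -a. This is only a sketch: the helper name non_utf8_locales is hypothetical, and the filter assumes that UTF-8 locale names contain the string "UTF-8" or "utf8" in some letter case.

```shell
# Hypothetical helper: read locale names on standard input and keep only
# those whose names do not mention UTF-8 (matches "UTF-8", "utf8", and so on).
non_utf8_locales() {
    grep -i -v 'utf-*8'
}

# List the locales on this system that are not UTF-8 locales.
locale -a | non_utf8_locales
```

The list normally includes the C and POSIX locales, which are present regardless of whether the extra package is installed.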

Locale facets also need to be set correctly. For more information, see Locale Facets.