The Unicode/UTF-8 locales support Unicode 4.0. The en_US.UTF-8 locale provides multiscript processing support by using UTF-8 as its codeset. This locale handles input and output text in multiple scripts, and it was the first locale with this capability in the Oracle Solaris operating system. The capabilities of the other UTF-8 locales are similar to those of en_US.UTF-8, so the discussion of en_US.UTF-8 that follows applies equally to these locales.
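As an illustration of what "using UTF-8 as its codeset" means to a program, the following minimal Python sketch tries to switch the character-classification locale to en_US.UTF-8 and then asks the C library which codeset is in effect. The helper name try_utf8_locale is hypothetical, and the sketch assumes a UNIX-like system where locale.nl_langinfo is available. If the locale is not installed, the function returns None instead of failing.

```python
import locale


def try_utf8_locale(name="en_US.UTF-8"):
    """Try to select `name` for LC_CTYPE and report the resulting codeset.

    Returns the codeset string reported by the C library (typically
    "UTF-8"), or None if the locale is not installed on this system.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        # The requested locale is not available; leave the locale unchanged.
        return None
    # nl_langinfo(CODESET) is available on UNIX-like platforms only.
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    codeset = try_utf8_locale()
    if codeset is None:
        print("en_US.UTF-8 is not installed on this system")
    else:
        print("LC_CTYPE codeset:", codeset)
```

The same check can be applied to any other UTF-8 locale name by passing it as the argument.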
UTF-8 is a file-system-safe Universal Character Set Transformation Format of Unicode and ISO/IEC 10646-1. It was formulated by the X/Open-Uniforum Joint Internationalization Working Group (XoJIG) in 1992 and approved by ISO and IEC in 1996 as Amendment 2 to ISO/IEC 10646-1:1993. The Unicode Consortium, the International Organization for Standardization, and the International Electrotechnical Commission have adopted this standard as part of Unicode 4.0 and ISO/IEC 10646-1.
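The "file-system safe" property can be checked directly: in UTF-8, every byte of a multibyte sequence has its high bit set, so a non-ASCII character never produces a byte that could be mistaken for an ASCII character such as the path separator or the null byte. The following Python sketch demonstrates this on a few sample strings. The sample words and the helper names are illustrative choices, not part of the original text.

```python
# Sample text in several scripts (illustrative choices only).
SAMPLES = {
    "Latin": "caf\u00e9",
    "Arabic": "\u0633\u0644\u0627\u0645",
    "Thai": "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35",
    "Hindi": "\u0928\u092e\u0938\u094d\u0924\u0947",
    "Korean": "\ud55c\uad6d\uc5b4",
    "Chinese": "\u6f22\u5b57",
}


def utf8_lengths(text):
    """Return the number of UTF-8 bytes used by each character in `text`."""
    return [len(ch.encode("utf-8")) for ch in text]


def is_fs_safe(text):
    """Return True if no non-ASCII character contributes an ASCII-range byte.

    ASCII characters encode to themselves as single bytes. Every byte of a
    multibyte sequence must be 0x80 or above, so bytes such as 0x2F ('/')
    or 0x00 can only come from the real ASCII characters.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    for script, text in SAMPLES.items():
        print(script, utf8_lengths(text), is_fs_safe(text))
```

Because of this property, file names that contain text in any script can be stored as UTF-8 without colliding with the byte values that file systems treat specially.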
Unicode locales in the Oracle Solaris environment support the processing of every code point value that is defined in Unicode 4.0 and ISO/IEC 10646-1 and 10646-2. Supported scripts include pan-European and Asian scripts and also complex text layout scripts for the Arabic, Hebrew, Indic, and Thai languages.
Some Unicode locales, notably the Asian locales, include additional Kanji or Hanzi glyphs.
Due to limited font resources, the current Oracle Solaris Unicode locales include character glyphs from the following character sets:
ISO 8859-1 (most Western European languages, such as English, French, Spanish, and German)
ISO 8859-2 (most Central European languages, such as Czech, Polish, and Hungarian)
ISO 8859-6 (Arabic, including many more presentation-form character glyphs)
ISO 8859-9 (Turkish)
TIS 620.2533 (Thai, including many more presentation-form character glyphs)
ISO 8859-15 (most Western European languages, with the euro sign)
KSC 5601-1992 Annex 3 (Korean)
HKSCS (Traditional Chinese, Hong Kong)
IS 13194.1991, also known as ISCII (Hindi, including many more presentation-form character glyphs)
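To get a rough idea of whether a given character falls inside one of the character sets listed above, you can test whether the character is encodable in the corresponding legacy codec. The Python sketch below does this with standard-library codecs. The mapping from the listed character sets to Python codec names is an assumption of this sketch (for example, "johab" for KSC 5601-1992 Annex 3 and "big5hkscs" for HKSCS), and ISCII is omitted because the Python standard library has no codec for it. Membership in a character set is only an approximation of glyph availability: the fonts that are actually installed determine what is displayed.

```python
# Python codec names assumed to correspond to the listed character sets.
# This mapping is the sketch's own choice, not taken from Oracle Solaris.
# ISCII (IS 13194.1991) is omitted: the standard library has no codec for it.
CODECS = {
    "ISO 8859-1": "iso8859_1",
    "ISO 8859-2": "iso8859_2",
    "ISO 8859-6": "iso8859_6",
    "ISO 8859-9": "iso8859_9",
    "TIS 620.2533": "tis_620",
    "ISO 8859-15": "iso8859_15",
    "KSC 5601-1992 Annex 3": "johab",
    "HKSCS": "big5hkscs",
}


def covering_sets(ch):
    """Return the labels of all listed character sets that can encode `ch`."""
    found = []
    for label, codec in CODECS.items():
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            # The character is not part of this character set.
            continue
        found.append(label)
    return found


if __name__ == "__main__":
    for ch in ["\u00e9", "\u0639", "\u0e01", "\u20ac", "\ud55c", "\u1200"]:
        print("U+%04X" % ord(ch), covering_sets(ch))
```

A character for which covering_sets returns an empty list, such as an Ethiopic syllable, lies outside all of the character sets checked here.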
If you try to view characters for which the en_US.UTF-8 locale does not have corresponding glyphs, the locale displays a no-glyph glyph instead, as shown in the following illustration:
The locale is selectable at installation time and may be designated as the system default locale.
The same level of en_US.UTF-8 locale support is provided for both 64-bit and 32-bit Oracle Solaris systems.
Motif and CDE desktop applications and libraries support the en_US.UTF-8 locale. However, XView™ and OLIT libraries do not support the en_US.UTF-8 locale.