Unicode Support in the Solaris Operating Environment

3.1 Unicode UTF-8 en_US.UTF-8 Locale

en_US.UTF-8 is the flagship Unicode locale in the Solaris operating environment. It is an American English-based locale with multiscript processing support for characters in many different languages. New and enhanced features of all Unicode locales include support for the Unicode 3.0 character set, correct rendering of complex text layout (CTL) scripts, native Asian input methods, additional MIME character sets in dtmail, new iconv code conversions, and an enhanced PostScript print filter.
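
An iconv code conversion translates text between a legacy character set and UTF-8. The following minimal sketch shows the same idea with Python's built-in codecs rather than Solaris iconv itself. The helper name to_utf8 is hypothetical, and Python's codec names may differ from the code set names that Solaris iconv expects.

```python
def to_utf8(data: bytes, source_charset: str) -> bytes:
    """Convert bytes in source_charset to UTF-8 (hypothetical helper).

    Uses Python codecs, not the Solaris iconv interfaces; codec names
    follow Python's conventions.
    """
    text = data.decode(source_charset)  # legacy bytes -> Unicode string
    return text.encode("utf-8")         # Unicode string -> UTF-8 bytes


latin1_bytes = "café".encode("iso-8859-1")  # é is the single byte 0xE9
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")
print(utf8_bytes)  # b'caf\xc3\xa9': é becomes the two bytes 0xC3 0xA9
```

Converting in the other direction is the mirror image: decode from UTF-8 and encode into the target character set. That step raises an error for any character the target set cannot represent.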

All Unicode locales in the Solaris operating environment support multiple scripts. Thirteen input modes are available: English/European, Cyrillic, Greek, Arabic, Hebrew, Thai, Unicode Hex, Unicode Octal, Table lookup, Japanese, Korean, Simplified Chinese, and Traditional Chinese.

Users can input characters from any combination of scripts and the entire Unicode coding space.
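
The en_US.UTF-8 locale uses UTF-8 as its character encoding. UTF-8 is a variable-length encoding that covers every Unicode code point, which is what allows scripts to be mixed freely in one text. The sketch below is a hand-written UTF-8 encoder in Python, checked against Python's built-in codec. It is for illustration only, and the function name utf8_encode is hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value (U+0000..U+10FFFF) as UTF-8."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:                   # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),  # 4 bytes: 11110xxx + three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Mixed-script sample: Latin, Greek, Cyrillic, Thai, Hebrew, Japanese
sample = "AαЖกא日"
for ch in sample:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with the built-in codec
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
```

ASCII characters keep their single-byte values, which is why plain English text is unchanged in a UTF-8 locale, while other scripts take two to four bytes per character.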


Note -

To choose an input mode, press the Compose key followed by the two-letter code listed in Table 3-1. For example, to input text in Thai, press Compose and then type tt. Alternatively, click the status area and select an input mode as shown in Figure 3-1. (To select the default English/European mode, press Control+Space.)


Table 3-1 UTF-8 Input Mode Two-Letter Codes

Input Mode            Code
--------------------  -------------
Cyrillic              cc
Greek                 gg
Thai                  tt
Arabic                ar
Hebrew                hh
Unicode Hex           uh
Unicode Octal         uo
Lookup                ll
Japanese              ja
Korean                ko
Simplified Chinese    sc
Traditional Chinese   tc
English/European      Control+Space

Figure 3-1 UTF-8 Input Mode selection

To input text from a lookup table, select the Lookup input mode. A lookup table listing all input modes, along with various symbol and technical code sets, appears as shown in Figure 3-2.

The Table lookup input mode is the easiest way for non-native speakers to input characters in a foreign language: a lookup window displays the characters of a selected script, as shown for the Asian input mode in Figure 3-3.

The Arabic, Hebrew, and Thai input modes provide full complex text layout features, including right-to-left display and context-sensitive character rendering. The Unicode octal and hexadecimal code input modes generate Unicode characters from their octal and hexadecimal equivalents, respectively.
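
The Unicode Hex and Unicode Octal modes map a typed numeric code to the corresponding character. The following Python sketch assumes that the digits denote the character's Unicode code point. This section does not say whether the modes instead expect encoded byte values, so treat the sketch as an illustration of the arithmetic only. The function names are hypothetical.

```python
def char_from_hex(digits: str) -> str:
    """Return the character whose code point is given in hexadecimal."""
    return chr(int(digits, 16))


def char_from_octal(digits: str) -> str:
    """Return the character whose code point is given in octal."""
    return chr(int(digits, 8))


# U+20AC EURO SIGN entered both ways: 0x20AC == 0o20254 == 8364
euro_from_hex = char_from_hex("20AC")
euro_from_octal = char_from_octal("20254")
assert euro_from_hex == euro_from_octal == "\u20ac"

# U+0E01 THAI CHARACTER KO KAI and its UTF-8 byte sequence
ko_kai = char_from_hex("0E01")
print(ko_kai.encode("utf-8").hex(" "))  # e0 b8 81
```

Digits that are not valid in the chosen base, or a value above U+10FFFF, raise ValueError in this sketch rather than producing a character.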

The Japanese, Korean, Simplified Chinese, and Traditional Chinese input modes provide full native Asian input.

Figure 3-2 UTF-8 Table Lookup

Figure 3-3 Asian input mode

For more information on each input method, refer to the chapter "Overview of en_US.UTF-8 Locale Support" in the latest Solaris International Language Environments Guide, and to the ATOK12 User's Guide, Wnn6 User's Guide, cs00 User's Guide, Korean Solaris User's Guide, Simplified Chinese Solaris User's Guide, and Traditional Chinese Solaris User's Guide.

The Unicode locales can use the enhanced mp(1) print filter to print text files. mp(1) prints plain text files encoded in UTF-8, using Solaris system fonts and printer-resident fonts (such as bitmap, Type 1, and TrueType fonts) as appropriate for each script. The output is standard PostScript. For more information, refer to the mp(1) man page.

The Unicode locales support various MIME character sets in dtmail, including Latin, Greek, Cyrillic, Thai, and Asian character sets. Examples include ISO-8859-1 through ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, UTF-8, UTF-7, UTF-16, UTF-16BE, UTF-16LE, Shift_JIS, ISO-2022-JP, EUC-KR, ISO-2022-KR, TIS-620, Big5, GB2312, KOI8-R, KOI8-U, and ISO-2022-CN. With this support, users can send and receive email messages encoded in MIME character sets from almost any region of the world. dtmail automatically decodes incoming email by recognizing the MIME character set and content transfer encoding of the message. When sending, the sender specifies the MIME character set in which the message is encoded for the recipient's mail user agent.
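
dtmail's own implementation is not described here. As an illustration of the same decoding steps (read the charset parameter, undo the content transfer encoding, then convert the bytes to text), the following sketch uses Python's standard email package. The function name decode_text_body and the sample message and address are hypothetical.

```python
import base64
from email import message_from_bytes


def decode_text_body(raw: bytes) -> str:
    """Decode a single-part text message using its MIME charset.

    get_payload(decode=True) undoes the content transfer encoding
    (base64, quoted-printable); the charset parameter selects the codec.
    """
    msg = message_from_bytes(raw)
    charset = msg.get_content_charset() or "us-ascii"  # MIME default
    body = msg.get_payload(decode=True)
    return body.decode(charset, errors="replace")


# Hypothetical message: Russian text in KOI8-R, base64 transfer encoding
koi8_body = base64.encodebytes("Привет".encode("koi8-r"))
raw_koi8 = (
    b"From: sender@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=KOI8-R\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + koi8_body
)
print(decode_text_body(raw_koi8))  # Привет
```

A charset name that Python does not recognize raises LookupError in this sketch; a real mail reader would need a fallback for that case and for multipart messages, which this example does not handle.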

Figure 3-4 Multiple character sets in dtmail