Korean Solaris User's Guide

Supported Character Sets

The locale that you choose determines the characters that are available for input. If you select the ko_KR.EUC locale, for example, you can enter the characters in the KS X 1001 code set. In the ko_KR.UTF-8 locale, you can input all 11,172 Hangul characters that can be composed according to the Johap principle. The following descriptions summarize the encoding standards that define the characters for the ko_KR.EUC locale and for the ko_KR.UTF-8 locale.
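As a rough illustration of how the choice of code set changes the bytes behind a character, the following Python sketch encodes one Hangul character both ways. It assumes that Python's `euc_kr` and `utf-8` codecs are reasonable stand-ins for the ko_KR.EUC and ko_KR.UTF-8 locales; it does not query the Solaris locale itself, and the sample character is chosen only for illustration.

```python
# Encode one Hangul character in both code sets.
# Assumption: Python's "euc_kr" codec stands in for the ko_KR.EUC locale
# and "utf-8" for the ko_KR.UTF-8 locale.
syllable = "\uD55C"  # HANGUL SYLLABLE HAN, a character that KS X 1001 defines

euc_bytes = syllable.encode("euc_kr")   # two-byte Wansung code in EUC form
utf8_bytes = syllable.encode("utf-8")   # three-byte UTF-8 sequence

print("EUC  :", euc_bytes.hex())   # c7d1
print("UTF-8:", utf8_bytes.hex())  # ed959c
```

The same character occupies two bytes in the EUC code set and three bytes in UTF-8, so byte counts and file sizes differ between the two locales even when the text is identical.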

Table 4–1 Character Code Standards

ko_KR.EUC (ko) locale

Wansung code 

This two-byte code is specified in Korean Industry Standard KS X 1001, formerly known as KS C 5601-1987, for Hangul, Hanja, and other characters. In the Korean Solaris software, the KS X 1001 characters are encoded in the EUC code set.

ko_KR.UTF-8 (ko.UTF-8) locale

Johap or Packed code 

This two-byte code consists of a leading bit followed by three 5-bit fields. For a Hangul character, the three fields contain the codes for a leading consonant, a vowel, and a final consonant, if there is one. This two-byte code is specified in Korean Industry Standard KS C 5601-1992–3.

ko.UTF-8

Korean Universal Multiple Octet Coded Character Set (UCS) Transmission Format. ko.UTF-8 supports all the characters of KS C 5601 and the 11,172 characters from Johap, as well as all Korean-related Unicode 3.2 characters and fonts. ko.UTF-8 supports the following subset of Unicode:

  • Basic Latin and Latin-1 (190 characters) – Row 00 of BMP (Basic Multilingual Plane)

  • Symbolic characters – Row 20 to Row 27, and Row 32 of BMP including box (line) drawing characters that are defined in KS C 5601

  • Numerals that are defined in KS C 5601 (20 characters) – Row 21 and Row FF of BMP

  • Roman, Greek, Japanese, and Cyrillic alphabet characters that are defined in KS C 5601 (362 characters) – Row 02, Row 04, Row 30 and Row FF of BMP

  • Jamo (Hangul alphabet) characters (94 characters) – Row 31 of BMP

  • Pre-composed Hangul syllables (11,172 characters) – From Row AC to Row D7 of BMP

  • Hanja characters defined in KS C 5601 (4,888 characters) – From Row 4E to Row 9F and from Row F9 to Row FA of BMP
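
The Johap bit layout and the size of the pre-composed Hangul range described above can be checked with a short Python sketch. The function name `split_johap_fields` and the sample value `0x8861` are hypothetical and used only for illustration. The code points U+AC00 and U+D7A3 are taken from the Unicode Hangul Syllables block rather than from this guide, and the 19 x 21 x 28 breakdown (leading consonants, vowels, and finals including "no final") is likewise the Unicode composition rule, not a statement from this table.

```python
def split_johap_fields(code):
    """Split a 16-bit Johap code into (lead bit, initial, vowel, final).

    Layout assumed from the description: 1 leading bit, then three
    5-bit fields for the leading consonant, vowel, and final consonant.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("a Johap code is two bytes wide")
    lead_bit = (code >> 15) & 0x1
    initial = (code >> 10) & 0x1F  # leading consonant field
    vowel = (code >> 5) & 0x1F     # vowel field
    final = code & 0x1F            # final consonant field
    return lead_bit, initial, vowel, final


# Sample two-byte value, chosen only to exercise the bit arithmetic.
print(split_johap_fields(0x8861))  # (1, 2, 3, 1)

# Pre-composed Hangul syllables: Row AC through Row D7 of the BMP.
# Assumption: first and last code points from the Unicode Hangul
# Syllables block.
FIRST_SYLLABLE = 0xAC00
LAST_SYLLABLE = 0xD7A3
syllable_count = LAST_SYLLABLE - FIRST_SYLLABLE + 1

print(syllable_count)  # 11172
print(19 * 21 * 28)    # 11172
```

Both counts agree with the 11,172 characters listed for the pre-composed Hangul syllables.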