Four types of coding conventions are currently supported in the Korean Solaris software:
N-byte code. This single-byte code uses each byte to represent one consonant or vowel. These bytes are combined to build Hangul characters.
Johap or Packed code. This two-byte code consists of a leading bit followed by three 5-bit fields, which hold the codes for the leading consonant, the vowel, and the final consonant (if any) of a Hangul character. This code is specified in Korean Industrial Standard KS C 5601-1992.
Wansung code. This two-byte code is specified in Korean Industrial Standard KS C 5601-1987 for Hangul, Hanja, and other characters. In the Korean Solaris software, these KS C 5601-1987 characters are in EUC codeset 1.
ko.UTF-8. Korean locale based on UTF-8, the Universal Multiple-Octet Coded Character Set (UCS) Transformation Format. See "The ko.UTF-8 Locale" for further information.
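To make the Johap layout concrete, the sketch below splits a 16-bit Johap code into its leading bit and three 5-bit fields. It is a minimal Python illustration, not part of Korean Solaris: the helper names split_johap and johap_code are hypothetical, and the sample code is produced with Python's built-in "johab" codec rather than any Solaris interface.

```python
def split_johap(code: int) -> tuple[int, int, int, int]:
    """Split a 16-bit Johap code into (leading_bit, initial, medial, final)."""
    leading_bit = (code >> 15) & 0x1  # always 1 for Hangul syllables
    initial = (code >> 10) & 0x1F     # leading consonant field
    medial = (code >> 5) & 0x1F       # vowel field
    final = code & 0x1F               # final consonant field (fill code if none)
    return leading_bit, initial, medial, final


def johap_code(syllable: str) -> int:
    """Encode one Hangul syllable with Python's 'johab' codec, as an int."""
    raw = syllable.encode("johab")    # two bytes, most significant first
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    code = johap_code("가")
    print(hex(code), split_johap(code))
```

For 가 this gives 0x8861 with fields (1, 2, 3, 1); the 1 in the final field is the fill code meaning "no final consonant". The field values are Johap's own codes, not plain alphabet positions.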
Korean Solaris software provides code conversion between these four Korean code conventions at three levels of support:
User commands support file transfers for existing files in different codes.
Library functions support application development for existing codes.
STREAMS modules support existing TTY devices using different codes.
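At the library level, conversion amounts to decoding bytes in one code and re-encoding them in another. The minimal sketch below shows that idea with Python's standard codecs, used only as a stand-in for the Solaris library functions. The codec names ("euc_kr" for Wansung in EUC form, "johab" for Johap) are Python's; the convert helper and its short names are hypothetical. N-byte code has no Python codec and is not covered.

```python
# Hypothetical short names mapped to Python's built-in codec names.
# These are not the codeset names used by Solaris conversion routines.
CODECS = {
    "wansung": "euc_kr",  # KS C 5601-1987 in EUC form
    "johap": "johab",     # KS C 5601-1992 Johap (Packed) code
    "utf8": "utf-8",      # encoding used by ko.UTF-8
}


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from the src code and re-encode them in the dst code.

    Raises a UnicodeError subclass if the input is invalid in src or a
    character cannot be represented in dst.
    """
    text = data.decode(CODECS[src])
    return text.encode(CODECS[dst])


if __name__ == "__main__":
    wansung = "한국어".encode("euc_kr")
    johap = convert(wansung, "wansung", "johap")
    utf8 = convert(johap, "johap", "utf8")
    print(wansung.hex(), johap.hex(), utf8.hex())
    assert utf8.decode("utf-8") == "한국어"
```

Going through decoded text keeps each pair of codes independent: adding another code means adding one table entry rather than a new pairwise converter.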
The Korean government announced the standard Korean codeset KS C 5700, which is based on Unicode 2.0. KS C 5700 will be widely used in the Korean market, replacing the previous standard, KS C 5601, which is based on ISO 2022.
To comply with this new standard, the ko.UTF-8 locale was developed. UTF-8 (UCS Transformation Format, 8-bit form) is a file-system-safe encoding of Unicode based on ISO 10646-1/Unicode 2.0: bytes such as "/" (0x2F) and NUL never occur inside a multibyte character.
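The file-system-safe property follows from how UTF-8 builds multibyte sequences: every byte of the sequence for a code point at or above U+0080 has its high bit set, so it can never be mistaken for "/" or any other ASCII byte. The sketch below hand-encodes BMP code points to show this bit layout and checks the result against Python's own encoder; utf8_encode_bmp and multibyte_bytes_all_high are hypothetical helpers that deliberately handle only non-surrogate BMP code points.

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode a BMP code point (excluding surrogates) as UTF-8 by hand."""
    if not 0 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("only non-surrogate BMP code points are handled")
    if cp < 0x80:                  # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:                 # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    return bytes([                 # 1110xxxx 10xxxxxx 10xxxxxx
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def multibyte_bytes_all_high(limit: int = 0x10000) -> bool:
    """True if every byte of every multibyte BMP sequence is >= 0x80."""
    for cp in range(0x80, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue               # surrogates have no UTF-8 encoding
        encoded = utf8_encode_bmp(cp)
        assert encoded == chr(cp).encode("utf-8")
        if any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    print(utf8_encode_bmp(ord("한")).hex())  # ed959c (U+D55C)
    print(multibyte_bytes_all_high())        # True
```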
ko.UTF-8 supports all the characters of KS C 5601 and all 11,172 Hangul syllables of Johap, as well as all Korean-related Unicode 2.0 characters and fonts. All Unicode characters can be accepted and processed, but some cannot be displayed correctly because of input and output limitations.
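The figure 11,172 is 19 leading consonants × 21 vowels × 28 final-consonant options (27 consonants plus "none"), the full set of modern syllables that Johap covers with its three fields. Unicode arranges these syllables algorithmically starting at U+AC00, so a syllable can be computed from its parts. The sketch below implements that standard Unicode composition arithmetic in Python; the function names are hypothetical, and the indices are Unicode's, which differ from Johap's field codes (for example, 가 is (0, 0, 0) here but (2, 3, 1) in Johap's fields).

```python
S_BASE = 0xAC00              # first pre-composed syllable, U+AC00
L_COUNT = 19                 # leading consonants
V_COUNT = 21                 # vowels
T_COUNT = 28                 # final consonants, index 0 = none
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 syllables in total


def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Build a pre-composed syllable from Unicode jamo indices."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + l_index * N_COUNT + v_index * T_COUNT + t_index)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (l_index, v_index, t_index) for a pre-composed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a pre-composed Hangul syllable")
    return (s_index // N_COUNT,
            (s_index % N_COUNT) // T_COUNT,
            s_index % T_COUNT)


if __name__ == "__main__":
    print(S_COUNT)                        # 11172
    print(compose(18, 0, 4))              # 한 (ㅎ + ㅏ + ㄴ)
    print(hex(ord(compose(18, 20, 27))))  # 0xd7a3, the last syllable
    print(decompose("한"))                # (18, 0, 4)
```

The last syllable, U+D7A3, lies in Row D7, consistent with the range given for pre-composed Hangul syllables in the list that follows.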
ko.UTF-8 supports the following subset of Unicode:
Basic Latin and Latin-1 (190 characters) - Row 00 of BMP (Basic Multilingual Plane)
Symbolic characters that are defined in KS C 5601, including box (line) drawing characters - Row 20 to Row 27, and Row 32 of BMP
Numerals that are defined in KS C 5601 (20 characters) - Row 21 and Row FF of BMP
Roman, Greek, Japanese, and Cyrillic alphabet characters that are defined in KS C 5601 (362 characters) - Row 03, Row 04, Row 30 and Row FF of BMP
Jamo (Hangul alphabet) characters (94 characters) - Row 31 of BMP
Pre-composed Hangul syllables (11,172 characters) - From Row AC to Row D7 of BMP
Hanja characters defined in KS C 5601 (4,888 characters) - From Row 4E to Row 9F and from Row F9 to Row FA of BMP
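Because each group above is identified by BMP rows (the high byte of a 16-bit code point), a row lookup can tell which groups a character might belong to. The sketch below encodes the listed rows in a Python table; SUBSET_ROWS, bmp_row, and candidate_groups are hypothetical names. It is only a coarse check: some rows are shared between groups (Row 21 and Row FF), and a row match does not prove that a specific character is part of the supported subset.

```python
# Rows of the BMP listed for each group of the ko.UTF-8 subset.
SUBSET_ROWS = {
    "Basic Latin and Latin-1": {0x00},
    "Symbolic characters": set(range(0x20, 0x28)) | {0x32},
    "Numerals": {0x21, 0xFF},
    "Roman, Greek, Japanese, and Cyrillic": {0x03, 0x04, 0x30, 0xFF},
    "Jamo": {0x31},
    "Pre-composed Hangul syllables": set(range(0xAC, 0xD8)),
    "Hanja": set(range(0x4E, 0xA0)) | {0xF9, 0xFA},
}


def bmp_row(ch: str) -> int:
    """Return the BMP row (high byte) of a single character."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("character is outside the BMP")
    return cp >> 8


def candidate_groups(ch: str) -> list[str]:
    """List the subset groups whose rows contain this character's row."""
    row = bmp_row(ch)
    return [name for name, rows in SUBSET_ROWS.items() if row in rows]


if __name__ == "__main__":
    for sample in ("A", "\u2500", "\u3131", "한", "韓", "\uff21"):
        print(f"U+{ord(sample):04X}", candidate_groups(sample))
```

A full-width letter such as U+FF21 matches two groups, which is why an exact check would need the KS C 5601 mapping table rather than rows alone.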