Asian Application Developer's Guide

Korean Solaris Supported Character Sets

Four types of coding conventions are currently supported in the Korean Solaris software:

N-byte code. In this single-byte code, each byte represents a consonant or vowel. These bytes are combined to build Hangul characters.
Johap or Packed code. This two-byte code consists of a leading bit followed by three 5-bit fields. These three fields contain the codes for a leading consonant, followed by a vowel, followed by a final consonant (if any) for a Hangul character. This two-byte code is specified in Korean Industry Standard KS C 5601-1992.
Wansung code. This two-byte code is specified in Korean Industry Standard KS C 5601-1987 for Hangul, Hanja, and other characters. In the Korean Solaris software these KS C 5601-1987 characters are in EUC codeset 1.
ko.UTF-8. Korean Universal Multiple-Octet Coded Character Set (UCS) Transformation Format. See "The ko.UTF-8 Locale" for further information.
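
The Johap layout described above (a leading bit followed by three 5-bit fields) can be sketched with plain bit operations. This is a minimal Python illustration, not code from the Korean Solaris software. The function names `unpack_johap` and `pack_johap` and the sample value are assumptions made for this example, and the tables that map each 5-bit value to an actual consonant or vowel are defined by the standard and are not reproduced here.

```python
def unpack_johap(code):
    """Split a 16-bit Johap value into (leading bit, initial, vowel, final)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("a Johap code is two bytes (16 bits)")
    leading_bit = (code >> 15) & 0x1   # the single leading bit
    initial = (code >> 10) & 0x1F      # 5-bit leading consonant field
    vowel = (code >> 5) & 0x1F         # 5-bit vowel field
    final = code & 0x1F                # 5-bit final consonant field
    return leading_bit, initial, vowel, final


def pack_johap(initial, vowel, final, leading_bit=1):
    """Inverse of unpack_johap: assemble the four fields into 16 bits."""
    for field in (initial, vowel, final):
        if not 0 <= field <= 0x1F:
            raise ValueError("each field must fit in 5 bits")
    return ((leading_bit & 0x1) << 15) | (initial << 10) | (vowel << 5) | final


# Hypothetical sample value, used only to show the field boundaries.
fields = unpack_johap(0x8861)
```

Packing and unpacking are exact inverses, so a round trip through both functions returns the original 16-bit value.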

Korean Solaris software provides code conversion between these four Korean code conventions at three levels of support:

User commands support file transfers for existing files in different codes.
Library functions support application development for existing codes.
STREAMS modules support existing TTY devices using different codes.
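
The Solaris commands, library functions, and STREAMS modules that perform these conversions are not shown in this section. As a stand-in, the following sketch uses Python's built-in `euc_kr` and `utf-8` codecs to show what a conversion between Wansung (KS C 5601-1987 in EUC) and UTF-8 amounts to at the byte level. The helper name `convert` is hypothetical, and this is not the Solaris interface.

```python
def convert(data, source, target):
    """Re-encode bytes from one codec to another by way of Unicode text."""
    return data.decode(source).encode(target)


# A short Hangul string stored as Wansung (EUC) bytes.
wansung = "한글".encode("euc_kr")

# Convert to UTF-8 and back again; the round trip is lossless here
# because both characters exist in both code sets.
utf8 = convert(wansung, "euc_kr", "utf-8")
restored = convert(utf8, "utf-8", "euc_kr")

print(wansung.hex(), utf8.hex())
```

Each Hangul character occupies two bytes in the EUC form and three bytes in UTF-8, so file sizes change when existing files are converted.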

The `ko.UTF-8` Locale

The Korean government announced the standard Korean codeset KS C 5700, which is based on Unicode 2.0. KS C 5700 will be widely used in the Korean market, replacing the previous standard, KS C 5601, which is based on ISO 2022.

To comply with this new standard, the ko.UTF-8 locale was developed. UTF-8 is the file-system-safe UCS (Universal Character Set) Transformation Format of Unicode, and is based on ISO 10646-1/Unicode 2.0.
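
To make "file system safe" concrete, the sketch below encodes a BMP code point as UTF-8 by hand. It is an illustration only, assuming the standard UTF-8 bit layout; the function name `utf8_encode_bmp` is hypothetical. The point to notice is that every byte of a multibyte sequence has its high bit set, so no byte of an encoded Hangul character can be mistaken for an ASCII character such as `/` or NUL.

```python
def utf8_encode_bmp(cp):
    """Encode one BMP code point (U+0000 to U+FFFF) as UTF-8 bytes."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch handles BMP code points only")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable")
    if cp < 0x80:
        # One byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# U+AC00 is a pre-composed Hangul syllable; it needs three bytes.
encoded = utf8_encode_bmp(0xAC00)
assert encoded == "\uac00".encode("utf-8")
assert all(byte >= 0x80 for byte in encoded)
```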

ko.UTF-8 supports all the characters of KS C 5601 and the 11,172 Hangul characters from Johap. ko.UTF-8 supports all Korean-related Unicode 2.0 characters and fonts. All Unicode characters can be accepted and processed, but some cannot be correctly displayed because of input and output limitations.

ko.UTF-8 supports the following subset of Unicode:

Basic Latin and Latin-1 (190 characters) - Row 00 of BMP (Basic Multilingual Plane)
Symbolic characters - Row 20 to Row 27, and Row 32 of BMP, including box (line) drawing characters that are defined in KS C 5601
Numerals that are defined in KS C 5601 (20 characters) - Row 21 and Row FF of BMP
Roman, Greek, Japanese, and Cyrillic alphabet characters that are defined in KS C 5601 (362 characters) - Row 03, Row 04, Row 30 and Row FF of BMP
Jamo (Hangul alphabet) characters (94 characters) - Row 31 of BMP
Pre-composed Hangul syllables (11,172 characters) - From Row AC to Row D7 of BMP
Hanja characters defined in KS C 5601 (4,888 characters) - From Row 4E to Row 9F and from Row F9 to Row FA of BMP
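
The row numbers in the list above are the high byte of a 16-bit BMP code point. The following sketch, with hypothetical names `bmp_row` and `in_listed_row`, tests whether a code point falls in one of the listed rows. Falling in a listed row is necessary but not sufficient for support, because only part of most rows is covered. The starting code point U+AC00 for the Hangul syllables is taken from Unicode rather than from the text above.

```python
# Rows named in the list above, as inclusive (first, last) pairs.
LISTED_ROWS = [
    (0x00, 0x00),  # Basic Latin and Latin-1
    (0x03, 0x04),  # alphabet characters (Row 03, Row 04)
    (0x20, 0x27),  # symbolic characters; Row 21 also holds numerals
    (0x30, 0x32),  # alphabet (Row 30), Jamo (Row 31), symbols (Row 32)
    (0x4E, 0x9F),  # Hanja
    (0xAC, 0xD7),  # pre-composed Hangul syllables
    (0xF9, 0xFA),  # Hanja
    (0xFF, 0xFF),  # numerals and alphabet characters (Row FF)
]


def bmp_row(cp):
    """Return the BMP row (high byte) of a 16-bit code point."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("not a BMP code point")
    return cp >> 8


def in_listed_row(cp):
    """True if the code point lies in one of the rows listed above."""
    row = bmp_row(cp)
    return any(first <= row <= last for first, last in LISTED_ROWS)


# The 11,172 syllables start at U+AC00 (an assumption from Unicode),
# so the last one is 11,171 positions later, in Row D7.
FIRST_SYLLABLE = 0xAC00
LAST_SYLLABLE = FIRST_SYLLABLE + 11172 - 1
assert bmp_row(FIRST_SYLLABLE) == 0xAC
assert bmp_row(LAST_SYLLABLE) == 0xD7
```

The count 11,172 is the product 19 x 21 x 28 (leading consonants, vowels, and final consonants including "none"), which is why the pre-composed block spans Row AC through Row D7.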
