International Language Environments Guide

Korean Localization

In December 1995, the Korean government announced a standard Korean codeset, KS X 1005–1, which is based on ISO 10646-1/Unicode 2.0.

ISO 10646 defines two encoding forms of the Universal Character Set (UCS):

UCS-2. Universal Character Set (two-byte form)
UCS-4. Universal Character Set (four-byte form).

The ISO-10646 character set cannot be used directly on IBM PC-based operating systems. For example, the kernel and many other modules of the Solaris operating environment interpret certain byte values as control instructions, such as a null character (0x00) in any string. An ISO-10646 character, however, can contain any bit combination in its first or subsequent bytes, including those values. Because of this limitation, ISO-10646 characters cannot be transmitted freely through the Solaris system.
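The null-byte problem can be seen in a short sketch. It assumes Python's `utf-16-be` and `utf-32-be` codecs as stand-ins for big-endian UCS-2 and UCS-4 (they agree for characters in the Basic Multilingual Plane), and it uses the Hangul syllable U+AC00 as a sample. Neither choice comes from this guide, and the helper `c_string_view` is hypothetical.

```python
# Sketch: raw UCS-2/UCS-4 bytes can contain 0x00, which null-terminated
# string handling treats as the end of the string.
# Assumption: utf-16-be / utf-32-be stand in for big-endian UCS-2 / UCS-4.

SAMPLE = "\uac00"  # HANGUL SYLLABLE GA

ucs2 = SAMPLE.encode("utf-16-be")
ucs4 = SAMPLE.encode("utf-32-be")


def c_string_view(raw: bytes) -> bytes:
    """Return the bytes a null-terminated reader would see (up to the first 0x00)."""
    return raw.split(b"\x00", 1)[0]


print(ucs2.hex(" "))        # ac 00
print(ucs4.hex(" "))        # 00 00 ac 00
print(c_string_view(ucs2))  # b'\xac'  (second byte lost)
print(c_string_view(ucs4))  # b''      (string appears empty)
```

In both forms a single Korean character is truncated or lost once it passes through code that stops at the first null byte.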

In order to establish a migration path, the ISO-10646 character set defines the UCS Transformation Format (UTF), which recodes the ISO-10646 characters without using C0 controls (0x00..0x1F), C1 controls (0x80..0x9F), space (0x20), and DEL (0x7F).
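As an illustration, the following sketch assumes UTF-8 is the transformation format in use, as the locale name ko.UTF-8 suggests. It checks only the property that matters for null-terminated strings: no byte in the encoding of a non-ASCII character falls in the range 0x00..0x7F, so null, space, DEL, and the C0 controls never appear inside a multibyte sequence. The helper name `multibyte_bytes` is hypothetical.

```python
# Sketch: UTF-8 encodes non-ASCII characters using only bytes >= 0x80,
# so 0x00, the C0 controls, space, and DEL never occur inside them.

SAMPLE = "A\uac00"  # ASCII 'A' followed by HANGUL SYLLABLE GA

encoded = SAMPLE.encode("utf-8")


def multibyte_bytes(text: str) -> bytes:
    """Collect the UTF-8 bytes of every non-ASCII character in text."""
    return b"".join(ch.encode("utf-8") for ch in text if ord(ch) > 0x7F)


non_ascii = multibyte_bytes(SAMPLE)
safe = all(b >= 0x80 for b in non_ascii)

print(encoded.hex(" "))  # 41 ea b0 80
print(safe)              # True
```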

ko.UTF-8 is the Solaris locale that supports KS X 1005–1, the Korean standard codeset. This locale supports all characters in the previous standard, KS X 1001, and all 11,172 Korean characters. Korean UTF-8 supports the Korean language-related ISO-10646 characters and fonts. Because ISO-10646 covers all characters in the world, the various input methods and fonts are supplied so that you can input and output any character in any language. Until universal UTF/UCS support becomes available, Korean UTF-8 supports the ISO-10646 code subset that is related to Korean characters, as well as all other characters in the previous Korean standard codeset and extended ASCII.
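The figure of 11,172 can be checked with a little arithmetic. The sketch below assumes the Unicode 2.0 layout of precomposed Hangul syllables, which start at U+AC00 and are ordered by initial consonant, medial vowel, and final consonant. The constant and function names are illustrative, not part of any Solaris interface.

```python
# Sketch: the 11,172 precomposed Hangul syllables as a product of jamo counts.
# Assumption: Unicode 2.0 (and later) layout, with syllables starting at U+AC00.

HANGUL_BASE = 0xAC00
LEAD_COUNT = 19    # initial consonants
VOWEL_COUNT = 21   # medial vowels
TAIL_COUNT = 28    # final consonants, counting "no final" as index 0

SYLLABLE_COUNT = LEAD_COUNT * VOWEL_COUNT * TAIL_COUNT


def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Return the precomposed syllable for the given jamo indexes."""
    if not (0 <= lead < LEAD_COUNT
            and 0 <= vowel < VOWEL_COUNT
            and 0 <= tail < TAIL_COUNT):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWEL_COUNT + vowel) * TAIL_COUNT + tail)


print(SYLLABLE_COUNT)                 # 11172
print(hex(ord(compose(0, 0))))        # 0xac00, first syllable
print(hex(ord(compose(18, 20, 27))))  # 0xd7a3, last syllable
```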

In the ko locale, the EUC scheme is used to encode KS X 1001. The ko.UTF-8 locale supports the KS X 1005–1/Unicode 2.0 codeset, which is a superset of KS X 1001. These two locales look the same to the end user, but the internal character encoding is different.

The Korean Solaris product supports the following input methods:

For the ko locale:

Hangul 2–BeolSik (one set of consonants and one set of vowels)
Hangul-Hanja conversion
Special character
Hexadecimal code

For the ko.UTF-8 locale:

Hangul 2–BeolSik (one set of consonants and one set of vowels)
Hangul-Hanja conversion
Special character
Hexadecimal code
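Both locales offer the same input methods because they differ only in internal encoding. As a rough illustration of that difference, the sketch below encodes the same two syllables the way each locale would store them. It assumes Python's `euc_kr` and `utf-8` codecs as stand-ins for the ko and ko.UTF-8 encodings; the variable names are illustrative.

```python
# Sketch: the same text, stored as EUC (ko locale) and UTF-8 (ko.UTF-8 locale).
# Assumption: Python's euc_kr codec matches the EUC encoding of KS X 1001.

TEXT = "\ud55c\uae00"  # the word "Hangul" written in Hangul (two syllables)

as_euc = TEXT.encode("euc_kr")
as_utf8 = TEXT.encode("utf-8")

print(as_euc.hex(" "))   # c7 d1 b1 db        (2 bytes per syllable)
print(as_utf8.hex(" "))  # ed 95 9c ea b8 80  (3 bytes per syllable)

# Decoding with the matching codec yields identical text in both cases.
print(as_euc.decode("euc_kr") == as_utf8.decode("utf-8"))  # True
```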

The following table shows the Korean bitmap fonts for the ko locale.

Table 4–16 Solaris 9 Korean Bitmap Fonts for the ko Locale


Full Family Name	Subfamily	Format	Encoding
Gothic	R/B	PCF (12,14,16,18,20,24)	KS X 1001
Graphic	R/B	PCF (12,14,16,18,20,24)	KS X 1001
Haeso	R/B	PCF (12,14,16,18,20,24)	KS X 1001
Kodig	R/B	PCF (12,14,16,18,20,24)	KS X 1001
Myeongijo	R/B	PCF (12,14,16,18,20,24)	KS X 1001
Pilki	R/B	PCF (12,14,16,18,20,24)	KS X 1001
Round gothic	R/B	PCF (12,14,16,18,20,24)	KS X 1001

The following table shows the Korean bitmap fonts for the ko.UTF-8 locale.

Table 4–17 Solaris 9 Korean Bitmap Fonts for the ko.UTF-8 Locale


Full Family Name	Subfamily	Format	Encoding
Gothic	R/B	PCF (12,14,16,18,20,24)	`KS X 1001 (Johap)`
Graphic	R/B	PCF (12,14,16,18,20,24)	`KS X 1001 (Johap)`
Haeso	R/B	PCF (12,14,16,18,20,24)	`KS X 1001 (Johap)`
Kodig	R/B	PCF (12,14,16,18,20,24)	`KS X 1001 (Johap)`
Myeongijo	R/B	PCF (12,14,16,18,20,24)	`KS X 1001 (Johap)`
Pilki	R/B	PCF (12,14,16,18,20,24)	`KS X 1001 (Johap)`

The following table shows the Korean TrueType fonts for the ko and ko.UTF-8 locales.

Table 4–18 Solaris 9 Korean TrueType Fonts for the ko/ko.UTF-8 Locales


Full Family Name	Subfamily	Format	Vendor	Encoding
Kodig/Gothic	R	TrueType	Hanyang	Unicode
Myeongijo	R	TrueType	Hanyang	Unicode
Haeso	R	TrueType	Hanyang	Unicode
Round gothic	R	TrueType	Hanyang	Unicode

The following table shows the Korean iconv code conversions.

Table 4–19 Korean iconv


Code	Symbol	Target Code	Symbol
`IBM CP933`	`cp933`	`UTF-8` (Unicode 2.0)	`ko_KR-UTF-8`
`ISO646`	`646`	`KS X 1001`	`5601`
`ISO2022-KR`	`iso2022-7`	`KS X 1001`	`ko_KR-euc`
`ISO2022-KR`	`iso2022-7`	`UTF-8` (Unicode 2.0)	`ko_KR-UTF-8`
`KS X 1001`	`5601`	`UTF-8`	`UTF-8`
`KS X 1001`	`EUC-KR`	`UTF-8`	`UTF-8`
`KS X 1001`	`KSC5601`	`UTF-8`	`UTF-8`
`KS X 1001`	`ko_KR-euc`	`UTF-8` (Unicode 2.0)	`ko_KR-UTF-8`
`KS X 1001`	`ko_KR-euc`	`ISO2022-KR`	`ko_KR-iso2022-7`
`KS X 1001`	`ko_KR-euc`	`KS X 1001`	`ko_KR-johap`
`KS X 1001`	`ko_KR-euc`	`KS X 1001`	`ko_KR-johap92`
`KS X 1001`	`ko_KR-euc`	`KS X 1001`	`ko_KR-nbyte`
`KS X 1001`	`ko_KR-nbyte`	`KS X 1001`	`ko_KR-euc`
`KS X 1001`	`ko_KR-johap`	`UTF-8` (Unicode 2.0)	`ko_KR-UTF-8`
`KS X 1001`	`ko_KR-johap`	`KS X 1001`	`ko_KR-euc`
`KS X 1001`	`ko_KR-johap92`	`UTF-8` (Unicode 2.0)	`ko_KR-UTF-8`
`KS X 1001`	`ko_KR-johap92`	`KS X 1001`	`ko_KR-euc`
`UTF-8`	`UTF-8`	`KS X 1001`	`5601`
`UTF-8`	`UTF-8`	`KS X 1001`	`EUC-KR`
`UTF-8`	`UTF-8`	`KS X 1001`	`KSC5601`
`UTF-8`	`ko_KR-UTF-8`	`IBM CP933`	`cp933`
`UTF-8`	`ko_KR-UTF-8`	`KS X 1001`	`ko_KR-euc`
`UTF-8`	`ko_KR-UTF-8`	`ISO2022-KR`	`ko_KR-iso2022-7`
`UTF-8`	`ko_KR-UTF-8`	`KS X 1001`	`ko_KR-johap`
`UTF-8`	`ko_KR-UTF-8`	`KS X 1001`	`ko_KR-johap92`
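The conversions in the table can be approximated outside Solaris for experimentation. The sketch below is not the Solaris iconv interface: it uses Python's built-in `euc_kr`, `utf-8`, and `iso2022_kr` codecs, which are assumed here to correspond roughly to `ko_KR-euc`, `ko_KR-UTF-8`, and `ko_KR-iso2022-7`. The `convert` helper is hypothetical.

```python
# Sketch: code conversion between Korean encodings, in the spirit of iconv.
# Assumption: the Python codec names below are stand-ins, not Solaris symbols.


def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode data from the source codec and re-encode it in the target codec."""
    return data.decode(source).encode(target)


TEXT = "\ud55c\uae00"  # the word "Hangul" written in Hangul

euc = TEXT.encode("euc_kr")

utf8 = convert(euc, "euc_kr", "utf-8")           # EUC        -> UTF-8
iso2022 = convert(utf8, "utf-8", "iso2022_kr")   # UTF-8      -> ISO2022-KR
back = convert(iso2022, "iso2022_kr", "euc_kr")  # ISO2022-KR -> EUC

print(utf8 == TEXT.encode("utf-8"))   # True
print(all(b < 0x80 for b in iso2022)) # True, ISO2022-KR is a 7-bit encoding
print(back == euc)                    # True, the round trip is lossless here
```

Characters outside KS X 1001 have no EUC representation, so a conversion from UTF-8 to EUC can fail for arbitrary input. In this sketch such a case would raise `UnicodeEncodeError`.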