International Language Environments Guide

Chapter 4 Supported Asian Locales

The following sections describe the Asian supported locales:

Asian Supported Locales

The following table provides a summary of the supported Asian locales.

Table 4–1 Summary of Asian Locales

Language             Locale Name       Description                         Supported Character Set
Korean               ko                Korean (EUC)                        KS X 1001
Korean               ko.UTF-8          Korean (UTF-8)                      KS X 1005–1
Simplified Chinese   zh_CN.EUC         Simplified Chinese (EUC)            GB 2312-1980
Simplified Chinese   zh_CN.GBK         Simplified Chinese (GBK)            GBK
Simplified Chinese   zh_CN.GB18030     Simplified Chinese (GB18030–2000)   GB18030–2000
Simplified Chinese   zh_CN.UTF-8       Simplified Chinese (UTF-8)          Unicode 3.1
Traditional Chinese  zh_TW.EUC         Traditional Chinese (EUC)           CNS 11643–1992
Traditional Chinese  zh_TW.UTF-8       Traditional Chinese (UTF-8)         Unicode 3.1
Traditional Chinese  zh_TW.BIG5        Traditional Chinese (BIG5)          BIG5
Traditional Chinese  zh_HK.BIG5HK      Traditional Chinese (BIG5+HKSCS)    BIG5+HKSCS
Traditional Chinese  zh_HK.UTF-8       Traditional Chinese (UTF-8)         Unicode 3.1
Japanese             ja                Japanese (EUC)                      JIS [JIS X 0201-1976, JIS X 0208-1990 and JIS X 0212-1990]
Japanese             ja_JP.eucJP       Japanese (EUC)                      JIS [JIS X 0201-1976, JIS X 0208-1990 and JIS X 0212-1990]
Japanese             ja_JP.PCK         Japanese (PCK)                      JIS [JIS X 0201–1976 and JIS X 0208–1990]
Japanese             ja_JP.UTF-8       Japanese (UTF-8)                    Unicode 3.1
Thai                 th_TH.TIS620      Thai (TIS620.2533)                  TIS620.2533
Thai                 th_TH.UTF-8       Thai (UTF-8)                        Unicode 3.1
Thai                 th_TH.ISO8859-11  Thai (ISO8859-11)                   ISO8859-11
Hindi                hi_IN.UTF-8       Hindi (UTF-8)                       Unicode 3.1

Input Method Auxiliary Window Support for Simplified and Traditional Chinese

The input method auxiliary window provides a friendly, extensible input method management tool for all Chinese customers. It supports the following new functions and utilities:

For more detailed information, please see the Simplified Chinese User's Guide and the Traditional Chinese User's Guide.

The input method auxiliary window supports all UTF-8 locales and the following Chinese locales:

Two kinds of input methods are supported:

The interface model for auxiliary window support is shown in the following figure.

Figure 4–1 Interface Model for Auxiliary Window Support


Thai Localization

According to the Thai IT Standard, there are three input levels for the Thai character sequence checking method:

  1. Passthrough level, no input check.

  2. Basic input check level.

  3. Strict input check level.

In the Solaris 9 release, the default input check level remains the passthrough level. At this level no sequence check is performed, which matches the behavior of previous Solaris releases. You can use the F2 function key to switch between the three levels:

passthrough -> basic -> strict -> passthrough
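
The cycle above can be sketched as a small state machine. This is only an illustration of the ordering described in the text, not the Solaris input method code; the names `InputCheckLevel` and `next_level` are hypothetical.

```python
from enum import Enum


class InputCheckLevel(Enum):
    """The three Thai input check levels named in the text."""
    PASSTHROUGH = "passthrough"
    BASIC = "basic"
    STRICT = "strict"


# Order in which the F2 key is described as stepping through the levels.
_CYCLE = [
    InputCheckLevel.PASSTHROUGH,
    InputCheckLevel.BASIC,
    InputCheckLevel.STRICT,
]


def next_level(current: InputCheckLevel) -> InputCheckLevel:
    """Return the level that follows `current`, wrapping after strict."""
    index = _CYCLE.index(current)
    return _CYCLE[(index + 1) % len(_CYCLE)]


# Start from the default level and simulate three F2 presses.
level = InputCheckLevel.PASSTHROUGH
for _ in range(3):
    level = next_level(level)
    print(level.value)
```

After three simulated key presses the level is back at passthrough, matching the wrap-around shown in the cycle.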

Thai Input Method Auxiliary Window

A Thai input method auxiliary window supports the following new functions and utilities:

Click the input level button on the auxiliary bar to select a specific Thai input level and input check level. Click the keyboard button to display the Thai virtual keyboard. Use the Thai virtual keyboard to input Thai characters.

Simplified Chinese Localization

Simplified Chinese in the Solaris 9 environment provides four locales: zh, zh.GBK, zh_CN.GB18030, and zh.UTF-8. In the zh locale, the EUC scheme is used to encode GB2312–80. The zh.GBK locale supports the GBK codeset, which is a superset of GB2312–80.

The new GB18030–2000 codeset is now supported in the zh_CN.GB18030 locale.
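
The superset relationship between these codesets can be checked with a short sketch. It assumes that Python's `gb2312`, `gbk`, and `gb18030` codecs are reasonable stand-ins for the Solaris codesets; the sample characters are chosen for illustration only.

```python
# Text that uses only characters from GB2312-80.
simplified = "中文"

# Assumption: Python's codec names stand in for the Solaris codesets.
gb2312_bytes = simplified.encode("gb2312")
gbk_bytes = simplified.encode("gbk")
gb18030_bytes = simplified.encode("gb18030")

# A GB2312-80 character keeps the same two-byte code in the larger codesets.
same_bytes = gb2312_bytes == gbk_bytes == gb18030_bytes
print(same_bytes)

# A traditional-form character that GBK covers but GB2312-80 does not.
extra = "體"
try:
    extra.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

print(in_gb2312, len(extra.encode("gbk")))
```

The first check shows why data written in the zh locale stays readable in the larger codesets, while the second shows a character that needs GBK or later.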

Simplified Chinese is used mostly in the People's Republic of China (PRC) and in Singapore.

The following input methods are supported for the zh locale:

The following input methods are supported for the zh_CN.GB18030 locale:

The following input methods are supported for both the zh.GBK and the zh.UTF-8 locales:

The auxiliary window for Chinese input methods provides a friendly and extensible input method user interface for all Chinese locales. See Input Method Auxiliary Window Support for Simplified and Traditional Chinese.

For more detailed information about auxiliary windows for Chinese input methods, please see Simplified Chinese User's Guide and Traditional Chinese User's Guide.

The following table shows the TrueType fonts for the zh locale.

Table 4–2 TrueType Fonts for the zh_CN.EUC Locale

Full Family Name  Subfamily  Format    Vendor    Encoding
Fangsong          R          TrueType  Hanyi     GB2312.1980
Hei               R          TrueType  Monotype  GB2312.1980
Kai               R          TrueType  Monotype  GB2312.1980
Song              R          TrueType  Monotype  GB2312.1980

The following table shows the bitmap fonts for the zh locale.

Table 4–3 Bitmap Fonts for the zh_CN.EUC Locale

Full Family Name  Subfamily  Format                 Encoding
Song              B          PCF (14,16)            GB2312.1980
Song              R          PCF (12,14,16,20,24)   GB2312.1980

The following table shows the TrueType fonts for the zh_CN.GBK locale.

Table 4–4 TrueType Fonts for the zh_CN.GBK Locale

Full Family Name  Subfamily  Format    Vendor   Encoding
Fangsong          R          TrueType  Zhongyi  GBK
Hei               R          TrueType  Zhongyi  GBK
Kai               R          TrueType  Zhongyi  GBK
Song              R          TrueType  Zhongyi  GBK

The following table shows the bitmap fonts for the zh_CN.GBK locale.

Table 4–5 Bitmap Fonts for the zh_CN.GBK Locale

Full Family Name  Subfamily  Format                 Encoding
Song              R          PCF (12,14,16,20,24)   GBK

The following table shows the TrueType fonts for the zh_CN.GB18030 locale.

Table 4–6 TrueType Fonts for the zh_CN.GB18030 Locale

Family Name  Subfamily  Format    Vendor     Encoding
FangSong                TrueType  FangZheng  GB18030–2000
Song                    TrueType  FangZheng  GB18030–2000
Hei                     TrueType  FangZheng  GB18030–2000
Kai                     TrueType  FangZheng  GB18030–2000

The following table shows bitmap fonts for the zh_CN.GB18030 locale.

Table 4–7 Bitmap Fonts for the zh_CN.GB18030 Locale

Family Name  Subfamily  Format                 Encoding
Song                    PCF (12,14,16,20,24)   GB18030–2000

The following table shows the supported codeset conversions for Simplified Chinese.

Table 4–8 Codeset Conversions for Simplified Chinese

Code         Symbol            Target Code  Symbol
GB2312-80    zh_CN.euc         ISO 2022-7   zh_CN.iso2022-7
GB2312-80    zh_CN.euc         ISO 2022-CN  zh_CN.iso2022-CN
GB2312-80    zh_CN.euc         UTF-8        UTF-8
GB18030      zh_CN.gb18030     UTF-8        UTF-8
HZ-GB-2312   HZ-GB-2312        GB2312-80    zh_CN.euc
HZ-GB-2312   HZ-GB-2312        GBK          zh_CN.gbk
HZ-GB-2312   HZ-GB-2312        UTF-8        UTF-8
ISO2022-7    zh_CN.iso2022-7   GB2312-80    zh_CN.euc
ISO2022-CN   zh_CN.iso2022-CN  GB2312-80    zh_CN.euc
ISO2022-CN   zh_CN.iso2022-CN  UTF-8        UTF-8
ISO2022-CN   zh_CN.iso2022-CN  zh.GBK       zh_CN.gbk
UTF-8        UTF-8             GB2312-80    zh_CN.euc
UTF-8        UTF-8             GB18030      zh_CN.gb18030
UTF-8        UTF-8             ISO2022-CN   zh_CN.iso2022-CN
UTF-8        UTF-8             zh.GBK       zh_CN.gbk
zh.GBK       zh_CN.gbk         ISO2022-CN   zh_CN.iso2022-CN
zh.GBK       zh_CN.gbk         UTF-8        UTF-8
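
A conversion between two of the codesets in the table can be illustrated outside Solaris. The sketch below uses Python's built-in codecs rather than the Solaris iconv modules, so the codec names (`gb2312`, `utf-8`, `hz`) are Python's, not the symbols listed in the table, and the helper `convert` is hypothetical.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode `data` from `source` to `target`, using Unicode in between."""
    return data.decode(source).encode(target)


# Sample text stored in EUC-encoded GB2312-80, as in the zh locale.
euc_bytes = "简体中文".encode("gb2312")

utf8_bytes = convert(euc_bytes, "gb2312", "utf-8")
hz_bytes = convert(euc_bytes, "gb2312", "hz")

print(utf8_bytes.decode("utf-8"))

# HZ-GB-2312 is a 7-bit form: every byte is below 0x80.
seven_bit = all(byte < 0x80 for byte in hz_bytes)
print(seven_bit)

# Converting back restores the original EUC bytes.
round_trip = convert(hz_bytes, "hz", "gb2312")
print(round_trip == euc_bytes)
```

Decoding to Unicode first and then encoding to the target keeps the helper independent of any particular pair of codesets.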

Traditional Chinese Localization

Traditional Chinese in the Solaris 9 product provides five locales:

Traditional Chinese is used mostly in Taiwan and Hong Kong, China. The following input methods are supported in the zh_TW.EUC, zh_TW.BIG5, and zh_TW.UTF-8 locales:

The following input methods are supported in the zh_HK.BIG5HK and zh_HK.UTF-8 locales.

The following table shows the Traditional Chinese TrueType Fonts for the zh_TW locales.

Table 4–9 Traditional Chinese TrueType Fonts for the zh_TW Locales

Full Family Name  Subfamily  Format    Vendor  Encoding
Hei               R          TrueType  Hanyi   CNS11643.1992
Kai               R          TrueType  Hanyi   CNS11643.1992
Ming              R          TrueType  Hanyi   CNS11643.1992

The following table shows the Traditional Chinese bitmap fonts for the zh_TW locales.

Table 4–10 Traditional Chinese Bitmap Fonts for the zh_TW Locales

Full Family Name  Subfamily  Format                 Encoding
Ming              R          PCF (12,14,16,20,24)   CNS11643.1992

The following table shows the TrueType fonts for the zh_HK.BIG5HK locale.

Table 4–11 TrueType Fonts for the zh_HK.BIG5HK Locale

Family Name  Subfamily  Format    Vendor     Encoding
Ming                    TrueType  FangZheng  Big5–HKSCS
Hei                     TrueType  FangZheng  Big5–HKSCS
Kai                     TrueType  FangZheng  Big5–HKSCS

The following table shows the bitmap fonts for the zh_HK.BIG5HK locale.

Table 4–12 Bitmap Fonts for the zh_HK.BIG5HK Locale

Family Name  Subfamily  Format                 Encoding
Ming                    PCF (12,14,16,20,24)   Big5–HKSCS

The following table shows the supported codeset conversions for Traditional Chinese.

Table 4–13 Codeset Conversions for Traditional Chinese

Code            Symbol                Target Code     Symbol
BIG5            zh_TW-big5            CNS 11643       zh_TW-euc
BIG5            zh_TW-big5            ISO2022-CN      zh_TW-iso2022-CN-EXT
BIG5            zh_TW-big5            UTF-8           UTF-8
BIG5+HKSCS      zh_HK.big5hk          UTF-8           UTF-8
CNS 11643       zh_TW-euc             BIG5            zh_TW-big5
CNS 11643       zh_TW-euc             UTF-8           UTF-8
CNS 11643       zh_TW-euc             ISO2022-7       zh_TW-iso2022-7
CNS 11643       zh_TW-euc             ISO2022-CN-EXT  zh_TW-iso2022-CN-EXT
CNS 11643       zh_TW-euc             UTF-8           UTF-8
ISO2022-7       zh_TW-iso2022-7       CNS 11643       zh_TW-euc
ISO2022-7       zh_TW-iso2022-7       UTF-8           UTF-8
ISO2022-CN      zh_TW-iso2022-CN-EXT  BIG5            zh_TW-big5
ISO2022-CN-EXT  zh_TW-iso2022-CN-EXT  CNS 11643       zh_TW-euc
UTF-8           UTF-8                 BIG5            zh_TW-big5
UTF-8           UTF-8                 BIG5+HKSCS      zh_HK.big5hk
UTF-8           UTF-8                 CNS 11643       zh_TW-euc
UTF-8           UTF-8                 ISO 2022-7      zh_TW-iso2022-7

Japanese Localization

This section describes Japanese locale-specific information.

Japanese Locales

Four Japanese locales, which support different character encodings, are available in the Solaris 9 environment. The ja and ja_JP.eucJP locales are based on the Japanese EUC. The ja_JP.eucJP locale conforms to the UI-OSF Japanese Environment Implementation Agreement Version 1.1 and the ja locale conforms to the traditional specification from earlier Solaris releases. The ja_JP.PCK locale is based on PC-Kanji code (known as Shift_JIS) and the ja_JP.UTF-8 is based on UTF-8.

See the eucJP(5) man page for a map between Japanese EUC and the character set. See the PCK(5) man page for the map between PC-Kanji code and the character set.

Japanese Character Sets

The supported Japanese character sets are:

JIS X 0212–1990 is not supported in the ja_JP.PCK locale. JIS X 0213–2000 is supported in the ja_JP.UTF-8 locale only. Not all characters defined in the JIS X 0213–2000 are available. Only those characters defined in the Unicode 3.1 character set are available.

Vendor-defined characters (VDC) and user-defined characters (UDC) are also supported. VDCs occupy unused (reserved) code points of JIS X 0208–1990 or JIS X 0212–1990. UDCs occupy the same code points as VDCs, except those code points allocated for VDCs.

Japanese Fonts

Three Japanese font formats are supported: bitmap, TrueType and Type1. The Japanese Type1 font includes only JIS X 0212 for printing. The Type1 font is also used by UDC.

Japanese bitmap fonts are described in the following table.

Table 4–14 Japanese Bitmap Fonts

Full Family Name   Subfamily  Format                      Vendor  Encoding
sun gothic         R, B       PCF (12,14,16,20,24)                JIS X 0208–1983, JIS X 0201–1976
sun minchou                   PCF (12,14,16,20,24)                JIS X 0208–1983, JIS X 0201–1976
ricoh hg gothic b             PCF (10,12,14,16,18,20,24)  RICOH   JIS X 0208–1983, JIS X 0201–1976
ricoh hg mincho l             PCF (10,12,14,16,18,20,24)  RICOH   JIS X 0208–1983, JIS X 0201–1976
ricoh gothic                  PCF (10,12,14,16,18,20,24)  RICOH   JIS X 0212–1990, JIS X 0213–2000
ricoh mincho                  PCF (10,12,14,16,18,20,24)  RICOH   JIS X 0212–1990, JIS X 0213–2000
ricoh heiseimin               PCF (12,14,16,18,20,24)     RICOH   JIS X 0212–1990

Japanese TrueType fonts are described in the following table.

Table 4–15 Japanese TrueType Fonts

Full Family Name   Subfamily            Format    Vendor  Encoding
ricoh hg gothic b  Fixed                TrueType  RICOH   JIS X 0208–1983, JIS X 0201–1976
ricoh hg mincho l  Fixed                TrueType  RICOH   JIS X 0208–1983, JIS X 0201–1976
ricoh gothic       Fixed, Proportional  TrueType  RICOH   JIS X 0201–1976, JIS X 0208–1983, JIS X 0213–2000
ricoh mincho       Fixed, Proportional  TrueType  RICOH   JIS X 0201–1976, JIS X 0208–1983, JIS X 0213–2000
ricoh heiseimin    Fixed                TrueType  RICOH   JIS X 0212–1990

Japanese Input Systems

ATOK12 is the default Japanese input system in the Solaris 9 environment. It is available for all Japanese locales and all UTF-8 locales when the Japanese locale is installed. The Wnn6 Japanese input system is also available for all Japanese locales. You can switch input systems from the Workspace menu. For Japanese Solaris 1.x BCP support, the kkcv Japanese input system is available.

The following example describes how to enter Japanese text using ATOK12.

  1. Turn conversion mode on by pressing Control + spacebar.

  2. Type Kana character text (for example kanjihenkan).

  3. Convert to kanji character by pressing the spacebar.

    To display other kanji characters, press the space bar to display the conversion candidate table. Type the number you want to select.

  4. To commit the entire text to kanji character text, press return.

    Press the down arrow key to commit only selected characters.

  5. Turn conversion mode off by pressing Control + spacebar.

Terminal Setting for Japanese Terminals

Using Japanese locales on a character-based terminal (TTY) requires that you use terminal settings to make line editing work correctly.

Japanese iconv Module

Several Japanese codeset conversions are supported with iconv(1) and iconv(3). See the iconv_ja(5) man page for details.
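
To show what such a conversion involves, the following sketch re-encodes a short string from Japanese EUC to PC-Kanji (Shift_JIS). It uses Python's `euc_jp` and `shift_jis` codecs as stand-ins; these are not the Solaris iconv module names, which are documented in iconv_ja(5).

```python
# Two kanji followed by two half-width katakana (JIS X 0201).
text = "漢字ｶﾅ"

euc = text.encode("euc_jp")       # Japanese EUC, as in ja and ja_JP.eucJP
pck = text.encode("shift_jis")    # PC-Kanji, as in ja_JP.PCK

# Kanji take two bytes in both encodings. Half-width katakana take
# two bytes in EUC (a single-shift prefix plus one byte) but one in PCK.
print(len(euc), len(pck))

# A conversion decodes from the source codeset and encodes to the target.
converted = euc.decode("euc_jp").encode("shift_jis")
print(converted == pck)
```

The differing byte lengths are one reason a file written in one Japanese locale cannot simply be read in another without conversion.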

User-Defined Character Support

The user-defined character utility sdtudctool handles both outline (Type1) and bitmap (PCF) fonts. Some utilities are also available to migrate the UDC fonts that were created by old utilities in prior releases, such as fontedit, type3creator, and fontmanager.

Differences Between Partial and Full Locales

The following components are only available in the Japanese full locale environment with the Language CD:

Korean Localization

In December 1995, the Korean government announced a standard Korean codeset, KS X 1005–1, which is based on ISO 10646-1/Unicode 2.0.

The ISO-10646 character set uses two universal character sets:

The ISO-10646 character set cannot be used directly on IBM PC-based operating systems. For example, the kernel and many other modules of the Solaris operating environment interpret certain byte values as control instructions, such as a null character (0x00) in any string. Because the ISO-10646 character set can be encoded with any bit combination in the first or subsequent bytes, ISO-10646 characters cannot be transmitted freely through the Solaris system under these limitations.

In order to establish a migration path, the ISO-10646 character set defines the UCS Transformation Format (UTF), which recodes the ISO-10646 characters without using C0 controls (0x00..0x1F), C1 controls (0x80..0x9F), space (0x20), and DEL (0x7F).
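
The null-byte problem can be seen by comparing a fixed two-byte encoding with UTF-8. This is a minimal sketch in Python that checks only the null-byte property; `utf-16-be` is used here as an assumed stand-in for the two-byte universal character set form, and the sample string is arbitrary.

```python
text = "A한"  # an ASCII letter and a Hangul syllable

two_byte = text.encode("utf-16-be")  # two bytes per character for this text
utf8 = text.encode("utf-8")          # variable-length transformation format

print(two_byte.hex(" "))
print(utf8.hex(" "))

# The two-byte form of an ASCII letter contains a 0x00 byte, which
# C-style string handling would treat as the end of the string.
has_null_two_byte = 0x00 in two_byte
has_null_utf8 = 0x00 in utf8
print(has_null_two_byte, has_null_utf8)
```

In UTF-8 the letter keeps its single ASCII byte and the Hangul syllable becomes a three-byte sequence, so no null byte appears in the encoded text.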

The ko.UTF-8 locale is a Solaris locale that supports KS X 1005–1, the Korean standard codeset. This locale supports all characters in the previous KS X 1005 and all 11,172 Korean characters. Korean UTF-8 supports the Korean language-related ISO-10646 characters and fonts. Because ISO-10646 covers all characters in the world, all of the various input methods and fonts are supplied so that you can input and output any character in any language. Before Universal UTF/UCS becomes available, Korean UTF-8 supports the ISO-10646 code subset that is related to Korean characters as well as all other characters in the previous Korean standard codeset, and extended ASCII.
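
The figure of 11,172 Korean characters matches the size of the precomposed Hangul syllable range in Unicode. The sketch below checks that count and compares the bytes of one word in EUC and UTF-8; the range constants come from the Unicode standard rather than from this guide, and Python's `euc_kr` codec is assumed to stand in for the encoding of the ko locale.

```python
# Precomposed Hangul syllables in Unicode run from U+AC00 to U+D7A3.
FIRST, LAST = 0xAC00, 0xD7A3
syllable_count = LAST - FIRST + 1

# The same total follows from 19 initial consonants, 21 vowels,
# and 28 final positions (27 final consonants plus "none").
print(syllable_count, 19 * 21 * 28)

word = "한글"
euc = word.encode("euc_kr")   # two bytes per syllable
utf8 = word.encode("utf-8")   # three bytes per syllable

# The stored bytes differ, but both decode to the same text.
print(len(euc), len(utf8))
print(euc.decode("euc_kr") == utf8.decode("utf-8"))
```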

In the ko locale, the EUC scheme is used to encode KS X 1001. The ko.UTF-8 locale supports the KS X 1005–1/Unicode 2.0 codeset, which is a superset of KS X 1001. These two locales look the same to the end user, but the internal character encoding is different. The Korean Solaris product supports the following input methods:

For the ko locale:

For the ko.UTF-8 locale:

The following table shows the Korean bitmap fonts for the ko locale.

Table 4–16 Solaris 9 Korean Bitmap Fonts for the ko Locale

Full Family Name  Subfamily  Format                   Encoding
Gothic            R/B        PCF (12,14,16,18,20,24)  KS X 1001
Graphic           R/B        PCF (12,14,16,18,20,24)  KS X 1001
Haeso             R/B        PCF (12,14,16,18,20,24)  KS X 1001
Kodig             R/B        PCF (12,14,16,18,20,24)  KS X 1001
Myeongijo         R/B        PCF (12,14,16,18,20,24)  KS X 1001
Pilki             R/B        PCF (12,14,16,18,20,24)  KS X 1001
Round gothic      R/B        PCF (12,14,16,18,20,24)  KS X 1001

The following table shows the Korean bitmap fonts for the ko.UTF-8 locale.

Table 4–17 Solaris 9 Korean Bitmap Fonts for the ko.UTF-8 Locale

Full Family Name  Subfamily  Format                   Encoding
Gothic            R/B        PCF (12,14,16,18,20,24)  KS X 1001 (Johap)
Graphic           R/B        PCF (12,14,16,18,20,24)  KS X 1001 (Johap)
Haeso             R/B        PCF (12,14,16,18,20,24)  KS X 1001 (Johap)
Kodig             R/B        PCF (12,14,16,18,20,24)  KS X 1001 (Johap)
Myeongijo         R/B        PCF (12,14,16,18,20,24)  KS X 1001 (Johap)
Pilki             R/B        PCF (12,14,16,18,20,24)  KS X 1001 (Johap)

The following table shows the Korean TrueType Fonts for the ko/ko.UTF-8 locales.

Table 4–18 Solaris 9 Korean TrueType Fonts for the ko/ko.UTF-8 Locales

Full Family Name  Subfamily  Format    Vendor   Encoding
Kodig/Gothic                 TrueType  Hanyang  Unicode
Myeongijo                    TrueType  Hanyang  Unicode
Haeso                        TrueType  Hanyang  Unicode
Round gothic                 TrueType  Hanyang  Unicode

The following table shows the supported Korean iconv codeset conversions.

Table 4–19 Korean iconv

Code        Symbol         Target Code          Symbol
IBM CP933   cp933          UTF-8 (Unicode 2.0)  ko_KR-UTF-8
ISO646      646            KS X 1001            5601
ISO2022-KR  iso2022-7      KS X 1001            ko_KR-euc
ISO2022-KR  iso2022-7      UTF-8 (Unicode 2.0)  ko_KR-UTF-8
KS X 1001   5601           UTF-8                UTF-8
KS X 1001   EUC-KR         UTF-8                UTF-8
KS X 1001   KSC5601        UTF-8                UTF-8
KS X 1001   ko_KR-euc      UTF-8 (Unicode 2.0)  ko_KR-UTF-8
KS X 1001   ko_KR-euc      ISO2022-KR           ko_KR-iso2022-7
KS X 1001   ko_KR-euc      KS X 1001            ko_KR-johap
KS X 1001   ko_KR-euc      KS X 1001            ko_KR-johap92
KS X 1001   ko_KR-euc      KS X 1001            ko_KR-nbyte
KS X 1001   ko-KR-nbyte    KS X 1001            ko_KR-euc
KS X 1001   ko-KR-johap    UTF-8 (Unicode 2.0)  ko_KR-UTF-8
KS X 1001   ko-KR-johap    KS X 1001            ko_KR-euc
KS X 1001   ko-KR-johap92  UTF-8 (Unicode 2.0)  ko_KR-UTF-8
KS X 1001   ko-KR-johap92  KS X 1001            ko_KR-euc
UTF-8       UTF-8          KS X 1001            5601
UTF-8       UTF-8          KS X 1001            EUC-KR
UTF-8       UTF-8          KS X 1001            KSC5601
UTF-8       ko-KR-UTF-8    IBM CP 933           cp 933
UTF-8       ko-KR-UTF-8    KS X 1001            ko_KR-euc
UTF-8       ko-KR-UTF-8    ISO2022-KR           ko_KR-iso2022-7
UTF-8       ko-KR-UTF-8    KS X 1001            ko_KR-johap
UTF-8       ko-KR-UTF-8    KS X 1001            ko_KR-johap92