Unicode Support in the Solaris Operating Environment

Appendix A Codeset Conversions

A.1 Codeset Conversions

Table A-1 lists the supported codeset conversions.


Note - In the table, Unicode* stands for any of the following codesets: UTF-8, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-16, UTF-16BE, UTF-16LE.

ISO 8859 codesets can also be referenced without the ISO prefix; for example, ISO 8859-1 = 8859-1.
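
The members of the Unicode* family encode the same characters and differ in code unit size and byte order. As a rough illustration only, the Python sketch below encodes one short string under several of them. The codec names on the right (`utf-16-be`, `utf-32-le`, and so on) are Python's own and are not the Solaris codeset names listed above, and UTF-32 is used as a stand-in for UCS-4, which is an assumption of this example.

```python
# Illustration only: Python codec names, not Solaris iconv codeset names.
text = "A\u00e9"  # 'A' followed by e-acute (U+00E9)

# Left: the label used in this appendix. Right: the nearest Python codec.
encodings = {
    "UTF-8": "utf-8",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UCS-4BE": "utf-32-be",  # assumption: UTF-32 stands in for UCS-4
    "UCS-4LE": "utf-32-le",
}

# Encode the same text once per codeset.
encoded = {label: text.encode(codec) for label, codec in encodings.items()}

# Print each byte sequence in hex so the differences are visible.
for label, data in encoded.items():
    print(f"{label:9} {data.hex(' ')}")
```

The BE and LE variants carry the same code units with the bytes swapped, which is why a converter has to be told, or has to detect, which one it is reading.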


Table A-1 Supported code conversions

| Code | Code | Description |
|---|---|---|
| Unicode* | ISO 646 | Unicode* <--> ISO 646 (ASCII) |
| Unicode* | ISO 8859-1 | Unicode* <--> ISO 8859-1 (Latin-1) |
| Unicode* | ISO 8859-2 | Unicode* <--> ISO 8859-2 (Latin-2) |
| Unicode* | ISO 8859-3 | Unicode* <--> ISO 8859-3 (Latin-3) |
| Unicode* | ISO 8859-4 | Unicode* <--> ISO 8859-4 (Latin-4) |
| Unicode* | ISO 8859-5 | Unicode* <--> ISO 8859-5 (Cyrillic) |
| Unicode* | ISO 8859-6 | Unicode* <--> ISO 8859-6 (Arabic) |
| Unicode* | ISO 8859-7 | Unicode* <--> ISO 8859-7 (Greek) |
| Unicode* | ISO 8859-8 | Unicode* <--> ISO 8859-8 (Hebrew) |
| Unicode* | ISO 8859-9 | Unicode* <--> ISO 8859-9 (Latin-5) |
| Unicode* | ISO 8859-10 | Unicode* <--> ISO 8859-10 (Latin-6) |
| Unicode* | ISO 8859-13 | Unicode* <--> ISO 8859-13 |
| Unicode* | ISO 8859-14 | Unicode* <--> ISO 8859-14 |
| Unicode* | ISO 8859-15 | Unicode* <--> ISO 8859-15 |
| Unicode* | KOI8-R, KOI8-U, koi8-r, koi8-u | Unicode* <--> KOI8-R, KOI8-U, koi8-r, koi8-u (Cyrillic) |
| UTF-7 | UCS-2, UCS-4, UTF-8 | UTF-7 <--> UCS-2, UCS-4, UTF-8 |
| UTF-8 | UCS-2, UCS-4, UTF-16 | UTF-8 <--> UCS-2, UCS-4, UTF-16 |
| UTF-8 | UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE, UTF-16BE, UTF-16LE | UTF-8 <--> UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE, UTF-16BE, UTF-16LE |
| UCS-4, UCS-4BE, UCS-4LE | UCS-2, UCS-2BE, UCS-2LE, UTF-16, UTF-16BE, UTF-16LE | UCS-4, UCS-4BE, UCS-4LE <--> UCS-2, UCS-2BE, UCS-2LE, UTF-16, UTF-16BE, UTF-16LE |
| UTF-8 | UTF-EBCDIC | UTF-8 <--> UTF-EBCDIC |
| UTF-8 | IBM-037, -273, -277, -278, -280, -284, -285, -297, -420, -424, -500, -850, -852, -855, -856, -857, -862, -864, -866, -869, -870, -875, -880, -921, -922, -1025, -1026, -1046, -1112, -1122 | UTF-8 <--> various IBM code pages (PC and EBCDIC) |
| UTF-8 | CP850, CP852, CP855, CP857, CP862, CP864, CP866, CP869, CP874, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258 | UTF-8 <--> various Microsoft code pages |
| UTF-8 | eucJP | UTF-8 <--> Japanese EUC (JIS X0201-1976, JIS X0208-1983 and JIS X0212-1990) |
| UTF-8 | PCK | UTF-8 <--> Japanese PC Kanji (a.k.a. SJIS) |
| UTF-8 | ISO-2022-JP | UTF-8 <--> Japanese MIME charset |
| UTF-8-Java | eucJP | UTF-8-Java to Japanese EUC (JIS X0201-1976, JIS X0208-1983 and JIS X0212-1990) |
| UTF-8-Java | PCK | UTF-8-Java to Japanese PC Kanji (a.k.a. SJIS) |
| UTF-8-Java | ISO-2022-JP.RFC1468 | UTF-8-Java to Japanese MIME charset (one-way conversion) |
| UTF-8 | ko_KR-euc | UTF-8 <--> Korean EUC (KS C 5636 and KS C 5601-1987) |
| UTF-8 | ko_KR-johap | UTF-8 <--> Korean Johap (of KS C 5601-1987) |
| UTF-8 | ko_KR-johap92 | UTF-8 <--> Korean Johap (of KS C 5601-1992) |
| UTF-8 | ko_KR-iso2022-7 | UTF-8 <--> Korean MIME charset (ISO-2022-KR) |
| UTF-8 | ko_KR-cp933 | UTF-8 <--> IBM MBCS CP933 ko_KR-euc |
| UTF-8 | gb2312 | UTF-8 <--> Simplified Chinese EUC (GB 1988-1980 and GB 2312-1980) |
| UTF-8 | iso2022 | UTF-8 <--> Simplified Chinese MIME charset (ISO-2022-CN) |
| UTF-8 | GBK | UTF-8 <--> Simplified Chinese GBK |
| UTF-8 | zh_TW-euc | UTF-8 <--> Traditional Chinese EUC (CNS 11643-1992) |
| UTF-8 | zh_TW-big5 | UTF-8 <--> Traditional Chinese Big5 |
| UTF-8 | zh_TW-iso2022-7 | UTF-8 <--> Traditional Chinese MIME charset (ISO-2022-TW) |
| UTF-8 | zh_TW-cp937 | UTF-8 <--> IBM MBCS CP937 |
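
Most rows of the table pair UTF-8 with a legacy codeset, which suggests that text can be moved between two legacy codesets by going through UTF-8. The Python sketch below illustrates that idea under stated assumptions: the `convert` helper is hypothetical, and `euc_jp` and `shift_jis` are Python's nearest equivalents to eucJP and PCK, so their mapping tables may differ in detail from the Solaris converters.

```python
def convert(data: bytes, from_codec: str, to_codec: str) -> bytes:
    """Re-encode data by decoding it to Unicode text, then encoding it again."""
    text = data.decode(from_codec)  # raises UnicodeDecodeError on invalid input
    return text.encode(to_codec)    # raises UnicodeEncodeError if unmappable


sample = "日本語"
euc = sample.encode("euc_jp")  # starting point: Japanese EUC bytes

# Japanese EUC -> UTF-8 -> Shift_JIS, with UTF-8 as the intermediate form.
utf8 = convert(euc, "euc_jp", "utf-8")
sjis = convert(utf8, "utf-8", "shift_jis")

# Converting back should reproduce the original bytes for this sample.
back = convert(sjis, "shift_jis", "euc_jp")

print(utf8.decode("utf-8"), len(euc), len(utf8), len(sjis))
```

A round trip like this is lossless only for characters that exist in both legacy codesets. Anything else raises an error here, and other converters may substitute a replacement character instead.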