International Language Environments Guide

Code Conversions

Unicode locale support adds various code conversions among major codesets of many countries through iconv(1), iconv(3C), and sdtconvtool(1).

In the Solaris 9 environment, the geniconvtbl utility enables user-defined code conversions. Conversions created with geniconvtbl can be used with both iconv(1) and iconv(3C). For more detail on this utility, refer to the geniconvtbl(1) and geniconvtbl(4) man pages.

The fromcode and tocode names that can be passed to iconv(1), iconv_open(3C), and sdtconvtool(1) are listed in the tables in Appendix A, iconv Code Conversions. For more details on iconv code conversion, see the iconv(1), iconv_open(3C), iconv(3C), iconv_close(3C), geniconvtbl(1), geniconvtbl(4), and sdtconvtool(1) man pages. For more information on available code conversions, see the iconv_en_US.UTF-8(5), iconv(5), iconv_ja(5), iconv_ko(5), iconv_zh(5), and iconv_zh_TW(5) man pages.
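The fromcode/tocode model described above can be sketched in Python. This is only an illustration, not the Solaris implementation: the helper name convert is hypothetical, and it uses Python's built-in codecs as a stand-in for iconv, so the codeset names it accepts are Python's, not necessarily those listed in Appendix A.

```python
import codecs


def convert(data: bytes, fromcode: str, tocode: str) -> bytes:
    """Re-encode data from one codeset to another (hypothetical helper).

    Mirrors the shape of `iconv -f fromcode -t tocode`, but relies on
    Python codecs rather than the Solaris iconv modules.
    """
    # Fail early on unknown codeset names; codecs.lookup raises LookupError.
    codecs.lookup(fromcode)
    codecs.lookup(tocode)
    text = data.decode(fromcode)   # fromcode bytes -> Unicode text
    return text.encode(tocode)     # Unicode text -> tocode bytes


if __name__ == "__main__":
    latin1 = b"caf\xe9"            # "café" in ISO 8859-1
    print(convert(latin1, "iso-8859-1", "utf-8"))  # b'caf\xc3\xa9'
```

Like iconv, the sketch takes both codeset names as plain strings, so an unsupported name is detected before any data is converted.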

Note –

UCS-2, UCS-4, UTF-16, and UTF-32 are all Unicode/ISO/IEC 10646 representation forms that recognize the Byte Order Mark (BOM) character defined in the Unicode 3.1 and ISO/IEC 10646-1:2000 standards when it appears at the beginning of the character stream. Other forms, such as UCS-2BE, UCS-4BE, UTF-16BE, and UTF-32BE, are fixed-width Unicode/ISO/IEC 10646 representation forms that do not recognize the BOM character and assume big-endian byte ordering. Representation forms such as UCS-2LE, UCS-4LE, UTF-16LE, and UTF-32LE also do not recognize the BOM character, but assume little-endian byte ordering.
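The difference between the BOM-aware and fixed-byte-order forms can be seen in a short Python sketch. It assumes Python's own utf-16, utf-16-be, and utf-16-le codecs, which follow the same convention; the Solaris iconv modules may differ in detail.

```python
import codecs

text = "A"

# Fixed-byte-order forms never emit a BOM.
be = text.encode("utf-16-be")      # b'\x00A'
le = text.encode("utf-16-le")      # b'A\x00'

# The generic form emits a BOM in the platform's byte order.
with_bom = text.encode("utf-16")
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# A big-endian stream that begins with a BOM.
stream = codecs.BOM_UTF16_BE + be

# The generic form recognizes the leading BOM and consumes it.
generic = stream.decode("utf-16")          # 'A'

# The big-endian form does not treat it as a BOM; it is decoded as
# the ordinary character U+FEFF and stays in the output.
fixed = stream.decode("utf-16-be")         # '\ufeffA'

print(repr(generic), repr(fixed), starts_with_bom)
```

When the byte order of the input is already known, choosing the BE or LE form explicitly avoids relying on a BOM, but any BOM present in the data then has to be handled by the caller.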

For the scripts and languages associated with ISO8859-* and KOI8-*, see http://czyborra.com/charsets/iso8869.html.