iconv_extra - man pages section 7: Standards, Environments, Macros, Character Sets, and Miscellany

iconv_extra (7)

Name

iconv_extra - codeset conversion for non-Unicode encodings

Description

iconv and cconv support conversions to and from a wide range of codesets.

The tables below provide basic information about encodings used mainly in the EMEA region. For information on Asian encodings, refer to the iconv_ja(7), iconv_ko(7), iconv_zh(7), iconv_zh_HK(7), and iconv_zh_TW(7) manual pages. For information on Unicode encodings, refer to the iconv_unicode(7) manual page.

The codeset names shown are in their canonical form, which can be used directly as the fromcode or tocode parameter of iconv(1), iconv_open(3C), and cconv_open(3C). Aliases are shown in parentheses where applicable.
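Because the canonical names can be passed straight to iconv_open(3C), a conversion routine needs little more than a loop around iconv(3C). The following is a minimal sketch, not a reference implementation: the helper name `convert_codeset` and its grow-on-`E2BIG` buffer strategy are illustrative choices, and it assumes a POSIX-style iconv(3C) whose input argument is `char **` (some platforms declare it as `const char **`, which needs a different cast).

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/*
 * Convert len bytes at `in` from codeset `from` to codeset `to`.
 * Returns a malloc'd, NUL-terminated buffer that the caller must free,
 * and stores the number of converted bytes in *outlen.  Returns NULL if
 * the conversion is unsupported, the input is invalid, or memory runs out.
 */
char *convert_codeset(const char *from, const char *to,
                      const char *in, size_t len, size_t *outlen)
{
    iconv_t cd = iconv_open(to, from);      /* tocode comes first */
    if (cd == (iconv_t)-1)
        return NULL;                        /* e.g. EINVAL: not supported */

    size_t cap = len * 4 + 16;              /* initial guess; grown on E2BIG */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;                 /* POSIX iconv() takes char ** */
    size_t inleft = len;
    char *outp = out;
    size_t outleft = cap - 1;               /* keep one byte for the NUL */

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            continue;                       /* all remaining input converted */
        if (errno != E2BIG)
            goto fail;                      /* EILSEQ or EINVAL: bad input */
        size_t used = (size_t)(outp - out);
        char *bigger = realloc(out, cap * 2);
        if (bigger == NULL)
            goto fail;
        out = bigger;
        outp = out + used;
        outleft += cap;                     /* buffer grew by cap bytes */
        cap *= 2;
    }

    /* Emit any reset sequence a stateful target codeset may need. */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        goto fail;

    *outp = '\0';
    *outlen = (size_t)(outp - out);
    iconv_close(cd);
    return out;

fail:
    free(out);
    iconv_close(cd);
    return NULL;
}
```

For example, `convert_codeset("ISO8859-1", "UTF-8", "caf\xE9", 4, &n)` should return the five UTF-8 bytes of "café" with `n` set to 5.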

To list the iconv and cconv conversions available on the current system, run iconv -l, as described in the iconv(1) manual page.

For additional information on the mappings between canonical names and supported aliases with optional variant levels, refer to the alias(5) manual page and the /usr/lib/iconv/alias file.

646 - ISO/IEC 646:1991 and Variants

The codeset of the "C" locale in Oracle Solaris, the ISO basic Latin alphabet, is referred to by the canonical name 646. Common aliases such as US-ASCII and ASCII are also defined.
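Because 646 is a 7-bit codeset, a common task is checking whether a byte buffer is pure 646 text. One way to sketch this, assuming that iconv(3C) reports a byte above `0x7F` as an invalid sequence (`EILSEQ`), is to convert to UTF-8 and discard the output. The function name `valid_in_codeset` is hypothetical. On Oracle Solaris you would pass `646` as the codeset, while other iconv implementations may only accept an alias such as `US-ASCII`, so the name is a parameter.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Return 1 if the len bytes in buf convert cleanly from `codeset` to UTF-8,
 * 0 if iconv() reports an invalid or incomplete sequence, and -1 if the
 * conversion cannot be opened at all.
 */
int valid_in_codeset(const char *codeset, const char *buf, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", codeset);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)buf;                /* POSIX iconv() takes char ** */
    size_t inleft = len;
    int result = 1;

    while (inleft > 0) {
        char chunk[256];                    /* converted output is discarded */
        char *outp = chunk;
        size_t outleft = sizeof chunk;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
            && errno != E2BIG) {            /* E2BIG only means chunk is full */
            result = 0;                     /* EILSEQ or EINVAL */
            break;
        }
    }
    iconv_close(cd);
    return result;
}
```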

The following national variants of the 646 codeset are also available:

646 Codeset	National Variant
`646de`	Germany
`646ch`	Switzerland
`646gb (646en)`	United Kingdom
`646fr`	France
`646ca`	Canada
`646fi`	Finland
`646sv`	Sweden
`646it`	Italy
`646dk (646da)`	Denmark
`646es (646sp)`	Spain
`646pt`	Portugal
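Each national variant reassigns a few 7-bit positions that US-ASCII uses for punctuation. As an illustration only, the sketch below hard-codes the German repertoire. It assumes that 646de follows DIN 66003, the German variant of ISO 646, and the function name is hypothetical. In real code, let iconv(3C) perform this mapping by opening the 646de codeset.

```c
#include <stdint.h>

/*
 * Map one byte of German ISO 646 text to a Unicode code point, assuming
 * the DIN 66003 repertoire.  Returns -1 for bytes outside the 7-bit range.
 */
int32_t iso646de_to_ucs(unsigned char b)
{
    switch (b) {
    case 0x40: return 0x00A7;   /* commercial at     -> SECTION SIGN */
    case 0x5B: return 0x00C4;   /* left bracket      -> A WITH DIAERESIS */
    case 0x5C: return 0x00D6;   /* backslash         -> O WITH DIAERESIS */
    case 0x5D: return 0x00DC;   /* right bracket     -> U WITH DIAERESIS */
    case 0x7B: return 0x00E4;   /* left brace        -> a with diaeresis */
    case 0x7C: return 0x00F6;   /* vertical line     -> o with diaeresis */
    case 0x7D: return 0x00FC;   /* right brace       -> u with diaeresis */
    case 0x7E: return 0x00DF;   /* tilde             -> SHARP S */
    default:   return b < 0x80 ? (int32_t)b : -1;  /* otherwise as US-ASCII */
    }
}
```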

ISO 8859 Character Sets

ISO 8859 Character Set	Description
ISO8859-1 (Latin1)	For most West European languages, including Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
ISO8859-2 (Latin2)	For most Latin-written Slavic and Central European languages, including Croatian, Czech, German, Hungarian, Polish, Romanian, Slovak, and Slovene.
ISO8859-3 (Latin3)	Used for Esperanto, Galician, Maltese, and Turkish.
ISO8859-4 (Latin4)	Introduces letters for Estonian, Latvian, and Lithuanian. It is an incomplete predecessor of ISO 8859-10.
ISO8859-5	For languages that use Cyrillic alphabet, such as Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian.
ISO8859-6	Latin + Arabic.
ISO8859-7	Latin + Greek. Does not include accents used in polytonic Greek.
ISO8859-8	Latin + Hebrew.
ISO8859-9 (Latin5)	Replaces the rarely needed Icelandic letters in ISO 8859-1 (Latin 1) with the Turkish ones.
ISO8859-10 (Latin6)	Adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were not included in ISO 8859-4 (Latin 4) to complete coverage of the Nordic area.
ISO8859-11	Latin + Thai. ISO/IEC 8859-11:2001 is equivalent to TIS 620-2533 (1990) with the addition of `0xA0` NO-BREAK SPACE.
ISO8859-13 (Latin7)	Includes characters for Baltic languages which were missing from Latin-4 and Latin-6.
ISO8859-14 (Latin8)	Covers Celtic languages such as Gaelic and Breton.
ISO8859-15 (Latin9)	Variant of ISO 8859-1 that replaces eight rarely used characters with other letters and symbols, including the euro sign.
ISO8859-16 (Latin10)	Supports Albanian, Croatian, English, Finnish, French, German, Hungarian, Irish Gaelic (new orthography), Italian, Latin, Polish, Romanian, and Slovenian. The currency sign is replaced with the euro sign.
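The difference between ISO8859-1 and ISO8859-15 listed above is small enough to spell out. ISO8859-1 maps every byte value to the Unicode code point with the same value, and ISO8859-15 changes exactly eight of those positions. The following hypothetical decoder shows this for illustration; iconv(3C) remains the supported way to convert between these codesets.

```c
#include <stdint.h>

/*
 * Decode one ISO8859-15 byte to a Unicode code point.  ISO8859-1 maps each
 * byte b to U+00bb directly; ISO8859-15 differs only at these eight bytes.
 */
uint32_t latin9_to_ucs(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC;   /* CURRENCY SIGN  -> EURO SIGN */
    case 0xA6: return 0x0160;   /* BROKEN BAR     -> S WITH CARON */
    case 0xA8: return 0x0161;   /* DIAERESIS      -> s with caron */
    case 0xB4: return 0x017D;   /* ACUTE ACCENT   -> Z WITH CARON */
    case 0xB8: return 0x017E;   /* CEDILLA        -> z with caron */
    case 0xBC: return 0x0152;   /* ONE QUARTER    -> LIGATURE OE */
    case 0xBD: return 0x0153;   /* ONE HALF       -> ligature oe */
    case 0xBE: return 0x0178;   /* THREE QUARTERS -> Y WITH DIAERESIS */
    default:   return b;        /* identical to ISO8859-1 */
    }
}
```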

IBM EBCDIC Code Pages

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding used mainly on IBM mainframes. The following table summarizes the supported IBM EBCDIC-based code pages.

The codeset names of IBM PC and EBCDIC code pages are prefixed with "IBM-", as in IBM-037.
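The naming convention can be applied mechanically. The sketch below builds and parses IBM codeset names; the three-digit zero padding is inferred from entries such as IBM-037 and IBM-273 in the tables that follow, and both function names are hypothetical. Whether a given name is actually supported still has to be checked with iconv -l or by calling iconv_open(3C).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Write the codeset name for IBM code page `page` into buf, for example
 * 37 -> "IBM-037" and 1140 -> "IBM-1140".  Returns what snprintf() returns:
 * the length of the full name (truncated if that is >= size), or a negative
 * value on error.
 */
int ibm_codeset_name(char *buf, size_t size, unsigned int page)
{
    return snprintf(buf, size, "IBM-%03u", page);
}

/* Return the code page number in an "IBM-nnn" name, or -1 if it is not one. */
long ibm_page_number(const char *name)
{
    if (strncmp(name, "IBM-", 4) != 0)
        return -1;
    const char *digits = name + 4;
    if (*digits < '0' || *digits > '9')
        return -1;                          /* strtol would accept "+" or " " */
    char *end;
    long page = strtol(digits, &end, 10);
    return *end == '\0' ? page : -1;        /* reject trailing characters */
}
```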

EBCDIC Code Page	Country/Region
`IBM-037`	Latin-1 character set
`IBM-273`	Austria, Germany
`IBM-277`	Denmark, Norway
`IBM-278`	Finland, Sweden
`IBM-280`	Italy
`IBM-284`	Latin America, Spain
`IBM-285`	Ireland, United Kingdom
`IBM-297`	France
`IBM-420`	Egypt, Iraq, Jordan, Saudi Arabia, Syria
`IBM-424`	Israel
`IBM-500`	Australia, Austria, Belgium, Brazil, Canada, Denmark, Finland, France, Germany, Iceland, Ireland, Italy, Japan, Latin America, Multinational, Netherlands, New Zealand, Norway, Portugal, South Africa, Spain, Sweden, Switzerland, United Kingdom, and United States
`IBM-838`	Thailand
`IBM-875`	Greece
`IBM-933`	Korea
`IBM-935`	Simplified Chinese
`IBM-937`	Traditional Chinese
`IBM-1025`	Belarus, Bosnia-Herzegovina, Bulgaria, Macedonia (FYR), Montenegro, Russia, Serbia, Serbia-Montenegro, and Yugoslavia
`IBM-1026`	Multinational, Turkey
`IBM-1112`	Estonia, Latvia, Lithuania
`IBM-1122`	Estonia
`IBM-1140`	Australia, Brazil, Canada, Multinational, Netherlands, New Zealand, Portugal, South Africa, Taiwan, and United States
`IBM-1141`	Austria, Germany
`IBM-1142`	Denmark, Norway
`IBM-1143`	Finland, Sweden
`IBM-1144`	Italy
`IBM-1145`	Latin America, Spain
`IBM-1146`	Ireland, United Kingdom
`IBM-1147`	France
`IBM-1148`	Australia, Austria, Belgium, Brazil, Canada, Denmark, Finland, France, Germany, Iceland, Ireland, Italy, Japan, Latin America, Multinational, Netherlands, New Zealand, Norway, Portugal, South Africa, Spain, Sweden, Switzerland, United Kingdom, and United States
`IBM-1149`	Iceland

IBM-PC Code Pages

The following table covers the supported IBM-PC (DOS and Windows) code pages.

IBM-PC Code Page	Country/Region
`IBM-850`	Albania, Australia, Austria, Belgium, Bosnia-Herzegovina, Brazil, Bulgaria, Canada, Croatia, Czech Republic, Denmark, Egypt, Finland, France, Germany, Greece, Hungary, Iceland, Iraq, Ireland, Italy, Jordan, Latin America, Multinational, Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Russia, Saudi Arabia, Slovakia, Slovenia, South Africa, Spain, Sweden, Switzerland, Syria, United Kingdom, and United States
`IBM-852`	Albania, Bosnia-Herzegovina, Croatia, Czech Republic, Hungary, Multinational, Poland, Romania, Slovakia, and Slovenia
`IBM-855`	Bosnia-Herzegovina, Bulgaria, Macedonia (FYR), Montenegro, Multinational, Serbia, Serbia-Montenegro, and Yugoslavia
`IBM-856`	Israel
`IBM-857`	Multinational, Turkey
`IBM-862`	Israel
`IBM-864`	Egypt, Iraq, Jordan, Saudi Arabia, and Syria
`IBM-866`	Russia
`IBM-869`	Greece
`IBM-870`	Albania, Bosnia-Herzegovina, Croatia, Czech Republic, Hungary, Multinational, Poland, Romania, Slovakia, and Slovenia
`IBM-871`	Iceland
`IBM-874`	Thailand
`IBM-921`	Estonia, Latvia, Lithuania
`IBM-922`	Estonia

Microsoft Code Pages

The following table covers the supported Microsoft DOS and Windows code pages. The codeset names of Microsoft code pages are prefixed with "CP", as in CP850.

Code Page	Description
`CP437`	MS-DOS, Latin United States
`CP720`	MS-DOS, Arabic
`CP737`	MS-DOS, Greek
`CP775`	MS-DOS, Baltic
`CP850`	MS-DOS, Multilingual Latin I
`CP852`	MS-DOS, Latin II
`CP855`	MS-DOS, Cyrillic
`CP857`	MS-DOS, Turkish
`CP860`	MS-DOS, Portuguese
`CP861`	MS-DOS, Icelandic
`CP862`	MS-DOS, Hebrew
`CP863`	MS-DOS, French Canadian
`CP864`	MS-DOS, Arabic
`CP865`	MS-DOS, Nordic
`CP866`	MS-DOS, Cyrillic (Russian)
`CP869`	MS-DOS, Greek 2
`CP874`	MS-DOS, Thai
`CP949`	Windows, Korean
`CP1250`	Windows, Central Europe
`CP1251`	Windows, Cyrillic
`CP1252`	Windows, Latin
`CP1253`	Windows, Greek
`CP1254`	Windows, Turkish
`CP1255`	Windows, Hebrew
`CP1256`	Windows, Arabic
`CP1257`	Windows, Baltic
`CP1258`	Windows, Vietnamese

Other Code Pages

Code Page	Description
`KOI8-R`, `KOI8-U`	8-bit codesets for Russian and Ukrainian Cyrillic
`PTCP154`	ParaType `CP154` for Cyrillic; based on `CP1251` with added Asian Cyrillic characters
`ALT`	8-bit Alternative PC Cyrillic
`MAC`	8-bit Macintosh Cyrillic
`DHN`	Dom Handlowy Nauki, 8-bit codeset for Polish text
`Mazovia`	8-bit codeset for Polish text.
`VISCII`	Vietnamese Standard Code for Information Interchange is a modification of ASCII for Vietnamese.
`TCVN`	Vietnamese Standard Code for Information Interchange TCVN 5712:1993.
`TIS-620` (`TIS620-2533`, `EUC-TH`)	Thai Industrial Standard 620-2533 is practically identical to the ISO 8859-11 codeset (see above); ISO 8859-11 additionally defines `0xA0` NO-BREAK SPACE.
`ISCII` (`ISCII91`)	Indian Script Code for Information Interchange is an ASCII-compatible codeset for Indic scripts.
`ACE` (`IDNA2008-REGIST`)	ASCII Compatible Encoding as defined in RFCs 3490, 3492, and 5890. It does not allow unassigned characters and applies the STD3 ASCII rules. `IDNA2008-REGIST` is an alias for `ACE` that uses the IDNA2008 terminology described in RFC 5890.
`ACE-ALLOW-UNASSIGNED` (`IDNA2008-LOOKUP`)	Same as `ACE` except that it allows unassigned characters. It is more suitable for lookup (query) purposes, whereas `ACE` is more suitable for storing host or domain names or passing them to other systems. `IDNA2008-LOOKUP` is an alias for `ACE-ALLOW-UNASSIGNED` that uses the IDNA2008 terminology described in RFC 5890.
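ACE relies on the Punycode algorithm from RFC 3492 to turn a non-ASCII label into ASCII. The sketch below implements only that Punycode encoding step, following the pseudocode in RFC 3492. It does not perform the IDNA mapping and validation, the STD3 ASCII checks, or the addition of the `xn--` prefix that a full ACE conversion requires, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bootstring parameters for Punycode, RFC 3492 section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Map a digit value 0..35 to 'a'..'z' or '0'..'9'. */
static char encode_digit(uint32_t d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/*
 * Punycode-encode len code points into out (capacity outsize, NUL-terminated).
 * Returns the output length, or -1 if out is too small or delta overflows.
 */
int punycode_encode(const uint32_t *input, size_t len, char *out, size_t outsize)
{
    size_t o = 0, h, b;
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;

    /* Copy the basic (ASCII) code points first, in order. */
    for (size_t j = 0; j < len; j++) {
        if (input[j] < 0x80) {
            if (o + 1 >= outsize) return -1;
            out[o++] = (char)input[j];
        }
    }
    h = b = o;
    if (b > 0) {
        if (o + 1 >= outsize) return -1;
        out[o++] = '-';                     /* delimiter */
    }

    while (h < len) {
        /* Find the smallest code point >= n that is still unhandled. */
        uint32_t m = UINT32_MAX;
        for (size_t j = 0; j < len; j++)
            if (input[j] >= n && input[j] < m)
                m = input[j];
        if ((m - n) > (UINT32_MAX - delta) / (h + 1))
            return -1;                      /* overflow */
        delta += (m - n) * (uint32_t)(h + 1);
        n = m;

        for (size_t j = 0; j < len; j++) {
            if (input[j] < n && ++delta == 0)
                return -1;                  /* overflow */
            if (input[j] == n) {
                /* Emit delta as a generalized variable-length integer. */
                uint32_t q = delta;
                for (uint32_t k = BASE;; k += BASE) {
                    uint32_t t = k <= bias ? TMIN
                               : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    if (o + 1 >= outsize) return -1;
                    out[o++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (o + 1 >= outsize) return -1;
                out[o++] = encode_digit(q);
                bias = adapt(delta, (uint32_t)(h + 1), h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return (int)o;
}
```

For example, the label "bücher" encodes to `bcher-kva`, which a full ACE conversion would then present as `xn--bcher-kva`.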

Files

/usr/lib/iconv/*.so: iconv conversion modules
/usr/lib/iconv/*.bt: cconv code conversion binary tables for iconv(1), cconv(3C) and iconv(3C)
/usr/lib/iconv/geniconvtbl/binarytables/*.bt: geniconvtbl conversion binary tables
/usr/lib/iconv/alias: Alias table file of codeset names