man pages section 7: Standards, Environments, Macros, Character Sets, and Miscellany

Updated: Wednesday, July 27, 2022
 
 

iconv_extra (7)

Name

iconv_extra - codeset conversion for non-Unicode encodings

Description

iconv and cconv support conversions to and from a wide range of codesets.

The lists below provide basic information about encodings used mainly in the EMEA regions. For information on Asian encodings, refer to the iconv_ja(7), iconv_ko(7), iconv_zh(7), iconv_zh_HK(7), and iconv_zh_TW(7) manual pages. For information on Unicode encodings, refer to the iconv_unicode(7) manual page.

The codeset names shown are in their canonical form directly usable as fromcode or tocode parameters to iconv(1), iconv_open(3C), and cconv_open(3C), with aliases in parentheses where applicable.

To list the iconv and cconv conversions available on the current system, run iconv -l as described in the iconv(1) manual page.
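As a rough illustration of how a fromcode and tocode pair is used, the following Python sketch converts a byte string from one codeset to another. It relies on Python's built-in codecs rather than the Solaris iconv modules, the helper name convert is hypothetical, and the codec names are Python's spellings, which accept ISO8859-1 and UTF-8 but do not cover every canonical name listed on this page.

```python
def convert(data: bytes, fromcode: str, tocode: str) -> bytes:
    """Decode data from fromcode, then re-encode the text in tocode."""
    return data.decode(fromcode).encode(tocode)


# "cafe" with e-acute stored as the single byte 0xE9 in ISO8859-1,
# converted to its two-byte UTF-8 form.
utf8_bytes = convert(b"caf\xe9", "ISO8859-1", "UTF-8")
```

Converting in the opposite direction only succeeds when every character exists in the target codeset; otherwise Python raises UnicodeEncodeError.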

For additional information on the mappings between canonical names and supported aliases with optional variant levels, refer to the alias(5) manual page and also the /usr/lib/iconv/alias file.

646 - ISO/IEC 646:1991 and Variants

The codeset of the "C" locale in Oracle Solaris, the ISO basic Latin alphabet, is referred to by the canonical name 646. Common aliases such as US-ASCII and ASCII are also defined.
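The idea of several names resolving to one codeset can be seen in Python's own codec registry, which also happens to accept 646 as a name for ASCII. This is only an analogy: the lookup below consults Python's alias table, not the Solaris /usr/lib/iconv/alias file.

```python
import codecs

# Three spellings that Python's codec registry resolves to the same codec.
names = ["646", "ASCII", "US-ASCII"]
resolved = {name: codecs.lookup(name).name for name in names}
```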

The following national variants of the 646 codeset are also available:

646 Codeset         National Variant
646de               Germany
646ch               Switzerland
646gb (646en)       United Kingdom
646fr               France
646ca               Canada
646fi               Finland
646sv               Sweden
646it               Italy
646dk (646da)       Denmark
646es (646sp)       Spain
646pt               Portugal
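A national variant keeps the 7-bit layout of 646 but reassigns a handful of positions to national letters. The sketch below decodes the German variant by hand; the override table follows DIN 66003 as commonly documented and is an assumption here, not something read from the Solaris 646de conversion module. The names DE_OVERRIDES and decode_646de are hypothetical.

```python
# Positions where the German variant differs from US-ASCII
# (assumed from DIN 66003, not from the Solaris module).
DE_OVERRIDES = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def decode_646de(data: bytes) -> str:
    """Decode 7-bit German-variant bytes into a Unicode string."""
    chars = []
    for value in data:
        if value > 0x7F:
            raise ValueError(f"not a 7-bit code: {value:#04x}")
        # Use the national letter if the position is reassigned,
        # otherwise fall back to the plain ASCII character.
        chars.append(DE_OVERRIDES.get(value, chr(value)))
    return "".join(chars)


text = decode_646de(b"Gr\x7d\x7ee")
```

The same bytes read as plain 646 would display as "Gr}~e", which is why the variant name has to be given explicitly as the fromcode.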

ISO 8859 Character Sets

ISO 8859 Character Set  Description
ISO8859-1 (Latin1)      For most West European languages, including Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, German, Galician, Irish, Icelandic, Italian, Norwegian, Portuguese, Spanish, and Swedish.
ISO8859-2 (Latin2)      For most Latin-written Slavic and Central European languages: Czech, German, Hungarian, Polish, Romanian, Croatian, Slovak, and Slovene.
ISO8859-3 (Latin3)      Used for Esperanto, Galician, Maltese, and Turkish.
ISO8859-4 (Latin4)      Introduces letters for Estonian, Latvian, and Lithuanian. It is an incomplete predecessor of ISO 8859-10.
ISO8859-5               For languages that use the Cyrillic alphabet, such as Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian.
ISO8859-6               Latin + Arabic.
ISO8859-7               Latin + Greek. Does not include accents used in polytonic Greek.
ISO8859-8               Latin + Hebrew.
ISO8859-9 (Latin5)      Replaces the rarely needed Icelandic letters in ISO 8859-1 (Latin 1) with the Turkish ones.
ISO8859-10 (Latin6)     Adds the last Inuit (Greenlandic) and Sami (Lappish) letters that were not included in ISO 8859-4 (Latin 4) to complete coverage of the Nordic area.
ISO8859-11              Latin + Thai. ISO/IEC 8859-11:2001 is equivalent to TIS 620-2533 (1990) with the addition of 0xA0 NO-BREAK SPACE.
ISO8859-13 (Latin7)     Includes characters for Baltic languages which were missing from Latin-4 and Latin-6.
ISO8859-14 (Latin8)     Covers Celtic languages such as Gaelic and the Breton language.
ISO8859-15 (Latin9)     Variant of 8859-1 that modifies 8 less used characters and introduces the euro sign.
ISO8859-16 (Latin10)    Supports Albanian, Croatian, English, Finnish, French, German, Hungarian, Irish Gaelic (new orthography), Italian, Latin, Polish, Romanian, and Slovenian. The currency sign is replaced with the euro sign.
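The statement that ISO8859-15 modifies 8 characters of ISO8859-1 can be checked by decoding every byte value with both codesets and collecting the positions that disagree. The sketch uses Python's built-in iso8859-1 and iso8859-15 codecs as stand-ins for the Solaris conversions; the function name differing_positions is hypothetical.

```python
def differing_positions(codec_a: str, codec_b: str):
    """Return (byte value, char in codec_a, char in codec_b) for each mismatch."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs.append((value, char_a, char_b))
    return diffs


diffs = differing_positions("iso8859-1", "iso8859-15")
```

Among the mismatches is byte 0xA4, which is the currency sign in ISO8859-1 and the euro sign in ISO8859-15, so data must be labeled with the right codeset before conversion.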

IBM EBCDIC Code Pages

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding mainly used on IBM mainframes. The following table summarizes the supported IBM EBCDIC-based code pages.

IBM PC and EBCDIC code pages are prefixed with "IBM-" in the codeset name, as in IBM-037.

EBCDIC Code Page    Country/Region
IBM-037             Latin-1 character set
IBM-273             Austria, Germany
IBM-277             Denmark, Norway
IBM-278             Finland, Sweden
IBM-280             Italy
IBM-284             Latin America, Spain
IBM-285             Ireland, United Kingdom
IBM-297             France
IBM-420             Egypt, Iraq, Jordan, Saudi Arabia, Syria
IBM-424             Israel
IBM-500             Australia, Austria, Belgium, Brazil, Canada, Denmark, Finland, France, Germany, Iceland, Ireland, Italy, Japan, Latin America, Multinational, Netherlands, New Zealand, Norway, Portugal, South Africa, Spain, Sweden, Switzerland, United Kingdom, and United States
IBM-838             Thailand
IBM-875             Greece
IBM-933             Korea
IBM-935             Simplified Chinese
IBM-937             Traditional Chinese
IBM-1025            Belarus, Bosnia-Herzegovina, Bulgaria, Macedonia (FYR), Montenegro, Russia, Serbia, Serbia-Montenegro, and Yugoslavia
IBM-1026            Multinational, Turkey
IBM-1112            Estonia, Latvia, Lithuania
IBM-1122            Estonia
IBM-1140            Australia, Brazil, Canada, Multinational, Netherlands, New Zealand, Portugal, South Africa, Taiwan, and United States
IBM-1141            Austria, Germany
IBM-1142            Denmark, Norway
IBM-1143            Finland, Sweden
IBM-1144            Italy
IBM-1145            Latin America, Spain
IBM-1146            Ireland, United Kingdom
IBM-1147            France
IBM-1148            Australia, Austria, Belgium, Brazil, Canada, Denmark, Finland, France, Germany, Iceland, Ireland, Italy, Japan, Latin America, Multinational, Netherlands, New Zealand, Norway, Portugal, South Africa, Spain, Sweden, Switzerland, United Kingdom, and United States
IBM-1149            Iceland
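EBCDIC places letters and digits at entirely different byte values than ASCII-based codesets, and individual EBCDIC code pages also differ from each other for some punctuation. The sketch below uses Python's cp037 and cp500 codecs, which are assumed here to correspond to IBM-037 and IBM-500; Python does not accept the "IBM-" spelling used in Solaris codeset names.

```python
text = "HELLO 037"

# Encode to EBCDIC; cp037 is Python's name for code page 037.
ebcdic = text.encode("cp037")

# Convert the EBCDIC bytes to ISO8859-1 by decoding and re-encoding.
latin1 = ebcdic.decode("cp037").encode("iso8859-1")

# The same character can sit at different positions in two EBCDIC pages.
bracket_037 = "[".encode("cp037")
bracket_500 = "[".encode("cp500")
```

Because of differences like the bracket position, choosing the code page that matches the originating system matters even when both candidates are EBCDIC.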

IBM-PC Code Pages

The following table covers the supported IBM-PC (DOS and Windows) code pages.

IBM-PC Code Page    Country/Region
IBM-850             Albania, Australia, Austria, Belgium, Bosnia-Herzegovina, Brazil, Bulgaria, Canada, Croatia, Czech Republic, Denmark, Egypt, Finland, France, Germany, Greece, Hungary, Iceland, Iraq, Ireland, Italy, Jordan, Latin America, Multinational, Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Russia, Saudi Arabia, Slovakia, Slovenia, South Africa, Spain, Sweden, Switzerland, Syria, United Kingdom, and United States
IBM-852             Albania, Bosnia-Herzegovina, Croatia, Czech Republic, Hungary, Multinational, Poland, Romania, Slovakia, and Slovenia
IBM-855             Bosnia-Herzegovina, Bulgaria, Macedonia (FYR), Montenegro, Multinational, Serbia, Serbia-Montenegro, and Yugoslavia
IBM-856             Israel
IBM-857             Multinational, Turkey
IBM-862             Israel
IBM-864             Egypt, Iraq, Jordan, Saudi Arabia, and Syria
IBM-866             Russia
IBM-869             Greece
IBM-870             Albania, Bosnia-Herzegovina, Croatia, Czech Republic, Hungary, Multinational, Poland, Romania, Slovakia, and Slovenia
IBM-871             Iceland
IBM-874             Thailand
IBM-921             Estonia, Latvia, Lithuania
IBM-922             Estonia

Microsoft Code Pages

The following table covers the supported Microsoft DOS and Windows code pages. Microsoft code pages are prefixed with "CP" in the codeset name, as in CP850.

Code Page   Description
CP437       MS-DOS, Latin United States
CP720       MS-DOS, Arabic
CP737       MS-DOS, Greek
CP775       MS-DOS, Baltic
CP850       MS-DOS, Multilingual Latin I
CP852       MS-DOS, Latin II
CP855       MS-DOS, Cyrillic
CP857       MS-DOS, Turkish
CP860       MS-DOS, Portuguese
CP861       MS-DOS, Icelandic
CP862       MS-DOS, Hebrew
CP863       MS-DOS, French Canada
CP864       MS-DOS, Arabic
CP865       MS-DOS, Nordic
CP866       MS-DOS, Cyrillic (Russian)
CP869       MS-DOS, Greek 2
CP874       MS-DOS, Thai
CP949       Windows, Korean
CP1250      Windows, Central Europe
CP1251      Windows, Cyrillic
CP1252      Windows, Latin
CP1253      Windows, Greek
CP1254      Windows, Turkish
CP1255      Windows, Hebrew
CP1256      Windows, Arabic
CP1257      Windows, Baltic
CP1258      Windows, Vietnam
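MS-DOS and Windows code pages for the same script generally assign different byte values to the same characters, so a file must be converted with the code page it was actually written in. The following sketch illustrates this with Python's cp437, cp1252, cp866, and cp1251 codecs, which are assumed to match the corresponding CP codesets in the table.

```python
# One byte, two interpretations.
raw = b"\x82"
as_cp437 = raw.decode("cp437")    # MS-DOS, Latin United States
as_cp1252 = raw.decode("cp1252")  # Windows, Latin

# One Cyrillic string, two byte sequences.
cyrillic = "Привет"
dos_bytes = cyrillic.encode("cp866")       # MS-DOS, Cyrillic (Russian)
windows_bytes = cyrillic.encode("cp1251")  # Windows, Cyrillic
```

Decoding dos_bytes with cp1251 would not fail, since both are 8-bit codesets, but it would produce unreadable text instead of an error.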

Other Code Pages

Code Page                               Description
KOI8-R, KOI8-U                          8-bit codesets for Russian and Ukrainian Cyrillic
PTCP154                                 Paratype CP154 for Cyrillic; based on CP1251 with added Asian Cyrillic symbols
ALT                                     8-bit Alternative PC Cyrillic
MAC                                     8-bit Macintosh Cyrillic
DHN                                     Dom Handlowy Nauki, 8-bit codeset for Polish text
Mazovia                                 8-bit codeset for Polish text.
VISCII                                  Vietnamese Standard Code for Information Interchange is a modification of ASCII for Vietnamese.
TCVN                                    Vietnamese Standard Code for Information Interchange TCVN 5712:1993.
TIS-620 (TIS620-2533, EUC-TH)           Thai Industrial Standard 620-2533 is practically identical to the ISO 8859-11 codeset (see above).
ISCII (ISCII91)                         Indian Script Code for Information Interchange is an ASCII-compatible codeset for Indic scripts.
ACE (IDNA2008-REGIST)                   ASCII Compatible Encoding defined in RFCs 3490, 3492, and 5890 without allowing unassigned characters; it also uses STD3 ASCII rules. IDNA2008-REGIST is an alias for ACE that uses the IDNA2008 terminology described in RFC 5890.
ACE-ALLOW-UNASSIGNED (IDNA2008-LOOKUP)  Same as ACE except that it allows unassigned characters. It is more suitable for query purposes, whereas ACE is more suitable for storing host or domain names or giving them to machines. IDNA2008-LOOKUP is an alias for ACE-ALLOW-UNASSIGNED that uses the IDNA2008 terminology described in RFC 5890.
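To give a sense of what an ASCII Compatible Encoding looks like, the sketch below converts an internationalized host name to its ACE form and back using Python's built-in idna codec. That codec follows RFC 3490 with RFC 3492 Punycode; it does not implement the IDNA2008 rules, and its treatment of unassigned characters should not be assumed to match either ACE or ACE-ALLOW-UNASSIGNED on Solaris.

```python
# A host name with a non-ASCII label; "example" is a reserved test domain.
label = "bücher.example"

# ToASCII: each non-ASCII label becomes an "xn--" Punycode label.
ace = label.encode("idna")

# ToUnicode: recover the original name from the ACE form.
back = ace.decode("idna")
```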

Files

/usr/lib/iconv/*.so

iconv conversion modules

/usr/lib/iconv/*.bt

cconv code conversion binary tables for iconv(1), cconv(3C) and iconv(3C)

/usr/lib/iconv/geniconvtbl/binarytables/*.bt

geniconvtbl conversion binary tables

/usr/lib/iconv/alias

Alias table file of codeset names

See Also

geniconvtbl(1), iconv(1), cconv(3C), cconv_close(3C), cconv_open(3C), cconvctl(3C), iconv(3C), iconv_close(3C), iconv_open(3C), iconvctl(3C), alias(5), geniconvtbl-cconv(5), iconv_ja(7), iconv_ko(7), iconv_unicode(7), iconv_zh(7), iconv_zh_HK(7), iconv_zh_TW(7)

Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM Development Team, July 1993.

Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet Messages, RFC 1555, Israeli Inter-University, Hebrew University, December 1993.

Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of Southern California/Information Sciences Institute, October 1994.

Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel Almen Planlaegning, June 1992.

Spinellis, D., Greek Character Encoding for Electronic Mail Messages, RFC 1947, SENA S.A., May 1996.