2 Supported Encodings

The java.io.InputStreamReader, java.io.OutputStreamWriter, java.lang.String classes, and classes in the java.nio.charset package can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of the Java SE Platform. The class description for java.nio.charset.Charset lists the encodings that any implementation of the Java SE platform is required to support.
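
Every implementation must support US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16, and these six charsets are available as constants in java.nio.charset.StandardCharsets. Code that uses the constants never needs a lookup by name at run time. The following minimal sketch shows how the same string produces different bytes in two of the required encodings. The class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RequiredCharsetsDemo {
    // ISO-8859-1 maps each character in the range U+0000..U+00FF to one byte
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8 needs two bytes for characters in the range U+0080..U+07FF, such as é
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        System.out.println(Arrays.toString(encodeLatin1(text))); // [99, 97, 102, -23]
        System.out.println(Arrays.toString(encodeUtf8(text)));   // [99, 97, 102, -61, -87]
        // Decoding with the same charset restores the original string
        System.out.println(new String(encodeUtf8(text), StandardCharsets.UTF_8).equals(text)); // true
    }
}
```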

The following tables show the encoding sets supported by this version of the Oracle Java SE platform. The canonical names used by the java.nio APIs are in many cases not the same as those used in the java.io and java.lang APIs.
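
The following sketch illustrates this naming difference. Charset.forName() accepts any canonical name or alias, and Charset.name() returns the java.nio canonical name. InputStreamReader.getEncoding() reports the historical java.io name instead. The class name CharsetNamesDemo and its helper methods are hypothetical. The names involved come from the tables below:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class CharsetNamesDemo {
    // Canonical name used by the java.nio API for a name or alias
    static String nioName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    // Name reported by the java.io API for the same charset; getEncoding()
    // must be called before the reader is closed, when it returns null
    static String ioName(String nameOrAlias) {
        try (InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), nameOrAlias)) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"latin1", "windows-1252", "UTF8"}) {
            System.out.println(name + " -> java.nio: " + nioName(name) + ", java.io: " + ioName(name));
        }
    }
}
```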

Basic Encoding Set (contained in java.base module)

Canonical Name for java.nio API	Canonical Name for java.io API and java.lang API	Alias or Aliases	Description
CESU-8	CESU8	CESU8 csCESU-8	Unicode CESU-8
GB18030	GB18030	gb18030-2022 or gb18030-2000 if the system property and value `jdk.charset.GB18030=2000` are specified	Simplified Chinese, PRC standard
IBM00858	Cp858	cp858 ccsid00858 cp00858 858 PC-Multilingual-850+euro	Variant of Cp850 with Euro character
IBM437	Cp437	cp437 ibm437 ibm-437 437 cspc8codepage437 windows-437	MS-DOS United States, Australia, New Zealand, South Africa
IBM775	Cp775	cp775 ibm775 ibm-775 775	PC Baltic
IBM850	Cp850	cp850 ibm-850 ibm850 850 cspc850multilingual	MS-DOS Latin-1
IBM852	Cp852	cp852 ibm852 ibm-852 852 csPCp852	MS-DOS Latin-2
IBM855	Cp855	cp855 ibm-855 ibm855 855 cspcp855	IBM Cyrillic
IBM857	Cp857	cp857 ibm857 ibm-857 857 csIBM857	IBM Turkish
IBM862	Cp862	cp862 ibm862 ibm-862 862 csIBM862 cspc862latinhebrew	PC Hebrew
IBM866	Cp866	cp866 ibm866 ibm-866 866 csIBM866	MS-DOS Russian
ISO-8859-1	ISO8859_1	iso-ir-100 ISO_8859-1 latin1 l1 IBM819 cp819 csISOLatin1 819 IBM-819 ISO8859_1 ISO_8859-1:1987 ISO_8859_1 8859_1 ISO8859-1	ISO-8859-1, Latin Alphabet No. 1
ISO-8859-13	ISO8859_13	iso8859_13 8859_13 iso_8859-13 ISO8859-13	Latin Alphabet No. 7
ISO-8859-15	ISO8859_15	ISO_8859-15 Latin-9 csISO885915 8859_15 ISO-8859-15 ISO8859_15 ISO8859-15 IBM923 IBM-923 cp923 923 LATIN0 LATIN9 L9 csISOlatin0 csISOlatin9 ISO8859_15_FDIS	Latin Alphabet No. 9
ISO-8859-16	ISO8859_16	iso-ir-226 ISO_8859-16:2001 ISO_8859-16 latin10 l10 csISO885916	Latin Alphabet No. 10 or South-Eastern European
ISO-8859-2	ISO8859_2	iso8859_2 8859_2 iso-ir-101 ISO_8859-2 ISO_8859-2:1987 ISO8859-2 latin2 l2 ibm912 ibm-912 cp912 912 csISOLatin2	Latin Alphabet No. 2
ISO-8859-4	ISO8859_4	iso8859_4 iso8859-4 8859_4 iso-ir-110 ISO_8859-4 ISO_8859-4:1988 latin4 l4 ibm914 ibm-914 cp914 914 csISOLatin4	Latin Alphabet No. 4
ISO-8859-5	ISO8859_5	iso8859_5 8859_5 iso-ir-144 ISO_8859-5 ISO_8859-5:1988 ISO8859-5 cyrillic ibm915 ibm-915 cp915 915 csISOLatinCyrillic	Latin/Cyrillic Alphabet
ISO-8859-7	ISO8859_7	iso8859_7 8859_7 iso-ir-126 ISO_8859-7 ISO_8859-7:1987 ELOT_928 ECMA-118 greek greek8 csISOLatinGreek sun_eu_greek ibm813 ibm-813 813 cp813 iso8859-7	Latin/Greek Alphabet (ISO-8859-7:2003)
ISO-8859-9	ISO8859_9	iso8859_9 8859_9 iso-ir-148 ISO_8859-9 ISO_8859-9:1989 ISO8859-9 latin5 l5 ibm920 ibm-920 920 cp920 csISOLatin5	Latin Alphabet No. 5
KOI8-R	KOI8_R	koi8_r koi8 cskoi8r	KOI8-R, Russian
KOI8-U	KOI8_U	koi8_u	KOI8-U, Ukrainian
US-ASCII	ASCII	iso-ir-6 ANSI_X3.4-1986 ISO_646.irv:1991 ASCII ISO646-US us IBM367 cp367 csASCII 646 iso_646.irv:1983 ANSI_X3.4-1968 ascii7	American Standard Code for Information Interchange
UTF-16	UTF-16	UTF_16 utf16 unicode UnicodeBig	Sixteen-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark
UTF-16BE	UnicodeBigUnmarked	UTF_16BE ISO-10646-UCS-2 X-UTF-16BE UnicodeBigUnmarked	Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order
UTF-16LE	UnicodeLittleUnmarked	UTF_16LE X-UTF-16LE UnicodeLittleUnmarked	Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order
UTF-32	UTF-32	UTF_32 UTF32	32-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark
UTF-32BE	UTF-32BE	UTF_32BE X-UTF-32BE	32-bit Unicode (or UCS) Transformation Format, big-endian byte order
UTF-32LE	UTF-32LE	UTF_32LE X-UTF-32LE	32-bit Unicode (or UCS) Transformation Format, little-endian byte order
UTF-8	UTF8	UTF8 unicode-1-1-utf-8	Eight-bit Unicode (or UCS) Transformation Format
windows-1250	Cp1250	cp1250 cp5346	Windows Eastern European
windows-1251	Cp1251	cp1251 cp5347 ansi-1251	Windows Cyrillic
windows-1252	Cp1252	cp1252 cp5348 ibm-1252 ibm1252	Windows Latin-1
windows-1253	Cp1253	cp1253 cp5349	Windows Greek
windows-1254	Cp1254	cp1254 cp5350	Windows Turkish
windows-1257	Cp1257	cp1257 cp5353	Windows Baltic
x-IBM737	Cp737	cp737 ibm737 ibm-737 737	PC Greek
x-IBM874	Cp874	cp874 ibm874 ibm-874 874	IBM Thai
x-UTF-16LE-BOM	UnicodeLittle	UnicodeLittle	Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark
X-UTF-32BE-BOM	X-UTF-32BE-BOM	UTF_32BE_BOM UTF-32BE-BOM	32-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark
X-UTF-32LE-BOM	X-UTF-32LE-BOM	UTF_32LE_BOM UTF-32LE-BOM	32-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark
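
The following sketch makes the byte-order descriptions above concrete. It encodes the single character "A" (U+0041) with several UTF-16 variants. UTF-16 writes a big-endian byte-order mark (FE FF), the BE and LE forms write no mark, and x-UTF-16LE-BOM writes a little-endian mark (FF FE). The class name is illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteOrderDemo {
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "A"; // U+0041
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16)));   // [-2, -1, 0, 65]
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16BE))); // [0, 65]
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16LE))); // [65, 0]
        System.out.println(Arrays.toString(encode(s, Charset.forName("x-UTF-16LE-BOM")))); // [-1, -2, 65, 0]
        // When decoding, UTF-16 uses the byte-order mark to select the byte order
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16)); // A
    }
}
```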

Extended Encoding Set (contained in jdk.charsets module)

Canonical Name for java.nio API	Canonical Name for java.io API and java.lang API	Alias or Aliases	Description
Big5	Big5	csBig5	Big5, Traditional Chinese
Big5-HKSCS	Big5_HKSCS	Big5_HKSCS big5hk big5-hkscs big5hkscs	Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision)
EUC-JP	EUC_JP	euc_jp eucjis eucjp Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtjapanese x-euc-jp x-eucjp	JISX 0201, 0208 and 0212, EUC encoding Japanese
EUC-KR	EUC_KR	euc_kr ksc5601 euckr ks_c_5601-1987 ksc5601-1987 ksc5601_1987 ksc_5601 csEUCKR 5601	KS C 5601, EUC encoding, Korean
GB2312	EUC_CN	gb2312 gb2312-80 gb2312-1980 euc-cn euccn x-EUC-CN EUC_CN	GB2312, EUC encoding, Simplified Chinese
GBK	GBK	windows-936 CP936	GBK, Simplified Chinese
IBM01140	Cp1140	cp1140 ccsid01140 cp01140 1140 ebcdic-us-037+euro	Variant of Cp037 with Euro character
IBM01141	Cp1141	cp1141 ccsid01141 cp01141 1141 ebcdic-de-273+euro	Variant of Cp273 with Euro character
IBM01142	Cp1142	cp1142 ccsid01142 cp01142 1142 ebcdic-no-277+euro ebcdic-dk-277+euro	Variant of Cp277 with Euro character
IBM01143	Cp1143	cp1143 ccsid01143 cp01143 1143 ebcdic-fi-278+euro ebcdic-se-278+euro	Variant of Cp278 with Euro character
IBM01144	Cp1144	cp1144 ccsid01144 cp01144 1144 ebcdic-it-280+euro	Variant of Cp280 with Euro character
IBM01145	Cp1145	cp1145 ccsid01145 cp01145 1145 ebcdic-es-284+euro	Variant of Cp284 with Euro character
IBM01146	Cp1146	cp1146 ccsid01146 cp01146 1146 ebcdic-gb-285+euro	Variant of Cp285 with Euro character
IBM01147	Cp1147	cp1147 ccsid01147 cp01147 1147 ebcdic-fr-297+euro	Variant of Cp297 with Euro character
IBM01148	Cp1148	cp1148 ccsid01148 cp01148 1148 ebcdic-international-500+euro	Variant of Cp500 with Euro character
IBM01149	Cp1149	cp1149 ccsid01149 cp01149 1149 ebcdic-s-871+euro	Variant of Cp871 with Euro character
IBM037	Cp037	cp037 ibm037 ebcdic-cp-us ebcdic-cp-ca ebcdic-cp-wt ebcdic-cp-nl csIBM037 cs-ebcdic-cp-us cs-ebcdic-cp-ca cs-ebcdic-cp-wt cs-ebcdic-cp-nl ibm-037 ibm-37 cpibm37 037	USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
IBM1026	Cp1026	cp1026 ibm1026 ibm-1026 1026	IBM Latin-5, Turkey
IBM1047	Cp1047	cp1047 ibm-1047 1047	Latin-1 character set for EBCDIC hosts
IBM273	Cp273	cp273 ibm273 ibm-273 273	IBM Austria, Germany
IBM277	Cp277	cp277 ibm277 ibm-277 277	IBM Denmark, Norway
IBM278	Cp278	cp278 ibm278 ibm-278 278 ebcdic-sv ebcdic-cp-se csIBM278	IBM Finland, Sweden
IBM280	Cp280	cp280 ibm280 ibm-280 280	IBM Italy
IBM284	Cp284	cp284 ibm284 ibm-284 284 csIBM284 cpibm284	IBM Catalan/Spain, Spanish Latin America
IBM285	Cp285	cp285 ibm285 ibm-285 285 ebcdic-cp-gb ebcdic-gb csIBM285 cpibm285	IBM United Kingdom, Ireland
IBM290	Cp290	cp290 ibm290 ibm-290 csIBM290 EBCDIC-JP-kana 290	IBM Japanese Katakana Host Extended SBCS
IBM297	Cp297	cp297 ibm297 ibm-297 297 ebcdic-cp-fr cpibm297 csIBM297	IBM France
IBM420	Cp420	cp420 ibm420 ibm-420 ebcdic-cp-ar1 420 csIBM420	IBM Arabic
IBM424	Cp424	cp424 ibm424 ibm-424 424 ebcdic-cp-he csIBM424	IBM Hebrew
IBM500	Cp500	cp500 ibm500 ibm-500 500 ebcdic-cp-ch ebcdic-cp-bh csIBM500	EBCDIC 500V1
IBM860	Cp860	cp860 ibm860 ibm-860 860 csIBM860	MS-DOS Portuguese
IBM861	Cp861	cp861 ibm861 ibm-861 861 csIBM861 cp-is	MS-DOS Icelandic
IBM863	Cp863	cp863 ibm863 ibm-863 863 csIBM863	MS-DOS Canadian French
IBM864	Cp864	cp864 ibm864 ibm-864 864 csIBM864	PC Arabic
IBM865	Cp865	cp865 ibm865 ibm-865 865 csIBM865	MS-DOS Nordic
IBM868	Cp868	cp868 ibm868 ibm-868 868 cp-ar csIBM868	MS-DOS Pakistan
IBM869	Cp869	cp869 ibm869 ibm-869 869 cp-gr csIBM869	IBM Modern Greek
IBM870	Cp870	cp870 ibm870 ibm-870 870 ebcdic-cp-roece ebcdic-cp-yu csIBM870	IBM Multilingual Latin-2
IBM871	Cp871	cp871 ibm871 ibm-871 871 ebcdic-cp-is csIBM871	IBM Iceland
IBM918	Cp918	cp918 ibm-918 918 ebcdic-cp-ar2	IBM Pakistan (Urdu)
IBM-Thai	Cp838	cp838 ibm838 ibm-838 838	IBM Thailand extended SBCS
ISO-2022-CN	ISO2022CN	ISO2022CN csISO2022CN	GB2312 and CNS11643 in ISO 2022 CN form, Simplified and Traditional Chinese (conversion to Unicode only)
ISO-2022-JP	ISO2022JP	iso2022jp jis csISO2022JP jis_encoding csjisencoding	JIS X 0201, 0208, in ISO 2022 form, Japanese
ISO-2022-JP-2	ISO2022JP2	csISO2022JP2 iso2022jp2	JIS X 0201, 0208, 0212 in ISO 2022 form, Japanese
ISO-2022-KR	ISO2022KR	ISO2022KR csISO2022KR	ISO 2022 KR, Korean
ISO-8859-3	ISO8859_3	iso8859_3 8859_3 ISO_8859-3:1988 iso-ir-109 ISO_8859-3 ISO8859-3 latin3 l3 ibm913 ibm-913 cp913 913 csISOLatin3	Latin Alphabet No. 3
ISO-8859-6	ISO8859_6	iso8859_6 8859_6 iso-ir-127 ISO_8859-6 ISO_8859-6:1987 ISO8859-6 ECMA-114 ASMO-708 arabic ibm1089 ibm-1089 cp1089 1089 csISOLatinArabic	Latin/Arabic Alphabet
ISO-8859-8	ISO8859_8	iso8859_8 8859_8 iso-ir-138 ISO_8859-8 ISO_8859-8:1988 ISO8859-8 cp916 916 ibm916 ibm-916 hebrew csISOLatinHebrew	Latin/Hebrew Alphabet
JIS_X0201	JIS_X0201	JIS0201 JIS_X0201 X0201 csHalfWidthKatakana	JIS X 0201
JIS_X0212-1990	JIS0212	JIS0212 jis_x0212-1990 x0212 iso-ir-159 csISO159JISX02121990	JIS X 0212
Shift_JIS	SJIS	sjis shift_jis shift-jis ms_kanji x-sjis csShiftJIS	Shift-JIS, Japanese
TIS-620	TIS620	tis620 tis620.2533	TIS620, Thai
windows-1255	Cp1255	cp1255	Windows Hebrew
windows-1256	Cp1256	cp1256	Windows Arabic
windows-1258	Cp1258	cp1258	Windows Vietnamese
windows-31j	MS932	MS932 windows-932 csWindows31J	Windows Japanese
x-Big5-HKSCS-2001	x-Big5-HKSCS-2001	Big5_HKSCS_2001 big5hk-2001 big5-hkscs-2001 big5-hkscs:unicode3.0 big5hkscs-2001	Big5 with Hong Kong Supplementary Character Set, 2001 revision
x-Big5-Solaris	Big5_Solaris	Big5_Solaris	Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale
x-euc-jp-linux	EUC_JP_LINUX	euc_jp_linux euc-jp-linux	JISX 0201, 0208, EUC encoding Japanese
x-eucJP-Open	EUC_JP_Solaris	EUC_JP_Solaris eucJP-open	JISX 0201, 0208, 0212, EUC encoding Japanese
x-EUC-TW	EUC_TW	euc_tw euctw cns11643 EUC-TW	CNS11643 (Plane 1-7,15), EUC encoding, Traditional Chinese
x-IBM1006	Cp1006	cp1006 ibm1006 ibm-1006 1006	IBM AIX Pakistan (Urdu)
x-IBM1025	Cp1025	cp1025 ibm1025 ibm-1025 1025	IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
x-IBM1046	Cp1046	cp1046 ibm1046 ibm-1046 1046	IBM Arabic - Windows
x-IBM1097	Cp1097	cp1097 ibm1097 ibm-1097 1097	IBM Iran (Farsi)/Persian
x-IBM1098	Cp1098	cp1098 ibm1098 ibm-1098 1098	IBM Iran (Farsi)/Persian (PC)
x-IBM1112	Cp1112	cp1112 ibm1112 ibm-1112 1112	IBM Latvia, Lithuania
x-IBM1122	Cp1122	cp1122 ibm1122 ibm-1122 1122	IBM Estonia
x-IBM1123	Cp1123	cp1123 ibm1123 ibm-1123 1123	IBM Ukraine
x-IBM1124	Cp1124	cp1124 ibm1124 ibm-1124 1124	IBM AIX Ukraine
x-IBM1129	Cp1129	cp1129 ibm1129 ibm-1129 1129	IBM AIX Vietnamese
x-IBM1166	Cp1166	cp1166 ibm1166 ibm-1166 1166	IBM Cyrillic Multilingual with euro for Kazakhstan
x-IBM1364	Cp1364	cp1364 ibm1364 ibm-1364 1364	IBM EBCDIC KS X 1005-1
x-IBM1381	Cp1381	cp1381 ibm1381 ibm-1381 1381	IBM OS/2, DOS People's Republic of China (PRC)
x-IBM1383	Cp1383	cp1383 ibm1383 ibm-1383 1383 ibmeuccn ibm-euccn cpeuccn	IBM AIX People's Republic of China (PRC)
x-IBM300	Cp300	cp300 ibm300 ibm-300 300	IBM Japanese Latin Host Double-Byte
x-IBM33722	Cp33722	cp33722 ibm33722 ibm-33722 ibm-5050 ibm-33722_vascii_vpua 33722	IBM-eucJP - Japanese (superset of 5050)
x-IBM833	Cp833	cp833 ibm833 ibm-833	IBM Korean Host Extended SBCS
x-IBM834	Cp834	cp834 ibm834 834 ibm-834	IBM EBCDIC DBCS-only Korean
x-IBM856	Cp856	cp856 ibm-856 ibm856 856	IBM Hebrew
x-IBM875	Cp875	cp875 ibm875 ibm-875 875	IBM Greek
x-IBM921	Cp921	cp921 ibm921 ibm-921 921	IBM Latvia, Lithuania (AIX, DOS)
x-IBM922	Cp922	cp922 ibm922 ibm-922 922	IBM Estonia (AIX, DOS)
x-IBM930	Cp930	cp930 ibm930 ibm-930 930	Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
x-IBM933	Cp933	cp933 ibm933 ibm-933 933	Korean Mixed with 1880 UDC, superset of 5029
x-IBM935	Cp935	cp935 ibm935 ibm-935 935	Simplified Chinese Host mixed with 1880 UDC, superset of 5031
x-IBM937	Cp937	cp937 ibm937 ibm-937 937	Traditional Chinese Host mixed with 6204 UDC, superset of 5033
x-IBM939	Cp939	cp939 ibm939 ibm-939 939	Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
x-IBM942	Cp942	cp942 ibm942 ibm-942 942	IBM OS/2 Japanese, superset of Cp932
x-IBM942C	Cp942C	cp942C ibm942C ibm-942C 942C cp932 ibm932 ibm-932 932 x-ibm932	Variant of Cp942
x-IBM943	Cp943	cp943 ibm943 ibm-943 943	IBM OS/2 Japanese, superset of Cp932 and Shift-JIS
x-IBM943C	Cp943C	cp943C ibm943C ibm-943C 943C	Variant of Cp943
x-IBM948	Cp948	cp948 ibm948 ibm-948 948	OS/2 Chinese (Taiwan) superset of 938
x-IBM949	Cp949	cp949 ibm949 ibm-949 949	PC Korean
x-IBM949C	Cp949C	cp949C ibm949C ibm-949C 949C	Variant of Cp949
x-IBM950	Cp950	cp950 ibm950 ibm-950 950	PC Chinese (Hong Kong, Taiwan)
x-IBM964	Cp964	cp964 ibm964 ibm-964 ibm-euctw 964	AIX Chinese (Taiwan)
x-IBM970	Cp970	cp970 ibm970 ibm-970 ibm-eucKR 970	AIX Korean
x-ISCII91	ISCII91	iscii ST_SEV_358-88 iso-ir-153 csISO153GOST1976874 ISCII91	ISCII91 encoding of Indic scripts
x-ISO-2022-CN-CNS	ISO2022CN_CNS	ISO2022CN_CNS ISO-2022-CN-CNS	CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only)
x-ISO-2022-CN-GB	ISO2022CN_GB	ISO2022CN_GB ISO-2022-CN-GB	GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only)
x-iso-8859-11	x-iso-8859-11	iso-8859-11 iso8859_11	Latin/Thai Alphabet
x-JIS0208	JIS0208	JIS0208 JIS_C6226-1983 iso-ir-87 x0208 JIS_X0208-1983 csISO87JISX0208	JIS X 0208
x-JISAutoDetect	JISAutoDetect	JISAutoDetect	Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only)
x-Johab	x-Johab	ksc5601-1992 ksc5601_1992 ms1361 johab	Korean, Johab character set
x-MacArabic	MacArabic	MacArabic	Macintosh Arabic
x-MacCentralEurope	MacCentralEurope	MacCentralEurope	Macintosh Latin-2
x-MacCroatian	MacCroatian	MacCroatian	Macintosh Croatian
x-MacCyrillic	MacCyrillic	MacCyrillic	Macintosh Cyrillic
x-MacDingbat	MacDingbat	MacDingbat	Macintosh Dingbat
x-MacGreek	MacGreek	MacGreek	Macintosh Greek
x-MacHebrew	MacHebrew	MacHebrew	Macintosh Hebrew
x-MacIceland	MacIceland	MacIceland	Macintosh Iceland
x-MacRoman	MacRoman	MacRoman	Macintosh Roman
x-MacRomania	MacRomania	MacRomania	Macintosh Romania
x-MacSymbol	MacSymbol	MacSymbol	Macintosh Symbol
x-MacThai	MacThai	MacThai	Macintosh Thai
x-MacTurkish	MacTurkish	MacTurkish	Macintosh Turkish
x-MacUkraine	MacUkraine	MacUkraine	Macintosh Ukraine
x-MS932_0213	x-MS932_0213	MS932-0213 MS932_0213 MS932:2004 windows-932-0213 windows-932:2004	Shift_JISX0213 Windows MS932 Variant
x-MS950-HKSCS	MS950_HKSCS	MS950_HKSCS	Windows Traditional Chinese with Hong Kong extensions
x-MS950-HKSCS-XP	MS950_HKSCS_XP	MS950_HKSCS_XP	HKSCS Windows XP Variant
x-mswin-936	MS936	ms936 ms_936	Windows Simplified Chinese
x-PCK	PCK	pck	Solaris version of Shift_JIS
x-SJIS_0213	x-SJIS_0213	sjis-0213 sjis_0213 sjis:2004 sjis_0213:2004 shift_jis_0213:2004 shift_jis:2004	Shift_JISX0213
x-windows-50220	MS50220	ms50220 cp50220	Windows Codepage 50220 (7-bit implementation)
x-windows-50221	MS50221	ms50221 cp50221	Windows Codepage 50221 (7-bit implementation)
x-windows-874	MS874	ms874 ms-874 windows-874	Windows Thai
x-windows-949	MS949	ms949 windows949 windows-949 ms_949	Windows Korean
x-windows-950	MS950	ms950 windows-950	Windows Traditional Chinese
x-windows-iso2022jp	windows-iso2022jp	windows-iso2022jp	Variant ISO-2022-JP (MS932 based)
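
The extended set is packaged in the jdk.charsets module. A custom runtime image, for example one built with jlink without that module, may therefore not provide these charsets. The following defensive sketch checks availability before use and assumes that a UTF-8 fallback is acceptable for your application. The helper name charsetOrFallback is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExtendedCharsetCheck {
    // Returns the named charset if this runtime supports it, otherwise the fallback
    static Charset charsetOrFallback(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    public static void main(String[] args) {
        // Reports whether the jdk.charsets module was resolved into the boot layer
        boolean hasModule = ModuleLayer.boot().findModule("jdk.charsets").isPresent();
        System.out.println("jdk.charsets present: " + hasModule);
        // IBM037 (EBCDIC) belongs to the extended set; windows-1252 to the basic set
        System.out.println(charsetOrFallback("IBM037", StandardCharsets.UTF_8));
        System.out.println(charsetOrFallback("windows-1252", StandardCharsets.UTF_8));
    }
}
```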

Printing Charset Information

The following application prints the canonical name and aliases of each charset available in the running Java runtime:

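import java.nio.charset.*;

class DisplayCharsetAliases {
    public static void main(String[] args) {
        System.out.println("Charset -> Aliases");
        System.out.println("==================");
        // availableCharsets() maps canonical (java.nio) names to charsets, sorted by name
        for (Charset cs : Charset.availableCharsets().values()) {
            // name() is the canonical name; aliases() is the set of its aliases
            System.out.println(cs.name() + " -> " + cs.aliases());
        }
    }
}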

Default Charset

Starting with JDK 18, the default charset is UTF-8. In JDK 17 and earlier releases, the default charset depends on the host operating system and the user's locale.

Standard Java APIs use the default charset unless you specify one. These APIs include:

In the package java.io, the classes InputStreamReader, FileReader, OutputStreamWriter, FileWriter, and PrintStream, which define constructors to create readers, writers, and print streams that encode or decode using the default charset
In the package java.util, the classes Formatter and Scanner, which define constructors whose results use the default charset
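
The following minimal sketch avoids the default charset with these APIs. Each API has a constructor that takes an explicit Charset: FileReader and FileWriter since JDK 11, and Scanner since JDK 10. The temporary file and the helper names are illustrative only:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

public class ExplicitCharsetDemo {
    // Writes text to a temporary file and reads it back with the same explicit
    // charset, so the result does not depend on the default charset
    static String fileRoundTrip(String text) {
        try {
            File file = Files.createTempFile("charset-demo", ".txt").toFile();
            try {
                try (FileWriter writer = new FileWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                try (BufferedReader reader = new BufferedReader(
                        new FileReader(file, StandardCharsets.UTF_8))) {
                    return reader.readLine();
                }
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scanner decodes the input stream with the given charset
    static String firstToken(byte[] utf8Bytes) {
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            return scanner.next();
        }
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // "Grüße"
        System.out.println(fileRoundTrip(text).equals(text)); // true
        System.out.println(firstToken((text + " world").getBytes(StandardCharsets.UTF_8)).equals(text)); // true
    }
}
```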

Note:

The standard output stream System.out and the standard error output stream System.err don't use the default charset; they use the charset specified by Console.charset().

Specify the encoding for System.out and System.err with the system properties stdout.encoding and stderr.encoding, respectively. The default values of these system properties depend on the platform. The default values take on the value of the native.encoding property when the platform does not provide streams for the console.
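
The following sketch assumes JDK 19 or later, where stdout.encoding and stderr.encoding are defined. It prints these properties together with the charset that System.out actually uses. PrintStream.charset() was added in JDK 18 and Console.charset() in JDK 17. The class name is illustrative:

```java
import java.io.Console;
import java.nio.charset.Charset;

public class ConsoleEncodingDemo {
    // The charset System.out uses to encode characters
    static Charset stdoutCharset() {
        return System.out.charset();
    }

    public static void main(String[] args) {
        System.out.println("stdout.encoding    = " + System.getProperty("stdout.encoding"));
        System.out.println("stderr.encoding    = " + System.getProperty("stderr.encoding"));
        System.out.println("System.out charset = " + stdoutCharset());
        // System.console() can return null (for example, when no console is
        // attached), so check it before calling charset()
        Console console = System.console();
        System.out.println("Console charset    = " + (console != null ? console.charset() : "no console"));
    }
}
```

For example, `java -Dstdout.encoding=UTF-8 ConsoleEncodingDemo` should report UTF-8 for System.out.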

Default Charset for JDK 17 and Earlier Releases

In JDK 17 and earlier releases, the default charset is determined when the Java runtime starts. On macOS, the default charset is UTF-8 except in the POSIX C locale. On other operating systems, it depends on the user's locale and the default encoding. For example, on Windows, it's a codepage-based charset such as windows-1252 or windows-31j. The method java.nio.charset.Charset.defaultCharset() returns the default charset.

You can run the following command to determine the default charset of your JDK:

java -XshowSettings:properties -version 2>&1 | grep file.encoding
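
If grep isn't available, for example on Windows, a small program can report the same information. The following sketch prints file.encoding, native.encoding (null before JDK 17), and the canonical name returned by Charset.defaultCharset(). The class name is illustrative:

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Canonical (java.nio) name of the charset that APIs use when none is specified
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // native.encoding is not defined (null) on releases before JDK 17
        System.out.println("native.encoding = " + System.getProperty("native.encoding"));
        System.out.println("defaultCharset  = " + defaultCharsetName());
    }
}
```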

Changing the JDK's Default Charset

You can set the file.encoding system property on the command line to one of the following values to specify whether the JDK's default charset is UTF-8 or is determined as in JDK 17 and earlier releases:

UTF-8: The default charset is UTF-8.
COMPAT: The default charset is determined as in JDK 17 and earlier releases.

Other values for file.encoding are not supported.

Note:

Before deploying your application on a JDK whose default charset is UTF-8, check it for charset issues by running it on a JDK whose default charset is not UTF-8 (such as JDK 17 or an earlier release in a non-UTF-8 locale) with the default charset forced to UTF-8:

java -Dfile.encoding=UTF-8 <your application>

Running Java Applications on a JDK Whose Default Charset Is Determined by the Environment

JDK 17 introduced the system property native.encoding. Use this property to obtain the underlying host environment's character encoding name, especially if you specified that your JDK determines the default charset as in JDK 17 and earlier releases.

Note:

Setting the value of the system property native.encoding through the command line or with the method System.setProperty() has no effect.

The following example obtains the default charset from the system property native.encoding. Note that you can run this example on any JDK release; if the system property native.encoding hasn't been defined, then the example obtains the default charset from the method Charset.defaultCharset():

String encoding = System.getProperty("native.encoding");
Charset cs = (encoding != null) ? Charset.forName(encoding) : Charset.defaultCharset();

If your application expects the default charset to be determined as in JDK 17 and earlier releases, then use this obtained charset as a constructor argument for objects that rely on a charset, for example:

var reader = new FileReader("file.txt", cs);
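
The following self-contained sketch combines the two snippets. It assumes JDK 11 or later for the FileReader(String, Charset) constructor. As a defensive choice of this sketch, it falls back to the default charset if native.encoding names a charset that the runtime doesn't support. The path file.txt is a placeholder:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class NativeCharsetReader {
    // Host environment's charset from native.encoding (JDK 17 and later);
    // otherwise, or if the name is not usable, Charset.defaultCharset()
    static Charset nativeCharset() {
        String encoding = System.getProperty("native.encoding");
        if (encoding != null) {
            try {
                return Charset.forName(encoding);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                // Fall through to the default charset
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) throws IOException {
        Charset cs = nativeCharset();
        System.out.println("Reading with " + cs);
        String path = args.length > 0 ? args[0] : "file.txt";
        // Pass the charset explicitly instead of relying on the default charset
        try (BufferedReader reader = new BufferedReader(new FileReader(path, cs))) {
            reader.lines().forEach(System.out::println);
        }
    }
}
```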

Note:

The method call Charset.forName("default") throws an UnsupportedCharsetException. Use Charset.forName("US-ASCII") or Charset.defaultCharset() instead. (In JDK 17 and earlier releases, Charset.forName("default") produces the same result as Charset.forName("US-ASCII").)
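
For code migrated from JDK 17 or earlier, one option is to replace the lookup with an explicit constant. The following sketch uses a hypothetical helper that maps the legacy name "default" to US-ASCII explicitly and looks up all other names normally. It also checks whether the running JDK still accepts the alias:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

public class DefaultAliasMigration {
    // Preserves the pre-JDK 18 meaning of "default" (US-ASCII) on any release
    static Charset lookup(String name) {
        if ("default".equalsIgnoreCase(name)) {
            return StandardCharsets.US_ASCII;
        }
        return Charset.forName(name);
    }

    public static void main(String[] args) {
        System.out.println(lookup("default")); // US-ASCII
        System.out.println(lookup("UTF8"));    // UTF-8
        try {
            Charset.forName("default");
            System.out.println("\"default\" is still an alias on this JDK");
        } catch (UnsupportedCharsetException e) {
            System.out.println("\"default\" is not supported on this JDK");
        }
    }
}
```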

The value of native.encoding affects the value of file.encoding:

If file.encoding is set to COMPAT on the command line, then the run-time value of file.encoding will be the same as the run-time value of native.encoding.
If file.encoding is set to UTF-8 on the command line, then the run-time value of file.encoding may differ from the run-time value of native.encoding.
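
To see which case applies at run time, the following sketch compares the two properties. The report format is illustrative. On a host whose native encoding is UTF-8, the values can also match when file.encoding is UTF-8:

```java
public class EncodingPropertiesCheck {
    // Reports file.encoding and native.encoding and whether they are equal
    static String report() {
        String fileEncoding = System.getProperty("file.encoding");
        String nativeEncoding = System.getProperty("native.encoding");
        boolean same = fileEncoding != null && fileEncoding.equals(nativeEncoding);
        return "file.encoding=" + fileEncoding
                + ", native.encoding=" + nativeEncoding
                + (same ? " (same)" : " (different)");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

For example, compare the output of `java -Dfile.encoding=COMPAT EncodingPropertiesCheck` with that of `java -Dfile.encoding=UTF-8 EncodingPropertiesCheck`.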

Ensuring Source File Encoding Is Compatible with Your JDK

The javac compiler assumes that .java source files are encoded with the default charset unless configured otherwise with the -encoding option.

Consequently, before compiling an application on a JDK whose default charset is UTF-8, check for charset issues by compiling your application with the following command:

javac -encoding UTF-8 <source files of your application>
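
Before you compile with UTF-8, it can also help to find source files that aren't valid UTF-8, such as files saved as windows-1252 that contain accented characters. The following sketch checks the bytes of each file with a strict decoder (CharsetDecoder with CodingErrorAction.REPORT). The check is independent of javac, and the class name is illustrative:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8SourceCheck {
    // Returns true if the bytes are well-formed UTF-8
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8SourceCheck Foo.java Bar.java ...
        for (String arg : args) {
            Path path = Path.of(arg);
            boolean ok = isValidUtf8(Files.readAllBytes(path));
            System.out.println(path + ": " + (ok ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```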

Alternatively, if you prefer to save your source files in an encoding other than UTF-8, pass the value of the native.encoding system property to the -encoding option.
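
The following sketch applies the same approach programmatically with the javax.tools API. The helper javacArgs is hypothetical. It adds -encoding with the native.encoding value when that property is defined and passes the source files through unchanged. ToolProvider.getSystemJavaCompiler() returns null on a runtime that doesn't include the compiler:

```java
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileWithNativeEncoding {
    // Builds javac arguments: -encoding <native.encoding> followed by the source files
    static List<String> javacArgs(List<String> sourceFiles) {
        List<String> result = new ArrayList<>();
        String encoding = System.getProperty("native.encoding");
        if (encoding != null) {
            result.add("-encoding");
            result.add(encoding);
        }
        result.addAll(sourceFiles);
        return result;
    }

    public static void main(String[] args) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            System.err.println("No system Java compiler is available in this runtime");
            System.exit(1);
        }
        List<String> arguments = javacArgs(List.of(args));
        System.out.println("javac " + String.join(" ", arguments));
        // run(in, out, err, args): null streams mean System.in/System.out/System.err
        int status = javac.run(null, null, null, arguments.toArray(new String[0]));
        System.exit(status);
    }
}
```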