Use with Multibyte Environments

Internationalization in WebLogic Server

Overview of Internationalization

Main features of Internationalization (I18n) in WebLogic Server:

All characters inside WebLogic Server are expressed in Unicode. Therefore any input or output of character data involves an encoding conversion.
The encoding conversion must be specified appropriately and separately for WebLogic Server itself, for each J2EE component, and for each resource in the WebLogic Server J2EE container.
If you do not specify the encoding conversion, the default encoding conversion will be applied.
There are various types of default encoding conversions, and some of them do not conform to the locale of the operating system.

To configure a distributed system that handles multibyte character data with WebLogic Server, you need to understand how encodings are specified, particularly in Java and J2EE. You also need to understand how encoding is handled in the operating system, on the Internet, and in the backend systems linked to WebLogic Server, so that you can control the encoding conversions correctly.

The following is a concise description of the encoding handling in WebLogic Server.

Use of Unicode

WebLogic Server is a 100% pure Java application server. All character data inside the server is represented in Unicode.

This allows WebLogic Server to handle characters of all languages at the same time, provided that their characters can be handled by Unicode.

Encoding Conversion

The encoding conversion is needed when WebLogic Server exchanges character data with the outside.

Few common operating systems use Unicode, the internal code of Java, as their working encoding. Most use a native encoding that is defined individually for each platform. Examples of native encodings are: on Windows, the code page that corresponds to each language; on UNIX, the encoding that corresponds to the locale specified by the LANG environment variable; and for databases, the character set specified when the database is created or the character set of the client.

For this reason, each time character input or output occurs in WebLogic Server, a conversion between the native encoding and Unicode must be performed in one direction or the other (character set conversion). This conversion takes place whenever character data is exchanged with the operating system or with external resources.
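The two directions of this conversion can be sketched with the standard java.nio.charset API. This is a minimal illustration, not WebLogic Server code: the class and method names are hypothetical, and UTF-8 and UTF-16BE stand in for a platform's native encoding because every Java runtime is required to support them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConversionSketch {
    // Output direction: internal Unicode String -> bytes in an external encoding.
    static byte[] toExternal(String internal, Charset external) {
        return internal.getBytes(external);
    }

    // Input direction: bytes in an external encoding -> internal Unicode String.
    static String toInternal(byte[] data, Charset external) {
        return new String(data, external);
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three Japanese characters
        byte[] utf8 = toExternal(text, StandardCharsets.UTF_8);
        byte[] utf16 = toExternal(text, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length + " bytes as UTF-8, "
                + utf16.length + " bytes as UTF-16BE");
        // Decoding with the charset that was used for encoding restores the text.
        System.out.println(toInternal(utf8, StandardCharsets.UTF_8).equals(text));
        // Decoding the same bytes with a different charset does not.
        System.out.println(toInternal(utf8, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```

The same string occupies a different number of bytes in each external encoding, which is why both sides of an exchange must agree on the encoding in use.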

Note: Characters in a stream produced by Java serialization do not require encoding conversion, because the serialized form stores them as UTF-8 and therefore preserves their Unicode values. Hence, with EJB or RMI, you generally do not need to consider encoding.
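As a small illustration of this note, the following sketch serializes a string with the standard java.io object streams and reads it back without naming any encoding. The class and method names are hypothetical; Java serialization itself writes strings in a modified form of UTF-8, so no charset parameter appears anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationSketch {
    // Serialize a value to bytes and deserialize it again.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // multibyte sample text
        // No encoding is specified, yet the characters survive unchanged.
        System.out.println(text.equals(roundTrip(text)));
    }
}
```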

Note also that encoding conversion consumes relatively large CPU resources, because each character must be converted individually. For better performance, design the application so that the number of encoding conversions is kept as small as possible.

Separation between the Encoding Conversion for the Server Itself and the Encoding Conversion for Application Components and Resources on WebLogic Server

In WebLogic Server, the encoding conversion for the server itself and the encoding conversion for application components and resources on WebLogic Server are separate.

In WebLogic Server, the encoding of the server log or the Administration Console is determined by the default encoding of a server's Java VM or a browser's language setting independently of the encoding of the application component or the language of the content that the WebLogic Server is serving.

Moreover, by deploying an application component to WebLogic Server, you can configure it to behave identically, regardless of any locale (language) environment the WebLogic Server is running in.

Also, you can set the encoding conversion individually for each resource configured in the WebLogic Server container (for example, a JDBC connection pool).

The encoding conversions of WebLogic Server itself include:

System log output of WebLogic Server
Page encoding of the Administration Console
File input/output operations with local file systems

The encoding conversions of individual applications include:

JSP files
Servlets
XML files
Web services

Resources on WebLogic Server include:

JDBC connection
WTC connection etc.

When you specify an encoding on WebLogic Server, be clear about which of the three categories above it applies to. Also verify that character data is converted into the correct character objects on input to WebLogic Server, and that character objects inside WebLogic Server are encoded as intended on output.

As described above, when multibyte characters are handled on WebLogic Server, you must understand the entire encoding conversion process and make any necessary settings. In some cases, an application cannot handle multibyte characters correctly unless the encoding conversion is set.

In any case, when encoding is not specified, some default encoding will be applied. The default encoding applied may vary with each specification and/or environment.

Example of the Default Encoding

The default encodings relating to the behavior of the WebLogic Server include:

Default encoding of a server VM
Default encoding of J2EE
Default encoding of XML
Default encoding of the HTTP protocol
Default encoding of a Web browser
Default encoding of Web service (such as SOAP/WSDL/UDDI), etc.

Example:

Default encoding of Java VM in Japanese Windows is MS932
Default encoding of J2EE is ISO-8859-1
Default encoding of XML is UTF-8
Default encoding of HTTP is ISO-8859-1

As shown above, the default encoding varies with the technical specification involved, so specifying no encoding at all leads to incorrect handling of multibyte characters in WebLogic Server. It is strongly recommended that you fully understand each method of specifying encodings described in the following sections so that you can control encoding conversions.
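To make the risk concrete, the sketch below prints the default encoding of the running Java VM and shows what happens when multibyte text passes through ISO-8859-1, the default listed above for J2EE and HTTP. The class and method names are hypothetical, and only standard JDK calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultEncodingSketch {
    // Name of the default encoding that this Java VM picked up at startup.
    static String jvmDefault() {
        return Charset.defaultCharset().name();
    }

    // Encode text as ISO-8859-1 and decode it again, as an unconfigured
    // component that falls back to that default would do.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("JVM default encoding: " + jvmDefault());
        // ASCII text survives the round trip.
        System.out.println(throughLatin1("abc"));
        // Characters that ISO-8859-1 cannot represent are replaced with '?'.
        System.out.println(throughLatin1("\u65e5\u672c\u8a9e"));
    }
}
```

The replacement happens silently, without an exception, which is why a missing encoding setting often goes unnoticed until garbled text appears.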

What Java terminology calls an "encoding" corresponds to what is called a "character set" elsewhere. Several terms describe a character set, and their definitions differ slightly.

An encoding, or character set, is a definition that assigns computer-readable codes to the characters of a specific language so that a computer can process them. This definition is called an "encoding" in Java terminology and a "character set" in Internet terminology.

Java absorbs these differences at the input/output stage, so that only Unicode is used internally. Java can therefore handle any character set for which an encoding definition is available; in other words, it can potentially absorb all the encoding differences that exist among various systems. At present, however, no encoding conversion table handles every minute difference, and the existing conversion tables have some limitations that stem from keeping consistency with Unicode.

What is particularly important for Java Web application servers is the difference between Java encoding names and the MIME character set names defined by IANA, which are used on the Internet and in XML. To absorb this difference, WebLogic Server has a mapping table between Java encoding names and IANA character set names (see Predefined MIME-Java Encoding Mapping Table in WebLogic Server). With this table, for example, a file defined as Shift_JIS in a JSP can be treated as SJIS in Java. For Web components, you can also change this mapping table and treat, for example, the IANA character set name ‘Shift_JIS’ as the Java encoding ‘cp943’ (see Mapping Change for Java Encoding and IANA Character Set Involving HTTP Responses (Not J2EE-Compliant)).

Xerces, the XML parser embedded in WebLogic Server, has its own mapping table between IANA character set names and Java encoding names. Users cannot customize this table. For example, the IANA character set name ‘Shift_JIS’ is mapped to the Java encoding name ‘SJIS’.

In WebLogic Server, encodings are basically specified with Java encoding names, whereas J2EE, the Internet, and XML use IANA character set names. Change the mapping between them as necessary.
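The existence of several names for one encoding can be observed in the JDK itself. The sketch below is only an illustration with hypothetical class and method names; it uses the standard java.nio.charset.Charset alias lookup and does not read or reproduce the WebLogic Server mapping table.

```java
import java.nio.charset.Charset;

final class CharsetNameSketch {
    // Resolve any alias the JDK knows to the canonical java.nio charset name.
    static String canonicalName(String anyName) {
        return Charset.forName(anyName).name();
    }

    public static void main(String[] args) {
        // Different spellings resolve to the same charset.
        System.out.println(canonicalName("latin1"));
        System.out.println(canonicalName("UTF8"));
        // The full alias list shows how many names refer to one charset.
        System.out.println(Charset.forName("UTF-8").aliases());
    }
}
```

A name that is valid in one naming scheme is not guaranteed to be accepted in another, which is the gap a mapping table closes.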

How to specify Default Encoding for WebLogic Server's Container

WebLogic Server lets you specify encodings at various scopes. For example, for JSPs, the page directive defined by the JSP 2.0 specification specifies the encoding of an individual page. An encoding specified at such a scope is independent of the default encoding of the Java VM on which WebLogic Server runs, that is, the encoding that the Java VM implementation determines from the locale of the operating platform. Even if the locale of the Java VM is English, JSP files containing multibyte characters can be served without problems. However, for the following items, character strings are handled with the default encoding of the Java VM.

Log output of WebLogic Server
File input/output operations with local file systems

These operate with the default encoding of the Java VM. To switch the language and encoding of WebLogic Server log messages by changing the platform locale, make the settings described below. The default encoding of the Java VM cannot be switched dynamically once the VM has started, so make these settings before you restart WebLogic Server.

For Windows

From Control Panel - Region (or Regional Options), select a language such as English (United States), Japanese, Korean, Chinese (PRC), or Chinese (Taiwan). With these selections, the server operates with CP1252, MS932, MS949, GBK, or MS950, respectively, as the default encoding.

For UNIX

Specify the locale supported by your platform in the LANG environment variable.

Some examples of server encodings and the corresponding LANG environment variable values are shown below. For other combinations, consult your platform manuals.

Platform	Encoding	LANG environment variable
Solaris	EUC-JP, SJIS	ja, ja_JP.eucJP, or ja_JP.PCK
Solaris	EUC-KR	ko or ko_KR
Solaris	GB2312, GBK	zh_CN or zh_CN.GBK
Solaris	GB18030	zh_CN.GB18030
Solaris	Big5	zh_TW.BIG5
HP	EUCJIS, SJIS	ja_JP.eucJP, ja_JP.SJIS
HP	EUC-KR	ko.eucKR or ko_KR
HP	GB2312	zh_CN.hp15CN
HP	GB18030	zh_CN.gb18030
HP	Big5	zh_TW.big5

For example, if you specify EUC-JP on Solaris, the LANG setting looks like this:

LANG=ja

Notes on Configuring Administration and Managed Servers

Use the same encoding for all the WebLogic Servers throughout a domain.

In WebLogic Server, it is necessary to have the same encoding settings for all the servers in the domain.

For example, when a Windows platform exists within a domain, standardize on MS932 encoding. The log of a server that uses a different encoding will not be displayed correctly.

Notes on Configuring Clusters

Use the same encoding for all the WebLogic Servers in a cluster.

In WebLogic Server, it is necessary to have the same encoding settings for all the servers in the cluster.

For example, when a Windows platform exists within a cluster, standardize on MS932 encoding. The log of a server that uses a different encoding will not be displayed correctly.

Encoding for config.xml

The config.xml file is read and written in UTF-8. When editing the file directly with a text editor, open and save it as UTF-8.

JDBC connection

When creating a JDBC connection pool, you must specify an appropriate encoding for a connection to a database that uses multibyte characters. Depending on the requirements of the system being built, the encoding conversion mappings of the Web layer and the database layer may also have to match.

Deployment

In WebLogic Server, multibyte characters in the deployment descriptor (DD) files of J2EE components are handled according to the XML declaration. If a DD file has no encoding attribute in its XML declaration, or has no XML declaration, the file is handled as UTF-8.

Notes on Using Administration Console

Displayed Language on Administration Console

The language displayed when the Administration Console starts is the language specified in the language preferences of your Web browser. For example, if you have not changed the setting in Internet Explorer on Japanese Windows, the Administration Console is displayed in Japanese. To display it in English, set the browser's language setting to "English" and delete all other languages from the list. Note that the output encoding of the Administration Console is always UTF-8, regardless of the language.

Encoding for sending an e-mail

WebLogic Server uses JavaMail to send e-mail. You can therefore change the encoding of outgoing e-mail by adding mail.mime.charset, a JavaMail system property, to the WebLogic Server startup options. (When this property is omitted, the default encoding of the Java VM is used.)

Example:

Change the encoding of e-mail sent from WebLogic Server to ISO-2022-JP.

-Dmail.mime.charset=ISO-2022-JP

A typical case of sending e-mail from WebLogic Server is SMTP notification from the diagnostic service used for system management.

Programming

As described in Overview of Internationalization, all characters inside WebLogic Server are handled as Unicode, but any input or output of character data with external resources involves encoding conversion. This section gives notes on processing multibyte characters from the viewpoint of application programming.

Security

UTF-8 Encoding Support with Public Key Certificates (CR090467)

Conforming to RFC3280, WebLogic Server supports UTF-8 encoding with public key certificates. For details of RFC3280, see Internet X.509 Public Key Infrastructure: Certificate and CRL Profile.

Browser Locale when Setting Security Policy (CR285384)

Security Policy setup will fail if one of the following locales is specified.

ko [Korean]
zh [Chinese]
zh-cn [Chinese/China]
zh-tw [Chinese/Taiwan]

Workaround:

Please change your browser locale to en-us. For how to change the locale setting of your browser, please refer to the browser's help.

Fixed in:

This problem was fixed in WLS9.2MP1.

Web Components

From the viewpoint of WebLogic Server, the external resources that require encoding conversion here are those that use the HTTP protocol. HTTP is designed to carry messages in various encodings. It is therefore important how Web components handle the conversion between the Unicode character strings used inside the server and the messages in a specific encoding carried over HTTP. To address this, several APIs and parameters for encoding conversion are provided, some defined by the J2EE specification and some proprietary to WebLogic Server. Read the following explanation and choose the combination of settings that best meets the requirements of the system to be built.

Targets of encoding settings for Web components

The targets of encoding settings for J2EE Web components are as follows:

Encoding for responses
Encoding for requests
Encoding for files

The J2EE specification defines the default encoding used when these settings are omitted. The default encoding for each component is shown below.

Component Name	Default Encoding
Servlet	ISO-8859-1
JSP	ISO-8859-1
XML format JSP Document	UTF-8
Tag File	ISO-8859-1
XML format Tag File	UTF-8

Because ISO-8859-1 is the default encoding for everything except XML components, an encoding setting is essential when multibyte characters are used. The settings for each Web component are detailed below. The meaning of each column in the tables is as follows:

Setting Location: Location where encoding is set.
Effective Area: Area where the setting is effective.
Setting Value: Character string used for setting.
J2EE Compliant: YES for J2EE compliant, NO for WebLogic Server proprietary specification.
Priority: The priority with which the setting is applied (the smaller the value, the higher the priority). A higher-priority setting overrides a lower-priority one. In general, a setting with a smaller effective area has a higher priority.
Setting Example: Examples of encoding setting.

Specifying the Encoding for a Response

For Servlets

There are three ways to specify the encoding of a servlet response.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
ServletResponse#setContentType method	Each HTTP response	MIME type with Charset attribute (IANA name)	YES	1	`setContentType("text/html;charset=Shift_JIS");`
ServletResponse#setCharacterEncoding method	Each HTTP response	IANA name	YES	1	`setCharacterEncoding("EUC-JP");`
ServletResponse#setLocale method	Each HTTP response	Locale name (Note 1)	YES	2	`setLocale(new Locale("ja"));`

Note 1: Encoding is determined by IANA name that is identified by locale name. See Locale-to-IANA Mapping for locale vs. IANA name.

Note that you need to call these methods before obtaining the Writer, as shown below.

res.setContentType("text/html;charset=Shift_JIS");
PrintWriter out = res.getWriter();
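One way to see why the order matters is that a java.io Writer is bound to its charset when it is constructed. The sketch below is an analogy built from plain java.io classes with hypothetical names; it is not the servlet container's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WriterBindingSketch {
    // Write text through a Writer created for the given charset.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // The charset is fixed here; this Writer cannot switch to another
        // charset afterwards.
        try (Writer out = new OutputStreamWriter(body, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65e5"; // one Japanese character
        // A Writer created for UTF-8 emits the multibyte sequence.
        System.out.println(write(text, StandardCharsets.UTF_8).length);
        // A Writer created for ISO-8859-1 can only emit a replacement byte.
        System.out.println(write(text, StandardCharsets.ISO_8859_1)[0] == '?');
    }
}
```

By the same reasoning, an encoding chosen after the response Writer already exists comes too late to affect what that Writer emits.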

For JSPs

There are five ways to specify the encoding of a JSP response.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
contentType attribute of Page directive	Each file	MIME type with Charset attribute (IANA name)	YES	1	`<%@ page contentType="text/html; charset=EUC-JP" %>`
page-encoding element in web.xml	Within specified URL pattern	IANA name and URL pattern	YES	2	`<jsp-config> <jsp-property-group> <url-pattern>/euc/</url-pattern> <page-encoding>EUC-JP</page-encoding> </jsp-property-group> <jsp-property-group> <url-pattern>/utf8/</url-pattern> <page-encoding>UTF-8</page-encoding> </jsp-property-group> </jsp-config>`
pageEncoding attribute of Page directive	Each file	IANA name	YES	2 (Note 1)	`<%@ page pageEncoding="Windows-31J" %>`
encoding element in weblogic.xml (not recommended)	Entire Web application	Java encoding name	NO	3	`<jsp-descriptor> <encoding>Windows-31J</encoding> </jsp-descriptor>`
webapp.encoding.default parameter of application-param element in weblogic-application.xml (Note 2)	Entire Enterprise application	IANA name	NO	4	`<application-param> <param-name>webapp.encoding.default</param-name> <param-value>EUC-JP</param-value> </application-param>`

Note 1: Under the JSP 2.0 specification, if the page-encoding element in web.xml and the pageEncoding attribute of the page directive do not match, an error occurs when the JSP is compiled. As a result, both have the same priority.

Note 2: The value set here is reflected in the parameter of the ServletResponse#setContentType method inside the servlet code generated from the JSP. Therefore, when webapp.encoding.default is changed, the JSP files of the entire enterprise application must be recompiled for the change to take effect.

For JSP Documents

The encoding of a JSP Document response is specified as follows.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
contentType attribute of Page directive	Each file	MIME type with Charset attribute (IANA name)	YES	1	`<jsp:directive.page contentType="text/html; CHARSET=euc-jp"/>`

Specifying the Encoding for a Request

Among the methods of specifying the encoding of an HTTP request, the one most compliant with the HTTP specification is to specify a character set in the charset attribute of the ContentType header of the request. WebLogic Server, on the receiving side, can then recognize the request encoding at the protocol level. However, major Web browsers such as Microsoft IE and Netscape cannot specify this value, so the HTTP request encoding also needs to be specified on the WebLogic Server side.

The encoding settings for a request are common to JSPs and servlets. There are three ways to specify them.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
ServletRequest#setCharacterEncoding method	Each HTTP request	IANA name	YES	1	`setCharacterEncoding("EUC-JP");`
input-charset element in weblogic.xml	Within specified URL pattern	Java encoding name and URL pattern	NO	2	`<charset-params> <input-charset> <resource-path>/</resource-path> <java-charset-name>EUC_JP</java-charset-name> </input-charset> <input-charset> <resource-path>/rus/joe/</resource-path> <java-charset-name>Shift_JIS</java-charset-name> </input-charset> </charset-params>`
webapp.encoding.default parameter of application-param element in weblogic-application.xml	Entire Enterprise application	IANA name	NO	3	`<application-param> <param-name>webapp.encoding.default</param-name> <param-value>EUC-JP</param-value> </application-param>`
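The effect of the request encoding can be sketched outside the container with java.net.URLDecoder. This is an illustration under assumptions: the class and method names are hypothetical, and the sample value stands for a form parameter that a browser submitted from a UTF-8 page.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class RequestDecodingSketch {
    // Decode a percent-encoded parameter value with the named charset.
    static String decodeParam(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E6%97%A5%E6%9C%AC"; // two Japanese characters sent as UTF-8
        // Decoding with the encoding the browser used yields two characters.
        System.out.println(decodeParam(raw, "UTF-8").length());
        // Decoding with the ISO-8859-1 default yields six unrelated characters.
        System.out.println(decodeParam(raw, "ISO-8859-1").length());
    }
}
```

The bytes on the wire are identical in both calls; only the encoding assumed by the receiver differs, which is what the settings in the table control.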

Specifying the Encoding for a File

Web components other than servlets are files that the Web container must read with an appropriate encoding at run time. For example, the JSP compiler reads a JSP file with some encoding when it translates the file into servlet Java code. The file encoding of such components must therefore be set correctly.

For JSPs

There are four ways to specify the encoding of JSP files.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
page-encoding element in web.xml	Within specified URL pattern	IANA name and URL pattern	YES	1	`<jsp-config> <jsp-property-group> <url-pattern>/euc/</url-pattern> <page-encoding>EUC-JP</page-encoding> </jsp-property-group> <jsp-property-group> <url-pattern>/utf8/</url-pattern> <page-encoding>UTF-8</page-encoding> </jsp-property-group> </jsp-config>`
pageEncoding attribute of Page directive	Each file	IANA name	YES	1 (Note 1)	`<%@ page pageEncoding="Windows-31J" %>`
contentType attribute of Page directive	Each file	MIME type with Charset attribute (IANA name)	YES	2	`<%@ page contentType="text/html; charset=EUC-JP" %>`
encoding element in weblogic.xml (not recommended)	Entire Web application	Java encoding name	NO	3	`<jsp-descriptor> <encoding>Windows-31J</encoding> </jsp-descriptor>`

Note 1: Under the JSP 2.0 specification, if the page-encoding element in web.xml and the pageEncoding attribute of the page directive do not match, an error occurs at translation time. As a result, both have the same priority.

For JSP Documents

Because a JSP Document is written in XML, the encoding of a JSP Document file is specified in accordance with the XML specification.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
encoding attribute in the XML declaration	Each file	IANA name	YES	1	`<?xml version='1.0' encoding='utf-8' ?>`

Under the JSP 2.0 specification, if a file encoding is set for a JSP Document through a page-encoding element in web.xml or a pageEncoding attribute of the page directive, and that encoding does not match the encoding attribute of the XML declaration of the JSP Document, an error occurs at translation time.

For Tag Files

The encoding of Tag Files is specified as follows.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
pageEncoding attribute of tag directive	Each file	IANA name	YES	1	`<%@ tag pageEncoding="Windows-31J" %>`

For XML format Tag Files

The encoding of XML format Tag Files is specified as follows.

Setting Location	Effective Area	Setting Value	J2EE Compliant	Priority	Setting Example
encoding attribute in the XML declaration	Each file	IANA name	YES	1	`<?xml version='1.0' encoding='utf-8' ?>`

Under the JSP 2.0 specification, an error occurs at compile time if a file encoding is set through the pageEncoding attribute of the tag directive in an XML format Tag File.

Parse Method of JSP

Under the JSP 2.0 specification, if the same attribute of the page directive appears more than once with different values, an error occurs at translation time. This happens, for example, when a single file contains two or more contentType attributes that specify different encodings.

Differences between Static and Dynamic for include tag, and Specifying Encoding in page tag

Static Include

The static include for JSP is described as follows:

<%@ include file="relativeURL" %>

In this case, all included files are first read and merged into one file, and the JSP is then compiled. Therefore, if both the including JSP and the included JSP set an encoding in their page directives and the settings differ, an error occurs at translation time, as described in Parse Method of JSP.

Dynamic Include

The dynamic include for JSP is described as follows:

<jsp:include page="{ relativeURL | <%= expression %>}" flush="true" />

For jsp:include, the include operation will not happen when this page is loaded, and the tag will remain. The page will be included when the JSP is executed. Therefore, the encoding set in the JSP that does the including will not apply to the included file(s). Hence, you must also specify the encoding in the included file.

Mapping Change for Java Encoding and IANA Character Set Involving HTTP Responses (Not J2EE-Compliant)

When you specify the encoding with the setContentType() method or the contentType attribute of the page directive, use an IANA character set name. However, WebLogic Server is a Java application, so internally these values must be handled as Java encoding names. WebLogic Server has default mappings for this purpose and normally uses them. The default mappings also include names that are not defined by IANA but are conventionally used in the Content-Type for HTML (see Predefined MIME-Java Encoding Mapping Table in WebLogic Server).

Example: x-sjis ----> Shift_JIS

You can change this mapping in weblogic.xml, as described later in this section.

For example, by default a 'Shift_JIS' setting in the contentType is handled as SJIS in WebLogic Server, because the IANA character set 'Shift_JIS' is mapped to the Java encoding 'Shift_JIS' (Shift_JIS is an alias for SJIS in JDK 1.4).

Note: From JDK 1.1.8 through JDK 1.4.0 (including Java 1.3), the IANA character set Shift_JIS was handled as MS932. It was changed back to SJIS in JDK 1.4.1 and later.

Consequently, MS932-specific characters cannot be used when Shift_JIS is used with the default setting.
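The difference can be probed with the standard CharsetEncoder API. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, the circled digit one (U+2460) is chosen as a sample of a character commonly cited as MS932-specific, and the check only runs when the Java runtime ships both charsets (they belong to the extended charset set).

```java
import java.nio.charset.Charset;

final class Ms932Sketch {
    // True when this runtime provides both charsets being compared.
    static boolean available() {
        return Charset.isSupported("Shift_JIS") && Charset.isSupported("windows-31j");
    }

    // Ask the charset's encoder whether it can represent the character.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char circledOne = '\u2460'; // sample vendor-extension character
        if (!available()) {
            System.out.println("Extended Japanese charsets are not installed");
            return;
        }
        // windows-31j is the java.nio canonical name of MS932.
        System.out.println("Shift_JIS:   " + canEncode("Shift_JIS", circledOne));
        System.out.println("windows-31j: " + canEncode("windows-31j", circledOne));
    }
}
```

On runtimes that ship both charsets, the Shift_JIS encoder is expected to reject the character while the windows-31j encoder accepts it.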

To assign an encoding other than the default mapping, override the default mapping in a <charset-mapping> element in weblogic.xml.

In the example below, Shift_JIS is mapped to MS932.

<charset-params>
  <charset-mapping>
    <iana-charset-name>Shift_JIS</iana-charset-name>
    <java-charset-name>MS932</java-charset-name>
  </charset-mapping>
</charset-params>

Note: This setting is only valid for HTTP responses. For example, it has no effect on file encoding (page encoding) such as that of JSP files.

CGIServlet

When migrating a CGI service that uses multibyte characters to a CGI servlet on WebLogic Server, you must specify the appropriate charset parameter in the ContentType HTTP header generated by the CGI program. If it is not set, ISO-8859-1, the default encoding of the J2EE servlet container, is used. To receive input strings from a client correctly, you must also specify the input-charset parameter in weblogic.xml, the DD file of the target Web application. If it is not specified, ISO-8859-1 is used.

Specifying Input Encoding for Form-Based Authentication

To specify the input encoding for form-based authentication inside the form, set the encoding name in the j_character_encoding field. Note that this function is proprietary to WebLogic Server.

<form method="POST" action="j_security_check">
Username: <input type="text" name="j_username">
Password: <input type="password" name="j_password">
<input type="hidden" name="j_character_encoding" value="Shift_JIS">
<input type="submit" value="Login">
<input type="reset" value="Reset">
</form>

Use of Multibyte Character for Request URL

Default Behavior when Decoding URL

If the following type of HTTP request is received,

http://myHostName:port/myContextPath/myRequest/?myRequestParameter

and nothing is set, WebLogic Server handles the myContextPath portion and the myRequest portion as follows:

1. Performs URL decoding on the myContextPath and myRequest portions.

2. Decodes the byte stream obtained in step 1 into a character string as UTF-8.

For example, if the user agent (Web browser) is Microsoft Internet Explorer (IE), multibyte characters entered in the address bar are, by default, first encoded in UTF-8 and then URL-encoded. With its default settings, WebLogic Server correctly reconstructs the URL string from this UTF-8 encoded URL.

Note: In IE's Internet Options - Advanced, there is an option called Always send URLs as UTF-8 (requires restart), and this option must be ON (checked).

Remember that the myRequestParameter portion is decoded as described in Specifying the Encoding for a Request. For the myHostName portion, the IESG is standardizing internationalized domain names.

If a proprietary user agent is used and multibyte characters are needed in the request URL, first convert the character string to a UTF-8 byte sequence, then URL-encode it and send it to WebLogic Server. The W3C recommends that URIs be created with UTF-8 as the base encoding. (http://www.w3.org/TR/charmod/#sec-URIs)
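For a user agent written in Java, these two steps can be sketched with java.net.URLEncoder, which converts to UTF-8 bytes and percent-encodes them in one call. The class and method names are hypothetical. URLEncoder implements HTML form encoding and writes a space as "+", so the sketch rewrites that to "%20" for use in a path segment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodingSketch {
    // Percent-encode one path segment on top of its UTF-8 byte sequence.
    static String encodeSegment(String segment) {
        try {
            // URLEncoder turns a literal '+' into %2B, so any '+' left in the
            // result stands for a space and can safely be rewritten.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String segment = "\u65e5\u672c\u8a9e"; // multibyte path segment
        System.out.println("/myContextPath/" + encodeSegment(segment));
    }
}
```

Each of the three sample characters becomes three percent-encoded bytes, which matches what the default URL decoding in WebLogic Server expects.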

Method for Specifying Character Encoding when Decoding URLs

Some user agents do not encode the request URL as UTF-8. The Netscape browser first encodes the characters in the address bar with the character set of the environment in which it runs, and then URL-encodes the result before sending it to WebLogic Server. For example, the Netscape browser on Japanese Windows URL-encodes the request URL on the basis of Windows-31J. To handle this situation, WebLogic Server must be set so that the URL-decoded byte stream is converted to a String as Windows-31J. The following WebLogic Server startup option changes the encoding used for URL decoding.

-Dweblogic.http.URIDecodeEncoding=Windows-31J

Note that only one such setting is allowed per server instance.

Web services

Using Multibyte Characters in JMS Transfer

To send a message containing multibyte characters in JMS transfer, the message must be sent as BytesMessage.
To do this, obtain a port and set the message type to BytesMessage using any of the following methods.

((Stub)port)._setProperty(WLStub.JMS_TRANSPORT_MESSAGE_TYPE, WLStub.JMS_BYTESMESSAGE)
weblogic.wsee.util.JmsUtil.setJmsTransportBytesMessage((Stub)port)

No special settings are required on the receiver because it automatically selects the same message type as that of the sent message.

Receiving SOAP messages

Web services of WebLogic Server 9.0 implement Enterprise Web Services 1.1 specification (JSR-921). In JSR-921, SOAP1.1 is adopted. HTTP/SOAP messages based on the SOAP1.1 specification have text/xml media type, and the encoding for these messages is handled according to RFC2376. Hence the encoding operations of receiving SOAP messages in Web services of WebLogic Server 9.0 are as follows:

SOAP1.1:

ContentType charset parameter in the HTTP header is used to determine the character set for HTTP/SOAP request.
encoding attribute in the XML declaration of a SOAP message is ignored.
If charset parameter is not specified with ContentType, the message is handled as US-ASCII.

Make sure that the ContentType charset is specified correctly for the client which calls the developed Web service(s) using HTTP/SOAP.
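The SOAP 1.1 rules listed above can be sketched as a small helper that looks only at the ContentType header value. This is a simplified illustration with hypothetical names, not the WebLogic Server implementation, and it does not handle every legal header syntax.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class SoapCharsetSketch {
    // Pick the charset of a text/xml message from its ContentType value.
    // The XML declaration of the message is deliberately not consulted.
    static Charset resolve(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim();
                    // Remove optional double quotes around the value.
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    return Charset.forName(name);
                }
            }
        }
        // No charset parameter: the message is treated as US-ASCII.
        return StandardCharsets.US_ASCII;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/xml; charset=utf-8"));
        System.out.println(resolve("text/xml"));
    }
}
```

A client that sends multibyte text without the charset parameter would fall into the US-ASCII branch, which is the failure the preceding paragraph warns about.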

Sending SOAP messages

WebLogic Server generates HTTP/SOAP messages in UTF-8 and adds UTF-8 as the charset attribute of the ContentType header of each SOAP message.

UDDI Explorer

UDDI Explorer supports only US-ASCII characters. Multibyte characters cannot be processed correctly.

XMLs

Multibyte Handling in Streaming API for XML (StAX)

To add encoding information to the XML header generated with the Streaming API for XML (StAX), use the createStartDocument() method of the ElementFactory class, as shown below.

XMLOutputStreamFactory factory = XMLOutputStreamFactory.newInstance();
XMLOutputStream output = factory.newOutputStream(
    new OutputStreamWriter(new FileOutputStream(fname), "Shift_JIS"));
output.add(ElementFactory.createStartDocument("Shift_JIS", "1.0"));
output.flush();
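For comparison, the same idea can be sketched with the standard javax.xml.stream API (JSR 173) bundled with the JDK, rather than the WebLogic XMLOutputStream API used above. The sketch assumes the JDK's built-in StAX implementation; the class name StaxWriteDemo, the method write, and the element name message are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteDemo {
    // Writes a one-element document whose bytes and XML declaration both use the given encoding.
    static byte[] write(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, encoding);
            // The encoding passed here is what appears in the XML declaration.
            writer.writeStartDocument(encoding, "1.0");
            writer.writeStartElement("message");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return bytes.toByteArray();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = write("\u65E5\u672C\u8A9E", "Shift_JIS");
        System.out.println(new String(doc, Charset.forName("Shift_JIS")));
    }
}
```

As in the WebLogic snippet, the encoding of the output stream and the encoding named in the XML declaration are set to the same value so that a parser reading the bytes can decode them correctly.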

The following notes apply when parsing an XML document that contains multibyte characters with StAX. The main points are the same as the notes on using the Xerces parser.

When passing a stream to the parser, use a byte stream. With a byte stream, the parser can apply its automatic XML encoding detection: it builds the character stream internally using the encoding specified by the encoding attribute of the XML declaration, and parses the document correctly.
When a Unicode character stream is passed, the parser ignores the encoding specified in the XML declaration. In this case, the caller is responsible for passing a correctly decoded character stream.
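The byte-stream recommendation can be illustrated with the standard javax.xml.stream API bundled with the JDK. This is a minimal sketch that assumes the JDK's built-in parser rather than the WebLogic StAX classes; the class name StaxReadDemo and the method rootText are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxReadDemo {
    // Parses from raw bytes so that the parser itself applies the declared encoding.
    static String rootText(byte[] document) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(document));
            reader.nextTag(); // advance to the root element
            String text = reader.getElementText();
            reader.close();
            return text;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"Shift_JIS\"?>"
                + "<message>\u65E5\u672C\u8A9E</message>";
        // The document bytes are Shift_JIS, matching the XML declaration.
        byte[] bytes = xml.getBytes(Charset.forName("Shift_JIS"));
        System.out.println(rootText(bytes).equals("\u65E5\u672C\u8A9E"));
    }
}
```

Passing a Reader instead would bypass this detection, so any decoding mistake made before the parser sees the characters cannot be corrected by the XML declaration.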

JDBC

BEA WebLogic Type4 Oracle Driver (To be noted only when Japanese language is used)

In the Case of Using codePageOverride Property

An Oracle database has, for each character set, a map between Unicode and the code points used in the database. This map is used when characters are stored in or retrieved from the database. For example, with the Oracle Thin driver, the Oracle database server uses this map to convert between Unicode and the database code points.

The WebLogic Type4 Driver for Oracle provides a property called codePageOverride that performs this conversion with the JDK converter map instead. The possible values of the codePageOverride property and their behavior are listed in the following table:

Value	Supported destination database	Behavior
SJIS	Character set is JA16SJIS, JA16SJISTILDE, or JA16SJISYEN	Of the maps available for the character set of the destination database, conversion is guaranteed only for the map that matches the JDK SJIS converter. Conversion is not guaranteed when the map does not match.
MS932	Character set is JA16SJIS, JA16SJISTILDE, or JA16SJISYEN	Of the maps available for the character set of the destination database, conversion is guaranteed only for the map that matches the JDK MS932 converter. Conversion is not guaranteed when the map does not match.
UTF8	All databases	The driver uses UTF-8 as the character encoding when communicating with the database. Consequently, characters stored in the database are handled in the same way as with the Oracle Thin driver.

The difference between codePageOverride=SJIS and codePageOverride=MS932 is exactly the difference between the JDK SJIS converter and the JDK MS932 converter. For example, it affects symbols such as ～ (Wave Dash) and ￠ (Cent Sign), which the two converters map to different Unicode code points. Choose the setting that meets the requirements of the system you are building. See Countermeasure for Garbled Characters Caused by Unicode Definition and Java Converter (To be noted only when Japanese language is used).

Note: codePageOverride=UTF8 is available in WebLogic Server 9.1 and later.

In the Case of Omitting codePageOverride Property

In WebLogic Server 9.0 and later, when the codePageOverride property is omitted, characters stored in the database are handled in the same way as with the Oracle Thin driver, provided that the character set of the destination database is JA16SJIS, JA16SJISTILDE, or JA16SJISYEN. See About codePageOverride Property of BEA WebLogic Type4 JDBC Driver for Oracle for details of the change and for notes on upgrading from earlier versions.

Notes on Migration from jDriver for Oracle

If you use jDriver for Oracle with a database whose character set is JA16SJIS and you encounter a garbled ～ (Wave Dash) after migrating to the WebLogic Type4 Oracle Driver, you can solve the problem by changing the database character set to JA16SJISTILDE or by specifying codePageOverride=MS932.

Miscellaneous

Countermeasure for Garbled Characters Caused by Unicode Definition and Java Converter (To be noted only when Japanese language is used)

If the conversion from the platform native encoding to Unicode and the conversion from Unicode back to the platform native encoding do not use the same converter, characters may be handled incorrectly. This section uses an example to explain the case in which the MS932 converter and the SJIS converter map the same characters to different Unicode code points.

Consider, for example, the following application, in which data stored in a database is displayed by a JSP deployed on WebLogic Server.

Database -------------> WebLogic Server -------------> Web browser
(Native)     MS932         (Unicode)         SJIS       (Native)

Although this is a fairly simple application, the conversion between the platform native encoding and Unicode is performed at least twice on the way through WebLogic Server, as described in Encoding Conversion. In this example, the MS932 converter is used between the database and WebLogic Server, and the SJIS converter is used between WebLogic Server and the Web browser. In this case, the following codes cannot be handled correctly, which causes problems such as garbled characters.

SJIS Code
"～" (0x8160)
"∥" (0x8161)
"－" (0x817C)
"￠" (0x8191)
"￡" (0x8192)
"￢" (0x81CA)

To avoid garbled characters, change the encoding conversion either between WebLogic Server and the Web browser or between the database and WebLogic Server so that the two conversions match.
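The mismatch can be observed directly with the JDK converters. The following sketch decodes each of the six codes with both the Shift_JIS (SJIS) and windows-31j (MS932) charsets and prints the resulting Unicode code points. It assumes a JDK that ships both charsets; the class and method names (ConverterMismatchDemo, decode, transcode) are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class ConverterMismatchDemo {
    static final Charset SJIS = Charset.forName("Shift_JIS");
    static final Charset MS932 = Charset.forName("windows-31j");

    // The six double-byte codes discussed in this section.
    static final int[] CODES = {0x8160, 0x8161, 0x817C, 0x8191, 0x8192, 0x81CA};

    static byte[] toBytes(int code) {
        return new byte[] {(byte) (code >> 8), (byte) code};
    }

    // Decodes one double-byte code and returns the resulting Unicode code point.
    static int decode(int code, Charset charset) {
        return new String(toBytes(code), charset).codePointAt(0);
    }

    // Decodes with one converter and re-encodes with another,
    // as on the database-to-browser path described above.
    static byte[] transcode(int code, Charset from, Charset to) {
        return new String(toBytes(code), from).getBytes(to);
    }

    public static void main(String[] args) {
        for (int code : CODES) {
            boolean preserved = Arrays.equals(toBytes(code), transcode(code, MS932, SJIS));
            System.out.printf("0x%04X  SJIS=U+%04X  MS932=U+%04X  MS932->SJIS preserved=%b%n",
                    code, decode(code, SJIS), decode(code, MS932), preserved);
        }
    }
}
```

Using the same charset for both arguments of transcode corresponds to harmonizing the two conversions, in which case the original bytes are reproduced.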

Changing Encoding Conversion Between WebLogic Server and Web Browser

When, for example, the JSP page directive specifies:

<%@ page contentType="text/html; charset=Shift_JIS" %>

(where Shift_JIS is the IANA name), there are two ways to use the MS932 converter between WebLogic Server and the Web browser.

a) Change the page directive from Shift_JIS (IANA name) to Windows-31J (IANA name).

b) Specify the following definition in weblogic.xml to change WebLogic Server's internal default encoding mapping table from Shift_JIS (IANA name) -> SJIS (Java converter name) to Shift_JIS (IANA name) -> MS932 (Java converter name).

<charset-params>
  <charset-mapping>
    <iana-charset-name>Shift_JIS</iana-charset-name>
    <java-charset-name>MS932</java-charset-name>
  </charset-mapping>
</charset-params>

For method b), see Mapping Change for Java Encoding and IANA Character Set Involving HTTP Responses (Not J2EE-Compliant). This method is useful when method a) is not practical, for example because too many pages would have to be modified.

Changing Encoding Conversion Between Database and WebLogic Server

When using the BEA WebLogic Type4 Oracle Driver, you can change the encoding conversion between the database and WebLogic Server with the codePageOverride property.

In the Case of Using iMode Characters (To be noted only when Japanese language is used)

Java's MS932 encoding table supports conversion of external characters (gaiji). By using MS932, you can provide content using iMode external characters.

Specifying Encoding for WTC TUXEDO Domain

The encoding that WTC uses for a TUXEDO domain can be specified with the following parameter at startup. To set it, modify the WebLogic Server start script (such as the StartWebLogic.cmd file).

-Dweblogic.wtc.encoding=Java encoding name

The encoding specified here applies to the entire TUXEDO domain.

Predefined MIME-Java Encoding Mapping Table in WebLogic Server

IANA-to-Java Mapping

US-ASCII ANSI_X3.4-1968
US-ASCII	US-ASCII
ISO-IR-6	US-ASCII
ANSI_X3.4-1986	US-ASCII
ANSI_X3.4-1968	US-ASCII
ISO_646.IRV:1991	US-ASCII
ASCII	US-ASCII
ISO646-US	US-ASCII
US	US-ASCII
IBM367	US-ASCII
CP367	US-ASCII
CSASCII	US-ASCII
IBM-367	US-ASCII

Latin1
ISO-8859-1	ISO-8859-1
ISO-IR-100	ISO-8859-1
ISO_8859-1	ISO-8859-1
LATIN1	ISO-8859-1
L1	ISO-8859-1
IBM819	ISO-8859-1
CP819	ISO-8859-1
CSISOLATIN1	ISO-8859-1
IBM-819	ISO-8859-1

Latin3
ISO-8859-3	ISO-8859-3
ISO-IR-109	ISO-8859-3
ISO_8859-3	ISO-8859-3
LATIN3	ISO-8859-3
L3	ISO-8859-3
CSISOLATIN3	ISO-8859-3

Latin4
ISO-8859-4	ISO-8859-4
ISO-IR-110	ISO-8859-4
ISO_8859-4	ISO-8859-4
LATIN4	ISO-8859-4
L4	ISO-8859-4
CSISOLATIN4	ISO-8859-4

Cyrillic
ISO-8859-5	ISO-8859-5
ISO-IR-144	ISO-8859-5
ISO_8859-5	ISO-8859-5
CYRILLIC	ISO-8859-5
CSISOLATINCYRILLIC	ISO-8859-5

Arabic
ISO-8859-6	ISO-8859-6
ISO-IR-127	ISO-8859-6
ISO_8859-6	ISO-8859-6
ECMA-114	ISO-8859-6
ASMO-708	ISO-8859-6
ARABIC	ISO-8859-6
CSISOLATINARABIC	ISO-8859-6

Greek
ISO-8859-7	ISO-8859-7
ISO-IR-126	ISO-8859-7
ISO_8859-7	ISO-8859-7
ELOT_928	ISO-8859-7
ECMA-118	ISO-8859-7
GREEK	ISO-8859-7
GREEK8	ISO-8859-7
CSISOLATINGREEK	ISO-8859-7

Hebrew
ISO-8859-8	ISO-8859-8
ISO-IR-138	ISO-8859-8
ISO_8859-8	ISO-8859-8
HEBREW	ISO-8859-8
CSISOLATINHEBREW	ISO-8859-8
ISO-8859-8-I	ISO-8859-8
ISO_8859-8-I	ISO-8859-8
CSISO88598I	ISO-8859-8

Latin5
ISO-8859-9	ISO-8859-9
ISO-IR-148	ISO-8859-9
ISO_8859-9	ISO-8859-9
LATIN5	ISO-8859-9
L5	ISO-8859-9
CSISOLATIN5	ISO-8859-9

MIBenum: 109
ISO-8859-13	ISO-8859-13

Latin9
ISO-8859-15	ISO-8859-15
ISO_8859-15	ISO-8859-15
LATIN-9	ISO-8859-15

Simplified Chinese
GB2312	GB2312
CSGB2312	GB2312
GB18030	GB18030
ISO-2022-CN	ISO2022CN

Chinese for Taiwan
BIG5	Big5
CSBIG5	Big5

MIBenum 2101
BIG5-HKSCS	Big5-HKSCS

Korean
EUC-KR	EUC-KR
CSEUCKR	EUC-KR
ISO-2022-KR	ISO-2022-KR
CSISO2022KR	ISO-2022-KR

Japanese
SHIFT_JIS	Shift_JIS
SHIFT-JIS	Shift_JIS
CSSHIFTJIS	Shift_JIS
MS_KANJI	Shift_JIS
X-SJIS	Shift_JIS
SJIS	Shift_JIS
WINDOWS-31J	Windows-31J
CSWINDOWS31J	Windows-31J
EUC-JP	EUC-JP
CSEUCPKDFMTJAPANESE	EUC-JP
EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE	EUC-JP
ISO-2022-JP	ISO-2022-JP
CSISO2022JP	ISO-2022-JP
X0201	JIS0201
JIS_X0201	JIS0201
CSHALFWIDTHKATAKANA	JIS0201
X0208	JIS0208
JIS_C6226-1983	JIS0208
ISO-IR-87	JIS0208
JIS_X0208-1983	JIS0208
CSISO87JISX0208	JIS0208
X0212	JIS0212
JIS_X0212-1990	JIS0212
ISO-IR-159	JIS0212
CSISO159JISX02121990	JIS0212

Russian
KOI8-R	KOI8-R
CSKOI8R	KOI8-R

Thai
TIS-620	TIS-620

Traditional Chinese
CNS11643	EUC-TW
EUC-TW	EUC-TW
EUCTW	EUC-TW

MIBenum: 106
UTF-8	UTF-8
UTF8	UTF-8

Unicode
UTF-16	UTF-16
UTF-16BE	UTF-16BE
UTF-16LE	UTF-16LE

MIBenum: 2250 - 2258
WINDOWS-1250	Cp1250
WINDOWS-1251	Cp1251
WINDOWS-1252	Cp1252
WINDOWS-1253	Cp1253
WINDOWS-1254	Cp1254
WINDOWS-1255	Cp1255
WINDOWS-1256	Cp1256
WINDOWS-1257	Cp1257
WINDOWS-1258	Cp1258

EBCDIC
IBM037	Cp037
CP037	Cp037
CSIBM037	Cp037
EBCDIC-CP-US	Cp037
EBCDIC-CP-CA	Cp037
EBCDIC-CP-NL	Cp037
EBCDIC-CP-WT	Cp037
IBM273	Cp273
CP273	Cp273
CSIBM273	Cp273
IBM277	Cp277
CP277	Cp277
CSIBM277	Cp277
EBCDIC-CP-DK	Cp277
EBCDIC-CP-NO	Cp277
IBM278	Cp278
CP278	Cp278
CSIBM278	Cp278
EBCDIC-CP-FI	Cp278
EBCDIC-CP-SE	Cp278
IBM280	Cp280
CP280	Cp280
CSIBM280	Cp280
EBCDIC-CP-IT	Cp280
IBM284	Cp284
CP284	Cp284
CSIBM284	Cp284
EBCDIC-CP-ES	Cp284
EBCDIC-CP-GB	Cp285
IBM285	Cp285
CP285	Cp285
CSIBM285	Cp285
EBCDIC-JP-KANA	Cp290
IBM290	Cp290
CP290	Cp290
CSIBM290	Cp290
EBCDIC-CP-FR	Cp297
IBM297	Cp297
CP297	Cp297
CSIBM297	Cp297
EBCDIC-CP-AR1	Cp420
IBM420	Cp420
CP420	Cp420
CSIBM420	Cp420
EBCDIC-CP-HE	Cp424
IBM424	Cp424
CP424	Cp424
CSIBM424	Cp424
IBM437	Cp437
437	Cp437
CP437	Cp437
CSPC8CODEPAGE437	Cp437
EBCDIC-CP-CH	Cp500
IBM500	Cp500
CP500	Cp500
CSIBM500	Cp500
EBCDIC-CP-BE	Cp500
IBM775	Cp775
CP775	Cp775
CSPC775BALTIC	Cp775
IBM-THAI	Cp838
CSIBMTHAI	Cp838
IBM850	Cp850
850	Cp850
CP850	Cp850
CSPC850MULTILINGUAL	Cp850
IBM852	Cp852
852	Cp852
CP852	Cp852
CSPCP852	Cp852
IBM855	Cp855
855	Cp855
CP855	Cp855
CSIBM855	Cp855
IBM857	Cp857
857	Cp857
CP857	Cp857
CSIBM857	Cp857
IBM00858	Cp858
CP00858	Cp858
CCSID00858	Cp858
IBM860	Cp860
860	Cp860
CP860	Cp860
CSIBM860	Cp860
IBM861	Cp861
861	Cp861
CP861	Cp861
CP-IS	Cp861
CSIBM861	Cp861
IBM862	Cp862
862	Cp862
CP862	Cp862
CSPC862LATINHEBREW	Cp862
IBM863	Cp863
863	Cp863
CP863	Cp863
CSIBM863	Cp863
IBM864	Cp864
CP864	Cp864
CSIBM864	Cp864
IBM865	Cp865
865	Cp865
CP865	Cp865
CSIBM865	Cp865
IBM866	Cp866
866	Cp866
CP866	Cp866
CSIBM866	Cp866
IBM868	Cp868
CP868	Cp868
CSIBM868	Cp868
CP-AR	Cp868
IBM869	Cp869
CP869	Cp869
CSIBM869	Cp869
CP-GR	Cp869
IBM870	Cp870
CP870	Cp870
CSIBM870	Cp870
EBCDIC-CP-ROECE	Cp870
EBCDIC-CP-YU	Cp870
IBM871	Cp871
CP871	Cp871
CSIBM871	Cp871
EBCDIC-CP-IS	Cp871
IBM918	Cp918
CP918	Cp918
CSIBM918	Cp918
EBCDIC-CP-AR2	Cp918
IBM00924	Cp924
CP00924	Cp924
CCSID00924	Cp924
EBCDIC-LATIN9--EURO	Cp924
IBM1026	Cp1026
CP1026	Cp1026
CSIBM1026	Cp1026
IBM01140	Cp1140
CP01140	Cp1140
CCSID01140	Cp1140
IBM01141	Cp1141
CP01141	Cp1141
CCSID01141	Cp1141
IBM01142	Cp1142
CP01142	Cp1142
CCSID01142	Cp1142
IBM01143	Cp1143
CP01143	Cp1143
CCSID01143	Cp1143
IBM01144	Cp1144
CP01144	Cp1144
CCSID01144	Cp1144
IBM01145	Cp1145
CP01145	Cp1145
CCSID01145	Cp1145
IBM01146	Cp1146
CP01146	Cp1146
CCSID01146	Cp1146
IBM01147	Cp1147
CP01147	Cp1147
CCSID01147	Cp1147
IBM01148	Cp1148
CP01148	Cp1148
CCSID01148	Cp1148
IBM01149	Cp1149
CP01149	Cp1149
CCSID01149	Cp1149
MIBenum: 2028 - 2063	2091 - 2100
IBM-1047	Cp1047
IBM1047	Cp1047
CP1047	Cp1047
IBM-37	Cp037
IBM-273	Cp273
IBM-277	Cp277
IBM-278	Cp278
IBM-280	Cp280
IBM-284	Cp284
IBM-285	Cp285
IBM-297	Cp297
IBM-420	Cp420
IBM-424	Cp424
IBM-437	Cp437
IBM-500	Cp500
IBM-775	Cp775
IBM-850	Cp850
IBM-852	Cp852
IBM-855	Cp855
IBM-857	Cp857
IBM-858	Cp858
IBM-860	Cp860
IBM-861	Cp861
IBM-862	Cp862
IBM-863	Cp863
IBM-864	Cp864
IBM-865	Cp865
IBM-866	Cp866
IBM-868	Cp868
IBM-869	Cp869
IBM-870	Cp870
IBM-871	Cp871
IBM-918	Cp918
IBM-924	Cp924
IBM-1026	Cp1026
IBM-1140	Cp1140
IBM-1141	Cp1141
IBM-1142	Cp1142
IBM-1143	Cp1143
IBM-1144	Cp1144
IBM-1145	Cp1145
IBM-1146	Cp1146
IBM-1147	Cp1147
IBM-1148	Cp1148
IBM-1149	Cp1149

Java-to-IANA Mapping

ASCII and its aliases
ASCII	US-ASCII
US-ASCII	US-ASCII
646	US-ASCII
ISO_646.IRV:1983	US-ASCII
ANSI_X3.4-1968	US-ASCII
ISO646-US	US-ASCII
DEFAULT	US-ASCII
ASCII7	US-ASCII

ISO8859_1 and its aliases
ISO8859_1	ISO-8859-1
8859_1	ISO-8859-1
ISO_8859-1:1987	ISO-8859-1
ISO-IR-100	ISO-8859-1
ISO_8859-1	ISO-8859-1
ISO-8859-1	ISO-8859-1
ISO8859-1	ISO-8859-1
LATIN1	ISO-8859-1
L1	ISO-8859-1
IBM819	ISO-8859-1
IBM-819	ISO-8859-1
CP819	ISO-8859-1
819	ISO-8859-1
CSISOLATIN1	ISO-8859-1

ISO8859_2 and its aliases
ISO8859_2	ISO-8859-2
8859_2	ISO-8859-2
ISO_8859-2:1987	ISO-8859-2
ISO-IR-101	ISO-8859-2
ISO_8859-2	ISO-8859-2
ISO-8859-2	ISO-8859-2
ISO8859-2	ISO-8859-2
LATIN2	ISO-8859-2
L2	ISO-8859-2
IBM912	ISO-8859-2
IBM-912	ISO-8859-2
CP912	ISO-8859-2
912	ISO-8859-2
CSISOLATIN2	ISO-8859-2

ISO8859_3 and its aliases
ISO8859_3	ISO-8859-3
8859_3	ISO-8859-3
ISO_8859-3:1988	ISO-8859-3
ISO-IR-109	ISO-8859-3
ISO_8859-3	ISO-8859-3
ISO-8859-3	ISO-8859-3
ISO8859-3	ISO-8859-3
LATIN3	ISO-8859-3
L3	ISO-8859-3
IBM913	ISO-8859-3
IBM-913	ISO-8859-3
CP913	ISO-8859-3
913	ISO-8859-3
CSISOLATIN3	ISO-8859-3

ISO8859_4 and its aliases
ISO8859_4	ISO-8859-4
8859_4	ISO-8859-4
ISO_8859-4:1988	ISO-8859-4
ISO-IR-110	ISO-8859-4
ISO_8859-4	ISO-8859-4
ISO-8859-4	ISO-8859-4
ISO8859-4	ISO-8859-4
LATIN4	ISO-8859-4
L4	ISO-8859-4
IBM914	ISO-8859-4
IBM-914	ISO-8859-4
CP914	ISO-8859-4
914	ISO-8859-4
CSISOLATIN4	ISO-8859-4

ISO8859_5 and its aliases
ISO8859_5	ISO-8859-5
8859_5	ISO-8859-5
ISO_8859-5:1988	ISO-8859-5
ISO-IR-144	ISO-8859-5
ISO_8859-5	ISO-8859-5
ISO-8859-5	ISO-8859-5
ISO8859-5	ISO-8859-5
CYRILLIC	ISO-8859-5
CSISOLATINCYRILLIC	ISO-8859-5
IBM915	ISO-8859-5
IBM-915	ISO-8859-5
CP915	ISO-8859-5
915	ISO-8859-5

ISO8859_6 and its aliases
ISO8859_6	ISO-8859-6
8859_6	ISO-8859-6
ISO_8859-6:1987	ISO-8859-6
ISO-IR-127	ISO-8859-6
ISO_8859-6	ISO-8859-6
ISO-8859-6	ISO-8859-6
ISO8859-6	ISO-8859-6
ECMA-114	ISO-8859-6
ASMO-708	ISO-8859-6
ARABIC	ISO-8859-6
CSISOLATINARABIC	ISO-8859-6
IBM1089	ISO-8859-6
IBM-1089	ISO-8859-6
CP1089	ISO-8859-6
1089	ISO-8859-6

ISO8859_7 and its aliases
ISO8859_7	ISO-8859-7
8859_7	ISO-8859-7
ISO_8859-7:1987	ISO-8859-7
ISO-IR-126	ISO-8859-7
ISO_8859-7	ISO-8859-7
ISO-8859-7	ISO-8859-7
ISO8859-7	ISO-8859-7
ELOT_928	ISO-8859-7
ECMA-118	ISO-8859-7
GREEK	ISO-8859-7
GREEK8	ISO-8859-7
CSISOLATINGREEK	ISO-8859-7
IBM813	ISO-8859-7
IBM-813	ISO-8859-7
CP813	ISO-8859-7
813	ISO-8859-7

ISO8859_8 and its aliases
ISO8859_8	ISO-8859-8
8859_8	ISO-8859-8
ISO_8859-8:1988	ISO-8859-8
ISO-IR-138	ISO-8859-8
ISO_8859-8	ISO-8859-8
ISO-8859-8	ISO-8859-8
ISO8859-8	ISO-8859-8
HEBREW	ISO-8859-8
CSISOLATINHEBREW	ISO-8859-8
IBM916	ISO-8859-8
IBM-916	ISO-8859-8
CP916	ISO-8859-8
916	ISO-8859-8

ISO8859_9 and its aliases
ISO8859_9	ISO-8859-9
8859_9	ISO-8859-9
ISO-IR-148	ISO-8859-9
ISO_8859-9	ISO-8859-9
ISO-8859-9	ISO-8859-9
ISO8859-9	ISO-8859-9
LATIN5	ISO-8859-9
L5	ISO-8859-9
IBM920	ISO-8859-9
IBM-920	ISO-8859-9
CP920	ISO-8859-9
920	ISO-8859-9
CSISOLATIN5	ISO-8859-9

ISO8859_13 and its aliases
ISO8859_13	ISO-8859-13
8859_13	ISO-8859-13
ISO_8859-13	ISO-8859-13
ISO-8859-13	ISO-8859-13
ISO8859-13	ISO-8859-13

ISO8859_15 and its aliases
ISO8859_15	ISO-8859-15
8859_15	ISO-8859-15
ISO-8859-15	ISO-8859-15
ISO_8859-15	ISO-8859-15
ISO8859-15	ISO-8859-15
IBM923	ISO-8859-15
IBM-923	ISO-8859-15
CP923	ISO-8859-15
923	ISO-8859-15
LATIN0	ISO-8859-15
LATIN9	ISO-8859-15
CSISOLATIN0	ISO-8859-15
CSISOLATIN9	ISO-8859-15
ISO8859_15_FDIS	ISO-8859-15

Simplified Chinese
EUC_CN	GB2312
GB2312	GB2312
GB2312-80	GB2312
GB2312-1980	GB2312
EUC-CN	GB2312
EUCCN	GB2312
ISO2022CN	ISO-2022-CN
GB18030	GB18030

Chinese for Taiwan
BIG5	Big5

Big5_HKSCS
BIG5_HKSCS	Big5-HKSCS
BIG5-HKSCS	Big5-HKSCS
BIG5HK	Big5-HKSCS
BIG5-HKSCS:UNICODE3.0	Big5-HKSCS

Korean
KSC5601	EUC-KR
EUC_KR	EUC-KR
EUC-KR	EUC-KR
EUCKR	EUC-KR
KS_C_5601-1987	EUC-KR
KSC5601-1987	EUC-KR
KSC5601_1987	EUC-KR
KSC_5601	EUC-KR
5601	EUC-KR
ISO2022KR	ISO-2022-KR
ISO-2022-KR	ISO-2022-KR
CSISO2022KR	ISO-2022-KR

Japanese
SJIS	Shift_JIS
SHIFT_JIS	Shift_JIS
SHIFT-JIS	Shift_JIS
CSSHIFTJIS	Shift_JIS
X-SJIS	Shift_JIS
MS_KANJI	Shift_JIS
PCK	Shift_JIS
MS932	Windows-31J
WINDOWS-31J	Windows-31J
CSWINDOWS31J	Windows-31J
EUC_JP	EUC-JP
EUC-JP	EUC-JP
EUCJIS	EUC-JP
EUCJP	EUC-JP
CSEUCPKDFMTJAPANESE	EUC-JP
EXTENDED_UNIX_CODE_PACKED_FORMAT_FOR_JAPANESE	EUC-JP
X-EUC-JP	EUC-JP
X-EUCJP	EUC-JP
ISO2022JP	ISO-2022-JP
JIS	ISO-2022-JP
ISO-2022-JP	ISO-2022-JP
CSISO2022JP	ISO-2022-JP
JIS_ENCODING	ISO-2022-JP
CSJISENCODING	ISO-2022-JP
JIS0201	X0201
JIS0208	X0208
JIS0212	ISO-IR-159

Russian
KOI8_R	KOI8-R
KOI8-R	KOI8-R
KOI8	KOI8-R
CSKOI8R	KOI8-R

Thai
TIS620	TIS-620
TIS620.2533	TIS-620
TIS-620	TIS-620

Traditional Chinese
EUC_TW	CNS11643
CNS11643	CNS11643
EUC-TW	CNS11643
EUCTW	CNS11643

UTF8
UTF8	UTF-8
UTF-8	UTF-8
UNICODE-1-1-UTF-8	UTF-8

Unicode
UTF16	UTF-16
UTF-16	UTF-16
UNICODE	UTF-16
UTF-16BE	UTF-16BE
UNICODEBIG	UTF-16BE
UTF-16LE	UTF-16LE
UNICODELITTLE	UTF-16LE

MIBenum: 2250 - 2258
CP1250	windows-1250
CP1251	windows-1251
CP1252	windows-1252
CP1253	windows-1253
CP1254	windows-1254
CP1255	windows-1255
CP1256	windows-1256
CP1257	windows-1257
CP1258	windows-1258

EBCDIC
CP037	EBCDIC-CP-US
IBM037	EBCDIC-CP-US
IBM-037	EBCDIC-CP-US
037	EBCDIC-CP-US
CP273	IBM273
IBM273	IBM273
IBM-273	IBM273
273	IBM273
CP277	EBCDIC-CP-DK
IBM277	EBCDIC-CP-DK
IBM-277	EBCDIC-CP-DK
277	EBCDIC-CP-DK
CP278	EBCDIC-CP-FI
IBM278	EBCDIC-CP-FI
IBM-278	EBCDIC-CP-FI
278	EBCDIC-CP-FI
CP280	EBCDIC-CP-IT
IBM280	EBCDIC-CP-IT
IBM-280	EBCDIC-CP-IT
280	EBCDIC-CP-IT
CP284	EBCDIC-CP-ES
IBM284	EBCDIC-CP-ES
IBM-284	EBCDIC-CP-ES
284	EBCDIC-CP-ES
CP285	EBCDIC-CP-GB
IBM285	EBCDIC-CP-GB
IBM-285	EBCDIC-CP-GB
285	EBCDIC-CP-GB
CP290	EBCDIC-JP-KANA
CP297	EBCDIC-CP-FR
IBM297	EBCDIC-CP-FR
IBM-297	EBCDIC-CP-FR
297	EBCDIC-CP-FR
CP420	EBCDIC-CP-AR1
IBM420	EBCDIC-CP-AR1
IBM-420	EBCDIC-CP-AR1
420	EBCDIC-CP-AR1
CP424	EBCDIC-CP-HE
IBM424	EBCDIC-CP-HE
IBM-424	EBCDIC-CP-HE
424	EBCDIC-CP-HE
CP437	IBM437
IBM437	IBM437
IBM-437	IBM437
437	IBM437
CSPC8CODEPAGE437	IBM437
CP500	EBCDIC-CP-CH
IBM500	EBCDIC-CP-CH
IBM-500	EBCDIC-CP-CH
500	EBCDIC-CP-CH
CP775	IBM775
IBM775	IBM775
IBM-775	IBM775
775	IBM775
CP838	IBM-Thai
IBM838	IBM-Thai
IBM-838	IBM-Thai
838	IBM-Thai
CP850	IBM850
IBM850	IBM850
IBM-850	IBM850
850	IBM850
CSPC850MULTILINGUAL	IBM850
CP852	IBM852
IBM852	IBM852
IBM-852	IBM852
852	IBM852
CSPCP852	IBM852
CP855	IBM855
IBM855	IBM855
IBM-855	IBM855
855	IBM855
CSPCP855	IBM855
CP857	IBM857
IBM857	IBM857
IBM-857	IBM857
857	IBM857
CSIBM857	IBM857
CP858	IBM00858
CP860	IBM860
IBM860	IBM860
IBM-860	IBM860
860	IBM860
CSIBM860	IBM860
CP861	IBM861
IBM861	IBM861
IBM-861	IBM861
CP-IS	IBM861
861	IBM861
CSIBM861	IBM861
CP862	IBM862
IBM862	IBM862
IBM-862	IBM862
862	IBM862
CSPC862LATINHEBREW	IBM862
CP863	IBM863
IBM863	IBM863
IBM-863	IBM863
863	IBM863
CSIBM863	IBM863
CP864	IBM864
IBM864	IBM864
IBM-864	IBM864
CSIBM864	IBM864
CP865	IBM865
IBM865	IBM865
IBM-865	IBM865
865	IBM865
CSIBM865	IBM865
CP866	IBM866
IBM866	IBM866
IBM-866	IBM866
866	IBM866
CSIBM866	IBM866
CP868	IBM868
IBM868	IBM868
IBM-868	IBM868
868	IBM868
CP869	IBM869
IBM869	IBM869
IBM-869	IBM869
869	IBM869
CP-GR	IBM869
CSIBM869	IBM869
CP870	EBCDIC-CP-ROECE
IBM870	EBCDIC-CP-ROECE
IBM-870	EBCDIC-CP-ROECE
870	EBCDIC-CP-ROECE
CP871	EBCDIC-CP-IS
IBM871	EBCDIC-CP-IS
IBM-871	EBCDIC-CP-IS
871	EBCDIC-CP-IS
CP918	EBCDIC-CP-AR2
IBM918	EBCDIC-CP-AR2
IBM-918	EBCDIC-CP-AR2
918	EBCDIC-CP-AR2
CP924	IBM00924
CP1026	IBM1026
IBM1026	IBM1026
IBM-1026	IBM1026
1026	IBM1026
CP1140	IBM01140
CP1141	IBM01141
CP1142	IBM01142
CP1143	IBM01143
CP1144	IBM01144
CP1145	IBM01145
CP1146	IBM01146
CP1147	IBM01147
CP1148	IBM01148
CP1149	IBM01149
CP1047	IBM1047
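For comparison, the JDK maintains its own alias table, which can be queried through java.nio.charset.Charset. The sketch below resolves a few Java encoding names to the canonical names the JDK reports. The JDK's table is independent of the WebLogic Server table above, so the two need not agree for every entry; the class name CharsetAliasDemo and the method canonicalName are hypothetical.

```java
import java.nio.charset.Charset;

class CharsetAliasDemo {
    // Resolves a Java encoding name or alias to the canonical name reported by the JDK.
    static String canonicalName(String javaName) {
        return Charset.forName(javaName).name();
    }

    public static void main(String[] args) {
        String[] names = {"ASCII", "ISO8859_1", "UTF8", "SJIS", "MS932", "EUC_JP", "Cp1252"};
        for (String name : names) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

A check of this kind can help confirm which converter a given Java name actually selects before it is used in a mapping definition.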

Locale-to-IANA Mapping

ar	ISO-8859-6
be	ISO-8859-5
bg	ISO-8859-5
ca	ISO-8859-1
cs	ISO-8859-2
da	ISO-8859-1
de	ISO-8859-1
el	ISO-8859-7
en	ISO-8859-1
es	ISO-8859-1
et	ISO-8859-1
fi	ISO-8859-1
fr	ISO-8859-1
hr	ISO-8859-2
hu	ISO-8859-2
is	ISO-8859-1
it	ISO-8859-1
iw	ISO-8859-8
ja	Shift_JIS
ko	EUC-KR
lt	ISO-8859-2
lv	ISO-8859-2
mk	ISO-8859-5
nl	ISO-8859-1
no	ISO-8859-1
pl	ISO-8859-2
pt	ISO-8859-1
ro	ISO-8859-2
ru	ISO-8859-5
sh	ISO-8859-5
sk	ISO-8859-2
sl	ISO-8859-2
sq	ISO-8859-2
sr	ISO-8859-5
sv	ISO-8859-1
th	TIS-620
tr	ISO-8859-9
uk	ISO-8859-5
zh	GB2312
zh_TW	Big5
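A table of this kind can be consulted with a simple two-step lookup. The sketch below copies a few entries from the table and tries the language_COUNTRY key before falling back to the bare language. The lookup order is an assumption made for illustration, since this document does not describe how WebLogic Server resolves a locale; the class name LocaleCharsetLookup and the method charsetFor are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LocaleCharsetLookup {
    // A few entries copied from the Locale-to-IANA table above.
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("en", "ISO-8859-1");
        TABLE.put("fr", "ISO-8859-1");
        TABLE.put("ja", "Shift_JIS");
        TABLE.put("ko", "EUC-KR");
        TABLE.put("ru", "ISO-8859-5");
        TABLE.put("zh", "GB2312");
        TABLE.put("zh_TW", "Big5");
    }

    // Tries language_COUNTRY first, then the bare language (assumed order).
    // Returns null when neither key is listed.
    static String charsetFor(Locale locale) {
        String full = locale.getLanguage() + "_" + locale.getCountry();
        String charset = TABLE.get(full);
        return charset != null ? charset : TABLE.get(locale.getLanguage());
    }

    public static void main(String[] args) {
        System.out.println(charsetFor(Locale.JAPAN));
        System.out.println(charsetFor(Locale.TAIWAN));
        System.out.println(charsetFor(Locale.CHINA));
    }
}
```

With this order, zh_TW resolves to Big5 while every other Chinese locale falls back to the zh entry.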