The Java EE 5 Tutorial

Chapter 15 Internationalizing and Localizing Web Applications

The process of preparing an application to support more than one language and data format is called internationalization. Localization is the process of adapting an internationalized application to support a specific region or locale. Examples of locale-dependent information include messages and user interface labels, character sets and encoding, and date and currency formats. Although all client user interfaces should be internationalized and localized, it is particularly important for web applications because of the global nature of the web.

Java Platform Localization Classes

In the Java 2 platform, java.util.Locale represents a specific geographical, political, or cultural region. The string representation of a locale consists of the international standard two-character abbreviation for language and country and an optional variant, all separated by underscore (_) characters. Examples of locale strings include fr (French), de_CH (Swiss German), and en_US_POSIX (English on a POSIX-compliant platform).
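The locale strings listed above can be reproduced with java.util.Locale directly. The following is a minimal sketch; the class name LocaleStrings and the helper localeString are illustrative and are not part of the tutorial's examples.

```java
import java.util.Locale;

public class LocaleStrings {
    // Builds a Locale from language, country, and variant and returns its
    // underscore-separated string form. Empty strings mean "not specified".
    public static String localeString(String language, String country, String variant) {
        return new Locale(language, country, variant).toString();
    }

    public static void main(String[] args) {
        System.out.println(localeString("fr", "", ""));        // language only
        System.out.println(localeString("de", "CH", ""));      // language and country
        System.out.println(localeString("en", "US", "POSIX")); // language, country, variant
    }
}
```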

Locale-sensitive data is stored in a java.util.ResourceBundle. A resource bundle contains key-value pairs, where the keys uniquely identify a locale-specific object in the bundle. A resource bundle can be backed by a text file (properties resource bundle) or a class (list resource bundle) containing the pairs. You construct a resource bundle instance by appending a locale string representation to a base name.
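To see how a base name and a locale combine into bundle names, the sketch below asks the default ResourceBundle.Control for the candidate names it would try, from most to least specific. The class name BundleNames and the helper candidateNames are illustrative; the real lookup performed by ResourceBundle.getBundle may additionally fall back to the platform default locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleNames {
    // Lists the bundle names derived from a base name and a locale,
    // ordered from the most specific candidate to the base bundle.
    public static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : candidateNames("messages.BookstoreMessages", new Locale("fr", "FR"))) {
            System.out.println(name);
        }
    }
}
```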

For more details on internationalization and localization in the Java 2 platform, see http://java.sun.com/docs/books/tutorial/i18n/index.html.

In the web technology chapters, the Duke’s Bookstore applications contain resource bundles with the base name messages.BookstoreMessages for the locales en_US, fr_FR, de_DE, and es_MX.

Providing Localized Messages and Labels

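Messages and labels should be tailored according to the conventions of a user’s language and region. There are two approaches to providing localized messages and labels in a web application. The first is to provide a version of each JSP page for each target locale and have a controller servlet dispatch the request to the appropriate page depending on the requested locale. The second is to isolate locale-sensitive data on a page into resource bundles and look up each translated message by its key, so that the correct text is inserted into the page automatically.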

The Duke’s Bookstore applications follow the second approach. Here are a few lines from the default resource bundle messages.BookstoreMessages.java:

{"TitleCashier", "Cashier"},
{"TitleBookDescription", "Book Description"},
{"Visitor", "You are visitor number "},
{"What", "What We’re Reading"},
{"Talk", " talks about how Web components can transform the way you develop "
    + "applications for the Web. This is a must read for any self respecting Web developer!"},
{"Start", "Start Shopping"},

Establishing the Locale

To get the correct strings for a given user, a web application either retrieves the locale (set by a browser language preference) from the request using the getLocale method, or allows the user to explicitly select the locale.
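The getLocale method reports the client's preferred locale, which the container derives from the Accept-Language request header. An application may still need to reconcile that preference with the locales it actually ships. The sketch below is one way to do so with the standard Locale.LanguageRange and Locale.lookup APIs (Java 8 and later); it is not the container's implementation, and the class LocaleNegotiation, the method choose, and the sample header values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleNegotiation {
    // The locales for which Duke's Bookstore provides resource bundles.
    static final List<Locale> SUPPORTED = Arrays.asList(
        new Locale("en", "US"), new Locale("fr", "FR"),
        new Locale("de", "DE"), new Locale("es", "MX"));

    // Picks the supported locale that best matches an Accept-Language value,
    // or returns the fallback when nothing matches.
    public static Locale choose(String acceptLanguage, Locale fallback) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        Locale match = Locale.lookup(ranges, SUPPORTED);
        return match != null ? match : fallback;
    }

    public static void main(String[] args) {
        Locale fallback = new Locale("en", "US");
        System.out.println(choose("fr-FR,fr;q=0.9,en;q=0.8", fallback));
        System.out.println(choose("ja", fallback));
    }
}
```

Lookup matching is strict: a range such as de-CH does not match de_DE, so an application that wants language-only fallback would have to add it.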

The JSTL versions of Duke’s Bookstore automatically retrieve the locale from the request and store it in a localization context (see Internationalization Tag Library). It is also possible for a component to explicitly set the locale by using the fmt:setLocale tag.

The JavaServer Faces version of Duke’s Bookstore allows the user to explicitly select the locale. The user selection triggers a method that stores the locale in the FacesContext object. The locale is then used in resource bundle selection and is available for localizing dynamic data and messages (see Localizing Dynamic Data):

<h:commandLink id="NAmerica" action="storeFront"
    actionListener="#{localeBean.chooseLocaleFromLink}">
    <h:outputText value="#{bundle.english}" />
</h:commandLink>

public void chooseLocaleFromLink(ActionEvent event) {
    String current = event.getComponent().getId();
    FacesContext context = FacesContext.getCurrentInstance();
    context.getViewRoot().setLocale((Locale)
        locales.get(current));
}
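The listener above relies on a locales map keyed by component ID, which the excerpt does not show. A minimal sketch of such a map follows. Only the NAmerica ID comes from the page fragment; the other IDs, the class name LocaleChoices, and the fallback behavior are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleChoices {
    // Maps the id of the clicked link to the locale it stands for.
    // Only "NAmerica" appears in the page fragment; the other ids are hypothetical.
    static final Map<String, Locale> LOCALES = new HashMap<>();
    static {
        LOCALES.put("NAmerica", new Locale("en", "US"));
        LOCALES.put("SAmerica", new Locale("es", "MX"));
        LOCALES.put("Germany", new Locale("de", "DE"));
        LOCALES.put("France", new Locale("fr", "FR"));
    }

    // Returns the locale registered for a component id,
    // or the fallback when the id is unknown.
    public static Locale localeFor(String componentId, Locale fallback) {
        Locale chosen = LOCALES.get(componentId);
        return chosen != null ? chosen : fallback;
    }

    public static void main(String[] args) {
        System.out.println(localeFor("NAmerica", Locale.ENGLISH));
        System.out.println(localeFor("Unknown", Locale.ENGLISH));
    }
}
```

Typing the map as Map<String, Locale> removes the need for the (Locale) cast used in the listener.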

Setting the Resource Bundle

After the locale is set, the controller of a web application typically retrieves the resource bundle for that locale and saves it as a session attribute (see Associating Objects with a Session) for use by other components:

messages = ResourceBundle.getBundle("com.sun.bookstore.messages.BookstoreMessages", 
    locale);
session.setAttribute("messages", messages);

The resource bundle base name for the JSTL versions of Duke’s Bookstore is set at deployment time through a context parameter. When a session is initiated, the resource bundle for the user’s locale is stored in the localization context. It is also possible to override the resource bundle at runtime for a given scope using the fmt:setBundle tag and for a tag body using the fmt:bundle tag.

The JavaServer Faces version of Duke’s Bookstore uses two methods for setting the resource bundle. One method is letting the JSP pages set the resource bundle using the f:loadBundle tag. This tag loads the correct resource bundle according to the locale stored in FacesContext.

<f:loadBundle basename="messages.BookstoreMessages"
    var="bundle"/>

For information on this tag, see Loading a Resource Bundle.

Another way a JavaServer Faces application sets the resource bundle is by configuring it in the application configuration file. There are two XML elements that you can use to set the resource bundle: message-bundle and resource-bundle.

If error messages are queued onto a component as a result of a converter or validator being registered on the component, then these messages are automatically displayed on the page using the message or messages tag. These messages must be registered with the application using the message-bundle element:

<message-bundle>
    resources.ApplicationMessages
</message-bundle>

For more information on using this element, see Registering Custom Error Messages.

Resource bundles containing messages that are explicitly referenced from a JavaServer Faces tag attribute using a value expression must be registered using the resource-bundle element of the configuration file:

<resource-bundle>
    <base-name>com.sun.bookstore6.resources.CustomMessages</base-name>
    <var>customMessages</var>
</resource-bundle>

For more information on using this element, see Registering Custom Localized Static Text.

Retrieving Localized Messages

A web component written in the Java programming language retrieves the resource bundle from the session:

ResourceBundle messages = (ResourceBundle)session.getAttribute("messages");

Then it looks up the string associated with the key Talk as follows:

messages.getString("Talk");
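Note that getString throws a MissingResourceException when the key is absent, so components often guard the lookup. The sketch below uses a small in-memory ListResourceBundle as a stand-in for the session attribute; the class MessageLookup, the helper lookup, and the choice of a ???key??? placeholder (the form the JSTL fmt:message tag uses for unknown keys) are illustrative assumptions.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class MessageLookup {
    // Stand-in bundle holding two keys from the default bundle excerpt.
    static final ResourceBundle MESSAGES = new ListResourceBundle() {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"TitleCashier", "Cashier"},
                {"Start", "Start Shopping"},
            };
        }
    };

    // Returns the localized string for the key, or a visible placeholder
    // when the bundle has no entry for it.
    public static String lookup(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return "???" + key + "???";
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(MESSAGES, "Start"));
        System.out.println(lookup(MESSAGES, "NoSuchKey"));
    }
}
```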

The JSP versions of the Duke’s Bookstore application use the fmt:message tag to provide localized strings for messages, HTML link text, button labels, and error messages:

<fmt:message key="Talk"/>

For information on the JSTL messaging tags, see Messaging Tags.

The JavaServer Faces version of Duke’s Bookstore retrieves messages using either the message or messages tag, or by referencing the message from a tag attribute using a value expression.

You can only use a message or messages tag to display messages that are queued onto a component as a result of a converter or validator being registered on the component. The following example shows a message tag that displays the error message queued on the userNo input component if the validator registered on the component fails to validate the value the user enters into the component.

<h:inputText id="userNo" value="#{UserNumberBean.userNumber}">
    <f:validateLongRange minimum="0" maximum="10" />
     ...
<h:message
     style="color: red;
     text-decoration: overline" id="errors1" for="userNo"/>

For more information on using the message or messages tags, see Displaying Error Messages with the message and messages Tags.

Messages that are not queued on a component and are therefore not loaded automatically are referenced using a value expression. You can reference a localized message from almost any JavaServer Faces tag attribute.

The value expression that references a message has the same notation whether you loaded the resource bundle with the loadBundle tag or registered it with the resource-bundle element in the configuration file.

The value expression notation is var.message, in which var matches the var attribute of the loadBundle tag or the var element defined in the resource-bundle element of the configuration file, and message matches the key of the message contained in the resource bundle, referred to by the var attribute.

Here is an example from bookstore.jsp:

<h:outputText value="#{bundle.Talk}"/>

Notice that bundle matches the var attribute from the loadBundle tag and that Talk matches the key in the resource bundle.

For information on using localized messages in JavaServer Faces, see Rendering Components for Selecting Multiple Values.

Date and Number Formatting

Java programs use the DateFormat.getDateInstance(int, locale) method to parse and format dates in a locale-sensitive manner. Java programs use the NumberFormat.getXXXInstance(locale) method, where XXX can be Currency, Number, or Percent, to parse and format numerical values in a locale-sensitive manner. The servlet version of Duke’s Bookstore uses the currency version of this method to format book prices.
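As a brief illustration of these two factory methods, the sketch below formats one price and one date for several locales. The class LocaleFormatting and its helper names are illustrative, and the exact output for non-US locales depends on the locale data shipped with the JDK in use.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;

public class LocaleFormatting {
    // Formats a price using the currency conventions of the given locale.
    public static String formatPrice(double price, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(price);
    }

    // Formats a date in the "full" style of the given locale.
    public static String formatShipDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.FULL, locale).format(date);
    }

    public static void main(String[] args) {
        Date now = new Date();
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE, Locale.GERMANY}) {
            System.out.println(locale + ": " + formatPrice(10.5, locale)
                + " / " + formatShipDate(now, locale));
        }
    }
}
```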

JSTL applications use the fmt:formatDate and fmt:parseDate tags to handle localized dates and use the fmt:formatNumber and fmt:parseNumber tags to handle localized numbers, including currency values. For information on the JSTL formatting tags, see Formatting Tags. The JSTL version of Duke’s Bookstore uses the fmt:formatNumber tag to format book prices and the fmt:formatDate tag to format the ship date for an order:

<fmt:formatDate value="${shipDate}" type="date"
    dateStyle="full"/>

The JavaServer Faces version of Duke’s Bookstore uses date/time and number converters to format dates and numbers in a locale-sensitive manner. For example, the same shipping date is converted in the JavaServer Faces version as follows:

<h:outputText value="#{cashier.shipDate}">
    <f:convertDateTime dateStyle="full"/>
</h:outputText>

For information on JavaServer Faces converters, see Using the Standard Converters.

Character Sets and Encodings

The following sections describe character sets and character encodings.

Character Sets

A character set is a set of textual and graphic symbols, each of which is mapped to a set of nonnegative integers.

One of the earliest character sets in widespread use in computing was US-ASCII. It is limited in that it can represent only American English. US-ASCII contains uppercase and lowercase Latin alphabets, numerals, punctuation, a set of control codes, and a few miscellaneous symbols.

Unicode defines a standardized, universal character set that can be extended to accommodate additions. When the Java program source file encoding doesn’t support Unicode, you can represent Unicode characters as escape sequences by using the notation \uXXXX, where XXXX is the character’s 16-bit representation in hexadecimal. For example, the Spanish version of the Duke’s Bookstore message file uses Unicode for non-ASCII characters:

{"TitleCashier", "Cajero"},
{"TitleBookDescription", "Descripci" + "\u00f3" + "n del Libro"},
{"Visitor", "El visitante n" + "\u00fa" + "mero "},
{"What", "Qu" + "\u00e9" + " libros leemos"},
{"Talk", " describe c" + "\u00f3" + "mo los componentes de software de web "
    + "pueden transformar la manera en que desarrollamos las "
    + "aplicaciones para la web. Este libro es obligatorio para "
    + "cualquier programador de respeto!"},
{"Start", "Empezar a Comprar"},
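To produce such escape sequences from text that contains non-ASCII characters, a small helper can do the conversion. The sketch below is similar in spirit to the native2ascii tool shipped with older JDKs, but the class UnicodeEscapes and the method escapeNonAscii are illustrative and not part of the tutorial's code.

```java
public class UnicodeEscapes {
    // Replaces every char outside 7-bit ASCII with a backslash, the letter u,
    // and the four-digit hexadecimal value of the char.
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The source literal holds the accented letter; the output shows it escaped.
        System.out.println(escapeNonAscii("Descripci\u00f3n del Libro"));
    }
}
```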

Character Encoding

A character encoding maps a character set to units of a specific width and defines byte serialization and ordering rules. Many character sets have more than one encoding. For example, Java programs can represent Japanese character sets using the EUC-JP or Shift-JIS encodings, among others. Each encoding has rules for representing and serializing a character set.

The ISO 8859 series defines 15 character encodings that can represent texts in dozens of languages. Each ISO 8859 character encoding can have up to 256 characters. ISO-8859-1 (Latin-1) comprises the ASCII character set, characters with diacritics (accents, diaereses, cedillas, circumflexes, and so on), and additional symbols.

UTF-8 (Unicode Transformation Format, 8-bit form) is a variable-width character encoding that encodes each Unicode character (code point) as one to four bytes. A byte in UTF-8 is equivalent to 7-bit ASCII if its high-order bit is zero; otherwise, the byte is part of a sequence of two to four bytes that together represent one character.
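The variable width of UTF-8 is easy to observe by encoding a few characters and counting bytes. In the sketch below, the class Utf8Widths and the helper utf8Length are illustrative names; the sample characters are chosen to cover each possible length.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    // Returns the number of bytes UTF-8 needs to encode the given text.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // outside the 16-bit range
        System.out.println(utf8Length("A"));       // ASCII letter
        System.out.println(utf8Length("\u00f3"));  // Latin small letter o with acute
        System.out.println(utf8Length("\u20ac"));  // euro sign
        System.out.println(utf8Length(emoji));     // supplementary character
    }
}
```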

UTF-8 is compatible with the majority of existing web content and provides access to the Unicode character set. Current versions of browsers and email clients support UTF-8. In addition, many new web standards specify UTF-8 as their character encoding. For example, UTF-8 is one of the two encodings that every XML processor is required to support (the other is UTF-16).

See Appendix Figure 37–6 for more information on character encodings in the Java 2 platform.

Web components usually use PrintWriter to produce responses; PrintWriter automatically encodes using ISO-8859-1. Servlets can also output binary data using OutputStream classes, which perform no encoding. An application that uses a character set that cannot use the default encoding must explicitly set a different encoding.
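The effect of writing characters through an encoding that cannot represent them can be shown without a servlet container. The sketch below writes the same text through a PrintWriter over an in-memory stream, once with ISO-8859-1 and once with UTF-8. The class WriterEncoding and the helper write are illustrative; in a real servlet the writer comes from the response object.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncoding {
    // Writes text through a PrintWriter that encodes with the given charset
    // and returns the bytes that reached the underlying stream.
    public static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(buffer, charset));
        writer.print(text);
        writer.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c"; // two kanji, not representable in ISO-8859-1
        byte[] latin1 = write(japanese, StandardCharsets.ISO_8859_1);
        byte[] utf8 = write(japanese, StandardCharsets.UTF_8);
        // Unmappable characters are replaced by question marks in ISO-8859-1.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        System.out.println(utf8.length + " bytes in UTF-8");
    }
}
```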

For web components, three encodings must be considered: the request encoding, the page encoding, and the response encoding.

Request Encoding

The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a web container will use the default encoding, ISO-8859-1, to parse request data.

If the client hasn’t set character encoding and the request data is encoded with a different encoding from the default, the data won’t be interpreted correctly. To remedy this situation, you can use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container. To control the request encoding from JSP pages, you can use the JSTL fmt:requestEncoding tag. You must call the method or tag before parsing any request parameters or reading any input from the request. Calling the method or tag once data has been read will not affect the encoding.
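The misinterpretation described above can be reproduced with plain strings: bytes that a browser encoded as UTF-8 are decoded with the ISO-8859-1 default, and the accented character turns into two unrelated characters. The class RequestDecoding and the helper decode are illustrative and stand in for what happens inside the container.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestDecoding {
    // Interprets raw request bytes using the given character encoding.
    public static String decode(byte[] raw, Charset encoding) {
        return new String(raw, encoding);
    }

    public static void main(String[] args) {
        // Parameter value as a browser would send it when the form page is UTF-8.
        byte[] sent = "Descripci\u00f3n".getBytes(StandardCharsets.UTF_8);

        // Decoded with the default encoding: the accented letter is garbled.
        System.out.println(decode(sent, StandardCharsets.ISO_8859_1));

        // Decoded with the encoding the client actually used: the text is intact.
        System.out.println(decode(sent, StandardCharsets.UTF_8));
    }
}
```

In a servlet, the equivalent of choosing the second decoding is calling setCharacterEncoding("UTF-8") on the request before any parameter is read.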

Page Encoding

For JSP pages, the page encoding is the character encoding in which the file is encoded.

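For JSP pages in standard syntax, the page encoding is determined from the following sources: the page encoding value of a JSP property group whose URL pattern matches the page, the pageEncoding attribute of the page directive, and the CHARSET value of the contentType attribute of the page directive.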

If none of these is provided, ISO-8859-1 is used as the default page encoding.

For JSP pages in XML syntax (JSP documents), the page encoding is determined as described in section 4.3.3 and appendix F.1 of the XML specification.

The pageEncoding and contentType attributes determine the page character encoding of only the file that physically contains the page directive. A web container raises a translation-time error if an unsupported page encoding is specified.

Response Encoding

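The response encoding is the character encoding of the textual response generated by a web component. The response encoding must be set appropriately so that the characters are rendered correctly for a given locale. A web container sets an initial response encoding for a JSP page from the following sources: the CHARSET value of the contentType attribute of the page directive, the encoding specified by the pageEncoding attribute of the page directive, and the page encoding value of a JSP property group whose URL pattern matches the page.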

If none of these is provided, ISO-8859-1 is used as the default response encoding.

The setCharacterEncoding, setContentType, and setLocale methods can be called repeatedly to change the character encoding. Calls made after the servlet response’s getWriter method has been called or after the response is committed have no effect on the character encoding. Data is sent to the response stream on buffer flushes (for buffered pages) or on encountering the first content on unbuffered pages.

Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute. Calls to setLocale set the character encoding only if neither setCharacterEncoding nor setContentType has set the character encoding before. To control the response encoding from JSP pages, you can use the JSTL fmt:setLocale tag.

To obtain the character encoding for a locale, the setLocale method checks the locale encoding mapping for the web application. For example, to map Japanese to the Japanese-specific encoding Shift_JIS, follow these steps:

  1. Select the WAR.

  2. Click the Advanced Settings button.

  3. In the Locale Character Encoding table, click the Add button.

  4. Enter ja in the Extension column.

  5. Enter Shift_JIS in the Character Encoding column.

If a mapping is not set for the web application, setLocale uses an Application Server mapping.

The first application in Chapter 5, JavaServer Pages Technology, allows a user to choose an English string representation of a locale from all the locales available to the Java 2 platform and then outputs a date localized for that locale. To ensure that the characters in the date can be rendered correctly for a wide variety of character sets, the JSP page that generates the date sets the response encoding to UTF-8 by using the following directive:

<%@ page contentType="text/html; charset=UTF-8" %>

Further Information about Internationalizing Web Applications

For a detailed discussion on internationalizing web applications, see the Java BluePrints for the Enterprise at http://java.sun.com/blueprints/enterprise.