Oracle® Application Server Globalization Support Guide
10g Release 2 (10.1.2)
B14004-02

4 Implementing HTML Features

This chapter contains the following topics:

4.1 Implementing HTML Features for Global Applications

There are a variety of HTML features to enhance your global Internet applications. The following sections discuss some of the most important HTML features to consider when designing your global applications.

4.2 Formatting HTML Pages to Accommodate Text in Different Languages

Design the format of HTML pages according to the following guidelines:

4.3 Encoding HTML Pages

The encoding of an HTML page is important information for a browser and an Internet application. You can think of the page encoding as the character set used for the locale that an Internet application is serving. The browser needs to know about the page encoding so that it can use the correct fonts and character set mapping tables to display pages. Internet applications need to know about the HTML page encoding so they can process input data from an HTML form. To correctly specify the page encoding for HTML pages, Internet applications must:

4.3.1 Choosing an HTML Page Encoding for Monolingual Applications

The HTML page encoding is based on the user's locale. If the application is monolingual, it supports only one locale per instance. Therefore, you should encode HTML pages in the native encoding for that locale. The encoding should be equivalent to the Oracle character set specified by the NLS_LANG parameter in the Oracle HTTP Server configuration file.

Table 4-1 lists the Oracle character set names for the native encodings of the most commonly used locales, along with the corresponding Internet Assigned Numbers Authority (IANA) encoding names and Java encoding names. Use these character sets for monolingual applications.

Table 4-1 Native Encodings for Commonly Used Locales

Language             Oracle Character Set Name  IANA Encoding Name  Java Encoding Name
-------------------  -------------------------  ------------------  ------------------
Arabic               AR8MSWIN1256               ISO-8859-6          ISO8859_6
Baltic               BLT8MSWIN1257              ISO-8859-4          ISO8859_4
Central European     EE8MSWIN1250               ISO-8859-2          ISO8859_2
Cyrillic             CL8MSWIN1251               ISO-8859-5          ISO8859_5
Greek                EL8MSWIN1253               ISO-8859-7          ISO8859_7
Hebrew               IW8MSWIN1255               ISO-8859-8          ISO8859_8
Japanese             JA16SJIS                   Shift_JIS           MS932
Korean               KO16MSWIN949               EUC-KR              MS949
Simplified Chinese   ZHS16GBK                   GB2312              GBK
Thai                 TH8TISASCII                TIS-620             TIS620
Traditional Chinese  ZHT16MSWIN950              Big5                MS950
Turkish              TR8MSWIN1254               ISO-8859-9          ISO8859_9
Universal            UTF8                       UTF-8               UTF8
Western European     WE8MSWIN1252               ISO-8859-1          ISO8859_1

4.3.2 Choosing an HTML Page Encoding for Multilingual Applications

Multilingual applications need to determine the encoding used for the current user's locale at runtime and map the locale to the encoding as shown in Table 4-1.

Instead of using different native encodings for different locales, you can use UTF-8 for all page encodings. Using the UTF-8 encoding not only simplifies the coding for multilingual applications but also supports multilingual content. In fact, if a multilingual Internet application is written in Perl, the best choice for the HTML page encoding is UTF-8 because this programming environment does not provide an intuitive and efficient way to convert HTML content from UTF-8 to the native encodings of various locales.
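The runtime mapping from locale to page encoding can be sketched in Java as a simple lookup table. This is a minimal sketch, not an Oracle API: the class name PageEncodings and its methods are hypothetical, the entries are the IANA names that Table 4-1 lists for a few languages, and falling back to UTF-8 for unlisted locales is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a user's locale to an IANA page encoding name.
final class PageEncodings {
    private static final Map<String, String> BY_LANGUAGE = new HashMap<String, String>();
    static {
        // IANA encoding names taken from Table 4-1
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("ko", "EUC-KR");
        BY_LANGUAGE.put("th", "TIS-620");
        BY_LANGUAGE.put("el", "ISO-8859-7");
        BY_LANGUAGE.put("tr", "ISO-8859-9");
        BY_LANGUAGE.put("de", "ISO-8859-1");
        BY_LANGUAGE.put("fr", "ISO-8859-1");
    }

    // Returns the native encoding for the locale, or UTF-8 (an assumed default).
    static String forLocale(Locale locale) {
        String encoding = BY_LANGUAGE.get(locale.getLanguage());
        return encoding != null ? encoding : "UTF-8";
    }

    // Builds a Content-Type value for an HTML page served to this locale.
    static String contentType(Locale locale) {
        return "text/html; charset=" + forLocale(locale);
    }
}
```

An application that serves every locale in UTF-8 would not need such a table at all, which is part of the simplification that UTF-8 offers.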

4.3.3 Specifying the Page Encoding for HTML Pages

The best practice for monolingual and multilingual applications is to specify the encoding of HTML pages returned to the client browser. The encoding of HTML pages can tell the browser to:

  • Switch to the specified encoding

  • Return user input in the specified encoding

The following sections explain how to specify the encoding of an HTML page:

If you use both methods, then specifying the encoding in the HTTP header takes precedence.

4.3.3.1 Specifying the Encoding in the HTTP Header

Include the Content-Type HTTP header in the HTTP response. It specifies the content type and character set. The most commonly used browsers, such as Netscape 4.0 or later and Internet Explorer 4.0 or later, correctly interpret this header. The Content-Type HTTP header has the following form:

Content-Type: text/html; charset=iso-8859-4

The charset parameter specifies the encoding for the HTML page. The possible values for the charset parameter are the IANA names for the character encodings that the browser supports. Table 4-1 shows commonly used IANA names.
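To make the structure of the header value concrete, the following Java sketch extracts the charset parameter from a Content-Type value. The class and method names are hypothetical, and the parsing is deliberately simplified: it handles the common "type; charset=name" form and a quoted value, not every construct that the HTTP grammar allows.

```java
import java.util.Locale;

final class ContentTypeCharset {
    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the MIME type; parameters start at index 1
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // the value may be written as a quoted string
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }
}
```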

4.3.3.2 Specifying the Encoding in the HTML Page Header

Use this method primarily for static HTML pages. Specify the character encoding in the HTML header as follows:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

The charset parameter specifies the encoding for the HTML page. The possible values for the charset parameter are the IANA names for the character encodings that the browser supports. Table 4-1 shows commonly used IANA names.

4.3.4 Specifying the Page Encoding in Java Servlets and Java Server Pages

For both monolingual and multilingual applications, you can specify the encoding of an HTML page in the Content-Type HTTP header in a Java Server Page (JSP) using the contentType page directive. For example:

<%@ page contentType="text/html; charset=utf-8" %>

This is the MIME type and character encoding that the JSP file uses for the response it sends to the client. You can use any MIME type or IANA character set name that is valid for the JSP container. The default MIME type is text/html, and the default character set is ISO-8859-1. In the example, the character set is set to UTF-8. The character set of the contentType page directive directs the JSP engine to encode the dynamic HTML page and set the HTTP Content-Type header with the specified character set.

For Java Servlets, you can call the setContentType() method of the Servlet API to specify a page encoding in the HTTP header. The following doGet() function shows how you should call this method:

public void doGet(HttpServletRequest request, HttpServletResponse response)
throws ServletException, IOException 
{

    // generate the MIME type and character set header
    response.setContentType("text/html; charset=utf-8");
    ...
    // generate the HTML page
    PrintWriter out = response.getWriter();
    out.println("<HTML>");
    ...
    out.println("</HTML>");
}

You should call the setContentType() method before the getWriter() method because the getWriter() method initializes an output stream writer that uses the character set specified by the setContentType() method call. Any HTML content written to the writer and eventually to a browser is encoded in the encoding specified by the setContentType() call.

4.3.5 Specifying the Page Encoding in Oracle PL/SQL Server Pages

You can specify page encoding for PL/SQL front-end applications and Oracle PL/SQL Server Pages (PSP) in two ways:

  • Specify the page encoding in the NLS_LANG parameter in the corresponding database access descriptor (DAD). Use this method for monolingual applications so you can change the page encoding without changing the application code to support a different locale.

  • Specify the page encoding explicitly from within the PL/SQL procedures and PSP. A page encoding that is specified explicitly overwrites the page encoding inherited from the NLS_LANG character set. Use this method for multilingual applications so that they can use different page encodings for different locales at runtime.

The specified page encoding tells the mod_plsql module and the Web Toolkit to tag the corresponding charset parameter in the Content-Type header of an HTML page and to convert the page content to the corresponding character set.


See Also:


4.3.5.1 Specifying the Page Encoding in PL/SQL for Monolingual Environments

In order for monolingual applications to take the page encoding from the NLS_LANG parameter, the Content-Type HTTP header should not specify a page encoding. For PL/SQL procedures, the call to mime_header() should be similar to the following:

owa_util.mime_header('text/html',false);

For PSP, the content type directive should be similar to the following:

<%@ page contentType="text/html"%>

If the page encoding is not specified in the mime_header() function call or the content type directive, then the Web Toolkit API uses the NLS_LANG character set as the page encoding by default, and converts HTML content to the NLS_LANG character set. Also, the Web Toolkit API automatically adds the default page encoding to the charset parameter of the Content-Type header.

4.3.5.2 Specifying the Page Encoding in PL/SQL for Multilingual Environments

You can specify page encoding in a PSP the same way that you specify it in a JSP page. The following directive tells the PSP compiler to generate code to set the page encoding in the HTTP Content-Type header for the page:

<%@ page contentType="text/html; charset=utf-8" %>

To specify the encoding in the Content-Type HTTP header for PL/SQL procedures, use the Web Toolkit API in the PL/SQL procedures. The Web Toolkit API includes the OWA_UTIL package, which allows you to specify the Content-Type header as follows:

owa_util.mime_header('text/html', false, 'utf-8');

You should call the mime_header() function in the context of the HTTP header. It generates the following Content-Type header in the HTTP response:

Content-Type: text/html; charset=utf-8

After you specify a page encoding, the Web Toolkit API converts HTML content to the specified page encoding.

4.3.6 Specifying the Page Encoding in Perl

For Perl scripts running in the mod_perl environment, specify the encoding for an HTML page in the HTTP Content-Type header as follows:

$page_encoding = 'utf-8';
$r->content_type("text/html; charset=$page_encoding");
$r->send_http_header;
return OK if $r->header_only;

4.3.6.1 Specifying the Page Encoding in Perl for Monolingual Applications

For monolingual applications, the encoding of an HTML page should be equivalent to:

  • The character set used for the POSIX locale on which a Perl script runs

  • The Oracle character set specified by the NLS_LANG parameter if the Perl script accesses the database

4.3.6.2 Specifying the Page Encoding in Perl for Multilingual Applications

For multilingual applications, Perl scripts should run in an environment where:

  • Both the NLS_LANG character set and the character set used for the POSIX locale are equivalent to UTF-8

  • The UTF8 Perl pragma is used

    This pragma tells the Perl interpreter to encode identifiers and strings in the UTF-8 encoding.

This environment allows the scripts to process data in any language in UTF-8. The page encoding of the dynamic HTML pages generated from the scripts, however, could be different from UTF-8. If so, then use the Unicode::MapUTF8 Perl module to convert data from UTF-8 to the page encoding.


See Also:

http://www.cpan.org to download the Unicode::MapUTF8 Perl module

The following example illustrates how to use the Unicode::MapUTF8 Perl module to generate HTML pages in the Shift_JIS encoding:

use Unicode::MapUTF8 qw(from_utf8);
# This shows how the UTF8 Perl pragma is specified 
# but is NOT required by the from_utf8 function.
use utf8; 
...
$page_encoding = 'Shift_JIS';
$r->content_type("text/html; charset=$page_encoding");
$r->send_http_header;
return OK if $r->header_only;
...
#html_lines contains HTML content in UTF-8
print (from_utf8({ -string=>$html_lines, -charset=>$page_encoding}));
...

The from_utf8() function converts dynamic HTML content from UTF-8 to the character set specified in the charset argument.

4.3.7 Specifying the Page Encoding in Oracle Application Server Mobile Services Applications

The page encoding for a mobile services application is specified in the application in the same way as other Java or JSP Internet applications. The page encoding specifies the encoding of the Mobile XML generated by the application, and it should be consistently specified in the Mobile XML prolog and the HTTP Content-Type header. The HelloGlobe.jsp application illustrates how the page encoding for the Mobile XML prolog should be specified.

Example 4-1 HelloGlobe.jsp

<?xml version="1.0" encoding="UTF-8"?>   (1)
<%@ page contentType="text/vnd.oracle.mobilexml; charset=UTF-8"%>   (2)
<SimpleResult>
   <SimpleContainer>
      <SimpleForm title="Hello Globe"
                  target="HelloGlobeReply.jsp" method="POST">
         <SimpleFormItem name="UserName" title="Your Name:" />
      </SimpleForm>
   </SimpleContainer>
</SimpleResult>

In this example, line (1) sets the content encoding in the XML prolog, and line (2) sets the content encoding in the HTTP Content-Type header.

Oracle Application Server Wireless converts the Mobile XML into the page encoding supported by the target device from the encoding information specified in the XML prolog and the HTTP Content-Type header. It then renders the content in the markup language supported by the target device. If the encodings specified in the XML prolog and the HTTP Content-Type header are inconsistent, then the Oracle Application Server Wireless Mobile XML conversion will fail.

4.3.8 Specifying the Page Encoding in Oracle Application Server Web Cache Enabled Applications

When an edge side include (ESI) fragment is in a different page encoding from that of the corresponding ESI template, Oracle Application Server Web Cache converts the fragment to the page encoding of the template. This is to avoid cases where the content of a cached page is constructed in multiple page encodings. The character set conversion in Oracle Application Server Web Cache takes place only when both the template's and fragment's page encodings are known. Otherwise Oracle Application Server Web Cache assumes they are in the same page encoding, and therefore embeds the fragment into the template without converting the fragment.

Oracle Application Server Web Cache looks for the page encoding information only in the Content-Type header of an HTTP response. It does not look for the page encoding information within the content of the HTTP response.

To avoid losing information during the character set conversion of ESI fragments to ESI templates, applications should use a page encoding for ESI fragments that is a subset of the ESI template page encoding. There are two basic best practices for developers to consider:

  • Use UTF-8 as the page encoding for ESI templates, since UTF-8 is a superset of all other non-Unicode page encodings.

  • Use the same page encoding for ESI fragments and ESI templates. Character set conversion will not happen in this case.
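The subset requirement can be illustrated outside Web Cache with the standard Java charset API. The sketch below is not how Oracle Application Server Web Cache performs its conversion. It only checks whether a piece of fragment text can be represented in a given template encoding, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

final class EncodingSubsetCheck {
    // True if every character of the text can be represented in the named encoding.
    static boolean fitsIn(String text, String ianaName) {
        return Charset.forName(ianaName).newEncoder().canEncode(text);
    }
}
```

For example, a fragment containing the Euro character fits in a UTF-8 template but not in an ISO-8859-1 template, so converting it to ISO-8859-1 would lose that character.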

4.3.9 Specifying the Page Encoding in Oracle Application Server Reports Services Applications

The page encodings that you use for different types of Reports Services applications depend on what type of report you are creating. This section discusses the page encoding options for Reports Services.

4.3.9.1 Specifying the Page Encoding in JSP Reports for the Web

You can specify the page encoding in JSP or HTML with the Web Source Editor in Reports Builder.

4.3.9.2 Specifying the Page Encoding in HTML for Oracle Application Server Reports Services

Specify the HTML page encoding in the page header. For example, to specify a Japanese character set, include the following tag in the page header:

<META http-equiv="Content-Type" content="text/html;charset=Shift_JIS">

Reports Builder puts this tag in your report using the Before Report Value and Before Form Value properties. The default values for these properties are similar to the following:

<html><head><meta http-equiv="Content-Type" content="text/html;charset=&Encoding"></head>

The IANA encoding name that is equivalent to the NLS_LANG character set for Oracle Reports is assigned to &Encoding dynamically at runtime. Thus you do not need to modify your report or Oracle Reports settings to include the proper encoding.


See Also:

Reports Builder online help for more information

4.3.9.3 Specifying the Page Encoding in XML for Oracle Reports

Generally, when using XML, you would specify the encoding for XML by including a statement similar to the following as the prolog at the first line in the XML output file:

<?xml version="1.0" encoding="Shift_JIS"?>

To set this prolog in your report, you can specify the XML Prolog Value property of your report or use the SRW.SET_XML_PROLOG built-in. The default value for the XML Prolog Value property is:

<?xml version="1.0" encoding="&Encoding"?>

In this case, Reports translates the value set as the NLS_CHARACTERSET into what is expected in the XML specification.


Note:

You can overwrite the mapping by adding entries to your REPORTS_NLS_XML_CHARSET. The syntax is:
old_name1=new_name1[;old_name2=new_name2][;old_name3=new_name3]...

Example:

ISO-8859-8=ISO-8859-8-I;CSEUCKR=EUC-KR;WINDOWS-949=EUC-KR;EUC-CN=GBK;WINDOWS-936=GBK


See Also:

Reports Builder online help for more information

4.4 Encoding URLs

If HTML pages contain URLs with embedded query strings, you must escape any non-ASCII bytes in the query strings in the %XX format, where XX is the hexadecimal representation of the binary value of the byte. For example, if an Internet application embeds a URL that points to a UTF-8 JSP containing the German name "Schloß," then the URL should be encoded as follows:

http://host.domain/actionpage.jsp?name=Schlo%c3%9f

In the preceding URL, c3 and 9f represent the binary value in hexadecimal of the ß character in the UTF-8 encoding.

To encode a URL, be sure to complete the following tasks:

  1. Convert the URL into the encoding expected from the target object. This encoding is usually the same as the page encoding used in your application.

  2. Escape non-ASCII bytes of the URL into the %XX format.

    Most programming environments provide APIs to encode and decode URLs. The following sections describe URL encoding in various environments:

4.4.1 Encoding URLs in Java

If you construct a URL in a JSP or Java Servlet, you must escape all 8-bit bytes using their hexadecimal values prefixed by a percent sign. The URLEncoder.encode(String s, String enc) function provided in JDK 1.4 and later enables you to escape the URL in a given HTML page encoding. You need to specify the proper Java encoding name that corresponds to the page encoding in the second argument. See Table 4-1 for the Java encoding names of some commonly used page encodings.
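As a minimal sketch of the two-argument form, the following Java code builds the example URL from the previous section. The class and method names are hypothetical. The second argument is the Java encoding name UTF8 from Table 4-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryStringExample {
    // Builds the example URL, escaping the parameter value as UTF-8 bytes.
    static String buildUrl(String name) {
        try {
            return "http://host.domain/actionpage.jsp?name="
                    + URLEncoder.encode(name, "UTF8");
        } catch (UnsupportedEncodingException e) {
            // Not expected: every Java platform supports UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling buildUrl("Schlo\u00df") escapes the ß character as %C3%9F. URLEncoder writes the hexadecimal digits in uppercase, which is equivalent to the lowercase form shown earlier because the digits in a %XX escape are not case-sensitive.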

If you are using JDK 1.3, then only the URLEncoder.encode(String s) function is available. It only encodes a URL in the Java default encoding. To make this function work for URLs in any encoding, you must add code to escape any non-ASCII characters in a URL into their hexadecimal representation, based on the encoding of your choice.

The following code shows an example of how to encode a URL based on the UTF-8 encoding:

// Characters that are copied through unescaped; a space is written as '+'.
String unreserved = "/\\- _.!~*'()"
    + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
StringBuffer out = new StringBuffer(url.length());
for (int i = 0; i < url.length(); i++)
{
   int c = (int) url.charAt(i);
   if (unreserved.indexOf(c) != -1) {
        if (c == ' ') c = '+';
        out.append((char)c);
        continue;
   }
   byte [] ba;
   try {
        ba = url.substring(i, i+1).getBytes("UTF8");
   } catch (UnsupportedEncodingException e) {
        // fall back to the default encoding for this character only
        ba = url.substring(i, i+1).getBytes();
   }
   for (int j=0; j < ba.length; j++)
   {
        // always write two hexadecimal digits for each byte
        int b = ba[j] & 0xff;
        out.append('%');
        if (b < 0x10) out.append('0');
        out.append(Integer.toHexString(b).toUpperCase());
   }
}
String encodedUrl = out.toString();

4.4.2 Encoding URLs in PL/SQL

In Oracle Application Server, you can encode a URL in PL/SQL by calling the ESCAPE() function in the UTL_URL package. You can call the ESCAPE() function as follows:

encodedURL varchar2(100);
url varchar2(100); 
charset varchar2(40); 
...
encodedURL := UTL_URL.ESCAPE(url, FALSE, charset);

The url argument is the URL that you want to encode. The charset argument specifies the character encoding used for the encoded URL. Use a valid Oracle character set name for the charset argument. To encode a URL in the database character set, always specify the charset argument as NULL.


See Also:

Table 4-1 for a list of commonly used Oracle character set names

4.4.3 Encoding URLs in Perl

You can encode a URL in Perl by using the escape_uri() function of the Apache::Util module as follows:

use Apache::Util qw(escape_uri);
...
$escaped_url   = escape_uri( $url );
...

The escape_uri() function takes the bytes from the $url input argument and encodes them into the %XX format. If you want to encode a URL in a different character encoding, then you need to convert the URL to the target encoding before calling the escape_uri() function. Perl provides some modules for character conversion.


See Also:

http://www.cpan.org for Perl character conversion modules

4.5 Handling HTML Form Input

Applications generate HTML forms to get user input. For Netscape and Microsoft Internet Explorer browsers, the encoding of the input always corresponds to the encoding of the forms for both POST and GET requests. If the encoding of a form is UTF-8, then input text the browser returns is encoded in UTF-8. Internet applications can control the encoding of the form input by specifying the corresponding encoding in the HTML form that requests information.

How a browser passes input in a POST request is different from how it passes input in a GET request:

HTML standards allow named and numbered entities. These special codes allow users to specify characters. For example, &aelig; and &#230; both refer to the character æ. Tables of these entities are available at

http://www.w3.org/TR/REC-html40/sgml/entities.html

Some browsers generate numbered or named entities for any input character that cannot be encoded in the encoding of an HTML form. For example, the Euro character and the character à (Unicode values 8364 and 224 respectively) cannot be encoded in Big5 encoding and are sent as &#8364; and &agrave; when the HTML encoding is Big5. However, the browser does not need to generate numbered or named entities if the page encoding of the HTML form is UTF-8 because all characters can be encoded in UTF-8. Internet applications that support page encoding other than UTF-8 need to be able to handle numbered and named entities.
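One way an application might handle numbered entities in form input is to replace them with the characters they denote before further processing. The Java sketch below covers decimal references only. The class and method names are hypothetical, and named entities such as &agrave; would need a lookup table that is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntities {
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    // Replaces decimal references such as &#8364; with the characters they denote.
    static String decode(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();   // leave out-of-range references untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The decoded text may contain characters that the page encoding cannot represent, so an application would typically apply this step only when it stores or processes the data in Unicode.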

4.5.1 Handling HTML Form Input in Java

In most JSP and Servlet containers the Servlet API implementation assumes that incoming form input is in ISO-8859-1 encoding. As a result, when the HttpServletRequest.getParameter( ) API is called, all embedded %XX data in the input text is decoded, the decoded input is converted from ISO-8859-1 to Unicode, and returned as a Java string. The Java string returned is incorrect if the encoding of the HTML form is not ISO-8859-1. However, you can work around this problem by converting the form input data. When a JSP or Java Servlet receives form input, it converts it back to the original form in bytes, and then converts the original form to a Java string based on the correct encoding.

The following code converts a Java string to the correct encoding. The Java string real is initialized to store the correct characters from a UTF-8 form:

String original = request.getParameter("name");
// declare the result outside the try block so it stays in scope afterward
String real;
try 
{
    real = new String(original.getBytes("8859_1"),"UTF8");
} 
catch (UnsupportedEncodingException e) 
{
    real = original;
} 

In addition to Java encoding names, you can use IANA encoding names as aliases in Java functions.
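The effect of this workaround can be reproduced without a servlet container. The following self-contained Java sketch first simulates a container that decodes UTF-8 form bytes as ISO-8859-1, then applies the same byte round trip to recover the text. The class and method names are hypothetical, and the code assumes a JDK that provides java.nio.charset.StandardCharsets (Java 7 or later) rather than the JDK versions discussed in this guide.

```java
import java.nio.charset.StandardCharsets;

final class FormInputRepair {
    // Simulates a container that decodes UTF-8 form bytes as ISO-8859-1.
    static String misdecode(String typed) {
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes and decodes them with the real form encoding.
    static String repair(String misdecoded) {
        byte[] wire = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so converting the incorrect string back to ISO-8859-1 bytes restores the bytes that the browser sent.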


See Also:

Table 4-1 for mapping between commonly used IANA and Java encoding names

OC4J implements Servlet API 2.3, from which you can get the correct input by setting the CharEncoding attribute of the HTTP request object before calling the getParameter() function. Use the following code:

request.setCharacterEncoding("UTF8");
String real = request.getParameter("name");

4.5.2 Handling HTML Form Input in PL/SQL

The browser passes form input to PL/SQL procedures as PL/SQL procedure arguments. When a browser issues a POST or a GET request, it first sends the form input to the mod_plsql module in the encoding of the requesting HTML form. The mod_plsql module then decodes all %XX escape sequences in the input to their actual binary representations. It then passes the input to the PL/SQL procedure serving the request.

You should construct PL/SQL arguments you use to accept form input with the VARCHAR2 datatype. Data in VARCHAR2 are always encoded in the database character set. For example, the following PL/SQL procedure accepts two parameters in VARCHAR2:

procedure test(name VARCHAR2, gender VARCHAR2) is
begin
...
end;

By default, the mod_plsql module assumes that the arguments of a PL/SQL procedure are in VARCHAR2 datatype when it binds them. Using VARCHAR2 as the argument datatype means that the module uses Oracle Character Set Conversion facility provided in Oracle Callable Library to convert form input data properly from the NLS_LANG character set, which is also your page encoding, to the database character set. The corresponding DAD specifies the NLS_LANG character set. As a result, the arguments passed as VARCHAR2 should already be encoded in the database character set and be ready to use within the PL/SQL procedures.

4.5.2.1 Handling HTML Form Input in PL/SQL for Monolingual Applications

For monolingual application deployment, the NLS_LANG character set specified in the DAD is the same as the character set of the form input and the page encoding chosen for the locale. As a result, form input passed as VARCHAR2 arguments should be transparently converted to the database character set and ready for use.

4.5.2.2 Handling HTML Form Input in PL/SQL for Multilingual Applications

For multilingual application deployment, form input can be encoded in different character sets depending on the page encodings you choose for the corresponding locales. You cannot use Oracle Character Set Conversion facility because the character set of the form input is not always the same as the NLS_LANG character set. Relying on this conversion corrupts the input. To resolve this problem, disable Oracle Character Set Conversion facility by specifying the same NLS_LANG character set in the corresponding DAD as the database character set. Once you disable the conversion, PL/SQL procedures receive form input as VARCHAR2 arguments. You must convert the arguments from the form input encoding to the database character set before using them. You can use the following code to convert the argument from ISO-8859-1 character set to UTF-8:

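procedure test(name VARCHAR2, gender VARCHAR2) is
   -- IN parameters are read-only, so convert into local variables
   -- (the l_ names are illustrative)
   l_name   VARCHAR2(4000);
   l_gender VARCHAR2(4000);
begin
   -- CONVERT(text, destination character set, source character set)
   l_name   := CONVERT(name, 'UTF8', 'WE8MSWIN1252');
   l_gender := CONVERT(gender, 'UTF8', 'WE8MSWIN1252');
...
end;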

4.5.3 Handling HTML Form Input in Perl

In the Oracle HTTP Server mod_perl environment, GET requests pass input to a Perl script differently than POST requests. It is good practice to handle both types of requests in the script. The following code gets the input value of the name parameter from an HTML form:

my $r = shift;
my %params = $r->method eq 'POST' ? $r->content : $r->args ;
my $name = $params{'name'} ;

For multilingual Perl scripts, the page encoding of an HTML form may be different from the UTF-8 encoding used in the Perl scripts. In this case, input data should be converted from the page encoding to UTF-8 before being processed. The following example illustrates how the Unicode::MapUTF8 Perl module converts strings from Shift_JIS to UTF-8:

use Unicode::MapUTF8 qw(to_utf8);
# This is to show how the UTF8 Perl pragma is specified, 
# and is NOT required by the to_utf8 function.
use utf8; 
...
my $page_encoding = 'Shift_JIS';
my $r = shift;
my %params = $r->method eq 'POST' ? $r->content : $r->args ;
my $name = to_utf8({-string=>$params{'name'}, -charset=>$page_encoding});
...

The to_utf8() function converts any input string from the specified encoding to UTF-8.

4.5.4 Handling Form Input in Oracle Application Server Mobile Services Applications

When a mobile service is registered to Oracle Application Server Wireless using the Wireless Tools administration tool, the Input Encoding parameter of the service must be specified. Oracle Application Server Wireless encodes URL parameters using the encoding specified in the Input Encoding parameter of the service. The mobile service application should be written so that it uses the same encoding as the Input Encoding parameter to interpret input from the target mobile devices. The HelloGlobeReply.jsp example illustrates how to handle the response from the service HelloGlobe.jsp, which is described in Example 4-1.

Example 4-2 HelloGlobeReply.jsp

<?xml version="1.0" encoding="UTF-8"?>
<%@ page contentType="text/vnd.oracle.mobilexml; charset=UTF-8"%>
<%
   request.setCharacterEncoding("UTF-8");                             (1)
   String name = request.getParameter("UserName");
%>
<SimpleResult>
   <SimpleContainer>
      <SimpleText>
         <SimpleTextItem>Hello <%=name%> !</SimpleTextItem>
      </SimpleText>
   </SimpleContainer>
</SimpleResult>

In this example, line (1) specifies that parameters are encoded using UTF-8.

This assumes that the Input Encoding parameter is specified as UTF-8 when the Master Service of HelloGlobe.jsp is created. The mobile service application should specify the same encoding for all input parameters that are received from the target device.

4.6 Decoding HTTP Headers

In all HTTP headers specific to Oracle Application Server, any value containing non-ASCII characters is MIME encoded according to the RFC 2047 specification. The encoded headers must be properly decoded before being used in an application. Applications deployed on Oracle Application Server may receive these HTTP headers.
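To show what such a value looks like, the following Java sketch decodes a single RFC 2047 encoded-word of the form =?charset?B?base64-text?=. It is a simplified illustration under stated assumptions: the class and method names are hypothetical, only the Base64 ("B") form is handled, and the code relies on java.util.Base64 (Java 8 or later). A complete decoder would also handle the "Q" form and values that mix several encoded-words with plain text.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodedWord {
    // =?charset?B?base64-text?=  (only the "B" form is handled in this sketch)
    private static final Pattern B_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([A-Za-z0-9+/=]*)\\?=");

    static String decode(String headerValue) {
        Matcher m = B_WORD.matcher(headerValue);
        if (!m.matches()) {
            return headerValue;   // plain ASCII value or an unsupported form
        }
        byte[] raw = Base64.getDecoder().decode(m.group(2));
        return new String(raw, Charset.forName(m.group(1)));
    }
}
```

For example, the header value =?UTF-8?B?U2NobG/Dnw==?= carries the UTF-8 bytes of the name Schloß, while a plain ASCII value such as jsmith is returned unchanged.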

4.6.1 Decoding HTTP Headers from Oracle Application Server Single Sign-On

When applications are using Oracle Application Server Single Sign-On to authenticate a user, they need to decode the headers that Oracle Application Server Single Sign-On sends. The headers whose values may contain encoded non-ASCII characters include:

  • REMOTE_USER

  • Osso-User-Dn

  • Osso-Subscriber

  • Osso-Subscriber-Dn

For Java-based Web applications deployed on OC4J, the REMOTE_USER header is already interpreted in the HttpServletRequest.getRemoteUser() method, and the REMOTE_USER header is removed from HTTP requests. For other types of Web applications, the REMOTE_USER header is present and should be properly decoded along with other headers.

To decode a header value, you may use the javax.mail.internet.MimeUtility class of the JavaMail API. See Example 4-3, "Decoding a User's Display Name" for an example of decoding.

For PL/SQL applications, you need to write your own code to decode these header values.

4.6.2 Decoding String-type Mobile Context Information Headers in Oracle Application Server Wireless Services

String-type mobile context information, such as Login User Name (X-Oracle-User.name), User Display Name (X-Oracle-User.DisplayName), and Address Line of the Location (X-Oracle.User.Location.AddressLine1), is MIME encoded in the HTTP headers. Applications must decode these values after retrieving them from the HTTP request. Example 4-3 shows how a JSP application may retrieve and decode the user's display name.

Example 4-3 Decoding a User's Display Name


<%@ page import="java.io.*" %>
<%@ page import="javax.mail.internet.MimeUtility" %>
<%
   String rawDisplayName = request.getHeader("X-Oracle-User.DisplayName");
   String displayName = null;
   try
   {
      displayName = MimeUtility.decodeText(rawDisplayName);
   }
   catch (UnsupportedEncodingException e)
   {
      // don't care
      displayName = rawDisplayName;
   }
%>

4.7 Organizing the Content of HTML Pages for Translation

You should translate the user interface (UI) and content presented in HTML pages. Translatable sources for the content of an HTML page belong to the following categories:

This section contains the following topics:

4.7.1 Translation Guidelines for HTML Page Content

When creating translatable content, developers should follow these translation guidelines:

  • Externalize to resource files all static and translatable UI strings used in programs such as Java Servlets, Java Server Pages, Perl scripts, PL/SQL procedures, and PL/SQL Server Pages. These resource files can then be translated independent of program code.

  • All dynamic text in an HTML page must be able to expand by at least 30% without overlapping adjacent objects to allow for text expansion that can result from translation. The HTML page should look acceptable after expanding strings by 30%.

  • Avoid concatenating strings to form sentences at runtime. The concatenated translated strings might not have the same meaning as the original strings. Use the string formatting functions provided by different programming languages to substitute runtime values for placeholders.

  • Avoid embedding text into images and graphics because they are often not easy to translate.

  • JavaScript code must not include any translatable strings. JavaScript is hard to translate. Instead, applications should externalize translatable strings, if any, into resource files or message tables. Applications should construct JavaScript code at runtime and replace the dynamic text with text corresponding to the user's locale.

  • Because translations are often not available in the initial release of an application, it is important to make the application work when the corresponding translation is not available by putting a fallback mechanism in the application. The fallback mechanism can be as simple as using English information or as complex as using the closest language available. For example, the fr-CA locale is French Canadian. The fallback for this language can be fr (French) or en (English). A simple way to find the closest possible language is to remove the territory part of the ISO locale name. The behavior of the fallback mechanism is up to the application.
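
The guideline about avoiding string concatenation can be illustrated in Java with the java.text.MessageFormat class. The following is a minimal sketch: the class name FormatDemo, the method render(), and both message patterns (including the German wording) are hypothetical. It shows that a translated pattern can reorder the placeholders, which is not possible when a sentence is assembled by concatenation.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical helper: substitutes runtime values into a translatable pattern.
class FormatDemo {
    static String render(String pattern, Locale locale, Object... args) {
        // The locale controls how numbers and dates among the arguments are formatted.
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // In a real application the patterns come from a resource file.
        String english = "{0} uploaded {1} files.";
        String german  = "{1} Dateien von {0} hochgeladen.";   // placeholders reordered

        System.out.println(render(english, Locale.ENGLISH, "Ann", 3));
        System.out.println(render(german, Locale.GERMAN, "Ann", 3));
    }
}
```

Note that MessageFormat treats the apostrophe as a quoting character, so patterns that contain a literal apostrophe must double it.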

4.7.2 Organizing Static Files for Translation

You should organize translatable HTML, images, and CSS files into different directories from non-translatable static files so that you can zip files under the locale-specific directory for translation. There are many possible ways to define the directory structure to hold these files. For example:

/docroot/images         - Non-translatable images
/docroot/html           - HTML pages common to all languages
/docroot/css            - Style sheets common to all languages
/docroot/<lang>         - Locale directory such as en, fr, ja, and so on.
/docroot/<lang>/images  - Images specific for <lang>
/docroot/<lang>/html    - HTML pages specific for <lang>
/docroot/<lang>/css     - Style sheets specific for <lang>

You can replace the <lang> placeholder with the ISO locale names. Based on the preceding structure, you must write a utility function called getLocalizedURL() that takes a URL as a parameter and looks for the available language file in this structure. Whenever you reference an HTML, image, or CSS file in an HTML page, the Internet application should call this function to construct the path of the translated file corresponding to the current locale and fall back appropriately if the translation does not exist. For example, if the path /docroot/html/welcome.html is passed to the getLocalizedURL() function and the current locale is fr_CA, then the function looks for the following files in the order shown:

/docroot/fr_CA/html/welcome.html
/docroot/fr/html/welcome.html
/docroot/en/html/welcome.html
/docroot/html/welcome.html

The function returns the first file that exists. This function always reverts to English when the translated version corresponding to the current locale does not exist.
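
The lookup order described above can be sketched in Java as follows. This is one possible shape for getLocalizedURL(), not a supplied API: the class name LocalizedUrl, the candidates() helper, the fixed /docroot prefix, and the exists predicate are all assumptions made for illustration. The existence check is passed in so that the application can decide how to test whether a file is present.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the getLocalizedURL() utility described in the text.
class LocalizedUrl {
    private static final String DOCROOT = "/docroot";   // assumed document root

    // Builds the search list: full locale, language only, English, then the original path.
    static List<String> candidates(String path, String locale) {
        List<String> result = new ArrayList<>();
        if (path.startsWith(DOCROOT + "/")) {
            String rest = path.substring(DOCROOT.length());      // e.g. /html/welcome.html
            result.add(DOCROOT + "/" + locale + rest);
            int sep = locale.indexOf('_');
            if (sep > 0) {
                // Drop the territory part: fr_CA becomes fr
                result.add(DOCROOT + "/" + locale.substring(0, sep) + rest);
            }
            String english = DOCROOT + "/en" + rest;
            if (!result.contains(english)) {
                result.add(english);                              // English fallback
            }
        }
        result.add(path);                                         // common, untranslated file
        return result;
    }

    // Returns the first candidate for which the supplied check succeeds.
    static String getLocalizedURL(String path, String locale, Predicate<String> exists) {
        for (String candidate : candidates(path, locale)) {
            if (exists.test(candidate)) {
                return candidate;
            }
        }
        return path;   // nothing found: return the path that was passed in
    }
}
```

For the path /docroot/html/welcome.html and the locale fr_CA, candidates() produces the four paths listed above in the same order.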

For Internet applications that use UTF-8 as the page encoding, the encoding of the static HTML files should also be UTF-8. However, translators usually encode translated HTML files in the native encoding of the target language. To convert the translated HTML into UTF-8, you can use the JDK native2ascii utility shipped with Oracle Application Server.

For example, the following steps describe how to convert a Japanese HTML file encoded in Shift_JIS into UTF-8:

  1. Replace the value of the charset parameter in the Content-Type HTML header in the <meta> tag with UTF-8.

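  2. Use the native2ascii utility to convert the Japanese HTML file into a new ASCII file called japanese.unicode, in which every non-ASCII character is written as a \uXXXX Unicode escape sequence:

        native2ascii -encoding MS932 japanese.html japanese.unicode

  3. Use the native2ascii utility with the -reverse option to convert the escape sequences in the new file back into characters encoded in UTF-8:

        native2ascii -reverse -encoding UTF8 japanese.unicode japanese.html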
    

See Also:

JDK documentation at http://java.sun.com for more information about the native2ascii utility
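
The conversion can also be done in a single pass in Java, which may be useful on JDK 9 and later, where the native2ascii utility is no longer included. The following is a minimal sketch of that alternative technique, not the procedure described above. The class name ToUtf8 and its methods are hypothetical, and the encoding name given on the command line (for example MS932) must be one that the Java runtime in use supports. The charset value in the <meta> tag still has to be changed to UTF-8 separately.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: re-encodes a file from a native encoding into UTF-8.
class ToUtf8 {
    // Decodes the bytes with the source charset, then encodes the text as UTF-8.
    static byte[] transcode(byte[] input, Charset source) {
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    static void convertFile(Path in, Path out, String sourceEncoding) throws IOException {
        byte[] converted = transcode(Files.readAllBytes(in), Charset.forName(sourceEncoding));
        Files.write(out, converted);
    }

    // Usage: java ToUtf8 <input file> <output file> <source encoding>
    public static void main(String[] args) throws IOException {
        convertFile(Paths.get(args[0]), Paths.get(args[1]), args[2]);
    }
}
```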

4.7.3 Organizing Translatable Static Strings for Java Servlets and Java Server Pages

You should externalize translatable strings within Java Servlets and JSPs into Java resource bundles so that these resource bundles can be translated independent of the Java code. After translation, the resource bundles carry the same base class names as the English bundles, but with the Java locale name as the suffix. You should place the bundles in the same directory as the English resource bundles for the Java resource bundle look-up mechanism to function properly.


See Also:

JDK documentation at http://java.sun.com for more information about Java resource bundles

Some people may hesitate to externalize JSP strings to resource bundles because it seems to defeat the purpose of using JSPs. There are two reasons for externalizing JSP strings:

  • Translating JSPs is error-prone because they consist of Java code that is not familiar to translators.

  • The translation process should be separated from the development process so that translation can take place in parallel to development on JSPs. This eliminates the huge effort of merging the translated JSPs with the most up-to-date JSPs that contain bug fixes to the embedded Java code.

You can use resource bundles in your Java programs by providing your own subclass of the ResourceBundle class. Additionally, Java provides two subclasses of the ResourceBundle abstract class: ListResourceBundle and PropertyResourceBundle. It is good practice to provide your implementation of the ResourceBundle class as a subclass of ListResourceBundle. The main reasons are:

  • List resource bundles are essentially Java programs that must be compiled. Translation errors can be caught at compile time. Property resource bundles are text files read directly from Java. Translation errors can only be caught at runtime.

  • Property resource bundles expose all string data in your Internet application to users. There are potential security and support issues for your application.

The following is an example of a list resource bundle:

import java.util.ListResourceBundle;
public class Resource extends ListResourceBundle {
    public Object[][] getContents() {
        return contents;
    }
    static final Object[][] contents =
    {
       {"hello", "Hello World"},  
       ...
    
    };
}

Translators usually translate list resource bundles in the native encoding of the target language. Japanese list resource bundles encoded in Shift_JIS cannot be compiled on an English system because the Java compiler expects source files that are encoded in ISO-8859-1. In order to build translated list resource bundles in a platform-independent manner, you need to run the JDK native2ascii utility to escape all non-ASCII characters to Unicode escape sequences in the \uXXXX format, where XXXX is the Unicode value in hexadecimal. For example:

native2ascii -encoding MS932 resource_ja.java resource_ja.tmp
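
To make the \uXXXX format concrete, the following minimal sketch performs the same kind of escaping on a string that is already in memory. It is an illustration only, not the source of the native2ascii utility, and the class name UnicodeEscaper is hypothetical.

```java
// Hypothetical helper: writes every non-ASCII character as a \uXXXX escape.
class UnicodeEscaper {
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);                                  // ASCII is kept as-is
            } else {
                // Four hexadecimal digits per UTF-16 code unit
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two hiragana characters, written here with escapes so that
        // this source file itself stays pure ASCII.
        System.out.println(escape("{\"hello\", \"\u3053\u3093\"}"));
    }
}
```

Because the escaped file contains only ASCII characters, it compiles the same way regardless of the default encoding of the build system.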

Java provides a default fallback mechanism for resource bundles when translated resource bundles are not available. An application only needs to make sure that a base resource bundle without any locale suffix always exists in the same directory. The base resource bundle should contain strings in the fallback language. As an example, Java looks for a resource bundle in the following order when the fr_CA Java locale is specified to the getBundle() method:

resource_fr_CA
resource_fr
resource_en_US /* where en_US is the default Java locale */
resource_en
resource (base resource bundle)
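
Part of this search order can be inspected at runtime. The following minimal sketch uses the ResourceBundle.Control class, available since Java SE 6, to list the candidate locales that are derived from the requested locale. The class name BundleLookupOrder is hypothetical. The list covers only the requested locale and the base bundle; the step through the default Java locale is handled separately by the getBundle() method and does not appear in it.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper: shows which locale suffixes are tried for a requested locale.
class BundleLookupOrder {
    static List<Locale> candidates(String baseName, Locale locale) {
        return ResourceBundle.Control
            .getControl(ResourceBundle.Control.FORMAT_DEFAULT)
            .getCandidateLocales(baseName, locale);
    }

    public static void main(String[] args) {
        for (Locale candidate : candidates("resource", Locale.CANADA_FRENCH)) {
            // Locale.ROOT stands for the base bundle, which has no suffix.
            System.out.println(candidate.equals(Locale.ROOT)
                ? "resource"
                : "resource_" + candidate);
        }
    }
}
```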

Retrieving Strings in Monolingual Applications

At runtime, monolingual applications can get strings from a resource bundle of the default Java locale as follows:

ResourceBundle rb = ResourceBundle.getBundle("resource");
String helloStr = rb.getString("hello");

Retrieving Strings in Multilingual Applications

Because the user's locale is not fixed in multilingual applications, they should call the getBundle() method by explicitly specifying a Java locale object that corresponds to the user's locale. The Java locale object is called user_locale in the following example:

ResourceBundle rb = ResourceBundle.getBundle("resource", user_locale);
String helloStr = rb.getString("hello");

4.7.4 Organizing Translatable Static Strings in C/C++ and Perl

For C/C++ programs and Perl scripts running on UNIX platforms, externalize static strings in C/C++ or Perl scripts to POSIX message files. For programs running on Microsoft Windows platforms, externalize static strings to message tables in a database because Microsoft Windows does not support POSIX message files.

Message files (with the .po file extension) associated with a POSIX locale are identified by their domain names. You need to compile them into binary objects with the .mo file extension and place them into the directory corresponding to the POSIX locale. The path name for the POSIX locale is implementation-specific. For example, the Solaris msgfmt utility compiles a French Canadian message file, resource.po, and places it into the /usr/lib/locale/fr_CA/LC_MESSAGES directory on Solaris.


See Also:

Operating system documentation for gettext, msgfmt, and xgettext

The following is an example of a resource.po message file:

domain "resource"
msgid "hello"
msgstr "Hello World"
...

The encoding used for the message files must match the encoding used for the corresponding POSIX locale.

Instead of putting binary message files into an implementation-specific directory, you should put them into an application-specific directory and use the bindtextdomain() function to associate a domain with a directory. The following piece of Perl script uses the Locale::gettext Perl module to get a string from a POSIX message file:

use Locale::gettext;
use POSIX;
...
setlocale( LC_ALL, "fr_CA" );
textdomain( "resource" );
bindtextdomain( "resource", "/usr/local/share");
print gettext( "hello" );

The domain name for the resource file is resource, the ID of the string to be retrieved is hello, the translation to be used is French Canadian (fr_CA), and the directory for the binary .mo files is /usr/local/share/fr_CA/LC_MESSAGES.


See Also:

http://www.cpan.org to download the Locale::gettext Perl module

4.7.5 Organizing Translatable Static Strings in Message Tables

Message tables mainly store static translatable strings used by PL/SQL procedures and PSPs. You can also use them for some C/C++ programs and Perl scripts. The tables should have a language column to identify the language of static strings so that accessing applications can retrieve messages based on the user's locale. The table structure should be similar to the following:

CREATE TABLE messages 
( msgid   NUMBER(5)
, langid  VARCHAR2(10)
, message VARCHAR2(4000)
);

The primary key for this table consists of the msgid and langid columns. One good choice for the values in the langid column is the Oracle language abbreviations of the corresponding locales. Using the Oracle language abbreviation allows applications to retrieve translated information transparently by issuing a query on the message table.


See Also:

Oracle Database Globalization Support Guide 10g Release 1 (10.1) in the Oracle Database Documentation Library for a list of Oracle language abbreviations

To provide a fallback mechanism when the translation of a message is not available, create the following views on top of the message table defined in the previous example:

-- fallback language is English, which is abbreviated as 'US'.
CREATE VIEW default_message_view AS
 SELECT msgid, message
 FROM messages
 WHERE langid = 'US';

-- create view for services, with fallback mechanism
CREATE VIEW messages_view AS
SELECT d.msgid,
       CASE WHEN t.message IS NOT NULL
            THEN t.message
            ELSE d.message
       END AS message
FROM default_message_view d,
     messages             t
WHERE t.msgid (+) = d.msgid AND
      t.langid (+) = sys_context('USERENV', 'LANG');

Messages should be retrieved from the messages_view view that provides the logic for a fallback message in English by joining the default_message_view view with the messages table. The sys_context() SQL function returns the Oracle language abbreviation of the locale for the current database session. This locale should be initialized to the user's locale at the time when the session is created.

To retrieve a message, an application should use a query such as the following, where 1 stands for the numeric ID of the message to retrieve:

SELECT message FROM messages_view WHERE msgid = 1;

The NLS_LANGUAGE parameter of a database session defines the language of the message that the query retrieves. Note that there is no language information needed for the query with this message table schema.
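
The fallback rule that the messages_view view implements can be modeled in a few lines of Java. The following is an in-memory sketch of the view's logic only; it does not access a database, and the class name MessageFallback, its methods, and the sample messages are hypothetical. The language codes are Oracle language abbreviations, with 'F' assumed here for French.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the messages table and the messages_view logic.
class MessageFallback {
    // msgid -> (langid -> message)
    private final Map<Integer, Map<String, String>> rows = new HashMap<>();

    void put(int msgid, String langid, String message) {
        rows.computeIfAbsent(msgid, key -> new HashMap<>()).put(langid, message);
    }

    // Mirrors the outer join: the English ('US') row drives the result,
    // and a row in the session language replaces it when one exists.
    String lookup(int msgid, String sessionLang) {
        Map<String, String> byLang = rows.get(msgid);
        if (byLang == null || !byLang.containsKey("US")) {
            return null;                       // no English row, so the view has no row
        }
        String translated = byLang.get(sessionLang);
        return translated != null ? translated : byLang.get("US");
    }

    public static void main(String[] args) {
        MessageFallback messages = new MessageFallback();
        messages.put(1, "US", "Hello World");
        messages.put(1, "F", "Bonjour le monde");
        messages.put(2, "US", "Goodbye");              // not yet translated

        System.out.println(messages.lookup(1, "F"));   // translated text
        System.out.println(messages.lookup(2, "F"));   // falls back to English
    }
}
```

As in the view, a message that has no English row is not returned at all, so the English text must always be loaded first.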

In order to minimize the load to the database, you should set up all message tables and their associated views on an Oracle Application Server instance as a front end to the database where PL/SQL procedures and PSPs run.

4.7.6 Organizing Translatable Dynamic Content in Application Schema

An application schema stores translatable dynamic information the application uses, such as product names and product descriptions. The following shows an example of a table that stores all the products of an Internet store. The translatable information for the table is the product name and the product description.

CREATE TABLE product_information
    ( product_id          NUMBER(6)
    , product_name        VARCHAR2(50)
    , product_description VARCHAR2(2000)
    , category_id         NUMBER(2)
    , warranty_period     INTERVAL YEAR TO MONTH
    , supplier_id         NUMBER(6)
    , product_status      VARCHAR2(20)
    , list_price          NUMBER(8,2)
    );

To store product names and product descriptions in different languages, create the following table so that the primary key consists of the product_id and language_id columns:

CREATE TABLE product_descriptions
    ( product_id             NUMBER(6)
    , language_id            VARCHAR2(3)
    , translated_name        NVARCHAR2(50)
    , translated_description NVARCHAR2(2000)
    );

Create a view on top of the tables to provide fallback when information is not available in the language that the user requests. For example:

CREATE VIEW product AS
SELECT i.product_id
,      d.language_id
,      CASE WHEN d.language_id IS NOT NULL
            THEN d.translated_name
            ELSE i.product_name
       END    AS product_name
,      i.category_id
,      CASE WHEN d.language_id IS NOT NULL
            THEN d.translated_description
            ELSE i.product_description
       END    AS product_description
,      i.warranty_period
,      i.supplier_id
,      i.product_status
,      i.list_price
FROM   product_information  i
,      product_descriptions d
WHERE  d.product_id  (+) = i.product_id
AND    d.language_id (+) = sys_context('USERENV','LANG');

This view performs an outer join on the product_information and product_descriptions tables and selects the rows with the language_id equal to the Oracle language abbreviation of the current database session.

To retrieve a product name and product description from the product view, an application should use the following query:

SELECT product_name, product_description FROM product
      WHERE product_id = 1234;

This query retrieves the translated product name and product description corresponding to the value of the NLS_LANGUAGE session parameter. Note that you do not need to specify any language information in the query because the view uses sys_context('USERENV', 'LANG'), which returns the session language.