11
Managing National Language Support (NLS)

Oracle Internet Directory National Language Support (NLS) enables you to store, process and retrieve data in native languages. It ensures that Oracle Internet Directory utilities and error messages automatically adapt to the native language and locale.

This chapter discusses NLS as used by Oracle Internet Directory and describes the NLS_LANG environment variable settings required for the various components and tools in an Oracle Internet Directory environment.

Oracle Corporation recommends that, prior to configuring NLS, you review the conceptual discussion in "National Language Support".

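This chapter covers topics in the following sections:

The NLS_LANG Environment Variable
Using NLS with LDIF Files
Using NLS with Command Line Tools
Using NLS with Bulk Tools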

The NLS_LANG Environment Variable

The NLS_LANG parameter has three components--language, territory, and charset--in the form:

NLS_LANG = language_territory.charset

Each component controls the operation of a subset of NLS features.

Component	Description
language	Specifies conventions such as the language used for Oracle messages, day names, and month names. Each supported language has a unique name--for example, American, French, or German. The language argument specifies default values for the territory and character set arguments, so either (or both) `territory` or `charset` can be omitted. If language is not specified, the value defaults to American. For a complete list of languages, see Oracle8i National Language Support Guide.
territory	Specifies conventions such as the default calendar, collation, date, monetary, and numeric formats. Each supported territory has a unique name; for example, America, France, or Canada. If territory is not specified, the value defaults to America. For a complete list of territories, see Oracle8i National Language Support Guide.
charset	Specifies the character set used by the client application (normally that of the user's terminal). Each supported character set has a unique acronym, for example, US7ASCII, WE8ISO8859P1, WE8DEC, WE8EBCDIC500, or JA16EUC. Each language has a default character set associated with it. Default values for the languages available on your system are listed in the installation or user's guide. For a complete list of character sets, see Oracle8i National Language Support Guide.

Note:
All components of the NLS_LANG definition are optional; any component that you omit takes its default value.
If you specify territory or charset, you must include the preceding delimiter: an underscore ( _ ) for territory, and a period ( . ) for charset. Otherwise, the value is parsed as a language name.
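
The delimiter rule in the note above can be illustrated with a short shell sketch. The function name parse_nls_lang is hypothetical and is not an Oracle utility. It only splits a value at the period and the underscore, and it reports an omitted component as empty, leaving the actual default values to the Oracle software.

```shell
# Hypothetical helper (not part of Oracle Internet Directory): split an
# NLS_LANG value of the form language_territory.charset into its components.
# An omitted component is printed as empty, meaning "use the default".
parse_nls_lang() {
    value=$1
    language="" territory="" charset=""
    case $value in
        *.*) charset=${value#*.}; value=${value%%.*} ;;      # text after the period
    esac
    case $value in
        *_*) territory=${value#*_}; value=${value%%_*} ;;    # text after the underscore
    esac
    language=$value                                          # whatever remains
    printf 'language=%s territory=%s charset=%s\n' "$language" "$territory" "$charset"
}

parse_nls_lang "JAPANESE_JAPAN.UTF8"   # all three components given
parse_nls_lang ".ZHS16GBK"             # charset only; the leading period is required
parse_nls_lang "FRENCH"                # no delimiter, so parsed as a language name
```

The second call shows why the period matters: without it, ZHS16GBK would fall through to the language slot.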

You can set NLS_LANG as an environment variable at the command line. For example, you could set NLS_LANG to either of the following values:

AMERICAN_AMERICA.UTF8
JAPANESE_JAPAN.UTF8
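
The exact syntax depends on the shell, which the text above does not name. As a minimal sketch, assuming a Bourne-compatible shell (sh, bash, or ksh), the first value could be set and exported as follows:

```shell
# Assumption: a Bourne-compatible shell. Assign the value, then export it
# so that tools started from this shell inherit it.
NLS_LANG=AMERICAN_AMERICA.UTF8
export NLS_LANG

# Confirm the value that child processes will see.
printf '%s\n' "$NLS_LANG"
```

In the C shell, the equivalent would be setenv NLS_LANG AMERICAN_AMERICA.UTF8.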

Using NLS with LDIF Files

Attribute types are always ASCII strings: Oracle Internet Directory does not support multibyte characters in attribute type names. However, it does support attribute values that contain multibyte characters, such as those in the simplified Chinese (.ZHS16GBK) character set.

Attribute values can be encoded in different ways to allow Oracle Internet Directory tools to interpret them properly. There are two scenarios:

An LDIF File Containing Only ASCII Strings

In this scenario, character strings for attribute values are also in ASCII.

Because all tools use the UTF-8 character set by default, and ASCII is a subset of UTF-8, all tools can interpret these files. The same is true of keyboard input of values that are simply ASCII strings.

An LDIF File Containing UTF-8 Encoded Strings

In this scenario, character strings for attribute values are also in UTF-8.

Because all tools use the UTF-8 character set by default, all tools can interpret these files. The same is true of keyboard input of values that are UTF-8 strings.
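
Before relying on the UTF-8 default, it can help to confirm that a file really is valid UTF-8. The following is a minimal sketch that uses the standard iconv utility, not an Oracle tool. The function name is_utf8 and the sample file names are hypothetical.

```shell
# Hypothetical helper: succeed only if the file is valid UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

tmpdir=$(mktemp -d)
printf 'cn: \344\270\255\346\226\207\n' > "$tmpdir/utf8.ldif"   # value in UTF-8 bytes
printf 'cn: \326\320\316\304\n' > "$tmpdir/gbk.ldif"            # same value in GBK bytes

for f in "$tmpdir/utf8.ldif" "$tmpdir/gbk.ldif"; do
    if is_utf8 "$f"; then
        echo "$(basename "$f"): valid UTF-8"
    else
        echo "$(basename "$f"): not valid UTF-8, a source character set must be specified"
    fi
done

rm -rf "$tmpdir"
```

A pure ASCII file also passes this check, because ASCII is a subset of UTF-8.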

In such a file, some characters may be multibyte. Multibyte character strings, including UTF-8 strings, can be present in the LDIF files as attribute values or given as keyboard input. Multibyte strings can be encoded in their native character set or in UTF-8. Multibyte strings can also be a BASE64-encoded representation of either the native or the UTF-8 string.

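Consider the following cases, each of which is described more fully below:

CASE 1: Native strings (non-UTF-8)
CASE 2: UTF-8 strings
CASE 3: BASE64 encoding of UTF-8 strings
CASE 4: BASE64 encoding of native strings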

Because the LDAP server understands and expects only UTF-8 encoded strings, cases 1, 3, and 4 need to undergo conversion to UTF-8 strings before they can be sent to the LDAP server.

CASE 1: Native Strings (Non-UTF8)

Use the -E argument with the command line tools, ldifwrite, and bulkmodify. Use the -encode argument with the bulkload and bulkdelete tools.

For example:

ldapsearch -h my_host -p 389 -E ".ZHS16GBK" -b base_DN  -s base  'objectclass=*'

This example converts simplified Chinese native strings to UTF-8. The value of base_DN can be a simplified Chinese string.
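
The Oracle tools perform this conversion internally. To see what a native-to-UTF-8 conversion does at the byte level, the standard iconv utility can serve as a rough stand-in. This sketch assumes that iconv's GBK encoding corresponds to Oracle's ZHS16GBK character set; the function name to_utf8 is hypothetical.

```shell
# Hypothetical helper: convert standard input from the named encoding to UTF-8.
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# Two simplified Chinese characters as GBK bytes (D6 D0 CE C4), shown in hex
# before and after conversion. Assumption: iconv "GBK" matches ZHS16GBK.
printf '\326\320\316\304' | od -An -tx1
printf '\326\320\316\304' | to_utf8 GBK | od -An -tx1
```

The four native bytes become six UTF-8 bytes, which is the form the LDAP server expects.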

CASE 2: UTF-8 Strings

No conversion required.

CASE 3: BASE64 Encoding of UTF8 Strings

You do not need the -E argument with the command line tools, ldifwrite, and bulkmodify, nor the -encode argument with bulkload and bulkdelete. Oracle Internet Directory tools automatically decode BASE64-encoded UTF-8 strings to UTF-8 strings.

CASE 4: BASE64 Encoding of Native Strings

Use the -E argument with the command line tools, ldifwrite, and bulkmodify. Use the -encode argument with the bulkload and bulkdelete tools.

Oracle Internet Directory tools automatically decode BASE64-encoded native strings to native strings, and then convert the native strings to UTF-8.
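
The difference between cases 3 and 4 can be sketched with the standard base64 and iconv utilities, which stand in here for the decoding and conversion that the Oracle tools do internally. The function names are hypothetical, base64 -d is the GNU coreutils spelling of the decode option, and GBK is assumed to correspond to ZHS16GBK.

```shell
# CASE 3 (hypothetical helper): BASE64 of a UTF-8 string.
# Decoding alone already yields UTF-8 bytes.
decode_b64_utf8() {
    base64 -d
}

# CASE 4 (hypothetical helper): BASE64 of a native string.
# Decode first, then convert from the native character set ($1) to UTF-8.
decode_b64_native() {
    base64 -d | iconv -f "$1" -t UTF-8
}

# The same two characters, encoded from UTF-8 bytes and from GBK bytes.
printf '5Lit5paH' | decode_b64_utf8 | od -An -tx1
printf '1tDOxA==' | decode_b64_native GBK | od -An -tx1
```

Both commands print the same UTF-8 bytes, but only the second needs to know the source character set, which is why case 4 requires the -E or -encode argument and case 3 does not.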

Note:
Within any given input file, all values should belong to the same language set.

Using NLS with Command Line Tools

Command line tools are non-Java clients that support reading keyboard input or LDIF file input in the following ways:

ASCII characters only from keyboard input or LDIF file
Non-ASCII input from keyboard or LDIF file (native language character set)
LDIF file containing BASE64 encoded values (of UTF-8 or native language character set)

If the character set being given as input from an LDIF file or keyboard is not UTF-8, the command line tools need to convert the input into UTF-8 format before sending it to the LDAP server.

You enable the command line tools to convert the input into UTF-8 by specifying the -E argument when using each tool.

Specifying the -E Argument When Using Each Tool

Specifying the -E argument ensures that proper character set conversion can occur from the character set you specify for the -E argument (-E ".xxxx") to the .UTF8 character set.

The command line tools use the -E argument to process the input in the character set specified for the -E argument. They use the NLS_LANG environment variable to process the output in the character set specified by NLS_LANG.

For example, to add an LDIF file encoded in the .ZHS16GBK (simplified Chinese) character set by using ldapadd, you would type:

ldapadd -h myhost -p 389 -E ".ZHS16GBK" -f my_ldif_file

In this example, the characters are converted from ".ZHS16GBK" (simplified Chinese character set) to ".UTF8" (UTF-8 character set) before they are sent across the wire to the LDAP server.

The client tools always assume UTF-8 to be the character set unless otherwise specified by the -E argument. The BASE64-encoded values are decoded, and then the decoded buffer is converted to UTF-8 if the -E argument is specified. For example, if you specify -E ".ZHS16GBK", then the decoded buffer is converted from simplified Chinese to UTF-8 before being sent to the LDAP server.
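
The rule above, UTF-8 unless -E says otherwise, can be captured in a small wrapper. This is a sketch only: encoding_args is a hypothetical helper, and the commands are echoed as a dry run rather than executed.

```shell
# Hypothetical helper: print the -E argument for a given input character set,
# or print nothing when the input is already UTF-8 (the tools' default).
encoding_args() {
    charset=${1#.}                        # accept "ZHS16GBK" or ".ZHS16GBK"
    case $charset in
        ""|UTF8|utf8) ;;                  # no conversion needed, so no -E
        *) printf '%s' "-E .$charset" ;;
    esac
}

# Dry run: show the command line that would be used for each input encoding.
# The command substitution is left unquoted on purpose so that it expands to
# two words (-E and the character set) or to nothing at all.
echo ldapadd -h my_host -p 389 $(encoding_args ".ZHS16GBK") -f my_ldif_file
echo ldapadd -h my_host -p 389 $(encoding_args ".UTF8") -f my_ldif_file
```

The first line includes -E .ZHS16GBK; the second omits the argument entirely.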

Examples: Using the -E Argument with Command Line Tools

The following table provides examples of how to use this additional argument correctly for each command line tool. In each example, the command converts data from simplified Chinese, as specified by the value ".ZHS16GBK", to UTF-8. For example, in each command, the values for the -D and -w options are in simplified Chinese. Specifying the -E argument converts them to UTF-8.

Note that the examples in the following table do not show any actual characters belonging to the .ZHS16GBK character set. These examples would, therefore, work without the -E argument. However, if the argument values contained actual characters in the .ZHS16GBK character set, then the -E argument would be required.

See Also:
Appendix A for syntax and usage notes for each of the command line tools

Tool	Example
ldapbind	`ldapbind -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapsearch	`ldapsearch -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapadd	`ldapadd -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapaddmt	`ldapaddmt -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapmodify	`ldapmodify -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapmodifymt	`ldapmodifymt -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapdelete	`ldapdelete -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password`
ldapcompare	`ldapcompare -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password -b ou=Construction,ou=Manufacturing,o=acme,c=us -a title -v manager`
ldapmoddn	`ldapmoddn -h my_host -p 389 -E ".ZHS16GBK" -D o=acme,c=us -w my_password -b "cn=Franklin Badlwins,ou=Construction,ou=Manufacturing,o=acme,c=us" -N ou=Contracting,ou=Manufacturing,o=acme,c=us -r`

Setting NLS_LANG in the Client Environment

If the output required by the client is UTF-8, then you do not need to set the NLS_LANG environment variable. In this case, the NLS_LANG environment variable defaults to ".UTF8", and neither the input path from client to server nor the output path from server to client requires any character set conversion.

If the output required by the client is not UTF-8, then you must set the NLS_LANG environment variable. This ensures that proper character set conversion can occur from the UTF-8 character set to the character set you specify for the NLS_LANG environment variable.

For example, if the NLS_LANG environment variable is set to the simplified Chinese character set, then the command line tool displays output in that character set. Otherwise the output defaults to the UTF-8 character set.

Using NLS with Bulk Tools

Oracle Internet Directory ensures that the reading and writing of text data from and to the LDIF files are done in UTF-8 encoding as specified by LDAP.

This section provides an example of the argument you use for each of the following bulk tools:

bulkload

Add to the command the argument -encode ".character_set" where the input LDIF file is encoded in ".character_set".

For example:

bulkload.sh -connect net_service_name -encode ".ZHS16GBK" -check -generate -load my_ldif_file

ldifwrite

The ldifwrite utility always writes BASE64 encoded values for multibyte strings.

The BASE64 encoding could be of the UTF-8 strings as they are stored in the database, or of native strings in the character set specified by the NLS_LANG environment variable when ldifwrite runs.

For example:

ldifwrite -c net_service_name -b baseDN -f output_file

In this example, if the NLS_LANG environment variable is not set, or is set to language_territory.UTF8, then the output LDIF file will contain a BASE64 encoded value of UTF-8 strings.

To reload this LDIF file into the directory by using ldapaddmt, use the following syntax:

ldapaddmt -h host -p port -f output_file

In the above case, the -E argument is not required because the decoded BASE64 strings are already in UTF-8 and can be sent to the server as they are.

If the NLS_LANG environment variable is set to a character set other than UTF-8--for example, ".ZHS16GBK"--then the output LDIF file will contain a BASE64 encoded value of simplified Chinese (.ZHS16GBK) strings.

To reload this LDIF file into the directory by using ldapaddmt, use the following syntax:

ldapaddmt -h host -p port -E ".ZHS16GBK" -f output_file

In the above case, the -E argument is required because the decoded BASE64 strings are in simplified Chinese and must be converted to UTF-8 before being sent to the server.

bulkdelete

Add to the command the argument -encode ".character_set".

For example:

bulkdelete.sh -connect net_service_name -encode ".ZHS16GBK" -base ou=manufacturing,o=acme,c=us -size 100

In this case the value for the -base option could be in the ZHS16GBK native character set, that is, simplified Chinese.

bulkmodify

Add to the command the argument -E ".character_set".

For example:

bulkmodify -c net_service_name -E ".ZHS16GBK" -b ou=manufacturing,o=acme,c=us -r title -v Foreman -f filter -s 100

The above values for the -b, -v, and -f arguments could be specified in the native character set.
