11
Oracle Locale Builder Utility

This chapter describes the Oracle Locale Builder Utility.

Overview of the Locale Builder Utility

The Locale Builder offers an easy and efficient way to access and define NLS locale data definitions. It provides a graphical user interface through which you can easily view, modify, and define locale-specific data. It extracts data from the text and binary definition files and presents them in a readable format, so you can process the information without worrying about the specific definition formats used in these files.

The Locale Builder handles four types of locale definitions: language, territory, character set, and linguistic sort. It also supports user-defined characters and customized linguistic rules. You can view definitions in existing text and binary definition files and make changes to them or create your own definitions.

Configuring Unicode Fonts for the Locale Builder

The Locale Builder uses Unicode characters in many of its functions. For example, it shows the mapping of local character codepoints to Unicode codepoints. Therefore, Oracle Corporation recommends that you use a Unicode font to fully support the Locale Builder. If a character cannot be rendered with your local fonts, it will probably be displayed as an empty box.

Font Configuration on Windows

There are many Windows TrueType and OpenType fonts that support Unicode. Oracle Corporation recommends using the Arial Unicode MS font from Microsoft, because it includes about 51,000 glyphs and covers most of the characters in Unicode 3.0.

After installing the Unicode font, add the font to the Java Runtime so it can be used by the Oracle Locale Builder. The Java Runtime uses a font configuration file to map predefined Java virtual fonts to fonts that are available on Windows. The name of the configuration file is font.properties and it is located in the $JAVAHOME/lib directory. For example, to include the installed Arial Unicode MS font, add the following entry to the font.properties file:

dialog.n = Arial Unicode MS, DEFAULT_CHARSET

where n is the next available sequence number to which you want to assign the Arial Unicode MS font in the font list. The Java Runtime looks through the font mapping list for each virtual font and uses the first font available on your system.

Add an entry for the new font to each font mapping list that you want the new font to be used for. After editing the font.properties file, restart the Locale Builder so it can use the new fonts.
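As a rough illustration of choosing n, the following Python sketch scans the text of a font.properties file for existing entries of one virtual font and proposes an entry with the next free sequence number. The helper name next_font_entry and the sample entries are assumptions made for this example; they are not taken from a real Java Runtime installation, so check the entries in your own file.

```python
import re

def next_font_entry(properties_text, virtual_font, font_name,
                    charset="DEFAULT_CHARSET"):
    """Return '<virtual_font>.<n>=<font_name>,<charset>' using the next free n."""
    # Match lines such as "dialog.2=..." and capture the sequence number.
    pattern = re.compile(
        r"^\s*" + re.escape(virtual_font) + r"\.(\d+)\s*=", re.MULTILINE)
    used = [int(m.group(1)) for m in pattern.finditer(properties_text)]
    n = max(used) + 1 if used else 0
    return f"{virtual_font}.{n}={font_name},{charset}"

# Hypothetical excerpt of a font.properties file, for illustration only.
sample = """\
dialog.0=Arial,ANSI_CHARSET
dialog.1=WingDings,SYMBOL_CHARSET
dialog.2=Symbol,SYMBOL_CHARSET
"""

print(next_font_entry(sample, "dialog", "Arial Unicode MS"))
```

Repeat the call for each virtual font (for example serif or sansserif) whose mapping list should include the new font.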

Note:

For a detailed description of the font.properties file format, visit Sun's internationalization website.

Font Configuration on Other Platforms

In general, there are fewer choices of Unicode fonts for non-Windows platforms than for Windows platforms. If you cannot find a Unicode font with satisfactory character coverage, you can use multiple fonts to cover the different languages. For each font that you want to add to the Java Runtime, install the font and add the font entries into the font.properties file using the steps described above for the Windows platform.

For example, to display Japanese characters on Sun Solaris using the font ricoh-hg mincho, add an entry to the existing font.properties file in $JAVAHOME/lib.

serif.plain.0=-monotype-times new roman-regular-r---*-%d-*-*-p-*-iso8859-1
serif.plain.1=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific
serif.plain.2=-*-symbol-medium-r-normal-*-*-%d-*-*-p-*-sun-fontspecific
serif.plain.3=-ricoh-hg mincho l-medium-r-normal--*-%d-*-*-m-*-jisx0201.1976-0

For font availability, refer to your operating system specific documentation.

The Locale Builder Interface

Ensure that the ORACLE_HOME environment variable is set before starting the Locale Builder.

Start the Locale Builder at the Unix prompt by issuing the following command:

% lbuilder

After you start the Locale Builder, the screen illustrated in Figure 11-1 appears.

Figure 11-1 Locale Builder Utility

Locale Builder General Screens

Before beginning with specific tasks, you might want to become familiar with the general screens that you can use at different times. These screens are:

Existing Definitions Dialog Box (under the General tab)
Session Log Dialog Box (under the Tools menu)
Previewing the NLT File Dialog Box (as a tab in many tasks)
Open File Dialog Box (under the File menu)

Note:
Oracle Locale Builder includes online help.

Restrictions

The following restrictions apply when choosing locale object names:

Names must be all ASCII characters
Names must start with a letter
Language, territory, and character set names cannot contain underscores
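
The naming restrictions above can be expressed as a small check. This is only a sketch: the function name check_locale_name and the kind labels are assumptions for illustration, and the Locale Builder may enforce further rules that are not listed here.

```python
import re

# Kinds of locale object whose names may not contain underscores.
NO_UNDERSCORE_KINDS = {"language", "territory", "character set"}

def check_locale_name(name, kind):
    """Return a list of rule violations for a proposed locale object name."""
    problems = []
    if not name.isascii():
        problems.append("name must be all ASCII characters")
    if not re.match(r"[A-Za-z]", name):
        problems.append("name must start with a letter")
    if kind in NO_UNDERSCORE_KINDS and "_" in name:
        problems.append(f"{kind} names cannot contain underscores")
    return problems

print(check_locale_name("AMERICAN FRENCH", "language"))      # no violations
print(check_locale_name("MY_GENERIC_M", "linguistic sort"))  # underscores allowed here
print(check_locale_name("MY_CHARSET", "character set"))      # underscore not allowed
```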

Note:

Only certain ID ranges are valid for user-defined LANGUAGE, TERRITORY, CHARACTER SET, MONOLINGUAL COLLATION, and MULTILINGUAL COLLATION definitions. The valid ranges are given in the text that accompanies the relevant figures.

Figure 11-2 Existing Definitions Dialog Box

The Existing Definitions dialog box allows you to open locale objects by name. If you know a specific language, territory, linguistic sort (collation), or character set that you want to start with, click on the displayed value. For example, you can open the AMERICAN language definition file, as shown in Figure 11-2. In this case, you will open the lx00001.nlb file.

Abbreviations are for reference only and cannot be opened.

Figure 11-3 Session Log Dialog Box

The Session Log dialog box shows what actions have been taken in a given session. This way, you can keep a record of all changes and, if necessary, undo or modify past changes. Figure 11-3 illustrates a typical example.

Figure 11-4 Previewing the NLT File Dialog Box

Figure 11-4 illustrates viewing an NLT file. An NLT file is a text file with the file extension .nlt that holds the settings for a specific language, territory, character set, or linguistic sort. You cannot modify the NLT file from this dialog box. Its purpose is to present the file in an easily readable form so you can check whether your changes look correct. To modify the NLT file, use the specific elements of the Locale Builder.

Figure 11-5 Open File Dialog Box

The Open File dialog box opens an NLB file so you can modify it or use it as a template. The NLB file is a binary file with the file extension .nlb that contains the binary equivalent of the information in the NLT file. Figure 11-5 illustrates opening lx00001.nlb, which is the language definition file for AMERICAN. By highlighting Preview, you can see what type of NLB file you have selected.

Setting the Language Definition with the Locale Builder

This section will use a sample scenario of creating a new language based on French. This new language will be called AMERICAN FRENCH. First, you need to open FRENCH from the Existing Definitions dialog box. Figure 11-6 illustrates the first screen.

Figure 11-6 Language General Information

Figure 11-6 illustrates a user-defined setting of AMERICAN FRENCH and a user-defined abbreviation of AF. The ISO Abbreviation field is not limited to standard ISO abbreviations, so you can create your own: AF, in this case. The Default settings are inherited and optional. You can build upon an inherited setting and modify it to add additional properties.

The valid range for the language ID field for a user-defined language is 1,000 to 10,000.

Figure 11-7 Language Definition Month Information

Figure 11-7 illustrates how to set month names using the Month Names tab. All names are shown as they appear in the NLT file. If you set NLS_LANG to AMERICAN FRENCH, the rules shown in the figure apply.

Figure 11-8 Language Definition Type Information

Figure 11-8 illustrates the Day Names tab, which allows you to choose default day names. All names are shown as they appear in the NLT file. If you set NLS_LANG to AMERICAN FRENCH, the rules in the figure apply.

Setting the Territory Definition with the Locale Builder

This section will use a sample scenario of creating a new territory called REDWOOD SHORES, and use RS as an abbreviation for it. In this case, we will create a new definition that is not based on an existing one.

The basic tasks are to assign a name and choose calendar, number, date/time, and currency formats. Figure 11-9 illustrates how to begin.

Figure 11-9 Territory Definition General Information

In Figure 11-9, we have manually inserted REDWOOD SHORES and RS for a new territory.

The valid range for the territory ID field for a user-defined territory is 1,000 to 10,000.

Figure 11-10 Territory Definition Calendar

Figure 11-10 illustrates how to set Calendar characteristics. Clicking on a radio button causes the Calendar Sample to display sample output. In this case, Tuesday is the first day of the week.

Figure 11-11 Territory Definition Date and Time Conventions

Figure 11-11 illustrates typical date and time settings. Sample formats are displayed when you choose a setting from the drop-down menus. In this case, we set the default date format for REDWOOD SHORES to YY/MM/DD instead of the typical territory default of DD-MM-YY.

You can also create your own formats instead of using the selection from the drop-down menus.

Figure 11-12 Territory Definition Number Conventions

Figure 11-12 illustrates typical number settings. Sample formats are displayed when you choose a setting from the drop-down menus. The default for number grouping is 3, but 4 is used in this case.

You can type your own values instead of using the drop-down menus.

Figure 11-13 Territory Definition Monetary Conventions

Figure 11-13 illustrates how to set monetary conventions for territories. Note that the default International Currency Separator is a blank space, so it is not visible on the screen. In this case, we chose the Euro as an alternate currency symbol.

You can type your own values instead of using the drop-down menus.

Setting the Character Set Definition with the Locale Builder

In some cases, you may wish to tailor a character set to meet specific user needs. In Oracle9i, you can extend an existing encoded character set definition to suit your needs. User-defined characters are often used to encode special characters representing:

Proper names
Historical Han characters that are not defined in an existing character set standard
Vendor-specific characters
New symbols or characters you define

This section describes how Oracle supports user-defined characters.

Character Sets with User-Defined Characters

User-defined characters are typically supported within East Asian character sets. These East Asian character sets have at least one range of reserved codepoints for use as user-defined characters. For example, Japanese Shift JIS reserves 1880 codepoints for user-defined characters as follows:

Table 11-1 Shift JIS Codepoint Example

Japanese Shift JIS UDC Range	Number of Codepoints
F040-F07E, F080-F0FC	188
F140-F17E, F180-F1FC	188
F240-F27E, F280-F2FC	188
F340-F37E, F380-F3FC	188
F440-F47E, F480-F4FC	188
F540-F57E, F580-F5FC	188
F640-F67E, F680-F6FC	188
F740-F77E, F780-F7FC	188
F840-F87E, F880-F8FC	188
F940-F97E, F980-F9FC	188
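
The counts in Table 11-1 follow directly from the range boundaries. The short Python sketch below recomputes them; the helper name range_size is introduced only for this example.

```python
def range_size(first, last):
    """Number of codepoints in the inclusive range first..last."""
    return last - first + 1

rows = []
for lead in range(0xF0, 0xFA):  # lead bytes F0 through F9
    low = range_size((lead << 8) | 0x40, (lead << 8) | 0x7E)   # xx40-xx7E
    high = range_size((lead << 8) | 0x80, (lead << 8) | 0xFC)  # xx80-xxFC
    rows.append((f"{lead:02X}", low + high))

total = sum(count for _, count in rows)
for lead, count in rows:
    print(f"{lead}40-{lead}7E, {lead}80-{lead}FC  {count}")
print("total", total)
```

Each lead byte contributes 63 + 125 = 188 codepoints, and the ten lead bytes together give the 1880 codepoints quoted above.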

The Oracle character sets listed in Table 11-2 contain pre-defined ranges that allow you to support user-defined characters:

Table 11-2 Oracle Character Sets with UDC

Character Set Name	Number of UDC Codepoints Available
JA16DBCS	4370
JA16EBCDIC930	4370
JA16SJIS	1880
JA16SJISYEN	1880
KO16DBCS	1880
KO16MSWIN949	1880
ZHS16DBCS	1880
ZHS16GBK	2149
ZHT16DBCS	6204
ZHT16MSWIN950	6217

Oracle's Character Set Conversion Architecture

The codepoint value that represents a particular character may vary among different character sets. For example, the Japanese kanji character:

Figure 11-14 Kanji Example

is encoded as follows in different Japanese character sets:

Table 11-3 Kanji Example with Character Conversion

Character Set	Unicode	JA16SJIS	JA16EUC	JA16DBCS
Character value of the kanji in Figure 11-14	4E9C	889F	B0A1	4867

In Oracle, all character sets are defined in terms of Unicode 3.0 codepoints. That is, each character is defined as a Unicode 3.0 code value. Character conversion takes place transparently to users by using Unicode as the intermediate form. For example, when a JA16SJIS client connects to a JA16EUC database, the character shown in Figure 11-14, "Kanji Example" (value 889F) entered from the JA16SJIS client is internally converted to Unicode (value 4E9C), and then converted to JA16EUC (value B0A1).
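
The same two-step conversion can be reproduced outside the database. The sketch below uses Python's standard shift_jis and euc_jp codecs as stand-ins for JA16SJIS and JA16EUC; that these codecs agree with the Oracle character sets for this particular character is an assumption of the example, and it does not show Oracle's internal implementation.

```python
# Value entered on the Shift JIS client, from Table 11-3.
sjis_bytes = bytes.fromhex("889F")

# Step 1: convert the local encoding to Unicode (the intermediate form).
char = sjis_bytes.decode("shift_jis")

# Step 2: convert from Unicode to the target encoding.
euc_bytes = char.encode("euc_jp")

print(f"Unicode: {ord(char):04X}")          # prints "Unicode: 4E9C"
print(f"EUC: {euc_bytes.hex().upper()}")    # prints "EUC: B0A1"
```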

Unicode 3.0 Private Use Area

Unicode 3.0 reserves the range E000-F8FF for the Private Use Area (PUA). The PUA is intended for private use character definition by end users or vendors.

User-defined characters can be converted between two Oracle character sets by using Unicode 3.0 PUA as the intermediate form, the same as standard characters.

UDC Cross References

User-defined character cross references between Japanese character sets, Korean character sets, Simplified Chinese character sets, and Traditional Chinese character sets are contained in the following files in the distribution:

${ORACLE_HOME}/ocommon/nls/demo/udc_ja.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_ko.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zhs.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zht.txt

These cross references are useful when registering user-defined characters across operating systems. For example, when registering a new user-defined character on both a Japanese Shift-JIS operating system and a Japanese IBM Host operating system, you may want to choose F040 on the Shift-JIS operating system and 6941 on the IBM Host operating system for the new character, so that Oracle can convert correctly between JA16SJIS and JA16DBCS. The user-defined character cross reference shows that the Shift-JIS UDC value F040 and the IBM Host UDC value 6941 are both mapped to the same Unicode PUA value, E000.

See Also:

Appendix B, "Unicode Character Code Assignments" for more information about customizing a character set definition file

Character Set Definition File Conventions

By default, the Locale Builder generates the next available character set name for you. You can, however, specify your own character set name. You should follow certain conventions when creating a character set. In particular, character set definition NLT files are named using the format lx2dddd.nlt, where dddd is the four-digit character set ID in hexadecimal.
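
As a worked example of the naming format, the sketch below derives the NLT file name from a decimal character set ID. The function name charset_nlt_filename is hypothetical, and the use of lowercase hexadecimal digits is an assumption; the Locale Builder normally generates the file name for you.

```python
def charset_nlt_filename(charset_id):
    """Build the lx2dddd.nlt name, where dddd is the ID as four hex digits."""
    if not 0 <= charset_id <= 0xFFFF:
        raise ValueError("character set ID must fit in four hex digits")
    return f"lx2{charset_id:04x}.nlt"  # lowercase hex is an assumption

# ID 10001 is 2711 in hexadecimal.
print(charset_nlt_filename(10001))
```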

A few things to note when editing a character set definition file:

You should not remap existing characters.
All character mappings must be unique.
New characters should be mapped into the Unicode private use range: e000-f4ff. (Note that the actual Unicode 3.0 private use range is e000-f8ff. However, Oracle reserves f500-f8ff for its own private use.)
No line in the character set definition file can be longer than 80 characters.
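
Two of these rules are easy to check mechanically. The following sketch classifies a Unicode value against the private use ranges given above and reports lines longer than 80 characters. The names classify_pua and lines_too_long are assumptions for illustration and are not part of the Locale Builder.

```python
USER_PUA = range(0xE000, 0xF500)    # e000-f4ff: available for new characters
ORACLE_PUA = range(0xF500, 0xF900)  # f500-f8ff: reserved by Oracle

def classify_pua(codepoint):
    """Say whether a Unicode value falls in the user or Oracle part of the PUA."""
    if codepoint in USER_PUA:
        return "user"
    if codepoint in ORACLE_PUA:
        return "oracle-reserved"
    return "outside PUA"

def lines_too_long(text, limit=80):
    """Return the 1-based numbers of lines longer than the limit."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]

print(len(USER_PUA), len(ORACLE_PUA))  # sizes of the two ranges
print(classify_pua(0xE000), classify_pua(0xF500), classify_pua(0x4E9C))
```

The user range holds 5376 codepoints and the Oracle range 1024, which together make up the 6400 codepoints of the Unicode 3.0 private use range.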

If a character set is derived from an existing Oracle character set, Oracle Corporation recommends using the following character set naming convention:

<Oracle_character_set_name><organization_name>EXT<version>

For example, if a company such as Sun Microsystems were adding user-defined characters to the JA16EUC character set, the following character set name might be appropriate:

JA16EUCSUNWEXT1

where:

JA16EUC	Is the character set name defined by Oracle
SUNW	Represents the organization name (company stock trading abbreviation for Sun Microsystems)
EXT	Specifies that this is an extension to the JA16EUC character set
1	Specifies the version

Locale Builder Character Set Scenario

This section shows how to create a new character set called MYCHARSET and use 10001 for its recommended ID number. The scenario starts with an ASCII character set and adds 10 Chinese characters. First, open US7ASCII from the Existing Definitions dialog box. Figure 11-15 illustrates how to begin.

Figure 11-15 Character Set General Information

In Figure 11-15, the ISO Character Set ID and Base Character Set ID fields are optional. The Base Character Set ID is used for inheriting values so that the base character set's properties are used as a starting template. The Character Set ID is automatically generated, although you can override it. The valid range for a user-defined character set ID is 10,000 to 20,000.

Figure 11-16 Character Set Type Specifications

Figure 11-16 illustrates how to change certain character set specifications. This should not normally be necessary.

When you open a character set, all possible settings for this tab should already be set to appropriate settings. You should keep these settings unless you have a specific reason for changing them. If you need to change the settings, use the following guidelines:

FIXED_WIDTH identifies character sets whose characters have a uniform length. AL16UTF16 is one example.
BYTE_UNIQUE means that the single-byte range of codepoints is distinct from the multibyte range. An example is JA16EUC.
DISPLAY identifies character sets that have certain character mode characteristics. Arabic and Devanagari character sets are examples.
SHIFT is for certain character sets that require extra shift characters to distinguish between single-byte characters and multibyte characters.

See Also:
Chapter 2, "Choosing a Character Set" for more information about SHIFT In SHIFT Out character sets

Figure 11-17 Character Set User-Defined General Information

Figure 11-17 illustrates how to add user-defined characters. In this case, you can add characters after 0xfe. You can add one character at a time or use a text file to import a large number of characters. In this example, we first import a file containing the following characters:

88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa

Figure 11-18 Character Set Characters

Figure 11-18 illustrates the new characters added after 0xfe. We imported the characters in this case from a file having two columns, with the left column being the local code value and the right column being its Unicode mapping.
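
Before importing a large file, it can help to check that it parses as two hexadecimal columns and that no local code or Unicode value is repeated. The sketch below does this for the ten characters used in this scenario; the function names parse_mappings and find_duplicates are hypothetical, and the Locale Builder performs its own validation on import.

```python
# The import file used in this scenario: local code value, then Unicode value.
SAMPLE = """\
88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa
"""

def parse_mappings(text):
    """Parse 'local unicode' hex pairs, one pair per line."""
    mappings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two columns")
        local, uni = (int(field, 16) for field in fields)
        mappings.append((local, uni))
    return mappings

def find_duplicates(mappings):
    """Return pairs whose local code or Unicode value was already used."""
    seen_local, seen_unicode, duplicates = set(), set(), []
    for local, uni in mappings:
        if local in seen_local or uni in seen_unicode:
            duplicates.append((local, uni))
        seen_local.add(local)
        seen_unicode.add(uni)
    return duplicates

mappings = parse_mappings(SAMPLE)
print(len(mappings), "mappings,", len(find_duplicates(mappings)), "duplicates")
```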

Sorting with the Locale Builder

This section shows how to create a new multilingual linguistic sort called MY_GENERIC_M and use 10001 for its ID number. The sort name follows the convention in which GENERIC_M represents a multilingual ISO sort. In this case, we use GENERIC_M as a starting point. Figure 11-19 illustrates how to begin.

Figure 11-19 Collation General Information

Typical settings for the flags are automatically derived. SWAP_WITH_NEXT is relevant for Thai and Lao sorts. REVERSE_SECONDARY is for French sorts. CANONICAL_EQUIVALENCE determines whether canonical rules will be used.

Collation ID (sort ID) valid ranges for a user-defined sort are 1,000 to 2,000 for monolingual collation and 10,000 to 11,000 for multilingual collation.
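
The ID ranges stated in this chapter for user-defined objects can be gathered into one lookup table. The sketch below does so; the names VALID_ID_RANGES and is_valid_user_id are assumptions, and treating both ends of each range as inclusive is also an assumption, since the text does not say whether the endpoints themselves are allowed.

```python
# Ranges as stated in this chapter for user-defined locale objects.
VALID_ID_RANGES = {
    "language": (1000, 10000),
    "territory": (1000, 10000),
    "character set": (10000, 20000),
    "monolingual collation": (1000, 2000),
    "multilingual collation": (10000, 11000),
}

def is_valid_user_id(kind, object_id):
    """Check an ID against the range for its kind (endpoints assumed inclusive)."""
    low, high = VALID_ID_RANGES[kind]
    return low <= object_id <= high

print(is_valid_user_id("multilingual collation", 10001))  # ID used for MY_GENERIC_M
print(is_valid_user_id("character set", 10001))           # ID used for MYCHARSET
print(is_valid_user_id("monolingual collation", 10001))   # outside 1,000 to 2,000
```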

See Also:

Figure 11-23, "Collation-Canonical Rules" for more information about canonical rules
Chapter 4, "Linguistic Sorting"

Figure 11-20 Collation Unicode Collation

In this scenario, we will move digits so they sort after letters. To do this, we will delete their codepoint values and paste them after the codepoint values of the letters.

Figure 11-20 illustrates selecting a value. Click Delete and paste the value where you want it. Clicking Paste brings up the Collation Pasting Dialog Box, shown in Figure 11-21.

Figure 11-21 Collation Pasting Dialog Box

In Figure 11-21, choose where to put the deleted node and at what sort level you want it.

Figure 11-22 Collation Unicode Collation After Pasting

In Figure 11-22, the digits 0-7 have been moved from their original place before the letters a-z to a place after the letters a-z. For multibyte linguistic sorts, the Locale Builder cannot display accented characters, but you can change their sort order.

Changing the Sort Order for Accented Characters

The next scenario is to change the sort order for accented characters. You can do this by changing the sort for all characters containing a particular accent mark or by changing one character at a time. In this example, we change the sort of all characters with a circumflex (for example, û) to go after all characters containing a tilde.

First, we verify the current sort order by choosing Canonical Rules under the Tools menu. This brings up the Canonical Rules dialog box, illustrated in Figure 11-23.

Figure 11-23 Collation-Canonical Rules

Figure 11-23 illustrates how characters are decomposed into their canonical equivalents and their current sorting orders. For example, ä is represented as a plus an umlaut. In this case, we change the sort for all characters with a circumflex so they follow characters with tildes. This example uses a base character of u.
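
To see what such a decomposition looks like, the sketch below uses the Unicode canonical decomposition (NFD) from Python's standard unicodedata module. It illustrates the general idea only; that the Locale Builder's canonical rules match Python's Unicode data for these characters is an assumption, and the helper name decompose is hypothetical.

```python
import unicodedata

def decompose(ch):
    """List the codepoints of a character's canonical decomposition (NFD)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}"
            for c in unicodedata.normalize("NFD", ch)]

print(decompose("\u00e4"))  # a with umlaut (diaeresis)
print(decompose("\u00fb"))  # u with circumflex
print(decompose("\u0169"))  # u with tilde
```

Each accented character decomposes into a base letter followed by a non-spacing mark, which is why moving the circumflex on the Non-Spacing tab affects every character that carries it.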

See Also:

Chapter 4, "Linguistic Sorting" for more information about canonical rules

Click on the Non-Spacing tab. If you use the Non-Spacing tab, changes for accent marks apply to all characters.

Figure 11-24 Collation-Changing Several Characters

After selecting the circumflex, click Cut and accept the confirmation. Then all characters with a circumflex will have their sort order changed.

Figure 11-25 Collation-Changing Several Characters

Figure 11-25 illustrates the new order.

Changing the Sort Order for One Accented Character

To change the order of a specific accented character, you need to insert the character directly into the appropriate order position. In this scenario, we will change the sort order for ä so that it sorts after Z. First, we select the Unicode Collation tab. Next, we highlight the character next to the one we want, Z in this case. Finally, we click Add, which brings up a Paste dialog box.

Figure 11-26 Collation-Changing One Character

As illustrated in Figure 11-26, we choose After and Primary and manually type in \x00e4, which is the code point for ä.

We chose Primary for the level because that is the Unicode standard for differentiating between characters having different base letters. A Secondary or Tertiary level sort would also have the same practical results.

Figure 11-27 Collation-Changing a Single Character

Figure 11-27 shows the final result, and displays the ä correctly.

Generating NLB Files

After you have defined a new language, territory, character set, or linguistic sort, generate new NLB files from the NLT files:

Choose Tools > Generate NLB or click the Generate NLB icon in the left side bar.
Click Browse to find the directory where the NLT file is located. The location dialog box is shown in Figure 11-28.

Figure 11-28 Generate NLB File

Do not try to specify an NLT file. Oracle Locale Builder generates an NLB file for each NLT file.

Click OK to generate the NLB files.

Using the New NLB Files

The new NLB files do not take effect until you perform the following steps:

Copy the NLB files and the lxlboot.nlb file into the directory that is specified by the ORA_NLS33 environment variable, typically $ORACLE_HOME/ocommon/nls/admin/data.
Restart the database.
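
The copy step can be scripted. The following is a minimal sketch, assuming the generated files are in a single directory and that you pass the ORA_NLS33 directory as the target; the function name install_nlb_files is hypothetical. Back up the existing files in the target directory before overwriting them, and restart the database afterward as described above.

```python
import shutil
from pathlib import Path

def install_nlb_files(source_dir, target_dir):
    """Copy every .nlb file, including lxlboot.nlb, from source_dir to target_dir."""
    source, target = Path(source_dir), Path(target_dir)
    names = sorted(p.name for p in source.glob("*.nlb"))
    # The boot file must be copied together with the new definitions.
    if "lxlboot.nlb" not in names:
        raise FileNotFoundError(f"lxlboot.nlb not found in {source}")
    for name in names:
        shutil.copy2(source / name, target / name)
    return names

# Example call (paths are placeholders, not real locations):
#   import os
#   install_nlb_files("/path/to/generated/files", os.environ["ORA_NLS33"])
```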

Figure 11-29 illustrates the final notification that you have successfully generated NLB files for all NLT files in the directory.

Figure 11-29 NLB Generation Confirmation
