Oracle9i Globalization Support Guide
Release 1 (9.0.1)

Part Number A90236-02

11
Oracle Locale Builder Utility

This chapter describes the Oracle Locale Builder Utility. It includes the following topics:

Overview of the Locale Builder Utility

The Locale Builder offers an easy and efficient way to access and define NLS locale data definitions. It provides a graphical user interface through which you can easily view, modify, and define locale-specific data. It extracts data from the text and binary definition files and presents them in a readable format, so you can process the information without worrying about the specific definition formats used in these files.

The Locale Builder handles four types of locale definitions: language, territory, character set, and linguistic sort. It also supports user-defined characters and customized linguistic rules. You can view definitions in existing text and binary definition files and make changes to them or create your own definitions.

Configuring Unicode Fonts for the Locale Builder

The Locale Builder uses Unicode characters in many of its functions. For example, it shows the mapping of local character codepoints to Unicode codepoints. Therefore, Oracle Corporation recommends that you use a Unicode font to fully support the Locale Builder. If a character cannot be rendered with your local fonts, it will probably be displayed as an empty box.

Font Configuration on Windows

There are many Windows TrueType and OpenType fonts that support Unicode. Oracle Corporation recommends using the Arial Unicode MS from Microsoft, because it includes about 51,000 glyphs and covers most of the characters in Unicode 3.0.

After installing the Unicode font, add the font to the Java Runtime so it can be used by the Oracle Locale Builder. The Java Runtime uses a font configuration file to map predefined Java virtual fonts to fonts that are available on Windows. The name of the configuration file is font.properties and it is located in the $JAVAHOME/lib directory. For example, to include the installed Arial Unicode MS font, add the following entry to the font.properties file:

dialog.n = Arial Unicode MS, DEFAULT_CHARSET

where n is the next available sequence number to which you want to assign the Arial Unicode MS font in the font list. The Java Runtime looks through the font mapping list for each virtual font and uses the first font available on your system.

Add an entry for the new font to each font mapping list that you want the new font to be used for. After editing the font.properties file, restart the Locale Builder so it can use the new fonts.
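As an illustration of picking the next available sequence number, here is a minimal Python sketch. The helper name next_font_entry and the sample file contents are hypothetical; only the entry format comes from the example above.

```python
import re

def next_font_entry(properties_text, virtual_font, font_name,
                    charset="DEFAULT_CHARSET"):
    """Build a mapping line that uses the next free sequence number."""
    # Match keys such as "dialog.0" or "dialog.1" for the chosen virtual font.
    pattern = re.compile(
        r"^\s*" + re.escape(virtual_font) + r"\.(\d+)\s*=", re.MULTILINE)
    used = [int(m.group(1)) for m in pattern.finditer(properties_text)]
    n = max(used) + 1 if used else 0
    # Same layout as the entry shown in the text above.
    return f"{virtual_font}.{n} = {font_name}, {charset}"

# Hypothetical existing entries, for illustration only.
sample = """\
dialog.0=Arial,ANSI_CHARSET
dialog.1=WingDings,SYMBOL_CHARSET
dialog.2=Symbol,SYMBOL_CHARSET
"""

print(next_font_entry(sample, "dialog", "Arial Unicode MS"))
```

The sketch only computes the line to add. Editing font.properties and restarting the Locale Builder remain manual steps.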


Note:

For a detailed description of the font.properties file format, visit Sun's internationalization website. 


Font Configuration on Other Platforms

In general, there are fewer choices of Unicode fonts for non-Windows platforms than for Windows platforms. If you cannot find a Unicode font with satisfactory character coverage, you can use multiple fonts to cover the different languages. For each font that you want to add to the Java Runtime, install the font and add the font entries into the font.properties file using the steps described above for the Windows platform.

For example, to display Japanese characters on Sun Solaris using the font ricoh-hg mincho, add an entry such as the last line shown below (serif.plain.3) to the existing font.properties file in $JAVAHOME/lib:

serif.plain.0=-monotype-times new roman-regular-r---*-%d-*-*-p-*-iso8859-1
serif.plain.1=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific
serif.plain.2=-*-symbol-medium-r-normal-*-*-%d-*-*-p-*-sun-fontspecific
serif.plain.3=-ricoh-hg mincho l-medium-r-normal--*-%d-*-*-m-*-jisx0201.1976-0

For font availability, refer to your operating system specific documentation.

The Locale Builder Interface

Ensure that the ORACLE_HOME environment variable is set before starting the Locale Builder.

Start the Locale Builder at the Unix prompt by issuing the following command:

% lbuilder

After you start the Locale Builder, the screen illustrated in Figure 11-1 appears.

Figure 11-1 Locale Builder Utility



Locale Builder General Screens

Before beginning with specific tasks, you might want to become familiar with the general screens that you can use at different times. These screens are:

Restrictions

The following restrictions apply when choosing locale object names:

Figure 11-2 Existing Definitions Dialog Box



The Existing Definitions dialog box allows you to open locale objects by name. If you know a specific language, territory, linguistic sort (collation), or character set that you want to start with, click on the displayed value. For example, you can open the AMERICAN language definition file, as shown in Figure 11-2. In this case, you will open the lx00001.nlb file.

Abbreviations are for reference only and cannot be opened.

Figure 11-3 Session Log Dialog Box



The Session Log dialog box shows what actions have been taken in a given session. This way, you can keep a record of all changes and, if necessary, undo or modify past changes. Figure 11-3 illustrates a typical example.

Figure 11-4 Previewing the NLT File Dialog Box



Figure 11-4 illustrates viewing an NLT file. An NLT file is a text file with the file extension .nlt that shows the settings for a specific language, territory, character set, or linguistic sort. The NLT file cannot be modified from this dialog box. Instead, the dialog box presents an easily readable form of the file so you can check whether your changes look correct. You must use the specific elements of the Locale Builder to modify the NLT file.

Figure 11-5 Open File Dialog Box



The Open File dialog box opens an NLB file so you can modify it or use it as a template. The NLB file is a binary file with the file extension .nlb that contains the binary equivalent of the information in the NLT file. Figure 11-5 illustrates opening lx00001.nlb, which is the language definition file for AMERICAN. By highlighting Preview, you can see what type of NLB file you have selected.

Setting the Language Definition with the Locale Builder

This section will use a sample scenario of creating a new language based on French. This new language will be called AMERICAN FRENCH. First, you need to open FRENCH from the Existing Definitions dialog box. Figure 11-6 illustrates the first screen.

Figure 11-6 Language General Information



Figure 11-6 illustrates a user-defined setting of AMERICAN FRENCH and a user-defined abbreviation of AF. The ISO Abbreviation field is not limited to standard ISO abbreviations, so you can create your own: AF, in this case. The Default settings are inherited and optional. You can build upon an inherited setting and modify it to add additional properties.

The valid range for the language ID field for a user-defined language is 1,000 to 10,000.

Figure 11-7 Language Definition Month Information



Figure 11-7 illustrates how to set month names using the Month Names tab. All names are shown as they appear in the NLT file. If you set NLS_LANG to AMERICAN FRENCH, the rules shown in the figure apply.

Figure 11-8 Language Definition Type Information



Figure 11-8 illustrates the Day Names tab, which allows you to choose default day names. All names are shown as they appear in the NLT file. If you set NLS_LANG to AMERICAN FRENCH, the rules in the figure apply.

Setting the Territory Definition with the Locale Builder

This section will use a sample scenario of creating a new territory called REDWOOD SHORES, and use RS as an abbreviation for it. In this case, we will create a new definition that is not based on an existing one.

The basic tasks are to assign a name and choose calendar, number, date/time, and currency formats. Figure 11-9 illustrates how to begin.

Figure 11-9 Territory Definition General Information



In Figure 11-9, we have manually inserted REDWOOD SHORES and RS for a new territory.

The valid range for the territory ID field for a user-defined territory is 1,000 to 10,000.

See Also:

Chapter 3, "Setting Up a Globalization Support Environment" 

Figure 11-10 Territory Definition Calendar



Figure 11-10 illustrates how to set Calendar characteristics. Clicking on a radio button causes the Calendar Sample to display sample output. In this case, Tuesday is the first day of the week.

Figure 11-11 Territory Definition Date and Time Conventions



Figure 11-11 illustrates typical date and time settings. Sample formats are displayed when you choose a setting from the drop-down menus. In this case, we set the default date format for REDWOOD SHORES to YY/MM/DD instead of the typical territory default of DD-MM-YY.

You can also create your own formats instead of using the selection from the drop-down menus.
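To make the difference between the two date formats concrete, the following Python sketch renders one date both ways. The mapping from the Oracle-style masks to strftime patterns is an assumption made for illustration, and the function name render is hypothetical.

```python
from datetime import date

# Rough strftime equivalents of the two date masks named above (an assumption).
FORMATS = {
    "YY/MM/DD": "%y/%m/%d",   # the REDWOOD SHORES setting in this scenario
    "DD-MM-YY": "%d-%m-%y",   # the typical territory default
}

def render(d, mask):
    """Format a date according to one of the masks in FORMATS."""
    return d.strftime(FORMATS[mask])

sample_date = date(2001, 6, 15)
print(render(sample_date, "YY/MM/DD"))  # 01/06/15
print(render(sample_date, "DD-MM-YY"))  # 15-06-01
```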

Figure 11-12 Territory Definition Number Conventions



Figure 11-12 illustrates typical number settings. Sample formats are displayed when you choose a setting from the drop-down menus. The default for number grouping is 3, but 4 is used in this case.

You can type your own values instead of using the drop-down menus.
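The effect of changing the number grouping from 3 to 4 can be sketched in Python as follows. The function group_digits and the comma separator are illustrative assumptions, not part of the Locale Builder.

```python
def group_digits(value, group_size=3, separator=","):
    """Insert a separator every group_size digits, counting from the right."""
    digits = str(abs(value))
    groups = []
    while digits:
        groups.append(digits[-group_size:])   # take the rightmost group
        digits = digits[:-group_size]         # drop it from the remainder
    sign = "-" if value < 0 else ""
    return sign + separator.join(reversed(groups))

print(group_digits(123456789))      # 123,456,789
print(group_digits(123456789, 4))   # 1,2345,6789
```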

Figure 11-13 Territory Definition Monetary Conventions



Figure 11-13 illustrates how to set monetary conventions for territories. Note that the default International Currency Separator is a blank space, so it is not visible in the screen. In this case, we chose the Euro as an alternate currency symbol.

You can type your own values instead of using the drop-down menus.

Setting the Character Set Definition with the Locale Builder

In some cases, you may wish to tailor a character set to meet specific user needs. In Oracle9i, you can extend an existing encoded character set definition to suit your needs. User-defined characters are often used to encode special characters representing:

This section describes how Oracle supports user-defined characters. It describes:

Character Sets with User-Defined Characters

User-defined characters are typically supported within East Asian character sets. These East Asian character sets have at least one range of reserved codepoints for use as user-defined characters. For example, Japanese Shift JIS reserves 1880 codepoints for user-defined characters as follows:

Table 11-1 Shift JIS Codepoint Example

Japanese Shift JIS UDC Range    Number of Codepoints
F040-F07E, F080-F0FC            188
F140-F17E, F180-F1FC            188
F240-F27E, F280-F2FC            188
F340-F37E, F380-F3FC            188
F440-F47E, F480-F4FC            188
F540-F57E, F580-F5FC            188
F640-F67E, F680-F6FC            188
F740-F77E, F780-F7FC            188
F840-F87E, F880-F8FC            188
F940-F97E, F980-F9FC            188
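The counts in Table 11-1 can be checked with a short calculation. The Python sketch below enumerates the ranges from the table (lead bytes F0 through F9) and confirms 188 codepoints per row and 1880 in total. The function name udc_ranges is illustrative.

```python
def udc_ranges():
    """Yield inclusive (start, end) ranges for lead bytes F0 through F9."""
    for lead in range(0xF0, 0xFA):
        yield (lead << 8) | 0x40, (lead << 8) | 0x7E   # for example F040-F07E
        yield (lead << 8) | 0x80, (lead << 8) | 0xFC   # for example F080-F0FC

# Codepoints per lead byte: 63 in the first range plus 125 in the second.
per_lead = (0x7E - 0x40 + 1) + (0xFC - 0x80 + 1)
total = sum(end - start + 1 for start, end in udc_ranges())

print(per_lead, total)  # 188 1880
```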

The Oracle character sets listed in Table 11-2 contain pre-defined ranges that allow you to support user-defined characters:

Table 11-2 Oracle Character Sets with UDC

Character Set Name    Number of UDC Codepoints Available
JA16DBCS              4370
JA16EBCDIC930         4370
JA16SJIS              1880
JA16SJISYEN           1880
KO16DBCS              1880
KO16MSWIN949          1880
ZHS16DBCS             1880
ZHS16GBK              2149
ZHT16DBCS             6204
ZHT16MSWIN950         6217

Oracle's Character Set Conversion Architecture

The codepoint value that represents a particular character may vary among different character sets. For example, the Japanese kanji character:

Figure 11-14 Kanji Example



is encoded as follows in different Japanese character sets:

Table 11-3 Kanji Example with Character Conversion

Character Set                            Unicode  JA16SJIS  JA16EUC  JA16DBCS
Value of the kanji in Figure 11-14       4E9C     889F      B0A1     4867

In Oracle, every character set is defined in terms of Unicode 3.0 code points. That is, each character is defined as a Unicode 3.0 code value. Character conversion takes place transparently to users, with Unicode as the intermediate form. For example, when a JA16SJIS client connects to a JA16EUC database, the character shown in Figure 11-14, "Kanji Example" (value 889F), entered from the JA16SJIS client, is internally converted to Unicode (value 4E9C) and then converted to JA16EUC (value B0A1).
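The two-step conversion through Unicode can be imitated with Python's built-in codecs. This is only an analogy: the shift_jis and euc_jp codecs stand in for Oracle's JA16SJIS and JA16EUC definitions, and the function name convert is hypothetical.

```python
def convert(data, source, target):
    """Convert bytes between character sets, using Unicode as the pivot."""
    text = data.decode(source)    # source character set -> Unicode
    return text.encode(target)    # Unicode -> target character set

sjis_bytes = bytes.fromhex("889F")            # value entered by the client
pivot = sjis_bytes.decode("shift_jis")        # intermediate Unicode character
euc_bytes = convert(sjis_bytes, "shift_jis", "euc_jp")

print(f"{ord(pivot):04X}")      # 4E9C
print(euc_bytes.hex().upper())  # B0A1
```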

Unicode 3.0 Private Use Area

Unicode 3.0 reserves the range E000-F8FF for the Private Use Area (PUA). The PUA is intended for private use character definition by end users or vendors.

User-defined characters can be converted between two Oracle character sets by using Unicode 3.0 PUA as the intermediate form, the same as standard characters.
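As a small illustration of the PUA range, the Python sketch below tests whether a character falls in E000-F8FF and reports how many codepoints the range contains. The helper name is_private_use is hypothetical. The cross-check against the general category Co relies on Python's unicodedata module, which tracks a later Unicode version than 3.0.

```python
import unicodedata

PUA_START, PUA_END = 0xE000, 0xF8FF   # range given in the text above

def is_private_use(ch):
    """Return True if the character lies in the E000-F8FF Private Use Area."""
    return PUA_START <= ord(ch) <= PUA_END

print(PUA_END - PUA_START + 1)            # 6400 codepoints in the range
print(is_private_use("\ue000"))           # True: first PUA codepoint
print(is_private_use("\u4e9c"))           # False: a standard kanji
print(unicodedata.category("\ue000"))     # Co (private use)
```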

UDC Cross References

User-defined character cross references between Japanese character sets, Korean character sets, Simplified Chinese character sets and Traditional Chinese character sets are contained in the following distribution sets:

${ORACLE_HOME}/ocommon/nls/demo/udc_ja.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_ko.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zhs.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zht.txt

These cross references are useful when registering user-defined characters across operating systems. For example, when registering a new user-defined character on both a Japanese Shift-JIS operating system and a Japanese IBM Host operating system, you may want to use F040 on the Shift-JIS operating system and 6941 on the IBM Host operating system so that Oracle can convert the character correctly between JA16SJIS and JA16DBCS. The user-defined character cross reference shows that Shift-JIS UDC value F040 and IBM Host UDC value 6941 both map to the same Unicode PUA value, E000.

See Also:

Appendix B, "Unicode Character Code Assignments" for more information about customizing a character set definition file 

Character Set Definition File Conventions

By default, the Locale Builder generates the next available character set name for you. You can, however, generate your own character set name. You should follow certain conventions when creating a character set. In particular, the convention used for naming character set definition NLT files is the format: lx2dddd.nlt, where dddd = 4 digit Character Set ID in hex.
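The file naming convention can be sketched in Python as follows. The function name charset_nlt_filename is hypothetical, and the use of lowercase hex digits is an assumption, since the text does not state the case.

```python
def charset_nlt_filename(charset_id):
    """Build lx2dddd.nlt, where dddd is the character set ID in 4 hex digits."""
    if not 0 <= charset_id <= 0xFFFF:
        raise ValueError("character set ID must fit in four hex digits")
    # Lowercase hex is an assumption; the convention only says "in hex".
    return f"lx2{charset_id:04x}.nlt"

print(charset_nlt_filename(10001))  # lx22711.nlt
```

For instance, character set ID 10001 is 2711 in hex, so under these assumptions the file would be named lx22711.nlt.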

A few things to note when editing a character set definition file:

If a character set is derived from an existing Oracle character set, Oracle Corporation recommends using the following character set naming convention:

<Oracle_character_set_name><organization_name>EXT<version>

For example, if a company such as Sun Microsystems were adding user-defined characters to the JA16EUC character set, the following character set name might be appropriate:

JA16EUCSUNWEXT1

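where JA16EUC is the Oracle character set name, SUNW is the organization name, EXT indicates that the character set is an extension of JA16EUC, and 1 is the version.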

Locale Builder Character Set Scenario

This section shows how to create a new character set called MYCHARSET and use 10001 for its ID number. The scenario starts with an ASCII character set and adds 10 Chinese characters. First, open US7ASCII from the Existing Definitions dialog box. Figure 11-15 illustrates how to begin.

Figure 11-15 Character Set General Information



In Figure 11-15, the ISO Character Set ID and Base Character Set ID fields are optional. The Base Character Set ID is used for inheriting values so that the base character set's properties are used as a starting template. The Character Set ID is automatically generated, although you can override it. The valid range for a user-defined character set ID is 10,000 to 20,000.

Figure 11-16 Character Set Type Specifications



Figure 11-16 illustrates how to change certain character set specifications. This should not normally be necessary.

When you open a character set, all possible settings for this tab should already be set to appropriate settings. You should keep these settings unless you have a specific reason for changing them. If you need to change the settings, use the following guidelines:

Figure 11-17 Character Set User-Defined General Information



Figure 11-17 illustrates how to add user-defined characters. In this case, you can add characters after 0xfe. You can add one character at a time or use a text file to import a large number of characters. In this example, we first import a file containing the following characters:

88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa

Figure 11-18 Character Set Characters



Figure 11-18 illustrates the new characters added after 0xfe. We imported the characters in this case from a file having two columns, with the left column being the local code value and the right column being its Unicode mapping.
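The two-column import format can be read with a few lines of Python. This is a sketch of a parser for the format shown above, not the Locale Builder's own import logic. The function name parse_udc_mappings is hypothetical.

```python
def parse_udc_mappings(text):
    """Parse lines of '<local code> <Unicode value>', both in hex."""
    mappings = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        local_hex, unicode_hex = line.split()
        mappings.append((int(local_hex, 16), int(unicode_hex, 16)))
    return mappings

# First three lines of the import file shown earlier in this section.
sample = "88a2 963f\n88a3 54c0\n88a4 611b\n"

for local, uni in parse_udc_mappings(sample):
    print(f"local 0x{local:04x} -> U+{uni:04X} {chr(uni)}")
```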

Sorting with the Locale Builder

This section shows how to create a new multilingual linguistic sort called MY_GENERIC_M, and use 10001 for its ID number. The choice of sort name is based on the convention GENERIC_M representing a multilingual ISO sort. In this case, we use GENERIC_M as a starting point. Figure 11-19 illustrates how to begin.

Figure 11-19 Collation General Information



Typical settings for the flags are automatically derived. SWAP_WITH_NEXT is relevant for Thai and Lao sorts. REVERSE_SECONDARY is for French sorts. CANONICAL_EQUIVALENCE determines whether canonical rules will be used.

Collation ID (sort ID) valid ranges for a user-defined sort are 1,000 to 2,000 for monolingual collation and 10,000 to 11,000 for multilingual collation.
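The ID ranges quoted in this chapter for user-defined locale objects can be collected into one lookup, as in the Python sketch below. Treating both ends of each range as inclusive is an assumption, and the names USER_DEFINED_ID_RANGES and is_valid_user_id are hypothetical.

```python
# Ranges as stated in this chapter; inclusive bounds are an assumption.
USER_DEFINED_ID_RANGES = {
    "language": (1000, 10000),
    "territory": (1000, 10000),
    "character set": (10000, 20000),
    "monolingual collation": (1000, 2000),
    "multilingual collation": (10000, 11000),
}

def is_valid_user_id(kind, object_id):
    """Check an ID against the user-defined range for its object type."""
    low, high = USER_DEFINED_ID_RANGES[kind]
    return low <= object_id <= high

print(is_valid_user_id("multilingual collation", 10001))  # True
print(is_valid_user_id("monolingual collation", 10001))   # False
```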


Figure 11-20 Collation Unicode Collation



In this scenario, we will move digits so they sort after letters. To do this, we will delete their codepoint values and paste them after the codepoint values of the letters.

Figure 11-20 illustrates selecting a value. Click Delete and paste the value where you want it. Clicking Paste brings up the Collation Pasting Dialog Box, shown in Figure 11-21.

Figure 11-21 Collation Pasting Dialog Box



In Figure 11-21, choose where to put the deleted node and at what sort level you want it.

Figure 11-22 Collation Unicode Collation After Pasting



In Figure 11-22, the digits 0-7 have been moved from their original place before the letters a-z to a place after the letters a-z. For multibyte linguistic sorts, the Locale Builder cannot display accented characters, but you can change their sort order.
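The visible effect of sorting digits after letters can be approximated with a custom sort key in Python. This sketch is not how Oracle implements linguistic sorts. It moves all digits (not only 0-7), and the function name digits_after_letters is hypothetical.

```python
def digits_after_letters(s):
    """Sort key that ranks letters first, then digits, then everything else."""
    def rank(ch):
        if ch.isalpha():
            group = 0
        elif ch.isdigit():
            group = 1
        else:
            group = 2
        return (group, ch)
    return [rank(ch) for ch in s]

words = ["b2", "2b", "a1", "10"]
print(sorted(words))                             # ['10', '2b', 'a1', 'b2']
print(sorted(words, key=digits_after_letters))   # ['a1', 'b2', '10', '2b']
```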

Changing the Sort Order for Accented Characters

The next scenario is to change the sort order for accented characters. You can do this by changing the sort for all characters containing a particular accent mark or by changing one character at a time. In this example, we change the sort of all characters with a circumflex (for example, û) to go after all characters containing a tilde.

First, we verify the current sort order by choosing Canonical Rules under the Tools menu. This brings up the Canonical Rules dialog box, illustrated in Figure 11-23.

Figure 11-23 Collation-Canonical Rules



Figure 11-23 illustrates how characters are decomposed into their canonical equivalents and their current sorting orders. For example, ä is represented as a plus an umlaut. In this case, we change the sort for all characters with a circumflex so they follow characters with tildes. This example uses a base character of u.
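Canonical decomposition can be inspected outside the Locale Builder as well. The Python sketch below uses the standard unicodedata module to split accented characters into a base letter plus a non-spacing mark. It illustrates the concept only and does not read Oracle's collation data.

```python
import unicodedata

def decompose(ch):
    """Return the canonical decomposition (NFD) of a single character."""
    return unicodedata.normalize("NFD", ch)

for ch in ("\u00e4", "\u00fb", "\u00f1"):     # a-umlaut, u-circumflex, n-tilde
    parts = " + ".join(
        f"U+{ord(c):04X} {unicodedata.name(c)}" for c in decompose(ch))
    print(f"{ch} = {parts}")
```

Each character decomposes into its base letter followed by one combining mark: U+0308 for the umlaut, U+0302 for the circumflex, and U+0303 for the tilde.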

See Also:

Chapter 4, "Linguistic Sorting" for more information about canonical rules 

Click on the Non-Spacing tab. If you use the Non-Spacing tab, changes for accent marks apply to all characters.

Figure 11-24 Collation-Changing Several Characters



After selecting the circumflex, click Cut and accept the confirmation. Then all characters with a circumflex will have their sort order changed.

Figure 11-25 Collation-Changing Several Characters



Figure 11-25 illustrates the new order.

Changing the Sort Order for One Accented Character

To change the order of a specific accented character, you need to insert the character directly into the appropriate order position. In this scenario, we will change the sort order for ä so that it sorts after Z. First, we select the Unicode Collation tab. Next, we highlight the character next to the one we want, Z in this case. Finally, we click Add, which brings up a Paste dialog box.

Figure 11-26 Collation-Changing One Character



As illustrated in Figure 11-26, we choose After and Primary and manually type in \x00e4, which is the code point for ä.

We chose Primary for the level because that is the Unicode standard for differentiating between characters having different base letters. A Secondary or Tertiary level sort would also have the same practical results.

Figure 11-27 Collation-Changing a Single Character



Figure 11-27 shows the final result, and displays the ä correctly.

Generating NLB Files

After you have defined a new language, territory, character set, or linguistic sort, generate new NLB files from the NLT files:

  1. Choose Tools > Generate NLB or click the Generate NLB icon in the left side bar.

  2. Click Browse to find the directory where the NLT file is located. The location dialog box is shown in Figure 11-28.

Figure 11-28 Generate NLB File



Do not try to specify an NLT file. Oracle Locale Builder generates an NLB file for each NLT file.

  3. Click OK to generate the NLB files.

Figure 11-29 illustrates the final notification that you have successfully generated NLB files for all NLT files in the directory.

Figure 11-29 NLB Generation Confirmation

Using the New NLB Files

The new NLB files do not take effect until you perform the following steps:

  1. Copy the NLB files and the lxlboot.nlb file into the path that is specified by the ORA_NLS33 environment variable, typically $ORACLE_HOME/ocommon/nls/admin/data.

  2. Restart the database.
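The copy step under "Using the New NLB Files" could be scripted along the following lines. This is a minimal Python sketch under stated assumptions: the function name install_nlb_files is hypothetical, the generated NLB files and lxlboot.nlb are assumed to sit together in one source directory, and restarting the database is left as a separate manual step.

```python
import shutil
from pathlib import Path

def install_nlb_files(source_dir, target_dir):
    """Copy every .nlb file, including lxlboot.nlb, into the target directory."""
    source, target = Path(source_dir), Path(target_dir)
    if not target.is_dir():
        raise NotADirectoryError(f"target directory not found: {target}")
    names = sorted(p.name for p in source.glob("*.nlb"))
    # The boot file must be copied along with the new definitions.
    if "lxlboot.nlb" not in names:
        raise FileNotFoundError(f"lxlboot.nlb was not found in {source}")
    for name in names:
        shutil.copy2(source / name, target / name)
    return names
```

A call would typically pass the directory used in the Generate NLB dialog box as source_dir and the value of the ORA_NLS33 environment variable (for example, os.environ["ORA_NLS33"]) as target_dir.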


    Copyright © 1996-2001, Oracle Corporation.

    All Rights Reserved.