Oracle9i Globalization Support Guide Release 1 (9.0.1) Part Number A90236-02
This chapter describes the Oracle Locale Builder Utility. It includes the following topics:
The Locale Builder offers an easy and efficient way to access and define NLS locale data definitions. It provides a graphical user interface through which you can easily view, modify, and define locale-specific data. It extracts data from the text and binary definition files and presents them in a readable format, so you can process the information without worrying about the specific definition formats used in these files.
The Locale Builder handles four types of locale definitions: language, territory, character set, and linguistic sort. It also supports user-defined characters and customized linguistic rules. You can view definitions in existing text and binary definition files and make changes to them or create your own definitions.
The Locale Builder uses Unicode characters in many of its functions. For example, it shows the mapping of local character codepoints to Unicode codepoints. Therefore, Oracle Corporation recommends that you use a Unicode font to fully support the Locale Builder. If a character cannot be rendered with your local fonts, it will probably be displayed as an empty box.
There are many Windows TrueType and OpenType fonts that support Unicode. Oracle Corporation recommends using the Arial Unicode MS font from Microsoft, because it includes about 51,000 glyphs and covers most of the characters in Unicode 3.0.
After installing the Unicode font, add the font to the Java Runtime so it can be used by the Oracle Locale Builder. The Java Runtime uses a font configuration file to map predefined Java virtual fonts to fonts that are available on Windows. The name of the configuration file is font.properties and it is located in the $JAVAHOME/lib directory. For example, to include the installed Arial Unicode MS font, add the following entry to the font.properties file:
dialog.n = Arial Unicode MS, DEFAULT_CHARSET
where n is the next available sequence number to which you want to assign the Arial Unicode MS font in the font list. The Java Runtime looks through the font mapping list for each virtual font and uses the first font available on your system.
Add an entry for the new font to each font mapping list in which you want the new font to be used. After editing the font.properties file, restart the Locale Builder so it can use the new fonts.
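As a rough illustration of choosing the next available sequence number, the sketch below scans the text of a font.properties file and builds the new entry. The function name, the sample file content, and the choice to start at 0 when a list is empty are assumptions made for this example; they are not part of the Locale Builder or the Java Runtime.

```python
import re


def next_font_entry(properties_text, virtual_font="dialog",
                    font_name="Arial Unicode MS", charset="DEFAULT_CHARSET"):
    """Build a 'virtual_font.n=font,charset' line using the next free n.

    Only keys of the exact form '<virtual_font>.<number>' are counted.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(virtual_font) + r"\.(\d+)\s*=", re.MULTILINE)
    used = [int(match.group(1)) for match in pattern.finditer(properties_text)]
    # Assumption: numbering starts at 0 when the list has no entries yet.
    next_n = max(used) + 1 if used else 0
    return f"{virtual_font}.{next_n}={font_name},{charset}"


# Hypothetical existing content, for illustration only.
sample = "dialog.0=Arial,ANSI_CHARSET\ndialog.1=WingDings,SYMBOL_CHARSET\n"
print(next_font_entry(sample))
```

Call the function once for each font mapping list that should include the new font, passing the list prefix as virtual_font.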
In general, there are fewer choices of Unicode fonts for non-Windows platforms than for Windows platforms. If you cannot find a Unicode font with satisfactory character coverage, you can use multiple fonts to cover the different languages. For each font that you want to add to the Java Runtime, install the font and add the font entries into the font.properties file using the steps described above for the Windows platform.
For example, to display Japanese characters on Sun Solaris using the font ricoh-hg mincho, add an entry to the existing font.properties file in $JAVAHOME/lib.
serif.plain.0=-monotype-times new roman-regular-r---*-%d-*-*-p-*-iso8859-1
serif.plain.1=-urw-itc zapfdingbats-medium-r-normal--*-%d-*-*-p-*-sun-fontspecific
serif.plain.2=-*-symbol-medium-r-normal-*-*-%d-*-*-p-*-sun-fontspecific
serif.plain.3=-ricoh-hg mincho l-medium-r-normal--*-%d-*-*-m-*-jisx0201.1976-0
For font availability, refer to your operating system specific documentation.
Ensure that the ORACLE_HOME environment variable is set before starting the Locale Builder.
Start the Locale Builder at the Unix prompt by issuing the following command:
% lbuilder
After you start the Locale Builder, the screen illustrated in Figure 11-1 appears.
Before beginning with specific tasks, you might want to become familiar with the general screens that you can use at different times. These screens are:
The Existing Definitions dialog box, under the General tab
The Session Log dialog box, under the Tools menu
The NLT file preview, as a tab in many tasks
The Open File dialog box, under the File menu
The following restrictions apply when choosing locale object names:
The Existing Definitions dialog box allows you to open locale objects by name. If you know a specific language, territory, linguistic sort (collation), or character set that you want to start with, click the displayed value. For example, you can open the AMERICAN language definition file, as shown in Figure 11-2. In this case, you will open the lx00001.nlb file.
Abbreviations are for reference only and cannot be opened.
The Session Log dialog box shows what actions have been taken in a given session. This way, you can keep a record of all changes and, if necessary, undo or modify past changes. Figure 11-3 illustrates a typical example.
Figure 11-4 illustrates viewing an NLT file. It is a text file with the file extension .nlt in which the settings for a specific language, territory, character set, or linguistic sort are kept. The NLT file is not modifiable from this dialog box. Instead, the purpose is to present an easily readable form of the file so you can see whether your changes look correct. You must use the specific elements of the Locale Builder to modify the NLT file.
The Open File dialog box opens an NLB file so you can modify it or use it as a template. The NLB file is a binary file with the file extension .nlb that contains the binary equivalent of the information in the NLT file. Figure 11-5 illustrates opening lx00001.nlb, which is the language definition for AMERICAN. By highlighting Preview, you can see what type of NLB file you have selected.
This section uses a sample scenario of creating a new language based on French. This new language will be called AMERICAN FRENCH. First, you need to open FRENCH from the Existing Definitions dialog box. Figure 11-6 illustrates the first screen.
Figure 11-6 illustrates a user-defined setting of AMERICAN FRENCH and a user-defined abbreviation of AF. The ISO Abbreviation field is not limited to standard ISO abbreviations, so you can create your own: AF, in this case. The Default settings are inherited and optional. You can build upon an inherited setting and modify it to add properties.
The valid range for the language ID field for a user-defined language is 1,000 to 10,000.
Figure 11-7 illustrates how to set month names using the Month Names tab. All names are shown as they appear in the NLT file. If you set NLS_LANG to AMERICAN FRENCH, the rules shown in the figure apply.
Figure 11-8 illustrates the Day Names tab, which allows you to choose default day names. All names are shown as they appear in the NLT file. If you set NLS_LANG to AMERICAN FRENCH, the rules in the figure apply.
This section uses a sample scenario of creating a new territory called REDWOOD SHORES, with RS as its abbreviation. In this case, we will create a new definition that is not based on an existing one.
The basic tasks are to assign a name and choose calendar, number, date/time, and currency formats. Figure 11-9 illustrates how to begin.
In Figure 11-9, we have manually inserted REDWOOD SHORES and RS for a new territory.
The valid range for the territory ID field for a user-defined territory is 1,000 to 10,000.
Figure 11-10 illustrates how to set Calendar characteristics. Clicking on a radio button causes the Calendar Sample to display sample output. In this case, Tuesday is the first day of the week.
Figure 11-11 illustrates typical date and time settings. Sample formats are displayed when you choose a setting from the drop-down menus. In this case, we set the default date format for REDWOOD SHORES to YY/MM/DD instead of the typical territory default of DD-MM-YY.
You can also create your own formats instead of using the selection from the drop-down menus.
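To make the difference between the two date formats concrete, the following sketch renders one date both ways. The mapping from the Oracle-style format strings to Python strftime directives is an assumption made for illustration; the Locale Builder does not use Python.

```python
from datetime import date

# Assumed equivalents of the two formats mentioned in the text.
STRFTIME_EQUIVALENT = {
    "YY/MM/DD": "%y/%m/%d",
    "DD-MM-YY": "%d-%m-%y",
}


def format_date(value, territory_format):
    """Render a date using the strftime equivalent of a territory format."""
    return value.strftime(STRFTIME_EQUIVALENT[territory_format])


sample_date = date(2001, 6, 15)
print(format_date(sample_date, "YY/MM/DD"))  # REDWOOD SHORES setting
print(format_date(sample_date, "DD-MM-YY"))  # typical territory default
```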
Figure 11-12 illustrates typical number settings. Sample formats are displayed when you choose a setting from the drop-down menus. The default for number grouping is 3, but 4 is used in this case.
You can type your own values instead of using the drop-down menus.
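The effect of the number grouping setting can be sketched as follows. The helper name and the comma separator are assumptions chosen for the example; the actual group separator is whatever the territory definition specifies.

```python
def group_digits(value, group_size=3, separator=","):
    """Insert a separator every group_size digits, counting from the right."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    digits = str(abs(int(value)))
    groups = []
    while digits:
        groups.append(digits[-group_size:])   # take the rightmost group
        digits = digits[:-group_size]         # keep what is left of it
    sign = "-" if value < 0 else ""
    return sign + separator.join(reversed(groups))


print(group_digits(123456789, 3))  # default grouping
print(group_digits(123456789, 4))  # grouping used in this scenario
```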
Figure 11-13 illustrates how to set monetary conventions for territories. Note that the default International Currency Separator is a blank space, so it is not visible in the screen. In this case, we chose the Euro as an alternate currency symbol.
You can type your own values instead of using the drop-down menus.
In some cases, you may wish to tailor a character set to meet specific user needs. In Oracle9i, you can extend an existing encoded character set definition to suit your needs. User-defined characters are often used to encode special characters representing:
This section describes how Oracle supports user-defined characters. It describes:
User-defined characters are typically supported within East Asian character sets. These East Asian character sets have at least one range of reserved codepoints for use as user-defined characters. For example, Japanese Shift JIS reserves 1880 codepoints for user-defined characters as follows:
The Oracle character sets listed in Table 11-2 contain pre-defined ranges that allow you to support user-defined characters:
The codepoint value that represents a particular character may vary among different character sets. For example, the Japanese kanji character 亜 is encoded as follows in different Japanese character sets:
Character Set | Unicode | JA16SJIS | JA16EUC | JA16DBCS |
---|---|---|---|---|
Character value of 亜 | 4E9C | 889F | B0A1 | 4867 |
In Oracle, all character sets are defined in terms of Unicode 3.0 code points. That is, each character is defined as a Unicode 3.0 code value. Character conversion takes place transparently to users by using Unicode as the intermediate form. For example, when a JA16SJIS client connects to a JA16EUC database, the character shown in Figure 11-14, "Kanji Example" (value 889F) entered from the JA16SJIS client is internally converted to Unicode (value 4E9C), and then converted to JA16EUC (value B0A1).
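The two-step conversion can be approximated outside Oracle with Python's built-in codecs. In this sketch, the Python codecs shift_jis and euc_jp are used as stand-ins for JA16SJIS and JA16EUC; that equivalence is an assumption and holds only for standard characters such as this one.

```python
def convert_via_unicode(data, source_codec, target_codec):
    """Convert bytes between two encodings using Unicode as the pivot."""
    text = data.decode(source_codec)           # source bytes -> Unicode
    return text, text.encode(target_codec)     # Unicode -> target bytes


unicode_text, euc_bytes = convert_via_unicode(
    bytes.fromhex("889F"), "shift_jis", "euc_jp")

print(f"Unicode: {ord(unicode_text):04X}")
print(f"EUC:     {euc_bytes.hex().upper()}")
```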
Unicode 3.0 reserves the range E000-F8FF for the Private Use Area (PUA). The PUA is intended for private use character definition by end users or vendors.
User-defined characters can be converted between two Oracle character sets by using Unicode 3.0 PUA as the intermediate form, the same as standard characters.
User-defined character cross references between Japanese character sets, Korean character sets, Simplified Chinese character sets and Traditional Chinese character sets are contained in the following distribution sets:
${ORACLE_HOME}/ocommon/nls/demo/udc_ja.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_ko.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zhs.txt
${ORACLE_HOME}/ocommon/nls/demo/udc_zht.txt
These cross references are useful when registering user-defined characters across operating systems. For example, when registering a new user-defined character on both a Japanese Shift-JIS operating system and a Japanese IBM Host operating system, you may want to choose F040 on the Shift-JIS operating system and 6941 on the IBM Host operating system for the new user-defined character so that Oracle can convert correctly between JA16SJIS and JA16DBCS. The user-defined character cross reference shows that both the Shift-JIS UDC value F040 and the IBM Host UDC value 6941 are mapped to the same Unicode PUA value E000.
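The role of the PUA as the pivot for user-defined characters can be sketched with a small lookup table. The table below holds only the single pair of values named in the text (F040 and 6941, both mapped to E000); the dictionary layout and function names are assumptions for illustration and do not reflect the format of the udc_*.txt files.

```python
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # Unicode 3.0 Private Use Area

# Local UDC codepoint -> Unicode PUA codepoint, per character set.
# Contains only the one example pair from the text.
UDC_TO_PUA = {
    "JA16SJIS": {0xF040: 0xE000},
    "JA16DBCS": {0x6941: 0xE000},
}


def is_private_use(codepoint):
    """Return True if the codepoint lies in the range E000-F8FF."""
    return PUA_FIRST <= codepoint <= PUA_LAST


def convert_udc(code, source_charset, target_charset):
    """Convert a user-defined character code using the PUA as the pivot."""
    pua = UDC_TO_PUA[source_charset][code]
    if not is_private_use(pua):
        raise ValueError(f"{pua:04X} is not in the Private Use Area")
    # Invert the target table to go from PUA value back to a local code.
    pua_to_local = {u: local for local, u in UDC_TO_PUA[target_charset].items()}
    return pua_to_local[pua]


print(f"{convert_udc(0xF040, 'JA16SJIS', 'JA16DBCS'):04X}")
```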
See Also:
Appendix B, "Unicode Character Code Assignments" for more information about customizing a character set definition file
By default, the Locale Builder generates the next available character set name for you. You can, however, specify your own character set name. You should follow certain conventions when creating a character set. In particular, the convention used for naming character set definition NLT files is the format lx2dddd.nlt, where dddd is the 4-digit Character Set ID in hex.
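The file naming convention can be sketched as a small helper. The function name is hypothetical, and the use of lowercase hex digits is an assumption; the text specifies only four hex digits.

```python
def charset_nlt_filename(charset_id):
    """Build the NLT file name lx2dddd.nlt for a character set ID."""
    if not 0 <= charset_id <= 0xFFFF:
        raise ValueError("character set ID must fit in four hex digits")
    # Assumption: hex digits are written in lowercase.
    return f"lx2{charset_id:04x}.nlt"


print(charset_nlt_filename(10001))
```

For the character set ID 10001, the hex value is 2711, so the sketch produces lx22711.nlt.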
A few things to note when editing a character set definition file:
If a character set is derived from an existing Oracle character set, Oracle Corporation recommends using the following character set naming convention:
<Oracle_character_set_name><organization_name>EXT<version>
For example, if a company such as Sun Microsystems were adding user-defined characters to the JA16EUC character set, the following character set name might be appropriate:
JA16EUCSUNWEXT1
where:
JA16EUC is the character set name defined by Oracle
SUNW represents the organization name (company stock trading abbreviation for Sun Microsystems)
EXT specifies that this is an extension to the JA16EUC character set
1 specifies the version
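The recommended naming convention amounts to simple concatenation, as the sketch below shows. The function name is hypothetical, and converting the parts to uppercase is an assumption based on the example name.

```python
def extended_charset_name(base_charset, organization, version):
    """Compose <Oracle_character_set_name><organization_name>EXT<version>."""
    return f"{base_charset.upper()}{organization.upper()}EXT{version}"


print(extended_charset_name("JA16EUC", "SUNW", 1))
```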
This section shows how to create a new character set called MYCHARSET and use 10001 for its recommended ID number. The scenario starts with an ASCII character set and adds 10 Chinese characters. First, open US7ASCII from the Existing Definitions dialog box. Figure 11-15 illustrates how to begin.
In Figure 11-15, the ISO Character Set ID and Base Character Set ID fields are optional. The Base Character Set ID is used for inheriting values so that the base character set's properties are used as a starting template. The Character Set ID is automatically generated, although you can override it. The valid range for a user-defined character set ID is 10,000 to 20,000.
Figure 11-16 illustrates how to change certain character set specifications. This should not normally be necessary.
When you open a character set, all possible settings for this tab should already be set to appropriate settings. You should keep these settings unless you have a specific reason for changing them. If you need to change the settings, use the following guidelines:
FIXED_WIDTH identifies character sets whose characters have a uniform length. AL16UTF16 is one example.
BYTE_UNIQUE means that the single-byte range of codepoints is distinct from the multibyte range. An example is JA16EUC.
DISPLAY identifies character sets that have certain character mode characteristics. Arabic and Devanagari character sets are examples.
SHIFT is for certain character sets that require extra shift characters to distinguish between single-byte characters and multibyte characters.
See Also:
Chapter 2, "Choosing a Character Set" for more information about SHIFT In SHIFT Out character sets
Figure 11-17 illustrates how to add user-defined characters. In this case, you can add characters after 0xfe. You can add one character at a time or use a text file to import a large number of characters. In this example, we first import a file containing the following characters:
88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa
Figure 11-18 illustrates the new characters added after 0xfe. We imported the characters in this case from a file having two columns, with the left column being the local code value and the right column being its Unicode mapping.
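A two-column import file of this kind is easy to check before importing. The sketch below parses each line into a pair of integers (local code value, Unicode mapping); the function name and the error handling are assumptions for illustration, and the sample reuses the first lines of the list shown earlier.

```python
def parse_udc_import(text):
    """Parse lines of '<local hex> <unicode hex>' into integer pairs."""
    mappings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected two columns")
        local_code, unicode_code = (int(field, 16) for field in fields)
        mappings.append((local_code, unicode_code))
    return mappings


sample_import = "88a2 963f\n88a3 54c0\n88a4 611b\n"
for local_code, unicode_code in parse_udc_import(sample_import):
    print(f"{local_code:04x} -> U+{unicode_code:04X}")
```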
This section shows how to create a new multilingual linguistic sort called MY_GENERIC_M, and use 10001 for its ID number. The choice of sort name is based on the convention GENERIC_M representing a multilingual ISO sort. In this case, we use GENERIC_M as a starting point. Figure 11-19 illustrates how to begin.
Typical settings for the flags are automatically derived. SWAP_WITH_NEXT is relevant for Thai and Lao sorts. REVERSE_SECONDARY is for French sorts. CANONICAL_EQUIVALENCE determines whether canonical rules will be used.
Collation ID (sort ID) valid ranges for a user-defined sort are 1,000 to 2,000 for monolingual collation and 10,000 to 11,000 for multilingual collation.
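The ID ranges stated in this chapter for user-defined locale objects can be collected into one check. The table keys and the function name are hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
# Valid ID ranges for user-defined objects, as stated in this chapter.
USER_DEFINED_ID_RANGES = {
    "language": (1000, 10000),
    "territory": (1000, 10000),
    "character set": (10000, 20000),
    "monolingual sort": (1000, 2000),
    "multilingual sort": (10000, 11000),
}


def is_valid_user_defined_id(object_type, object_id):
    """Check an ID against its range (bounds assumed to be inclusive)."""
    low, high = USER_DEFINED_ID_RANGES[object_type]
    return low <= object_id <= high


print(is_valid_user_defined_id("multilingual sort", 10001))
print(is_valid_user_defined_id("monolingual sort", 10001))
```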
In this scenario, we will move digits so they sort after letters. To do this, we will delete their codepoint values and paste them after the codepoint values of the letters.
Figure 11-20 illustrates selecting a value. Click Delete and paste the value where you want it. Clicking Paste brings up the Collation Pasting Dialog Box, shown in Figure 11-21.
In Figure 11-21, choose where to put the deleted node and at what sort level you want it.
In Figure 11-22, the digits 0-7 were moved from their original place before the letters a-z to a place after the letters a-z. For multibyte linguistic sorts, the Locale Builder cannot display accented characters, but you can change their sort order.
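The visible effect of moving digits after letters can be imitated with a custom sort key. This is only an illustration of the resulting order using Python's sorted function; it does not model how the Locale Builder or Oracle stores collation weights.

```python
def digits_after_letters_key(text):
    """Sort key that ranks every digit after every non-digit character."""
    return tuple((1 if ch.isdigit() else 0, ch) for ch in text)


words = ["7up", "apple", "0day", "zebra"]
print(sorted(words))                                # default: digits first
print(sorted(words, key=digits_after_letters_key))  # digits after letters
```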
The next scenario is to change the sort order for accented characters. You can do this by changing the sort for all characters containing a particular accent mark or by changing one character at a time. In this example, we change the sort of all characters with a circumflex (for example, û) to go after all characters containing a tilde.
First, we verify the current sort order by choosing Canonical Rules under the Tools menu. This brings up the Canonical Rules dialog box, illustrated in Figure 11-23.
Figure 11-23 illustrates how characters are decomposed into their canonical equivalents and their current sorting orders. For example, ä is represented as a plus an umlaut. In this case, we change the sort for all characters with a circumflex so they follow characters with tildes. This example uses a base character of u.
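Canonical decomposition can be inspected with Python's standard unicodedata module, which applies the Unicode NFD normalization form. This sketch uses the Unicode data bundled with Python rather than the Locale Builder's own tables, so it is an approximation of what the Canonical Rules dialog box displays.

```python
import unicodedata


def decompose(character):
    """Return the canonical decomposition as a list of U+XXXX strings."""
    decomposed = unicodedata.normalize("NFD", character)
    return [f"U+{ord(ch):04X}" for ch in decomposed]


print(decompose("\u00e4"))  # a with umlaut (diaeresis)
print(decompose("\u00fb"))  # u with circumflex
```

The first element of each result is the base letter and the second is the non-spacing accent mark.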
Click on the Non-Spacing tab. If you use the Non-Spacing tab, changes for accent marks apply to all characters.
After selecting the circumflex, click Cut and accept the confirmation. Then all characters with a circumflex will have their sort order changed.
Figure 11-25 illustrates the new order.
To change the order of a specific accented character, you need to insert the character directly into the appropriate order position. In this scenario, we will change the sort order for ä so that it sorts after Z. First, we select the Unicode Collation tab. Next, we highlight the character next to the one we want, Z in this case. Finally, we click Add, which brings up a Paste dialog box.
As illustrated in Figure 11-26, we choose After and Primary and manually type in \x00e4, which is the code point for ä.
We chose Primary for the level because that is the Unicode standard for differentiating between characters having different base letters. A Secondary or Tertiary level sort would also have the same practical results.
Figure 11-27 shows the final result, and displays the ä correctly.
After you have defined a new language, territory, character set, or linguistic sort, generate new NLB files from the NLT files:
Do not try to specify an NLT file. Oracle Locale Builder generates an NLB file for each NLT file.
The new NLB files do not take effect until you perform the following steps:
lxlboot.nlb file into the path that is specified by the ORA_NLS33 environment variable, typically $ORACLE_HOME/OCOMMON/nls/admin/data.
Figure 11-29 illustrates the final notification that you have successfully generated NLB files for all NLT files in the directory.
Copyright © 1996-2001, Oracle Corporation. All Rights Reserved.