|Oracle9i Globalization Support Guide
Release 1 (9.0.1)
Part Number A90236-02
This chapter contains information useful for SQL programming in a globalization support environment. It includes the following topics:
All SQL functions whose behavior depends on NLS conventions allow NLS parameters to be specified. These functions are:
Explicitly specifying the optional NLS parameters for these functions allows the function evaluations to be independent of the NLS parameters in force for the session. This feature may be important for SQL statements that contain numbers and dates as string literals.
For example, the following query is evaluated correctly if the language specified for dates is
Such a query can be made independent of the current date language by using these statements:
SELECT ENAME FROM EMP WHERE HIREDATE > TO_DATE('1-JAN-91','DD-MON-YY', 'NLS_DATE_LANGUAGE = AMERICAN');
In this way, language-independent SQL statements can be defined where necessary. For example, such statements might be necessary when string literals appear in SQL statements in views,
CHECK constraints, or triggers.
All character functions support both single-byte and multibyte characters. Except where explicitly stated, character functions operate character by character, rather than byte by byte.
When evaluating views and triggers, default values for NLS function parameters are taken from the values currently in force for the session. When evaluating
CHECK constraints, default values are set by the NLS parameters that were specified at database creation.
The syntax that specifies NLS parameters in SQL functions is:
The following NLS parameters can be specified:
Only certain NLS parameters are valid for particular SQL functions, as shown in Table 7-1:
|SQL Function||Valid NLS Parameters|
Examples of the use of NLS parameters are:
TO_DATE ('1-JAN-89', 'DD-MON-YY', 'nls_date_language = American') TO_CHAR (hiredate, 'DD/MON/YYYY', 'nls_date_language = French') TO_NUMBER ('13.000,00', '99G999D99', 'nls_numeric_characters = '',.''') TO_CHAR (sal, '9G999D99L', 'nls_numeric_characters = '',.'' nls_currency = '' Dfl''') TO_CHAR (sal, '9G999D99C', 'nls_numeric_characters = ''.,'' nls_iso_currency = Japan') NLS_UPPER (ename, 'nls_sort = Swiss') NLSSORT (ename, 'nls_sort = German')
NLS_TERRITORY are not accepted as parameters in SQL functions, except for
NLSSORT. Only NLS parameters that explicitly define the specific data items required for unambiguous interpretation of a format are accepted.
NLS_DATE_FORMAT is also not accepted as a parameter for the reason described below.
If an NLS parameter is specified in
TO_DATE, then a format mask must also be specified as the second parameter. For example, the following specification is legal:
The following specifications are illegal:
TO_CHAR (hiredate, 'nls_date_language = French') TO_CHAR (hiredate, 'nls_date_language = French', 'DD/MON/YY')
This restriction requires that a date format always be specified if an NLS parameter is in a
TO_DATE function. As a result,
NLS_DATE_FORMAT is not a valid NLS parameter for these functions.
CONVERT SQL function allows conversion of character data between character sets.
CONVERT function converts the binary representation of a character string in one character set to another. It uses exactly the same technique described previously for the conversion between database and client character sets. Hence, it uses replacement characters and has the same limitations.
The syntax for
source_char_set is the source character set and
dest_char_set is the destination character set. If the
source_char_set parameter is not specified, then it defaults to the database character set.
In client/server environments using different character sets, use the
USING...) statement to perform conversions instead of
CONVERT. The conversion to client character sets will then properly know the server character set of the result of the
When using string manipulation functions, you can get different results depending upon the database character set. In particular, the
SUBSTR functions can return incorrect results because of the difference between single and multibyte character sets. To guarantee correct results, you should use variants of these functions designed for multibyte character sets. These functions are variations of:
The instring functions search strings for the presence of a substring. In addition to the traditional
INSTR, you can use
INSTRC. The functions allow you to return part of a string based on whether characters are calculated in bytes, UCS2 codepoints (where a surrogate pair is considered as two codepoints), or UCS4 codepoints (treats a surrogate pair as one codepoint) or complete Unicode characters (same as
INSTR4 with additional support for treating composed Unicode characters as one code point).
The following examples highlights the differences between
INSTRB on a database where the database character set is UTF8.
This searches the string "Städte und Länder", beginning with the fifth character, for the character "d". It returns the character position in "Städte und Länder" at which the second occurrence of "d" begins.
The following example searches the string "Städte und Länder", beginning with the fifth byte, for the character "d". It returns the byte position in "Städte und Länder" at which the second occurrence of "d" begins.
The length functions return the length of a string. In addition to the traditional
LENGTH, you can use
LENGTHC. The functions allow you to return part of a string based on whether characters are calculated in bytes, UCS2 codepoints (where a surrogate pair is considered as two codepoints), or UCS4 codepoints (treats a surrogate pair as one codepoint) or complete Unicode characters (same as
LENGTH4 with additional support for treating composed Unicode characters as one code point).
The following examples highlight the differences between
LENGTHB on a database where the database character set is UTF8.
SELECT LENGTH ('Télévision') LENGTH FROM DUAL; LENGTH ---------- 10 SELECT LENGTHB ('Télévision') LENGTHB FROM DUAL; LENGTHB ---------- 12
The substring functions return the requested portion of a substring. In addition to the traditional
SUBSTR, you can use
SUBSTRC.The functions allow you to return part of a string based on whether characters are calculated in bytes, UCS2 codepoints (where a surrogate pair is considered as two codepoints), or UCS4 codepoints (treats a surrogate pair as one codepoint) or complete Unicode characters (same as
SUBSTR4 with additional support for treating composed Unicode characters as one code point).
The following examples highlight the differences between
SUBSTRB on a database where the database character set is AL32UTF8.
SELECT SUBSTR ('Fußball', 2 , 4) SUBSTR FROM DUAL; SUBS ---- ußba SELECT SUBSTRB ('Fußball', 2 , 4) SUBSTRB FROM DUAL; SUB --- ußb
LIKE conditions specify a test involving pattern matching. Whereas the equality operator (=) exactly matches one character value to another, the
LIKE conditions match a portion of one character value to another by searching the first value for the pattern specified by the second.
LIKE calculates strings using characters as defined by the input character set.
LIKEC uses unicode complete characters.
LIKE2 uses UCS2 codepoints.
LIKE4 uses USC4 codepoints.
There is no
Two SQL functions,
NLS_CHARSET_ID, are provided to convert between character set ID numbers and character set names. They are used by programs that need to determine character set ID numbers for binding variables through OCI.
NLS_CHARSET_NAME(n) function returns the name of the character set corresponding to ID number n. The function returns
NULL if n is not a recognized character set ID value.
TEXT) returns the character set ID corresponding to the name specified by
TEXT is defined as a run-time
VARCHAR2 quantity, a character set name. Values for
TEXT can be
NLSRTL names that resolve to sets other than the database character set or the national character set.
If the value
CHAR_CS is entered for
TEXT, then the function returns the ID of the server's database character set. If the value
NCHAR_CS is entered for
TEXT, then the function returns the ID of the server's national character set. The function returns
TEXT is not a recognized name. The value for
TEXT must be entered in all uppercase.
CSID) returns the declaration length (in number of characters) for an
NCHAR column. The
BYTECNT argument is the byte length of the column. The
CSID argument is the character set ID of the column.
NLSSORT function replaces a character string with the equivalent sort string used by the linguistic sort mechanism. For a binary sort, the sort string is the same as the input string. The linguistic sort technique operates by replacing each character string with some other binary values, chosen so that sorting the resulting string produces the desired sorting sequence. When a linguistic sort is being used,
NLSSORT returns the binary values that replace the original string.
BY clause in a SQL statement is determined by the
NLS_SORT session parameter, but it can be overridden by explicitly using the
NLSSORT function, as the following example shows.
The preceding example uses a German sort, but the following example uses a French one.
ALTER SESSION SET NLS_SORT = GERMAN; SELECT * FROM table1 ORDER BY NLSSORT(col1, 'NLS_SORT = FRENCH');
WHERE clause normally uses binary comparison rather than linguistic comparison. But this can be overridden by the following methods.
NLSSORTfunction in the
NLS_COMPto ANSI, in which case the
NLS_SORTsession parameter is used in the
There are four ways to use
NLSSORT(), which relies on the
NLSSORT(column1, 'NLS_LANG= xxxx')
NLS_LANG parameter of the
NLS_SORT function is not the same as the
NLS_LANG client environment setting. In the
NLS_LANG specifies the abbreviated language name, for example,
US for American or
PL for Polish. An example is:
NLSSORT allows applications to perform string matching that follows alphabetic conventions. Normally, character strings in a
WHERE clause are compared using the characters' binary values. A character is "greater than" another if it has a higher binary value in the database character set. Because the sequence of characters based on their binary values might not match the alphabetic sequence for a language, such comparisons often do not follow alphabetic conventions. For example, if a column (
col1) contains the values ABC, ABZ, BCD, and ÄBC in the ISO 88591 8-bit character set, the following query:
Ä has a higher numeric value than
B. However, in German,
Ä is sorted alphabetically before
B. Such conventions are language-dependent even when the same character is used. In Swedish,
Ä is sorted after
Z. Linguistic comparisons can be made using
NLSSORT in the
WHERE clause, as follows:
NLSSORT has to be on both sides of the comparison operator. For example:
If a German linguistic sort is being used, this does not return strings beginning with
Ä because, in the German alphabet,
Ä comes before
B. If a Swedish linguistic sort is being used, such names are returned because, in the Swedish alphabet,
Ä comes after
Normally, comparison in the
WHERE clause or PL/SQL blocks is binary. You can use the
NLSSORT function for linguistic comparison. Sometimes this can be tedious, especially when the linguistic sort needed has already been specified in the
NLS_SORT session parameter. You can use
NLS_COMP in such cases to indicate that the comparisons must be linguistic according to the
NLS_SORT session parameter. This is done by altering the session:
To specify that comparison in the
WHERE clause is always binary, issue the following statement:
NLS_COMP is set to
ANSI, a linguistic index improves the performance of the linguistic comparison.To enable a linguistic index, use the syntax:
String comparison for partition
THAN collation for DDL and DML always follows
If a linguistic sorting sequence is in use, then
NLSSORT is used implicitly on each character item in the
BY clause. As a result, the sort mechanism (linguistic or binary) for an
BY is transparent to the application. However, if the
NLSSORT function is explicitly specified for a character item in an
BY item, then the implicit
NLSSORT is not done.
In other words, the
NLSSORT linguistic replacement is only applied once, not twice. The
NLSSORT function is generally not needed in an
BY clause when the default sort mechanism is a linguistic sort. However, when the default sort mechanism is
BINARY, then a query such as:
uses a binary sort. A German linguistic sort can be obtained using:
Several format masks are provided with the
TO_NUMBER functions to format dates and numbers according to the relevant conventions.
A format element RM (Roman Month) returns a month as a Roman numeral. One can specify either uppercase or lowercase using RM or rm respectively. For example, for the date
7 Sep 1998,
DD-rm-YYYY will return
DD-RM-YYYY will return
Note that the
DY format masks explicitly support month and day abbreviations that may not be three characters in length. For example, the abbreviations "Lu" and "Ma" can be specified for the French "Lundi" and "Mardi", respectively.
The week numbers returned by the WW format mask are calculated according to the algorithm
int((day-ijan1)/7). This week number algorithm does not follow the ISO standard (
To support the ISO standard, a format element IW is provided that returns the ISO week number. In addition, format elements
IYYY, equivalent in behavior to the format elements
YYYY, return the year relating to the ISO week number.
In the ISO standard, the year relating to an ISO week number can be different from the calendar year. For example, 1st Jan 1988 is in ISO week number 53 of 1987. A week always starts on a Monday and ends on a Sunday.
For example, January 1, 1991, is a Tuesday, so Monday, December 31, 1990, to Sunday, January 6, 1991, is week 1. Thus, the ISO week number and year for December 31, 1990, is 1, 1991. To get the ISO week number, use the format mask
IW for the week number and one of the
IY formats for the year.
Several additional format elements are provided for formatting numbers:
D(decimal) returns the decimal point character.
G(group) returns the group separator.
L(local currency) returns the local currency symbol.
C(international currency) returns the ISO currency symbol.
RN(Roman numeral) returns the number as its Roman numeral equivalent.
For Roman numerals, one can specify either uppercase or lowercase, using
rn, respectively. The number being converted must be an integer in the range 1 to 3999.
Oracle9i SQL Reference for a complete description on using date and number masks
The following topic is also important for SQL programming:
If the database character set replaces the vertical bar
| with a national character, then all SQL statements that use the concatenation operator (ASCII 124) will fail. For example, creating a procedure will fail because it generates a recursive SQL statement that uses concatenation. When you use a 7-bit replacement character set such as D7DEC, F7DEC, or SF7ASCII for the database character set, then the national character which replaces the vertical bar is not allowed in object names because the vertical bar is interpreted as the concatenation operator.
On the user side, one can use a 7-bit replacement character set if the database character set is the same or compatible, that is, if both character sets replace the vertical bar with the same national character.