4
SQL Programming

This chapter contains information useful for SQL programming in an NLS environment, including:

Locale-Dependent SQL Functions

All SQL functions whose behavior depends on NLS conventions allow NLS parameters to be specified. These functions are:

TO_CHAR
TO_DATE
TO_NUMBER
NLS_UPPER
NLS_LOWER
NLS_INITCAP
NLSSORT

Explicitly specifying the optional NLS parameters for these functions allows the function evaluations to be independent of the NLS parameters in force for the session. This feature may be important for SQL statements that contain numbers and dates as string literals.

For example, the following query is evaluated correctly if the language specified for dates is American:

SQL> SELECT ENAME FROM EMP
   > WHERE HIREDATE > '1-JAN-91';

Such a query can be made independent of the current date language by using these statements:

SQL> SELECT ENAME FROM EMP
   > WHERE HIREDATE > TO_DATE('1-JAN-91','DD-MON-YY',
   > 'NLS_DATE_LANGUAGE = AMERICAN');

In this way, language-independent SQL statements can be defined where necessary. For example, such statements might be necessary when string literals appear in SQL statements in views, CHECK constraints, or triggers.

All character functions support both single-byte and multi-byte characters. Except where explicitly stated, character functions operate character-by-character, rather than byte-by-byte.

Default Specifications

When evaluating views and triggers, default values for NLS function parameters are taken from the values currently in force for the session. When evaluating CHECK constraints, default values are set by the NLS parameters that were specified at database creation.

Specifying Parameters

The syntax that specifies NLS parameters in SQL functions is:

'parameter = value'

The following NLS parameters can be specified:

Only certain NLS parameters are valid for particular SQL functions, as shown in Table 4-1:

Table 4-1 SQL Functions and Their Parameters

SQL Function	Valid NLS Parameters
TO_DATE	NLS_DATE_LANGUAGE NLS_CALENDAR
TO_NUMBER:	NLS_NUMERIC_CHARACTERS NLS_CURRENCY NLS_DUAL_CURRENCY NLS_ISO_CURRENCY
TO_CHAR	NLS_DATE_LANGUAGE NLS_NUMERIC_CHARACTERS NLS_CURRENCY NLS_ISO_CURRENCY NLS_DUAL_CURRENCY NLS_CALENDAR
NLS_UPPER	NLS_SORT
NLS_LOWER	NLS_SORT
NLS_INITCAP	NLS_SORT
NLSSORT	NLS_SORT

Examples of the use of NLS parameters are:

TO_DATE ('1-JAN-89', 'DD-MON-YY',
   'nls_date_language = American')

TO_CHAR (hiredate, 'DD/MON/YYYY',
   'nls_date_language = French')

TO_NUMBER ('13.000,00', '99G999D99',
   'nls_numeric_characters = '',.''')

TO_CHAR (sal, '9G999D99L', 'nls_numeric_characters = '',.''
   nls_currency = '' Dfl''')

TO_CHAR (sal, '9G999D99C', 'nls_numeric_characters = ''.,''
   nls_iso_currency = Japan')

NLS_UPPER (ename, 'nls_sort = Swiss')

NLSSORT (ename, 'nls_sort = German')

Note:
For some languages, various lowercase characters correspond to a sequence of uppercase characters, or vice versa. As a result, the length of the output from the functions NLS_UPPER, NLS_LOWER, and NLS_INITCAP can differ from the input.

Unacceptable Parameters

Note that NLS_LANGUAGE and NLS_TERRITORY are not accepted as parameters in SQL functions, except for NLSSORT. Only NLS parameters that explicitly define the specific data items required for unambiguous interpretation of a format are accepted. NLS_DATE_FORMAT is also not accepted as a parameter for the reason described below.

If an NLS parameter is specified in TO_CHAR, TO_NUMBER, or TO_DATE, a format mask must also be specified as the second parameter. For example, the following specification is legal:

TO_CHAR (hiredate, 'DD/MON/YYYY', 'nls_date_language = French')

The following specifications are illegal:

TO_CHAR (hiredate, 'nls_date_language = French')
TO_CHAR (hiredate, 'nls_date_language = French',
   'DD/MON/YY')

This restriction requires that a date format always be specified if an NLS parameter is in a TO_CHAR or TO_DATE function. As a result, NLS_DATE_FORMAT is not a valid NLS parameter for these functions.

CONVERT Function

The SQL function CONVERT allows for conversion of character data between character sets.

The CONVERT function converts the binary representation of a character string in one character set to another. It uses exactly the same technique described previously for the conversion between database and client character sets. Hence, it uses replacement characters and has the same limitations.

If the CONVERT function is used in a stored procedure, the stored procedure runs independently of the client character set (that is, it uses the server's character set), which sometimes results in the last converted character being truncated.

The syntax for CONVERT is:

Figure 4-1 CONVERT Syntax

Text description of the illustration ch4a.gif.

where source_char_set is the source character set and dest_char_set is the destination character set.

In client/server environments using different character sets, use the TRANSLATE (...USING...) statement to perform conversions instead of CONVERT. The conversion to client character sets will then properly know the server character set of the result of the TRANSLATE statement.

For more information on CONVERT, see Oracle8i SQL Reference.

Character Set SQL Functions

Two SQL functions, NLS_CHARSET_NAME and NLS_CHARSET_ID, are provided to convert between character set ID numbers and character set names. They are used by programs that need to determine character set ID numbers for binding variables through OCI.

The NLS_CHARSET_DECL_LEN function returns the declaration length (in number of characters) for an NCHAR column.

For more information on these functions, see Oracle8i SQL Reference.

Converting from Character Set Number to Character Set Name

The NLS_CHARSET_NAME(n) function returns the name of the character set corresponding to ID number n. The function returns NULL if n is not a recognized character set ID value.

Converting from Character Set Name to Character Set Number

NLS_CHARSET_ID(TEXT) returns the character set ID corresponding to the name specified by TEXT. TEXT is defined as a run-time VARCHAR2 quantity, a character set name. Values for TEXT can be NLSRTL names that resolve to sets other than the database character set or the national character set.

If the value CHAR_CS is entered for TEXT, the function returns the ID of the server's database character set. If the value NCHAR_CS is entered for TEXT, the function returns the ID of the server's national character set. The function returns NULL if TEXT is not a recognized name. The value for TEXT must be entered in all uppercase.

Returning the Length of an NCHAR Column

NLS_CHARSET_DECL_LEN(BYTECNT, CSID) returns the declaration length (in number of characters) for an NCHAR column. The BYTECNT argument is the byte length of the column. The CSID argument is the character set ID of the column.

NLSSORT Function

The NLSSORT function replaces a character string with the equivalent sort string used by the linguistic sort mechanism. For a binary sort, the sort string is the same as the input string. The linguistic sort technique operates by replacing each character string with some other binary values, chosen so that sorting the resulting string produces the desired sorting sequence. When a linguistic sort is being used, NLSSORT returns the binary values that replace the original string.

The ORDER BY clause in a SQL statement is determined by the NLS_SORT session parameter, but it can be overridden by explicitly using the NLSSORT() function, as the following example shows.

SQL> ALTER SESSION SET NLS_SORT = GERMAN;
   > SELECT *
   > FROM table1
   > ORDER BY col1;

The preceding example uses a German sort, but the following example uses a French one.

SQL> ALTER SESSION SET NLS_SORT = GERMAN;
   > SELECT *
   > FROM table1
   > ORDER BY NLSSORT(col1, 'NLS_SORT = FRENCH');

The WHERE clause normally uses binary comparison rather than linguistic comparison. But this can be overridden by two methods.

Use of the NLSSORT() function in the WHERE clause.

SQL> SELECT *
   > FROM table1
   > WHERE NLSSORT(col1, 'NLS_SORT = FRENCH')>

    > NLSSORT(col2, 'NLS_SORT = FRENCH');

Setting the session parameter NLS_COMP to ANSI, in which case the NLS_SORT session parameter is used in the WHERE clause.
```
SQL> ALTER SESSION SET NLS_COMP = ANSI;
```

NLSSORT Syntax

There are four ways to use NLSSORT:

NLSSORT()--which relies on the NLS_SORT parameter
NLSSORT(column1, 'NLS_SORT=xxxx')
NLSSORT(column1, 'NLS_LANG= xxxx')
NLSSORT(column1, 'NLS_LANGUAGE=xxxx')

The NLS_LANG parameter of the NLS_SORT function is not the same as the NLS_LANG client environment setting. In the NLSSORT function, NLS_LANG specifies the abbreviated language name, for example, US for American or PL for Polish. An example is:

SQL> SELECT * FROM emps
   > ORDER BY NLSSORT(col1, 'NLS_LANG=PL');

String Comparisons in a WHERE Clause

NLSSORT allows applications to perform string matching that follows alphabetic conventions. Normally, character strings in a WHERE clause are compared using the characters' binary values. A character is "greater than" another if it has a higher binary value in the database character set. Because the sequence of characters based on their binary values might not match the alphabetic sequence for a language, such comparisons often do not follow alphabetic conventions. For example, if a column (COL1) contains the values ABC, ABZ, BCD, and ÄBC in the ISO 8859/1 8-bit character set, the following query:

SQL> SELECT col1 FROM tab1 WHERE col1 > 'B';

returns both BCD and ÄBC because Ä has a higher numeric value than B. However, in German, an Ä is sorted alphabetically before B. Such conventions are language dependent even when the same character is used. In Swedish, an Ä is sorted after Z. Linguistic comparisons can be made using NLSSORT in the WHERE clause, as follows:

WHERE NLSSORT(col) comparison_operator NLSSORT(comparison_string)

Note that NLSSORT has to be on both sides of the comparison operator. For example:

SELECT col1 FROM tab1 WHERE NLSSORT(col1) > NLSSORT('B')

If a German linguistic sort is being used, this does not return strings beginning with Ä because, in the German alphabet, Ä comes before B. If a Swedish linguistic sort is being used, such names are returned because, in the Swedish alphabet, Ä comes after Z.

NLS_COMP

Normally, comparison in the WHERE clause is binary. To use linguistic comparison, the NLSSORT function can be used. Sometimes this can be tedious, especially when the linguistic sort needed has already been specified in the NLS_SORT session parameter. One can use NLS_COMP in such cases to indicate that the comparisons must be linguistic according to the NLS_SORT session parameter. This is done by altering the session:

SQL> ALTER SESSION SET NLS_COMP = ANSI;

To specify that comparison in the WHERE clause is always binary, issue the following statement:

SQL> ALTER SESSION SET NLS_COMP = BINARY;

As a final note, when NLS_COMP is set to ANSI, a linguistic index improves the performance of the linguistic comparison.

To enable a linguistic index, use the syntax:

SQL> CREATE INDEX i ON t(NLSSORT(col, 'NLS_SORT=FRENCH'));

Partitioned Tables and Indexes

String comparison for partition VALUES LESS THAN collation for DDL and DML always follows BINARY order.

Controlling an ORDER BY Clause

If a linguistic sorting sequence is in use, then NLSSORT is used implicitly on each character item in the ORDER BY clause. As a result, the sort mechanism (linguistic or binary) for an ORDER BY is transparent to the application. However, if the NLSSORT function is explicitly specified for a character item in an ORDER BY item, then the implicit NLSSORT is not done.

In other words, the NLSSORT linguistic replacement is only applied once, not twice. The NLSSORT function is generally not needed in an ORDER BY clause when the default sort mechanism is a linguistic sort. However, when the default sort mechanism is BINARY, then a query such as:

SELECT ename FROM emp
ORDER BY ename

uses a binary sort. A German linguistic sort can be obtained using:

SELECT ename FROM emp
ORDER BY NLSSORT(ename, 'NLS_SORT = GERMAN')

Pattern Matching Characters for Fixed-Width Multi-Byte Character Sets

The LIKE operator is used in character string comparisons with pattern matching. Its syntax requires the use of two special pattern matching characters: the underscore (_) and the percent sign(%). The space character is used to pad CHAR values to the declared column length.

When the LIKE operator is applied to a national character set column (NCHAR or NVARCHAR2) or a value of an NCHAR column must be padded to its declared length, and the national character set is fixed-byte multi-width, a problem arises because the (single-byte) underscore, percent, and space characters are not present in the character set. Table 4-2 lists characters that should be used instead.

Table 4-2 Encoding for the Underscore, Percent Sign, and Pad Character

For This Character Set	Use These Code Point Values
For This Character Set	Underscore	Percent Sign	Pad Character (Space)
JA16SJISFIXED	0x8151	0x8193	0x8140
JA16EUCFIXED	0xa1b2	0xa1f3	0xa1a1
JA16DBCSFIXED	0x426d	0x426c	0x4040
ZHT32TRISFIXED	0x8eb1a1df	0x8eb1a1a5	0x8ebla1a0

Time/Date/Calendar Formats

Several format masks are provided with the TO_CHAR, TO_DATE, and TO_NUMBER functions to format dates and numbers according to the relevant conventions.

Date Formats

A format element RM (Roman Month) returns a month as a Roman numeral. One can specify either uppercase or lowercase using RM or rm respectively. For example, for the date 7 Sep 1998, "DD-rm-YYYY" will return "07-ix-1998" and "DD-RM-YYYY" will return "07-IX-1998".

Note that the MON and DY format masks explicitly support month and day abbreviations that may not be three characters in length. For example, the abbreviations "Lu" and "Ma" can be specified for the French "Lundi" and "Mardi", respectively.

Week and Day Number Conventions

The week numbers returned by the WW format mask are calculated according to the algorithm int((day-ijan1)/7). This week number algorithm does not follow the ISO standard (2015, 1992-06-15).

To support the ISO standard, a format element IW is provided that returns the ISO week number. In addition, format elements I IY IYY and IYYY, equivalent in behavior to the format elements Y, YY, YYY, and YYYY, return the year relating to the ISO week number.

In the ISO standard, the year relating to an ISO week number can be different from the calendar year. For example, 1st Jan 1988 is in ISO week number 53 of 1987. A week always starts on a Monday and ends on a Sunday.

If January 1 falls on a Friday, Saturday, or Sunday, then the week including January 1 is the last week of the previous year, because most of the days in the week belong to the previous year.
If January 1 falls on a Monday, Tuesday, Wednesday, or Thursday, then the week is the first week of the new year, because most of the days in the week belong to the new year.

For example, January 1, 1991, is a Tuesday, so Monday, December 31, 1990, to Sunday, January 6, 1991, is week 1. Thus, the ISO week number and year for December 31, 1990, is 1, 1991. To get the ISO week number, use the format mask "IW" for the week number and one of the "IY" formats for the year.

Numeric Formats

Several additional format elements are provided for formatting numbers:

D (Decimal) returns the decimal point character.
G (Group) returns the group separator.
L (Local currency) returns the local currency symbol.
C (International Currency) returns the ISO currency symbol.
RN (Roman Numeral) returns the number as its Roman numeral equivalent.

For Roman numerals, one can specify either uppercase or lowercase, using RN or rn, respectively. The number being converted must be an integer in the range 1 to 3999.

For complete information on using date and number masks, see Oracle8i SQL Reference.

Miscellaneous Topics

The Concatenation Operator

If the database character set replaces the vertical bar ("|") with a national character, then all SQL statements that use the concatenation operator (ASCII 124) will fail. For example, creating a procedure will fail because it generates a recursive SQL statement that uses concatenation. When you use a 7-bit replacement character set such as D7DEC, F7DEC, or SF7ASCII for the database character set, then the national character which replaces the vertical bar is not allowed in object names because the vertical bar is interpreted as the concatenation operator.

On the user side, one can use a 7-bit replacement character set if the database character set is the same or compatible, that is, if both character sets replace the vertical bar with the same national character.

4 SQL Programming