A script-enabled browser is required for this page to function properly.

Sorting Character Data

Conventionally, when character data is sorted, the sort sequence is based on the numeric values of the characters defined by the character encoding scheme. Such a sort is called a binary sort and produces reasonable results for the English alphabet because the ASCII and EBCDIC standards define the letters A to Z in ascending numeric value. In the ASCII standard all uppercase letters appear before any lowercase letters. In the EBCDIC standards, the opposite is true: all lowercase letters appear before any uppercase letters.

Linguistic sorting is not supported on multi-byte character sets.

However, when characters used in other languages are present, a binary sort generally does not produce reasonable results. For example, an ascending ORDER BY query would return the character strings ABC, ABZ, BCD, ÄBC, in that sequence, when the Ä has a higher numeric value than B in the character encoding scheme.

To produce a sort sequence that matches the alphabetic sequence of characters for a particular language, another sort technique must be used that sorts characters independently of their numeric values in the character encoding scheme. This technique is called a linguistic sort. A linguistic sort operates by replacing characters with other binary values that reflect the character's proper linguistic order so that a binary sort returns the desired result.

Linguistic special cases

Linguistic special cases are character sequences that need to be treated as a single character when sorting. Such special cases are handled automatically when using a linguistic sort. For example, one of the linguistic sort sequences for Spanish specifies that the double characters ch and ll are sorted as single characters appearing between c and d and between l and m respectively.

Another example is the German language sharp s (ß). The linguistic sort sequence German can sort this sequence as the two characters SS, while the linguistic sort sequence American sorts it as SZ.

Special cases like these are also handled when converting uppercase characters to lowercase, and vice versa. For example, in German the uppercase of the sharp s is the two characters SS. Such case-conversion issues are handled by the NLS_UPPER, NLS_LOWER, and NLS_INITCAP functions, according to the conventions established by the linguistic sort sequence. (The standard functions UPPER, LOWER, and INITCAP do not handle these special cases.)

NLSSORT function

You can use the SQL function NLSSORT to sort character data. This function replaces a character string with the equivalent sort string used by the sort mechanism. For a binary sort, the sort string is the same as the input string. When a linguistic sort is being used, NLSSORT returns the binary values used to replace the original string.