Linguistic Sort Rules Support Linguistic Conventions

Different languages have different sorting rules. Text is conventionally sorted inside a database according to the binary codes used to encode the characters. Typically, this does not produce a sort order that is linguistically meaningful. A linguistic sort handles the complex sorting requirements of different languages and cultures.

It enables text in character data types, such as CHAR, VARCHAR2, NCHAR, and NVARCHAR2, to be sorted according to specific linguistic conventions.

A linguistic sort operates by replacing characters with numeric values that reflect each character's proper linguistic order. TimesTen offers two kinds of linguistic sorts: monolingual and multilingual.

This section includes the following topics:

Monolingual Linguistic Sorts

TimesTen compares character strings in two steps for monolingual sorts.

The first step compares the major value of the entire string from a table of major values. Usually, letters with the same appearance have the same major value. The second step compares the minor value from a table of minor values. The major and minor values are defined by TimesTen. TimesTen defines letters with accent and case differences as having the same major value but different minor values.

Monolingual linguistic sorting is available only for single-byte and Unicode database character sets. If a monolingual linguistic sort is specified when the database character set is non-Unicode multibyte, then the default sort order is the binary sort order of the database character set.

For a list of supported sorts, see NLS_SORT in the Oracle TimesTen In-Memory Database Reference.

Multilingual Linguistic Sorts

TimesTen provides multilingual linguistic sorts so that you can sort data for multiple languages in one sort.

Multilingual linguistic sort is based on the ISO/OEC 14651 - International String Ordering and the Unicode Collation algorithm standards. This framework enables the database to handle languages that have complex sorting rules, such as those in Asian languages, as well as providing linguistic support for databases with multilingual data.

In addition, multilingual sorts can handle canonical equivalence and supplementary characters. Canonical equivalence is a basic equivalence between characters or sequences of characters. For example, ç is equivalent to the combination of c and ,.

For example, TimesTen supports a monolingual French sort (FRENCH), but you can specify a multilingual French sort (FRENCH_M). _M represents the ISO 14651 standard for multilingual sorting. The sorting order is based on the GENERIC_M sorting order and can sort accents from right to left. TimesTen recommends using a multilingual linguistic sort if the tables contain multilingual data. If the tables contain only French, then a monolingual French sort may have better performance because it uses less memory. It uses less memory because fewer characters are defined in a monolingual French sort than in a multilingual French sort. There is a trade-off between the scope and the performance of a sort.

For a list of supported multilingual sorts, see NLS_SORT in the Oracle TimesTen In-Memory Database Reference.

Case-Insensitive and Accent-Insensitive Linguistic Sorts

Operations inside a database are sensitive to the case and the accents of the characters. Sometimes you might need to perform case-insensitive or accent-insensitive comparisons.

To specify a case-insensitive or accent-insensitive sort:

  • Append _CI to a TimesTen sort name for a case-insensitive sort. For example:

    BINARY_CI: accent-sensitive and case-insensitive binary sort

    GENERIC_M_CI: accent-sensitive and case-insensitive GENERIC_M sort

  • Append _AI to a TimesTen sort name for an accent-insensitive and case-insensitive sort. For example:

    BINARY_AI: accent-insensitive and case-insensitive binary sort

    FRENCH_M_AI: accent-insensitive and case-insensitive FRENCH_M sort

Performing a Linguistic Sort

The NLS_SORT data store connection attribute indicates which collating sequence to use for linguistic comparisons. The NLS_SORT value affects the SQL string comparison operators and the ORDER BY clause.

You can use the ALTER SESSION statement to change the value of NLS_SORT:

Command> ALTER SESSION SET NLS_SORT=SWEDISH;
Command> SELECT product_name FROM product ORDER BY product_name;

PRODUCT NAME
------------
aerial
Antenne
Lcd
ächzen
Ähre

You can also override the NLS_SORT setting by using the NLSSORT SQL function to perform a linguistic sort:

SELECT * FROM test ORDER BY NLSSORT(name,'NLS_SORT=SPANISH');

Note:

For materialized views and cache groups, TimesTen recommends that you explicitly specify the collating sequence using the NLSSORT() SQL function rather than using this attribute in the connection string or DSN definition.

See NLS_SORT in the Oracle TimesTen In-Memory Database Reference. For more extensive examples of using NLSSORT, see NLSSORT in the Oracle TimesTen In-Memory Database SQL Reference.

Using Linguistic Indexes

You can create a linguistic index to achieve better performance during linguistic comparisons. A linguistic index requires storage for the sort key values.

To create a linguistic index, use a statement similar to the following:

CREATE INDEX german_index ON employees 
    (NLSSORT(employee_id, 'NLS_SORT=GERMAN'));

The optimizer chooses the appropriate index based on the values for NLSSORT and NLS_SORT.

You must create multiple linguistic indexes if you want more than one linguistic sort on a column. For example, if you want both GERMAN and GERMAN_CI sorts against the same column, create two linguistic indexes.

See CREATE INDEX in the Oracle TimesTen In-Memory Database SQL Reference.