Understanding Sort Orders

This section discusses sorting.

Sorting data in English is reasonably simple given the well-defined sorting rules of the language. Additionally, most character sets are based on the ASCII standard, which allocates characters to numerical codes in English alphabetical order. Therefore, when sorting ASCII data by its binary representation, you automatically get a sort that makes sense in English; data is sorted from A to Z and numerics are sorted from 0 to 9.

However, sorting non-English languages is significantly more complex; some languages have special rules for sorting characters with diacritic marks; others, such as Japanese and Thai, can be sorted in several different orders depending on the usage or context of the sort.

In linguistic circles, sorting is also known as collation. In this book, the two terms are used interchangeably.

A sort order identifies how PeopleTools assembles, compares, and displays data. For example, a sort order specifies whether A is less than, equal to, or greater than Z. The simplest way of sorting data in a computer system is to sort it in the order that the characters appear in the character set. This is known as a binary sort, because it sorts the numerical codes of each character as they are stored in memory without any special sorting cases or linguistic considerations. A binary sort works well for sorting English language data; English sorting rules can be implemented as a binary sort as long as the underlying character set is laid out from A to Z. English characters in US-ASCII, EBCDIC, and Unicode are all laid out in this fashion, so a binary sort on data in any of these character sets is sufficient for sorting English data.

However, when sorting data in languages other than English, you must consider how to sort:

  • Characters with diacritic marks (such as á, ñ, and ö): after the letter Z, or after the base character without the diacritic.

    In most character sets, these characters appear after the letter Z in binary order; however, in most languages they sort after the base form of the character.

  • Special characters and ligatures (such as æ and œ) and representative characters (such as ß) in some European languages.

    In most cases, these characters must be expanded to their full form before being sorted. For example, æ is often expanded to ae when sorted.

  • Ideographic languages.

    The Chinese, Japanese, and Korean languages use a large repertoire of characters in their written languages, such that it would be impossible for the average person to remember an arbitrary sort order. Instead, several schemes exist for sorting Chinese, Japanese, and Korean characters, including sorting based on core, common parts of each character (radicals), or by counting the number of brush or pen strokes that it takes to write the character (stroke count).

Unfortunately, most of these sorting schemes are language-specific and sometimes even specific to a country in which a language is spoken. For example, the German sharp-S character (ß) is sorted in Germany as if it were written as SS, but in Austria it is sorted as if it were written as SZ. Other times, multiple sort orders can be in use in a single country. In Spain, it is common to sort the ch character sequence after cz but before d. However, in more recent times many Spanish organizations have reverted to sorting the ch sequence as individual characters between cg and ci. Which is correct depends on personal or organizational preference.

In the majority of cases where data is sorted in PeopleTools, the host database management system performs the sort through an ORDER BY clause in SQL, and the result is displayed directly to the user. It is therefore critical, when you create a database, that you select an appropriate sort order based on the languages that you plan to maintain in the database, the countries in which you plan to operate, and any specific preferences or policies that your organization maintains relating to sorting (such as which Spanish ch sort is preferred). Most database systems require you to choose a sort order during database creation, because it affects the way that SQL indexes are stored on disk to optimize sorting performance.

Note: Due to limitations in most database packages and for performance reasons, PeopleTools supports only one sort order per database.

In some cases, PeopleTools also sorts data in memory and must use internal tables to determine the appropriate order for character data. This is particularly prevalent when sorting drop-down lists on pages displaying translate values and within PeopleSoft Query. It is therefore important that you not only create the database with the appropriate sort order, but you also tell PeopleTools which sort order you have chosen for the database so it can emulate this sort for lists that it sorts in memory.

Note: Sorting in PeopleTools is case sensitive (for example, the lowercase letter a sorts after the uppercase letter Z) and accent sensitive (for example, the accented letter á’ is considered distinct from the unaccented letter a). Sorting in PeopleTools is also kana sensitive for Japanese data, meaning that certain forms of Japanese characters (Hiragana and Katakana) are considered distinct rather than equivalent. PeopleTools does not support case-insensitive, accent-insensitive, or kana-insensitive sorting.

Many different components of PeopleTools sort character data, most of which rely on the database system to perform sorting by using a SQL ORDER BY clause. However, PeopleTools does perform some sorting in memory.

See PeopleTools In-Memory Sorts.

Other parts of PeopleTools support only a binary sort for logistical or performance reasons. The following table indicates each of the common areas of PeopleTools that presents sorted lists of data to the user and the mechanism that each uses to perform the sort:

Functional Area

Sorting Engine

Component search dialog boxes.

Database-level SQL ORDER BY sort.

Scroll buffers.

Database-level SQL ORDER BY sort.

SQL operations in PeopleSoft Query, PS/n Vision, and SQR.

Database-level SQL ORDER BY sort.

SQL functions that are called from PeopleCode (SQL Objects, ExecSQL) containing BETWEEN, <, >, MIN, MAX, and so on.

Database-level SQL ORDER BY sort.

PeopleCode ScrollSelect( ) / ScrollSelectNew( ), RowScrollSelect( ), and RollScrollSelectNew( ) functions.

Database-level SQL ORDER BY sort.

Column-specific re-sorts in PeopleSoft Pure Internet Architecture (by clicking on a column heading).

PeopleTools in-memory sort.

PeopleCode SortScroll( ).

PeopleTools in-memory sort.

Drop-down list boxes of translate values.

PeopleTools in-memory sort.

PeopleCode binary comparison functions ( <, >, =).

Binary sort.

%BINARYSORT meta-SQL functions.

Binary sort.

Greater than and less than COBOL operations.

Binary sort.

Greater than and less than comparisons in SQR.

Binary sort.

PeopleTools relies on the sorting capabilities of the host database system for each functional area that is identified in the table in the previous section as using a database-level SQL ORDER BY sort. In this case, the SQL statement that is used to retrieve the data to be displayed to the user is coded to include an ORDER BY statement, and the sort order of the database determines in what order data is retrieved. Obviously, this relies on the database system being correctly configured for the appropriate linguistic sort that you determined is required for your database.

Each database management system has a different mechanism of determining the sort order for SQL ORDER BY statements. For example, Microsoft SQL Server requires the user to select a sort order when the server is installed, or when each database is created. Oracle enables the database administrator to specify the appropriate sort order in the init.ora parameter file.

See the PeopleTools Installation for your database platformand the product documentation for your database delivered by your database vendor.

To avoid round-trips to the database, PeopleTools performs some sorting in memory, typically of short lists, such as a list of translate values. PeopleTools supports a significant number of linguistic sorts for in-memory sorting. The sort order that is used for PeopleTools in-memory sorting is controlled by the sort order option on the PeopleTools Options page. While PeopleTools cannot emulate exactly each sort order that is offered by your database vendor, it provides an appropriate sort order for most popular business languages. Select the sort order that most closely corresponds to the sort order of your database. Conflicting sort order selection between the database and PeopleTools results in lists of values that are sorted by PeopleTools appearing in a significantly different order than lists of values that are sorted by the database.

See Setting the Sort Order.

When comparing values in PeopleCode, SQR, and COBOL that use less than (<), greater than (>), or other character comparison operators, a binary sort is used. The specific sort that is performed depends on the character set of data in memory at the time. For example, all binary sorts that are performed in PeopleCode, SQR syntax, and Unicode COBOL take place based on the Unicode binary order. Binary sorts that are performed in non-Unicode COBOL take place in the non-Unicode character set of the batch server. Binary sorts that are performed by the %BINARYSORT PeopleCode meta-SQL function take place in the character set of the database engine.

Therefore, you should be careful not to write code that depends on the sorting of data by using binary operators matching the sorting of data by the database as the result of sorting operators in SQL statements.

For example, the following PeopleCode statement is performed as a Unicode binary comparison, and á is located in the Unicode tables after the character z, so it always returns True:

if 'z' < 'á' then. . .

However, when the same comparison is run in SQL (as in the following example), the database’s sort order determines which character is greater. In a database that is configured for binary sorting, it returns true; however, in a database that is configured for French sorting (where á is sorted after a but before b), it returns false.

SELECT . . .
WHERE 'z' < 'á'

It is important that these functional areas use binary sorting instead of attempting to perform a linguistically-sensitive sort because:

  • Any linguistically-sensitive sort that is performed in memory can be only an approximation of the sort that the database system would perform in the same situation given the large number of sort orders that are provided by database vendors and the significant variations to these orders in minor versions of the database software.

    Do not assume that a binary sort that is performed in PeopleTools will match the sort of the same characters in a SQL statement that uses the less than, greater than, or BETWEEN operators.

  • Performing a guaranteed database-compatible sort for each comparison in PeopleCode would require a round-trip to the database to perform the sorting and would affect the performance of PeopleCode operations.