Forcing a Binary Sort in SQL

When writing PeopleCode and other procedural logic, it is sometimes expected that sorting a list of data in memory produces the same results as sorting the same list in the database through a SQL ORDER BY statement. When working in some languages, such as English, whose sorting logic is relatively simple, this equivalence may be taken for granted.

However, when running PeopleTools against a database that is using a linguistic sort, it is likely that a greater than, less than, or between comparison of values in memory may produce different results than if the same comparison were performed by the database.

Take, for instance, the following PeopleCode syntax:

If START_NAME < END_NAME then 
   Error("Start name must be less than end name");
End-If

Because all character comparisons in PeopleCode are performed based on the binary value of the character in Unicode (and not on the linguistic weight of the character), this code may produce unexpected results in languages where binary sorting is not sufficient. For example, if START_NAME had a value of Über and END_NAME had a value of Zeifang, this code produces the error as in the Unicode character set, the uppercase U with umlaut (Ü) appears after the uppercase Z. However, in a German sort, Ü should precede Z. If the database were created with a German sort order, this would be reflected by the database's sort if the same statement were reflected in SQL that is called from PeopleCode:

SQLExec("SELECT 'X' FROM PSLOCK WHERE :1 < :2"
        ,START_NAME, END_NAME, &X);
 
If &X != 'X' then
   Error("Start name must be less than end name");
End-If

The example shows that when you use a linguistically sorted database, these string comparisons may return different results when they are run through the database by using SQL than when they are performed in PeopleCode. Of course numerical values and dates always sort equivalently—this behavior is limited to the sorting of characters and character strings.

In some situations, you may want to disable the linguistic sort that is performed by the database on a statement-by-statement level and have the database perform the comparison at a binary level.

To force a SQL query to return sort-sensitive results based on a binary sort instead of a linguistic sort, a meta-SQL token, %BINARYSORT, is provided. Use this token is to wrap each column in an ORDER BY, less than, greater than, or BETWEEN operation where a binary comparison is required. For example, to return all employee names, ordered by last name in binary ordering, use the following SQL:

SELECT NAME FROM PS_PERSONAL_DATA
ORDER BY %BINARYSORT(NAME)

This may be useful if you are building an ordered array of names in memory that you plan to parse or manipulate with PeopleCode less than or greater than operators.

Similarly, to fetch a list of names from the database where the NAME field is greater than START_NAME by using a binary comparison that is parallel to that used in PeopleTools, use the following syntax:

SELECT NAME FROM PS_PERSONAL_DATA
WHERE %BINARYSORT(NAME) < %BINARYSORT(:1)

Similar constructs can be used with the BETWEEN predicate.

Note: The %BINARYSORT meta-SQL token ensures that the database evaluates the column that is wrapped by the token based on its binary value in the character set of the database. The sorting of this representation matches its binary sort position in PeopleTools only if the character set of the database contains the character in the same order as Unicode, which is used for binary representation of data in PeopleTools memory. Therefore, if you are running a US-ASCII, Latin-1 (ISO 8859-1), or Unicode database, the sorting of all alphabetic characters matches the Unicode sort in memory. However, if you are running a database that is encoded in EBCDIC or in Japanese Shift-JIS, the order may be markedly different, and you cannot rely on the %BINARYSORT meta-SQL token to match the binary order that is used within PeopleTools.