Asian Application Developer's Guide

EUC Definition

EUC is composed of one primary codeset and three supplementary codesets. The primary codeset, codeset 0, is used for ASCII. The three supplementary codesets (codesets 1, 2, and 3) can be assigned to different character sets by the user. There is a system default assignment for these codesets.

The primary codeset uses single bytes with the most significant bit (MSB) set to zero. The supplementary codesets can use multiple bytes, and the MSB of each byte is set to one. Characters in codesets 2 and 3 are preceded by a single-shift character: SS2 (0x8E) for codeset 2 and SS3 (0x8F) for codeset 3. Codesets are differentiated as follows: If the MSB of the first byte is 0, the character is one-byte ASCII. If the MSB is 1, the byte is compared with SS2 and SS3: a match selects codeset 2 or codeset 3 respectively, and any other value means the byte begins a codeset 1 character. The length in bytes of characters from that codeset is then retrieved from an ANSI localization table governing character classification, and that number of bytes is read in.
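The lead-byte test described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the system's implementation; the function name euc_codeset is hypothetical.

```python
SS2 = 0x8E  # single-shift 2: introduces a codeset 2 character
SS3 = 0x8F  # single-shift 3: introduces a codeset 3 character


def euc_codeset(lead_byte: int) -> int:
    """Return the EUC codeset (0-3) selected by a character's first byte.

    Hypothetical helper for illustration only.
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("lead_byte must be in the range 0..255")
    if (lead_byte & 0x80) == 0:
        return 0  # MSB clear: one-byte ASCII
    if lead_byte == SS2:
        return 2
    if lead_byte == SS3:
        return 3
    return 1  # MSB set and no single-shift prefix
```

The function only identifies the codeset. How many bytes to read next still depends on the character length recorded for that codeset in the localization table.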

Table 2-1 EUC Codeset Representations

Codeset      EUC Representation
---------    ----------------------------------
codeset 0    0xxxxxxx

codeset 1    1xxxxxxx -or-
             1xxxxxxx 1xxxxxxx -or-
             1xxxxxxx 1xxxxxxx 1xxxxxxx

codeset 2    SS2 1xxxxxxx -or-
             SS2 1xxxxxxx 1xxxxxxx -or-
             SS2 1xxxxxxx 1xxxxxxx 1xxxxxxx

codeset 3    SS3 1xxxxxxx -or-
             SS3 1xxxxxxx 1xxxxxxx -or-
             SS3 1xxxxxxx 1xxxxxxx 1xxxxxxx
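
As a rough illustration of how the representations in Table 2-1 drive parsing, the following Python sketch splits an EUC byte string into characters. The name split_euc and the EXAMPLE_WIDTHS table are hypothetical. The widths are assumed values chosen for the example (they follow the common Japanese EUC layout); a real system would obtain them from the locale's character classification table.

```python
SS2, SS3 = 0x8E, 0x8F

# Bytes per character in each codeset, NOT counting the SS2/SS3 prefix.
# Assumed values for illustration only; the real widths are locale-defined.
EXAMPLE_WIDTHS = {0: 1, 1: 2, 2: 1, 3: 2}


def split_euc(data: bytes, widths=EXAMPLE_WIDTHS):
    """Split EUC bytes into a list of (codeset, character_bytes) pairs."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            codeset, prefix = 0, 0      # MSB clear: ASCII
        elif lead == SS2:
            codeset, prefix = 2, 1      # skip the single-shift byte
        elif lead == SS3:
            codeset, prefix = 3, 1
        else:
            codeset, prefix = 1, 0
        end = i + prefix + widths[codeset]
        body = data[i + prefix:end]
        if len(body) != widths[codeset]:
            raise ValueError(f"truncated character at offset {i}")
        # Every byte of a supplementary-codeset character has its MSB set.
        if codeset != 0 and any(b < 0x80 for b in body):
            raise ValueError(f"invalid byte in character at offset {i}")
        chars.append((codeset, data[i:end]))
        i = end
    return chars
```

For example, with the assumed widths, split_euc(b"A\xa4\xa2\x8e\xb1") yields one codeset 0 character, one two-byte codeset 1 character, and one codeset 2 character that keeps its SS2 prefix.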