
Common Desktop Environment: Internationalization Programmer's Guide

Stateful and Stateless Conversions

Code sets can be classified into two categories: stateful encodings and stateless encodings.

Stateful Encodings

A stateful encoding uses control codes or escape sequences, such as shift-in/shift-out, to change the character set associated with specific code values.

For instance, under compound text, the control sequence "ESC$(B" can be used to indicate the start of Japanese 16-bit data in a stream of characters, and "ESC(B" can be used to indicate the end of this double-byte character data and the start of single-byte ASCII data. Under this stateful encoding, the byte value 0x43 cannot be interpreted without knowing the shift state: in ASCII mode it is the letter C, but in Japanese mode it is one byte of a two-byte character. The EBCDIC Asian code sets use shift-out (SO) and shift-in (SI) controls to switch to double-byte and single-byte encoding, respectively.

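Converters for stateful encodings are therefore more complex than those for stateless encodings: they must recognize the control sequences, track the current shift state as they read the input, and emit the correct control sequences when writing stateful output.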

Stateless Encodings

Stateless code sets are those that can be classified as one of two types:

Single-byte code sets, such as the ISO 8859 family
Multibyte code sets, such as PC codes for Japanese and Shift-JIS (SJIS)

The term multibyte code set also refers more generally to any code set that uses one or more bytes to encode a character; such code sets are considered stateless.

Note -

Conversions are meaningful only if the code sets represent the same character set.