Common Desktop Environment: Internationalization Programmer's Guide

Interchange Concepts

This section describes how 8-bit user names and 8-bit data can be exchanged over a network by communications utilities, such as ftp and mail, and by interclient communication between desktop clients.

There are three primary considerations for communicating data:

If the remote host uses the same code set as the local host, the following is true:

If the remote host's code set is different from that of the local host, the following two cases may apply. The conversion needed is dependent on the specific protocol used.

iconv Interface

In a network environment, the code sets of the communicating systems and the protocols of communication determine the transformation of user-specified data so that it can be sent to the remote system in a meaningful way. The user data (not user names) may need to be transformed from the sender's code set to the receiver's code set, or 8-bit data may need to be transformed into a 7-bit form to conform to protocols. A uniform interface is needed to accomplish this.

The following examples illustrate the iconv interface by showing how to use iconv_open(), iconv(), and iconv_close(). To perform a conversion, call iconv_open() to obtain a conversion descriptor, pass that descriptor to iconv() to convert the data, and release it with iconv_close() when the conversion is complete. The terms 7-bit interchange and 8-bit interchange refer to any interchange encoding used for 7-bit and 8-bit data, respectively.

Sender and Receiver Use the Same Code Sets:
Sender and Receiver Use Different Code Sets:
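For the case in which the sender and receiver use different code sets, the iconv_open(), iconv(), iconv_close() sequence might be sketched as follows. This is a minimal illustration, not the guide's own example: convert_string() is a hypothetical helper name, and the code set names accepted by iconv_open() are system dependent, so the names a caller passes in are assumptions about the local iconv implementation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the iconv API).
 * Converts the NUL-terminated string `in` from code set `from` to
 * code set `to`, writing a NUL-terminated result into `out`, which
 * holds `outcap` bytes. Returns the number of bytes written
 * (excluding the NUL), or -1 on any error.
 */
long convert_string(const char *from, const char *to,
                    const char *in, char *out, size_t outcap)
{
    assert(in != NULL && out != NULL && outcap > 0);

    /* Argument order: target code set first, source code set second. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                  /* this conversion pair is unsupported */

    char *inptr = (char *)in;       /* iconv() does not modify the input bytes */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outcap - 1;    /* reserve room for the terminating NUL */

    /* Convert the whole input in one call; the buffers here are small. */
    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t)-1) {
        /* A NULL input flushes any pending shift sequence, which
         * matters when the target code set is stateful. */
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);
    }

    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                  /* invalid input or output buffer too small */

    *outptr = '\0';
    return (long)(outptr - out);
}
```

A caller on a system whose iconv recognizes the names "ISO-8859-1" and "UTF-8" could, for instance, call convert_string("ISO-8859-1", "UTF-8", text, buffer, sizeof buffer). A production converter would loop on iconv() and handle E2BIG, EILSEQ, and EINVAL separately rather than treating every failure alike.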

Stateful and Stateless Conversions

Code sets can be classified into two categories: stateful encodings and stateless encodings.

Stateful Encodings

A stateful encoding uses sequences of control codes, such as shift-in/shift-out, to change the character set associated with specific code values.

For instance, under compound text, the control sequence "ESC$(B" can be used to indicate the start of Japanese double-byte (16-bit) data in a stream of characters, and "ESC(B" can be used to indicate the end of this double-byte data and the start of 7-bit ASCII data. Under this stateful encoding, the byte value 0x43 cannot be interpreted without knowing the shift state. The EBCDIC Asian code sets use shift-out/shift-in controls to switch to double-byte encoding and back to single-byte encoding, respectively.
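The dependence on shift state can be made concrete with a small sketch. The function below is a hypothetical illustration, not a compound text parser: it recognizes only the two control sequences named above, assumes every other byte is character data, and counts characters by tracking the current state. The same byte, 0x43, counts as one character in the ASCII state and as half of a character in the double-byte state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum shift_state { STATE_ASCII, STATE_DOUBLE_BYTE };

/*
 * Hypothetical helper: counts the characters in `buf`, starting in the
 * ASCII state. Returns -1 if the data ends in the middle of a
 * double-byte character.
 */
long count_characters(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    enum shift_state state = STATE_ASCII;
    long count = 0;
    size_t i = 0;

    while (i < len) {
        if (len - i >= 4 && memcmp(buf + i, "\x1b$(B", 4) == 0) {
            state = STATE_DOUBLE_BYTE;   /* ESC $ ( B: double-byte data follows */
            i += 4;
        } else if (len - i >= 3 && memcmp(buf + i, "\x1b(B", 3) == 0) {
            state = STATE_ASCII;         /* ESC ( B: ASCII data follows */
            i += 3;
        } else if (state == STATE_DOUBLE_BYTE) {
            if (len - i < 2)
                return -1;               /* truncated double-byte character */
            i += 2;                      /* two bytes form one character */
            count++;
        } else {
            i += 1;                      /* one byte is one character */
            count++;
        }
    }
    return count;
}
```

Given the two bytes 0x43 0x43 alone, the function counts two characters; given the same two bytes preceded by "ESC$(B", it counts one.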

Converters that translate stateful encodings to other code sets tend to be more complex, because they must track the current shift state while processing the data.

Stateless Encodings

Stateless code sets are those that can be classified as one of two types:

The term multibyte code sets is also used to refer to any code set that needs one or more bytes to encode a character; multibyte code sets are considered stateless.


Note - Conversions are meaningful only if the code sets represent the same character set.