Simplified Chinese Solaris User's Guide

Simplified Chinese Conversion Utilities

This section describes functions for wide character and string input and output, character classification, and conversion functions for the Simplified Chinese character sets. Solaris 2.7 software implements a wide character library for handling Simplified Chinese character codes according to industry standards.

Routines with Chinese language-specific dependencies reside in their own language-specific library, which is linked with the corresponding C compiler option. The Simplified Chinese Solaris library libcle is linked with -lcle.

Refer to the appropriate man pages for more information.

Asian Solaris software defines wide character (WC) code as a constant-width, four-byte code. WC uses the ANSI C data type wchar_t, which Solaris software defines in wchar.h as follows:

typedef long wchar_t;

In Solaris software, long is four bytes.
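
Because the width of wchar_t varies between platforms, a program that relies on the four-byte assumption may want to check it. The following is a minimal sketch using only standard C; the helper names wide_char_size and wide_char_is_four_bytes are hypothetical and are not part of Solaris software.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: width of wchar_t in bytes on the current platform. */
size_t wide_char_size(void)
{
    return sizeof(wchar_t);
}

/* Hypothetical helper: 1 if wchar_t has the four-byte width described above. */
int wide_char_is_four_bytes(void)
{
    return sizeof(wchar_t) == 4;
}
```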

Conversion Utilities

The conversion functions described in this section remain available, but iconv() is the standard interface and should be used in preference to them.
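
As an illustration of the iconv() approach, the following sketch wraps iconv_open(), iconv(), and iconv_close() in a single helper. The helper name convert_string is hypothetical. The codeset names a caller passes (for example "GB2312" or "UTF-8") are assumptions that depend on the platform; check the iconv man pages on your system for the names it accepts.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: converts the NUL-terminated string `in` from codeset
 * `from` to codeset `to`, writing a NUL-terminated result into `out`, which
 * has room for `outsz` bytes. Returns 0 on success and -1 on any failure.
 */
int convert_string(const char *from, const char *to,
                   const char *in, char *out, size_t outsz)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() takes a non-const pointer here */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    int ok;

    if (outsz == 0)
        return -1;
    outleft = outsz - 1;      /* reserve one byte for the terminator */

    cd = iconv_open(to, from);   /* argument order is target, then source */
    if (cd == (iconv_t)-1)
        return -1;

    ok = iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1;
    if (ok)   /* flush any pending shift sequence for stateful codesets */
        ok = iconv(cd, NULL, NULL, &outp, &outleft) != (size_t)-1;
    iconv_close(cd);

    if (!ok)
        return -1;
    *outp = '\0';
    return 0;
}
```

A caller might write convert_string("GB2312", "UTF-8", text, buf, sizeof buf), assuming both codeset names are known to the local iconv implementation.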

Simplified Chinese Solaris software provides facilities for various conversions, for example:

Within a code set, such as converting uppercase ASCII to lowercase.
Between different conventions for national standard character sets, such as GB and EUC.
Between code formats, such as EUC and WC.

Programs using the general multibyte conversion utilities should include the header files widec.h and wctype.h. Programs that also use Simplified Chinese Solaris specific routines (such as iscxxx) should include zh/xctype.h, which declares them.

The locale/xctype.h file (zh/xctype.h for Simplified Chinese) declares the Chinese locale-specific routines, which have names of the form iscxxxx.

As with the classification functions described in the previous section, the use of these functions can be controlled by the setlocale function (described elsewhere in this and other chapters).

Locale-specific conversion routines (such as the Chinese routine cgbtoeuc) are contained in the libcle library, which can be linked during compilation using the C compiler option -lcle.

Conversion Within a Code Set

The multibyte conversion functions are similar to the one-byte conversion functions toupper and tolower. These functions convert wide characters to other wide characters. For more information on conversion routines, see the man pages for wconv(3) and cconv(3).
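
To show the parallel with toupper and tolower, the following sketch uppercases a wide string with the standard C function towupper from wctype.h. It does not use the Solaris-specific routines; the helper name wide_string_to_upper is hypothetical, and which characters are converted depends on the locale selected with setlocale.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: converts each character of a wide string to
 * uppercase, in place, using the standard towupper function. */
void wide_string_to_upper(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towupper((wint_t)*s);
}
```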

The following routines are in the regular Chinese C library.

Table 12–3 Simplified Chinese Case Conversion Functions (declared in zh/xctype.h)


Function	Description
`tocupper`	Converts code set 1 Roman lowercase to uppercase
`toclower`	Converts code set 1 Roman uppercase to lowercase
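
The idea behind code set 1 case conversion can be sketched on raw two-byte EUC values. In GB-2312-80, the Roman letters of code set 1 occupy row 3, so in EUC form the uppercase letters are 0xA3C1 through 0xA3DA and the lowercase letters are 0xA3E1 through 0xA3FA. The helpers euc_fw_toupper and euc_fw_tolower below are hypothetical; the real tocupper and toclower operate on wchar_t values, and their internal encoding is not assumed here.

```c
/* Code set 1 Roman letters (GB-2312-80 row 3) as two-byte EUC values. */
#define EUC_FW_UPPER_A 0xA3C1u
#define EUC_FW_UPPER_Z 0xA3DAu
#define EUC_FW_LOWER_A 0xA3E1u
#define EUC_FW_LOWER_Z 0xA3FAu

/* Hypothetical helper: code set 1 Roman lowercase to uppercase.
 * Any other value is returned unchanged. */
unsigned int euc_fw_toupper(unsigned int c)
{
    if (c >= EUC_FW_LOWER_A && c <= EUC_FW_LOWER_Z)
        return c - 0x20u;
    return c;
}

/* Hypothetical helper: code set 1 Roman uppercase to lowercase.
 * Any other value is returned unchanged. */
unsigned int euc_fw_tolower(unsigned int c)
{
    if (c >= EUC_FW_UPPER_A && c <= EUC_FW_UPPER_Z)
        return c + 0x20u;
    return c;
}
```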

Conversion Between Simplified Chinese Code Sets

In the Simplified Chinese character sets, the Roman characters and numbers in code set 0 are repeated in code set 1. The following functions convert wide characters between the two code sets.

Table 12–4 Simplified Chinese Code Set Conversion Functions


Function	Description
`atocgb`	Converts alphabetic or numeric characters in ASCII (code set 0) to the corresponding characters in GB-2312-80 (code set 1).
`cgbtoa`	Converts alphabetic or numeric characters in GB-2312-80 (code set 1) to the corresponding characters in ASCII (code set 0).

For further information on these functions, see the man page for cconv(3x).
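
The mapping between the two code sets is regular, which the following sketch illustrates on raw EUC values. A code set 1 letter or digit in GB-2312-80 row 3 has the EUC value 0xA380 plus its ASCII code. The helpers ascii_to_gb_euc and gb_euc_to_ascii are hypothetical stand-ins, not the atocgb and cgbtoa routines themselves, which work on wide characters.

```c
#include <ctype.h>

/* Hypothetical helper: ASCII letter or digit to its code set 1 (EUC) form.
 * Any other value is returned unchanged. */
unsigned int ascii_to_gb_euc(unsigned int c)
{
    if (c < 0x80u && isalnum((int)c))
        return 0xA380u + c;
    return c;
}

/* Hypothetical helper: code set 1 (EUC) letter or digit to ASCII.
 * Any other value is returned unchanged. */
unsigned int gb_euc_to_ascii(unsigned int c)
{
    if ((c & 0xFF00u) == 0xA300u) {
        unsigned int a = (c & 0xFFu) - 0x80u;  /* wraps if low byte < 0x80 */
        if (a < 0x80u && isalnum((int)a))
            return a;
    }
    return c;
}
```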

Conversion for Simplified Chinese Character Codes

The following routines do character-based code conversion on the GB-2312-80 character set. They convert characters and strings between EUC format and GB-2312-80 format. To use these routines, the library libcle must be linked using the C compiler option -lcle. For further information, see the cconv(3) man page.

Table 12–5 Simplified Chinese Character-Based Functions


Function	Description
`cgbtoeuc`	Converts a character in GB-2312-80 format (7 bit) to EUC format
`scgbtoeuc`	Converts a string in GB-2312-80 format (7 bit) to EUC format
`sncgbtoeuc`	Converts part of a string in GB-2312-80 format (7 bit) to EUC format
`euctocgb`	Converts a character in EUC format to GB-2312-80 format (7 bit)
`seuctocgb`	Converts a string in EUC format to GB-2312-80 format (7 bit)
`sneuctocgb`	Converts part of a string in EUC format to GB-2312-80 format (7 bit)
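
The relationship between the two formats can be sketched as follows. The EUC form of a GB-2312-80 character is the 7-bit code with the high bit set in both bytes. The helpers below are hypothetical illustrations, not the libcle routines; they assume valid two-byte input and that a string holds only GB-2312-80 characters, because a 7-bit GB byte stream cannot be told apart from ASCII without outside information.

```c
#include <stddef.h>

/* Hypothetical helper: 7-bit GB-2312-80 code (e.g. 0x3021) to EUC (0xB0A1). */
unsigned int gb7_to_euc(unsigned int c)
{
    return c | 0x8080u;
}

/* Hypothetical helper: EUC code back to the 7-bit GB-2312-80 code. */
unsigned int euc_to_gb7(unsigned int c)
{
    return c & 0x7F7Fu;
}

/*
 * Hypothetical helper: converts at most n bytes of a NUL-terminated string of
 * 7-bit GB-2312-80 characters to EUC. dst must have room for n + 1 bytes.
 * Returns the number of bytes converted, not counting the terminator.
 */
size_t gb7_bytes_to_euc(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n && src[i] != '\0'; i++)
        dst[i] = (unsigned char)(src[i] | 0x80u);
    dst[i] = '\0';
    return i;
}
```

Passing the full length of the source plays the role of a whole-string conversion, while a smaller n converts only the leading part of the string.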