Simplified Chinese Solaris User's Guide

Simplified Chinese Conversion Utilities

This section describes functions for wide character and string input and output, character classification, and conversion functions for the Simplified Chinese character sets. Solaris 2.7 software implements a wide character library for handling Simplified Chinese character codes according to industry standards.

Routines that depend on the Chinese language are in their own language-specific library, which is linked with the corresponding C compiler option. The Simplified Chinese Solaris library libcle is linked with -lcle.

Refer to the appropriate man pages for more information.

Asian Solaris software defines the wide character (WC) code as a constant-width, four-byte code. WC uses the ANSI C data type wchar_t, which Solaris software defines in wchar.h as follows:

typedef long wchar_t;

In Solaris software, long is four bytes.
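The constant-width property can be checked directly in C. The following is a minimal sketch; the helper names wide_char_bytes and wide_string_bytes are hypothetical and are not part of any Solaris library. On the Solaris release described here the first function would be expected to return 4, but the value differs on other platforms, so the sketch reports whatever the compiler uses rather than assuming it.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: bytes occupied by one wide character on the
 * platform that compiles this code. */
size_t wide_char_bytes(void)
{
    return sizeof(wchar_t);
}

/* Hypothetical helper: bytes occupied by a wide string, excluding the
 * terminating L'\0'. Because every wide character has the same width,
 * this is simply the character count times the element size. */
size_t wide_string_bytes(const wchar_t *ws)
{
    return wcslen(ws) * sizeof(wchar_t);
}
```

A multibyte string does not have this property, because in EUC an ASCII character takes one byte and a GB-2312-80 character takes two.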

Conversion Utilities

The conversion functions described in this section remain available, but new code should prefer iconv(), which is the standard conversion interface.
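As an illustration of the iconv() approach, the sketch below converts an EUC-encoded GB-2312-80 string to UTF-8. The function name gb2312_to_utf8 is hypothetical, and the codeset names "GB2312" and "UTF-8" are assumptions: they are accepted by common iconv implementations, but the names available on a given Solaris release may differ, so check the iconv documentation for the system in use.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated EUC (GB-2312-80) string
 * to UTF-8. Returns the number of bytes written to dst, not counting the
 * terminating NUL, or -1 if the conversion could not be completed. */
long gb2312_to_utf8(const char *src, char *dst, size_t dst_size)
{
    iconv_t cd;
    char *in = (char *)src;      /* iconv() takes char **, input is not modified */
    char *out = dst;
    size_t in_left = strlen(src);
    size_t out_left;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;
    out_left = dst_size - 1;     /* reserve room for the terminating NUL */

    /* Codeset names are an assumption; they vary between systems. */
    cd = iconv_open("UTF-8", "GB2312");
    if (cd == (iconv_t)-1)
        return -1;

    if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1) {
        iconv_close(cd);
        return -1;               /* invalid input or output buffer too small */
    }
    iconv_close(cd);

    *out = '\0';
    return (long)(out - dst);
}
```

A caller passes a destination buffer large enough for the result; a UTF-8 encoding of a GB-2312-80 character can take up to three bytes for every two bytes of input.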

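Simplified Chinese Solaris software provides facilities for several kinds of conversion: conversion within a code set, conversion between the Simplified Chinese code sets, and conversion between the EUC and GB-2312-80 (7-bit) formats.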

Programs using the general multibyte conversion utilities should include three header files: wctype.h, widec.h, and zh/xctype.h. The first two declare the general routines; zh/xctype.h declares the Simplified Chinese Solaris specific routines (such as iscxxx).

In general, the locale/xctype.h file (zh/xctype.h for Simplified Chinese) declares the Chinese locale-specific routines, which have names of the form iscxxxx.

As with the classification functions described in the previous section, the use of these functions can be controlled by the setlocale function (described elsewhere in this and other chapters).
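The effect of setlocale on conversion can be sketched with the standard C wide character functions. This example uses towupper() from wctype.h as a stand-in, since it is part of standard C; it does not call the Solaris-specific routines, and the helper names use_environment_locale and upcase_wide are hypothetical.

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: select the locale named by the environment
 * (LANG, LC_ALL, and so on). Returns nonzero on success. */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: convert a wide string to uppercase in place.
 * towupper() consults the current locale, so the set of characters
 * that are converted depends on the preceding setlocale() call. */
void upcase_wide(wchar_t *ws)
{
    for (; *ws != L'\0'; ++ws)
        *ws = (wchar_t)towupper((wint_t)*ws);
}
```

A program would normally call use_environment_locale() once at startup, before any classification or conversion function is used.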

Locale-specific conversion routines (such as the Chinese cgbtoeuc) are contained in the libcle library, which can be linked during compilation using the C compiler option -lcle.

Conversion Within a Code Set

The multibyte conversion functions are similar to the one-byte conversion functions toupper and tolower. These functions convert wide characters to other wide characters. For more information on conversion routines, see the man pages for wconv(3) and cconv(3).

The following routines are in the regular Chinese C library.

Table 12–3 Simplified Chinese Case Conversion Functions (declared in zh/xctype.h)

Function    Description
tocupper    Converts code set 1 Roman lowercase to uppercase
toclower    Converts code set 1 Roman uppercase to lowercase
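The idea behind case conversion in code set 1 can be sketched on 16-bit EUC codes. In GB-2312-80 the full-width Roman letters sit in row 3, so in EUC the uppercase letters are 0xA3C1 through 0xA3DA and the lowercase letters are 0xA3E1 through 0xA3FA. The functions below are hypothetical illustrations, not the tocupper and toclower routines themselves, which operate on wide characters and whose internal representation is not described here.

```c
#include <stdint.h>

/* Hypothetical illustration: uppercase a full-width Roman letter given
 * as a 16-bit EUC code. Codes outside 0xA3E1..0xA3FA are returned
 * unchanged. */
uint16_t euc_fullwidth_toupper(uint16_t euc)
{
    if (euc >= 0xA3E1u && euc <= 0xA3FAu)
        return (uint16_t)(euc - 0x20u);
    return euc;
}

/* Hypothetical illustration: lowercase a full-width Roman letter given
 * as a 16-bit EUC code. Codes outside 0xA3C1..0xA3DA are returned
 * unchanged. */
uint16_t euc_fullwidth_tolower(uint16_t euc)
{
    if (euc >= 0xA3C1u && euc <= 0xA3DAu)
        return (uint16_t)(euc + 0x20u);
    return euc;
}
```

The offset of 0x20 is the same as in ASCII, because the second EUC byte of each full-width letter is the ASCII code with the high bit set.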

Conversion Between Simplified Chinese Code Sets

In the Simplified Chinese character sets, the Roman characters and numbers in code set 0 are repeated in code set 1. The following functions convert wide characters between the two code sets.

Table 12–4 Simplified Chinese Code Set Conversion Functions

Function    Description
atocgb      Converts alphabetic or numeric characters in ASCII (code set 0) to the corresponding characters in GB-2312-80 (code set 1).
cgbtoa      Converts alphabetic or numeric characters in GB-2312-80 (code set 1) to the corresponding characters in ASCII (code set 0).

For further information on these functions, see the man page for cconv(3x).
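The mapping between the two code sets can be sketched on EUC codes as follows. In EUC, the code set 1 counterpart of an ASCII letter or digit has 0xA3 as its first byte and the ASCII code with the high bit set as its second byte. The function names ascii_to_fullwidth and fullwidth_to_ascii are hypothetical; they illustrate the mapping only and are not the atocgb and cgbtoa routines, which operate on wide characters.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical illustration: map an ASCII letter or digit to the EUC
 * code of its full-width counterpart in GB-2312-80 row 3. Any other
 * character is returned unchanged. */
uint16_t ascii_to_fullwidth(unsigned char c)
{
    if (c < 0x80 && isalnum(c))
        return (uint16_t)(0xA380u | c);   /* first byte 0xA3, second byte c | 0x80 */
    return c;
}

/* Hypothetical illustration: map the EUC code of a full-width letter
 * or digit back to ASCII. Any other code is returned unchanged. */
uint16_t fullwidth_to_ascii(uint16_t euc)
{
    if ((euc & 0xFF80u) == 0xA380u) {
        unsigned char c = (unsigned char)(euc & 0x7Fu);
        if (isalnum(c))
            return c;
    }
    return euc;
}
```

Only letters and digits are converted, which matches the table above; punctuation in row 3 does not always correspond to the ASCII character at the same position.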

Conversion for Simplified Chinese Character Codes

The following routines do character-based code conversion on the GB-2312-80 character set. They convert characters and strings between EUC format and GB-2312-80 format. To use these routines, the library libcle must be linked using the C compiler option -lcle. For further information, see the cconv(3) man page.

Table 12–5 Simplified Chinese Character-Based Functions

Function      Description
cgbtoeuc      Converts a character in GB-2312-80 format (7 bit) to EUC format
scgbtoeuc     Converts a string in GB-2312-80 format (7 bit) to EUC format
sncgbtoeuc    Converts part of a string in GB-2312-80 format (7 bit) to EUC format
euctocgb      Converts a character in EUC format to GB-2312-80 format (7 bit)
seuctocgb     Converts a string in EUC format to GB-2312-80 format (7 bit)
sneuctocgb    Converts part of a string in EUC format to GB-2312-80 format (7 bit)
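The relationship between the two formats is simple enough to sketch: the EUC form of a GB-2312-80 character is the 7-bit code with the high bit set on both bytes. The functions below are hypothetical illustrations of that relationship. They are not the libcle routines, whose exact signatures are documented in the man page, and the string version assumes that the input contains only two-byte GB-2312-80 characters with no ASCII or escape sequences mixed in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: 7-bit GB-2312-80 code to EUC code. */
uint16_t gb7_to_euc(uint16_t gb)
{
    return (uint16_t)(gb | 0x8080u);
}

/* Hypothetical illustration: EUC code to 7-bit GB-2312-80 code. */
uint16_t euc_to_gb7(uint16_t euc)
{
    return (uint16_t)(euc & 0x7F7Fu);
}

/* Hypothetical illustration: convert at most n bytes of a 7-bit
 * GB-2312-80 string to EUC. dst must have room for n + 1 bytes.
 * Returns the number of bytes converted. No validation is done. */
size_t gb7_string_to_euc(const unsigned char *src, unsigned char *dst, size_t n)
{
    size_t i;

    for (i = 0; i < n && src[i] != '\0'; ++i)
        dst[i] = (unsigned char)(src[i] | 0x80u);
    dst[i] = '\0';
    return i;
}
```

For example, the 7-bit code 0x5650 becomes 0xD6D0 in EUC. The byte limit n plays the role that the length argument plays in the "part of a string" variants listed in the table.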