Asian Application Developer's Guide

Asian-Specific Utilities

This section describes functions for wide-character and string input and output, for character classification, and for conversion among the Korean or Chinese character sets. Asian Solaris software implements a wide-character library that handles Korean or Chinese character codes according to industry standards.

Routines that have Korean or Chinese language-specific dependency are in their own language-specific library, which is linked with the corresponding C compiler option:

Refer to the appropriate man pages for more information.

Asian Solaris software defines the wide character (WC) code as a constant-width, four-byte code. WC uses the ANSI C data type wchar_t, which Solaris software defines in wchar.h as follows:

typedef long wchar_t;

In Solaris software, long is four bytes.
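A practical consequence of a constant-width code is that storage for a wide string is just the character count multiplied by sizeof(wchar_t). The following is a minimal sketch of that arithmetic; the helper name sketch_wide_bytes is hypothetical and is not part of any Solaris library, and the width of wchar_t on other platforms may differ from the four bytes described above.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: bytes occupied by the characters of a wide
 * string, terminator excluded. Because every wide character has the
 * same width, this is character count times sizeof(wchar_t). */
size_t sketch_wide_bytes(const wchar_t *ws)
{
    assert(ws != NULL);
    return wcslen(ws) * sizeof(wchar_t);
}
```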

Conversion Utilities

The conversion functions described in this section are available, but iconv(), the standard conversion function, is preferred for new code.
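As an illustration of the iconv() approach, the sketch below converts one buffer between two named encodings using only the standard iconv_open(), iconv(), and iconv_close() calls. The wrapper name sketch_iconv_buf is hypothetical, and the encoding names a system accepts (for example "EUC-KR" or "UTF-8") vary between platforms, so treat them as assumptions to check against the local iconv documentation.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: converts in_len bytes from encoding `from` to
 * encoding `to`. Returns the number of bytes written to out, or -1 if
 * the conversion is unsupported or fails. */
long sketch_iconv_buf(const char *from, const char *to,
                      const char *in, size_t in_len,
                      char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);   /* target encoding comes first */
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;   /* iconv() takes char **; input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);   /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(out_cap - dst_left);
}
```

A caller would pass a buffer large enough for the converted text; if the output buffer is too small, this sketch simply reports failure rather than resuming.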

Asian Solaris software provides facilities for various conversions, for example:

Programs using the general multibyte conversion utilities should include the header files widec.h and wctype.h.

Programs using the locale-specific multibyte conversion utilities should include three header files: wctype.h, widec.h, plus one of the following locale-specific files:

The locale/xctype.h file declares the Korean or Chinese locale-specific routines, which have names of the form:

As with the classification functions described in the previous section, the behavior of these functions is controlled by the setlocale function (described elsewhere in this and other chapters).

Locale-specific conversion routines (such as Korean comptopack or Chinese cgbtoeuc) are contained in a locale-specific library:

This library can be linked during compilation using the C compiler option:

Conversion Within a Codeset

The multibyte conversion functions are similar to the one-byte conversion functions toupper and tolower. These functions convert wide characters to other wide characters. For more information on conversion routines, see the man pages for wconv(3) for all locales and for:

The following routines are in the regular Chinese C library:

Table B-5 Simplified Chinese Case Conversion Functions (declared in zh/xctype.h)

Function    Description
tocupper    Converts codeset 1 Roman lowercase to uppercase
toclower    Converts codeset 1 Roman uppercase to lowercase

Table B-6 Traditional Chinese Case Conversion Functions (declared in zh_TW/xctype.h)

Function    Description
tohupper    Converts codeset 1 Roman lowercase to uppercase
tohlower    Converts codeset 1 Roman uppercase to lowercase
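For the Simplified Chinese case (Table B-5), the idea behind codeset 1 case conversion can be sketched on raw EUC values. This is not the tocupper or toclower implementation, which operates on wide characters; the function names below are hypothetical. The sketch assumes the GB-2312-80 layout in which row 3 repeats the ASCII graphic characters, so that in EUC the lead byte is 0xA3 and the trail byte is the ASCII code plus 0x80.

```c
#include <assert.h>

/* Hypothetical helpers working on a 16-bit EUC value (lead << 8 | trail).
 * Codeset 1 Roman letters sit in GB-2312-80 row 3:
 *   'A'..'Z' -> 0xA3C1..0xA3DA,  'a'..'z' -> 0xA3E1..0xA3FA. */
static int in_row3(unsigned euc) { return (euc >> 8) == 0xA3u; }

unsigned sketch_gb_toupper(unsigned euc)
{
    assert(euc <= 0xFFFFu);
    unsigned trail = euc & 0xFFu;
    if (in_row3(euc) && trail >= 0xE1u && trail <= 0xFAu)
        return euc - 0x20u;          /* same offset as ASCII 'a' - 'A' */
    return euc;                      /* everything else is unchanged */
}

unsigned sketch_gb_tolower(unsigned euc)
{
    assert(euc <= 0xFFFFu);
    unsigned trail = euc & 0xFFu;
    if (in_row3(euc) && trail >= 0xC1u && trail <= 0xDAu)
        return euc + 0x20u;
    return euc;
}
```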

Conversion Between Simplified Chinese Codesets

In the Simplified Chinese character sets, the Roman characters and numbers in codeset 0 are repeated in codeset 1. The following functions convert wide characters between the two codesets.

Table B-7 Simplified Chinese Codeset Conversion Functions

Function    Description
atocgb      Converts alphabetic or numeric characters in ASCII (codeset 0) to the corresponding characters in GB-2312-80 (codeset 1).
cgbtoa      Converts alphabetic or numeric characters in GB-2312-80 (codeset 1) to the corresponding characters in ASCII (codeset 0).

For further information on these functions, see the cconv(3x) man page.
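The mapping between the two codesets is regular, which the following sketch shows on raw byte values. It is not the atocgb or cgbtoa implementation (those take and return wide characters); the names are hypothetical. It assumes the GB-2312-80 layout in which an ASCII letter or digit c appears in EUC as the byte pair 0xA3, c + 0x80.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical: ASCII letter or digit -> 16-bit EUC value in codeset 1.
 * Returns 0 for characters outside the alphanumeric range. */
unsigned sketch_ascii_to_gb(int c)
{
    assert(c >= 0 && c <= 0x7F);
    if (!isalnum(c))
        return 0;
    return 0xA300u | (unsigned)(c + 0x80);
}

/* Hypothetical: 16-bit EUC value in codeset 1 -> ASCII letter or digit.
 * Returns -1 when the value is not a row 3 letter or digit. */
int sketch_gb_to_ascii(unsigned euc)
{
    if ((euc >> 8) != 0xA3u)
        return -1;
    int c = (int)(euc & 0xFFu) - 0x80;
    if (c < 0 || c > 0x7F || !isalnum(c))
        return -1;
    return c;
}
```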

Conversion for Korean Character Codes

The following routines perform character-based code conversion on the KS C 5601 character set. They convert characters between Completion code (or EUC format) and Combination code (or Packed code).

To use these routines, the library libkle must be linked using the C compiler option -lkle. For more information, see the kconv(3x) man page.

Table B-8 Korean Code Conversion Functions

Routine          Description
comptopack       Converts a character in Completion code to Combination (Packed) code of KS C 5601-1987.
packtocomp       Converts a character in Combination (Packed) code of KS C 5601-1987 to Completion code.
wansuntojohap    Converts a character in Completion code to Combination (Packed) code of KS C 5601-1992.
packtocomp       Converts a character in Combination (Packed) code of KS C 5601-1992 to Completion code.

Conversion for Simplified Chinese Character Codes

The following routines do character-based code conversion on the GB-2312-80 character set. They convert characters and strings between EUC format and GB-2312-80 format. To use these routines, the library libcle must be linked using the C compiler option -lcle. For further information, see the cconv(3x) man page.

Table B-9 Simplified Chinese Character-Based Functions

Function      Description
cgbtoeuc      Converts a character in GB-2312-80 format (7 bit) to EUC format
scgbtoeuc     Converts a string in GB-2312-80 format (7 bit) to EUC format
sncgbtoeuc    Converts part of a string in GB-2312-80 format (7 bit) to EUC format
euctocgb      Converts a character in EUC format to GB-2312-80 format (7 bit)
seuctocgb     Converts a string in EUC format to GB-2312-80 format (7 bit)
sneuctocgb    Converts part of a string in EUC format to GB-2312-80 format (7 bit)
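The relationship between the 7-bit GB-2312-80 form and the EUC form is simple: EUC sets the high bit of both bytes of each character. The sketch below illustrates that relationship only; it does not reproduce the libcle routines, whose exact signatures are documented in cconv(3x), and all names here are hypothetical. The string helper assumes its input consists purely of two-byte GB-2312-80 characters with no interleaved ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one character, as a 16-bit value (first byte << 8 | second). */
unsigned sketch_gb7_to_euc(unsigned gb)
{
    assert(gb <= 0x7F7Fu);
    return gb | 0x8080u;     /* set the high bit of both bytes */
}

unsigned sketch_euc_to_gb7(unsigned euc)
{
    assert(euc <= 0xFFFFu);
    return euc & 0x7F7Fu;    /* clear the high bit of both bytes */
}

/* Hypothetical: converts n bytes of pure two-byte GB text to EUC.
 * n must be even; dst must have room for n bytes. */
void sketch_gb7_str_to_euc(const unsigned char *src, unsigned char *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(src[i] | 0x80u);
}
```

Converting only the first n bytes, as the string helper does, mirrors the "part of a string" variants listed in Table B-9.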

Conversion for Traditional Chinese Character Codes

The following routines perform code conversion on the CNS-11643 character set. They convert characters and strings between CNS-11643, EUC, and Big5 formats. To use these routines, the library libhle must be linked using the C compiler option -lhle. For more information, see the hconv(3x) man page.

Table B-10 Traditional Chinese Character-Based Functions

Function      Description
cbig5toeuc    Converts Big5 character to EUC character.
ccnstoeuc     Converts CNS character to EUC character.
ceuctobig5    Converts EUC character to Big5 character.
ceuctocns     Converts EUC character to CNS character.

Table B-11 Traditional Chinese String-Based Functions

Function     Description
big5toeuc    Converts Big5 string to EUC string.
cnstoeuc     Converts CNS string to EUC string.
euctobig5    Converts EUC string to Big5 string.
euctocns     Converts EUC string to CNS string.