Asian Application Developer's Guide

Asian-Specific Utilities

This section describes functions for wide-character and string input and output, character classification, and code conversion for the Korean and Chinese character sets. Asian Solaris software implements a wide character library for handling Korean or Chinese character codes according to industry standards.

Routines that have Korean or Chinese language-specific dependency are in their own language-specific library, which is linked with the corresponding C compiler option:

Korean Solaris libkle is linked with -lkle
Simplified Chinese Solaris libcle is linked with -lcle
Traditional Chinese Solaris libhle is linked with -lhle

Refer to the appropriate man pages for more information.

Asian Solaris software defines WC (wide character) as a constant-width, four-byte code. WC uses the ANSI C data type wchar_t, which Solaris software defines in wchar.h as follows:

typedef long wchar_t;

In Solaris software, long is four bytes.
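Because every wchar_t holds exactly one character, the length of a string in characters equals the number of wchar_t elements it converts to, whatever the byte width of each multibyte character. The following minimal sketch uses the standard mbstowcs() function; the helper name count_chars is hypothetical, and the result depends on the locale selected with setlocale().

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns the number of characters in a multibyte
 * string under the current locale, or (size_t)-1 if the string contains
 * a byte sequence that is invalid in that locale.
 * With a NULL destination, mbstowcs() only counts; it writes nothing. */
size_t count_chars(const char *mbs)
{
    return mbstowcs(NULL, mbs, 0);
}
```

In a Korean or Chinese EUC locale, each two-byte Hangul or Hanzi character should count as one character, so count_chars() returns less than strlen() for such text.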

Conversion Utilities

The conversion functions described in this section are available, but iconv() is the standard interface for code conversion and should be preferred.
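A minimal sketch of a general-purpose conversion through iconv() follows. The helper name convert_codeset is hypothetical, and the codeset names ("EUC-KR", "UTF-8") are the GNU/Linux spellings; the names accepted by iconv_open() are platform-specific, so check the iconv man page on your system.

```c
#include <iconv.h>
#include <string.h>
#include <stddef.h>

/* Converts the NUL-terminated string `in` from codeset `from` to codeset
 * `to`, writing a NUL-terminated result into `out`.
 * Returns the number of bytes written (excluding the NUL), or -1 on
 * failure (unknown codeset, invalid input, or `out` too small). */
long convert_codeset(const char *from, const char *to,
                     const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open(to, from);   /* note: target first */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;              /* iconv() takes char ** input */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;        /* reserve room for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)                /* flush any shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}
```

For example, the EUC-KR bytes 0xC7 0xD1 (the Hangul syllable U+D55C) should convert to the three UTF-8 bytes 0xED 0x95 0x9C.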

Asian Solaris software provides facilities for various conversions, for example:

Within a codeset, such as converting uppercase ASCII to lowercase.
Between different conventions for national standard character sets, such as:
- Between Combination and Completion code, for both KS C 5601-1987 and KS C 5601-1992.
- Between GB and EUC.
- Between CNS 11643 code and Big5.
Between code formats (such as converting between EUC and WC).
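Conversion between EUC and WC can be done with the standard mbstowcs() and wcstombs() functions, which use the codeset of the current locale. The sketch below is a minimal round trip under that assumption; the helper name euc_roundtrip is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte (EUC) string to wide
 * characters and back again. Returns 0 on success, or -1 if the input
 * is invalid in the current locale or a buffer is too small. */
int euc_roundtrip(const char *euc, wchar_t *wcs, size_t wcs_len,
                  char *back, size_t back_len)
{
    size_t n = mbstowcs(wcs, euc, wcs_len);     /* EUC -> WC */
    if (n == (size_t)-1 || n == wcs_len)        /* invalid, or no room for L'\0' */
        return -1;

    size_t m = wcstombs(back, wcs, back_len);   /* WC -> EUC */
    if (m == (size_t)-1 || m == back_len)       /* invalid, or no room for '\0' */
        return -1;
    return 0;
}
```

A program would normally process text in the fixed-width wcs buffer and convert back to EUC only for output.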

Programs using the general multibyte conversion utilities should include the header files widec.h and wctype.h. The locale-specific routines are declared in additional headers:

Korean Solaris specific routines (such as iskxxx) are declared in ko/xctype.h.
Simplified Chinese Solaris specific routines (such as iscxxx) are declared in zh/xctype.h.
Traditional Chinese Solaris specific routines (such as ishxxx) are declared in zh_TW/xctype.h.

Programs using general multibyte conversion utilities should include three header files: wctype.h, widec.h, plus one of the following locale-specific files:

ko/xctype.h (Korean)
zh/xctype.h (Simplified Chinese)
zh_TW/xctype.h (Traditional Chinese)

The locale/xctype.h file declares the Korean or Chinese locale-specific routines, which have names of the form:

iskxxxx (Korean)
iscxxxx (Simplified Chinese)
ishxxxx (Traditional Chinese)

As with the classification functions described in the previous section, the behavior of these functions depends on the locale selected with the setlocale function (described elsewhere in this and other chapters).
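Because the locale-specific routines follow the current locale, a program typically selects the locale at startup and checks whether the request succeeded. The following minimal sketch shows that pattern with the standard setlocale(); the helper name use_locale is hypothetical, and locale names such as "ko", "zh", or "zh_TW" are system-specific assumptions.

```c
#include <locale.h>

/* Hypothetical helper: selects locale `name` for all categories so that
 * wide-character classification and conversion follow that locale.
 * Returns 1 on success. If the locale is not installed, falls back to
 * the portable "C" locale and returns 0. */
int use_locale(const char *name)
{
    if (setlocale(LC_ALL, name) != NULL)
        return 1;
    setlocale(LC_ALL, "C");
    return 0;
}
```

For example, a Korean application might call use_locale("ko") and warn the user when it returns 0.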

Locale-specific conversion routines (such as Korean comptopack or Chinese cgbtoeuc) are contained in a locale-specific library:

libkle (Korean)
libcle (Simplified Chinese)
libhle (Traditional Chinese)

Link the appropriate library at compile time with the C compiler option:

-lkle (Korean)
-lcle (Simplified Chinese)
-lhle (Traditional Chinese)

Conversion Within a Codeset

The multibyte conversion functions are similar to the one-byte conversion functions toupper and tolower: they convert wide characters to other wide characters. For more information on conversion routines, see the man pages for wconv(3) for all locales and for:

kconv(3)--Korean
cconv(3)--Simplified Chinese
hconv(3)--Traditional Chinese
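The portable C counterparts of toupper and tolower for wide characters are towupper() and towlower(), declared in wctype.h. The sketch below applies towupper() to a whole wide-character string; it uses only standard C, not the locale-specific kconv, cconv, or hconv routines, and the helper name wcs_toupper is hypothetical.

```c
#include <wchar.h>
#include <wctype.h>

/* Converts a wide-character string to uppercase in place, using the
 * case mapping of the current locale. */
void wcs_toupper(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towupper((wint_t)*s);
}
```

Characters without an uppercase form, such as digits, are left unchanged.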

The following routines are in the regular Chinese C library:

Table B-5 Simplified Chinese Case Conversion Functions (declared in zh/xctype.h)


Function	Description
`tocupper`	Converts codeset 1 Roman lowercase to uppercase
`toclower`	Converts codeset 1 Roman uppercase to lowercase

Table B-6 Traditional Chinese Case Conversion Functions (declared in zh_TW/xctype.h)


Function	Description
`tohupper`	Converts codeset 1 Roman lowercase to uppercase
`tohlower`	Converts codeset 1 Roman uppercase to lowercase
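The codeset 1 Roman letters are the full-width Latin letters of GB 2312 and CNS 11643. As an illustration only, the sketch below implements the same case mapping for the Unicode full-width Latin letters (U+FF21 to U+FF3A and U+FF41 to U+FF5A). It assumes wide characters are Unicode code points, as in UTF-8 locales; in the EUC locales this guide describes, wchar_t values need not be Unicode, so use tocupper/tohupper there. The function names are hypothetical.

```c
#include <wchar.h>

/* Unicode full-width Latin letter ranges (assumption: wchar_t holds
 * Unicode code points). Lowercase is uppercase + 0x20 in both forms. */
#define FW_UPPER_A 0xFF21   /* FULLWIDTH LATIN CAPITAL LETTER A */
#define FW_UPPER_Z 0xFF3A
#define FW_LOWER_A 0xFF41   /* FULLWIDTH LATIN SMALL LETTER A */
#define FW_LOWER_Z 0xFF5A

/* Hypothetical analogue of tocupper()/tohupper(): maps a full-width
 * lowercase letter to uppercase; returns other characters unchanged. */
wchar_t fullwidth_toupper(wchar_t c)
{
    if (c >= FW_LOWER_A && c <= FW_LOWER_Z)
        return c - (FW_LOWER_A - FW_UPPER_A);
    return c;
}

/* Hypothetical analogue of toclower()/tohlower(). */
wchar_t fullwidth_tolower(wchar_t c)
{
    if (c >= FW_UPPER_A && c <= FW_UPPER_Z)
        return c + (FW_LOWER_A - FW_UPPER_A);
    return c;
}
```

Note that ASCII letters are deliberately left alone; only the full-width forms are mapped.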

Conversion Between Simplified Chinese Codesets

In the Simplified Chinese character sets, the Roman characters and numbers in codeset 0 are repeated in codeset 1. The following functions convert wide characters between the two codesets.

Table B-7 Simplified Chinese Codeset Conversion Functions


Function	Description
`atocgb`	Converts alphabetic or numeric characters in ASCII (codeset 0) to the corresponding characters in GB-2312-80 (codeset 1).
`cgbtoa`	Converts alphabetic or numeric characters in GB-2312-80 (codeset 1) to the corresponding characters in ASCII (codeset 0).
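To illustrate the mapping performed by atocgb and cgbtoa, the sketch below does the equivalent for Unicode, where the full-width forms U+FF01 to U+FF5E are the printable ASCII characters shifted by the fixed offset 0xFEE0. It assumes wide characters are Unicode code points, which is not necessarily the case for Solaris EUC locales; the function names are hypothetical. Like the original functions, it converts only letters and digits.

```c
#include <wchar.h>

/* Full-width form = ASCII + 0xFEE0 (assumption: Unicode wchar_t). */
#define FULLWIDTH_OFFSET 0xFEE0

static int is_ascii_alnum(wchar_t c)
{
    return (c >= L'0' && c <= L'9') || (c >= L'A' && c <= L'Z') ||
           (c >= L'a' && c <= L'z');
}

/* Hypothetical analogue of atocgb(): ASCII letter or digit -> full-width. */
wchar_t ascii_to_fullwidth(wchar_t c)
{
    return is_ascii_alnum(c) ? c + FULLWIDTH_OFFSET : c;
}

/* Hypothetical analogue of cgbtoa(): full-width letter or digit -> ASCII.
 * 0xFF10 is FULLWIDTH DIGIT ZERO and 0xFF5A is FULLWIDTH SMALL LETTER Z. */
wchar_t fullwidth_to_ascii(wchar_t c)
{
    if (c >= 0xFF10 && c <= 0xFF5A && is_ascii_alnum(c - FULLWIDTH_OFFSET))
        return c - FULLWIDTH_OFFSET;
    return c;
}
```

Punctuation such as '+' passes through unchanged in both directions.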

For further information on these functions, see the man page for cconv(3x).

Conversion for Korean Character Codes

The following routines perform character-based code conversion on the KS C 5601 character set. They convert characters between Completion code (or EUC format) and Combination code (or Packed code).

To use these routines, link the library libkle using the C compiler option -lkle. For more information, see the kconv(3x) man page.

Table B-8 Korean Code Conversion Functions


Routine	Description
`comptopack`	Converts a character in Completion code to Combination (Packed) code of KS C 5601-1987.
`packtocomp`	Converts a character in Combination (Packed) code of KS C 5601-1987 to Completion code.
`wansuntojohap`	Converts a character in Completion code to Combination (Packed) code of KS C 5601-1992.
`packtocomp`	Converts a character in Combination (Packed) code of KS C 5601-1992 to Completion code.
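The same Completion-to-Combination conversion can be done with the standard iconv() interface. The sketch below assumes the GNU iconv codeset names "EUC-KR" for Completion (Wansung) code and "JOHAB" for the KS C 5601-1992 Combination code; other systems may spell these differently, and the helper name wansung_to_johab is hypothetical.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts NUL-terminated Completion-code (EUC-KR)
 * text to Combination (Johab) code. Returns the number of output bytes
 * (excluding the NUL), or -1 on error. */
long wansung_to_johab(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open("JOHAB", "EUC-KR");   /* to, from */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;                 /* room for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}
```

In Johab, a syllable is packed as a 1 bit followed by three 5-bit fields (initial, medial, and final jamo), which is why Combination code is also called Packed code.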

Conversion for Simplified Chinese Character Codes

The following routines do character-based code conversion on the GB-2312-80 character set. They convert characters and strings between EUC format and GB-2312-80 format. To use these routines, the library libcle must be linked using the C compiler option -lcle. For further information, see the cconv(3x) man page.

Table B-9 Simplified Chinese Character-Based Functions


Function	Description
`cgbtoeuc`	Converts a character in GB-2312-80 format (7 bit) to EUC format
`scgbtoeuc`	Converts a string in GB-2312-80 format (7 bit) to EUC format
`sncgbtoeuc`	Converts part of a string in GB-2312-80 format (7 bit) to EUC format
`euctocgb`	Converts a character in EUC format to GB-2312-80 format (7 bit)
`seuctocgb`	Converts a string in EUC format to GB-2312-80 format (7 bit)
`sneuctocgb`	Converts part of a string in EUC format to GB-2312-80 format (7 bit)
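For a single two-byte character, the relationship between the 7-bit GB-2312-80 format and EUC is simple: GB-2312-80 identifies a character by two bytes in the range 0x21 to 0x7E, and EUC stores the same two bytes with the high bit of each set. The sketch below shows that relationship only; it is not the libcle implementation, the function names are hypothetical, and codes are held as 16-bit values with the first byte in the high-order position.

```c
/* Returns nonzero if both bytes of a 7-bit GB code lie in 0x21..0x7E. */
int gb7_is_valid(unsigned int gb)
{
    unsigned int hi = (gb >> 8) & 0xFF, lo = gb & 0xFF;
    return gb <= 0xFFFF && hi >= 0x21 && hi <= 0x7E &&
           lo >= 0x21 && lo <= 0x7E;
}

/* Hypothetical analogue of cgbtoeuc(): set the high bit of each byte. */
unsigned int gb7_to_euc(unsigned int gb)
{
    return gb | 0x8080;
}

/* Hypothetical analogue of euctocgb(): clear the high bit of each byte. */
unsigned int euc_to_gb7(unsigned int euc)
{
    return euc & 0x7F7F;
}
```

String conversion is less direct, because ASCII characters mixed into an EUC string would become indistinguishable from GB-2312-80 bytes once the high bit is cleared.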

Conversion for Traditional Chinese Character Codes

The following routines perform character-based code conversion on the CNS-11643 character set. They convert CNS-11643 characters between CNS-11643, EUC, and Big5 formats. To use these routines, link the library libhle using the C compiler option -lhle. For more information, see the hconv(3x) man page.

Table B-10 Traditional Chinese Character-Based Functions


Function	Description
`cbig5toeuc`	Converts Big5 character to EUC character.
`ccnstoeuc`	Converts CNS character to EUC character.
`ceuctobig5`	Converts EUC character to Big5 character.
`ceuctocns`	Converts EUC character to CNS character.
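To show what a CNS-to-EUC character conversion involves, the sketch below encodes a CNS 11643 character in EUC-TW: plane 1 uses EUC code set 1 (two bytes with the high bit set), and planes 2 to 16 use code set 2 (the single-shift byte SS2, 0x8E, then a plane byte 0xA0 + plane, then the two character bytes with the high bit set). This is an illustration of the encoding, not the libhle implementation, and the function name cns_to_euctw is hypothetical.

```c
#include <stddef.h>

/* Hypothetical analogue of ccnstoeuc(): encodes CNS 11643 character
 * `code` (two bytes, each 0x21..0x7E) from plane `plane` in EUC-TW.
 * Writes into `out` and returns the byte count, or 0 if the input is
 * out of range. */
size_t cns_to_euctw(int plane, unsigned int code, unsigned char out[4])
{
    unsigned int hi = (code >> 8) & 0xFF, lo = code & 0xFF;
    if (code > 0xFFFF || hi < 0x21 || hi > 0x7E || lo < 0x21 || lo > 0x7E)
        return 0;

    if (plane == 1) {                             /* EUC code set 1 */
        out[0] = (unsigned char)(hi | 0x80);
        out[1] = (unsigned char)(lo | 0x80);
        return 2;
    }
    if (plane >= 2 && plane <= 16) {              /* EUC code set 2 */
        out[0] = 0x8E;                            /* SS2 */
        out[1] = (unsigned char)(0xA0 + plane);   /* plane 2 -> 0xA2 */
        out[2] = (unsigned char)(hi | 0x80);
        out[3] = (unsigned char)(lo | 0x80);
        return 4;
    }
    return 0;
}
```

Big5 has no such arithmetic relationship to CNS 11643, so Big5 conversions require a mapping table.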

Table B-11 Traditional Chinese String-Based Functions


Function	Description
`big5toeuc`	Converts Big5 string to EUC string.
`cnstoeuc`	Converts CNS string to EUC string.
`euctobig5`	Converts EUC string to Big5 string.
`euctocns`	Converts EUC string to CNS string.
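String conversion between Big5 and EUC can also be done with the standard iconv() interface, which supplies the mapping table that Big5 requires. The sketch below assumes the GNU iconv codeset names "BIG5" and "EUC-TW"; other systems may spell them differently, and the helper name big5_to_euctw is hypothetical.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical analogue of big5toeuc(): converts a NUL-terminated Big5
 * string to EUC-TW. Returns the number of bytes written (excluding the
 * NUL), or -1 on error. */
long big5_to_euctw(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    iconv_t cd = iconv_open("EUC-TW", "BIG5");    /* to, from */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;                 /* room for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}
```

Swapping the two arguments to iconv_open() gives the reverse conversion, corresponding to euctobig5.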