Asian Application Developer's Guide

Appendix B Backward Compatibility Information

This appendix contains information for making programs backward-compatible with earlier versions of Asian Solaris software. Every utility described here is still supported, but for this version of Solaris you are encouraged to use the XPG4 internationalization APIs described in the Solaris Internationalization Guide for Developers.

Asian Locale-Specific Utilities

These utilities test various aspects of the Korean or Chinese national standard character sets. Except for the Korean isksc routine, they also assume that the character being tested is part of the national standard character set.

The argument to each function in these tables must be a wide character (WC) of type wchar_t. For more information, see the appropriate man page for your locale.
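Because every routine in these tables takes a wchar_t, a multibyte string must first be converted to wide characters. The following is a minimal sketch, not part of the Asian Solaris libraries, that does this with the standard mbstowcs() function and then classifies each character with the XPG4 wctype()/iswctype() pair recommended at the start of this appendix. The helper name count_class and its fixed buffer size are hypothetical; which class names a locale accepts beyond the standard ones (alpha, digit, space, and so on) depends on that locale's definition.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the characters in the multibyte string mbs
 * that belong to the character class named class_name (for example
 * "alpha" or "digit"). Returns -1 if the string cannot be converted in
 * the current locale or does not fit in the local buffer. */
long count_class(const char *mbs, const char *class_name)
{
    wchar_t wcs[256];                       /* illustrative fixed-size buffer */
    size_t cap = sizeof wcs / sizeof wcs[0];
    size_t n = mbstowcs(wcs, mbs, cap);
    if (n == (size_t)-1 || n == cap)
        return -1;                          /* invalid sequence or no room for terminator */

    wctype_t cls = wctype(class_name);      /* 0 if the locale does not define the class */
    long count = 0;
    for (size_t i = 0; i < n; i++)
        if (cls != 0 && iswctype((wint_t)wcs[i], cls))
            count++;
    return count;
}
```

In a real program, call setlocale(LC_ALL, "") first so that mbstowcs() and the class tables reflect the user's Korean or Chinese locale rather than the default C locale.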

Table B-1 Korean Character Classification Functions

Utility        Description
isksc          Returns true if it is in the KS C 5601 character set.
iskroman       Returns true if it is a Roman character as defined by the KS C 5636 character set.
iskromannum    Returns true if it is a Roman numeral symbol in the KS C 5601 character set.
isksymbol      Returns true if it is a Latin symbol or special character in the KS C 5601 character set.
iskparen       Returns true if it is a right or left parenthesis in the KS C 5601 character set.
isklatin       Returns true if it is a Latin letter character in the KS C 5601 character set.
iskletter      Returns true if it is a Korean vowel or consonant in the KS C 5601 character set.
iskline        Returns true if it is a ruled line symbol in the KS C 5601 character set.
iskunit        Returns true if it is a unit character in the KS C 5601 character set.
isksci         Returns true if it is a scientific symbol in the KS C 5601 character set.
iskgen         Returns true if it is a graphic or general symbol in the KS C 5601 character set.
iskgreek       Returns true if it is a Greek character in the KS C 5601 character set.
iskrussian     Returns true if it is a Russian character in the KS C 5601 character set.
iskuser        Returns true if it is in the user-defined area of the KS C 5601 character set.
iskhanja       Returns true if it is an ideogram (Hanja) in the KS C 5601 character set.
iskhangul      Returns true if it is a Hangul phonogram in the KS C 5601 character set.
iskkata        Returns true if it is a Japanese Katakana character in the KS C 5601 character set.
iskhira        Returns true if it is a Japanese Hiragana character in the KS C 5601 character set.
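The row layout of KS C 5601 explains what tests such as iskhangul and iskhanja check. The sketch below is an illustration under stated assumptions, not the library's implementation: it takes a two-byte EUC (Completion code) character instead of the wchar_t value the real routines take, and it relies on the published KS C 5601 layout in which rows 16 through 40 hold the 2,350 Hangul syllables and rows 42 through 93 hold the 4,888 Hanja, with each EUC byte equal to the row or cell number plus 0xA0. The function names are hypothetical.

```c
#include <stdbool.h>

/* Split a two-byte EUC-KR character into its KS C 5601 row and cell.
 * Each EUC byte carries a 94-position index offset by 0xA0. */
static bool euckr_row_cell(unsigned char lead, unsigned char trail,
                           int *row, int *cell)
{
    if (lead < 0xA1 || lead > 0xFE || trail < 0xA1 || trail > 0xFE)
        return false;               /* not a two-byte EUC-KR character */
    *row = lead - 0xA0;
    *cell = trail - 0xA0;
    return true;
}

/* Hypothetical analogue of iskhangul: rows 16-40 are Hangul syllables. */
bool sketch_is_hangul(unsigned char lead, unsigned char trail)
{
    int row, cell;
    return euckr_row_cell(lead, trail, &row, &cell) && row >= 16 && row <= 40;
}

/* Hypothetical analogue of iskhanja: rows 42-93 are Hanja ideograms. */
bool sketch_is_hanja(unsigned char lead, unsigned char trail)
{
    int row, cell;
    return euckr_row_cell(lead, trail, &row, &cell) && row >= 42 && row <= 93;
}
```

For example, the first Hangul syllable is stored as the bytes 0xB0 0xA1 (row 16, cell 1), so sketch_is_hangul(0xB0, 0xA1) is true.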

Table B-2 Simplified Chinese Character Classification Functions

Routine        Description
ischanzi       Returns true if it is a Hanzi ideogram in GB-2312-80.
iscaccent      Returns true if it is an accent notation in GB-2312-80.
iscphonetic    Returns true if it is a phonetic symbol in GB-2312-80.
iscpinyin      Returns true if it is a Pinyin symbol in GB-2312-80.
iscalpha       Returns true if it is a Roman alphabetic character in GB-2312-80.
iscdigit       Returns true if it is a Roman digit in GB-2312-80.
iscnumber      Returns true if it is a number in GB-2312-80.
isclower       Returns true if it is a Roman lowercase letter in GB-2312-80.
iscupper       Returns true if it is a Roman uppercase letter in GB-2312-80.
iscblank       Returns true if it is a white space character in GB-2312-80.
iscspace       Returns true if it is a space character in GB-2312-80.
iscgen         Returns true if it is a graphic or general symbol in GB-2312-80.
iscsci         Returns true if it is a scientific symbol in GB-2312-80.
iscline        Returns true if it is a ruled line symbol in GB-2312-80.
iscunit        Returns true if it is a unit character in GB-2312-80.
iscparen       Returns true if it is a right or left parenthesis in GB-2312-80.
iscpunct       Returns true if it is a punctuation character in GB-2312-80.
iscgreek       Returns true if it is a Greek character in GB-2312-80.
iscrussian     Returns true if it is a Russian character in GB-2312-80.
iscspecial     Returns true if it is a Greek or Russian character in GB-2312-80.
ischira        Returns true if it is a Japanese Hiragana character in GB-2312-80.
isckata        Returns true if it is a Japanese Katakana character in GB-2312-80.

For Simplified Chinese, two additional routines, iscgb and isceuc (Table B-3), test for characters from the GB-2312-80 character set. The iscgb routine expects a wide character, and isceuc expects a GB-2312-80 character in EUC format. For more information, see the cctype(3x) man page.

Table B-3 Simplified Chinese General Character Classification Functions

Routine        Description
iscgb          Returns true if it is in GB-2312-80.
isceuc         Returns true if it is a GB-2312-80 character in EUC format.
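Several of the Simplified Chinese classification routines correspond to whole rows of the GB-2312-80 code table. The sketch below is hypothetical and not the cctype implementation. It combines an isceuc-style structural check (both EUC bytes in 0xA1-0xFE) with a row lookup. It takes the two EUC bytes directly rather than a wchar_t, and its row assignments (1-9 for symbols, full-width Roman characters, kana, Greek, Russian, Pinyin, and ruled lines; 16-87 for Hanzi) are taken from the published GB-2312-80 layout. It also ignores unassigned cells inside a row.

```c
/* Classes derived from the GB-2312-80 row layout. The enum and the
 * function are hypothetical illustrations, not the cctype routines. */
enum gb_class {
    GB_INVALID,    /* not a two-byte EUC character (cf. isceuc) */
    GB_SYMBOL,     /* rows 1-2: punctuation, numerals, other symbols */
    GB_FULLWIDTH,  /* row 3: Roman letters, digits, punctuation (codeset 1) */
    GB_HIRAGANA,   /* row 4 (cf. ischira) */
    GB_KATAKANA,   /* row 5 (cf. isckata) */
    GB_GREEK,      /* row 6 (cf. iscgreek) */
    GB_RUSSIAN,    /* row 7 (cf. iscrussian) */
    GB_PINYIN,     /* row 8: Pinyin letters and phonetic symbols */
    GB_LINE,       /* row 9: ruled-line symbols (cf. iscline) */
    GB_HANZI,      /* rows 16-87 (cf. ischanzi) */
    GB_UNASSIGNED  /* any other row of the 94 x 94 grid */
};

enum gb_class sketch_gb_classify(unsigned char lead, unsigned char trail)
{
    if (lead < 0xA1 || lead > 0xFE || trail < 0xA1 || trail > 0xFE)
        return GB_INVALID;
    int row = lead - 0xA0;          /* EUC byte = GB row number + 0xA0 */
    switch (row) {
    case 1: case 2: return GB_SYMBOL;
    case 3:         return GB_FULLWIDTH;
    case 4:         return GB_HIRAGANA;
    case 5:         return GB_KATAKANA;
    case 6:         return GB_GREEK;
    case 7:         return GB_RUSSIAN;
    case 8:         return GB_PINYIN;
    case 9:         return GB_LINE;
    default:
        return (row >= 16 && row <= 87) ? GB_HANZI : GB_UNASSIGNED;
    }
}
```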

Table B-4 Traditional Chinese Character Classification Functions

Utility        Description
ishalpha       Returns true if it is a Roman character in the CNS 11643 character set.
ishupper       Returns true if it is an uppercase Roman character as defined by the CNS 11643 character set.
ishlower       Returns true if it is a lowercase Roman character in the CNS 11643 character set.
ishdigit       Returns true if it is a number in the CNS 11643 character set.
ishspace       Returns true if it is the space character in the CNS 11643 character set.
ishpunct       Returns true if it is a punctuation character in the CNS 11643 character set.
ishparen       Returns true if it is a right or left parenthesis in the CNS 11643 character set.
ishphontone    Returns true if it is a Mandarin phonetic tone.
ishradical     Returns true if it is a Chinese character radical.
ishline        Returns true if it is a ruled line symbol in the CNS 11643 character set.
ishunit        Returns true if it is a unit character in the CNS 11643 character set.
ishsci         Returns true if it is a scientific symbol in the CNS 11643 character set.
ishgen         Returns true if it is a general symbol in the CNS 11643 character set.
ishgreek       Returns true if it is a Greek character in the CNS 11643 character set.

Asian-Specific Utilities

This section describes wide character and string input and output, character classification, and conversion functions for the Korean and Chinese character sets. Asian Solaris software implements a wide character library that handles Korean and Chinese character codes according to industry standards.

Routines with Korean- or Chinese-specific dependencies are in their own language-specific libraries, each linked with the corresponding C compiler option: -lkle for Korean, -lcle for Simplified Chinese, and -lhle for Traditional Chinese.

Refer to the appropriate man pages for more information.

Asian Solaris software defines WC as a constant-width, four-byte code. WC uses the ANSI C data type wchar_t, which Solaris software defines in wchar.h as follows:

typedef long wchar_t;

In Solaris software, long is four bytes.

Conversion Utilities

The conversion functions described in this section are still available, but you should use the standard iconv() function instead.
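The following is a minimal sketch of the iconv() approach, converting Korean EUC text to UTF-8. The codeset names "EUC-KR" and "UTF-8" are the ones GNU libc and many other iconv implementations accept. The names a particular Solaris release accepts may differ, so treat them as an assumption and check the iconv_open(3) man page. The function name is hypothetical. The code uses the POSIX prototype, in which iconv() takes a char ** input pointer.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes of EUC-KR text to UTF-8 in out (capacity outcap).
 * Returns the number of bytes written, or -1 on any failure, including
 * an iconv implementation that does not recognize the codeset names. */
long sketch_euckr_to_utf8(const char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open("UTF-8", "EUC-KR");     /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv() takes a non-const input pointer */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft); /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1 || inleft != 0)
        return -1;                   /* invalid input or output buffer too small */
    return (long)(outcap - outleft);
}
```

For example, the EUC bytes 0xB0 0xA1 (the Hangul syllable U+AC00) should convert to the three UTF-8 bytes 0xEA 0xB0 0x80 when the converter is available.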

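Asian Solaris software provides facilities for various conversions, for example, case conversion within a codeset and conversion between codesets and code formats such as EUC, Combination (Packed) code, and Big5.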

Programs using the general multibyte conversion utilities should include the header files widec.h and wctype.h.

Programs using the locale-specific conversion utilities should include three header files: wctype.h, widec.h, plus the xctype.h file for the locale, for example zh/xctype.h for Simplified Chinese or zh_TW/xctype.h for Traditional Chinese.

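The locale/xctype.h file declares the Korean or Chinese locale-specific routines, whose names include a letter identifying the locale: k for Korean (for example, iskhangul), c for Simplified Chinese (for example, ischanzi and tocupper), and h for Traditional Chinese (for example, ishradical and tohupper).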

As with the classification functions described in the previous section, the behavior of these functions is controlled by the locale set with the setlocale function (described elsewhere in this guide).

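Locale-specific conversion routines (such as Korean comptopack or Chinese cgbtoeuc) are contained in a locale-specific library: libkle for Korean, libcle for Simplified Chinese, or libhle for Traditional Chinese.

Link the library during compilation with the corresponding C compiler option: -lkle, -lcle, or -lhle.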

Conversion Within a Codeset

The multibyte conversion functions are similar to the one-byte conversion functions toupper and tolower: they convert wide characters to other wide characters. For more information on conversion routines, see the wconv(3) man page for all locales and the man page for your locale.

The following routines are in the regular Chinese C library:

Table B-5 Simplified Chinese Case Conversion Functions (declared in zh/xctype.h)

Function       Description
tocupper       Converts codeset 1 Roman lowercase to uppercase.
toclower       Converts codeset 1 Roman uppercase to lowercase.
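Case conversion in codeset 1 works much like ASCII case conversion, because GB-2312-80 row 3 keeps the ASCII ordering. The sketch below is hypothetical and not the library code. It assumes that a character is passed as its two EUC bytes packed into one value (lead << 8 | trail) rather than as the wchar_t the real tocupper and toclower take. In that form, full-width A-Z occupy 0xA3C1-0xA3DA and a-z occupy 0xA3E1-0xA3FA, 0x20 apart.

```c
/* Hypothetical analogues of tocupper/toclower on a GB-2312-80
 * character given as its packed EUC value (lead << 8 | trail).
 * Values outside the full-width letter ranges are returned unchanged. */
unsigned sketch_tocupper(unsigned c)
{
    return (c >= 0xA3E1 && c <= 0xA3FA) ? c - 0x20 : c;
}

unsigned sketch_toclower(unsigned c)
{
    return (c >= 0xA3C1 && c <= 0xA3DA) ? c + 0x20 : c;
}
```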

Table B-6 Traditional Chinese Case Conversion Functions (declared in zh_TW/xctype.h)

Function       Description
tohupper       Converts codeset 1 Roman lowercase to uppercase.
tohlower       Converts codeset 1 Roman uppercase to lowercase.

Conversion Between Simplified Chinese Codesets

In the Simplified Chinese character sets, the Roman characters and numbers in codeset 0 are repeated in codeset 1. The following functions convert wide characters between the two codesets.

Table B-7 Simplified Chinese Codeset Conversion Functions

Function       Description
atocgb         Converts alphabetic or numeric characters in ASCII (codeset 0) to the corresponding characters in GB-2312-80 (codeset 1).
cgbtoa         Converts alphabetic or numeric characters in GB-2312-80 (codeset 1) to the corresponding characters in ASCII (codeset 0).
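Because GB-2312-80 row 3 mirrors printable ASCII, the mapping behind atocgb and cgbtoa reduces to a fixed offset. In this hypothetical sketch, each GB character is represented by its packed EUC value (lead << 8 | trail) rather than by the wchar_t the library routines use. Under that assumption, an ASCII letter or digit c corresponds to 0xA380 + c. Only letters and digits are converted, matching the table above.

```c
#include <ctype.h>

/* Hypothetical analogue of atocgb: ASCII letter or digit -> packed
 * EUC value in GB-2312-80 row 3. Other values are returned unchanged. */
unsigned sketch_atocgb(unsigned c)
{
    return (c < 0x80 && isalnum((int)c)) ? 0xA380 + c : c;
}

/* Hypothetical analogue of cgbtoa: the reverse mapping for row 3
 * letters and digits. Other values are returned unchanged. */
unsigned sketch_cgbtoa(unsigned c)
{
    if (c >= 0xA3A1 && c <= 0xA3FE && isalnum((int)(c - 0xA380)))
        return c - 0xA380;
    return c;
}
```

For example, sketch_atocgb('A') yields 0xA3C1 (full-width A), and sketch_atocgb('0') yields 0xA3B0 (full-width 0).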

For further information on these functions, see the cconv(3x) man page.

Conversion for Korean Character Codes

The following routines perform character-based code conversion on the KS C 5601 character set. They convert characters between Completion code (or EUC format) and Combination code (or Packed code).

To use these routines, link the libkle library with the C compiler option -lkle. For more information, see the kconv(3x) man page.

Table B-8 Korean Code Conversion Functions

Routine        Description
comptopack     Converts a character in Completion code to Combination (Packed) code of KS C 5601-1987.
packtocomp     Converts a character in Combination (Packed) code of KS C 5601-1987 to Completion code.
wansuntojohap  Converts a character in Completion code to Combination (Packed) code of KS C 5601-1992.
packtocomp     Converts a character in Combination (Packed) code of KS C 5601-1992 to Completion code.
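Combination (Packed) code stores a Hangul syllable in 16 bits: a set high bit followed by three 5-bit fields for the initial consonant, the vowel, and the final consonant. Converting from Completion code requires a 2,350-entry lookup table. To stay short, this sketch substitutes a different starting point: a Unicode Hangul syllable, decomposed arithmetically into its jamo. The field values follow the commonly published Johab tables of KS C 5601-1992. They are an assumption about what the Combination code contains, not a transcription of libkle, and the function name is hypothetical.

```c
#include <stdint.h>

/* Johab 5-bit vowel codes, indexed by Unicode vowel index 0-20. */
static const uint8_t johab_medial[21] = {
    3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
    18, 19, 20, 21, 22, 23, 26, 27, 28, 29
};

/* Pack a precomposed Unicode Hangul syllable (U+AC00..U+D7A3) into
 * Combination (Johab) code. Returns 0 for anything else. */
unsigned sketch_syllable_to_johab(uint32_t ucs)
{
    if (ucs < 0xAC00 || ucs > 0xD7A3)
        return 0;
    unsigned s = ucs - 0xAC00;
    unsigned lead = s / 588;              /* 588 = 21 vowels * 28 finals */
    unsigned vowel = (s % 588) / 28;
    unsigned tail = s % 28;               /* 0 means no final consonant */

    unsigned initial = lead + 2;                  /* 19 initials -> codes 2..20 */
    unsigned medial = johab_medial[vowel];
    unsigned fin = tail + 1 + (tail >= 17);       /* 1 = none; code 18 is skipped */

    return 0x8000u | (initial << 10) | (medial << 5) | fin;
}
```

With these tables, U+AC00 packs to 0x8861 and U+D55C packs to 0xD065.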

Conversion for Simplified Chinese Character Codes

The following routines do character-based code conversion on the GB-2312-80 character set. They convert characters and strings between EUC format and GB-2312-80 format. To use these routines, the library libcle must be linked using the C compiler option -lcle. For further information, see the cconv(3x) man page.

Table B-9 Simplified Chinese Character-Based Functions

Function       Description
cgbtoeuc       Converts a character in GB-2312-80 format (7 bit) to EUC format.
scgbtoeuc      Converts a string in GB-2312-80 format (7 bit) to EUC format.
sncgbtoeuc     Converts part of a string in GB-2312-80 format (7 bit) to EUC format.
euctocgb       Converts a character in EUC format to GB-2312-80 format (7 bit).
seuctocgb      Converts a string in EUC format to GB-2312-80 format (7 bit).
sneuctocgb     Converts part of a string in EUC format to GB-2312-80 format (7 bit).
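The 7-bit and EUC formats differ only in the high bit of each byte: EUC sets bit 7 on both bytes of a GB-2312-80 character. The hypothetical sketch below shows that relationship for whole strings. It assumes the buffer holds only GB-2312-80 characters, since a 7-bit stream cannot distinguish them from ASCII without shift sequences, and it is not the libcle implementation.

```c
#include <stddef.h>

/* Hypothetical analogue of scgbtoeuc: set bit 7 of every byte.
 * Assumes s holds only 7-bit GB-2312-80 characters (no ASCII). */
void sketch_scgbtoeuc(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] |= 0x80;
}

/* Hypothetical analogue of seuctocgb: clear bit 7 of every byte. */
void sketch_seuctocgb(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        s[i] &= 0x7F;
}
```

For example, the 7-bit GB bytes 0x56 0x50 become the EUC bytes 0xD6 0xD0, and converting back restores the original bytes.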

Conversion for Traditional Chinese Character Codes

The following routines perform character-based code conversion on the CNS-11643 character set. They convert CNS-11643 characters between CNS-11643, EUC, and Big5 formats. To use these routines, link the libhle library with the C compiler option -lhle. For more information, see the hconv(3x) man page.

Table B-10 Traditional Chinese Character-Based Functions

Function       Description
cbig5toeuc     Converts Big5 character to EUC character.
ccnstoeuc      Converts CNS character to EUC character.
ceuctobig5     Converts EUC character to Big5 character.
ceuctocns      Converts EUC character to CNS character.

Table B-11 Traditional Chinese String-Based Functions

Function       Description
big5toeuc      Converts Big5 string to EUC string.
cnstoeuc       Converts CNS string to EUC string.
euctobig5      Converts EUC string to Big5 string.
euctocns       Converts EUC string to CNS string.