This appendix contains information for making programs backward-compatible with earlier versions of Asian Solaris software. Every utility described is supported, but for this version of Solaris you are encouraged to use the XPG4 internationalization APIs, as described in the Solaris Internationalization Guide for Developers.
These utilities test various aspects of the Korean or Chinese national standard character sets. Except for the Korean isksc routine, they also assume that the character being tested is part of the national standard character set:
Korean: KS C 5601
Traditional Chinese: CNS 11643
Simplified Chinese: GB-2312-80
The argument to each function in these tables must be a wide character (WC) of type wchar_t. For more information, see the appropriate man page for your locale:
Korean--kctype(3x)
Simplified Chinese--cctype(3x)
Traditional Chinese--hctype(3x)
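As a point of comparison with the locale-specific routines in the tables that follow, here is a minimal sketch of the standard wide-character classification interface from wctype.h. The helper name count_alpha is hypothetical and used only for illustration. The result depends on the current LC_CTYPE locale, so a real program would normally call setlocale first.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the alphabetic wide characters in a
 * null-terminated wide string, using the standard iswalpha() rather
 * than a locale-specific iskxxx, iscxxx, or ishxxx routine. */
size_t count_alpha(const wchar_t *ws)
{
    size_t n = 0;

    for (; *ws != L'\0'; ws++) {
        if (iswalpha((wint_t)*ws))
            n++;
    }
    return n;
}
```

The same loop works with iswdigit, iswupper, iswspace, and the other standard classification functions.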
Table B-1 Korean Character Classification Functions
Utility | Description |
---|---|
isksc | Returns true if it is in the KS C 5601 character set. |
iskroman | Returns true if it is a Roman character as defined by the KS C 5636 character set. |
iskromannum | Returns true if it is a Roman numeral symbol in the KS C 5601 character set. |
isksymbol | Returns true if it is a Latin symbol or special character in the KS C 5601 character set. |
iskparen | Returns true if it is a right or left parenthesis in the KS C 5601 character set. |
isklatin | Returns true if it is a Latin letter character in the KS C 5601 character set. |
iskletter | Returns true if it is a Korean vowel or consonant in the KS C 5601 character set. |
iskline | Returns true if it is a ruled line symbol in the KS C 5601 character set. |
iskunit | Returns true if it is a unit character in KS C 5601. |
isksci | Returns true if it is a scientific symbol in KS C 5601. |
iskgen | Returns true if it is a graphic or general symbol in the KS C 5601 character set. |
iskgreek | Returns true if it is a Greek character in the KS C 5601 character set. |
iskrussian | Returns true if it is a Russian character in the KS C 5601 character set. |
iskuser | Returns true if the character is in the user-defined area of the KS C 5601 character set. |
iskhanja | Returns true if it is an ideogram in KS C 5601. |
iskhangul | Returns true if it is a Hangul phonogram in KS C 5601. |
iskkata | Returns true if it is a Japanese Katakana character in the KS C 5601 character set. |
iskhira | Returns true if it is a Japanese Hiragana character in the KS C 5601 character set. |
Table B-2 Simplified Chinese Character Classification Functions
Routine | Description |
---|---|
ischanzi | Returns true if it is a Hanzi ideogram in GB-2312-80. |
iscaccent | Returns true if it is an accent notation in GB-2312-80. |
iscphonetic | Returns true if it is a phonetic symbol in GB-2312-80. |
iscpinyin | Returns true if it is a Pinyin symbol in GB-2312-80. |
iscalpha | Returns true if it is a Roman alphabetic character in GB-2312-80. |
iscdigit | Returns true if it is a Roman digit in GB-2312-80. |
iscnumber | Returns true if it is a number in GB-2312-80. |
isclower | Returns true if it is a Roman lowercase character in GB-2312-80. |
iscupper | Returns true if it is a Roman uppercase character in GB-2312-80. |
iscblank | Returns true if it is a white space character from GB-2312-80. |
iscspace | Returns true if it is a space character from GB-2312-80. |
iscgen | Returns true if it is a graphic or general symbol in GB-2312-80. |
iscsci | Returns true if it is a scientific symbol in GB-2312-80. |
iscline | Returns true if it is a ruled line symbol in GB-2312-80. |
iscunit | Returns true if it is a unit character in GB-2312-80. |
iscparen | Returns true if it is a right or left parenthesis in GB-2312-80. |
iscpunct | Returns true if it is a punctuation character in GB-2312-80. |
iscgreek | Returns true if it is a Greek character in GB-2312-80. |
iscrussian | Returns true if it is a Russian character in GB-2312-80. |
iscspecial | Returns true if it is a Greek or Russian character in GB-2312-80. |
ischira | Returns true if it is a Japanese Hiragana character in GB-2312-80. |
isckata | Returns true if it is a Japanese Katakana character in GB-2312-80. |
For Simplified Chinese, two additional routines, iscgb and isceuc (Table B-3), test for characters from the GB-2312-80 character set. The iscgb routine expects a wide character, and isceuc expects a GB-2312-80 character in EUC format. For more information, see the cctype(3x) man page.
Table B-3 Simplified Chinese General Character Classification Functions
Routine | Description |
---|---|
iscgb | Returns true if it is in GB-2312-80. |
isceuc | Returns true if it is a GB-2312-80 character in EUC format. |
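To illustrate what testing for "a GB-2312-80 character in EUC format" can involve, the following sketch checks whether two bytes fall in the EUC range used for GB-2312-80 (0xA1 through 0xFE for each byte). The function is_gb_euc_pair is a hypothetical helper, not the Solaris isceuc routine, and it is an assumption that a byte-range check is the relevant test. It does not verify that the position is actually assigned in the standard.

```c
/* Hypothetical helper, not the Solaris isceuc routine: report whether
 * two bytes lie in the EUC range used for GB-2312-80 characters.
 * Each byte of such a character is in 0xA1..0xFE. This is a range
 * check only; unassigned rows inside that range are not rejected. */
int is_gb_euc_pair(unsigned char hi, unsigned char lo)
{
    return hi >= 0xA1 && hi <= 0xFE &&
           lo >= 0xA1 && lo <= 0xFE;
}
```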
Table B-4 Traditional Chinese Character Classification Functions
Utility | Description |
---|---|
ishalpha | Returns true if it is a Roman character in the CNS 11643 character set. |
ishupper | Returns true if it is an uppercase Roman character as defined by the CNS 11643 character set. |
ishlower | Returns true if it is a lowercase Roman character in the CNS 11643 character set. |
ishdigit | Returns true if it is a number in the CNS 11643 character set. |
ishspace | Returns true if it is the space character in the CNS 11643 character set. |
ishpunct | Returns true if it is a punctuation character in the CNS 11643 character set. |
ishparen | Returns true if it is a right or left parenthesis in the CNS 11643 character set. |
ishphontone | Returns true if it is a Mandarin phonetic tone. |
ishradical | Returns true if it is a Chinese character radical. |
ishline | Returns true if it is a ruled line symbol in the CNS 11643 character set. |
ishunit | Returns true if it is a unit character in the CNS 11643 character set. |
ishsci | Returns true if it is a scientific symbol in the CNS 11643 character set. |
ishgen | Returns true if it is a general symbol in the CNS 11643 character set. |
ishgreek | Returns true if it is a Greek character in the CNS 11643 character set. |
This section describes functions for wide character and string input and output, character classification, and conversion functions for the Korean or Chinese character sets. Asian Solaris software implements a wide character library for handling Korean or Chinese character codes according to industry standards.
Routines with Korean or Chinese language-specific dependencies are in their own language-specific library, which is linked with the corresponding C compiler option:
Korean Solaris libkle is linked with -lkle
Simplified Chinese Solaris libcle is linked with -lcle
Traditional Chinese Solaris libhle is linked with -lhle
Refer to the appropriate man pages for more information.
Asian Solaris software defines WC as a constant-width, four-byte code. WC uses the ANSI C data type wchar_t, which Solaris software defines in wchar.h as follows:
typedef long wchar_t;
In Solaris software, long is four bytes.
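Because every WC element has the same width, a wide string can be indexed character by character once it has been converted from its multibyte form. A minimal sketch using the standard mbstowcs function follows; the helper name to_wide is hypothetical, and the conversion interprets the input according to the current LC_CTYPE locale.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string to wide characters.
 * "cap" is the capacity of "out" in wchar_t elements, including room
 * for the terminator. Returns the number of wide characters written
 * (not counting the terminator), or (size_t)-1 on error. */
size_t to_wide(const char *mb, wchar_t *out, size_t cap)
{
    size_t n;

    if (cap == 0)
        return (size_t)-1;
    n = mbstowcs(out, mb, cap - 1);
    if (n == (size_t)-1)
        return n;              /* invalid multibyte sequence */
    out[n] = L'\0';            /* always terminate the result */
    return n;
}
```

After the call, out[i] is the i-th character of the string regardless of how many bytes that character occupied in the multibyte form.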
The conversion functions described in this section remain available, but the standard iconv() function is the preferred interface.
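A minimal sketch of a one-shot conversion with the standard iconv interface is shown here. The wrapper name convert_bytes is hypothetical. The encoding names accepted by iconv_open vary by platform, so any names passed in (for example, a Korean or Chinese EUC name) are assumptions to check against the local iconv documentation.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: convert "inlen" bytes of "in" from encoding
 * "from" to encoding "to", writing at most "outcap" bytes to "out".
 * Returns the number of bytes written, or (size_t)-1 on failure. */
size_t convert_bytes(const char *to, const char *from,
                     const char *in, size_t inlen,
                     char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    char *inp = (char *)in;    /* iconv() takes char **, input is only read */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;     /* conversion pair not supported */

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

The output is not null-terminated; the caller uses the returned byte count.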
Asian Solaris software provides facilities for various conversions, for example:
Characters within a codeset, such as converting uppercase ASCII to lowercase.
Between different conventions for national standard character sets, such as:
Between Combination and Completion code, both KS C 5601-1987 and KS C 5601-1992.
Between GB and EUC.
Between CNS 11643 code and Big5.
Between code formats (such as converting between EUC and WC).
Programs using the general multibyte conversion utilities should include the header files widec.h and wctype.h.
Korean Solaris specific routines (such as iskxxx) are declared in ko/xctype.h.
Simplified Chinese Solaris specific routines (such as iscxxx) are declared in zh/xctype.h.
Traditional Chinese Solaris specific routines (such as ishxxx) are declared in zh_TW/xctype.h.
Programs using general multibyte conversion utilities should include three header files: wctype.h, widec.h, plus one of the following locale-specific files:
ko/xctype.h (Korean)
zh/xctype.h (Simplified Chinese)
zh_TW/xctype.h (Traditional Chinese)
The locale/xctype.h file declares the Korean or Chinese locale-specific routines, which have names of the form:
iskxxxx (Korean)
iscxxxx (Simplified Chinese)
ishxxxx (Traditional Chinese)
As with classification functions described in the previous section, the use of these previously mentioned functions can be controlled by the setlocale function (described elsewhere in this and other chapters).
Locale-specific conversion routines (such as Korean comptopack or Chinese cgbtoeuc) are contained in a locale-specific library:
libkle (Korean)
libcle (Simplified Chinese)
libhle (Traditional Chinese)
This library can be linked during compilation using the C compiler option:
-lkle (Korean)
-lcle (Simplified Chinese)
-lhle (Traditional Chinese)
The multibyte conversion functions are similar to the one-byte conversion functions toupper and tolower. These functions convert wide characters to other wide characters. For more information on conversion routines, see the man pages for wconv(3) for all locales and for:
kconv(3)--Korean
cconv(3)--Simplified Chinese
hconv(3)--Traditional Chinese
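For comparison, the standard wide-character counterparts of toupper and tolower are towupper and towlower from wctype.h. The following sketch uppercases a wide string in place; the helper name upcase_wide is hypothetical, and which characters have case mappings depends on the current LC_CTYPE locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: convert a null-terminated wide string to
 * uppercase in place using the standard towupper(). Characters
 * without an uppercase mapping are left unchanged. */
void upcase_wide(wchar_t *ws)
{
    for (; *ws != L'\0'; ws++)
        *ws = (wchar_t)towupper((wint_t)*ws);
}
```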
The following routines are in the regular Chinese C library:
Table B-5 Simplified Chinese Case Conversion Functions (declared in zh/xctype.h)
Function | Description |
---|---|
tocupper | Converts codeset 1 Roman lowercase to uppercase |
toclower | Converts codeset 1 Roman uppercase to lowercase |
Table B-6 Traditional Chinese Case Conversion Functions (declared in zh_TW/xctype.h)
Function | Description |
---|---|
tohupper | Converts codeset 1 Roman lowercase to uppercase |
tohlower | Converts codeset 1 Roman uppercase to lowercase |
In the Simplified Chinese character sets, the Roman characters and numbers in codeset 0 are repeated in codeset 1. The following functions convert wide characters between the two codesets.
Table B-7 Simplified Chinese Codeset Conversion Functions
Function | Description |
---|---|
atocgb | Converts alphabetic or numeric characters in ASCII (codeset 0) to the corresponding characters in GB-2312-80 (codeset 1). |
cgbtoa | Converts alphabetic or numeric characters in GB-2312-80 (codeset 1) to the corresponding characters in ASCII (codeset 0). |
For further information on these functions, see the cconv(3x) man page.
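The relationship between the two codesets can be sketched at the byte level. In GB-2312-80, row 3 holds the counterparts of the printable ASCII characters in the same order, so in EUC format the counterpart of an ASCII letter or digit c is the byte pair 0xA3 followed by c with the high bit set. The helpers below are hypothetical and operate on EUC bytes; atocgb and cgbtoa themselves operate on wide characters, whose internal layout is not described here.

```c
#include <ctype.h>

/* Hypothetical helper: map an ASCII letter or digit to the EUC byte
 * pair of its GB-2312-80 (row 3) counterpart.
 * Returns 1 on success, 0 if c is not an ASCII letter or digit. */
int ascii_to_gb_euc(unsigned char c, unsigned char out[2])
{
    if (c > 0x7F || !isalnum(c))
        return 0;
    out[0] = 0xA3;                       /* row 3 in EUC form */
    out[1] = (unsigned char)(c | 0x80);  /* same column, high bit set */
    return 1;
}

/* Hypothetical helper: the reverse mapping. Returns the ASCII
 * character, or -1 if the pair is not a row 3 letter or digit. */
int gb_euc_to_ascii(const unsigned char in[2])
{
    unsigned char c;

    if (in[0] != 0xA3 || (in[1] & 0x80) == 0)
        return -1;
    c = (unsigned char)(in[1] & 0x7F);
    return isalnum(c) ? (int)c : -1;
}
```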
The following routines perform character-based code conversion on the KS C 5601 character set. They convert characters between Completion code (or EUC format) and Combination code (or Packed code).
To use these routines, the library libkle must be linked using the C compiler option -lkle. For more information, see the kconv(3x) man page.
Table B-8 Korean Code Conversion Functions
Routine | Description |
---|---|
comptopack | Converts a character in Completion code to Combination (Packed) code of KS C 5601-1987. |
packtocomp | Converts a character in Combination (Packed) code of KS C 5601-1987 to Completion code. |
wansuntojohap | Converts a character in Completion code to Combination (Packed) code of KS C 5601-1992. |
packtocomp | Converts a character in Combination (Packed) code of KS C 5601-1992 to Completion code. |
The following routines do character-based code conversion on the GB-2312-80 character set. They convert characters and strings between EUC format and GB-2312-80 format. To use these routines, the library libcle must be linked using the C compiler option -lcle. For further information, see the cconv(3x) man page.
Table B-9 Simplified Chinese Character-Based Functions
Function | Description |
---|---|
cgbtoeuc | Converts a character in GB-2312-80 format (7 bit) to EUC format |
scgbtoeuc | Converts a string in GB-2312-80 format (7 bit) to EUC format |
sncgbtoeuc | Converts part of a string in GB-2312-80 format (7 bit) to EUC format |
euctocgb | Converts a character in EUC format to GB-2312-80 format (7 bit) |
seuctocgb | Converts a string in EUC format to GB-2312-80 format (7 bit) |
sneuctocgb | Converts part of a string in EUC format to GB-2312-80 format (7 bit) |
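The relationship between the 7-bit GB-2312-80 form and the EUC form is simple: the EUC form sets the high bit of each of the two bytes. The following sketch shows that arithmetic on a two-byte code held in an integer. The function names are hypothetical and the signatures are not those of cgbtoeuc or euctocgb; see cconv(3x) for the real interfaces.

```c
/* Hypothetical helpers illustrating the 7-bit GB-2312-80 <-> EUC
 * relationship on a two-byte code packed as (first << 8) | second. */

/* 7-bit GB-2312-80 code to EUC: set the high bit of both bytes. */
unsigned int gb7_to_euc(unsigned int gb)
{
    return gb | 0x8080u;
}

/* EUC code to 7-bit GB-2312-80: clear the high bit of both bytes. */
unsigned int euc_to_gb7(unsigned int euc)
{
    return euc & 0x7F7Fu;
}
```

For example, the 7-bit code 0x3021 (row 16, column 1) corresponds to the EUC byte pair 0xB0 0xA1.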
The following routines perform character-based code conversion on the CNS-11643 character set. They convert CNS-11643 characters between CNS-11643, EUC, and Big5 formats. To use these routines, the library libhle must be linked using the C compiler option -lhle. For more information, see the hconv(3x) man page.
Table B-10 Traditional Chinese Character-Based Functions
Function | Description |
---|---|
cbig5toeuc | Converts Big5 character to EUC character. |
ccnstoeuc | Converts CNS character to EUC character. |
ceuctobig5 | Converts EUC character to Big5 character. |
ceuctocns | Converts EUC character to CNS character. |
Table B-11 Traditional Chinese String-Based Functions
Function | Description |
---|---|
big5toeuc | Converts Big5 string to EUC string. |
cnstoeuc | Converts CNS string to EUC string. |
euctobig5 | Converts EUC string to Big5 string. |
euctocns | Converts EUC string to CNS string. |