Wide-Character Classification Functions

The following functions classify wide characters. Each function returns a nonzero value for true and 0 for false. The functions test the given wide character against named character classes, such as alpha, lower, or jkana, which are defined in the LC_CTYPE category of the current locale. The functions are therefore locale sensitive.

iswalpha(): Test for an alphabetic wide character
iswalnum(): Test for an alphanumeric wide character
iswascii(): Test whether a wide character represents a 7-bit US-ASCII character
iswblank(): Test for a blank wide character
iswcntrl(): Test for a control wide character
iswdigit(): Test for a decimal digit wide character
iswgraph(): Test for a visible wide character
iswlower(): Test for a lowercase letter wide character
iswprint(): Test for a printable wide character
iswpunct(): Test for a punctuation wide character
iswspace(): Test for a white-space wide character
iswupper(): Test for an uppercase letter wide character
iswxdigit(): Test for a hexadecimal digit wide character
isenglish(): Test for a wide character representing an English language character, excluding US-ASCII characters
isideogram(): Test for a wide character representing an ideographic language character, excluding US-ASCII characters
isnumber(): Test for a wide character representing a digit, excluding US-ASCII characters
isphonogram(): Test for a wide character representing a phonetic language character, excluding US-ASCII characters
isspecial(): Test for a wide character representing a special language character, excluding US-ASCII characters

The following character classes are defined in all the locales:

alnum
alpha
blank
cntrl
digit
graph
lower
print
punct
space
upper
xdigit

The isenglish(), isideogram(), isnumber(), isphonogram(), and isspecial() functions are legacy wide-character classification functions that are specific to Oracle Solaris. The character classes for these functions are defined only in the following Asian locales and their variants: ko_KR.EUC, zh_CN.EUC, zh_CN.GBK, zh_CN.GB18030, zh_HK.BIG5HK, zh_TW.BIG5, and zh_TW.EUC. In all other locales, including Unicode locales, these functions always return false.

You can query for a specific character class in a generic way by using the following functions:

wctype(): Obtain a descriptor for a named character class
iswctype(): Test whether a wide character belongs to the class identified by a descriptor

Example 2-11 Querying Character Class of a Wide Character

In the following example, the iswctype() and wctype() functions are used to check whether the given Unicode character belongs to the jhira character class. The jhira character class contains the characters of the Japanese Hiragana script.

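  #include <locale.h>
  #include <stdlib.h>
  #include <wchar.h>
  #include <wctype.h>

  wchar_t wc;     /* mbtowc() stores its result in a wchar_t */
  int     ret;

  setlocale(LC_ALL, "ja_JP.UTF-8");

  /* "\xe3\x81\xba" is UTF-8 for HIRAGANA LETTER PE */
  ret = mbtowc(&wc, "\xe3\x81\xba", 3);
  if (ret == -1) {
          /* Invalid character sequence. */
          :
  }

  if (iswctype((wint_t)wc, wctype("jhira"))) {
          /* %lc prints a wide character; %c expects a narrow one. */
          wprintf(L"'%lc' is a hiragana character.\n", (wint_t)wc);
  }

The example will produce the following output:

'ぺ' is a hiragana character.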