JFP Developer's Guide

Japanese Character Classification

Tables 3-3 and 3-4 list the character classification APIs that are useful for handling the character sets of the Japanese locales. Other APIs are also available. For details, see manual pages such as iswalpha(3C), wctype(3C), and wctype_ja(3C).

Table 3-3 Character Classification APIs, Part 1


Interface name defined by XPG	Function
`iswctype(wc, type)`	Tests whether `wc` belongs to the class `type`
`wctype("type name")`	Creates the second argument of `iswctype()` from a type name. XPG defines the following standard character classes:
	`"alnum"` `"alpha"` `"cntrl"` `"digit"` `"graph"` `"lower"` `"print"` `"punct"` `"space"` `"upper"` `"xdigit"` `"blank"`
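
As an illustration of how the two interfaces in Table 3-3 fit together, the following minimal sketch obtains a class descriptor once with `wctype()` and then tests each character with `iswctype()`. The helper name `count_in_class()` is hypothetical and is not part of any library.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Count the wide characters in s that belong to the class named
 * class_name (for example "alpha" or "digit").
 * Returns -1 if the class name is not valid in the current locale.
 * count_in_class() is a hypothetical helper, not a library function.
 */
long
count_in_class(const wchar_t *s, const char *class_name)
{
	wctype_t desc = wctype(class_name);	/* 0 means unknown class */
	long count = 0;

	if (desc == (wctype_t)0)
		return (-1);
	for (; *s != L'\0'; s++) {
		if (iswctype((wint_t)*s, desc))
			count++;
	}
	return (count);
}
```

Checking the descriptor for 0 before the loop matters because `wctype()` returns 0 for a name that the current locale does not define, and `iswctype()` with such a descriptor never matches.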

Table 3-4 Character Classification APIs, Part 2


Interface name extended in Japanese Solaris	Function
`wctype("type name")`	Creates the second argument of `iswctype()` from a type name. Solaris adds the following character classes for the Japanese locales:
	`"jkanji"` `"jkata"` `"jhira"` `"jdigit"` `"jparen"` `"jline"` `"jisx0201r"` `"jisx0208"` `"jisx0212"` `"udc"` `"vdc"` `"jalpha"` `"jspecial"` `"jgreek"` `"jrussian"` `"junit"` `"jsci"` `"jgen"` `"jpunct"`
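
The Japanese class names in Table 3-4 are used in the same way as the standard ones. The sketch below tries a few of them in turn and reports the first class that a character belongs to. It assumes that the names are only available when a Japanese locale that defines them is in effect, so it skips any name for which `wctype()` returns 0. The helper name `classify_ja()` is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Return the name of the first Japanese class (from a short list taken
 * from Table 3-4) that wc belongs to, or NULL if none matches.
 * classify_ja() is a hypothetical helper, not a Solaris interface.
 */
const char *
classify_ja(wchar_t wc)
{
	static const char *const names[] = {
		"jhira", "jkata", "jkanji", "jdigit"
	};
	size_t i;

	for (i = 0; i < sizeof (names) / sizeof (names[0]); i++) {
		wctype_t desc = wctype(names[i]);

		if (desc == (wctype_t)0)
			continue;	/* class unknown in this locale */
		if (iswctype((wint_t)wc, desc))
			return (names[i]);
	}
	return (NULL);
}
```

The caller is expected to have called `setlocale()` beforehand. In a locale that does not define these classes, every lookup is skipped and the function returns NULL.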

Sample Program

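This section presents a sample program that uses the Japanese character classification APIs. As with the character mapping APIs described earlier, the program must include the wchar.h header file and must call setlocale() early in its processing to set the operating locale appropriately.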
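
The locale setup described above can be sketched as follows. This is a minimal illustration, not the guide's own code: the helper name `init_locale()` is hypothetical, and the only library call it relies on is the standard `setlocale()`, which returns NULL when the requested locale cannot be selected.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Select the locale before any wide-character call is made.
 * name is passed to setlocale(); "" means "use the environment".
 * Returns 0 on success, -1 if the locale could not be set.
 * init_locale() is a hypothetical helper name.
 */
int
init_locale(const char *name)
{
	if (setlocale(LC_ALL, name) == NULL)
		return (-1);
	return (0);
}
```

A program would typically call `init_locale("")` as the first statement of `main()` and stop with an error message if it returns -1, because the Japanese class names are unlikely to be recognized in an unintended locale.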

Example 3-3 reads the input file as wide-character strings, swaps the JIS X 0208 hiragana and katakana in the input, converts JIS X 0208 digit characters to ASCII, and writes the result.

Example 3-3 Character Classification

sun% cat my_charconv.c
/*
 * Read lines from a file and convert JIS X 0208 hiragana
 * characters to JIS X 0208 katakana characters, and
 * vice versa. In addition, JIS X 0208 digit characters
 * are converted to the corresponding JIS X 0201
 * characters.
 * Conversion stops when the input reaches EOF.
 * It is assumed that each line is at most BUFSIZ - 1
 * wide characters long.
 *
 * The actual conversion is done by my_charconv(), which does
 * the following.
 *	1.	Get the length of the wide string.
 *	2.	Classify each wide char from the top of the
 *		string with iswctype() and apply the matching
 *		towctrans() mapping.
 *		(The return value of towctrans() is the same
 *		character if there is no corresponding char.)
 *	3.	Write the corresponding wide char back to the
 *		original string. The caller outputs it.
 */

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>
#include <wctype.h>
#include <jctype.h>
#include <errno.h>

#define	WRET	L'\n'

static int my_charconv(wchar_t *);

int
main(int argc, char *argv[])
{
	wchar_t buf[BUFSIZ];
	int retval;

	setlocale(LC_ALL, "");

	while (fgetws(buf, BUFSIZ, stdin) != NULL) {
		retval = my_charconv(buf);
		if (retval == -1) {
			perror("my_charconv()");
			exit(-1);
		}
		fprintf(stdout, "%S", buf);
	}

	return (0);
}

static int
my_charconv(wchar_t *wcp)
{
	size_t wstr_len;
	size_t index;
	wint_t retval;

	wstr_len = wcslen(wcp);
	for (index = 0; index < wstr_len; index++) {
		errno = 0;
		if (iswctype((wint_t)wcp[index], wctype("jhira")))
			retval = towctrans((wint_t)wcp[index], wctrans("tojkata"));
		else if (iswctype((wint_t)wcp[index], wctype("jkata")))
			retval = towctrans((wint_t)wcp[index], wctrans("tojhira"));
		else if (iswctype((wint_t)wcp[index], wctype("jdigit")))
			retval = towctrans((wint_t)wcp[index], wctrans("tojisx0201"));
		else
			retval = wcp[index];

		if (errno != 0)
			return (-1);
		wcp[index] = (wchar_t)retval;
	}

	return (0);
}
sun% cat file5
ひらがなはかたかなに置換されます。
カタカナハヒラガナニ置換サレマス。
漢字、記号、全角ａｌｐｈａｂｅｔや
JIS X 0201 カナナドハ* 置換 サレマセン*。
sun% cc -o my_charconv my_charconv.c
sun% ./my_charconv < file5
ヒラガナハカタカナニ置換サレマス。
かたかなはひらがなに置換されます。
漢字、記号、全角ａｌｐｈａｂｅｔヤ
JIS X 0201 カナナドハ* 置換 サレマセン*。

Note -

The katakana in the portions marked with * are half-width katakana.
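
The conversion loop in my_charconv() calls `wctype()` and `wctrans()` again for every character. One possible variation, sketched below under stated assumptions, looks up each descriptor once and then applies the first matching rule to each character. The names `struct conv_rule`, `resolve_rules()`, and `apply_rules()` are hypothetical, and a rule whose class or mapping is unknown in the current locale is simply skipped.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* One conversion rule: characters in class cls are mapped with map. */
struct conv_rule {
	const char *cls;	/* class name for wctype() */
	const char *map;	/* mapping name for wctrans() */
	wctype_t type;		/* filled in by resolve_rules() */
	wctrans_t trans;	/* filled in by resolve_rules() */
};

/*
 * Look up every descriptor once. Returns the number of rules whose
 * class and mapping both exist in the current locale.
 */
size_t
resolve_rules(struct conv_rule *rules, size_t n)
{
	size_t i, usable = 0;

	for (i = 0; i < n; i++) {
		rules[i].type = wctype(rules[i].cls);
		rules[i].trans = wctrans(rules[i].map);
		if (rules[i].type != (wctype_t)0 &&
		    rules[i].trans != (wctrans_t)0)
			usable++;
	}
	return (usable);
}

/* Apply the first matching usable rule to each character of s, in place. */
void
apply_rules(wchar_t *s, const struct conv_rule *rules, size_t n)
{
	size_t i;

	for (; *s != L'\0'; s++) {
		for (i = 0; i < n; i++) {
			if (rules[i].type == (wctype_t)0 ||
			    rules[i].trans == (wctrans_t)0)
				continue;	/* unavailable in this locale */
			if (iswctype((wint_t)*s, rules[i].type)) {
				*s = (wchar_t)towctrans((wint_t)*s,
				    rules[i].trans);
				break;
			}
		}
	}
}
```

To mirror Example 3-3, the rule table would hold the pairs `"jhira"`/`"tojkata"`, `"jkata"`/`"tojhira"`, and `"jdigit"`/`"tojisx0201"`, resolved once after `setlocale()` and reused for every input line. Because only the first matching rule is applied, a character converted from hiragana to katakana is not converted back in the same pass.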