JFP Developer's Guide

Converting Between Multibyte and Wide Characters

Table 4-1 lists the main APIs for converting between multibyte and wide characters. See also the printf(3S) and scanf(3S) manual pages.

Table 4-1 Multibyte/Wide Character Conversion APIs


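`Interface`	`Description`
`mbtowc(pwc,s,n)`	Examines at most `n` bytes starting at `s`, converts one multibyte character to its wide character representation, and stores it in `pwc`
`mbstowcs(pwcs,s,n)`	Converts the multibyte string starting at `s` to a wide character string and stores it in `pwcs`. Stops after converting at most `n` wide characters
`wctomb(s,wc)`	Converts the wide character `wc` to its multibyte representation and stores it in `s`
`wcstombs(s,pwcs,n)`	Converts the wide character string starting at `pwcs` to its multibyte representation and stores it in `s`. Stops once the converted output reaches the limit of `n` bytes
`mblen(s,n)`	Examines at most `n` bytes starting at `s` and returns the number of bytes that make up one multibyte character
`fgetwc(stream)`	Reads one multibyte character from the input stream `stream` and returns it as a wide character
`ungetwc(wc, stream)`	Pushes the wide character `wc` back onto `stream`
`fgetws(ws,n,stream)`	Reads a multibyte string from the input stream `stream` and stores at most `n-1` wide characters in `ws`
`fputwc(wc,stream)`	Writes the wide character `wc` to the output stream `stream`
`fputws(ws,stream)`	Writes the wide character string `ws` to the output stream `stream`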
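
The in-memory conversion functions in the table can be combined as in the following minimal sketch. It is an illustration only: the function names `count_mb_chars()` and `roundtrip()` and the 256-element working buffer are assumptions made for this example, not part of the API. The first function steps through a string one character at a time with mblen(); the second converts a string to wide characters with mbstowcs() and back with wcstombs(), treating a result that fills the buffer completely as a failure because no terminating null would have been stored.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define WORK_CHARS 256          /* hypothetical working buffer size */

/*
 * Count the characters in a multibyte string by stepping through it
 * with mblen(). Returns (size_t)-1 if an invalid sequence is found.
 */
size_t
count_mb_chars(const char *s)
{
        size_t count = 0;
        size_t left = strlen(s);

        (void) mblen(NULL, 0);  /* reset any shift state */
        while (left > 0) {
                int len = mblen(s, left);

                if (len <= 0)   /* invalid or incomplete sequence */
                        return ((size_t)-1);
                s += len;
                left -= (size_t)len;
                count++;
        }
        return (count);
}

/*
 * Convert a multibyte string to wide characters and back again.
 * out receives the result; outsize is its size in bytes.
 * Returns 0 on success, -1 on a conversion error or a full buffer.
 */
int
roundtrip(const char *in, char *out, size_t outsize)
{
        wchar_t wcs[WORK_CHARS];
        size_t n;

        n = mbstowcs(wcs, in, WORK_CHARS);
        if (n == (size_t)-1 || n == WORK_CHARS)
                return (-1);    /* error, or no room for the null */
        n = wcstombs(out, wcs, outsize);
        if (n == (size_t)-1 || n == outsize)
                return (-1);    /* error, or no room for the null */
        return (0);
}
```

A caller would normally run setlocale(LC_ALL, "") first so that the functions interpret the bytes according to the current locale; in the default "C" locale only single-byte characters are guaranteed to convert.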

Program Example

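Example 4-1 is a sample program that applies these APIs to a file to convert between multibyte and wide characters. When you use these APIs, you must include the appropriate header files (stdlib.h for mbtowc(), mbstowcs(), wctomb(), wcstombs(), and mblen(); wchar.h for fgetwc(), ungetwc(), fgetws(), fputwc(), and fputws()) and call setlocale() at the very start of processing to set the program's locale appropriately.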
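
The two requirements above (the right headers, and an early setlocale() call) can be sketched for the stream functions as follows. This is a hedged illustration: the function name `wide_stream_roundtrip()` and the use of tmpfile() as a scratch stream are assumptions made for the example. If setlocale() fails, the sketch simply continues in the locale that is already in effect.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>      /* fgetws(), fputws() */

/*
 * Write a wide string to a temporary file with fputws(), then read
 * the first line back with fgetws() into line (n wide characters).
 * Returns 0 on success, -1 on failure.
 */
int
wide_stream_roundtrip(const wchar_t *text, wchar_t *line, int n)
{
        FILE *fp;
        int ok;

        /* Set the locale from the environment before any conversion. */
        if (setlocale(LC_ALL, "") == NULL)
                fprintf(stderr, "setlocale() failed; locale unchanged\n");

        fp = tmpfile();
        if (fp == NULL)
                return (-1);

        /* Wide characters are converted to multibyte on output ... */
        ok = (fputws(text, fp) >= 0);
        rewind(fp);
        /* ... and back to wide characters on input. */
        ok = ok && (fgetws(line, n, fp) != NULL);

        fclose(fp);
        return (ok ? 0 : -1);
}
```

For instance, calling `wide_stream_roundtrip(L"abc\n", line, 32)` with a 32-element `line` buffer should leave the same wide string in `line`, because fgetws() stops after storing the newline.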

Example 4-1 Converting Between Multibyte and Wide Characters

sun% cat my_mbwc.c

/*
 * Read lines from stdin and count the
 * characters that belong to specific
 * categories. Counting stops when the
 * input reaches EOF. Each line is assumed
 * to be at most BUFSIZ - 1 bytes long.
 *
 * iswctype() is used to categorize each
 * character, so the multibyte input
 * buffer must first be converted to a
 * wide character buffer. mbstowcs() is
 * called for that purpose.
 */

#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

static char mbbuf[BUFSIZ];
static wchar_t wcbuf[BUFSIZ];

int
main(int argc, char *argv[])
{
        size_t retval;
        size_t i;
        int alpha_char, ideo_char, kana_char, other_char;
        
        setlocale(LC_ALL, "");
        alpha_char = ideo_char = kana_char = other_char = 0;
        while(fgets(mbbuf, BUFSIZ, stdin) != NULL) {
                retval = mbstowcs(wcbuf, mbbuf, BUFSIZ);
                if (retval == (size_t)-1) {
                        fprintf(stderr, "Invalid char is found during mbstowcs()\n");
                        exit(EXIT_FAILURE);
                }
                retval = wcslen(wcbuf);
                for (i = 0; i < retval; i++) {
                        /* ideo_char is never incremented in this listing */
                        if (iswctype(wcbuf[i], wctype("jisx0201r"))) {
                                kana_char++;
                        } else if (iswctype(wcbuf[i], wctype("alpha"))) {
                                alpha_char++;
                        } else {
                                other_char++;
                        }
                }
        }
        fprintf(stdout, "The input consist of\n");
        fprintf(stdout, "%d Alphabetical chars,\n", alpha_char);
        fprintf(stdout, "%d JIS X 0208 Kanji chars,\n", ideo_char);
        fprintf(stdout, "%d JIS X 0201 Kana chars and\n", kana_char);
        fprintf(stdout, "%d other chars.\n", other_char);
        return(0);
}
sun% cc -o my_mbwc my_mbwc.c
sun% cat file6
/* Here's the content of file3 */
新しいシステム*は現在のネットワーク*環境を変えることなく
インターネット*とのシームレス*な接続を可能にします。また
セキュリティ*の問題も新しい認証テクノロジー*を用いることで
アドミニストレータ*の負担を減らしています。
/* Here's the content of file5 */
ひらがなはかたかなに置換されます。
カタカナハヒラガナニ置換サレマス。
漢字、記号、全角ａｌｐｈａｂｅｔや
JIS X 0201 カナナドハ* 置換 サレマセン*。
sun% ./my_mbwc < file6
The input consist of
54 Alphabetical chars,
31 JIS X 0208 Kanji chars,
56 JIS X 0201 Kana chars and
117 other chars.

Note -

The katakana in the parts marked with * is half-width katakana.