iconv_ja - man pages section 7: Standards, Environments, Macros, Character Sets, and Miscellany


iconv_ja (7)

Name

iconv_ja - codeset conversions for Japanese encodings

Description

The iconv and cconv interfaces support conversions to and from a wide range of codesets.

The table below provides basic information about the supported Japanese codesets. For information on other codesets, refer to iconv_unicode(7), iconv_extra(7), iconv_ko(7), iconv_zh(7), iconv_zh_TW(7), and iconv_zh_HK(7).

The following table describes the Japanese codeset names used by iconv and cconv. Aliases appear in parentheses where applicable:

Codeset	Description
`eucJP (EUC-JP)` `eucJP-S11 (EUC-JP-S11)`	Japanese EUC. See `eucJP`(7). It is Windows-compatible in conversions to and from Unicode. `eucJP-S11` is a variant of `eucJP` that is compatible with `eucJP` in Solaris 11 and earlier releases.
`PCK (Shift_JIS)` `PCK-S11`	PC Kanji. See `PCK`(7). It is Windows-compatible in conversions to and from Unicode. `PCK-S11` is a variant of `PCK` that is compatible with `PCK` in Solaris 11 and earlier releases.
`ISO-2022-JP (JIS7)`	Code representation of the character sets `ISO 646 IRV` or `JIS X 0201` (both Roman and Katakana), `JIS X 0208`, and `JIS X 0212`. It uses the designation sequences to `G0` specified by `ISO/IEC 2022` and the `UI-OSF` Application Platform Profile for Japanese Environment Version 1.1.
`ISO-2022-JP.RFC1468`	Code representation of the character sets `ISO 646 IRV` or `JIS X 0201` (Roman only) and `JIS X 0208`. It uses the designation sequences to `G0` specified by `RFC 1468`.
`JIS`	JIS 7-bit code used in `JLE`, `JFP 2.4`, and earlier releases.
`IBM-930`, `IBM-931`, `IBM-939`, `IBM-5026`, `IBM-5035`	IBM codesets based on EBCDIC. Each name is the IBM CCSID prefixed with `"IBM-"`. For example, `"IBM-930"` represents the `IBM CCSID 930` codeset.
`IBMJ`, `IBMJ-EBCDIK`	IBM codesets based on EBCDIC. `IBMJ` and `IBMJ-EBCDIK` differ in their `SBCS` (Single-Byte Character Set). The `SBCS` of `"IBMJ"` is `RFC 1345 IBM038` (EBCDIC-INT). The `SBCS` of `"IBMJ-EBCDIK"` is `RFC 1345 IBM290` (EBCDIC-JP-kana), also known as EBCDIK, which contains Japanese half-width Katakana but no lowercase alphabetic characters. The `DBCS` (Double-Byte Character Set) of both codesets is IBM code page (CPGID) 300.
`FujitsuJEF-ascii-code`, `FujitsuJEF-ascii-face`, `FujitsuJEF-kana-code`, `FujitsuJEF-kana-face`	Fujitsu `JEF` code. The four variants differ in how they convert the `SBCS` and the characters that are mapped differently between `JIS C 6226` and `JIS X 0208`. The `"-ascii"` variants use EBCDIC (ASCII) for the `SBCS`, and the `"-kana"` variants use EBCDIC (Kana) for the `SBCS`. The `"-code"` variants convert those `JIS C 6226` characters by code value, while the `"-face"` variants convert them by character face.
`HitachiKEIS83`, `HitachiKEIS90`	Hitachi `KEIS83` and `KEIS90`. In the Solaris `iconv` implementation, the `SBCS` of these codesets is equivalent to IBM code page 290, Japanese (Katakana) Extended.
`NECJIPS`	`NEC JIPS(J)`. In the Solaris `iconv` implementation, the `SBCS` of this codeset is equivalent to IBM code page 290, Japanese (Katakana) Extended.
`EUC-JIS-2004`	`eucJP` extended to support `JIS X 0213`. Unlike `eucJP`, it does not contain `JIS X 0212`.
`Shift_JIS-2004`	`PCK` extended to support `JIS X 0213`. All characters in `PCK` are contained in `Shift_JIS-2004`.
`ISO-2022-JP.2004`	`ISO-2022-JP` extended to support `JIS X 0213`. Two designators are added to designate `JIS X 0213` characters.
`UTF-8-CP932`	UTF-8-encoded Unicode converted from `CP932`.
`UTF-8-Java`	UTF-8-encoded Unicode, Java implementation. User-defined and vendor-defined characters are not mapped in this codeset; they are replaced with the substitute character during conversion. See Notes.

To list the iconv and cconv conversions available on the current system, run `iconv -l`, as described in the iconv(1) manual page.

For more information on the mappings between canonical names and supported aliases, with optional variant levels, refer to the alias(5) manual page and the /usr/lib/iconv/alias file.

Files

/usr/lib/iconv/*.so: iconv conversion modules
/usr/lib/iconv/*.bt: cconv code conversion binary tables for iconv(1), cconv(3C), and iconv(3C)
/usr/lib/iconv/geniconvtbl/binarytables/*.bt: geniconvtbl conversion binary tables
/usr/lib/iconv/alias: alias table file of codeset names

Notes

User-defined characters are mapped sequentially to the corresponding values in the target codeset. When the target codeset is a Unicode encoding such as UTF-8, they are mapped to values in the Private Use Area (U+E000 to U+F8FF). When the target codeset has no user-defined characters, they are mapped to the substitute character. When the source codeset has a larger user-defined character area than the target codeset, the characters that overflow the target area are mapped to the substitute character.

Vendor-defined characters are mapped to the corresponding code values in the target codeset. If the target codeset does not have a given value, the character is replaced with the substitute character.

Some codesets contain vendor-defined characters that duplicate other characters. Vendor-defined characters that duplicate standard characters, such as those of JIS X 0208, are mapped to those standard characters. In the PCK and CP932 codesets, some characters are duplicated between the NEC special characters and the IBM extended characters; these are mapped to the NEC special characters.

The substitute character depends on the target codeset. When the target codeset is a Unicode encoding, it is the encoded form of the Unicode replacement character (U+FFFD). When the target codeset is ASCII-compatible or EBCDIC-compatible, it is the question mark '?'.
