man pages section 7: Standards, Environments, Macros, Character Sets, and Miscellany

Updated: Wednesday, July 27, 2022
 
 

iconv_ja (7)

Name

iconv_ja - codeset conversions for Japanese encodings

Description

Iconv and cconv support conversions to and from a wide range of codesets.

The list below provides basic information about the supported Japanese codesets. For information on other codesets, refer to iconv_unicode(7), iconv_extra(7), iconv_ko(7), iconv_zh(7), iconv_zh_TW(7), and iconv_zh_HK(7).

The following are descriptions of the Japanese codeset names used by iconv and cconv, with aliases in parentheses where applicable:

Code Sets
Description
eucJP (EUC-JP)
eucJP-S11 (EUC-JP-S11)
Japanese EUC. See eucJP(7). Its conversions to and from Unicode are Windows-compatible. eucJP-S11 is a variant of eucJP that is compatible with eucJP in Solaris 11 and earlier releases.
PCK (Shift_JIS)
PCK-S11
PC Kanji. See PCK(7). Its conversions to and from Unicode are Windows-compatible. PCK-S11 is a variant of PCK that is compatible with PCK in Solaris 11 and earlier releases.
ISO-2022-JP (JIS7)
Code representation of the character sets ISO 646 IRV or JIS X 0201 (both Roman and Katakana), JIS X 0208, and JIS X 0212, using the designation sequences to G0 specified by ISO/IEC 2022 and the UI-OSF Application Platform Profile for Japanese Environment Version 1.1.
ISO-2022-JP.RFC1468
Code representation of the character sets ISO 646 IRV or JIS X 0201 (Roman only) and JIS X 0208 using the designation sequences to G0 specified by RFC 1468.
JIS
JIS 7-bit code used in JLE, JFP 2.4, and the preceding releases.
IBM-930, IBM-931, IBM-939, IBM-5026, IBM-5035
IBM codesets based on EBCDIC. IBM CCSID is prefixed with "IBM-". For example, "IBM-930" represents the IBM CCSID 930 codeset.
IBMJ, IBMJ-EBCDIK
IBM codesets based on EBCDIC. IBMJ and IBMJ-EBCDIK differ in their SBCS (Single-Byte Character Set). The SBCS of "IBMJ" is RFC 1345 IBM038 (EBCDIC-INT). The SBCS of "IBMJ-EBCDIK" is RFC 1345 IBM290 (EBCDIC-JP-kana), also known as EBCDIK, which contains Japanese half-width Katakana but no lowercase alphabetic characters. The DBCS for both codesets is IBM code page (CPGID) 300.
FujitsuJEF-ascii-code, FujitsuJEF-ascii-face, FujitsuJEF-kana-code, FujitsuJEF-kana-face
Fujitsu JEF code. The four variants differ in how they convert SBCS characters and the characters that are mapped differently between JIS C 6226 and JIS X 0208. The "-ascii" variants use EBCDIC (ASCII) for SBCS, and the "-kana" variants use EBCDIC (Kana). The "-code" variants convert JIS C 6226 characters by code value, while the "-face" variants convert them by character face.
HitachiKEIS83, HitachiKEIS90
Hitachi KEIS83 and KEIS90. In the Solaris iconv implementation, the SBCS of these codesets is equivalent to IBM code page 290, Japanese (Katakana) Extended.
NECJIPS
NEC JIPS(J). In the Solaris iconv implementation, the SBCS of this codeset is equivalent to IBM code page 290, Japanese (Katakana) Extended.
EUC-JIS-2004
Extended eucJP codeset to support JIS X 0213. It does not contain JIS X 0212, although eucJP does.
Shift_JIS-2004
Extended PCK codeset to support JIS X 0213. All characters in PCK are contained in Shift_JIS-2004.
ISO-2022-JP.2004
Extended ISO-2022-JP to support JIS X 0213. Two designators are added to designate JIS X 0213 characters.
UTF-8-CP932
UTF-8 encoded Unicode which was converted from CP932.
UTF-8-Java
UTF-8 encoded Unicode, Java implementation. The user-defined characters and vendor-defined characters are not mapped in this codeset. They will be replaced with the substitute character when converting. See NOTES.
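
The difference between the EUC, PC Kanji, and ISO 2022 representations described above can be seen by encoding one string three ways. The following is only an illustrative sketch: it uses Python's built-in codecs (euc_jp, shift_jis, iso2022_jp) as rough counterparts of eucJP, PCK, and ISO-2022-JP.RFC1468. These are not the Solaris iconv modules, and their mapping tables may differ in detail.

```python
# Illustration with Python's built-in codecs, NOT the Solaris iconv modules.
# The codec names below are Python's; treating them as counterparts of
# eucJP, PCK, and ISO-2022-JP.RFC1468 is an assumption of this sketch.
text = "abc\u65e5\u672c\u8a9e"  # "abc" followed by three kanji

euc = text.encode("euc_jp")       # 8-bit EUC representation
sjis = text.encode("shift_jis")   # PC Kanji (Shift_JIS) representation
jis = text.encode("iso2022_jp")   # 7-bit, uses designation sequences

for name, data in (("euc_jp", euc), ("shift_jis", sjis), ("iso2022_jp", jis)):
    print(f"{name:<10} {data.hex()}")

# In the 7-bit form, ESC $ B designates JIS X 0208 to G0 before the kanji,
# and ESC ( B designates ASCII again afterwards.
ESC_JISX0208 = b"\x1b$B"
ESC_ASCII = b"\x1b(B"
print(ESC_JISX0208 in jis, jis.endswith(ESC_ASCII))
```

All three byte strings decode back to the same text. Only the 7-bit form carries escape sequences, which is why it is stateful.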

Available iconv and cconv conversions in the current system can be obtained by running 'iconv -l' as described in the iconv(1) manual page.

For additional information on the mappings between canonical names and supported aliases with optional variant levels, refer to the alias(5) manual page and the /usr/lib/iconv/alias file.

Files

/usr/lib/iconv/*.so

iconv conversion modules

/usr/lib/iconv/*.bt

cconv code conversion binary tables for iconv(1), cconv(3C), and iconv(3C)

/usr/lib/iconv/geniconvtbl/binarytables/*.bt

geniconvtbl conversion binary tables

/usr/lib/iconv/alias

alias table file of codeset names

See Also

geniconvtbl(1), iconv(1), cconv(3C), cconv_close(3C), cconv_open(3C), cconvctl(3C), iconv(3C), iconvctl(3C), alias(5), geniconvtbl(5), geniconvtbl-cconv(5), attributes(7), environ(7), iconv_extra(7), iconv_ko(7), iconv_unicode(7), iconv_zh(7), iconv_zh_HK(7), iconv_zh_TW(7)

Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding for Internet Messages, RFC 1468, Keio University, Panda Programming, June 1993.

Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo Institute of Technology, July 1995.

Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.

Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel Almen Planlaegning, June 1992.

UI-OSF Japanese Localization Group, UI-OSF Application Platform Profile for Japanese Environment Version 1.1, May 1993.

ISO/IEC 2022:1994 Information technology -- Character code structure and extension techniques, 1994.

Notes

User-defined characters are mapped sequentially to the corresponding values in the target codeset. When the target codeset is a Unicode encoding such as UTF-8, they are mapped to values in the Private Use Area (U+E000 to U+F8FF). When the target codeset has no user-defined characters, they are mapped to the substitute character. When the source codeset has a larger user-defined character area than the target codeset, the characters that overflow are mapped to the substitute character.
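
The sequential mapping and the overflow rule can be sketched as simple arithmetic. This is a minimal illustration, not the Solaris implementation. The function name udc_to_unicode and the 0-based index argument are hypothetical, and how an index is derived from a particular source codeset is not specified here.

```python
PUA_FIRST = 0xE000      # start of the Private Use Area named above
PUA_LAST = 0xF8FF       # end of the Private Use Area named above
SUBSTITUTE = "\ufffd"   # substitute character for a Unicode target


def udc_to_unicode(index: int) -> str:
    """Map the index-th user-defined character (0-based, hypothetical
    numbering) to the Private Use Area, or to the substitute character
    once the area is exhausted."""
    if index < 0:
        raise ValueError("index must be non-negative")
    code_point = PUA_FIRST + index
    if code_point > PUA_LAST:
        # The source area is larger than the target area: overflow.
        return SUBSTITUTE
    return chr(code_point)


print(hex(ord(udc_to_unicode(0))))     # first user-defined character
print(hex(ord(udc_to_unicode(6400))))  # one past the end of the area
```

The Private Use Area holds 6400 code points, so index 6399 is the last one that maps to a private-use value in this sketch.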

A vendor-defined character is mapped to the corresponding code value in the target codeset. If the target codeset does not have that value, the character is replaced with the substitute character.

Some codesets contain duplicate characters among their vendor-defined characters. Vendor-defined characters that duplicate standard characters, such as those in JIS X 0208, are mapped to the standard characters. In the PCK and CP932 codesets, some characters are duplicated between the NEC special characters and the IBM extended characters; those characters are mapped to the NEC special characters.
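
The effect of duplicated characters can be observed with a round trip through Unicode. The sketch below uses Python's cp932 codec as a stand-in for the CP932 codeset. It is not the Solaris iconv module, so it only illustrates the kind of behavior described above and does not prove what the Solaris tables contain.

```python
# Python's cp932 codec is used as a stand-in; Solaris mappings may differ.

# APPROXIMATELY EQUAL TO OR THE IMAGE OF (U+2252) exists both in JIS X 0208
# and in the NEC special characters.
standard = b"\x81\xe0"     # JIS X 0208 code value
nec_dup = b"\x87\x90"      # NEC special character duplicate
assert standard.decode("cp932") == nec_dup.decode("cp932")
back_to_standard = nec_dup.decode("cp932").encode("cp932")

# ROMAN NUMERAL ONE (U+2160) exists both in the NEC special characters
# and in the IBM extended characters.
nec = b"\x87\x54"          # NEC special character
ibm = b"\xfa\x4a"          # IBM extended character duplicate
assert nec.decode("cp932") == ibm.decode("cp932")
back_to_nec = ibm.decode("cp932").encode("cp932")

print(back_to_standard.hex(), back_to_nec.hex())
```

With this codec the round trip does not preserve the original bytes for the duplicates: the first case comes back as the JIS X 0208 code value and the second as the NEC special character.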

The substitute character differs by codeset. When the target codeset is a Unicode encoding, it is a representation of the Unicode replacement character (U+FFFD). When the target codeset is ASCII-compatible or EBCDIC-compatible, it is the question mark '?'.
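
Python's "replace" error handler follows a similar convention and can serve as a rough analogy. This sketch assumes Python's euc_jp and shift_jis codecs; it does not call iconv, and the decoding case substitutes for an invalid input byte rather than for an unmappable character.

```python
# Analogy only: Python's codecs with errors="replace", not Solaris iconv.

# Target is an ASCII-compatible codeset: U+D55C (a Hangul syllable) has no
# euc_jp mapping, so the encoder emits a question mark.
to_euc = "x\ud55cy".encode("euc_jp", errors="replace")

# Target is Unicode: the byte 0xFF is not valid Shift_JIS input, so the
# decoder emits U+FFFD in its place.
to_unicode = b"a\xffb".decode("shift_jis", errors="replace")

print(to_euc, [hex(ord(ch)) for ch in to_unicode])
```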