iconv_ja - codeset conversions for Japanese encodings
The iconv and cconv interfaces support conversions to and from a wide range of codesets.
The list below provides basic information about the supported Japanese codesets. For information on other codesets, refer to iconv_unicode(7), iconv_extra(7), iconv_ko(7), iconv_zh(7), iconv_zh_TW(7), and iconv_zh_HK(7).
The following are descriptions of the Japanese codeset names used by iconv and cconv, with aliases in parentheses where applicable:
The iconv and cconv conversions available on the current system can be listed by running 'iconv -l', as described in the iconv(1) manual page.
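A program can make a similar check for one specific conversion at run time. The following is a minimal sketch that assumes only the standard iconv_open(3C) and iconv_close(3C) interfaces and treats a failed iconv_open() as meaning that the pair is unavailable. The helper name conversion_available() is hypothetical, and the codeset names passed to it should be taken from the 'iconv -l' output rather than from this example.

```c
#include <iconv.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if iconv_open(3C) can create a
 * descriptor converting from `fromcode` to `tocode` on this system.
 * iconv_open() returns (iconv_t)-1 on failure; errno is EINVAL when
 * the pair is not supported. */
static bool conversion_available(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);

    if (cd == (iconv_t)-1)
        return false;

    /* The descriptor is only needed for the check, so release it. */
    iconv_close(cd);
    return true;
}
```

For example, conversion_available("UTF-8", "PCK") reports whether a PCK to UTF-8 conversion can be opened, assuming PCK is the name that 'iconv -l' lists on the system.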
For additional information on the mappings between canonical names and supported aliases, with optional variant levels, refer to the alias(5) manual page and the /usr/lib/iconv/alias file.
iconv conversion modules
cconv code conversion binary tables for iconv(1), cconv(3C), and iconv(3C)
geniconvtbl conversion binary tables
alias table file of codeset names
geniconvtbl(1), iconv(1), cconv(3C), cconv_close(3C), cconv_open(3C), cconvctl(3C), iconv(3C), iconvctl(3C), alias(5), geniconvtbl(5), geniconvtbl-cconv(5), attributes(7), environ(7), iconv_extra(7), iconv_ko(7), iconv_unicode(7), iconv_zh(7), iconv_zh_HK(7), iconv_zh_TW(7)
Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding for Internet Messages, RFC 1468, Keio University, Panda Programming, June 1993.
Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo Institute of Technology, July 1995.
Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.
Simonsen, K., Character Mnemonics & Character Sets, RFC 1345, Rationel Almen Planlaegning, June 1992.
UI-OSF Japanese Localization Group, UI-OSF Application Platform Profile for Japanese Environment Version 1.1, May 1993.
ISO/IEC 2022:1994 Information technology -- Character code structure and extension techniques, 1994.
User-defined characters are mapped sequentially to the corresponding values in the target codeset. When the target codeset is a Unicode encoding such as UTF-8, they are mapped to values in the Private Use Area (U+E000 to U+F8FF). When the target codeset has no user-defined characters, they are mapped to the substitute character. When the source codeset has a larger user-defined character area than the target codeset, the characters that overflow the target area are mapped to the substitute character.
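The sequential mapping and overflow rule for user-defined characters can be sketched as follows. This is an illustration under stated assumptions, not the actual conversion code: the struct udc_area type and the map_udc() function are hypothetical, source characters are identified by their 0-based position in the source user-defined area, and the target area is treated as one contiguous range. That holds for the Unicode Private Use Area (6400 code points, U+E000 to U+F8FF) but is a simplification for multibyte codesets whose user-defined areas have gaps.

```c
#include <stdint.h>

#define PUA_FIRST 0xE000u   /* first Private Use Area code point */
#define PUA_LAST  0xF8FFu   /* last Private Use Area code point  */
#define SUBST_UCS 0xFFFDu   /* U+FFFD, substitute for Unicode targets */

/* Hypothetical description of a target user-defined area, assumed
 * contiguous: first code value and number of slots (0 = no area). */
struct udc_area {
    uint32_t first;
    uint32_t count;
};

/* Map the index-th source user-defined character (0-based, in source
 * order) to the same position in the target area. If the target has no
 * area, or the index overflows it, return the substitute value. */
static uint32_t map_udc(uint32_t index, struct udc_area target, uint32_t subst)
{
    if (index >= target.count)
        return subst;
    return target.first + index;
}
```

With a target of { PUA_FIRST, PUA_LAST - PUA_FIRST + 1 }, the first source user-defined character maps to U+E000, the 6400th to U+F8FF, and any further ones to U+FFFD.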
A vendor-defined character is mapped to the corresponding code value in the target codeset. If the target codeset has no such character, it is replaced with the substitute character.
Some codesets contain vendor-defined characters that duplicate other characters. Vendor-defined characters that duplicate standard characters, such as those of JIS X 0208, are mapped to those standard characters. In the PCK and CP932 codesets, some characters appear both among the NEC special characters and among the IBM extended characters; these characters are mapped to the NEC special characters.
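The duplicate-resolution rule can be pictured as a lookup that replaces a duplicate code with its preferred code before conversion. The sketch below is hypothetical: dup_table and canonical_code() are illustrative names, and the table holds only three entries whose values are believed to match the Microsoft CP932 mapping (IBM extended 0xFA4A and NEC special 0x8754 for ROMAN NUMERAL ONE; NEC special 0x879A, IBM extended 0xFA5B and JIS X 0208 0x81E6 for BECAUSE). They should be checked against the actual conversion tables before being relied on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical duplicate entry, keyed by two-byte PCK/CP932 code. */
struct dup_entry {
    uint16_t from;  /* duplicate code (IBM extended or NEC special)  */
    uint16_t to;    /* preferred code (JIS X 0208 or NEC special)    */
};

/* Illustrative subset only; values assumed from the CP932 table. */
static const struct dup_entry dup_table[] = {
    { 0xFA4A, 0x8754 },  /* ROMAN NUMERAL ONE: IBM extended -> NEC special */
    { 0x879A, 0x81E6 },  /* BECAUSE: NEC special -> JIS X 0208             */
    { 0xFA5B, 0x81E6 },  /* BECAUSE: IBM extended -> JIS X 0208            */
};

/* Return the preferred code for a duplicated character, or the code
 * itself if it is not listed as a duplicate. */
static uint16_t canonical_code(uint16_t code)
{
    for (size_t i = 0; i < sizeof dup_table / sizeof dup_table[0]; i++) {
        if (dup_table[i].from == code)
            return dup_table[i].to;
    }
    return code;
}
```

A linear search keeps the sketch short; a real table with hundreds of entries would more likely be sorted and searched with bsearch(3C).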
The substitute character differs by codeset. When the target codeset is a Unicode encoding, it is the Unicode replacement character (U+FFFD) in that encoding. When the target codeset is ASCII-compatible or EBCDIC-compatible, it is the question mark '?'.
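As an illustration of how the substitute differs at the byte level, the following sketch returns the substitute bytes for three kinds of target codeset. The enum target_kind and the function substitute_bytes() are hypothetical; only UTF-8 is shown among the Unicode encodings (U+FFFD encodes as EF BF BD), and the EBCDIC value assumes a code page such as IBM-037 in which '?' is 0x6F.

```c
#include <stddef.h>

/* Hypothetical classification of a target codeset. */
enum target_kind {
    TARGET_UTF8,            /* Unicode encoding: UTF-8 only, for brevity */
    TARGET_ASCII_COMPAT,    /* ASCII-compatible codeset                  */
    TARGET_EBCDIC_COMPAT    /* EBCDIC-compatible codeset                 */
};

/* Write the substitute character for `kind` into `buf` (which must hold
 * at least 3 bytes) and return the number of bytes written. */
static size_t substitute_bytes(enum target_kind kind, unsigned char *buf)
{
    switch (kind) {
    case TARGET_UTF8:
        /* U+FFFD REPLACEMENT CHARACTER encoded in UTF-8 */
        buf[0] = 0xEF;
        buf[1] = 0xBF;
        buf[2] = 0xBD;
        return 3;
    case TARGET_EBCDIC_COMPAT:
        buf[0] = 0x6F;      /* '?' in EBCDIC code pages such as IBM-037 */
        return 1;
    case TARGET_ASCII_COMPAT:
    default:
        buf[0] = 0x3F;      /* '?' in ASCII */
        return 1;
    }
}
```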