Common Desktop Environment: Help System Author's and Programmer's Guide

Character Sets and Multibyte Characters

A character set determines how a computer's internal character codes (numbers) are mapped to recognizable characters. For most languages, one byte per character is enough to represent the entire character set. Some languages, however, use thousands of characters and need two, three, or four bytes to represent each character uniquely.
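As a rough illustration of the mapping described above, the following Python sketch decodes one byte value under three character sets and then counts the bytes needed per character. It is not part of the Help System. The codec names (`iso8859_1`, `iso8859_5`, `iso8859_7`, `euc_jp`) are Python's own names and are assumed here to correspond to the similarly named character sets, and `bytes_per_character` is a hypothetical helper introduced only for this example.

```python
# One byte value, three character sets: 0xE1 maps to a different
# character in each (codec names are Python's, assumed equivalents).
raw = bytes([0xE1])
latin1_char = raw.decode("iso8859_1")    # ISO Latin 1
cyrillic_char = raw.decode("iso8859_5")  # ISO Latin/Cyrillic
greek_char = raw.decode("iso8859_7")     # ISO Latin/Greek


def bytes_per_character(text, charset):
    """Return the encoded length, in bytes, of each character in text.

    Hypothetical helper for illustration only.
    """
    return [len(ch.encode(charset)) for ch in text]


# A single-byte character set uses one byte for every character.
single = bytes_per_character("Help", "iso8859_1")

# A multibyte character set mixes one-byte and multi-byte characters.
mixed = bytes_per_character("A日本", "euc_jp")

print(latin1_char, cyrillic_char, greek_char)
print(single, mixed)
```

The same byte yields three different letters, and the kanji in the EUC-JP string each occupy two bytes while the Latin letter occupies one.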

The character sets supported by the Help System are listed in Table 14-1. However, some character sets may not be available on all platforms.

Table 14-1 Common Desktop Environment Character Sets


Language	Character Set Name	Description

Western Europe and Americas	ISO-8859-1	ISO Latin 1
	HP-ROMAN8	HP Roman8
	IBM-850	PC Multi-lingual

Central Europe	ISO-8859-2	ISO Latin 2

Cyrillic	ISO-8859-5	ISO Latin/Cyrillic

Arabic	ISO-8859-6	ISO Latin/Arabic
	HP-ARABIC8	HP Arabic8
	IBM-1046	PC Arabic

Hebrew	ISO-8859-8	ISO Latin/Hebrew
	HP-HEBREW8	HP Hebrew8
	IBM-856	PC Hebrew

Greek	ISO-8859-7	ISO Latin/Greek
	HP-GREEK8	HP Greek8

Turkish	ISO-8859-9	ISO Latin 5
	HP-TURKISH8	HP Turkish8

Japanese	EUC-JP	Japanese EUC (JISX0201, JISX0208, JISX0212)
	HP-SJIS	HP Japanese Shift JIS
	HP-KANA8	HP Japanese Katakana8 (JISX0201 1976)
	IBM-932	PC Japanese Shift JIS

Korean	EUC-KR	Korean EUC

Chinese	EUC-CN	Simplified Chinese EUC (China) (GB2312)
	EUC-TW	Traditional Chinese EUC (Taiwan) (CNS 11643.*)
	HP-BIG5	HP Traditional Chinese Big5
	HP-CCDC	HP Traditional Chinese CCDC
	HP-15CN	HP Traditional Chinese EUC

Thai	TIS-620	Thai

When writing HelpTag files, you may use multibyte characters in any help text. However, the HelpTag markup itself (tag names, entity names, IDs, and so on) must be entered using eight-bit characters.
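One way to check a markup name against this rule before using it is sketched below in Python. It is not a Help System tool. It assumes that "eight-bit character" means a character that encodes to exactly one byte in the file's character set. The function name `uses_only_single_byte_chars`, the sample names, and the Python codec name `euc_jp` are all hypothetical choices made for the example.

```python
def uses_only_single_byte_chars(name, charset):
    """Return True if every character of name encodes to exactly one byte.

    Hypothetical helper: treats "eight-bit character" as "one byte in the
    given character set", which is an assumption, not a documented rule.
    """
    try:
        return all(len(ch.encode(charset)) == 1 for ch in name)
    except UnicodeEncodeError:
        # The character does not exist in this character set at all.
        return False


# Sample markup names (invented for illustration).
ascii_id_ok = uses_only_single_byte_chars("intro-topic", "euc_jp")
kanji_id_ok = uses_only_single_byte_chars("概要", "euc_jp")

print(ascii_id_ok, kanji_id_ok)
```

Under that assumption, an ID such as `intro-topic` passes in a Japanese EUC file, while an ID written in kanji does not, even though the same kanji would be acceptable in the help text itself.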