Common Desktop Environment: Help System Author's and Programmer's Guide

Character Sets and Multibyte Characters

A character set determines how a computer's internal character codes (numbers) are mapped to recognizable characters. For most languages, one byte per character is enough to represent the entire character set. Some languages, however, use thousands of characters and require two, three, or four bytes to represent each character uniquely.
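As a rough illustration of both points, the following Python sketch decodes one byte value under three single-byte character sets and then measures how many bytes a few characters occupy in multibyte character sets. The codec names are those used by Python's standard library, not the Help System's own names, and the helper `encoded_length` is a hypothetical name introduced only for this example.

```python
import unicodedata

# The same character code (0xE9) maps to a different character
# depending on which character set is used to interpret it.
raw = bytes([0xE9])
for codec in ("iso8859_1", "iso8859_5", "iso8859_7"):
    ch = raw.decode(codec)
    print(codec, unicodedata.name(ch))


def encoded_length(ch, codec):
    """Return the number of bytes needed to represent ch in codec."""
    return len(ch.encode(codec))


# ASCII letters stay one byte; ideographs and Hangul need two bytes
# in these EUC character sets.
samples = [
    ("A", "euc_jp"),
    ("\u65e5", "euc_jp"),   # Japanese kanji
    ("\ud55c", "euc_kr"),   # Korean Hangul syllable
    ("\u4e2d", "gb2312"),   # Simplified Chinese ideograph
]
for ch, codec in samples:
    print(codec, "U+%04X" % ord(ch), encoded_length(ch, codec), "byte(s)")
```

Some character sets go further: EUC-JP, for example, uses three bytes for characters drawn from JISX0212.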

Table 14-1 lists the character sets supported by the Help System. Some character sets may not be available on all platforms.

Table 14-1 Common Desktop Environment Character Sets

Language                     Character Set Name  Description
---------------------------  ------------------  ----------------------------------------------
Western Europe and Americas  ISO-8859-1          ISO Latin 1
                             HP-ROMAN8           HP Roman
                             IBM-850             PC Multi-lingual

Central Europe               ISO-8859-2          ISO Latin 2

Cyrillic                     ISO-8859-5          ISO Latin/Cyrillic

Arabic                       ISO-8859-6          ISO Latin/Arabic
                             HP-ARABIC8          HP Arabic8
                             IBM-1046            PC Arabic

Hebrew                       ISO-8859-8          ISO Latin/Hebrew
                             HP-HEBREW8          HP Hebrew8
                             IBM-856             PC Hebrew

Greek                        ISO-8859-7          ISO Latin/Greek
                             HP-GREEK8           HP Greek8

Turkish                      ISO-8859-9          ISO Latin 5
                             HP-TURKISH8         HP Turkish8

Japanese                     EUC-JP              Japanese EUC (JISX0201, JISX0208, JISX0212)
                             HP-SJIS             HP Japanese Shift JIS
                             HP-KANA8            HP Japanese Katakana8 (JISX0201 1976)
                             IBM-932             PC Japanese Shift JIS

Korean                       EUC-KR              Korean EUC

Chinese                      EUC-CN              Simplified Chinese EUC (China) (GB2312)
                             EUC-TW              Traditional Chinese EUC (Taiwan) (CNS 11643.*)
                             HP-BIG5             HP Traditional Chinese Big5
                             HP-CCDC             HP Traditional Chinese CCDC
                             HP-15CN             HP Traditional Chinese EUC

Thai                         TIS-620             Thai

When writing HelpTag files, you may use multibyte characters for any help text. However, the HelpTag markup itself (tag names, entity names, IDs, and so on) must be entered using eight-bit characters.
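One way to check a markup name against this rule before running HelpTag is sketched below in Python. This is not part of the Help System; the function name `is_eight_bit` is hypothetical, the codec names are Python's, and the sketch assumes that "eight-bit" means every character of the name occupies exactly one byte in the file's character set.

```python
def is_eight_bit(name, codec):
    """Return True if every character of name encodes to a single byte.

    A character that cannot be represented in the codec at all is
    treated as a failure rather than an error.
    """
    try:
        return all(len(ch.encode(codec)) == 1 for ch in name)
    except UnicodeEncodeError:
        return False


# A plain ASCII tag name or ID passes in a multibyte character set.
print(is_eight_bit("chapter", "euc_jp"))

# A kanji used as an ID occupies two bytes in EUC-JP, so it fails.
print(is_eight_bit("\u7ae0", "euc_jp"))
```

The help text itself needs no such check, since multibyte characters are allowed there.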