Common Desktop Environment: Help System Author's and Programmer's Guide

Internationalization Factors

Several factors, explained in the following sections, contribute to providing online help in the user's native language.

Character Sets and Multibyte Characters

A character set determines how a computer's internal character codes (numbers) are mapped to recognizable characters. For most languages, single-byte characters are sufficient to represent the entire character set. Some languages, however, use thousands of characters and require two, three, or four bytes to represent each character uniquely.
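The difference between single-byte and multibyte character sets can be seen by encoding one character in each. The following is a minimal sketch in Python, which is not part of the Help System; the helper name encoded_length is an assumption made for illustration, and the codec names are Python's spellings of two character sets from Table 14-1.

```python
# Compare how many bytes one character occupies in a single-byte
# character set and in a multibyte character set.

def encoded_length(char, charset):
    """Return the number of bytes used to encode a single character."""
    return len(char.encode(charset))

samples = [
    ("é", "iso-8859-1"),  # Western European letter, single-byte set
    ("日", "euc_jp"),      # Japanese kanji, multibyte set
]

for char, charset in samples:
    print(f"{char!r} in {charset}: {encoded_length(char, charset)} byte(s)")
```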

Character sets supported by the Help System are listed in Table 14-1. However, some character sets may not exist on all platforms.

Table 14-1 Common Desktop Environment Character Sets


Language	Character Set Name	Description

Western Europe and Americas	ISO-8859-1	ISO Latin 1
	HP-ROMAN8	HP Roman
	IBM-850	PC Multi-lingual

Central Europe	ISO-8859-2	ISO Latin 2

Cyrillic	ISO-8859-5	ISO Latin/Cyrillic

Arabic	ISO-8859-6	ISO Latin/Arabic
	HP-ARABIC8	HP Arabic8
	IBM-1046	PC Arabic

Hebrew	ISO-8859-8	ISO Latin/Hebrew
	HP-HEBREW8	HP Hebrew8
	IBM-856	PC Hebrew

Greek	ISO-8859-7	ISO Latin/Greek
	HP-GREEK8	HP Greek8

Turkish	ISO-8859-9	ISO Latin 5
	HP-TURKISH8	HP Turkish8

Japanese	EUC-JP	Japanese EUC (JISX0201, JISX0208, JISX0212)
	HP-SJIS	HP Japanese Shift JIS
	HP-KANA8	HP Japanese Katakana8 (JISX0201 1976)
	IBM-932	PC Japanese Shift JIS

Korean	EUC-KR	Korean EUC

Chinese	EUC-CN	Simplified Chinese EUC (China) (GB2312)
	EUC-TW	Traditional Chinese EUC (Taiwan) (CNS 11643.*)
	HP-BIG5	HP Traditional Chinese Big5
	HP-CCDC	HP Traditional Chinese CCDC
	HP-15CN	HP Traditional Chinese EUC

Thai	TIS-620	Thai

When writing HelpTag files, you may use multibyte characters for any help text. However, the HelpTag markup itself (tag names, entity names, IDs, and so on) must be entered using eight-bit characters.

Language and Territory Names

When choosing a language, you select both a character set and a language and territory name. The language and territory name is used to accommodate variations, such as currency and date format, for a given country or region.

The language and territory names supported by the Help System are listed in the following table. Before you choose a language, refer to your system documentation to identify the languages and character sets supported on your platform.

Table 14-2 Help System Language and Territory Names


Languages	Language/Territory Name	Language, Territory

Standards compliance
	C	C
	POSIX	C
Western Europe/Americas
	da_DK	Danish, Denmark
	de_AT	German, Austria
	de_CH	German, Switzerland
	de_DE	German, Germany
	en_AU	English, Australia
	en_CA	English, Canada
	en_DK	English, Denmark
	en_GB	English, U.K.
	en_IE	English, Ireland
	en_MY	English, Malaysia
	en_NZ	English, New Zealand
	en_US	English, USA
	es_AR	Spanish, Argentina
	es_BO	Spanish, Bolivia
	es_CL	Spanish, Chile
	es_CO	Spanish, Colombia
	es_CR	Spanish, Costa Rica
	es_EC	Spanish, Ecuador
	es_ES	Spanish, Spain
	es_GT	Spanish, Guatemala
	es_MX	Spanish, Mexico
	es_PE	Spanish, Peru
	es_UR	Spanish, Uruguay
	es_VE	Spanish, Venezuela
	et_EE	Estonian, Estonia
	fi_FI	Finnish, Finland
	fo_FO	Faroese, Faroe Islands
	fr_BE	French, Belgium
	fr_CA	French, Canada
	fr_CH	French, Switzerland
	fr_FR	French, France
	is_IS	Icelandic, Iceland
	it_CH	Italian, Switzerland
	it_IT	Italian, Italy
	kl_GL	Greenlandic, Greenland
	lt_LT	Lithuanian, Lithuania
	lv_LV	Latvian, Latvia
	nl_BE	Dutch, Belgium
	nl_NL	Dutch, The Netherlands
	no_NO	Norwegian, Norway
	pt_BR	Portuguese, Brazil
	pt_PT	Portuguese, Portugal
	sv_FI	Swedish, Finland
	sv_SE	Swedish, Sweden
Central Europe
	cs_CS	Czech
	hr_HR	Croatian, Croatia
	hu_HU	Hungarian, Hungary
	pl_PL	Polish, Poland
	ro_RO	Romanian, Romania
	sh_YU	Serbocroatian, Yugoslavia
	si_CS	Slovenian
	si_SI	Slovenian
	sk_SK	Slovak
Cyrillic
	bg_BG	Bulgarian, Bulgaria
	mk_MK	Macedonian
	ru_RU	Russian
	ru_SU	Russian
	sp_YU	Serbian, Yugoslavia
Arabic [No ISO territory name exists for the Arabic-speaking regions of the world. Vendors have supplied their own, which have been adopted for use in the Common Desktop Environment.]
	ar_SA	Arabic
	ar_AA	Arabic
	ar_DZ	Arabic
Hebrew
	iw_IL	Hebrew, Israel
Greek
	el_GR	Greek, Greece
Turkish
	tr_TR	Turkish, Turkey
Asia
	ja_JP	Japanese, Japan
	ko_KR	Korean, Korea
	zh_CN	Chinese, China
	zh_TW	Chinese, Taiwan
Thai
	th_TH	Thai, Thailand

Locale and Character Set

A help volume's default language and character set can be defined as an entity in the helplang.ent file. To specify a complete locale name, combine the language and territory name with the character set name using this syntax:

language-and-territory-name.character-set-name

For a description of the helplang.ent file, see "helplang.ent File".
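The syntax above can be sketched as a pair of small helpers. This is an illustration only, written in Python; the function names join_locale and split_locale are hypothetical and are not part of the Help System.

```python
def join_locale(language_territory, charset):
    """Build a complete locale name: language-and-territory-name.character-set-name."""
    return f"{language_territory}.{charset}"

def split_locale(locale_name):
    """Split a locale name at its first period.

    Returns (language_territory, charset); charset is None when the
    name carries no character set part.
    """
    name, sep, charset = locale_name.partition(".")
    return name, (charset if sep else None)

print(join_locale("de_DE", "ISO-8859-1"))
print(split_locale("ja_JP.EUC-JP"))
print(split_locale("C"))
```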

Examples

The following entity declaration specifies a complete locale name for the C standard language and the ISO-8859-1 character set:
```
<!ENTITY LanguageElementDefaultLocale   SDATA "C.ISO-8859-1">
```

The same information could also be entered using two entity declarations as follows:

<!ENTITY LanguageElementDefaultLocale       SDATA "C">
<!ENTITY LanguageElementDefaultCharset      SDATA "ISO-8859-1">

To specify the German language using the same character set, use this declaration:
```
<!ENTITY LanguageElementDefaultLocale   SDATA "de_DE.ISO-8859-1">
```
Or, to specify the Japanese language using the EUC-JP character set, use this declaration:
```
<!ENTITY LanguageElementDefaultLocale   SDATA "ja_JP.EUC-JP">
```

If the locale is not specified in the helplang.ent file, then the value is derived from the value of the LANG environment variable.
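The lookup order described in this section (entity declarations first, then the LANG environment variable) can be sketched as follows. This Python fragment is an assumption-laden illustration, not the HelpTag implementation: the function name default_locale is hypothetical, and the regular expression recognizes only simple SDATA declarations of the form shown in the examples above.

```python
import os
import re

# Matches declarations such as:
#   <!ENTITY LanguageElementDefaultLocale SDATA "de_DE.ISO-8859-1">
ENTITY_RE = re.compile(r'<!ENTITY\s+(\w+)\s+SDATA\s+"([^"]*)"\s*>')

def default_locale(ent_text, environ=None):
    """Return the locale declared in ent_text, or fall back to LANG.

    Handles both the single-entity form ("C.ISO-8859-1") and the
    two-entity form (locale and character set declared separately).
    Returns None if neither the entities nor LANG supply a value.
    """
    environ = os.environ if environ is None else environ
    entities = dict(ENTITY_RE.findall(ent_text))
    locale_name = entities.get("LanguageElementDefaultLocale")
    charset = entities.get("LanguageElementDefaultCharset")
    if locale_name is None:
        return environ.get("LANG")  # no declaration: derive from LANG
    if charset and "." not in locale_name:
        return f"{locale_name}.{charset}"
    return locale_name

two_entities = (
    '<!ENTITY LanguageElementDefaultLocale  SDATA "C">\n'
    '<!ENTITY LanguageElementDefaultCharset SDATA "ISO-8859-1">\n'
)
print(default_locale(two_entities))
print(default_locale("", {"LANG": "ja_JP.EUC-JP"}))
```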

HelpTag Software

When you process a help volume to create run-time help files, the HelpTag software must be told what language and character set you used to author your files. The language and character set information is used to determine the proper fonts for displaying help topics. If you do not specify a language and character set, HelpTag assumes the default, which is English and ISO-8859-1.

The language and character set can be defined in the helplang.ent file (see "helplang.ent File"). Alternatively, the character set can be specified as a command-line option when running dthelptag in a terminal window.

Note -

When writing HelpTag files, you may use multibyte characters for any help text. However, the HelpTag markup itself (tag names, entity names, IDs, and so on) must be entered using eight-bit characters.

DtHelp Message Catalog

The menus, buttons, and labels that appear in help dialogs should also be displayed in the user's native language. To enable this, Help dialogs read such strings from a message catalog named DtHelp.cat.

The message catalog source file, DtHelp.msg, contains strings for menus, buttons, and messages. If the language you need is not supplied, you must translate the sample message catalog (/usr/dt/dthelp/nls/C/DtHelp.msg) and then use the gencat command to create the run-time message catalog file. See "To Create a Message Catalog" for instructions.

Refer to your system documentation to determine the correct directory where your new message catalog should be installed.
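As a rough sketch of the gencat step, the following Python fragment builds and runs the command. It assumes the standard gencat argument order (output catalog first, then the message source file) and that a translated copy of DtHelp.msg sits in the current directory; the function names are hypothetical, and the authoritative procedure is the one in "To Create a Message Catalog".

```python
import shutil
import subprocess

def gencat_command(msg_source, catalog):
    """Return the argument list for: gencat <catalog> <msg_source>."""
    return ["gencat", catalog, msg_source]

def build_catalog(msg_source, catalog):
    """Run gencat to create the run-time catalog from the translated source."""
    if shutil.which("gencat") is None:
        raise RuntimeError("gencat was not found on PATH")
    subprocess.run(gencat_command(msg_source, catalog), check=True)

# Show the command that would be run for a translated DtHelp.msg.
print(" ".join(gencat_command("DtHelp.msg", "DtHelp.cat")))
```

Calling build_catalog("DtHelp.msg", "DtHelp.cat") then produces the catalog, which still has to be installed in the directory your system documentation specifies.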

LANG Environment Variable

The user's LANG environment variable is important for two reasons:

The value of LANG is used to locate the correct help volume.
When a help topic is displayed, the correct fonts and formatting rules are chosen based on the user's LANG variable. This is especially important for Asian languages, whose word-wrap rules are more sophisticated than those of European and American languages.

helplang.ent File

The helplang.ent file defines text entities used by the HelpTag software to determine the default locale and character set for a help volume. See "Locale and Character Set" to learn how to specify a language and character set for your help volume.

The helplang.ent file also defines text entities for default strings such as Note, Caution, and Warning. If you want to override the English strings built into the HelpTag software, copy the file and localize the strings. The file is located in the directory /usr/dt/dthelp/dthelptag.

Here is an excerpt from the helplang.ent file:

<!ENTITY LanguageElementDefaultLocale          SDATA "C.ISO-8859-1">
<!ENTITY NoteElementDefaultHeadingString       SDATA "NOTE">
<!ENTITY CautionElementDefaultHeadingString    SDATA "CAUTION">
<!ENTITY WarningElementDefaultHeadingString    SDATA "WARNING">
<!ENTITY ChapterElementDefaultHeadingString    SDATA "Chapter">
<!ENTITY FigureElementDefaultHeadingString     SDATA "Figure">
<!ENTITY GlossaryElementDefaultHeadingString   SDATA "Glossary">
.
.
.

Formatting Tables

A multibyte language, such as Japanese or Chinese, requires a formatting table. This table lists the characters that cannot start a line and the characters that cannot end a line. When help files are processed, the formatting table ensures that lines wrap correctly. "Creating a Formatting Table" explains how to create a new table or edit the sample table provided in the Help Developer's Kit.
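To illustrate what such a table controls, here is a minimal line-wrapping sketch in Python. It is not the Help System's algorithm, and the two character sets below are small illustrative samples rather than the contents of the sample formatting table; the function name wrap_text is hypothetical.

```python
def wrap_text(text, width, no_start, no_end):
    """Wrap text by character count, honoring line-break restrictions.

    A break is moved left until the next line would not start with a
    character in no_start and the current line would not end with a
    character in no_end. If no legal break exists, the line is cut at
    the full width.
    """
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width  # tentative line is text[start:cut]
        while cut > start + 1 and (text[cut] in no_start or text[cut - 1] in no_end):
            cut -= 1
        if text[cut] in no_start or text[cut - 1] in no_end:
            cut = start + width  # no legal break found: hard break
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines

NO_START = "」。、"  # closing bracket and punctuation may not begin a line
NO_END = "「"        # opening bracket may not end a line

for line in wrap_text("これは「例」です。", 5, NO_START, NO_END):
    print(line)
```

With a width of 5, an unrestricted wrap would end the first line with the opening bracket; the restrictions move that break two characters to the left so the bracket starts the second line instead.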

Font Schemes

One of the primary functions of the HelpTag software is to convert your marked-up files into a run-time format that the Help System understands. Text is formatted by specifying particular attributes such as type family, size, slant, and weight. A font scheme is simply a name, like an alias, that the Help System uses to assign fonts to HelpTag elements such as heads, procedures, lists, and so forth. It provides a way to map a group of text attributes used by the Help System to specific fonts.

Applications that use the standard Common Desktop Environment fonts do not need to define additional font resources. If your application relies on a different set of fonts, you must create and add a font scheme to your application.
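Conceptually, a font scheme behaves like a lookup table from a group of text attributes to a font. The Python sketch below shows only that idea; it is not the resource syntax the Help System uses, and the attribute values, font patterns, and names (TextAttributes, FONT_SCHEME, font_for) are all hypothetical.

```python
from typing import NamedTuple

class TextAttributes(NamedTuple):
    """A group of text attributes, as named in the paragraph above."""
    family: str
    size: int
    slant: str
    weight: str

# Hypothetical scheme: each attribute group maps to an illustrative
# X font name pattern.
FONT_SCHEME = {
    TextAttributes("sans", 14, "roman", "bold"): "-*-helvetica-bold-r-normal--14-*",
    TextAttributes("sans", 12, "roman", "medium"): "-*-helvetica-medium-r-normal--12-*",
    TextAttributes("mono", 12, "roman", "medium"): "-*-courier-medium-r-normal--12-*",
}

def font_for(attrs, scheme, fallback="fixed"):
    """Return the font mapped to attrs, or the fallback if none is defined."""
    return scheme.get(attrs, fallback)

print(font_for(TextAttributes("sans", 14, "roman", "bold"), FONT_SCHEME))
print(font_for(TextAttributes("serif", 10, "italic", "medium"), FONT_SCHEME))
```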