Unicode Support in the Solaris Operating Environment

Chapter 4 Technical Considerations

4.1 Internationalized Applications with Unicode

The Unicode codeset enables developers to write applications that support multiple scripts simultaneously. The base language script and one or more additional scripts, depending on the Unicode locale, can be input, displayed, and printed. Distributed applications within network environments can also provide individual users access to different language environments simultaneously.

By itself, an application using Unicode is not fully internationalized. For example, if an application customizes data handling for Unicode directly, it needs to provide codeset converters as wrappers to support a codeset other than Unicode. This approach is direct Unicode localization, not internationalization. With direct localization, developers may implement localization in the application that duplicates or conflicts with the localization provided by the operating system. In addition, an application may assume that all characters are represented in two-octet cells, an assumption that conflicts with UTF-8, in which a character occupies from one to several bytes.
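To see why the two-octet assumption fails, the following minimal sketch computes how many bytes UTF-8 needs for a given Unicode code point. The function name utf8_encoded_length is hypothetical and is not part of any Solaris or POSIX interface. It is shown only to illustrate the variable width of UTF-8; an internationalized application would normally leave such details to the platform's framework.

```c
#include <stddef.h>

/*
 * Illustration only (hypothetical helper, not a system API):
 * return the number of bytes UTF-8 uses to encode code point cp,
 * or 0 if cp is not a valid Unicode scalar value.
 */
size_t utf8_encoded_length(unsigned long cp)
{
    if (cp <= 0x7FUL)
        return 1;                       /* ASCII range */
    if (cp <= 0x7FFUL)
        return 2;                       /* for example, Latin-1 letters */
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;                       /* surrogates are not characters */
    if (cp <= 0xFFFFUL)
        return 3;                       /* rest of the Basic Multilingual Plane */
    if (cp <= 0x10FFFFUL)
        return 4;                       /* supplementary planes */
    return 0;                           /* beyond the Unicode range */
}
```

Code that steps through a UTF-8 buffer two bytes at a time would therefore split or merge characters as soon as the text contains anything other than two-byte sequences.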

To properly internationalize an application, use the following guidelines:

Avoid accessing Unicode data directly. (Handling the encoding is a task of the platform's internationalization framework.)
Use the POSIX model for multibyte and wide-character interfaces. See Section 4.2, Unicode Application Interfaces.
Call only the APIs that the internationalization framework provides for language-specific and culture-specific operations. All POSIX, X11, Motif, and CDE interfaces are available to Unicode locales.
Remain codeset independent.

4.2 Unicode Application Interfaces

When internationalizing applications for Unicode, developers should use the POSIX or X Window model. These models define two sets of interfaces--multibyte and wide character--without specifying the encoding methods.

Standard multibyte codesets contain characters of varying widths, from one to several bytes. Each character is represented in the fewest bytes possible, which minimizes storage space. Because the characters vary in width, multibyte strings are not conveniently processed by standard functions that assume one fixed-size unit per character. The wide-character interfaces address this by storing every character in a fixed-width cell.

The Unicode codeset provides the necessary format for both multibyte and wide-character representation. In the Unicode locales of the Solaris operating environment, multibyte interfaces use the UTF-8 representation and wide-character interfaces use the UCS-4 representation.
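As a minimal sketch of the POSIX model, the following C fragment converts a multibyte string to wide characters with the standard mbstowcs() function, without naming any encoding. The helper names mb_char_count and mb_to_wide are hypothetical. The sketch assumes the caller has already selected the locale, for example with setlocale(LC_ALL, ""), and it relies on the POSIX behavior that mbstowcs() with a null destination returns the required length.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters (not bytes) in a
 * multibyte string, as interpreted by the current locale.
 * Returns (size_t)-1 if the string contains an invalid sequence.
 */
size_t mb_char_count(const char *mb)
{
    return mbstowcs(NULL, mb, 0);
}

/*
 * Hypothetical helper: convert a multibyte string to a newly
 * allocated, null-terminated wide-character string.
 * Returns NULL on an invalid sequence or allocation failure.
 * The caller releases the result with free().
 */
wchar_t *mb_to_wide(const char *mb)
{
    size_t n = mbstowcs(NULL, mb, 0);
    wchar_t *wide;

    if (n == (size_t)-1)
        return NULL;

    wide = malloc((n + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* n + 1 leaves room for the terminating null wide character. */
    if (mbstowcs(wide, mb, n + 1) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

Because the code never mentions UTF-8 or UCS-4, the same source should behave correctly in a Unicode locale and in any other locale the system supports, which is the sense in which it remains codeset independent.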

4.3 Font Resources

Properly internationalized applications require only a few changes to run in the Unicode locales of the Solaris operating environment. One required change is to set the correct resource definitions for the font set (FontSet) or font list (XmFontList) in the application's resource file.

In the en_US.UTF-8 locale, the FontSet covers the following character sets:

ISO 8859-1 (Latin-1)
ISO 8859-2 (Latin-2)
ISO 8859-4 (Latin-4)
ISO 8859-5 (Latin/Cyrillic)
ISO 8859-7 (Latin/Greek)
ISO 8859-9 (Latin-5)
ISO 8859-15 (Latin-9)
ISO 8859-6 based (Arabic)
ISO 8859-8 (Hebrew)
TIS 620-2533 based (Thai)
BIG5 (Traditional Chinese)
GB 2312-1980 (Simplified Chinese)
JIS X0201-1976, JIS X0208-1983 (Japanese)
KS C 5601-1992 Annex 3 (Korean)

4.4 Setting Resource Definitions

To create a font set for an application, the resource definition should contain the complete set of fonts supported by the Unicode locale. For example:

/* Adjacent string literals are concatenated into one comma-separated list. */
fs = XCreateFontSet(display,
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-1,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-2,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-4,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-5,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-6,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-7,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-8,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-9,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-15,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-big5-1,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-gb2312.1980-0,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-jisx0201.1976-0,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-jisx0208.1983-0,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-ksc5601.1992-3,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-tis620.2533-0,"
    "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-unicode-fontspecific",
    &missing_ptr, &missing_count, &def_string);

Or, more simply:

fs = XCreateFontSet(display, "-dt-interface system-medium-r-normal-s*utf*",
&missing_ptr, &missing_count, &def_string);
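If an application prefers to assemble the longer, explicit font name list at run time instead of hard-coding it, the string passed to XCreateFontSet() can be built from a common prefix and a table of character set suffixes. The following is a minimal sketch under that assumption. The function name build_base_font_list and its parameters are hypothetical, and the sketch only builds the string; it does not call Xlib.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write "prefix+charsets[0],prefix+charsets[1],..."
 * into buf, which holds size bytes.
 * Returns 0 on success, or -1 if buf is too small or formatting fails.
 */
int build_base_font_list(const char *prefix,
                         const char *const charsets[], size_t count,
                         char *buf, size_t size)
{
    size_t used = 0;
    size_t i;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (i = 0; i < count; i++) {
        /* Entries after the first are preceded by a separating comma. */
        int n = snprintf(buf + used, size - used, "%s%s%s",
                         i > 0 ? "," : "", prefix, charsets[i]);

        if (n < 0 || (size_t)n >= size - used)
            return -1;                  /* error or truncated output */
        used += (size_t)n;
    }
    return 0;
}
```

With the prefix "-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-" and the suffixes "iso8859-1", "iso8859-2", and so on, the resulting buffer would hold the same comma-separated list as the explicit example and could be passed as the second argument of XCreateFontSet().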

The XmFontList resource definition of an application should also include all fonts for every character set supported by the locale. For example:

!
! This is an example XmNFontList definition for en_US.UTF-8 locale:
*fontList:\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-1;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-2;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-4;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-5;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-6;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-7;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-8;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-9;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-iso8859-15;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-big5-1;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-gb2312.1980-0;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-jisx0201.1976-0;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-jisx0208.1983-0;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-ksc5601.1992-3;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-tis620.2533-0;\
-dt-interface system-medium-r-normal-s*utf*-*-*-*-*-*-*-*-unicode-fontspecific:

Or, more simply:

!
! This is an example XmNFontList definition for en_US.UTF-8 locale:
*fontList: -dt-interface system-medium-r-normal-s*utf*: