Asian-Language Support in the Solaris Operating Environment

Chapter 2 Internationalized Software for the Solaris Operating Environment

2.1 Solaris Language-Support Framework

In an internationalized application, language-specific features and cultural data are separated from application code. The Solaris internationalization framework organizes this separation into the following three areas:

Locale
Interface localization
Codeset independence

A locale is a set of language and cultural conventions particular to a geographic region. The locale is selected by the user and loaded into memory at run time. The selected locale applies to the operating system and to applications launched afterward.

Interface localization is the process of translating the interface into another language by storing text strings and messages in a separate message file. Messages are more easily composed, translated, and referenced in a separate file than in hard-coded statements scattered throughout the application. Furthermore, the application does not need to be recompiled when messages are translated.

A codeset-independent application does not assume a particular codeset when displaying and manipulating data.

2.2 Locale

The Solaris operating environment provides a number of locales. Each locale includes:

Associated codeset and codeset conversion modules
Numeric, time, and date formats
Collation (sort order)
Monetary format
Interface information (messages and icons)
Input method(s)
Fonts

Developers access locale settings directly through Solaris operating environment APIs. For example, instead of encoding a particular currency symbol, an application calls the appropriate system API, which returns the currency symbol of the set locale.
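As a rough illustration of this pattern, the following sketch uses the standard C routines localeconv() and strftime() to obtain the currency symbol, the decimal separator, and a date string from whatever locale is currently set. The function names are hypothetical helpers introduced for this example and are not part of any Solaris API.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Currency symbol of the current locale (an empty string in the "C" locale). */
const char *locale_currency_symbol(void)
{
    return localeconv()->currency_symbol;
}

/* Decimal separator (radix character) of the current locale. */
const char *locale_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/*
 * Format a date using the locale's preferred date representation (%x).
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t format_local_date(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%x", when);
}
```

An application would typically call setlocale(LC_ALL, "") once at startup so that these helpers reflect the user's selected locale rather than the default "C" locale.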

2.3 Interface Localization

The Solaris operating environment supports several messaging schemes for localizing the interface, including the Sun proprietary API gettext() and the XPG catgets(). These APIs directly reference the message file.
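A minimal sketch of the gettext() scheme follows. The text domain name "myapp", the catalog directory, and the helper function names are assumptions made for this example only; a real application would use its own domain and install location. When no translated catalog is found, gettext() returns the original string unchanged.

```c
#include <libintl.h>
#include <locale.h>

#define APP_DOMAIN     "myapp"            /* hypothetical text domain name */
#define APP_LOCALE_DIR "/usr/lib/locale"  /* assumed catalog directory */

/* Select the user's locale and tell gettext() where the catalogs live. */
void init_messages(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(APP_DOMAIN, APP_LOCALE_DIR);
    textdomain(APP_DOMAIN);
}

/*
 * Look up a message at run time. The English text doubles as the key;
 * it is returned as-is when no translation is available.
 */
const char *file_not_found_message(void)
{
    return gettext("File not found");
}
```

Because the lookup happens at run time, a translator can add or update a message file without any change to this code. On some systems the program may need to be linked with an additional library (for example, -lintl) for gettext() to resolve.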

Note that the size and position of interface elements (icons, graphics, and functions or private data affecting text elements) may be different in different languages. For example, Japanese messages are usually longer than English messages and Japanese ideographs are taller and wider than English characters. Text widget positioning should be relative, not absolute.

Icons and graphics should also be culturally neutral or easily adaptable to local tastes. Essentially, what a user sees and what affects text should be changed only through the message catalog, resources, or some similar mechanism, not in the application code.

2.4 Codeset Independence

The Solaris operating environment architecture supports codeset independence (CSI), expanding the set of supported codesets from Extended UNIX® Codeset (EUC) alone to both EUC and non-EUC encodings, including PC-Kanji (also known as ShiftJIS) and GBK.

Note that text-handling routines should not hard-code the size of a character in the codeset. Nor should other locale-specific components, such as the window system, input method, and online help, depend on a particular codeset. Figure 2-1 shows the locale-specific components that should be codeset independent.

Figure 2-1 Design model for international software
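One way to avoid building a character size into a text-handling routine is to ask the C library for the largest multibyte character of the current locale. The sketch below sizes a buffer with the standard MB_CUR_MAX macro; the function name is hypothetical, and the calculation assumes an encoding without extra shift sequences.

```c
#include <stdlib.h>

/*
 * Bytes needed to hold up to max_chars characters in the current
 * locale's codeset, plus one byte for the terminating null.
 * MB_CUR_MAX is evaluated at run time, so the result follows the
 * locale instead of a compiled-in character width.
 */
size_t buffer_size_for_chars(size_t max_chars)
{
    return max_chars * (size_t)MB_CUR_MAX + 1;
}
```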

Support for Unicode, a universal codeset encompassing most written characters, is often confused with codeset independence. Unicode is often referred to as ISO 10646, the International Organization for Standardization (ISO) standard with which it is aligned. Note that codeset independence must also apply to Unicode. Although Unicode supports many languages and writing systems, to an application Unicode is just another codeset. The Solaris operating environment supports the Unicode UTF-8 (File System Safe UCS Transformation Format) format, which is compatible with ISO 10646. For more information, see Unicode Support in the Solaris Operating Environment.
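An application that treats Unicode as just another codeset can query the codeset of the current locale at run time instead of assuming UTF-8 or any other encoding. The following sketch uses the standard nl_langinfo() call; the wrapper name is hypothetical, and the exact codeset names returned vary between platforms.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Name of the codeset used by the current locale. The returned string
 * is owned by the C library and may be overwritten by later calls to
 * nl_langinfo() or setlocale(), so copy it if it must be kept.
 */
const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}
```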

Note -

Codeset independence is often wrongly assumed because programming languages encourage the idea that a character (in ISO C terms) and a char (or byte) have a one-to-one relationship. In written languages, however, a character can occupy one byte or several bytes. An alphabetic character from most European languages can be represented in one byte. An Asian-language character often requires more than one byte because the character set contains more characters than one byte can represent.

Furthermore, applications often assume the representation of a given character. A codeset-independent application does not assume that 'a' = \x61 or that char = byte. Instead, during text manipulation, such as truncating a stream of characters, the APIs determine the number of bytes each character occupies from its definition or type. By not assuming the size of a character or the codeset, the application remains codeset independent.

Solaris maintains a codeset independence framework. Applications can use Solaris APIs to determine the number of bytes a character occupies, based on its definition or type. An application that makes no assumptions about the underlying codeset is codeset independent in Solaris.
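The truncation case described in this note can be sketched with the standard C multibyte routines mbstowcs() and mbrlen(), which take the size of each character from the current locale. The function names below are hypothetical helpers written for this example, not Solaris-specific APIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Number of characters (not bytes) in s under the current locale,
 * or (size_t)-1 if s contains an invalid multibyte sequence.
 */
size_t count_chars(const char *s)
{
    return mbstowcs(NULL, s, 0);
}

/*
 * Number of bytes occupied by the first max_chars characters of s.
 * Truncating s at this offset never splits a multibyte character.
 * Returns (size_t)-1 on an invalid or incomplete sequence.
 */
size_t prefix_bytes(const char *s, size_t max_chars)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t offset = 0;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    while (max_chars > 0 && remaining > 0) {
        size_t n = mbrlen(s + offset, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete character */
        if (n == 0)
            break;                     /* reached a null character */
        offset += n;                   /* advance by this character's bytes */
        remaining -= n;
        max_chars--;
    }
    return offset;
}
```

In a single-byte locale both functions report one byte per character; in a multibyte locale such as a Japanese EUC or UTF-8 locale, the byte offset returned by prefix_bytes() can be larger than the character count, and the calling code does not need to change.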