Common Desktop Environment: Internationalization Programmer's Guide

Inputting Localized Text

This section discusses the Xlib and desktop mechanisms used for international text input. If you use Motif Text[Field] widgets or the XmIm APIs for text input, this section provides background information only; it does not affect your application design or coding practice. If you are not interested in how character input from the keyboard is achieved with low-level Xlib calls, you can proceed to "Interclient Communications Conventions for Localized Text".

Xlib Input Method Overview

This section provides definitions for terms and concepts used for internationalized text input and a brief overview of the intended use of the mechanisms provided by Xlib.

A large number of languages in the world use alphabets consisting of a small set of symbols (letters) to form words. To enter text into a computer in an alphabetic language, a user usually has a keyboard on which there are key symbols corresponding to the alphabet. Sometimes, a few characters of an alphabetic language are missing on the keyboard. Many computer users who speak a Latin-alphabet-based language only have an English-based keyboard. They need to press a combination of keystrokes to enter a character that does not exist directly on the keyboard. A number of algorithms have been developed for entering such characters, known as European input methods, the compose input method, or the dead-keys input method.
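As an illustration of the dead-keys idea, the following is a minimal sketch of a two-keystroke compose step. It is not taken from any particular input method: the names ComposeRule, ComposeState, and compose_feed are hypothetical, ASCII punctuation stands in for real dead keys, the results are Latin-1 code points, and the choice to drop the dead key when no rule matches is an arbitrary one made for this example.

```c
#include <stddef.h>

/* Hypothetical compose rule: dead key + base key -> Latin-1 code point. */
struct ComposeRule {
    char dead;
    char base;
    unsigned int result;
};

/* A deliberately tiny table; real input methods use far larger ones. */
static const struct ComposeRule compose_rules[] = {
    { '\'', 'e', 0xE9 },  /* acute + e      -> e with acute      */
    { '`',  'a', 0xE0 },  /* grave + a      -> a with grave      */
    { '"',  'u', 0xFC },  /* diaeresis + u  -> u with diaeresis  */
    { '^',  'o', 0xF4 },  /* circumflex + o -> o with circumflex */
    { '~',  'n', 0xF1 }   /* tilde + n      -> n with tilde      */
};

/* Holds the pending dead key, or 0 when nothing is being composed. */
struct ComposeState {
    char pending;
};

static int is_dead_key(char key)
{
    size_t i;
    for (i = 0; i < sizeof compose_rules / sizeof compose_rules[0]; i++)
        if (compose_rules[i].dead == key)
            return 1;
    return 0;
}

/* Feed one keystroke. Returns the code point to commit, or 0 while the
 * method is still waiting for the second keystroke. */
unsigned int compose_feed(struct ComposeState *st, char key)
{
    size_t i;
    char dead;

    if (st->pending == 0) {
        if (is_dead_key(key)) {
            st->pending = key;          /* start composing */
            return 0;
        }
        return (unsigned char)key;      /* ordinary key: commit as is */
    }

    dead = st->pending;
    st->pending = 0;
    for (i = 0; i < sizeof compose_rules / sizeof compose_rules[0]; i++)
        if (compose_rules[i].dead == dead && compose_rules[i].base == key)
            return compose_rules[i].result;

    /* No rule matched: this sketch drops the dead key and commits the base. */
    return (unsigned char)key;
}
```

A caller starts with a zeroed ComposeState and passes each keystroke to compose_feed(); a return value of 0 means the keystroke was absorbed and composition is still in progress.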

Japanese is an example of a language with a phonetic symbol set, where each symbol represents a specific sound. There are two phonetic symbol sets in Japanese: Katakana and Hiragana. In general, Katakana is used for words that are of foreign origin, and Hiragana for writing native Japanese words. Collectively, the two systems are called Kana. Hiragana consists of 83 characters; Katakana, 86 characters.

Korean also has a phonetic symbol set, called Hangul. Each of the 24 basic phonetic symbols (14 consonants and 10 vowels) represents a specific sound. A syllable is composed of two or three parts: the initial consonants, the vowels, and the optional last consonants. With Hangul, syllables can be treated as the basic units on which text processing is done. For example, a delete operation may work on a phonetic symbol or a syllable. Korean code sets include several thousand of these syllables. A user types the phonetic symbols that make up the syllables of the words to be entered. The display may change as each phonetic symbol is entered. For example, when the second phonetic symbol of a syllable is entered, the first phonetic symbol may change its shape and size. Likewise, when the third phonetic symbol is entered, the first two phonetic symbols may change their shape and size.
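To make the syllable structure concrete, here is a small sketch that assumes Unicode precomposed Hangul syllables; the locale code sets discussed in this guide may be laid out differently. Unicode indexes 19 initial consonants, 21 vowels, and 28 final positions (index 0 meaning no final consonant), which include compound forms beyond the 24 basic symbols. The function name compose_hangul is hypothetical.

```c
/* Assumption: Unicode precomposed Hangul syllables, U+AC00 through U+D7A3. */
#define HANGUL_BASE          0xAC00UL
#define HANGUL_INITIAL_COUNT 19
#define HANGUL_VOWEL_COUNT   21
#define HANGUL_FINAL_COUNT   28   /* index 0 means "no final consonant" */

/* Combine the index of an initial consonant, a vowel, and an optional final
 * consonant into one syllable code point. Returns 0 if an index is out of
 * range. */
unsigned long compose_hangul(int initial, int vowel, int final_consonant)
{
    if (initial < 0 || initial >= HANGUL_INITIAL_COUNT ||
        vowel < 0 || vowel >= HANGUL_VOWEL_COUNT ||
        final_consonant < 0 || final_consonant >= HANGUL_FINAL_COUNT)
        return 0;

    return HANGUL_BASE
         + ((unsigned long)initial * HANGUL_VOWEL_COUNT + (unsigned long)vowel)
           * HANGUL_FINAL_COUNT
         + (unsigned long)final_consonant;
}
```

For example, compose_hangul(18, 0, 0) yields U+D558 (the syllable "ha"), and adding the final consonant with compose_hangul(18, 0, 4) yields U+D55C ("han"). The whole syllable is replaced by a different code point when the third symbol arrives, which is why the display of the earlier symbols can change during entry.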

Not all languages rely solely on alphabetic or phonetic systems. Some languages, including Japanese and Korean, employ an ideographic writing system. In an ideographic system, rather than taking a small set of symbols and combining them in different ways to create words, each word consists of one unique symbol (or, occasionally, several symbols). The number of symbols may be very large: approximately 50,000 have been identified in Hanzi, the Chinese ideographic system.

Two aspects of ideographic systems matter for computer usage. First, the standard computer character sets in Japan, China, and Korea include roughly 8,000 characters, while sets in Taiwan have between 15,000 and 30,000 characters, which makes it necessary to use more than one byte to represent a character. Second, it is impractical to have a keyboard that includes all of a given language's ideographic symbols. Therefore, a mechanism is required for entering characters so that a keyboard with a reasonable number of keys can be used. Such input methods are usually based on phonetics, but there are also methods based on the graphical properties of characters.

In Japan, both Kana and Kanji are used. In Korea, Hangul and sometimes Hanja are used. Now, consider entering ideographs in Japan, Korea, China, and Taiwan.

In Japan, either Kana or English characters are entered and a region is selected (sometimes automatically) for conversion to Kanji. Several Kanji characters can have the same phonetic representation. If that is the case, with the string entered, a menu of characters is presented and the user must choose the appropriate option. If no choice is necessary or a preference has been established, the input method does the substitution directly. When Latin characters are converted to Kana or Kanji, it is called a Romaji conversion.

In Korea, it is usually acceptable to keep Korean text in Hangul form, but some people may choose to write Hanja-originated words in Hanja rather than in Hangul. To change Hangul to Hanja, a region is selected for conversion and the user follows the same basic method as described for Japanese.

Probably because there are well-accepted phonetic writing systems for Japanese and Korean, computer input methods in these countries for entering ideographs are fairly standard. Keyboard keys have both English characters and phonetic symbols engraved on them, and the user can switch between the two sets.

The situation is different for Chinese. While there is a phonetic system called Pinyin promoted by authorities, there is no consensus for entering Chinese text. Some vendors use a phonetic decomposition (Pinyin or another), others use ideographic decomposition of Chinese words, with various implementations and keyboard layouts. There are about 16 known methods, none of which is a clear standard.

Also, there are actually two ideographic sets used: Traditional Chinese (the original written Chinese) and Simplified Chinese. Several years ago, the People's Republic of China launched a campaign to simplify some ideographic characters and eliminate redundancies altogether. Under the plan, characters would be streamlined every five years. Characters have been revised several times now, resulting in the smaller, simpler set that makes up Simplified Chinese.

Input Method Architecture

As shown in the previous section, there are many different input methods used today, each varying with language, culture, and history. A common feature of many input methods is that the user can type multiple keystrokes to compose a single character (or set of characters). The process of composing characters from keystrokes is called preediting. It may require complex algorithms and large dictionaries involving substantial computer resources.

Input methods may require one or more areas in which to show the feedback of the actual keystrokes, to show ambiguities to the user, to list dictionaries, and so on. The following are the input method areas of concern.

Status area: Intended to be a logical extension of the light-emitting diodes (LEDs) that exist on the physical keyboard. It is a window that is intended to present the internal state of the input method that is critical to the user. The status area may consist of text data and bitmaps or some combination.
Preedit area: Intended to display the intermediate text for languages in which text is composed before the client handles the data.
Auxiliary area: Used for pop-up menus and customizing dialog boxes that may be required for an input method. There may be multiple auxiliary areas for any input method. Auxiliary areas are managed by the input method independent of the client. Auxiliary areas are assumed to be a separate dialog that is maintained by the input method.

There are various user interaction styles used for preediting. The following are the preediting styles supported by Xlib.

OnTheSpot: Data is displayed directly in the application window. Application data is moved to allow preedit data to be displayed at the point of insertion.
OverTheSpot: Data is displayed in a preedit window that is placed over the point of insertion.
OffTheSpot: Preedit window is displayed inside the application window but not at the point of insertion. Often, this type of window is placed at the bottom of the application window.
Root window: Preedit window is the child of RootWindow.

It would require a lot of computing resources if portable applications had to include input methods for all the languages in the world. To avoid this, a goal of the Xlib design is to allow an application to communicate with an input method placed in a separate process. Such a process is called an input server. The server to which the application should connect is dependent on the environment when the application is started up: what the user language is and the actual encoding to be used for it. The input method connection is said to be locale-dependent. It is also user-dependent; for a given language, the user can choose, to some extent, the user-interface style of input method (if there are several choices).

Using an input server implies communications overhead, but applications can be migrated without relinking. Input methods can be implemented either as a stub communicating with an input server or as a local library.

The abstraction used by a client to communicate with an input method is an opaque data structure represented by the XIM data type. This data structure is returned by the XOpenIM() function, which opens an input method on a given display. Subsequent operations on this data structure encapsulate all communication between client and input method. There is no need for an X client to use any networking library or natural language package to use an input method.

A single input server can be used for one or more languages, supporting one or more encoding schemes. But the strings returned from an input method are always encoded in the (single) locale associated with the XIM object.

Input Contexts

Xlib provides the ability to manage a multithreaded state for text input. Here, a thread is a stream of user input directed at one text entry area, not an operating-system thread. A client may be using multiple windows, each window with multiple text entry areas, with the user possibly switching among them at any time. The abstraction for representing the state of a particular input thread is called an input context. The Xlib representation of an input context is an XIC. See Figure 5-1 for an illustration.

Figure 5-1 Input method and input contexts

An input context is the abstraction retaining the state, properties, and semantics of communication between a client and an input method. An input context is a combination of an input method, a locale specifying the encoding of the character strings to be returned, a client window, internal state information, and various layout or appearance characteristics. The input context plays a role for text input similar to the role the graphics context plays for graphics output.

One input context belongs to exactly one input method. Different input contexts can be associated with the same input method, possibly with the same client window. An XIC is created with the XCreateIC() function, providing an XIM argument, affiliating the input context to the input method for its lifetime. When an input method is closed with the XCloseIM() function, no affiliated input contexts should be used again (and they should preferably be destroyed with the XDestroyIC() function before the input method is closed).

Considering the example of a client window with multiple text entry areas, the application programmer can choose to implement the following:

As many input contexts are created as text-entry areas. The client can get the input accumulated on each context each time it looks up that context.
A single context is created for a top-level window in the application. If such a window contains several text-entry areas, each time the user moves to another text-entry area, the client has to indicate changes in the context.

Application designers can choose a range of single or multiple input contexts, according to the needs of their applications.

Keyboard Input

To obtain characters from an input method, a client must call the XmbLookupString() function or XwcLookupString() function with an input context created from that input method. Both a locale and a display are bound to an input method when it is opened, and an input context inherits this locale and display. Any strings returned by the XmbLookupString() or XwcLookupString() function are encoded in that locale.

Xlib Focus Management

For each text-entry area in which the XmbLookupString() or XwcLookupString() function is used, there is an associated input context.

When the application focus moves to a text-entry area, the application must set the input context focus to the input context associated with that area. The input context focus is set by calling the XSetICFocus() function with the appropriate input context.

Also, when the application focus moves out of a text-entry area, the application should unset the focus for the associated input context by calling the XUnsetICFocus() function. As an optimization, if the XSetICFocus() function is called successively on two different input contexts, setting the focus on the second automatically unsets the focus on the first.

Note -

To set and unset the input context focus correctly, it is necessary to track application-level focus changes. Such focus changes do not necessarily correspond to X server focus changes.

If a single input context is used to do input for multiple text-entry areas, it is also necessary to set the focus window of the input context whenever the focus window changes.

Xlib Geometry Management

In most input method architectures (OnTheSpot being the notable exception), the input method performs the display of its own data. To provide better visual locality, it is often desirable to have the input method areas embedded within a client. To do this, the client may need to allocate space for an input method. Xlib provides support that allows the client to provide the size and position of input method areas. The input method areas that are supported for geometry management are the status area and the preedit area.

The fundamental concept on which geometry management for input method windows is based is the proper division of responsibilities between the client (or toolkit) and the input method. The division of responsibilities is the following:

The client is responsible for the geometry of the input method window.
The input method is responsible for the contents of the input method window. It is also responsible for creating the input method window per the geometry constraints given to it by the client.

An input method can suggest a size to the client, but it cannot suggest a placement. The suggestion is only a suggestion: the input method does not determine the size, and it must accept the size it is given.

Before a client provides geometry management for an input method, it must determine if geometry management is needed. The input method indicates the need for geometry management by setting the XIMPreeditArea or XIMStatusArea bit in one of the styles of the XIMStyles value returned by the XGetIMValues() function. When a client decides to provide geometry management for an input method, it indicates that decision by setting the XNInputStyle value in the XIC.
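The style check can be sketched as plain bit tests. The sketch below assumes the client has already copied the supported styles out of the XIMStyles value; the helper names pick_input_style and needs_geometry_management are hypothetical. The constants are defined locally, with the values used in <X11/Xlib.h>, only so that the fragment compiles without the X headers; a real client would include the header instead. In Xlib terms, OnTheSpot corresponds to XIMPreeditCallbacks, OverTheSpot to XIMPreeditPosition, OffTheSpot to XIMPreeditArea, and the root window style to XIMPreeditNothing.

```c
#include <stddef.h>

/* Local copies of the Xlib style bits (same values as <X11/Xlib.h>). */
#ifndef XIMPreeditArea
#define XIMPreeditArea      0x0001L
#define XIMPreeditCallbacks 0x0002L
#define XIMPreeditPosition  0x0004L
#define XIMPreeditNothing   0x0008L
#define XIMStatusArea       0x0100L
#define XIMStatusCallbacks  0x0200L
#define XIMStatusNothing    0x0400L
#endif

/* Return the first style in `preferred` that the input method also lists in
 * `supported`, or 0 if there is no common style. */
unsigned long pick_input_style(const unsigned long *supported, size_t n_supported,
                               const unsigned long *preferred, size_t n_preferred)
{
    size_t i, j;

    for (i = 0; i < n_preferred; i++)
        for (j = 0; j < n_supported; j++)
            if (preferred[i] == supported[j])
                return preferred[i];
    return 0;
}

/* Nonzero if the style asks the client to manage the geometry of the preedit
 * area, the status area, or both. */
int needs_geometry_management(unsigned long style)
{
    return (style & (unsigned long)(XIMPreeditArea | XIMStatusArea)) != 0;
}
```

The value returned by pick_input_style() is what a client would then set as XNInputStyle when it creates the input context.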

After a client has established with the input method that it will do geometry management, the client must negotiate the geometry with the input method. The geometry is negotiated by the following steps:

The client suggests an area to the input method by setting the XNAreaNeeded value for that area. If the client has no constraints for the input method, it either does not suggest an area or sets the width and height to 0 (zero). Otherwise, it sets one of the values.
The client gets the XIC XNAreaNeeded value. The input method returns its suggested size in this value. The input method should pay attention to any constraints suggested by the client.
The client sets the XIC XNArea value to inform the input method of the geometry of the input method's window. The client should try to honor the geometry requested by the input method. The input method must accept this geometry.

Clients performing geometry management must be aware that setting other IC values may affect the geometry desired by an input method. For example, the XNFontSet and XNLineSpacing values may change the geometry desired by the input method. It is the responsibility of the client to renegotiate the geometry of the input method window when it is needed.

In addition, a geometry management callback is provided by which an input method can initiate a geometry change.

Event Filtering

A filtering mechanism is provided to allow input methods to capture X events transparently to clients. It is expected that toolkits (or clients) using the XmbLookupString() or XwcLookupString() function call this filter (the XFilterEvent() function in Xlib) at some point in the event processing mechanism to make sure that events needed by an input method can be filtered by that input method. If there is no filter, a client can receive and discard events that are necessary for the proper functioning of an input method. The following provides a few examples of such events:

Expose events on a preedit window in local mode.
Events used by an input method to communicate with an input server. Such input server protocol-related events have to be intercepted so that they do not disturb client code.
Key events, which can be sent to a filter before they are bound to translations such as those Xt provides.

Clients are expected to get the XIC XNFilterEvents value and augment the event mask for the client window with that event mask. This mask can be 0.

Callbacks

When an OnTheSpot input method is implemented, only the client can insert or delete preedit data in place and possibly scroll existing text. This means the echo of the keystrokes has to be achieved by the client itself, tightly coupled with the input method logic.

When a keystroke is entered, the client calls the XmbLookupString() or XwcLookupString() function. At this point, in the OnTheSpot case, the echo of the keystroke in the preedit has not yet been done. Before returning to the client logic that handles the input characters, the lookup function must call the echoing logic for inserting the new keystroke. If the keystrokes entered so far make up a character, the keystrokes entered need to be deleted, and the composed character is returned. The result is that, while being called by client code, input method logic has to call back to the client before it returns. The client code, that is, a callback routine, is called from the input method logic.

There are a number of cases where the input method logic has to call back the client. Each of those cases is associated with a well-defined callback action. It is possible for the client to specify, for each input context, which callback is to be called for each action.

There are also callbacks provided for feedback of status information and a callback to initiate a geometry request for an input method.

X Server Keyboard Protocol

This section discusses the server and keyboard groups.

A keysym is the encoding of a symbol on a keycap. The goal of the server's keysym mapping is to reflect the actual keycaps on the physical keyboards. The user can redefine the keyboard by running the xmodmap command with the new mapping desired.

X Version 11 Release 4 (X11R4) allows for definition of a bilingual keyboard at the server. The following describes this capability.

A list of keysyms is associated with each key code. The list is intended to convey the set of symbols on the corresponding key. Short lists are expanded as follows:

If the list (ignoring trailing NoSymbol entries) is a single keysym K, the list is treated as if it were the list K NoSymbol K NoSymbol.
If the list (ignoring trailing NoSymbol entries) is a pair of keysyms K1 K2, the list is treated as if it were the list K1 K2 K1 K2.
If the list (ignoring trailing NoSymbol entries) is three keysyms K1 K2 K3, the list is treated as if it were the list K1 K2 K3 NoSymbol.
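The three expansion rules can be sketched as follows. The type DemoKeySym and the function expand_keysym_list are hypothetical names used only for this illustration; DEMO_NO_SYMBOL stands in for the Xlib NoSymbol value, which is 0.

```c
#include <stddef.h>

/* Local stand-ins for the Xlib KeySym type and the NoSymbol value. */
typedef unsigned long DemoKeySym;
#define DEMO_NO_SYMBOL 0UL

/* Expand the keysym list of one key code to four entries. `list` holds `len`
 * keysyms as stored for the key code; `out` receives the first four entries
 * after expansion. Returns the effective length of the list once trailing
 * NoSymbol entries are ignored. */
size_t expand_keysym_list(const DemoKeySym *list, size_t len, DemoKeySym out[4])
{
    size_t i;

    /* Ignore trailing NoSymbol entries. */
    while (len > 0 && list[len - 1] == DEMO_NO_SYMBOL)
        len--;

    /* Copy what is there and pad with NoSymbol. This already covers the
     * three-keysym rule: K1 K2 K3 becomes K1 K2 K3 NoSymbol. */
    for (i = 0; i < 4; i++)
        out[i] = (i < len) ? list[i] : DEMO_NO_SYMBOL;

    if (len == 1) {
        out[2] = list[0];               /* K     -> K NoSymbol K NoSymbol */
    } else if (len == 2) {
        out[2] = list[0];               /* K1 K2 -> K1 K2 K1 K2 */
        out[3] = list[1];
    }
    return len;
}
```

Lists of four or more keysyms are copied unchanged; only the first four entries matter for the group rules that follow.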

When an explicit void element is desired in the list, the VoidSymbol value can be used.

The first four elements of the list are split into two groups of keysyms. Group 1 contains the first and second keysyms; Group 2 contains the third and fourth keysyms. Within each group, if the second element of the group is NoSymbol, the group is treated as if the second element were the same as the first element, except when the first element is an alphabetic keysym K for which both lowercase and uppercase forms are defined. In that case, the group is treated as if the first element is the lowercase form of K and the second element is the uppercase form of K.
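The rule for a group whose second element is NoSymbol might be sketched like this. As a simplifying assumption, only the ASCII letters are treated as alphabetic keysyms with both case forms defined (their keysym values equal their ASCII codes); a real implementation would also cover the other alphabetic keysyms. All names in the sketch are hypothetical.

```c
/* Local stand-ins for the Xlib KeySym type and the NoSymbol value. */
typedef unsigned long DemoKeySym;
#define DEMO_NO_SYMBOL 0UL

/* Assumption: only ASCII letters have both case forms in this sketch. */
static int demo_is_lower(DemoKeySym k) { return k >= 'a' && k <= 'z'; }
static int demo_is_upper(DemoKeySym k) { return k >= 'A' && k <= 'Z'; }

/* Fill in the second element of a two-keysym group when it is NoSymbol. */
void normalize_group(DemoKeySym group[2])
{
    DemoKeySym k = group[0];

    if (group[1] != DEMO_NO_SYMBOL)
        return;                          /* explicit second element: keep it */

    if (demo_is_lower(k)) {
        group[1] = k - 'a' + 'A';        /* lowercase first, uppercase second */
    } else if (demo_is_upper(k)) {
        group[0] = k - 'A' + 'a';
        group[1] = k;
    } else {
        group[1] = k;                    /* non-alphabetic: duplicate the first */
    }
}
```

With this rule, a key whose list is just the single letter keysym "a" ends up with the group pair a, A, while a key listing only "1" ends up with the pair 1, 1.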

The standard rules for obtaining a keysym from an event make use of the Group 1 and Group 2 keysyms only; no interpretation of other keysyms in the list is given here. The modifier state determines which group to use. Switching between groups is controlled by the keysym named MODE SWITCH by attaching that keysym to some key code and attaching that key code to any one of the modifiers Mod1 through Mod5. This modifier is called the group modifier. For any key code, Group 1 is used when the group modifier is off, and Group 2 is used when the group modifier is on.

Within a group, the keysym to use is also determined by the modifier state. The first keysym is used when the Shift and Lock modifiers are off. The second keysym is used when the Shift modifier is on, when the Lock modifier is on and the second keysym is uppercase alphabetic, or when the Lock modifier is on and is interpreted as ShiftLock. Otherwise, when the Lock modifier is on and is interpreted as CapsLock, the state of the Shift modifier is applied first to select a keysym; if that keysym is lowercase alphabetic, the corresponding uppercase keysym is used instead.
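The selection rules can be summarized in a short sketch. It reads the rule for the second keysym as three separate conditions: Shift is on; Lock is on and the second keysym is uppercase alphabetic; or Lock is on and is interpreted as ShiftLock. The sketch assumes the four keysyms have already been expanded and normalized, and it treats only ASCII letters as alphabetic. The names DemoKeySym, LockMeaning, and select_keysym are hypothetical.

```c
/* Local stand-in for the Xlib KeySym type. */
typedef unsigned long DemoKeySym;

/* How the Lock modifier is interpreted for this keyboard. */
enum LockMeaning { LOCK_IS_CAPS, LOCK_IS_SHIFT };

/* Assumption: only ASCII letters count as alphabetic in this sketch. */
static int demo_is_lower(DemoKeySym k) { return k >= 'a' && k <= 'z'; }
static int demo_is_upper(DemoKeySym k) { return k >= 'A' && k <= 'Z'; }

/* Pick a keysym from the four-entry list of one key code.
 * syms[0..1] is Group 1 and syms[2..3] is Group 2. The flags group_mod,
 * shift, and lock are nonzero when the corresponding modifier is on. */
DemoKeySym select_keysym(const DemoKeySym syms[4], int group_mod,
                         int shift, int lock, enum LockMeaning meaning)
{
    const DemoKeySym *g = group_mod ? &syms[2] : &syms[0];
    DemoKeySym sym;

    if (!shift && !lock)
        return g[0];                     /* no Shift, no Lock: first keysym */
    if (shift)
        return g[1];                     /* Shift on: second keysym */

    /* From here on, Lock is on and Shift is off. */
    if (demo_is_upper(g[1]) || meaning == LOCK_IS_SHIFT)
        return g[1];

    /* CapsLock: apply Shift first (it is off, so the first keysym), then
     * use the uppercase form if the result is a lowercase letter. */
    sym = g[0];
    if (demo_is_lower(sym))
        sym = sym - 'a' + 'A';
    return sym;
}
```

For a key whose list is a, A, 1, ! this gives "A" with CapsLock on in Group 1, "1" with CapsLock on in Group 2, and "!" in Group 2 when Lock is interpreted as ShiftLock.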

No spatial geometry of the symbols on the key is defined by their order in the keysym list, although a geometry might be defined on a vendor-specific basis. The server does not use the mapping between key codes and keysyms. Rather, it stores it merely for reading and writing by clients.

The KeyMask modifier named Lock is intended to be mapped to either a CapsLock or a ShiftLock key, but which one it is mapped to is left as an application-specific decision, user-specific decision, or both. However, it is suggested that the determination be made according to the keysyms associated with the corresponding key code.