This part includes end-user information.
This chapter describes the design of the Simplified Chinese Solaris software and provides information on the locales supported in the Simplified Chinese Solaris environment.
The Simplified Chinese localization of the internationalized release of CDE provides two work environments: a user environment and a developer environment. Each environment is localized to handle the linguistic and cultural conventions that are unique to the Simplified Chinese language.
The user environment has desktop tools and a window manager (dtwm) that are customized to communicate in the language of a particular locale.
The development environment provides internationalized versions of Xlib and Motif that programmers use to develop localized applications. For further information, see the International Language Environments Guide.
You can set any of the following locales when you log in to your Simplified Chinese desktop:
C – ASCII English environment.
zh – Simplified Chinese environment in extended UNIX code (EUC).
zh.GBK – Simplified Chinese environment in GBK, an extension of GB2312-80. This standard is Guo Biao Kuo in Chinese PinYin, which supports all CJK characters that are in Unicode 2.0.
zh_CN.GB18030 – Simplified Chinese environment in GB18030-2000. The newer GB18030-2000 standard, which is intended to replace GBK, supports CJK Unified Ideographs Extension A and the Yi, Mongolian, Tibetan, and Uigur minority scripts in Unicode 3.0.
zh_CN.EUC – Symbolic link to zh locale.
zh_CN.GBK – Symbolic link to zh.GBK locale.
zh.UTF-8 – Simplified Chinese environment in Unicode 3.2.
zh_CN.UTF-8 – Symbolic link to zh.UTF-8.
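The relationship between the locale names and their symbolic links can be pictured as a small lookup table. The following Python sketch is only an illustration: the names `LOCALES`, `ALIASES`, and `resolve_locale` are hypothetical and are not part of the Solaris software.

```python
# Locale names and codesets as listed above (illustrative table only).
LOCALES = {
    "C": "ASCII",
    "zh": "EUC",
    "zh.GBK": "GBK",
    "zh_CN.GB18030": "GB18030-2000",
    "zh.UTF-8": "UTF-8",
}

# Symbolic links as listed above: alias -> locale it points to.
ALIASES = {
    "zh_CN.EUC": "zh",
    "zh_CN.GBK": "zh.GBK",
    "zh_CN.UTF-8": "zh.UTF-8",
}


def resolve_locale(name):
    """Return (canonical locale name, codeset) for a locale in the list."""
    canonical = ALIASES.get(name, name)
    if canonical not in LOCALES:
        raise KeyError(f"unknown locale: {name}")
    return canonical, LOCALES[canonical]
```

For example, `resolve_locale("zh_CN.GBK")` returns the pair for the zh.GBK locale, because zh_CN.GBK is only a link to it.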
You can set your default locale or change it with the following procedure.
Choose Language from the options menu on the login screen.
Select the C, zh, zh.UTF-8, zh_CN.GB18030, or zh.GBK locale.
The new locale is set for your CDE session.
The following sections provide information on the Simplified Chinese localization of the Simplified Chinese Common Desktop Environment (CDE) for windowed applications.
The Simplified Chinese CDE Motif graphical user interface is similar in layout and design to the U.S. release of CDE. Simplified Chinese CDE supports multibyte characters and Simplified Chinese messages with Motif objects. Differences in character width and proportional spacing cause minor differences in the exact layout of some Motif objects.
All application windows that accept Simplified Chinese input include a status area, associated with the input window, that shows the current conversion mode. When an input conversion mode is on, the text you type appears in a highlighted (reverse video) preedit area until it is converted to Simplified Chinese or special characters and committed. Some input modes also offer conversion choices among several Simplified Chinese characters on menus.
To accommodate the diversity of Simplified Chinese, the Solaris software provides several different input methods for entering Simplified Chinese characters. With these methods you can enter ASCII/English characters, Simplified Chinese radicals, and Simplified Chinese characters using an ASCII keyboard or a Simplified Chinese keyboard.
Your Simplified Chinese input is stored temporarily as an intermediate representation. You then use the conversion manager program to transform the representation into a displayed character string.
The following desktop tools are available in this release. All of the tools can handle Simplified Chinese input and output.
Address Manager – Carries out remote operations and finds information about the systems and users on your network. This application can speed up such tasks as sending email, logging in remotely, and setting appointments on someone else's calendar.
Application Manager – Manages the tools and other software applications available on your system.
Audio Tool – Records, plays, and saves audio files in AU, AIFF, and WAV format.
Calculator – Mimics the function of a hand-held calculator.
Calendar Manager – Manages appointments and To Do lists. You can use this application to set and distribute appointment reminders.
Clock – Displays the current time in analog or digital format. You can control the display of local time with this application.
Console – Starts a dtterm terminal emulator as your workspace console window.
File Manager – Displays the files and folders on your system. You can move, copy, open, and delete files and folders with this application. You can also use the application to view the contents of your floppy diskettes and CD-ROMs.
Find Files – Enables you to search your system for files or folders that match specific search criteria. Your criteria might include, for example, the name, location, or the size of a file or folder you want to find.
Front Panel – Provides controls for access to applications and utilities on the system. The Front Panel is a centrally located window that occupies all workspaces.
Help – Displays searchable help information for CDE.
Icon Editor – Enables you to create new icons or modify existing icons.
Image Viewer – Enables you to view, print, and save the contents of file types such as GIF, TIFF, JPEG, and PostScript. You can use the Snapshot function of this application to capture a picture of a window or another part of your screen. The picture is saved as a raster file in bitmap format.
Mailer – Handles the distribution and receipt of your electronic mail messages.
Performance Meter – Monitors various aspects of system performance.
Print Manager – Enables you to submit, view, and cancel print jobs. This application is the graphical front-end to the print command. It supports drag-and-drop file transfer operations.
Process Manager – Displays the processes that are currently running on your workstation. The application enables you to perform actions on the active processes.
Text Editor – Enables you to create and edit text files. The application is used in CDE tools such as the Mailer composition window.
Style Manager – Enables you to customize some of the visual elements and system device behaviors of your workspace environment, such as: colors and fonts, keyboard, mouse, window, and session start-up behaviors.
Terminal – Acts as an ASCII character terminal that you can use to enter UNIX® commands at a system shell prompt.
This chapter provides procedures and other information that you can use to enter Simplified Chinese text.
This chapter describes the Simplified Chinese Solaris input modes that you can use to enter the following kinds of characters.
ASCII/English characters
Simplified Chinese characters
Special symbols
You can type all of these characters in the input areas of the following application windows:
In terminal emulation (TTY) windows, such as Terminal windows
In text entry areas, such as those found in the Text Editor and Mailer applications
In dialog boxes, such as the new folder name box in the File Manager application
In other special use subwindows, such as pop-ups
In the Simplified Chinese Operating System, application subwindows contain two areas that are used to enter Simplified Chinese characters. A lookup choice window and an auxiliary window are also available in Chinese input mode.
Preedit area – The text entry area that holds your character formations before you commit them.
When you commit characters, the characters are put in the text block that is assembled for the application.
Status area – The area at the lower left of the application subwindow that displays the current conversion mode and the active keyboard. Later sections in this chapter discuss keyboard switching and using the available conversion modes.
Lookup choice window – A popup window that displays the conversion candidates that are available for the characters or the radicals in the preedit area.
Auxiliary window – This window contains a palette of icons that provide you the following functions and utilities to simplify text entry and to manage input methods:
Input method switching
Chinese full-width/half-width character mode switching
Chinese/English punctuation mode switching
Input method properties setting
Input method selection
Lookup tables for GB2312, GBK, GB18030-2000, and Unicode character sets
Virtual keyboard
The input method auxiliary window supports all UTF-8 locales and the following Simplified Chinese locales:
zh/zh_CN.EUC
zh.GBK/zh_CN.GBK
zh.UTF-8/zh_CN.UTF-8
Two kinds of input methods are supported:
Methods based on a code table such as Wubi
Methods developed by a vendor, such as NewPinYin or NeiMa.
The following figure shows the interface model for auxiliary window support.
This section provides procedures that you can use to select and switch between different input methods.
In the typing area, press Control-spacebar to turn on Simplified Chinese input conversion.
An auxiliary window appears.
Select the desired input method through one of the following actions:
In the status area of the application subwindow, use the function keys to switch input methods: the F2 key for the first input method, the F3 key for the second input method, and so on.
You can also press Control-Escape repeatedly until you reach the desired input method.
Use the input method panel.
Click the utilities button in the auxiliary window.
The utilities menu appears.
Click the input method selection item from the utilities menu.
The input method selection panel appears.
Select the input method you want to use from the input selection panel.
After you select an input method, click OK or Apply to activate the setting. The first input method you select is the default input method.
When you press Control-spacebar the default input method is selected.
If you change input methods, you can press Control-Escape to return to the default input method.
Help pages display in the default browser, such as Netscape Navigator™.
Switch between half-width character mode and full-width character mode through one of the following actions.
In the status area of an application subwindow, type Shift-spacebar to switch between half-width character mode and full-width character mode.
In the auxiliary window, click the half-width/full-width button.
The input method system is in full-width character mode when this button appears in the auxiliary window:
The input method system is in half-width character mode when this button appears in the auxiliary window:
When the system is in full-width mode, the full-width form of the key you press is committed. For example, when you type an a in full-width mode, the full-width a is committed.
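The effect of full-width mode on a keystroke can be sketched in a few lines of Python. This is not the input method's own code. It relies only on the Unicode layout, in which the full-width forms of the printable ASCII characters sit at a fixed offset of 0xFEE0 and the full-width space is U+3000. The function name `to_fullwidth` is hypothetical.

```python
def to_fullwidth(text):
    """Map printable ASCII characters to their full-width Unicode forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")            # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # '!'..'~' -> U+FF01..U+FF5E
        else:
            out.append(ch)                  # leave everything else unchanged
    return "".join(out)


print(to_fullwidth("a"))  # prints the full-width a (U+FF41)
```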
Switch between Chinese punctuation mode and English punctuation mode through one of the following actions.
In the status area of an application subwindow, type Control- to switch between Chinese punctuation mode and English punctuation mode.
In the auxiliary window, click the Chinese/English punctuation button.
The following icon indicates the input method system is in Chinese Punctuation Mode:
The following icon indicates the input method system is in English Punctuation Mode.
When you press a punctuation key in Chinese Punctuation mode, the corresponding Chinese punctuation character is committed to the application. For example, when you are in Chinese Punctuation mode and you press the $ key, the Chinese currency symbol is committed to the application.
The punctuation keys include: , . / <> :;’”\$!^&_-
The correspondence between English keys and Chinese punctuation is mapped in the following figure.
Four code table input options are available for the input method you select.
Display candidates key by key – This option causes the input method to search a dictionary table when you press a valid key. Candidates for selection then display in the lookup window.
If this option is not active, the character mapped to the key you press appears in the preedit area. When you press the spacebar, the input method engine searches the dictionary table and displays the available candidates for the character in the preedit area.
Display external codes – This option displays the external codes of the candidates in the lookup window.
Automatically commit if only one candidate – This option commits the external code of a character when only one candidate is available. If this option is not selected, the external code of a character appears in a lookup window even when only one candidate is available.
Display keymap character for every external code – This option displays the character mapped to a valid key in the preedit area when you press the key.
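The interaction between two of these options, displaying candidates key by key and automatically committing a single candidate, can be sketched as follows. This is a simplified model under stated assumptions: the class `CodeTableSession`, the table contents, and the placeholder candidate names are all hypothetical and do not reflect the actual dictionary format or engine.

```python
class CodeTableSession:
    """Toy model of a code-table input session with two of the options."""

    def __init__(self, table, key_by_key=True, auto_commit_single=True):
        self.table = table                    # external code -> candidates
        self.key_by_key = key_by_key
        self.auto_commit_single = auto_commit_single
        self.preedit = ""
        self.committed = []

    def _lookup(self):
        candidates = self.table.get(self.preedit, [])
        if self.auto_commit_single and len(candidates) == 1:
            # Only one candidate: commit it without showing a lookup window.
            self.committed.append(candidates[0])
            self.preedit = ""
            return []
        return candidates

    def press(self, key):
        """Handle one key; return the candidates to show in the lookup window."""
        if key == " ":
            return self._lookup()             # spacebar always triggers a search
        self.preedit += key
        return self._lookup() if self.key_by_key else []


# Placeholder table: the codes and candidate names are made up.
TABLE = {"a": ["cand-1", "cand-2"], "ab": ["cand-3"]}
```

With `key_by_key=False`, pressing a only fills the preedit area, and the candidates appear when the spacebar is pressed, which matches the behavior described for the inactive option.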
Click the input method selection item from the utilities menu.
The input method selection panel appears.
Select an input method from the selection panel.
The input method options panel appears.
Select an input method option.
Click OK or Apply to activate the selection.
After you make your input method selections, you can use the information and the procedures in this section to take the following actions:
Activate lookup table selection
Select a virtual keyboard
Create user defined characters
From a lookup table, you can search for and select the Chinese characters you want to input. Three kinds of lookup tables are available:
Lookup tables with native encoding. A lookup table with EUC_CN encoding is provided in the zh_CN.EUC/zh_CN/zh locale. A lookup table with GBK encoding is provided in the zh_CN.GBK/zh.GBK locale, and a lookup table with GB18030 encoding is provided in the zh_CN.GB18030 locale.
Lookup table for special characters, such as Greek characters and mathematical symbols.
Click the lookup item from the selection menu.
When you activate the lookup option, the characters that are available for a string you type in the preedit area display for selection in a lookup choice window.
Type a string you want to convert in the preedit area.
The lookup choice window appears.
You can use the following keys to search through the characters and radicals that are available for your string.
Moves forward to the next page of choices
Moves backward through the choices
Type the number or letter of the label of the lookup choice you want to select.
Your choice is substituted for the string in the preedit area.
You can use virtual keyboards as lookup utilities to simplify the input of certain special symbols.
The Simplified Chinese system supports several virtual keyboards.
The following figure shows the PC virtual keyboard.
The following figure shows the Greek virtual keyboard.
The following figure shows the Russian virtual keyboard.
The following figure shows the ZhuYin virtual keyboard.
The following figure shows the Chinese Punctuation Characters virtual keyboard.
The following figure shows the Number Symbol Lookup virtual keyboard.
The following figure shows the Mathematic Symbol Lookup virtual keyboard.
The following figure shows the Special Symbol Lookup virtual keyboard.
The following figure shows the Table Symbol Lookup virtual keyboard.
Click the virtual keyboard button in the auxiliary window.
The virtual keyboard for the active input method appears.
The user-defined character (UDC) editor tool enables you to draw and save new characters. Once you ascribe a character to an input method, the character can be displayed in an application.
Select the user defined character item on the utility menu to activate the UDC tool.
See Chapter 9, Fonts for more information about user defined characters.
This section describes the input methods and conversion modes that are available for entering ASCII/English, Simplified Chinese, and other characters.
In the zh/zh_CN/zh_CN.EUC locales, you can use the following function keys to access the available input methods:
NewQuanPin, the default input method (F2)
NewShuangPin (F3)
GB2312 (F4)
QuanPin (F5)
ShuangPin (F6)
English_Chinese (F7)
WangMa Wubi (F8)
In the zh.GBK/zh_CN.GBK locales, you can use the following function keys to access the available input methods.
NewShuangPin (F3)
GBK NeiMa (F4)
QuanPin (F5)
ShuangPin (F6)
English_Chinese (F7)
WangMa Wubi (F8)
In the zh_CN.GB18030/zh.UTF-8/zh_CN.UTF-8 locales, you can use the following function keys to access the available input methods.
NewQuanPin, the default input method (F2)
NewShuangPin (F3)
GB18030 NeiMa (F4)
QuanPin (F5)
ShuangPin (F6)
English_Chinese (F7)
WangMa Wubi (F8)
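The three function-key lists above can be summarized as one lookup structure. The sketch below is illustrative only: the names `FUNCTION_KEYS`, `LOCALE_GROUPS`, and `input_method_for` are hypothetical, and the tables contain only the entries that appear in the lists above.

```python
# Function-key assignments copied from the three lists above.
FUNCTION_KEYS = {
    "EUC": {
        "F2": "NewQuanPin", "F3": "NewShuangPin", "F4": "GB2312",
        "F5": "QuanPin", "F6": "ShuangPin", "F7": "English_Chinese",
        "F8": "WangMa Wubi",
    },
    "GBK": {
        # No F2 entry appears in the list for these locales above.
        "F3": "NewShuangPin", "F4": "GBK NeiMa", "F5": "QuanPin",
        "F6": "ShuangPin", "F7": "English_Chinese", "F8": "WangMa Wubi",
    },
    "GB18030/UTF-8": {
        "F2": "NewQuanPin", "F3": "NewShuangPin", "F4": "GB18030 NeiMa",
        "F5": "QuanPin", "F6": "ShuangPin", "F7": "English_Chinese",
        "F8": "WangMa Wubi",
    },
}

LOCALE_GROUPS = {
    "zh": "EUC", "zh_CN": "EUC", "zh_CN.EUC": "EUC",
    "zh.GBK": "GBK", "zh_CN.GBK": "GBK",
    "zh_CN.GB18030": "GB18030/UTF-8",
    "zh.UTF-8": "GB18030/UTF-8", "zh_CN.UTF-8": "GB18030/UTF-8",
}


def input_method_for(locale, key):
    """Return the input method bound to a function key, or None if unlisted."""
    return FUNCTION_KEYS[LOCALE_GROUPS[locale]].get(key)
```

The only difference between the three groups in this structure is the code-based method bound to F4.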
Applications start in ASCII mode and the status area of the application subwindow is blank. You can toggle ASCII mode on or off by pressing Control-spacebar or the Chinese/English key on a Chinese keyboard. When you turn off ASCII input mode, the indicator of the default input mode appears.
This section describes the features in the New QuanPin and New ShuangPin input methods, and how to use some of the features in the zh_CN.EUC and zh_CN.GBK locales.
PinYin is a popular input method in the PRC. Various PinYin-based input methods exist. Two of these input methods, New QuanPin and New ShuangPin, provide support for the following features:
Storing and recalling user-defined phrases
Dynamically adjusting the frequency of lookup choices
Typing PinYin strings up to 222 characters with the New QuanPin input method
Typing ShengMu characters
Entering GBK Hanzi phrases
These features are described in detail in the following sections.
The following describes how to define the phrase ke lin dun and store it for later use.
Select the input method.
Follow the steps in the procedure How to Select an Input Method to select the input method.
Type the phrase kelindun without spaces in the typing area.
The New QuanPin and New ShuangPin input methods insert spaces for you automatically.
Type the number that corresponds to the candidate you want to select.
Select the characters of the second and third parts of the phrase.
The new phrase is defined and added to the user dictionary file. The next time you type ke lin dun, you will see the phrase you defined.
In the New QuanPin and the New ShuangPin input methods, the candidates that you select are moved to the start of the list to facilitate repeated use.
Select the input method.
Follow the steps in the procedure How to Select an Input Method to select the input method.
Type sh yi.
Notice the order of the five available candidates.
Select the fifth candidate.
Type sh yi again.
Notice that the fifth candidate has moved to the first position because you previously selected it. Frequently used candidates are promoted for faster selection.
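The promotion behavior in this procedure resembles a move-to-front list. The following Python sketch shows that idea only. The actual frequency algorithm of New QuanPin and New ShuangPin is not described here, and the function name `select_candidate` and the placeholder candidate names are hypothetical.

```python
def select_candidate(candidates, position):
    """Select the candidate at a 1-based position and move it to the front."""
    chosen = candidates.pop(position - 1)
    candidates.insert(0, chosen)
    return chosen


# Placeholder names stand in for the five Hanzi candidates of "sh yi".
candidates = ["c1", "c2", "c3", "c4", "c5"]
select_candidate(candidates, 5)
print(candidates)  # ['c5', 'c1', 'c2', 'c3', 'c4']
```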
The New QuanPin and New ShuangPin input methods provide support for other useful functionality.
The New QuanPin input method accepts PinYin strings of up to 222 characters. The following string is used in the next figure.
meiguozhongtongkelindunzhengzaitaolunhaiwanjushiwenti
The result is the following Chinese string:
The New ShuangPin input method supports input strings of up to 30 characters.
You can also type ShengMu only. Candidates are supplied for ShengMu, as shown in the following figure.
The zh_CN.GBK locale supports GBK by default, as shown in the following illustration:
The second Chinese character in the following figure is defined only in the GBK standard.
Single GBK candidates are placed at the end of the list of candidates. Press Return to scroll to the GBK area.
For faster selection next time, you can define the GBK candidate as a phrase. For more information, see How to Define Phrases for Later Use.
Both New QuanPin and New ShuangPin support GBK Hanzi by default in the zh.GBK locale. However, because several Hanzi have the same ShengMu (the first part of PinYin), New QuanPin and New ShuangPin do not display GBK candidates if you provide only the ShengMu.
For example, typing the string rong will display GBK candidates because the string is a complete PinYin string. However, typing r alone will not display any GBK candidates because the string is only a ShengMu string.
This section describes the keyboard definitions that are used for the New QuanPin and New ShuangPin input methods.
The following table shows the definitions of the edit keys.
The preedit line is a normal X text field.
Key | Definition |
---|---|
[a-z] | PinYin character. |
Home | Moves to the start of the preedit line. |
End | Moves to the end of the preedit line. |
Left | Moves the caret in the preedit line to the left. If the left character is Hanzi, the original PinYin is displayed. |
Right | Moves the caret in the preedit line to the right. |
Delete | Deletes the PinYin character following the caret on the preedit line. |
Backspace | Deletes the PinYin character preceding the caret on the preedit line. |
The candidates of a PinYin string belong to the following groups:
G1 – Highest frequency Hanzi + Long (3 or more) Cizu + Double Chinese Cizu
G2 – GB Single Hanzi
G3 – GBK Single Hanzi (in the zh_CN.GBK locale)
Some PinYin strings might have more candidates than can be displayed in the same window. In that case, use the keys described in the following table to scroll through the available candidates.
Table 4–2 Page Scroll Key Definitions
Key | Definition |
---|---|
- = | Scrolls to previous/next candidate |
[ ] | Scrolls to previous/next candidate |
, . | Scrolls to previous/next candidate |
Return | Quickly scrolls through all candidates |
New QuanPin and New ShuangPin use the numeric selection keys.
In accordance with the national PinYin standard, the separator (') is supported to avoid ambiguous interpretations of PinYin strings. For example, the PinYin string [jiang] can be interpreted as [jiang] or [ji][ang]. Both spellings are valid. In New QuanPin, however, [jiang] is interpreted only as [jiang]. You must use the separator and enter [ji'ang] for the string to be interpreted as [ji] and [ang]. New ShuangPin does not require the use of separators.
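One simple way to reproduce the behavior described for New QuanPin is greedy longest-match segmentation, with the separator acting as a forced syllable boundary. The sketch below assumes that approach, which might differ from the actual implementation. The syllable set is a tiny illustrative subset, and the names `SYLLABLES` and `segment` are hypothetical.

```python
# Tiny illustrative subset of valid PinYin syllables.
SYLLABLES = {"ji", "ang", "jiang", "xi", "an", "xian"}


def segment(pinyin):
    """Split a PinYin string into syllables, longest match first.

    An apostrophe forces a syllable boundary.
    """
    result = []
    for chunk in pinyin.split("'"):
        i = 0
        while i < len(chunk):
            # Try the longest possible syllable starting at position i.
            for j in range(len(chunk), i, -1):
                if chunk[i:j] in SYLLABLES:
                    result.append(chunk[i:j])
                    i = j
                    break
            else:
                raise ValueError(f"cannot segment {chunk[i:]!r}")
    return result


print(segment("jiang"))   # ['jiang']
print(segment("ji'ang"))  # ['ji', 'ang']
```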
New QuanPin and New ShuangPin share two dictionary files: PyCiku.dat and UdCiku.dat. In the zh_CN.EUC and zh_CN.GBK locales, the default path names are /usr/lib/im/locale/zh_CN/data/PyCiku.dat and /usr/lib/im/locale/zh_CN/data/UdCiku.dat.
Users cannot normally write to these files. However, because users can affect the way New QuanPin and New ShuangPin work through features such as frequency adjustment and user-defined phrases, you should update the dictionary files frequently.
A user's dictionary is normally located in ~/.Xlocale/PyCiku.dat or ~/.Xlocale/UdCiku.dat. The tilde (~) indicates the home directory of the user who starts the htt command. When you start New QuanPin and New ShuangPin input methods, the system locates and reads the dictionary files in the user's home directory. If a dictionary file is not found, the following system default path is used:
/usr/lib/im/locale/zh_CN/…
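The search order described above, the user's home directory first and then the system default, can be sketched as follows. This is an illustration rather than the input method's code: the function `find_dictionary` and its parameters are hypothetical, and the system directory is taken from the default path names given earlier in this section.

```python
from pathlib import Path

# Default system location, from the path names given earlier in this section.
SYSTEM_DIR = Path("/usr/lib/im/locale/zh_CN/data")


def find_dictionary(filename, home=None, system_dir=SYSTEM_DIR):
    """Return the user's copy of a dictionary if it exists, else the system path."""
    home = Path(home) if home is not None else Path.home()
    user_copy = home / ".Xlocale" / filename
    if user_copy.is_file():
        return user_copy
    return Path(system_dir) / filename
```

Calling `find_dictionary("UdCiku.dat")` for a user without a ~/.Xlocale directory returns the path under the system directory.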
ShuangPin is an abbreviated form of QuanPin. ShuangPin is faster but more difficult to use than QuanPin. New ShuangPin supports all of the features, keyboard definitions, and dictionary files of New QuanPin.
Various ShuangPin keyboard mapping designs exist in the PRC. The three most popular designs are ZiRanMa, Chinese Star, and Intelligent_ABC. The New ShuangPin input method supports all three of these keyboard mappings.
The following tables contain keyboard mappings for the ZiRanMa, Chinese Star, and Intelligent_ABC keyboards.
Table 4–3 ZiRanMa Keyboard Mapping
Key | Definition |
---|---|
i | ch |
u | sh |
v | zh |
a | a |
b | ou |
c | iao |
d | uang, iang |
e | e |
f | en |
g | eng |
h | ang |
i | i |
j | an |
k | ao |
l | ai |
m | ian |
n | in |
o | o, uo |
p | un |
q | iu |
r | uan, er |
s | iong, ong |
t | ue |
u | u |
v | v, ui |
w | ua, ia |
x | ie |
y | uai, ing |
z | ei |
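The ZiRanMa mapping in Table 4–3 can be read as two lookups: the first key gives the ShengMu (initial) and the second key gives one or more possible finals. The Python sketch below expands a two-key sequence into its possible QuanPin spellings. It makes no attempt to discard spellings that are not valid PinYin, which a real input method would have to do, and the names `ZIRANMA_INITIALS`, `ZIRANMA_FINALS`, and `expand` are hypothetical.

```python
# First key: only these three keys differ from the plain initial letter.
ZIRANMA_INITIALS = {"i": "ch", "u": "sh", "v": "zh"}

# Second key: finals as listed in Table 4-3, in table order.
ZIRANMA_FINALS = {
    "a": ["a"], "b": ["ou"], "c": ["iao"], "d": ["uang", "iang"],
    "e": ["e"], "f": ["en"], "g": ["eng"], "h": ["ang"], "i": ["i"],
    "j": ["an"], "k": ["ao"], "l": ["ai"], "m": ["ian"], "n": ["in"],
    "o": ["o", "uo"], "p": ["un"], "q": ["iu"], "r": ["uan", "er"],
    "s": ["iong", "ong"], "t": ["ue"], "u": ["u"], "v": ["v", "ui"],
    "w": ["ua", "ia"], "x": ["ie"], "y": ["uai", "ing"], "z": ["ei"],
}


def expand(pair):
    """Expand a two-key ShuangPin sequence into candidate QuanPin spellings."""
    if len(pair) != 2:
        raise ValueError("a ShuangPin sequence has exactly two keys")
    first, second = pair
    initial = ZIRANMA_INITIALS.get(first, first)
    return [initial + final for final in ZIRANMA_FINALS[second]]


print(expand("uh"))  # ['shang']
print(expand("jd"))  # ['juang', 'jiang'], of which only 'jiang' is valid PinYin
```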
Table 4–4 CStar2.97 Keyboard Mapping
Key | Definition |
---|---|
u | ch |
i | sh |
v | zh |
a | a |
b | ia, ua |
c | uan |
d | ao |
e | e |
f | an |
g | ang |
h | iang, uang |
i | i |
j | ian |
k | iao |
l | in |
m | ie |
n | iu |
o | o, uo |
p | ou |
q | er, ing |
r | en |
s | ai |
t | eng |
u | u |
v | v, ui |
w | ei |
x | uai, ue |
y | iong, ong |
z | un |
Table 4–5 Intelligent ABC Keyboard Mapping
Key | Definition |
---|---|
i | ch |
u | sh |
v | zh |
a | a |
b | ou |
c | in, uai |
d | ua, ia |
e | e |
f | en |
g | eng |
h | ang |
i | i |
j | an |
k | ao |
l | ai |
m | ue, ui |
n | un |
o | o, uo |
p | uan |
q | ei |
r | iu, er |
s | ong, iong |
t | uang, iang |
u | u |
v | v |
w | ian |
x | ie |
y | ing |
z | iao |
The GBK code input method uses the GBK code defined by the Chinese Internal Code Specification. This method includes all of the Chinese characters and symbols in GB2312-80, and other CJK Chinese characters in GB 13000-1. Each Chinese character or symbol is identified by a four-digit hexadecimal internal code defined in the Chinese Internal Code Specification.
This procedure describes how to use the GBK codes to type Chinese characters and symbols.
Open a Terminal window.
In the Terminal window, press Control-spacebar to turn on Chinese input conversion.
Press F4 to select the GBK code input method.
The status area shows that the GBK code input mode is on.
Type the first three of the four keys that represent the character to display. In this example, type b0a of the string b0a1.
The first three letters are visible in the preedit area.
Type the fourth key.
The character automatically replaces the letters in the preedit area.
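The keystroke sequence in this procedure can be modeled with Python's built-in `gbk` codec: the hexadecimal digits accumulate in the preedit area, and the fourth digit completes a two-byte code that is decoded and committed. The class `HexCodeInput` is a hypothetical illustration, not the input method engine, and it assumes every key is a valid hexadecimal digit.

```python
class HexCodeInput:
    """Collects hexadecimal keystrokes and commits a character on the fourth."""

    def __init__(self, encoding="gbk"):
        self.encoding = encoding
        self.preedit = ""

    def press(self, key):
        """Return the committed character, or None while the code is incomplete."""
        self.preedit += key.lower()
        if len(self.preedit) < 4:
            return None
        code, self.preedit = self.preedit, ""
        # Four hex digits form two bytes, which the codec maps to one character.
        return bytes.fromhex(code).decode(self.encoding)


im = HexCodeInput()
for key in "b0a":
    im.press(key)        # preedit now holds "b0a"
print(im.press("1"))     # commits U+554A, the character with GBK code b0a1
```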
The GB2312 code input method uses the GBK code defined by the Chinese Internal Code Specification. This specification includes all of the Chinese characters and symbols in GB2312-80, and other CJK Chinese characters in GB 13000-1. Each Chinese character or symbol is identified by a four-digit hexadecimal internal code defined in the Chinese Internal Code Specification.
This procedure describes how to use the GB2312 codes to type Chinese characters and symbols.
Select the input method.
Follow the steps in the procedure How to Select an Input Method to select the input method.
The status area shows that the GB2312 code input mode is on.
Type the first three of the four keys that represent the character to display. In this example, type b0a of the string b0a1.
The first three letters are visible in the preedit area.
Type the fourth key.
The character automatically replaces the letters in the preedit area.
The GB18030 code input method uses the GB18030 code defined by the Chinese Internal Code Specification. This method includes all of the Chinese characters and symbols in GB2312-80, and other CJK Chinese characters in GB 18030. Each Chinese character or symbol is identified by a four-digit or eight-digit hexadecimal internal code defined in the Chinese Internal Code Specification.
This procedure describes how to use the GB18030 codes to type Chinese characters and symbols.
Select the input method.
Follow the steps in the procedure How to Select an Input Method to select the input method.
The status area shows that the GB18030 code input mode is on.
For example, to input the Chinese GB18030 character with code 0xb0a1, press the first three of the four keys that represent the character to display. In this example, type b0a of the string b0a1.
The first three letters are visible in the preedit area.
Type the fourth key.
The character automatically replaces the letters in the preedit area.
To input a Chinese GB18030 character with code 0x82358538, press the first seven of the eight keys that represent the character to display. In this example, type 8235853 of the string 82358538.
The first seven numbers are visible in the preedit area.
Type the last key.
The character is automatically committed to the window.
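Whether a GB18030 code needs four or eight hexadecimal digits can be decided from its second byte: in GB18030, the second byte of a four-byte code lies in the range 0x30 to 0x39, while the second byte of a two-byte code is 0x40 or higher. The sketch below uses that rule together with Python's built-in `gb18030` codec. The function names are hypothetical, and how the Solaris input method itself makes this decision is not stated here.

```python
def expected_digits(preedit):
    """Return 4 or 8 once the first four hex digits are known, else None."""
    if len(preedit) < 4:
        return None
    second_byte = int(preedit[2:4], 16)
    # A second byte in 0x30..0x39 marks a four-byte (eight-digit) code.
    return 8 if 0x30 <= second_byte <= 0x39 else 4


def decode_gb18030(hex_code):
    """Decode a complete four-digit or eight-digit GB18030 code."""
    if len(hex_code) != expected_digits(hex_code):
        raise ValueError(f"incomplete or overlong code: {hex_code}")
    return bytes.fromhex(hex_code).decode("gb18030")


print(expected_digits("b0a1"))      # 4
print(expected_digits("8235"))      # 8
print(decode_gb18030("82358538"))   # one character from the four-byte range
```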
The QuanPin input method requires up to six keystrokes to type each Chinese PinYin character. QuanPin maps PinYin phonetics to single lowercase Roman letters. You can use the QuanPin input method to type individual Chinese characters in both the zh_CN.EUC and zh_CN.GBK locales.
This procedure describes how to use the QuanPin input method to type the character that represents the Full PinYin word fang. For information on making the lookup choices used in this procedure, see How to Search and Select Lookup Choices.
Select the input method.
Follow the steps in the procedure How to Select an Input Method to select the input method.
Type the four keystrokes fang.
Type 1 to select the corresponding GBK Chinese character in the lookup choice list.
Your choice is substituted for the Full PinYin string in the preedit area.
You can use the English_Chinese input method in both zh_CN.EUC and zh_CN.GBK locales. With this method, you type English words of up to 15 keystrokes that are mapped to Chinese phrases. For each keystroke, a lookup window displays characters that match your input. To select a character, you type the number that corresponds to your lookup choice. For more information, see How to Search and Select Lookup Choices.
The following procedure shows you how to use this input method to enter the Simplified Chinese phrase for the English word, world.
Select the input method.
Follow the steps in the procedure How to Select an Input Method to select the input method.
Type the five keystrokes world.
Type 3 to select the corresponding Chinese phrase from the lookup choice list.
Your choice is substituted for the English string in the preedit area.
You can use the wildcard characters asterisk (*) or question mark (?) to search a system dictionary. The * stands for one or more letters. The ? represents exactly one letter.
To search for all the English words that end with lution, type *lution. The lookup choice window appears as shown in the following figure.
To search for all three-letter English words that begin with c, type c??.
The lookup choice window appears as shown in the following figure.
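The wildcard rules can be expressed as a translation to a regular expression. Because * stands for one or more letters here, the sketch below does not use Python's `fnmatch` module, whose * also matches an empty string. The word list is a made-up sample, not the system dictionary, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (one or more letters) and ? (one letter) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[a-z]+")
        elif ch == "?":
            parts.append("[a-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def search(words, pattern):
    """Return the words that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]


WORDS = ["solution", "revolution", "cat", "cup", "cart", "dog"]
print(search(WORDS, "*lution"))  # ['solution', 'revolution']
print(search(WORDS, "c??"))      # ['cat', 'cup']
```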
Wubi is a popular input method in China. The encoding rule used in the Wubi input method is based on the radical or stroke shape of Chinese characters.
One of the main advantages of Wubi and other shape-based input methods is a very low repetition rate. The lower repetition rate, a feature not found in PinYin-based input systems, means that only one or two Chinese characters are represented by a Wubi key sequence. Because a single Wubi code seldom represents more than one character, you can enter text more quickly.
Wubi is built on the GB18030-2000 character set standard, a graphemic encoding system. Almost all Chinese, Kanji, and Hanja characters can be encoded with the GB18030-2000 standard.
This section describes the following features included in this release.
GB18030-2000 character set support
Easy character set switching
New radical mechanism for Simplified and Traditional Chinese
Three-level progressive identification code
Phrase input and professional word galleries
Help key
Fault tolerance code
Word-phrase association
Properties settings
The GB18030-2000 character set is a national encoding standard issued by the Chinese government in 2000. The encoding length set by the standard is one, two, or four bytes. GB18030-2000 includes 6,763 standard Simplified Chinese characters, 13,053 Traditional Chinese (Big5) characters, 3,000 characters used in Hong Kong, and 21,003 GBK characters. The Wubi input method supports the GB18030-2000 character set, which makes it easy to work with the smaller character sets contained in GB18030-2000. See Easy Character Set Switching.
For example, if you type the letters gigg and scroll pages to the end, you will find a GB18030 character shown in the following figures:
Solaris WangMa Wubi divides the GB18030-2000 character set into smaller sets of commonly used Chinese characters.
GB2312, which contains 6,763 characters
GBK, which contains 21,003 characters
GB18030-2000, which contains 27,533 characters
When you enter text, you can use the following keyboard shortcuts to switch between character sets.
To use the GB2312 character set, press Control-Shift-1.
To use the GBK character set, press Control-Shift-2.
To use the GB18030-2000 character set, press Control-Shift-3.
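The nesting of the three character sets can be explored with Python's built-in `gb2312`, `gbk`, and `gb18030` codecs by testing which is the first codec able to encode a character. This is only an approximation of the sets that Wubi switches between: the Python codecs follow the Python runtime's own tables, and the `gb18030` codec accepts any Unicode character, so the last tier here means everything outside GBK. The function name `smallest_charset` is hypothetical.

```python
def smallest_charset(ch):
    """Name the smallest of the three sets whose Python codec can encode ch."""
    tiers = (
        ("GB2312", "gb2312"),
        ("GBK", "gbk"),
        ("GB18030-2000", "gb18030"),
    )
    for name, codec in tiers:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue        # not in this set, try the next larger one
        return name
    return None


print(smallest_charset("\u554a"))  # GB2312
print(smallest_charset("\u9555"))  # GBK
print(smallest_charset("\u3400"))  # GB18030-2000
```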
Because GB18030-2000 is a relatively new standard, support in Wubi for the GB2312 and GBK character sets ensures backward-compatibility with earlier standards. You might prefer to work in the GB2312 or GBK character set because of improved performance and lower repetition rates.
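The nesting of the three character sets can be illustrated with a Python sketch that reports the smallest set able to encode a character. It assumes that Python's gb2312, gbk, and gb18030 codecs approximate the sets that Wubi switches between; the exact repertoires and character counts may differ from the figures given above.

```python
# Sketch: find the smallest character set that contains a character.
# Assumption: Python's codecs approximate the GB2312, GBK, and
# GB18030 sets that Wubi switches between.
CHARSETS = ["gb2312", "gbk", "gb18030"]  # ordered from smallest to largest


def smallest_charset(ch):
    """Return the first codec name that can encode ch, or None."""
    for name in CHARSETS:
        try:
            ch.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return None


for ch in ["\u4e2d", "\u9f8d", "\u3400"]:
    print(f"U+{ord(ch):04X}: {smallest_charset(ch)}")
```

A character that is reported as gb2312 is available in all three modes, while a character reported as gb18030 can be entered only in the GB18030-2000 mode.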
The new radical, or root, mechanism is a patented technology invented by Professor Wang Yongmin, the inventor of Wubi. Professor Wang developed the mechanism from version 86, the old radical system. The mechanism has evolved into a new encoding system compatible with both Simplified and Traditional Chinese. Users of Wubi version 86 can work with three times more characters, using the same encoding and typing rules, without additional training.
One of the main features of Wubi is the last-stroke grapheme identification codes that distinguish between characters of a similar shape. The identification codes are assigned according to the shape of the last radical of the character. The purpose of identification codes is to help users master the Wubi input method at three different levels.
In level A, for beginning users, all three graphemic types with fewer than four codes have identification codes.
In level B, for intermediate users, only the left-right shaped Chinese characters have identification codes.
In level C, for advanced users, identification codes are not used.
Wubi supports phrase input: entire phrases, not only individual characters, can be assigned Wubi codes. Besides 90,000 basic phrases, Wubi provides 11 professional word galleries, similar to glossaries, which cover the following industries:
Traffic and transportation
Computer and household electronics
Economy and finance
Medicine and health
Mining and metallurgy
Foreign trade and travel
Military affairs and national defense
Law and aesthetics
Galleries also exist for place names and for idioms.
In the Preferences dialog box, you can select word galleries that contain between 3,000 and 20,000 entries.
For example, when you choose the Medicine and Health phrase gallery and type the word mino, medical phrases are listed for selection.
The Solaris Wubi input method supports encoding hint features. As you type, the character encoding appears in the Select Repetition Code Window. This feature can help you master the encoding methods and codes of Chinese characters. In addition, you can use the uppercase or lowercase Z key as a wildcard at any time. Z is the only key not mapped to a character in Wubi. To help you learn to use Wubi, you can press the Z key to query the system for input codes.
For example, you can type azzd to search for all characters or phrases with a Wubi code that begins with the letter A and ends with the letter D.
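The wildcard lookup can be sketched in Python as a pattern match. This is a minimal illustration, assuming that each Z stands for exactly one key; the list of input codes is hypothetical and is not taken from a real Wubi table.

```python
import re


def wildcard_to_regex(query):
    """Translate a query such as 'azzd' into a compiled pattern.

    Assumption: each Z (upper or lower case) stands for exactly one
    of the mapped keys a to y.
    """
    parts = ["[a-y]" if key in "zZ" else re.escape(key.lower()) for key in query]
    return re.compile("".join(parts))


def lookup(query, codes):
    """Return the codes that match the whole query."""
    pattern = wildcard_to_regex(query)
    return [code for code in codes if pattern.fullmatch(code)]


# Hypothetical input codes, for illustration only.
codes = ["aaad", "abcd", "axd", "abce", "bbcd"]
matches = lookup("azzd", codes)
print(matches)
```

Only the four-key codes that begin with a and end with d match; the three-key code axd does not, because each wildcard is assumed to consume one key.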
According to the preferences you set, the fault tolerance code feature can increase the probability that the system will provide the correct character even when you make a typing mistake.
The word-phrase association feature is another productivity aid. The system provides a list of characters that are most likely to follow the character just selected. Instead of typing a code, you choose the correct character from this list. This feature is also set in the Preferences dialog box.
For example, when you type the letters iuxx, the Chinese character 滋 is automatically committed to the application. After the character appears in the application window, a new candidate window opens and lists the phrases that begin with this character.
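The association step can be pictured as a prefix search over a phrase list. The Python sketch below is only an illustration of that idea: the phrase list is hypothetical, and the real input method may rank or filter candidates differently.

```python
def associated_phrases(committed, phrases):
    """Return the phrases that begin with the committed character."""
    return [p for p in phrases if len(p) > 1 and p.startswith(committed)]


# Hypothetical phrase list, for illustration only.
phrases = ["滋味", "滋润", "味道", "滋养", "营养"]
candidates = associated_phrases("滋", phrases)
print(candidates)
```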
You can make the following settings in the Properties dialog box:
Character sets: GB2312, GBK, or GB18030
Professional word galleries
Identification code mode
Display the Wubi code for a candidate
Display the candidates after each keystroke
Association of characters with phrases
Fault tolerance code
Display characters and phrases with the same code
Display the key prompt in the preedit area
The current Solaris Operating System provides a code table input method interface that enables Chinese users to add new input methods to their system.
A code table is a plain text dictionary file that contains a list of Chinese characters, words, and phrases that are mapped to input keystrokes. When you type the specified keystrokes, the associated characters, words, and phrases appear for selection.
The code table file contains the following sections:
[Description] This section lists the distinguishing characteristics of the code table.
Name: Code table name.
Encode: UTF-8, GB, GB2312, GBK, or BIG5 encoding used by the code table.
WildChar: Wild character used for input codes.
UsedCodes: Valid characters for input.
MaxCodes: Maximum number of input codes for one item.
[Comment]
[Key_Prompt] This section identifies the prompt string of an input key. The prompt string appears in the preedit area of the application subwindow.
[Function_Key] This section describes the behavior of specified function keys.
PageUp: Scroll up a list of selection items.
PageDown: Scroll down a list of selection items.
BackSpace: Delete an input code.
ClearAll: Clear all the input areas, such as the preedit area and the lookup area.
[Phrase] This section associates input codes with corresponding Chinese phrases. The phrases must be separated by spaces. The format of each line is: keystroke_sequence word1 word2 word3 ...
[Single] This section associates input codes with corresponding Chinese characters. The format of each line is: keystroke_sequence Characterlist. The characters of the Characterlist are not separated by spaces.
[Options] This section specifies the options that you can turn on or off for the code table input method.
HelpInfo_Mode: Display help information.
KeyByKey_Mode: Display lookup candidates key by key or only when the spacebar is pressed.
KeyPrompt_Mode: Display the prompt string of the input key in the preedit area.
AutoSelect_Mode: Commit the lookup choice automatically when only a single candidate is available.
SelectKey_Mode: Select numbers, uppercase letters, or lowercase letters.
The following example shows a code table file.
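As a rough illustration of the section layout described above, the following Python sketch embeds a small code table and parses it. The table contents, the name newim, and the exact spelling of the section headers and key-value lines are assumptions made for this sketch; check them against a code table shipped with your system before relying on them.

```python
# Hypothetical code table text; the syntax details are assumptions.
SAMPLE = """\
[Description]
Name: newim
Encode: UTF-8
WildChar: ?
UsedCodes: abcdefghijklmnopqrstuvwxyz
MaxCodes: 4

[Single]
a 啊阿
ai 爱矮

[Phrase]
zg 中国 祖国
"""


def parse_code_table(text):
    """Parse code table text into a dictionary keyed by section name."""
    table = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            table[section] = {}
        elif section == "Phrase":
            # keystroke_sequence word1 word2 ... (space separated)
            keys, _, rest = line.partition(" ")
            table[section][keys] = rest.split()
        elif section == "Single":
            # keystroke_sequence Characterlist (no spaces in the list)
            keys, _, rest = line.partition(" ")
            table[section][keys] = list(rest.strip())
        elif section is not None:
            # Name: value lines in the remaining sections
            name, _, value = line.partition(":")
            table[section][name.strip()] = value.strip()
    return table


table = parse_code_table(SAMPLE)
print(table["Description"]["Name"], table["Single"]["ai"], table["Phrase"]["zg"])
```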
Create and edit the code table source file.
Prepare the code table source file to define the mapping of characters, words, or phrases to input keystrokes.
Convert the source code table file to binary format.
Use the txt2bin utility to convert the code table text file to a binary file.
# /usr/lib/im/locale/zh_CN/common/txt2bin source_codetable_file binary_codetable_file
The txt2bin and bin2txt utilities are in the directory /usr/lib/im/locale/zh_CN/common/.
Add the code table to the input method specification file, /usr/lib/im/locale/zh_CN/sysime.cfg.
For example, if your new code table binary file is called newim.data, add the entry newim to the input method specification file, sysime.cfg.
Restart the htt input method server by typing the following commands as root.
# /etc/init.d/IIim stop
# /etc/init.d/IIim start
The new input method is ready to use when you log in to the system.
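The file names involved in this procedure can be derived with a small Python helper. The sketch only builds the command line and the entry name; it does not run txt2bin or edit sysime.cfg. The source file name newim.txt is hypothetical, and the rule that the entry is the binary file name without its extension is an assumption based on the single newim.data example above.

```python
import posixpath

TXT2BIN = "/usr/lib/im/locale/zh_CN/common/txt2bin"


def txt2bin_command(source_file, binary_file):
    """Build the argument list for the conversion step (not executed here)."""
    return [TXT2BIN, source_file, binary_file]


def ime_entry_name(binary_file):
    """Derive the sysime.cfg entry from the binary file name.

    Assumption: the entry is the file name without its extension,
    as in the newim.data example.
    """
    return posixpath.splitext(posixpath.basename(binary_file))[0]


command = txt2bin_command("newim.txt", "newim.data")  # newim.txt is hypothetical
entry = ime_entry_name("newim.data")
print(" ".join(command))
print(entry)
```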
The following sections in this chapter describe the utilities and applications that you use in the Simplified Chinese Solaris Operating System.
The sdtconvtool graphical user interface utility enables file conversion between various code sets. The sdtconvtool functionality is similar to iconv.
The following figure shows the sdtconvtool panel.
Select the code set of the file to be converted.
Scroll through the pull-down list and select the code set of the file to be converted.
Enter the path of the file to be converted.
You can enter the path manually in the source file path area, or you can use the browse button to find and select the file.
Select the code set for the converted file.
Select the target code set.
Enter the path for the target file.
You can enter the path manually in the target file path area, or you can use the browse button to find and select the file.
Click the start conversion button.
The iconv command converts the characters or sequences of characters in a file from one code set to another. The command then writes the results to standard output. The Simplified Chinese Solaris software includes special filters for the iconv command.
If no conversion exists for a particular character, the character is converted to the underscore _ in the target code set. The following options are supported:
-f from-code — Symbol of the input code set
-t to-code — Symbol of the output code set
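The conversion behavior described above, including the underscore substitution for characters that have no conversion, can be imitated in Python. This is a sketch of the idea rather than of iconv itself: it uses Python's own codec names (utf-8, gb2312) instead of the Solaris symbols, and the error handler name underscore is made up for this example.

```python
import codecs


def _underscore(error):
    """Replace each unconvertible character with "_" and continue."""
    if isinstance(error, UnicodeEncodeError):
        return ("_" * (error.end - error.start), error.end)
    raise error


# "underscore" is a handler name chosen for this sketch.
codecs.register_error("underscore", _underscore)


def convert(data, from_code, to_code):
    """Decode bytes from one code set and encode them in another."""
    text = data.decode(from_code)
    return text.encode(to_code, errors="underscore")


# The Hangul syllable has no GB2312 equivalent, so it becomes "_".
source = "中文한".encode("utf-8")
result = convert(source, "utf-8", "gb2312")
print(result.decode("gb2312"))
```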
The following tables list the code set conversion modules that are supported in Simplified Chinese Solaris software. For more information, see iconv(1) in the Solaris 10 Reference Manual Collection.
Table 5–1 Simplified Chinese iconv Code Conversion Modules (zh locale)
Code | Symbol | Target Code | Target Symbol |
---|---|---|---|
ISO 2022-CN | zh_CN.iso2022-CN | UTF-8 | UTF-8 |
UTF-8 | UTF-8 | ISO 2022-CN | zh_CN.iso2022-CN |
zh.GBK | zh_CN.gbk | ISO 2022-CN | zh_CN.iso2022-CN |
zh.GBK | zh_CN.gbk | UTF-8 | UTF-8 |
GB2312-80 | zh_CN.euc | ISO 2022-7 | zh_CN.iso2022-7 |
ISO 2022-7 | zh_CN.iso2022-7 | GB2312-80 | zh_CN.euc |
GB2312-80 | zh_CN.euc | ISO 2022-CN | zh_CN.iso2022-CN |
ISO 2022-CN | zh_CN.iso2022-CN | GB2312-80 | zh_CN.euc |
UTF-8 | UTF-8 | GB2312-80 | zh_CN.euc |
GB2312-80 | zh_CN.euc | UTF-8 | UTF-8 |
GB2312-80 | zh_CN.euc | BIG5 | zh_TW-big5 |
BIG5 | zh_TW-big5 | GB2312-80 | zh_CN.euc |
HZ-GB-2312 | HZ-GB-2312 | GB2312-80 | zh_CN.euc |
GB2312-80 | zh_CN.euc | HZ-GB-2312 | HZ-GB-2312 |
Table 5–2 Simplified Chinese iconv Code Conversion Modules (zh.GBK locale)
Code | Symbol | Target Code | Target Symbol |
---|---|---|---|
UTF-8 | UTF-8 | GBK | zh_CN.gbk |
GBK | zh_CN.gbk | UTF-8 | UTF-8 |
GBK | zh_CN.gbk | BIG5P | zh_TW-big5p |
GBK | zh_CN.gbk | BIG5HK | zh_TW-big5hk |
GBK | zh_CN.gbk | ISO 2022-CN | zh_CN.iso2022-CN |
ISO 2022-CN | zh_CN.iso2022-CN | GBK | zh_CN.gbk |
GBK | zh_CN.gbk | BIG5 | zh_TW-big5 |
BIG5 | zh_TW-big5 | GBK | zh_CN.gbk |
BIG5P | zh_TW-big5p | GBK | zh_CN.gbk |
BIG5HK | zh_TW-big5hk | GBK | zh_CN.gbk |
HZ-GB-2312 | HZ-GB-2312 | GBK | zh_CN.gbk |
GBK | zh_CN.gbk | HZ-GB-2312 | HZ-GB-2312 |
HZ-GB-2312 | HZ-GB-2312 | UTF-8 | UTF-8 |
UTF-8 | UTF-8 | HZ-GB-2312 | HZ-GB-2312 |
Table 5–3 Simplified Chinese iconv Code Conversion Modules (zh_CN.GB18030 locale)
Code | Symbol | Target Code | Target Symbol |
---|---|---|---|
UTF-8 | UTF-8 | GB18030-2000 | zh_CN.gb18030 |
GB18030-2000 | zh_CN.gb18030 | UTF-8 | UTF-8 |
GB18030-2000 | zh_CN.gb18030 | BIG5HK | zh_HK-big5hk |
GB18030-2000 | zh_CN.gb18030 | BIG5P | zh_TW-big5p |
BIG5HK | zh_HK-big5hk | GB18030-2000 | zh_CN.gb18030 |
BIG5P | zh_TW-big5p | GB18030-2000 | zh_CN.gb18030 |
The following iconv code conversion modules are located in /usr/lib/iconv:
For the zh locale:
zh_CN.euc%zh_TW-big5.so
zh_TW-big5%zh_CN.euc.so
For the zh.GBK locale:
UTF-8%zh_CN.gbk.so
zh_CN.gbk%UTF-8.so
zh_CN.gbk%zh_CN.iso2022-CN.so
zh_CN.iso2022-CN%zh_CN.gbk.so
zh_CN.gbk%zh_TW-big5.so
zh_TW-big5%zh_CN.gbk.so
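The module names listed above appear to follow the pattern from-symbol%to-symbol.so. The Python sketch below builds a module path under that assumption; the pattern is inferred from the listing and is not a documented interface.

```python
import posixpath

ICONV_DIR = "/usr/lib/iconv"


def module_path(from_symbol, to_symbol):
    """Build the expected module path for one conversion direction.

    Assumption: modules are named <from>%<to>.so, as the listing suggests.
    """
    return posixpath.join(ICONV_DIR, f"{from_symbol}%{to_symbol}.so")


path = module_path("zh_CN.euc", "zh_TW-big5")
print(path)
```

Each direction has its own module, so converting back from Big5 to EUC uses a separate file with the two symbols swapped.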
In the following example, an EUC mail file is converted to ISO 2022-CN:
system% iconv -f zh_CN.euc -t zh_CN.iso2022-CN mail.euc > mail.iso2022-CN
For further information, see the iconv(3C) and the iconv_zh(5) man pages. These utilities can be used for converting files for printing.
The Simplified Chinese Solaris Operating System supports printing Simplified Chinese output through the following types of printers:
Line printer with built-in Simplified Chinese fonts
PostScript-based printer with built-in scalable fonts
Any PostScript-based printer for bitmap printing
Review the manufacturer's documentation on installing the printer before you complete the procedures in this chapter.
For the Simplified Chinese Solaris Operating System to run a line printer, the printer must recognize EUC.
A printer that does not support EUC needs filters that convert EUC files for printing. Use the commands in this section to print EUC files to non-EUC printers.
The following commands install the printer lp1 on port ttya. The commands signal the print service that lp1 accepts only GB format files.
# lpadmin -p lp1 -v /dev/ttya -I GB
# accept lp1
# enable lp1
See the lpadmin(1M) man page for more information.
You can use the lpfilter command shown in the following example to print files with formats that are not supported by the printer. The command line signals the print service that a converter called filter-name is available through the filter description file named in pathname.
# lpfilter -f filter-name -F pathname
The following example shows the contents of pathname for a converter called euctogb. The filter converts the default input type to GB with the euctogb converter.
Input types: simple
Output types: GB
Command: euctogb
To print an EUC file, use a command line such as the following.
system% lp EUC-filename
To print a GB format file, use a command line such as the following.
system% lp -T GB GB-filename
An application must have the mp utility to print Simplified Chinese characters.
The mp utility supports all Asian locales, including UTF-8 locales. As a printing filter, mp generates a properly formatted version of the file content in PostScript format. Depending on the locale's system font configuration for mp, the PostScript output file contains glyph images from a scalable or a bitmap system font. The mp utility is enhanced in this release to print files of a certain type for each locale. For more information, see the mp(1) man page.
You can use a command such as the following to print a file with Simplified Chinese characters. The file might also include ASCII/English characters.
system% mp filename | lp -d printer