International Language Environments Guide

Chinese Text

Written Chinese usually consists entirely of Hanzi, the characters of an ideographic script.

In the People's Republic of China (PRC), the GB2312 charset (zh locale) contains about 7000 commonly used Hanzi characters, the GBK charset (zh.GBK locale) contains more than 20,000, and the GB18030-2000 charset (zh_CN.GB18030 locale) contains about 30,000, including all CJK Extension A characters defined in Unicode 3.0.
In Taiwan, the most frequently used charsets are CNS11643-1992 (zh_TW locale) and Big5 (zh_TW.BIG5 locale). They share about 13,000 Hanzi characters.
In Hong Kong, the Big5-HKSCS charset (zh_HK.BIG5HK locale) extends Big5 with 4702 additional characters.
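
The difference in coverage between these charsets can be checked programmatically. The following is a minimal sketch, not part of the original guide: it uses Python's standard codecs (gb2312, gbk, gb18030, big5, big5hkscs), which are Python's own codec names rather than the locale names listed above, and the helper name encodable_in and the three sample characters are chosen here purely for illustration.

```python
# Python codec names (assumed stand-ins for the charsets discussed above).
CODECS = ["gb2312", "gbk", "gb18030", "big5", "big5hkscs"]


def encodable_in(char, codecs=CODECS):
    """Return the codecs from `codecs` that can encode `char`."""
    supported = []
    for name in codecs:
        try:
            char.encode(name)
        except UnicodeEncodeError:
            # The character is not part of this charset.
            continue
        supported.append(name)
    return supported


if __name__ == "__main__":
    # A common character, a traditional-form character, and the first
    # CJK Extension A code point (U+3400).
    for sample in ["中", "國", "\u3400"]:
        print(f"U+{ord(sample):04X}", encodable_in(sample))
```

With these samples, the traditional form 國 is rejected by the gb2312 codec but accepted by gbk and big5, and U+3400 is rejected by gbk but accepted by gb18030, where it takes four bytes.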

A character that is not a root character (one that cannot be broken into smaller components) usually consists of two or more parts, two being most common. In two-part characters, one part generally represents meaning, and the other represents pronunciation. Occasionally both parts represent meaning. The radical, usually the part that represents meaning, is the most important element, and dictionaries traditionally arrange characters by radical. There are about two hundred radicals (214 in the traditional Kangxi system). A single sound can be represented by many different characters, which are not interchangeable in usage. Conversely, a single character can have more than one pronunciation.

Because many characters share a sound, the appropriate character depends on context. In speech, characters that share a syllable are often distinguished by tone. By contrast, spoken Japanese and Korean do not distinguish words by tones of this kind.
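
The many-to-many relationship between sounds and characters can be modeled as a small lookup table. The sketch below is illustrative only: the READINGS table holds a handful of well-known examples (four characters pronounced "ma" with different tones, and one character with two readings), tones are written as trailing digits 1 to 4, and the names READINGS, strip_tone, and characters_for are hypothetical.

```python
# Illustrative data: each character maps to its tone-numbered pinyin readings.
READINGS = {
    "妈": ["ma1"],             # mother
    "麻": ["ma2"],             # hemp
    "马": ["ma3"],             # horse
    "骂": ["ma4"],             # to scold
    "行": ["xing2", "hang2"],  # one character, two pronunciations
}


def strip_tone(reading):
    """Drop a trailing tone digit, e.g. 'ma3' -> 'ma'."""
    return reading.rstrip("012345")


def characters_for(syllable):
    """Return the characters in READINGS that can be read as `syllable`.

    A syllable without a tone digit matches every tone.
    """
    toneless = syllable == strip_tone(syllable)
    matches = []
    for char, readings in READINGS.items():
        for reading in readings:
            if reading == syllable or (toneless and strip_tone(reading) == syllable):
                matches.append(char)
                break
    return matches
```

Looking up the toneless syllable "ma" returns all four of the sample characters, while "ma3" narrows the result to 马 alone, which is the role that tone plays in speech.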

Several phonetic systems are used to transcribe Chinese. In the People's Republic of China, the most common is pinyin, which uses Roman characters and is widely employed in the West for place names such as Beijing. The Wade-Giles system is an older phonetic system, formerly used for place names such as Peking. In Taiwan, zhuyin (also called bopomofo), a phonetic alphabet with unique letter forms, is often used instead.
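
Pinyin marks tone with a diacritic over one vowel of the syllable, and plain-text sources often write the tone as a trailing digit instead (for example, bei3 jing1 for Beijing). The following is a rough sketch of converting the numbered form to the marked form. It assumes lowercase input, one syllable at a time, and the commonly taught placement rule (a or e takes the mark; otherwise the o in ou; otherwise the last vowel). The function name numbered_to_marked is hypothetical.

```python
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}
VOWELS = "aeiou\u00fc"  # includes u-umlaut


def numbered_to_marked(syllable):
    """Convert tone-numbered pinyin such as 'bei3' to its marked form.

    Syllables with no digit are returned unchanged; a digit other than
    1-4 is treated as the neutral tone and simply removed.
    """
    if not syllable or not syllable[-1].isdigit():
        return syllable
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in TONE_MARKS:
        return base

    # Choose the vowel that carries the tone mark.
    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("ou")
    else:
        positions = [i for i, c in enumerate(base) if c in VOWELS]
        if not positions:
            return base
        index = positions[-1]

    marked = base[: index + 1] + TONE_MARKS[tone] + base[index + 1 :]
    # Compose the base letter and combining mark into one code point.
    return unicodedata.normalize("NFC", marked)


if __name__ == "__main__":
    print(" ".join(numbered_to_marked(s) for s in ["bei3", "jing1"]))
```

Normalizing to NFC keeps each marked vowel as a single precomposed code point, which tends to display more reliably than a letter followed by a combining mark.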