Bookshelf v7.7: Traditional Character Sets

Traditional Character Sets

Before the emergence of Unicode, traditional character sets were used to address the storage and processing requirements of a specific language or group of languages. Examples of traditional character sets are Western European, which covers the languages spoken in Western European countries and in the Americas, and Japanese, which covers the Japanese language.

Because traditional character sets are regional, character data for languages that are not part of a given character set cannot be processed in the same environment. Therefore, when the need arises to process data belonging to multiple character sets, customers are forced to provide multiple environments.

Also, because character sets are expressed in code pages, the numeric representation of a character in one code page may differ from its representation in another code page, and often the character does not exist in the other code page at all.

For example, the letter a-umlaut (ä) in the Western European character set does not exist in the Arabic character set. In a Western European code page, such as 1252 or ISO 8859-1, the a-umlaut occupies code point E4 (hexadecimal). In an Arabic code page, such as 1256 or ISO 8859-6, code point E4 is an Arabic character, not the a-umlaut. This means that the a-umlaut cannot be represented on an Arabic system, and the Arabic character at E4 cannot be represented on a Western European system.
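
The E4 example can be checked with a short script. The following is a minimal sketch in Python, not part of the product: it assumes that Python's standard codec names cp1252, latin_1, cp1256, and iso8859_6 correspond to code pages 1252, ISO 8859-1, 1256, and ISO 8859-6, and the helper names decode_byte and can_encode are made up for illustration.

```python
# Illustrative sketch: decode the same byte under several traditional code pages.
# The codec names below are Python's standard names; treating them as the
# code pages named in the text is an assumption of this example.
CODE_PAGES = ["cp1252", "latin_1", "cp1256", "iso8859_6"]


def decode_byte(value):
    """Return a dict mapping each code page to the character stored at `value`."""
    raw = bytes([value])
    return {cp: raw.decode(cp) for cp in CODE_PAGES}


def can_encode(text, code_page):
    """Return True if every character of `text` exists in `code_page`."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Same byte value, different character depending on the code page.
    for cp, ch in decode_byte(0xE4).items():
        print(f"{cp}: U+{ord(ch):04X}")
    # The a-umlaut has no slot in the ISO 8859-6 code page.
    print(can_encode("\u00e4", "iso8859_6"))
```

Run as a script, this prints the Unicode code point that each code page assigns to byte E4, which shows that the two Western European code pages agree with each other and differ from the two Arabic ones.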

There is a set of characters that is common to most of the generally used traditional character sets and code pages. These characters are known as the ASCII characters. They include the common characters used in the English language, and they occupy the first 128 code points (hexadecimal 00-7F) in the traditional code pages.
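
The shared ASCII range can be verified in the same way. This is a hedged sketch, again using Python's standard codec names as stand-ins for the code pages discussed here; is_ascii_compatible is a hypothetical helper, not a product function.

```python
# Illustrative sketch: check that bytes 00-7F decode to the same characters
# in several traditional code pages. Codec names are Python's, used here
# as an assumption for the code pages named in the text.
CODE_PAGES = ["cp1252", "latin_1", "cp1256", "iso8859_6"]


def is_ascii_compatible(code_page):
    """Return True if bytes 0x00-0x7F decode to the matching ASCII characters."""
    for value in range(0x80):
        try:
            if bytes([value]).decode(code_page) != chr(value):
                return False
        except UnicodeDecodeError:
            # The code page has no character at this position.
            return False
    return True


if __name__ == "__main__":
    for cp in CODE_PAGES:
        print(cp, is_ascii_compatible(cp))
```

Data that uses only these 128 characters therefore reads the same under any of these code pages. Differences appear only in the range above 7F.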

For more information, see System Requirements and Supported Platforms.

Global Deployment Guide