Choosing a Database Character Set

TimesTen uses the database character set to define the encoding of data stored in character data types, such as CHAR and VARCHAR2.

Use the DatabaseCharacterSet data store attribute to specify the database character set during database creation. You cannot alter the database character set after database creation, and there is no default value for DatabaseCharacterSet. See Supported Character Sets in the Oracle TimesTen In-Memory Database Reference for a list of supported character sets.
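The character set is fixed at creation and has no default, so it helps to state it explicitly wherever the database is first created. The following is a minimal sketch, not a TimesTen utility. It assumes the database is created on first connect through an ODBC-style connection string that sets the DataStore and DatabaseCharacterSet attributes. The helper name creation_connection_string, the DSN name sampledb, and the path /tmp/sampledb are hypothetical. AL32UTF8 is Oracle's name for UTF-8; confirm it against the supported list for your release.

```python
# Hypothetical helper: build an ODBC-style connection string that creates a
# TimesTen database on first connect. DatabaseCharacterSet has no default,
# so this sketch refuses to build the string without one.
def creation_connection_string(dsn: str, data_store: str, charset: str) -> str:
    if not charset:
        raise ValueError("DatabaseCharacterSet has no default; specify it explicitly")
    attrs = {
        "DSN": dsn,
        "DataStore": data_store,
        "DatabaseCharacterSet": charset,
    }
    # Join the attributes as ATTR=value pairs separated by semicolons.
    return ";".join(f"{key}={value}" for key, value in attrs.items())


if __name__ == "__main__":
    print(creation_connection_string("sampledb", "/tmp/sampledb", "AL32UTF8"))
    # DSN=sampledb;DataStore=/tmp/sampledb;DatabaseCharacterSet=AL32UTF8
```

Failing early on a missing value mirrors the rule that the attribute cannot be changed later.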

Consider the following questions when you choose a character set for a database:

  • What languages does the database need to support now and in the future?

  • Is the character set available on the operating system?

  • What character sets are used on clients?

  • How well does the application handle the character set?

  • What are the performance implications of the character set?
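One way to approach the first question is to test candidate character sets against sample text from every language the database must support. The sketch below is illustrative only. The sample strings are hypothetical data, and Python codec names (ascii, iso8859_1, shift_jis, utf_8) serve as rough stand-ins for database character sets; they are not TimesTen character set names.

```python
# Hypothetical sample strings, one per language the database must support.
SAMPLES = {
    "English": "Invoice total",
    "French": "Numéro de commande",
    "Japanese": "注文番号",
}

# Python codec names used as rough stand-ins for database character sets.
CANDIDATES = ["ascii", "iso8859_1", "shift_jis", "utf_8"]


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character in text exists in the codec's repertoire."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def suitable_charsets(samples: dict, candidates: list) -> list:
    """Keep only the candidates that can encode every sample."""
    return [c for c in candidates if all(can_encode(t, c) for t in samples.values())]


if __name__ == "__main__":
    for codec in CANDIDATES:
        ok = [lang for lang, text in SAMPLES.items() if can_encode(text, codec)]
        print(f"{codec:10} supports: {', '.join(ok)}")
    print("Covers all languages:", suitable_charsets(SAMPLES, CANDIDATES))
```

In this example only the Unicode encoding covers all three languages. Adding a future language to SAMPLES shows immediately whether a candidate still qualifies.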

If you are caching Oracle database tables, or if you are loading Oracle database data into a TimesTen table, you must create the database with the same database character set as the Oracle database.

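This section includes the following topics:

  • Character Sets and Languages

  • Client Operating System and Application Compatibility

  • Performance and Storage Implications

  • Character Sets and Replication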

Character Sets and Languages

Choosing a database character set determines what languages can be represented in the database.

A group of characters, such as alphabetic characters, ideographs, symbols, punctuation marks, and control characters, can be encoded as a character set. An encoded character set assigns a unique numeric code to each character in the character repertoire. The numeric codes are called code points or encoded values.
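To make code points and encoded values concrete, the short Python sketch below prints each character's Unicode code point alongside the bytes it becomes in two different encodings. The helper name encoded_values is hypothetical, and Python codecs stand in for database character sets. The sketch shows that the same character can have different encoded values in different character sets.

```python
def encoded_values(text: str, codec: str) -> list:
    """Pair each character with its encoded value (as hex) in the given codec."""
    return [(ch, ch.encode(codec).hex()) for ch in text]


if __name__ == "__main__":
    for ch in "Aé日":
        # ord() returns the Unicode code point of a single character.
        print(ch, "Unicode code point:", f"U+{ord(ch):04X}")
    print("ISO 8859-1:", encoded_values("Aé", "iso8859_1"))  # [('A', '41'), ('é', 'e9')]
    print("UTF-8:     ", encoded_values("Aé", "utf_8"))      # [('A', '41'), ('é', 'c3a9')]
```

The character é has the single-byte value E9 in ISO 8859-1 but the two-byte value C3 A9 in UTF-8.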

Character sets can be single-byte or multibyte. Single-byte 7-bit encoding schemes can define up to 128 characters and usually support just one language. Single-byte 8-bit encoding schemes can define up to 256 characters and often support a group of related languages. Multibyte encoding schemes are needed to support the ideographic scripts of Asian languages such as Chinese and Japanese, because these languages use thousands of characters. Multibyte encoding schemes use either a fixed or a variable number of bytes to represent each character.

Unicode is a universal encoded character set that enables information from any language to be stored using a single character set. Unicode provides a unique code value for every character, regardless of the platform, program, or language.
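The difference between single-byte, variable-width multibyte, and fixed-width multibyte encodings can be seen by measuring how many bytes each character needs. This is a sketch using Python codecs as stand-ins: ascii for a 7-bit set, iso8859_1 for an 8-bit set, utf_8 for a variable-width Unicode encoding, and utf_32_be for a fixed-width one (the _be variant avoids a byte order mark). The helper name bytes_per_char is hypothetical.

```python
def bytes_per_char(text: str, codec: str) -> list:
    """Return the encoded size of each character, or None if the codec cannot encode it."""
    sizes = []
    for ch in text:
        try:
            sizes.append(len(ch.encode(codec)))
        except UnicodeEncodeError:
            sizes.append(None)  # character is outside this character set
    return sizes


if __name__ == "__main__":
    text = "Aé日"
    for codec in ["ascii", "iso8859_1", "utf_8", "utf_32_be"]:
        print(f"{codec:10}", bytes_per_char(text, codec))
```

The 7-bit set cannot represent é, neither single-byte set can represent 日, UTF-8 uses 1, 2, and 3 bytes, and UTF-32 uses 4 bytes for every character.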

Client Operating System and Application Compatibility

The database character set is independent of the operating system. On an English operating system, you can create and run a database with a Japanese character set. However, when an application in the client operating system accesses the database, the client operating system must be able to support the database character set with appropriate fonts and input methods.

For example, you cannot insert or retrieve Japanese data on an English Windows operating system without first installing a Japanese font and input method. Alternatively, you can insert and retrieve Japanese data by accessing the database server remotely from a Japanese operating system.

If all client applications use the same character set, then that character set is usually the best choice for the database character set. When client applications use different character sets, the database character set should be a superset of all the application character sets. This ensures that every character is represented when converting from an application character set to the database character set.
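The superset requirement can be simulated by converting client text into a candidate database character set and reading it back. This minimal sketch assumes that unrepresentable characters are replaced with "?". It is a stand-in chosen for illustration, not a description of TimesTen's conversion behavior. The client samples and the helper names round_trip and is_lossless are hypothetical.

```python
def round_trip(text: str, db_codec: str) -> str:
    """Store client text in the database codec and read it back.

    Characters outside the database character set become '?'.
    """
    return text.encode(db_codec, errors="replace").decode(db_codec)


def is_lossless(client_texts: list, db_codec: str) -> bool:
    """True if every client string survives the round trip unchanged."""
    return all(round_trip(t, db_codec) == t for t in client_texts)


if __name__ == "__main__":
    # Hypothetical data from a Western European client and a Japanese client.
    clients = ["Café", "日本"]
    for db_codec in ["ascii", "iso8859_1", "utf_8"]:
        print(f"{db_codec:10}", [round_trip(t, db_codec) for t in clients],
              "lossless" if is_lossless(clients, db_codec) else "data lost")
```

Only the encoding that is a superset of both client repertoires preserves every character.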

Performance and Storage Implications

For best performance, choose a character set that avoids character set conversion and uses the most efficient encoding for the languages desired.

Single-byte character sets result in better performance than multibyte character sets and are more efficient in terms of space requirements. However, single-byte character sets limit how many languages you can support.
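The storage side of this trade-off can be estimated by encoding representative data. The sample string below is hypothetical, and Python codecs stand in for database character sets. For mostly Western European text, a single-byte set stores one byte per character, while UTF-8 needs extra bytes for each accented letter and UTF-16 needs at least two bytes per character.

```python
SAMPLE = "Crème brûlée à la française"  # hypothetical column value


def storage_bytes(texts: list, codec: str) -> int:
    """Total encoded size of the given strings in the given codec."""
    return sum(len(t.encode(codec)) for t in texts)


if __name__ == "__main__":
    for codec in ["iso8859_1", "utf_8", "utf_16_le"]:
        print(f"{codec:10}", storage_bytes([SAMPLE], codec), "bytes")
    # iso8859_1: 27 bytes, utf_8: 32 bytes, utf_16_le: 54 bytes
```

The single-byte set is smallest here, but it could not store the Japanese text from the earlier examples at all.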

Character Sets and Replication

All databases in a replication scheme must have the same database character set. No character set conversion occurs during replication.
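Because replication copies data without conversion, the receiving database must interpret the bytes with the same character set that wrote them. The sketch below only illustrates what would go wrong otherwise; it does not model the TimesTen replication agent. The names master_bytes and misread are hypothetical, and Python codecs stand in for database character sets.

```python
# Text stored on a master database whose character set is UTF-8.
master_bytes = "Café".encode("utf_8")

# Replication copies the bytes unchanged. A subscriber using the same
# character set reads them correctly:
same = master_bytes.decode("utf_8")

# A subscriber that interpreted the same bytes as ISO 8859-1 would garble them:
misread = master_bytes.decode("iso8859_1")

if __name__ == "__main__":
    print(same)     # Café
    print(misread)  # CafÃ©
```

The two-byte UTF-8 sequence for é (C3 A9) is read as two separate ISO 8859-1 characters, which is why all databases in the scheme must share one character set.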