About Character Set Selection During Installation

Before you create the database, decide on the character set that you want to use.

After a database is created, changing its character set is usually very expensive in terms of time and resources. Such operations may require converting all character data by exporting the whole database and importing it back. Therefore, it is important that you carefully select the database character set at installation time.

Oracle Database uses character sets for the following:

  • Data stored in SQL character data types (CHAR, VARCHAR2, CLOB, and LONG).

  • Identifiers such as table names, column names, and PL/SQL variables.

  • Stored SQL and PL/SQL source code, including text literals embedded in this code.

The default database character set of a database created from the General Purpose/Transaction Processing or the Data Warehousing template is Unicode AL32UTF8.

Unicode is the universal character set that supports most of the currently spoken languages of the world. It also supports many historical scripts (alphabets). Unicode is the native encoding of many technologies, including Java, XML, XHTML, ECMAScript, and LDAP. Unicode is ideally suited for databases supporting the Internet and the global economy.

Because AL32UTF8 is a multibyte character set, it requires slightly more CPU time for text processing than single-byte character sets do. Also, text in most languages requires more storage space than it does in the corresponding legacy character sets. However, these additional costs are generally outweighed by the universality and flexibility of Unicode, which make it easy to add data in new languages to applications running in an AL32UTF8 database.
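The storage difference depends on the script. The following minimal Python sketch compares how many bytes the same text needs in UTF-8, the encoding that AL32UTF8 implements, and in a legacy encoding. It assumes that Python's built-in codecs are reasonable stand-ins for the Oracle character sets named in the comments; that mapping is approximate and only illustrative, and the sample words are arbitrary.

```python
from __future__ import annotations

# Each entry: (label, sample text, Python codec, Oracle legacy character set
# that the codec approximates). The codec-to-Oracle mapping is an assumption.
SAMPLES = [
    ("English", "database", "latin_1", "WE8ISO8859P1"),
    ("German", "Größe", "latin_1", "WE8ISO8859P1"),
    ("Russian", "база данных", "cp1251", "CL8MSWIN1251"),
    ("Chinese", "数据库", "gbk", "ZHS16GBK"),
]


def byte_lengths(text: str, legacy_codec: str) -> tuple[int, int]:
    """Return (bytes in UTF-8, bytes in the legacy codec) for the text."""
    return len(text.encode("utf-8")), len(text.encode(legacy_codec))


if __name__ == "__main__":
    for label, text, codec, oracle_name in SAMPLES:
        utf8_len, legacy_len = byte_lengths(text, codec)
        print(f"{label:8} AL32UTF8: {utf8_len:2} bytes   {oracle_name}: {legacy_len:2} bytes")
```

Plain ASCII text such as "database" takes the same number of bytes in both encodings. Accented Latin letters and Cyrillic letters take two bytes each in UTF-8 instead of one, and the Chinese characters in this sample take three bytes each instead of two.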

The database character set of an Oracle Database, that is, the character set of its CDB$ROOT container, determines which pluggable databases (PDBs) can be plugged into it. If you use Unicode AL32UTF8 as your database character set, then you can plug in a PDB that uses any database character set supported by Oracle Database, except EBCDIC-based character sets. If you use any other character set when creating the container database, then you can plug in only PDBs that use that same character set. Therefore, you should generally accept the default AL32UTF8 database character set when installing a new database.
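The plug-in rule can be restated as a small decision function. The following Python sketch is hypothetical: the function name `can_plug_in` and its parameters are illustrative, not an Oracle API. It assumes that the caller passes Oracle character set names as strings and supplies the set of EBCDIC-based names to exclude, and it does not check whether a character set is actually supported by Oracle Database.

```python
from __future__ import annotations

# The only CDB character set that accepts PDBs in other character sets.
UNIVERSAL_CDB_CHARSET = "AL32UTF8"


def can_plug_in(
    cdb_charset: str,
    pdb_charset: str,
    ebcdic_charsets: frozenset[str] = frozenset(),
) -> bool:
    """Hypothetical check: may a PDB in pdb_charset be plugged into a CDB
    whose CDB$ROOT uses cdb_charset, following the rule in the text?

    ebcdic_charsets is supplied by the caller (an assumption of this sketch),
    because no complete list of EBCDIC-based character sets is included here.
    """
    if cdb_charset == UNIVERSAL_CDB_CHARSET:
        # An AL32UTF8 CDB accepts any character set except EBCDIC-based ones.
        return pdb_charset not in ebcdic_charsets
    # Any other CDB character set accepts only PDBs in that same character set.
    return pdb_charset == cdb_charset
```

For example, `can_plug_in("AL32UTF8", "WE8ISO8859P1")` returns `True`, while `can_plug_in("WE8ISO8859P1", "AL32UTF8")` returns `False`. The asymmetry is why the default AL32UTF8 choice leaves the most options open.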

If you need to deploy PDBs in a given legacy character set to fulfill a specific compatibility, storage, or performance requirement, create a temporary container database in this legacy character set with one empty PDB. This PDB will have the same legacy database character set. Then, unplug this PDB and plug it into the target AL32UTF8 container database. Drop the temporary container database. You can use such a plugged-in PDB as a template to clone further PDBs in the same legacy character set as needed. You can use the same method to add further legacy character set template PDBs to the same AL32UTF8 container database, as required.

See Also:

Oracle Database Globalization Support Guide for more information about choosing a database character set for an Oracle Database.