10.1.10 Unicode Support

10.1.10.1 The ucs2 Character Set (UCS-2 Unicode Encoding)
10.1.10.2 The utf8 Character Set (3-Byte UTF-8 Unicode Encoding)

MySQL 5.0 supports two character sets for storing Unicode data:

These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:

The ucs2 and utf8 character sets do not support supplementary characters that lie outside the BMP. Characters outside the BMP compare as REPLACEMENT CHARACTER and convert to '?' when converted to a Unicode character set.

A similar set of collations is available for each Unicode character set. For example, each has a Danish collation, the names of which are ucs2_danish_ci and utf8_danish_ci. All Unicode collations are listed at Section 10.1.13.1, “Unicode Character Sets”.

The MySQL implementation of UCS-2 stores characters in big-endian byte order and does not use a byte order mark (BOM) at the beginning of values. Other database systems might use little-endian byte order or a BOM. In such cases, conversion of values will need to be performed when transferring data between those systems and MySQL.

MySQL uses no BOM for UTF-8 values.

Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2 cannot be used as a client character set, which means that it does not work for SET NAMES or SET CHARACTER SET. (See Section 10.1.4, “Connection Character Sets and Collations”.)

The following sections provide additional detail on the Unicode character sets in MySQL.