MySQL Connector/J 5.1 Developer Guide

5.6 Using Character Sets and Unicode

All strings sent from the JDBC driver to the server are converted automatically from native Java Unicode form to the client character encoding, including all queries sent using Statement.execute(), Statement.executeUpdate(), Statement.executeQuery() as well as all PreparedStatement and CallableStatement parameters with the exclusion of parameters set using setBytes(), setBinaryStream(), setAsciiStream(), setUnicodeStream(), and setBlob().

Number of Encodings Per Connection

Connector/J supports a single character encoding between client and server, and any number of character encodings for data returned by the server to the client in ResultSets.

Setting the Character Encoding

The character encoding between client and server is automatically detected upon connection (provided that the Connector/J connection properties characterEncoding and connectionCollation are not set). You specify the encoding on the server using the system variable character_set_server (for more information, see Server Character Set and Collation). The driver automatically uses the encoding specified by the server. For example, to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding and connectionCollation out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.

To override the automatically detected encoding on the client side, use the characterEncoding property in the connection URL to the server. Use Java-style names when specifying character encodings. The following table lists MySQL character set names and their corresponding Java-style names:

Table 5.3 MySQL to Java Encoding Name Translations

MySQL Character Set Name Java-Style Character Encoding Name
ascii US-ASCII
big5 Big5
gbk GBK
sjis SJIS (or Cp932 or MS932 for MySQL Server < 4.1.11)
cp932 Cp932 or MS932 (MySQL Server > 4.1.11)
gb2312 EUC_CN
ujis EUC_JP
euckr EUC_KR
latin1 Cp1252
latin2 ISO8859_2
greek ISO8859_7
hebrew ISO8859_8
cp866 Cp866
tis620 TIS620
cp1250 Cp1250
cp1251 Cp1251
cp1257 Cp1257
macroman MacRoman
macce MacCentralEurope

For 5.1.46 and earlier: utf8

For 5.1.47 and later: utf8mb4

UTF-8
ucs2 UnicodeBig

Notes

For Connector/J 5.1.46 and earlier: In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.

For Connector/J 5.1.47 and later:

  • When UTF-8 is used for characterEncoding in the connection string, it maps to the MySQL character set name utf8mb4.

  • If the connection option connectionCollation is also set alongside characterEncoding and is incompatible with it, characterEncoding will be overridden with the encoding corresponding to connectionCollation.

  • Because there is no Java-style character set name for utfmb3 that you can use with the connection option charaterEncoding, the only way to use utf8mb3 as your connection character set is to use a utf8mb3 collation (for example, utf8_general_ci) for the connection option connectionCollation, which forces a utf8mb3 character set to be used, as explained in the last bullet.

Warning

Do not issue the query SET NAMES with Connector/J, as the driver will not detect that the character set has been changed by the query, and will continue to use the character set configured when the connection was first set up.