6.17 Character Set Considerations for Connector/Net

Treating Binary Blobs As UTF8

Before the introduction of The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding) (in MySQL 5.5.3), MySQL did not support 4-byte UTF8 sequences. This makes it difficult to represent some multibyte languages such as Japanese. To try and alleviate this, Connector/Net supports a mode where binary blobs can be treated as strings.

To do this, you set the 'Treat Blobs As UTF8' connection string keyword to yes. This is all that needs to be done to enable conversion of all binary blobs to UTF8 strings. To convert only some of your BLOB columns, you can make use of the 'BlobAsUTF8IncludePattern' and'BlobAsUTF8ExcludePattern' keywords. Set these to a regular expression pattern that matches the column names to include or exclude respectively.

When the regular expression patterns both match a single column, the include pattern is applied before the exclude pattern. The result, in this case, would be that the column would be excluded. Also, be aware that this mode does not apply to columns of type BINARY or VARBINARY and also do not apply to nonbinary BLOB columns.

This mode only applies to reading strings out of MySQL. To insert 4-byte UTF8 strings into blob columns, use the .NET Encoding.GetBytes function to convert your string to a series of bytes. You can then set this byte array as a parameter for a BLOB column.