MySQL HeatWave User Guide
This topic describes how to encode string columns to improve the performance of queries that access them.
This topic contains the following sections:
Review the requirements.
The examples in this topic use the sample database
airportdb
. To learn how to download
the sample database, see
AirportDB
Analytics Quickstart.
MySQL HeatWave supports
Variable-length
encoding(VARLEN
) and
Dictionary
encoding (SORTED
).
The following table provides an overview of encoding type characteristics:
Table 5.14 Column Encoding Type Characteristics
Encoding Type | Expression, Filter, Function, and Operator Support | Best Suited To | Space Required On |
---|---|---|---|
Variable-length (VARLEN) |
Supports JOIN operations, string functions and
operators, and LIKE predicates. |
Columns with a high number of distinct values | MySQL HeatWave nodes |
Dictionary (SORTED) |
Supports GROUP BY and ORDER BY
operations on string columns |
Columns with a low number of distinct values | DB System node |
For additional information about string column encoding, see Section 5.8.1, “String Column Encoding”.
By default, when tables are loaded into MySQL HeatWave,
CHAR
,
VARCHAR
, and
TEXT
type
columns are encoded using variable-length encoding. To use
dictionary encoding, define the
RAPID_COLUMN=ENCODING=SORTED
keyword
string in a column comment before loading the table.
Select one of the encoding type, based on the query pattern and the cardinality of the column.
The keyword string must be uppercase. Lowercase and mixed-case strings are ignored.
Variable-length
encoding(VARLEN
)
To run JOIN
operations involving
string columns or use string functions and operators
and, LIKE
predicates.
To encode columns with a high number of distinct
values, such as comment
columns.
When you require more space for column values on the MySQL HeatWave nodes.
Dictionary
encoding (SORTED
)
To run expression, filter, function, and operator support than dictionary encoding.
To encode columns with a low number of distinct
values, such as country
columns.
When you require space on the DB System node for dictionaries.
You can define the keyword string in a
CREATE TABLE
or
ALTER TABLE
statement, as
shown:
mysql> CREATE TABLE airline_info (name VARCHAR(100)
COMMENT 'RAPID_COLUMN=ENCODING=SORTED');
Query OK, 0 rows affected (0.2363 sec)
mysql> ALTER TABLE airline_info MODIFY name VARCHAR(100)
COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
Query OK, 0 rows affected (0.2284 sec)
If necessary, you can specify variable-length encoding
explicitly using the
RAPID_COLUMN=ENCODING=VARLEN
keyword
string.
Other information is permitted in column comments. For example, it is permitted for a column description to be specified alongside a column encoding keyword string:
COMMENT
'
column_description
RAPID_COLUMN=ENCODING=SORTED'
For string column encoding recommendations, use MySQL HeatWave Autopilot Advisor after loading tables into MySQL HeatWave and running queries. For more information, see Section 5.8, “Optimize Workloads for OLAP”.
To modify or remove a string column encoding, refer to the procedure described in Section 5.5, “Modify Tables”.
To identify explicitly encoded string columns in tables
on the DB System, query the
COLUMN_COMMENT
column of the
INFORMATION_SCHEMA.COLUMNS
table. For example:
mysql> SELECT COLUMN_NAME, COLUMN_COMMENT FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE 'airline_info' AND COLUMN_COMMENT LIKE '%ENCODING%';
+-------------+------------------------------+
| COLUMN_NAME | COLUMN_COMMENT |
+-------------+------------------------------+
| name | RAPID_COLUMN=ENCODING=SORTED |
+-------------+------------------------------+
You can also view explicitly defined column encodings
for an individual table using SHOW
CREATE TABLE
.
To view the dictionary size for dictionary-encoded columns, in bytes:
mysql>USE performance_schema;
mysql>SELECT rpd_table_id.TABLE_NAME, rpd_columns.COLUMN_ID, rpd_columns.DICT_SIZE_BYTES FROM rpd_table_id, rpd_columns WHERE rpd_table_id.ID = rpd_columns.TABLE_ID AND rpd_columns.DICT_SIZE_BYTES > 0 ORDER BY rpd_table_id.TABLE_NAME;
---------------+-----------+-----------------+ | TABLE_NAME | COLUMN_ID | DICT_SIZE_BYTES | +--------------+-----------+-----------------+ | airline_info | 3 | 25165912 | +--------------+-----------+-----------------+