MySQL HeatWave User Guide

5.8.1.4 Encoding String Columns

This topic describes how to encode string columns to improve the performance of queries that access them.

This topic contains the following sections:

Before You Begin
Supported Encoding Types

MySQL HeatWave supports Variable-length encoding(VARLEN) and Dictionary encoding (SORTED).

The following table provides an overview of encoding type characteristics:

Table 5.14 Column Encoding Type Characteristics

Encoding Type Expression, Filter, Function, and Operator Support Best Suited To Space Required On
Variable-length (VARLEN) Supports JOIN operations, string functions and operators, and LIKE predicates. Columns with a high number of distinct values MySQL HeatWave nodes
Dictionary (SORTED) Supports GROUP BY and ORDER BY operations on string columns Columns with a low number of distinct values DB System node

For additional information about string column encoding, see Section 5.8.1, “String Column Encoding”.

Select the Encoding Type

By default, when tables are loaded into MySQL HeatWave, CHAR, VARCHAR, and TEXT type columns are encoded using variable-length encoding. To use dictionary encoding, define the RAPID_COLUMN=ENCODING=SORTED keyword string in a column comment before loading the table.

Select one of the encoding type, based on the query pattern and the cardinality of the column.

Note

The keyword string must be uppercase. Lowercase and mixed-case strings are ignored.

  • Variable-length encoding(VARLEN)

    • To run JOIN operations involving string columns or use string functions and operators and, LIKE predicates.

    • To encode columns with a high number of distinct values, such as comment columns.

    • When you require more space for column values on the MySQL HeatWave nodes.

  • Dictionary encoding (SORTED)

    • To run expression, filter, function, and operator support than dictionary encoding.

    • To encode columns with a low number of distinct values, such as country columns.

    • When you require space on the DB System node for dictionaries.

Define String Column Encoding

You can define the keyword string in a CREATE TABLE or ALTER TABLE statement, as shown:

mysql> CREATE TABLE airline_info (name VARCHAR(100) 
          COMMENT 'RAPID_COLUMN=ENCODING=SORTED');
Query OK, 0 rows affected (0.2363 sec)
mysql> ALTER TABLE airline_info MODIFY name VARCHAR(100) 
          COMMENT 'RAPID_COLUMN=ENCODING=SORTED';
Query OK, 0 rows affected (0.2284 sec)

If necessary, you can specify variable-length encoding explicitly using the RAPID_COLUMN=ENCODING=VARLEN keyword string.

Note

Other information is permitted in column comments. For example, it is permitted for a column description to be specified alongside a column encoding keyword string:

COMMENT 'column_description RAPID_COLUMN=ENCODING=SORTED'

Tip

For string column encoding recommendations, use MySQL HeatWave Autopilot Advisor after loading tables into MySQL HeatWave and running queries. For more information, see Section 5.8, “Optimize Workloads for OLAP”.

To modify or remove a string column encoding, refer to the procedure described in Section 5.5, “Modify Tables”.

View String Column Encoding
  • To identify explicitly encoded string columns in tables on the DB System, query the COLUMN_COMMENT column of the INFORMATION_SCHEMA.COLUMNS table. For example:

    mysql>  SELECT COLUMN_NAME, COLUMN_COMMENT FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME LIKE 'airline_info' AND COLUMN_COMMENT LIKE '%ENCODING%';
    +-------------+------------------------------+
    | COLUMN_NAME | COLUMN_COMMENT               |
    +-------------+------------------------------+
    | name        | RAPID_COLUMN=ENCODING=SORTED |
    +-------------+------------------------------+

    You can also view explicitly defined column encodings for an individual table using SHOW CREATE TABLE.

  • To view the dictionary size for dictionary-encoded columns, in bytes:

    mysql> USE performance_schema;
    mysql> SELECT rpd_table_id.TABLE_NAME, rpd_columns.COLUMN_ID, rpd_columns.DICT_SIZE_BYTES
              FROM rpd_table_id, rpd_columns 
              WHERE rpd_table_id.ID = rpd_columns.TABLE_ID AND rpd_columns.DICT_SIZE_BYTES > 0 
              ORDER BY rpd_table_id.TABLE_NAME;
    ---------------+-----------+-----------------+
    | TABLE_NAME   | COLUMN_ID | DICT_SIZE_BYTES |
    +--------------+-----------+-----------------+
    | airline_info |         3 |        25165912 |
    +--------------+-----------+-----------------+
    
What's Next

Data Placement Keys.