MySQL HeatWave User Guide
This topic describes how to prepare your data for loading into MySQL HeatWave.
Before loading data, ensure that you meet the following requirements:
The tables you intend to load must use the InnoDB storage engine (ENGINE=InnoDB). You can manually convert tables by using the following ALTER TABLE statement:
mysql> ALTER TABLE tbl_name ENGINE=InnoDB;
The tables you intend to load must be defined with a primary key. You can add a primary key using the following syntax:
mysql> ALTER TABLE tbl_name ADD PRIMARY KEY (column);
Adding a primary key is a table-rebuilding operation.
Primary key columns defined with column prefixes are not supported.
Load time is affected if the primary key contains more than one column, or if the primary key column is not an INTEGER column. The impact on MySQL performance during load, change propagation, and query processing depends on factors such as data properties, available resources (compute, memory, and network), and the rate of transaction processing on the DB System.
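If a table has no suitable single-column integer key, one option is to add an auto-increment integer column as the primary key. The following is a sketch only, using the placeholder table name tbl_name and a hypothetical id column; an existing unique integer column may be preferable:
mysql> ALTER TABLE tbl_name ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;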
Identify all of the tables that your queries access and ensure that you load all of them into MySQL HeatWave. If a query accesses a table that is not loaded, the query is not offloaded to MySQL HeatWave for processing.
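To check whether a given query qualifies for offload once its tables are loaded, you can inspect the EXPLAIN output; offloaded statements typically report the secondary engine in the Extra column (for example, "Using secondary engine RAPID"). A minimal sketch with a hypothetical orders table:
mysql> EXPLAIN SELECT COUNT(*) FROM orders;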
To minimize the number of MySQL HeatWave nodes required for your data, exclude table columns that are not accessed by your queries.
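One way to exclude a column from being loaded is the NOT SECONDARY column attribute. The following is a sketch using a hypothetical orders.notes column; note that the MODIFY clause must restate the column's full definition:
mysql> ALTER TABLE orders MODIFY notes TEXT NOT SECONDARY;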
The maximum column widths supported for different string and text data types are as follows:
Table 4.2 Supported Column Widths
| String/Text Datatype | Column Width Supported from MySQL 9.2.2 | Column Width Supported Prior to MySQL 9.2.2 |
|---|---|---|
| MEDIUMTEXT | 4192192 bytes | 65532 bytes |
| LONGTEXT | 4192192 bytes | 65532 bytes |
| JSON | 4192192 bytes | 65532 bytes |
| TEXT | 65535 bytes | 65532 bytes |
| CHAR | 65532 bytes | 65532 bytes |
| VARCHAR | 65532 bytes | 65532 bytes |
| VECTOR | 65532 bytes | 65532 bytes |
These column width limits apply to intermediate relations and final query results as well.
The number of columns per table cannot exceed 1017.
To save space in memory, set CHAR, VARCHAR, and TEXT type column lengths to the minimum length required for the longest string value.
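For example, a sketch assuming a hypothetical customer.c_name column whose longest value is 25 characters (confirm the actual maximum first, and restate attributes such as NOT NULL in the MODIFY clause, since MODIFY replaces the full column definition):
mysql> SELECT MAX(CHAR_LENGTH(c_name)) FROM customer;
mysql> ALTER TABLE customer MODIFY c_name VARCHAR(25);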
Data is partitioned by the table primary key when no data placement keys are defined. Only consider defining data placement keys if partitioning data by the primary key does not provide suitable performance.
Reserve the use of data placement keys for the most time-consuming queries. In such cases, define data placement keys on:
The most frequently used JOIN keys.
The keys of the longest-running queries.
Consider using Auto Data Placement for data placement recommendations.
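A data placement key is typically defined on a column before the table is loaded. The following is a sketch only, assuming a hypothetical orders table that is most often joined on o_custkey; check the RAPID_COLUMN comment syntax and the column's full definition before applying it:
mysql> ALTER TABLE orders MODIFY o_custkey INT NOT NULL COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';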
Apply dictionary encoding to CHAR, VARCHAR, and TEXT columns whenever appropriate. It reduces memory consumption on the MySQL HeatWave Cluster nodes. Use the following criteria when selecting string columns for dictionary encoding:
The column is not used as a key in JOIN queries.
Your queries do not perform operations such as LIKE, SUBSTR, CONCAT, etc., on the column. Variable-length encoding supports string functions and operators as well as LIKE predicates; dictionary encoding does not.
The column has a limited number of distinct values. Dictionary encoding is best suited to columns with a limited number of distinct values, such as country columns.
The column is expected to have few new values added during change propagation. Avoid dictionary encoding for columns with a high number of inserts and updates. Adding a significant number of new, unique values to a dictionary-encoded column can cause a change propagation failure.
When in doubt about choosing an encoding type, use variable-length encoding, which is applied by default when tables are loaded into MySQL HeatWave, or use Auto Encoding to obtain encoding recommendations.
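Dictionary encoding is typically requested through a column comment before the table is loaded. A sketch, assuming a hypothetical nation table with a low-cardinality n_name column (restate the column's full definition in the MODIFY clause):
mysql> ALTER TABLE nation MODIFY n_name CHAR(25) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=SORTED';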
Increase the number of read threads
For medium to large tables, increase the number of read threads to 32 by setting the innodb_parallel_read_threads variable on the DB System.
mysql> SET SESSION innodb_parallel_read_threads = 32;
If the DB System is not busy, you can increase the value to 64.
The Auto Parallel Load utility automatically optimizes the number of parallel read threads for each table. See Section 4.2.5, “Load Data Using Auto Parallel Load”. For users of MySQL HeatWave on AWS, the number of parallel read threads is also optimized when loading data from the MySQL HeatWave Console. See Manage Data in MySQL HeatWave with Workspaces in the MySQL HeatWave on AWS Service Guide.
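For reference, Auto Parallel Load is invoked as a stored procedure; a minimal sketch, assuming a schema named tpch and default load options:
mysql> CALL sys.heatwave_load(JSON_ARRAY('tpch'), NULL);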
Load tables concurrently
If you have many small and medium-sized tables (less than 20 GB each), load tables from multiple sessions:
Session 1: mysql> ALTER TABLE supplier SECONDARY_LOAD;
Session 2: mysql> ALTER TABLE parts SECONDARY_LOAD;
Session 3: mysql> ALTER TABLE region SECONDARY_LOAD;
Session 4: mysql> ALTER TABLE partsupp SECONDARY_LOAD;
Avoid or reduce conflicting operations
Data load operations share resources with other OLTP DML and DDL operations on the DB System. To improve load performance, avoid or reduce conflicting DDL and DML operations. For example, avoid running DDL and large DML operations on the LINEITEM table while executing an ALTER TABLE LINEITEM SECONDARY_LOAD operation.
Learn how to enable or disable data compression.
Learn how to load data using Auto Parallel Load or manually.