MySQL HeatWave User Guide
When data is loaded into MySQL HeatWave, it is partitioned by the table
primary key and sliced horizontally for distribution among
MySQL HeatWave nodes by default. However, this may not be optimal for
queries involving JOIN
or GROUP
BY
operations as they incur costs due to data
redistribution among MySQL HeatWave nodes at query execution time
In case of such queries, the data placement key feature permits
partitioning data by JOIN
or GROUP
BY
key columns instead of primary keys. To use this
feature, define data placement keys on the most frequently used
JOIN
keys and the keys of the longest running
queries.
For data placement key recommendations, use MySQL HeatWave Autopilot Advisor after loading tables into MySQL HeatWave and running queries. For more information, see Section 5.8, “Optimize Workloads for OLAP”.
This topic includes the following sections:
Review the requirements.
The examples in this topic use the sample database
airportdb
. To learn how to download the
sample database, see
AirportDB
Analytics Quickstart.
Defining a data placement key requires adding a column comment with the data placement keyword string:
RAPID_COLUMN=DATA_PLACEMENT_KEY=N
where N
is an index value that
defines the priority order of data placement keys. The rules
for assigning a data placement index are:
The index must start with 1.
Index values range from 1 to 16.
Each column must have an unique index value. For example, you cannot assign the index value of 2 to more than one column in the same table.
Always assign consecutive index values, without gaps.For example, if you define a data placement key column with an index value of 3, there must also be two other data placement key columns with index values of 1 and 2, respectively.
The following example shows multiple columns defined as data placement keys. Although a primary key is defined, data is partitioned by the data placement keys, which are prioritized over the primary key.
mysql> CREATE TABLE airline_discount(
id INT PRIMARY KEY,
date DATE COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1',
discount FLOAT COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2');
Query OK, 0 rows affected (0.2448 sec)
When defining multiple columns as data placement keys,
prioritize the keys according to query cost. For example,
assign DATA_PLACEMENT_KEY=1
to the key of
the costliest query, and
DATA_PLACEMENT_KEY=2
to the key of the
second costliest query, and so on.
Other information is permitted in column comments. For example, it is permitted to specify a column description alongside a data placement keyword string:
COMMENT 'column_description
RAPID_COLUMN=DATA_PLACEMENT_KEY=1'
To view the data placement type of the tables loaded into
MySQL HeatWave, PrimaryKey
means the data
placement key is using the primary key of the table:
mysql> SELECT SCHEMA_NAME, TABLE_NAME, DATA_PLACEMENT_TYPE, LOAD_STATUS
FROM performance_schema.rpd_table_id
JOIN performance_schema.rpd_tables USING (ID);
You can also view data placement keys for an individual table using SHOW CREATE TABLE.
For information about data placement key index values, see Section 5.8.2, “Data Placement Keys”.
To determine whether a JOIN
or
GROUP BY
query used data placement
partitions, examine the QEP_TEXT
column in
the performance_schema.rpd_query_stats
table. This column contains prepart
(pre-partitioning) data, which indicates how the query was
distributed across MySQL HeatWave nodes.
For GROUP BY
operations, the
prepart
data is displayed as:
"prepart": #
Where, # represents the number of nodes.
For JOIN
operations, the
prepart
data appears as:
"prepart": [#, #]
where [#, #]
represents the number of
MySQL HeatWave nodes used in the left and the right branches
respectively. If either value is greater than 1, that branch
used data placement partitions. If both values are 1 (i.e.,
"prepart": [1, 1]), then data placement partitions is not used
both the branches.
To query QEP_TEXT
prepart
data for the last executed query:
mysql> SELECT CONCAT( '"prepart":[', (JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][0]')),
"," ,(JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][1]')) , ']' )
FROM performance_schema.rpd_query_stats
WHERE query_id = (select max(query_id)
FROM performance_schema.rpd_query_stats);
+-----------------------------------------------------------------------------+
| concat( '"prepart":[', (JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][0]')), |
|"," ,(JSON_EXTRACT(QEP_TEXT->>"$**.prepart", '$[0][1]')) , ']' ) |
+-----------------------------------------------------------------------------+
| "prepart":[2,2] |
+-----------------------------------------------------------------------------+
Refer to the following example to understand how to use MySQL HeatWave data placement keys to improve query performance by controlling data distribution and verifying execution plan efficiency.
To begin with, check whether the target table (airline_discount) has any existing data placement keys, using SELECT statement:
mysql> SELECT COLUMN_NAME, COLUMN_COMMENT
FROM INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME LIKE 'airline_discount'
AND
COLUMN_COMMENT LIKE '%DATA_PLACEMENT_KEY%';/
add output
Modify the table (airline_discount) to add (airline_discount)the from and to columns as placement keys, so that MySQL HeatWave can locate related data across nodes and reduce data movement during joins.
mysql> ALTER TABLE flight
MODIFY `from` smallint NOT NULL COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1',
MODIFY `to` smallint NOT NULL COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=2';
Add output
Use the following query to inspect the number of partitions used in the most recent offloaded query:
mysql> SELECT QEP_TEXT->>"$**.nPreparts" FROM performance_schema.rpd_query_stats
WHERE query_id = (select max(query_id) FROM performance_schema.rpd_query_stats);
Add output
The nPreparts
value represents the
number of partitions involved. A lower
nPreparts
value indicates more
efficient data placement and reduced inter-node
communication.
JOIN
and GROUP BY
query optimizations are only applied if at least one of
the JOIN
or GROUP BY
relations has a key that matches the defined data
placement key.
If you execute a JOIN
operation with or
without the JOIN
and GROUP
BY
query optimization, a compilation-time cost
model determines how the query is executed. The cost model
uses estimated statistics.
A data placement key cannot be defined on a dictionary-encoded string column but are permitted on variable-length encoded columns. MySQL HeatWave applies variable-length encoding to string columns by default. See Section 5.8.1.4, “Encoding String Columns”.
A data placement key can only be defined on a column with a supported data type. See Section 4.2.1, “Supported Data Types for MySQL HeatWave”.
A data placement key column cannot be defined as a
NOT SECONDARY
column. See
Section 4.2.6.1, “Exclude Table Columns”.
Learn about Autopilot Advisor.