MySQL HeatWave User Guide

4.2.2 Prepare to Load Data

This topic describes how to prepare your data for loading into MySQL HeatWave.

Before You Begin

Prepare Data

Before loading data, ensure that you meet the following requirements:

  • The tables you intend to load must be set to ENGINE=InnoDB tables. You can manually convert tables by using the following ALTER TABLE statement:

    mysql> ALTER TABLE tbl_name ENGINE=InnoDB;
    
  • The tables you intend to load must be defined with a primary key. You can add a primary key using the following syntax:

  • mysql> ALTER TABLE tbl_name ADD PRIMARY KEY (column);
    

    Adding a primary key is a table-rebuilding operation.

    Primary key columns defined with column prefixes are not supported.

    Load time is affected if the primary key contains more than one column, or if the primary key column is not an INTEGER column. The impact on MySQL performance during load, change propagation, and query processing depends on factors such as data properties, available resources (compute, memory, and network), and the rate of transaction processing on the DB System.

  • Identify all of the tables that your queries access to ensure that you load all of them into MySQL HeatWave. If a query accesses a table that is not loaded into MySQL HeatWave, it will not be offloaded to MySQL HeatWave for processing.

  • To minimize the number of MySQL HeatWave nodes required for your data, exclude table columns that are not accessed by your queries.

  • The maximum column widths supported for different string and text data types are as follows:

    Table 4.2 Supported Column Widths

    String/Text Datatype Column Width Supported from MySQL 9.2.2 Column Width Supported Prior to MySQL 9.2.2
    MEDIUMTEXT 4192192 bytes 65532 bytes
    LONGTEXT 4192192 bytes 65532 bytes
    JSON 4192192 bytes 65532 bytes
    TEXT 65535 bytes 65532 bytes
    CHAR 65532 bytes 65532 bytes
    VARCHAR 65532 bytes 65532 bytes
    VECTOR 65532 bytes 65532 bytes

    This applies to intermediate relations and final query results as well.

  • The number of columns per table cannot exceed 1017.

  • To save space in memory, set CHAR, VARCHAR, and TEXT type column lengths to the minimum length required for the longest string value.

  • Data is partitioned by the table primary key when no data placement keys are defined. Only consider defining data placement keys if partitioning data by the primary key does not provide suitable performance.

    Reserve the use of data placement keys for the most time-consuming queries. In such cases, define data placement keys on:

    • The most frequently used JOIN keys.

    • The keys of the longest running queries.

    Consider using Auto Data Placement for data placement recommendations.

  • Apply dictionary encoding to CHAR, VARCHAR, and TEXT columns whenever appropriate. It reduces memory consumption on the MySQL HeatWave Cluster nodes. Use the following criteria when selecting string columns for dictionary encoding:

    1. The column is not used as a key in JOIN queries.

    2. Your queries do not perform operations such as LIKE, SUBSTR, CONCAT, etc., on the column. Variable-length encoding supports string functions and operators as well as LIKE predicates; dictionary encoding does not.

    3. The column has a limited number of distinct values. Dictionary encoding is best suited to columns with a limited number of distinct values, such as country columns.

    4. The column is expected to have few new values added during change propagation. Avoid dictionary encoding for columns with a high number of inserts and updates. Adding a significant number of a new, unique values to a dictionary encoded column can cause a change propagation failure.

    When in doubt about choosing an encoding type, use variable-length encoding, which is applied by default when tables are loaded into MySQL HeatWave, or use Auto Encoding to obtain encoding recommendations.

  • Increase the number of read threads

    For medium to large tables, increase the number of read threads to 32 by setting the innodb_parallel_read_threads variable on the DB System.

    mysql> SET SESSION innodb_parallel_read_threads = 32;
    

    If the DB System is not busy, you can increase the value to 64.

    Tip

    The Auto Parallel Load utility automatically optimizes the number of parallel read threads for each table. See Section 4.2.5, “Load Data Using Auto Parallel Load”. For users of MySQL HeatWave on AWS, the number of parallel read threads is also optimized when loading data from the MySQL HeatWave Console. See Manage Data in MySQL HeatWave with Workspaces in the MySQL HeatWave on AWS Service Guide.

  • Load tables concurrently

    If you have many small and medium tables (less than 20GB in size), load tables from multiple sessions:

    Session 1: 
    mysql> ALTER TABLE supplier SECONDARY_LOAD;
    
    Session 2: 
    mysql> ALTER TABLE parts SECONDARY_LOAD;
    
    Session 3: 
    mysql> ALTER TABLE region SECONDARY_LOAD;
    
    Session 4: 
    mysql> ALTER TABLE partsupp SECONDARY_LOAD;
    
  • Avoid or reduce conflicting operations

    Data load operations share resources with other OLTP DML and DDL operations on the DB System. To improve load performance, avoid or reduce conflicting DDL and DML operations. For example, avoid running DDL and large DML operations on the LINEITEM table while executing an ALTER TABLE LINEITEM SECONDARY_LOAD operation.

What's Next