MySQL HeatWave User Guide

10.1.1  HEATWAVE_LOAD

MySQL 9.0.0 adds support for Lakehouse Incremental Load with the refresh_external_tables option, see: Section 4.3.11.1, “Refresh Data Using Incremental Load”.

mysql> CALL sys.HEATWAVE_LOAD(input_list,[options]);

input_list: {
 JSON_ARRAY(input [,input] ...)
}

options: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      ["mode",{"normal"|"dryrun"|"validation"}]
      ["output",{"normal"|"compact"|"silent"|"help"}]
      ["sql_mode","sql_mode"]
      ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
      ["set_load_parallelism",{true|false}]
      ["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
      ["refresh_external_tables",{true|false}]          
      }
}

input: {
 'db_name' | db_object
}

db_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "db_name": "db_name",
      ["tables": JSON_ARRAY(table [, table] ...)]
      ["exclude_tables": JSON_ARRAY(table [, table] ...)]
      }
}

table: {
 'table_name' | table_object
}

table_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "table_name": "table_name",
      ['engine_attribute': engine_attribute_object],
      ['columns': JSON_ARRAY('column_name' [, 'column_name', ...])],
      ['exclude_columns': JSON_ARRAY('column_name' [, 'column_name', ...])]
      }
}

engine_attribute_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "sampling": true|false,
      "dialect": {dialect_section},
      "file": JSON_ARRAY(file_section [, file_section]...) 
      }
}

MySQL 8.4.0 adds support for the following:

mysql> CALL sys.HEATWAVE_LOAD(input_list,[options]);

input_list: {
 JSON_ARRAY(input [,input] ...)
}

options: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      ["mode",{"normal"|"dryrun"|"validation"}]
      ["output",{"normal"|"compact"|"silent"|"help"}]
      ["sql_mode","sql_mode"]
      ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
      ["set_load_parallelism",{true|false}]
      ["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
      }
}

input: {
 'db_name' | db_object
}

db_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "db_name": "db_name",
      ["tables": JSON_ARRAY(table [, table] ...)]
      ["exclude_tables": JSON_ARRAY(table [, table] ...)]
      }
}

table: {
 'table_name' | table_object
}

table_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "table_name": "table_name",
      ['engine_attribute': engine_attribute_object],
      ['columns': JSON_ARRAY('column_name' [, 'column_name', ...])],
      ['exclude_columns': JSON_ARRAY('column_name' [, 'column_name', ...])]
      }
}

engine_attribute_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "sampling": true|false,
      "dialect": {dialect_section},
      "file": JSON_ARRAY(file_section [, file_section]...) 
      }
}

Use input_list to define what to load. input_list is a JSON array and requires one or more valid input which can be either a valid schema name or a db_object. An empty array is permitted to view the Auto Parallel Load command-line help, see Use the Auto Parallel Load Command-Line Help. This is backwards compatible with db_list.

Use key-value pairs in JSON format to specify parameters. MySQL HeatWave uses the default setting if there is no option setting. Use NULL to specify no arguments.

For syntax examples, see Quickstart: Load Data Using Lakehouse Auto Parallel Load Examples.

Auto Parallel Load options is a JSON object literal that includes:

The db_object is a JSON object literal that includes:

Deprecated Auto Parallel Load Syntax

Before MySQL 8.4.0, the following syntax will be deprecated in a future release.

mysql> CALL sys.HEATWAVE_LOAD(db_list,[options]);

db_list: {
 JSON_ARRAY(["schema_name","schema_name"] ...)
}

options: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      ["mode",{"normal"|"dryrun"}]
      ["output",{"normal"|"compact"|"silent"|"help"}]
      ["sql_mode","sql_mode"]
      ["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
      ["exclude_list",JSON_ARRAY(schema_name_1, schema_name_2.table_name_1, schema_name_3.table_name_2.column_name_1, ...)]
      ["set_load_parallelism",{true|false}]
      ["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
      ["external_tables",JSON_ARRAY(db_object [, db_object]... )]
      }
}

db_object: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "db_name": "name",
      "tables": JSON_ARRAY(table [, table] ...)
      }
}

table: {
 JSON_OBJECT("key","value"[,"key","value"] ...)
      "key","value": {
      "table_name": "name",
      "sampling": true|false,
      "dialect": {dialect_section},
      "file": JSON_ARRAY(file_section [, file_section]...) 
      }
}

db_list specifies the schemas to load. The list is a JSON array and requires one or more valid schema names. An empty array is permitted to view the Auto Parallel Load command-line help.

Auto Parallel Load options is a JSON object literal. The following options will be deprecated in a future release:

  • exclude_list: Defines a list of schemas, tables, and columns to exclude from the load script. Names must be fully qualified without backticks.

    Use db_object with tables, exclude_tables, columns or exclude_columns instead.

    Auto Parallel Load automatically excludes database objects that cannot be offloaded, according to the default policy setting. These objects need not be specified explicitly in the exclude list. System schemas, non-InnoDB tables, tables that are already loaded in MySQL HeatWave, and columns explicitly defined as NOT SECONDARY are automatically excluded.

  • external_tables: non-InnoDB tables which do not store any data, but refer to data stored externally. For the external_tables syntax, see: Section 4.3.8.5, “Use Lakehouse Auto Parallel Load with external_tables Option”.

    Use db_object with tables or exclude_tables instead.

Quickstart: Load Data Using Lakehouse Auto Parallel Load Examples

  • Load a single schema with default options.

    mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"),NULL);
    
  • Load multiple schemas with default options.

    mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch","airportdb","employees","sakila"),NULL);
    
  • Load multiple schemas with the not_disable_unsupported_columns policy, which causes tables with unsupported columns to be excluded from the load operation. Unsupported columns are those with unsupported data types.

    mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch","airportdb","employees","sakila"), 
          JSON_OBJECT("policy","not_disable_unsupported_columns"));
    
  • Load multiple schemas, excluding specified tables and a particular column:

    mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch","airportdb"), 
          JSON_OBJECT("exclude_list",JSON_ARRAY("tpch.orders","airportdb.employee.salary")));
    
  • Load tables that begin with an hw prefix from a schema named schema_customer_1.

    mysql> SET @exc_list = (SELECT JSON_OBJECT('exclude_list',
                 JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name))) 
                 FROM information_schema.tables 
                 WHERE table_schema = 'schema_customer_1' 
                 AND table_name NOT LIKE 'hw%');
    mysql> CALL sys.heatwave_load(JSON_ARRAY('schema_customer_1'), @exc_list);
    
  • Load all schemas with tables that start with an hw prefix.

    mysql> SET @db_list = (SELECT json_arrayagg(schema_name) FROM information_schema.schemata);
    mysql> SET @exc_list = (SELECT JSON_OBJECT('exclude_list',
          JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name))) 
          FROM information_schema.tables 
          WHERE table_schema NOT IN
          ('mysql','information_schema', 'performance_schema','sys') 
          AND table_name NOT LIKE 'hw%');
    mysql> CALL sys.heatwave_load(@db_list, @exc_list);
    

    You can check db_list and exc_list using SELECT JSON_PRETTY(@db_list); and SELECT JSON_PRETTY(@exc_list);

  • Call Auto Parallel Load from a stored procedure:

    DROP PROCEDURE IF EXISTS auto_load_wrapper;
    DELIMITER //
    CREATE PROCEDURE auto_load_wrapper()
    BEGIN
    -- AUTOMATED INPUT
    SET @db_list = (SELECT JSON_ARRAYAGG(schema_name) FROM information_schema.schemata);
    SET @exc_list = (SELECT JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name)) 
    FROM information_schema.tables WHERE table_schema = "db0");
    
    CALL sys.heatwave_load(@db_list, JSON_OBJECT("output","silent","exclude_list", 
    CAST(@exc_list AS JSON)));
    
    -- CUSTOM OUTPUT
    SELECT log as 'Unsupported objects' FROM sys.heatwave_autopilot_report WHERE type="warn" 
    AND stage="VERIFICATION" and log like "%Unsupported%";
    SELECT Count(*) AS "Total Load commands Generated" 
    FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;
    
    END //
    DELIMITER ;
    
    CALL auto_load_wrapper(); 
    

Use the Auto Parallel Load Command-Line Help

To view Auto Parallel Load command-line help, issue the following statement:

mysql> CALL sys.heatwave_load(JSON_ARRAY(),JSON_OBJECT("output","help"));

The command-line help provides usage documentation for the Auto Parallel Load utility.