MySQL HeatWave User Guide
MySQL 9.0.0 adds support for Lakehouse Incremental Load with the
refresh_external_tables
option, see:
Section 4.3.11.1, “Refresh Data Using Incremental Load”.
mysql> CALL sys.HEATWAVE_LOAD(input_list
,[options
]);
input_list
: {
JSON_ARRAY(input
[,input
] ...)
}
options
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
["mode",{"normal"|"dryrun"|"validation"}]
["output",{"normal"|"compact"|"silent"|"help"}]
["sql_mode","sql_mode
"]
["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
["set_load_parallelism",{true|false}]
["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
["refresh_external_tables",{true|false}]
}
}
input
: {
'db_name
' | db_object
}
db_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"db_name": "db_name
",
["tables": JSON_ARRAY(table
[, table
] ...)]
["exclude_tables": JSON_ARRAY(table
[, table
] ...)]
}
}
table
: {
'table_name
' | table_object
}
table_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"table_name": "table_name
",
['engine_attribute': engine_attribute_object
],
['columns': JSON_ARRAY('column_name
' [, 'column_name
', ...])],
['exclude_columns': JSON_ARRAY('column_name
' [, 'column_name
', ...])]
}
}
engine_attribute_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"sampling": true|false,
"dialect": {dialect_section
},
"file": JSON_ARRAY(file_section
[, file_section
]...)
}
}
MySQL 8.4.0 adds support for the following:
An input_list
JSON array replaces the
db_list
JSON array. This adds an include
list to exactly specify the tables and columns to load for a
set of queries. It is no longer necessary to include a
complete schema, and exclude unnecessary tables and columns.
input_list
is backwards compatible with
db_list
.
A validation
mode for external files.
mysql> CALL sys.HEATWAVE_LOAD(input_list
,[options
]);
input_list
: {
JSON_ARRAY(input
[,input
] ...)
}
options
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
["mode",{"normal"|"dryrun"|"validation"}]
["output",{"normal"|"compact"|"silent"|"help"}]
["sql_mode","sql_mode
"]
["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
["set_load_parallelism",{true|false}]
["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
}
}
input
: {
'db_name
' | db_object
}
db_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"db_name": "db_name
",
["tables": JSON_ARRAY(table
[, table
] ...)]
["exclude_tables": JSON_ARRAY(table
[, table
] ...)]
}
}
table
: {
'table_name
' | table_object
}
table_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"table_name": "table_name
",
['engine_attribute': engine_attribute_object
],
['columns': JSON_ARRAY('column_name
' [, 'column_name
', ...])],
['exclude_columns': JSON_ARRAY('column_name
' [, 'column_name
', ...])]
}
}
engine_attribute_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"sampling": true|false,
"dialect": {dialect_section
},
"file": JSON_ARRAY(file_section
[, file_section
]...)
}
}
Use input_list
to define what to load.
input_list
is a JSON array and requires one
or more valid input
which can be either a
valid schema name or a db_object
. An empty
array is permitted to view the Auto Parallel Load command-line help, see
Use the Auto Parallel Load Command-Line Help. This is
backwards compatible with db_list
.
Use key-value pairs in JSON
format to specify
parameters. MySQL HeatWave uses the default setting if there is no
option setting. Use NULL
to specify no
arguments.
For syntax examples, see Quickstart: Load Data Using Lakehouse Auto Parallel Load Examples.
Auto Parallel Load options
is a JSON object literal that
includes:
mode
: Defines the Auto Parallel Load operational mode.
Permitted values are:
normal
: The default. Generates and
executes the load script.
dryrun
: Generates a load script only.
Auto Parallel Load executes in dryrun
mode
automatically if the MySQL HeatWave Cluster is not active.
validation
: Only use with
Lakehouse. validation
performs the
same checks as dryrun
and also
validates external files before loading. It follows all
the options
and the load
configuration, for example column information,
sql_mode
,
is_strict_mode
and
allow_missing_files
, but does not
load any tables. It uses schema inference and might
modify the schema, see:
About Lakehouse Auto Parallel Load Schema Inference.
validation
is faster than a full
load, particularly for large tables. The memory
requirement is similar to running a full load.
validation
requires created tables.
output
: Defines how Auto Parallel Load produces
output. Permitted values are:
normal
: The default. Produces
summarized output and sends it to
stdout
and to the
heatwave_autopilot_report
table. See
Section 5.8.8, “Autopilot Report Table”.
silent
: Sends output to the
heatwave_autopilot_report
table only.
See Section 5.8.8, “Autopilot Report Table”. The
silent
output type is useful if
human-readable output is not required; when the output
is consumed by a script, for example. For an example of
a stored procedure with an Auto Parallel Load call that uses the
silent
output type, see
Quickstart: Load Data Using Lakehouse Auto Parallel Load Examples.
compact
: Produces compact output.
help
: Displays Auto Parallel Load command-line
help. See
Use the Auto Parallel Load Command-Line Help.
sql_mode
: Defines the SQL mode used while
loading tables. Auto Parallel Load does not support the MySQL global or
session sql_mode
variable.
To run Auto Parallel Load with a non-oci-default SQL mode configuration,
specify the configuration using the Auto Parallel Load
sql_mode
option as a string value. If no
SQL modes are specified, the default OCI SQL mode
configuration is used.
For information about SQL modes, see Server SQL Modes.
policy
: Defines the policy for handling
of tables containing columns with unsupported data types.
Permitted values are:
disable_unsupported_columns
: The
default. Disable columns with unsupported data types and
include the table in the load script. Columns that are
explicitly pre-defined as NOT
SECONDARY
are ignored (they are neither
disabled or enabled).
Auto Parallel Load does not generate statements to disable columns
that are explicitly defined as NOT
SECONDARY
.
not_disable_unsupported_columns
:
Exclude the table from the load script if the table
contains a column with an unsupported data type.
A column with an unsupported data type that is
explicitly defined as a NOT SECONDARY
column does not cause the table to be excluded. For
information about defining columns as NOT
SECONDARY
, see
Section 4.2.6.1, “Exclude Table Columns”.
set_load_parallelism
: Enabled by default.
Optimizes load parallelism based on machine-learning models
by optimizing the
innodb_parallel_read_threads
variable setting before loading each table.
auto_enc
: Checks if there is enough
memory for string column encoding. Settings include:
mode
: Defines the
auto_enc
operational mode. Permitted
values are:
off
: Disables the
auto_enc
option. No memory checks
are performed.
check
: The default. Checks if
there is enough memory on the MySQL node for
dictionary-encoded columns and if there is enough
root heap memory for variable-length column encoding
overhead. Dictionary-encoded columns require memory
on the MySQL node for dictionaries. For each loaded
table, 64KB of memory, the default heap segment
size, must be allocated from the root heap for
variable-length column encoding overhead. If there
is not enough memory, Auto Parallel Load executes in
dryrun
mode and prints a warning
about insufficient memory. The
auto_enc
option runs
check
mode if it is not specified
explicitly and set to off
. For
more information, see
Memory Estimation for String Column Encoding.
refresh_external_tables
: Only use with
Lakehouse. Set to true
to refresh
external tables. See:
Section 4.3.11.1, “Refresh Data Using Incremental Load”.
Set to false
, the default setting, to
only load new tables, unloaded tables, and not refresh
external tables.
This option only works with mode
set to
normal
. All other mode
settings ignore this option.
The db_object
is a JSON object literal that
includes:
db_name
: The name of the database to
load.
Use one or other of the following, but not both. The use of both parameters will throw an error.
tables
: An optional JSON array of
table
to include in the load.
exclude_tables
: As of MySQL 8.4.0, an
optional JSON array of table
to
exclude from the load.
table
: Either a valid table name or a
table_object
.
As of MySQL 8.4.0, table_object
is a JSON
object literal that includes:
table_name
: The name of the table to
load.
engine_attribute
: A JSON object
literal that includes:
sampling
: Only use with
Lakehouse. If set to true
, the
default setting, Lakehouse Auto Parallel Load samples the data to infer
the schema and collect statistics.
If set to false
, Lakehouse Auto Parallel Load performs
a full scan to infer the schema and collect
statistics. Depending on the size of the data, this
can take a long time.
Auto Parallel Load uses the inferred schema to generate
CREATE TABLE
statements. The statistics are used to estimate
storage requirements and load times. See:
About Lakehouse Auto Parallel Load Schema Inference.
For dialect
and
file
, see:
Section 4.3.6, “Lakehouse External Table Syntax”.
Use one or other of the following, but not both. The use of both parameters will throw an error.
columns
: An optional JSON array
of column_name
to include in the
load.
exclude_columns
: An optional JSON
array of column_name
to exclude
from the load.
Before MySQL 8.4.0, the following syntax will be deprecated in a future release.
mysql> CALL sys.HEATWAVE_LOAD(db_list
,[options
]);
db_list
: {
JSON_ARRAY(["schema_name
","schema_name
"] ...)
}
options
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
["mode",{"normal"|"dryrun"}]
["output",{"normal"|"compact"|"silent"|"help"}]
["sql_mode","sql_mode
"]
["policy",{"disable_unsupported_columns"|"not_disable_unsupported_columns"}]
["exclude_list",JSON_ARRAY(schema_name_1
, schema_name_2.table_name_1
, schema_name_3.table_name_2.column_name_1
, ...)]
["set_load_parallelism",{true|false}]
["auto_enc",JSON_OBJECT("mode",{"off"|"check"})]
["external_tables",JSON_ARRAY(db_object
[, db_object
]... )]
}
}
db_object
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"db_name": "name",
"tables": JSON_ARRAY(table
[, table
] ...)
}
}
table
: {
JSON_OBJECT("key
","value
"[,"key
","value
"] ...)
"key","value"
: {
"table_name": "name
",
"sampling": true|false,
"dialect": {dialect_section
},
"file": JSON_ARRAY(file_section
[, file_section
]...)
}
}
db_list
specifies the schemas to load. The
list is a JSON array and requires one or more valid schema
names. An empty array is permitted to view the Auto Parallel Load
command-line help.
Auto Parallel Load options
is a JSON object literal. The
following options will be deprecated in a future release:
exclude_list
: Defines a list of
schemas, tables, and columns to exclude from the load
script. Names must be fully qualified without backticks.
Use db_object
with
tables
,
exclude_tables
,
columns
or
exclude_columns
instead.
Auto Parallel Load automatically excludes database objects that cannot
be offloaded, according to the default
policy
setting. These objects need not
be specified explicitly in the exclude list. System
schemas, non-InnoDB
tables, tables that
are already loaded in MySQL HeatWave, and columns explicitly
defined as NOT SECONDARY
are
automatically excluded.
external_tables
: non-InnoDB tables
which do not store any data, but refer to data stored
externally. For the external_tables
syntax, see: Section 4.3.8.5, “Use Lakehouse Auto Parallel Load with external_tables Option”.
Use db_object
with
tables
or
exclude_tables
instead.
Load a single schema with default options.
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch"),NULL);
Load multiple schemas with default options.
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch","airportdb","employees","sakila"),NULL);
Load multiple schemas with the
not_disable_unsupported_columns
policy,
which causes tables with unsupported columns to be
excluded from the load operation. Unsupported columns are
those with unsupported data types.
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch","airportdb","employees","sakila"),
JSON_OBJECT("policy","not_disable_unsupported_columns"));
Load multiple schemas, excluding specified tables and a particular column:
mysql> CALL sys.heatwave_load(JSON_ARRAY("tpch","airportdb"),
JSON_OBJECT("exclude_list",JSON_ARRAY("tpch.orders","airportdb.employee.salary")));
Load tables that begin with an “hw” prefix
from a schema named schema_customer_1
.
mysql>SET @exc_list = (SELECT JSON_OBJECT('exclude_list', JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name))) FROM information_schema.tables WHERE table_schema = 'schema_customer_1' AND table_name NOT LIKE 'hw%');
mysql>CALL sys.heatwave_load(JSON_ARRAY('schema_customer_1'), @exc_list);
Load all schemas with tables that start with an “hw” prefix.
mysql>SET @db_list = (SELECT json_arrayagg(schema_name) FROM information_schema.schemata);
mysql>SET @exc_list = (SELECT JSON_OBJECT('exclude_list', JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name))) FROM information_schema.tables WHERE table_schema NOT IN ('mysql','information_schema', 'performance_schema','sys') AND table_name NOT LIKE 'hw%');
mysql>CALL sys.heatwave_load(@db_list, @exc_list);
You can check db_list
and
exc_list
using SELECT
JSON_PRETTY(@db_list);
and SELECT
JSON_PRETTY(@exc_list);
Call Auto Parallel Load from a stored procedure:
DROP PROCEDURE IF EXISTS auto_load_wrapper; DELIMITER // CREATE PROCEDURE auto_load_wrapper() BEGIN -- AUTOMATED INPUT SET @db_list = (SELECT JSON_ARRAYAGG(schema_name) FROM information_schema.schemata); SET @exc_list = (SELECT JSON_ARRAYAGG(CONCAT(table_schema,'.',table_name)) FROM information_schema.tables WHERE table_schema = "db0"); CALL sys.heatwave_load(@db_list, JSON_OBJECT("output","silent","exclude_list", CAST(@exc_list AS JSON))); -- CUSTOM OUTPUT SELECT log as 'Unsupported objects' FROM sys.heatwave_autopilot_report WHERE type="warn" AND stage="VERIFICATION" and log like "%Unsupported%"; SELECT Count(*) AS "Total Load commands Generated" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id; END // DELIMITER ; CALL auto_load_wrapper();