MySQL HeatWave User Guide
Do not use the external_tables option as
of MySQL 8.4.0. This option will be deprecated in a future
release. Use the input_list JSON array
instead. See
HEATWAVE_LOAD
and full descriptions for input_list.
The Auto Parallel Load external_tables option is a JSON
array that includes one or more db_object
items. It allows you to set the details for schemas and tables
to load.
While it is possible to define the entire load command on a
single line, for readability the configuration is divided
into option definitions using
SET.
Define the following option sets:
Define the name of the database which will store the data.
mysql> SET @db_list = '["tpch"]';
This assumes that Lakehouse Auto Parallel Load will analyze the data, infer the table structure, and create the database and all tables.
Define the db_object parameters that
will load data from three external sources with Avro,
CSV and Parquet format files:
mysql> SET @ext_tables = '[{
"db_name": "tpch",
"tables": [{
"table_name": "supplier_pq",
"dialect": {"format": "parquet"},
"file": [{
"prefix": "src_data/parquet/tpch/supplier/",
"bucket": "myBucket",
"namespace": "myNamespace",
"region": "myRegion"}]
},
{
"table_name": "nation_csv",
"dialect": {
"format": "csv",
"field_delimiter": "|",
"record_delimiter": "|\\n",
"has_header": true
},
"file": [{
"par": "https://objectstorage.../nation.csv"}]
},
{
"table_name": "region_avro",
"dialect": {"format": "avro"},
"file": [{
"par": "https://objectstorage.../region.avro"}]
}]
}]';
Define the @options variable with
SET:
mysql> SET @options = JSON_OBJECT('external_tables', CAST(@ext_tables AS JSON));
Setting mode to
dryrun generates the load script but
does not create or load the external tables. For
example:
mysql> SET @options = JSON_OBJECT('mode', 'dryrun', 'external_tables', CAST(@ext_tables AS JSON));
To implement the changes as part of the load command,
set mode to
normal. This is the default, and it
is not necessary to add it to the command.
Exclude columns from the loading process with the
exclude_list option. See
HEATWAVE_LOAD.
Lakehouse Auto Parallel Load infers the column names for Avro and Parquet files,
and also for CSV files if has_header is
true. For these situations, use the
column names with the exclude_list
option.
If the table already exists, but has no data, use the
existing column names with the
exclude_list option.
For CSV files if has_header is
false, use the generated schema names
with the exclude_list option. These are:
col_1, col_2,
col_3 ...