MySQL HeatWave User Guide
Do not use the external_tables
option as
of MySQL 8.4.0. This option will be deprecated in a future
release. Use the input_list
JSON array
instead. Refer to the
Auto
Parallel Load Syntax and full descriptions for
input_list
.
The Auto Parallel Load external_tables
option is a JSON
array that includes one or more db_object
items. It allows you to set the details for schemas and tables
to load.
While it is possible to define the entire load command on a
single line, for readability the configuration is divided
into option definitions using
SET
.
Define the following option sets:
Define the name of the database which will store the data.
mysql> SET @db_list = '["tpch"]';
This assumes that Lakehouse Auto Parallel Load will analyze the data, infer the table structure, and create the database and all tables.
Define the db_object
parameters that
will load data from three external sources with Avro,
CSV and Parquet format files:
mysql> SET @ext_tables = '[{
"db_name": "tpch",
"tables": [{
"table_name": "supplier_pq",
"dialect": {"format": "parquet"},
"file": [{
"prefix": "src_data/parquet/tpch/supplier/",
"bucket": "myBucket",
"namespace": "myNamespace",
"region": "myRegion"}]
},
{
"table_name": "nation_csv",
"dialect": {
"format": "csv",
"field_delimiter": "|",
"record_delimiter": "|\\n",
"has_header": true
},
"file": [{
"par": "https://objectstorage.../nation.csv"}]
},
{
"table_name": "region_avro",
"dialect": {"format": "avro"},
"file": [{
"par": "https://objectstorage.../region.avro"}]
}]
}]';
Define the @options
variable with
SET
:
mysql> SET @options = JSON_OBJECT('external_tables', CAST(@ext_tables AS JSON));
Setting mode
to
dryrun
generates the load script but
does not create or load the external tables. For
example:
mysql> SET @options = JSON_OBJECT('mode', 'dryrun', 'external_tables', CAST(@ext_tables AS JSON));
To implement the changes as part of the load command,
set mode
to
normal
. This is the default, and it
is not necessary to add it to the command.
Exclude columns from the loading process with the
exclude_list
option. See
Auto Parallel Load
Syntax.
Lakehouse Auto Parallel Load infers the column names for Avro and Parquet files,
and also for CSV files if has_header
is
true
. For these situations, use the
column names with the exclude_list
option.
If the table already exists, but has no data, use the
existing column names with the
exclude_list
option.
For CSV files if has_header
is
false
, use the generated schema names
with the exclude_list
option. These are:
col_1
, col_2
,
col_3
...