MySQL HeatWave User Guide
Available as of MySQL 9.0.0, Lakehouse Incremental Load allows you to refresh the data in an external table.
This topic contains the following sections:
Load structured data using Lakehouse Auto Parallel Load or manually.
Lakehouse Incremental Load uses the existing engine_attribute
settings from the initial load. Therefore, when you refresh
data, you cannot include engine_attribute
settings in subsequent loads of data.
You can add or remove individual files in Object Storage if
they still match the defined file pattern or file prefix in
the engine_attribute
settings for the
initial load. If you want to change the files in the
engine_attribute
settings for subsequent
loads, see how to
Add or
Remove Files Using Selecting Load (supported as of
MySQL 9.1.2).
To run Lakehouse Incremental Load, load the data using Lakehouse Auto Parallel Load and set the Auto Parallel Load
refresh_external_tables
option to
true
.
If a subsequent call to Auto Parallel Load includes tables that are not yet loaded, then Auto Parallel Load loads them for the first time.
A call to Auto Parallel Load might contain both loaded and unloaded tables. Those that are unloaded are loaded, and those that are already loaded are refreshed.
The following example performs an initial automatic load of three files in an Object Storage folder, and then does a subsequent refresh of the data in the external table by loading an additional two files into the table. The example uses resource principals to load the data.
Upload
the files to load into Object Storage. This
example creates the Object Storage folder
data_files
and uploads the files
data_file_1.csv
,
data_file_2.csv
, and
data_file_3.csv
.
Create a session variable with the characteristics of
the input_list
to load the external
files. To review all syntax options, see
Auto
Parallel Load Syntax.
mysql> SET @input_list = '[{
"db_name": "data_tables",
"tables": [{
"table_name": "data_table_1",
"engine_attribute": {
"dialect": {"format": "csv", "has_header": true},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/"}]
}
}]
}]';
Where:
@input_list
is the name of the
session variable.
db_name
identifies the database
name to store the table:
data_tables
. The database is
automatically created if it does not exist.
table_name
sets the table name to
store the data: data_table_1
. The
table is automatically created if it does not exist.
engine_attribute
defines the
parameters of the external file.
format
defines the format of the
external file: csv
.
has_header
identifies a header in
the external file. Auto Parallel Load then infers the column
names from the first row in the file.
file
defines the uniform resource
identifiers (URI) to access the files. Replace these
values with your own. The
data_files/
prefix defines the
Object Storage folder storing the files to load.
If you are on MySQL 9.1.2 and earlier, you need to
update dialect
with the
field delimiter
and record
delimiter
parameters. As of MySQL 9.2.0,
MySQL HeatWave Lakehouse can automatically detect these values. See
Lakehouse
External Table Syntax to learn more.
Run the HEATWAVE_LOAD
command. If you
set the options to NULL
, the data
loads in normal mode by default.
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), NULL);
Review the SCHEMA CREATION
,
TABLE LOAD
, and LOAD
SUMMARY
sections for any errors or warnings
during the load, and to confirm the external table
loaded successfully.
Optionally, use the new database and query the number of rows in the table to confirm data from all files loaded successfully.
mysql>USE data_tables;
mysql>SELECT COUNT(*) FROM data_table_1;
+----------+ | COUNT(*) | +----------+ | 1384 | +----------+
After doing the initial load of data, this example refreshes
the data in the table data_table_1
with
the loading of additional files into the table.
Upload
the new files to load into Object Storage. This
example uploads two additional files to the same
data_files
Object Storage folder:
data_file_4.csv
and
data_file_5.csv
.
Create a new session variable with the
input_list
to set up the incremental
load. The configuration must not have any
engine_attribute
parameters.
mysql> SET @input_list = '[{"db_name": "data_tables", "tables": [{"table_name": "data_table_1"}]}]';
Create an options
variable that
includes the options to load the data in
normal
mode and enable
refresh_external_tables
.
mysql> SET @options = JSON_OBJECT('mode', 'normal','refresh_external_tables', TRUE);
Run the HEATWAVE_LOAD
command with
the input_list
and
options
variables. Tables that are
identified to be refreshed are marked with a
+
symbol. If the files for the table
have not changed, then no changes are made to the table.
The estimates in the output do not cover the tables that
are refreshed.
mysql> CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), @options);
+------------------------------------------+
| INITIALIZING HEATWAVE AUTO PARALLEL LOAD |
+------------------------------------------+
| Version: 4.34 |
| |
| Load Mode: normal |
| Load Policy: disable_unsupported_columns |
| Output Mode: normal |
| |
+------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------+
| LAKEHOUSE AUTO SCHEMA INFERENCE |
+--------------------------------------------------------------------------------------------------------------------+
| Verifying external lakehouse tables: 1 |
| |
| SCHEMA TABLE TABLE IS RAW NUM. OF ESTIMATED SUMMARY OF |
| NAME NAME CREATED FILE SIZE COLUMNS ROW COUNT ISSUES |
| ------ ----- -------- --------- ------- --------- ---------- |
| `data_tables` `data_table_1` YES + 40.06 KiB 8 1.38 K |
| |
| New schemas to be created: 0 |
| External lakehouse tables to be created: 0 |
| External lakehouse tables to be refreshed (marked with +): 1 |
| |
+--------------------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------+
| OFFLOAD ANALYSIS |
+------------------------------------------------------------------------+
| Verifying input schemas: 1 |
| User excluded items: 0 |
| |
| SCHEMA OFFLOADABLE OFFLOADABLE SUMMARY OF |
| NAME TABLES COLUMNS ISSUES |
| ------ ----------- ----------- ---------- |
| `data_tables` 1 8 |
| |
| Total offloadable schemas: 1 |
| |
+------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------------------------------------------+
| CAPACITY ESTIMATION |
+-----------------------------------------------------------------------------------------------------------------------------+
| Default encoding for string columns: VARLEN (unless specified in the schema) |
| Estimating memory footprint for 1 schema(s) |
| Estimates do not include 1 external lakehouse table(s) being refreshed. |
| |
| TOTAL ESTIMATED ESTIMATED TOTAL DICTIONARY VARLEN ESTIMATED |
| SCHEMA OFFLOADABLE HEATWAVE NODE MYSQL NODE STRING ENCODED ENCODED LOAD |
| NAME TABLES FOOTPRINT FOOTPRINT COLUMNS COLUMNS COLUMNS TIME |
| ------ ----------- --------- --------- ------- ---------- ------- --------- |
| `data_tables` 1 0 bytes 0 bytes 3 0 3 0 ps |
| |
| Sufficient MySQL host memory available to load all tables. |
| Sufficient HeatWave cluster memory available to load all tables. |
| |
+-----------------------------------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------------------------------------------+
| EXECUTING LOAD SCRIPT |
+---------------------------------------------------------------------------------------------------------------------------------------+
| HeatWave Load script generated |
| Retrieve load script containing 3 generated DDL command(s) using the query below: |
| Deprecation Notice: "heatwave_load_report" will be deprecated, please switch to "heatwave_autopilot_report" |
| SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id; |
| |
| Adjusting load parallelism dynamically per internal/external table. |
| Using current parallelism of 32 thread(s) as maximum for internal tables. |
| |
| Warning: Executing the generated script may alter column definitions and secondary engine flags in the schema |
| |
| Using SQL_MODE: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
| |
| Proceeding to load 1 table(s) into HeatWave. |
| |
| Applying changes will take approximately 26.00 ms |
| |
+---------------------------------------------------------------------------------------------------------------------------------------+
+----------------------------------------------+
| TABLE LOAD |
+----------------------------------------------+
| TABLE (1 of 1): `data_tables`.`data_table_1` |
| Commands executed successfully: 3 of 3 |
| Warnings encountered: 1 |
| Table load succeeded! |
| Total columns loaded: 8 |
| Elapsed time: 3.66 s |
| |
+----------------------------------------------+
+----------------------------------------------------------------------------------+
| LOAD SUMMARY |
+----------------------------------------------------------------------------------+
| |
| SCHEMA TABLES TABLES COLUMNS LOAD |
| NAME LOADED FAILED LOADED DURATION |
| ------ ------ ------ ------- -------- |
| `data_tables` 1 0 8 3.66 s |
| |
| Total errors encountered: 0 |
| Total warnings encountered: 2 |
| Retrieve the associated logs from the report table using the query below: |
| SELECT log FROM sys.heatwave_autopilot_report WHERE type IN ('error', 'warn'); |
| |
+----------------------------------------------------------------------------------+
Optionally, query the number of rows in the table again to confirm the data from the two new files successfully loaded.
mysql> SELECT COUNT(*) FROM data_table_1;
+----------+
| COUNT(*) |
+----------+
| 4130 |
+----------+
Learn more about the following: