MySQL HeatWave User Guide
MySQL includes a bulk load extension to the
LOAD DATA
statement that is only
available in MySQL HeatWave on AWS. It can do the following:
Optimize the loading of data sorted by primary key.
Optimize the loading of unsorted data.
Optimize the loading of data from an object store.
Optimize the loading of data from a series of files.
Load a MySQL Shell dump file (As of MySQL 8.4.0).
Load ZSTD compressed CSV files (As of MySQL 8.4.0).
Monitor bulk load progress with the Performance Schema (As of MySQL 8.4.0).
Large data support (As of MySQL 9.0.0)
Use a second session to monitor bulk load progress:
If the data is sorted, there is a single stage:
loading
.
If the data is unsorted, there are two stages:
sorting
and loading
.
See: Section 11.2.1.2, “Bulk Ingest Data to MySQL Server Limitations”.
LOAD DATA
with
ALGORITHM=BULK
supports the following data
types.
Table 4.1 Supported Data Types for Bulk Ingest
Data Type | Available MySQL version | Comments |
---|---|---|
INT |
All | |
SMALLINT |
All | |
TINYINT |
All | |
BIGINT |
All | |
CHAR |
All | |
BINARY |
All | |
VARCHAR |
All | Before MySQL 9.0.0, the record must fit in the page as there is no large data support. |
VARBINARY |
All | Before MySQL 9.0.0, the record must fit in the page as there is no large data support. |
NUMERIC |
All | |
DECIMAL |
All | |
UNSIGNED NUMERIC |
All | |
UNSIGNED DECIMAL |
All | |
DOUBLE |
All | |
FLOAT |
All | |
DATE |
All | |
DATETIME |
All | |
BIT |
As of MySQL 9.0.0 | |
ENUM |
As of MySQL 9.0.0 | |
JSON |
As of MySQL 9.0.0 | |
SET |
As of MySQL 9.0.0 | |
TIMESTAMP |
As of MySQL 9.0.0 | |
YEAR |
As of MySQL 9.0.0 | |
TINYBLOB |
As of MySQL 9.0.0 | |
BLOB |
As of MySQL 9.0.0 | |
MEDIUMBLOB |
As of MySQL 9.0.0 | |
LONGBLOB |
As of MySQL 9.0.0 | |
TINYTEXT |
As of MySQL 9.0.0 | |
TEXT |
As of MySQL 9.0.0 | |
MEDIUMTEXT |
As of MySQL 9.0.0 | |
LONGTEXT |
As of MySQL 9.0.0 | |
GEOMETRY |
As of MySQL 9.0.0 | |
GEOMETRYCOLLECTION |
As of MySQL 9.0.0 | |
POINT |
As of MySQL 9.0.0 | |
MULTIPOINT |
As of MySQL 9.0.0 | |
LINESTRING |
As of MySQL 9.0.0 | |
MULTILINESTRING |
As of MySQL 9.0.0 | |
POLYGON |
As of MySQL 9.0.0 | |
MULTIPOLYGON |
As of MySQL 9.0.0 | |
VECTOR |
As of MySQL 9.3.0 |
For the data types that MySQL HeatWave supports, see: Section 4.2.1, “Supported Data Types for MySQL HeatWave”
See: Section 11.2.1.2, “Bulk Ingest Data to MySQL Server Limitations”.
mysql>LOAD DATA [LOW_PRIORITY | CONCURRENT] [FROM] {INFILE | URL | URI | S3} {'
file_prefix
' | 'options
'} [COUNTN
] [IN PRIMARY KEY ORDER] INTO TABLEtbl_name
[CHARACTER SETcharset_name
] [COMPRESSION = {'ZSTD'}] [{FIELDS | COLUMNS} [TERMINATED BY 'string
'] [[OPTIONALLY] ENCLOSED BY 'char
'] [ESCAPED BY 'char
'] ] [LINES [TERMINATED BY 'string
'] ] [IGNOREnumber
{LINES | ROWS}] [PARALLEL =number
] [MEMORY =M
] [ALGORITHM = BULK]options
: { JSON_OBJECT("key
","value
"[,"key
","value
"] ...)"key","value"
: { "url-prefix","prefix
" ["url-sequence-start",0
] ["url-suffix","suffix
"] ["url-prefix-last-append","@
"] ["is-dryrun",{true|false}] } }
The additional LOAD DATA
clauses are:
FROM
: Makes the statement more
readable.
URL
: A URL accessible with a HTTP GET
request.
URI
: Available as of MySQL 9.4.0. The
file location of the URI.
S3
: The AWS S3 file location.
This requires the user privilege
LOAD_FROM_S3
.
See: Section 11.2.1.2, “Bulk Ingest Data to MySQL Server Limitations”.
COUNT
: The number of files in a series
of files.
For COUNT 5
and
file_prefix
set to
data.csv.
, the five files would be:
data.csv.1
,
data.csv.2
,
data.csv.3
,
data.csv.4
, and
data.csv.5
.
IN PRIMARY KEY ORDER
: Use when the data
is already sorted. The values should be in ascending order
within the file.
For a file series, the primary keys in each file must be disjoint and in ascending order from one file to the next.
PARALLEL
: The number of concurrent
threads to use. A typical value might be 16, 32 or 48. The
default value is 16.
PARALLEL
does not require
CONCURRENT
.
MEMORY
: The amount of memory to use. A
typical value might be 512M or 4G. The default value is
1G.
ALGORITHM
: Set to
BULK
for bulk load. The file format is
CSV.
COMPRESSION
: The file compression
algorithm. Bulk load supports the ZSTD algorithm.
options
is a JSON object literal that
includes:
url-prefix
: The common URL prefix
for the files to load.
url-sequence-start
: The sequence
number for the first file.
The default value is 1, and the minimum value is 0. The value cannot be a negative number. The value can be a string or a number, for example, "134", or "default".
url-suffix
: The file suffix.
url-prefix-last-append
: The string
to append to the prefix of the last file.
This supports MySQL Shell dump files.
is-dryrun
: Set to
true
to run basic checks and report
if bulk load is possible on the given table. The
default value is false
.
To enable is-dryrun
, use any of the
following values: true
,
"true"
, "1"
,
"on"
or 1
.
To disable is-dryrun
, use any of
the following values: false
,
"false"
, "0"
,
"off"
or 0
.
LOAD DATA
with
ALGORITHM=BULK
does not support these
clauses:
LOAD DATA [LOCAL] [REPLACE | IGNORE] [PARTITION (partition_name
[,partition_name
] ...)] ] [LINES [STARTING BY 'string
'] ] [(col_name_or_user_var
[,col_name_or_user_var
] ...)] [SETcol_name
={expr
| DEFAULT} [,col_name
={expr
| DEFAULT}] ...]
An example that loads unsorted data from AWS S3 with 48 concurrent threads and 4G of memory:
mysql>GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;
mysql>LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' PARALLEL = 48 MEMORY = 4G ALGORITHM=BULK;
An example that loads eight files of sorted data from AWS
S3. The file_prefix
ends with a period.
The files are lineitem.tbl.1
,
lineitem.tbl.2
, ...
lineitem.tbl.8
:
mysql>GRANT LOAD_FROM_S3 ON *.* TO load_user@localhost;
mysql>LOAD DATA FROM S3 's3-us-east-1://innodb-bulkload-dev-1/lineitem.tbl.' COUNT 8 IN PRIMARY KEY ORDER INTO TABLE lineitem FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' ALGORITHM=BULK;
An example that performs a dry run on a sequence of MySQL Shell dump files compressed with the ZSTD algorithm:
mysql>GRANT LOAD_FROM_URL ON *.* TO load_user@localhost;
mysql>LOAD DATA FROM URL '{"url-prefix","https://example.com/bucket/test@lineitem@","url-sequence-start",0,"url-suffix",".tsv.zst","url-prefix-last-append","@","is-dryrun",true}' COUNT 20 INTO TABLE lineitem CHARACTER SET ???? COMPRESSION = {'ZSTD'} FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 20000 LINES ALGORITHM=BULK;
An example that loads data with the URI keyword (supported as of MySQL 9.4.0):
mysql>GRANT LOAD_FROM_URL ON *.* TO load_user@localhost;
mysql>LOAD DATA FROM URI 'https://data_files.com/data_files_1.tbl' INTO TABLE lineitem FIELDS TERMINATED BY "|" OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' ALGORITHM=BULK;
An example that monitors bulk load progress in a second session.
Review the list of stages with the following query:
mysql> SELECT NAME, ENABLED, TIMED FROM performance_schema.setup_instruments
WHERE ENABLED='YES' AND NAME LIKE "stage/bulk_load%";
Enable the events_stages_current
with the following query:
mysql> UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES' WHERE NAME LIKE 'events_stages_current';
Use one session to run bulk load, and monitor progress in a second session:
mysql> SELECT thread_id, event_id, event_name, WORK_ESTIMATED, WORK_COMPLETED
FROM performance_schema.events_stages_current;
--------------
SELECT thread_id, event_id, event_name, WORK_ESTIMATED, WORK_COMPLETED FROM performance_schema.events_stages_current
--------------
+-----------+----------+----------------------------------+----------------+----------------+
| thread_id | event_id | event_name | WORK_ESTIMATED | WORK_COMPLETED |
+-----------+----------+----------------------------------+----------------+----------------+
| 49 | 5 | stage/bulk_load_unsorted/sorting | 1207551343 | 583008145 |
+-----------+----------+----------------------------------+----------------+----------------+