MySQL HeatWave User Guide
If you choose not to Load Structured Data Using Lakehouse Auto Parallel Load, you can load external data manually.
This topic contains the following sections:
Prepare to load data by reviewing the following:
Review Lakehouse External Table Syntax for loading data manually.
Depending on the version of MySQL you are using, use the
appropriate CREATE TABLE
statement.
As of MySQL 9.4.0, you can use the CREATE
EXTERNAL TABLE
statement, which automatically
sets ENGINE
to
lakehouse
, and
SECONDARY_ENGINE
to
rapid
.
In versions earlier than MySQL 9.4.0, you must use the
CREATE TABLE
statement, and manually
set ENGINE
to
lakehouse
, and
SECONDARY_ENGINE
to
rapid
.
For example, you can use the following command in MySQL 9.4.0:
mysql>CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int) ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql>ALTER TABLE table_1 SECONDARY_LOAD;
For earlier versions, you must use the following command:
mysql>CREATE TABLE table_1(col_1 int, col_2 int, col_3 int) ENGINE=lakehouse SECONDARY_ENGINE = rapid ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql>ALTER TABLE table_1 SECONDARY_LOAD;
As of MySQL 9.4.0, you can set options when creating external tables using SQL syntax. For earlier versions, you must set options using JSON syntax.
The following example sets options usings SQL syntax:
mysql>CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int) FILE_FORMAT = (FORMAT csv) FILES = (URI = 'oci://mybucket@mynamespace/data_files/data_file_1.csv');
mysql>ALTER TABLE table_1 SECONDARY_LOAD;
The following example sets options using JSON syntax:
mysql>CREATE EXTERNAL TABLE table_1(col_1 int, col_2 int, col_3 int) ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"}, "file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
mysql>ALTER TABLE table_1 SECONDARY_LOAD;
To review external table options in JSON syntax, see Lakehouse External Table JSON Syntax.
To review external table options in SQL syntax, see Lakehouse External Table SQL Syntax.
To demonstrate how to load data manually, the following example loads a single file and selects to use a pre-authenticated request (PAR).
The CSV file in this example is from
Bank
Marketing. To use this file, visit
Bank
Marketing and download the
bank+marketing.zip
file. Unzip the file,
and then unzip the bank.zip
file. Refer
to the bank.csv
file.
To load external data manually:
Prepare the files to load in the proper format. See Supported File Formats.
Upload the files to load into Object Storage. See Uploading an Object Storage Object to a Bucket in Oracle Cloud Infrastructure Documentation.
Select the method to load the files: PAR, resource principals, or uniform resource identifier (URI). To learn more about each method, see the following:
In the terminal window, create and use the database to store the table.
mysql>CREATE DATABASE bank_data;
mysql>USE DATABASE bank_data;
Set up the
CREATE
TABLE
statement and the
ENGINE_ATTRIBUTE
options to specify
the parameters needed to process the external files. As
of MySQL 9.4.0, you can use the CREATE EXTERNAL
TABLE
statement. In versions earlier than
9.4.0, you must use the CREATE TABLE
statement and set ENGINE
to
lakehouse
, and
SECONDARY_ENGINE
to
rapid
. See
CREATE TABLE Statement.
Ensure that the table has the correct data type for each
column. For this example, columns are defined according
to the data in the bank.csv
file. See
the following to learn more:
As of MySQL 9.4.0, you can use SQL syntax to set options for external tables. For earlier versions, you must set options using JSON syntax. To learn more, see Lakehouse External Table JSON Syntax and Lakehouse External Table SQL Syntax.
The following example uses JSON syntax:
mysql> CREATE EXTERNAL TABLE bank_marketing(
age int,
job varchar(255),
marital varchar(255),
education varchar(255),
default1 varchar(255),
balance float,
housing varchar(255),
loan varchar(255),
contact varchar(255),
day int,
month varchar(255),
duration float,
campaign int,
pdays float,
previous float,
poutcome varchar(255),
y varchar(255)
)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "has_header": true},
"file": [{ "par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenant_1/b/bucket_1/o/bank.csv"}]}';
Where:
The CREATE EXTERNAL TABLE
statement creates the table
bank_marketing
and automatically
sets ENGINE
to
lakehouse
, and
SECONDARY_ENGINE
to
rapid
.
Each column for the table is defined according to
the bank.csv
file.
format
defines the format of the
external file: csv
.
has_header
identifies a header in
the external file.
par
sets the pre-authenticated
request link to access the file. Replace the PAR in
the example with your own.
The following example uses SQL syntax:
mysql> CREATE EXTERNAL TABLE bank_marketing(
age int,
job varchar(255),
marital varchar(255),
education varchar(255),
default1 varchar(255),
balance float,
housing varchar(255),
loan varchar(255),
contact varchar(255),
day int,
month varchar(255),
duration float,
campaign int,
pdays float,
previous float,
poutcome varchar(255),
y varchar(255)
)
FILE_FORMAT = (FORMAT csv HEADER ON)
FILES = (URL = 'https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenant_1/b/bucket_1/o/bank.csv');
If you are on MySQL 9.1.2 and earlier, you need to
update dialect
with the
field delimiter
and record
delimiter
parameters. As of MySQL 9.2.0,
MySQL HeatWave Lakehouse can automatically detect these values. See
Lakehouse
External Table Syntax to learn more.
Use the ALTER TABLE
and
SECONDARY_LOAD
commands to load the
data and create the external table.
mysql> ALTER TABLE bank_marketing SECONDARY_LOAD;
Warning (code 3877): Command executed during preprocessing: 'ALTER TABLE `bank_data`.`bank_marketing`
ENGINE_ATTRIBUTE='{"file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenant_1/b/bucket_1/o/bank.csv"}],
"dialect": {"format": "csv", "has_header": true, "field_delimiter": ";", "record_delimiter": "\\n"}}''.
Review the message to confirm the external table is
successfully created. For MySQL 9.1.2 and later, you can
also review the field_delimiter
and
record_delimiter
values that MySQL HeatWave
automatically detected for the table.
Optionally, query five rows of the table to confirm the data is loaded and accessible.
mysql> SELECT * FROM bank_marketing.bank_train LIMIT 5;
+-----+-------------+----------+-----------+----------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| age | job | marital | education | default1 | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
+-----+-------------+----------+-----------+----------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
| 37 | services | married | secondary | no | 4760 | yes | no | cellular | 8 | may | 182 | 2 | 169 | 2 | failure | no |
| 32 | technician | single | secondary | no | 2979 | no | no | cellular | 25 | may | 156 | 1 | -1 | 0 | unknown | no |
| 43 | management | married | tertiary | no | 690 | yes | no | cellular | 6 | aug | 171 | 3 | -1 | 0 | unknown | no |
| 50 | blue-collar | divorced | secondary | no | 203 | yes | no | telephone | 19 | nov | 265 | 1 | 127 | 4 | other | no |
| 34 | blue-collar | married | secondary | no | 322 | yes | no | cellular | 20 | apr | 10 | 3 | -1 | 0 | unknown | no |
+-----+-------------+----------+-----------+----------+---------+---------+------+-----------+-----+-------+----------+----------+-------+----------+----------+----+
Once you confirm the table successfully loaded into Lakehouse, you can use the data for the following: