MySQL HeatWave User Guide
MySQL HeatWave Lakehouse has the following limitations for Parquet files.
NaN values in Parquet files are loaded as NULL.
As of MySQL 9.3.2, Lakehouse supports the
VECTOR
data type. Consider
the following limitations:
To load Parquet files with the
VECTOR
data type (from
Parquet strings or from Parquet (numeric) Lists), you
must
load
the table manually with
is_strict_mode
set to
false
. For loading simple lists as
JSON
, you can load the
data
manually
or
automatically.
To learn more, see
Vector Data Type
and
External
Table Syntax.
If you need to manually create a table but do not
have the appropriate table definition, you can load
the table automatically in dryrun
mode and review the load script. You also need to
convert columns from
JSON
to
VECTOR
as needed. See
the example commands below to do this.
mysql>SET @input_list = '[{ "db_name": "data_db", "tables": [{ "table_name": "table_1", "engine_attribute": {"dialect": {"format": "parquet" "embed_model_id": "minilm"}, "file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/data_file_1.parquet"}]} }]}]';
mysql>CALL sys.HEATWAVE_LOAD(CAST(@input_list AS JSON), JSON_OBJECT('mode', 'dryrun'));
mysql>SELECT log->>"$.sql" AS "Load Script" FROM sys.heatwave_autopilot_report WHERE type = "sql" ORDER BY id;
You cannot give the
VECTOR
data type a
default value other than NULL
.
You cannot directly convert
VECTOR
to
JSON
or
JSON
to
VECTOR
.
To convert VECTOR
to JSON
, first
convert VECTOR
to a
STRING data
type by using the
VECTOR_TO_STRING()
function. Then, cast this string as
JSON
.
For example, if col_1
in
table_1
is a
VECTOR
column that
you want to convert to
JSON
, you can do
the following:
SELECT CAST(VECTOR_TO_STRING(col_1) as JSON) FROM table_1;
To convert JSON
to
VECTOR
, first
convert JSON
to a
STRING data
type, and then use the
STRING_TO_VECTOR()
function to convert to
VECTOR
.
For example, if col_1
in
table_1
is a
JSON
column that
you want to convert to
VECTOR
, you can do
the following:
SELECT STRING_TO_VECTOR(CAST(col_1 as NCHAR)) FROM table_1;
If you run Vector_to_string(col)
as
a
Vector_to_string()
function, it fails if there are any
0x
(default for invalid vectors)
values. To avoid errors, filter out these values.
As of MySQL 9.1.2, you can load Parquet files with row groups that total up to 10GB with large MySQL HeatWave Cluster shapes.
Lakehouse does not support the following data types in
Parquet files. Lakehouse marks columns with these data
types as NOT SECONDARY
, and does not
load them.
BSON
ENUM
Interval
List
Not supported before MySQL 9.3.2. As of MySQL
9.3.2, limited support is available for Parquet
Lists in Lakehouse. Simple (non-nested) lists of
the following Parquet data types are now
supported: INT_8
,
INT_16
,
INT_32
,
INT_64
,
UINT_8
,
UINT_16
,
UINT_32
,
UINT_64
,
FLOAT
(32-bit),
DOUBLE
,
STRING
, and
BOOL
.
Map
Unknown
UUID
Do not use strict SQL mode if the inferred schema differs from the table schema.