MySQL AI User Guide
The ML_EMBED_TABLE
routine runs multiple
embedding generations in a batch, in parallel.
This topic contains the following sections:
To learn about the privileges you need to run this routine, see Section 5.3, “Required Privileges for using GenAI”.
mysql>CALL sys.ML_EMBED_TABLE('
InputTableColumn
', 'OutputTableColumn
'[,options
]);options
: JSON_OBJECT(keyvalue
[,keyvalue
] ...)keyvalue
: { 'model_id', {'ModelID
'} |'truncate', {true|false} |'batch_size',BatchSize
|'details_column', 'ErrorDetailsColumnName
' }
Following are ML_EMBED_TABLE
parameters:
InputTableColumn
: specifies the
names of the input database, table, and column that
contains the text to encode. The
InputTableColumn
is specified
in the following format:
DBName
.TableName
.ColumnName
.
The specified input table can be an internal or external table.
The specified input table must already exist, must not be empty, and must have a primary key.
The input column must already exist and must contain
text
or varchar
values.
The input column must not be a part of the primary key
and must not have NULL
values or
empty strings.
There must be no backticks used in the
DBName
,
TableName
, or
ColumnName
and there must
be no period used in the
DBName
or
TableName
.
OutputTableColumn
: specifies
the names of the database, table, and column where the
generated embeddings are stored. The
OutputTableColumn
is specified
in the following format:
DBName
.TableName
.ColumnName
.
The specified output table must be an internal table.
If the specified output table already exists, then it
must be the same as the input table. And, the
specified output column must not already exist in the
input table. A new
VECTOR
column is added
to the table. External tables are read only. So if
input table is an external table, then it cannot be
used to store the output.
If the specified output table doesn't exist, then a
new table is created. The new output table has key
columns which contains the same primary key values as
the input table and a VECTOR
column
that stores the generated embeddings.
There must be no backticks used in the
DBName
,
TableName
, or
ColumnName
and there must
be no period used in the
DBName
or
TableName
.
options
: specifies optional parameters
as key-value pairs in JSON format. It can include the
following parameters:
model_id
: specifies the embedding
model to use for encoding the text. Default value is
multilingual-e5-small
. Possible
values are:
all_minilm_l12_v2
or
minilm
multilingual-e5-small
To view the lists of available embedding models, see In-Database Embedding Model.
truncate
: specifies whether to
truncate inputs longer than the maximum token size.
Default value is true
.
batch_size
: specifies the batch
size for the routine. This option is supported for
internal tables only. Default value is
1000
. Possible values are integer
values between 1
and
1000
.
details_column
: specifies a name
for the output table column that is created for adding
details of errors encountered for rows that aren't
processed successfully by the routine. Ensure that a
column by the specified name does not already exist in
the table. Default value is
details
.
Generate embeddings for text stored in
demo_db.input_table.Input
using the
all_minilm_l12_v2
embedding model, and save
the generated embeddings in the output table
demo_db.output_table.Output
:
mysql> CALL sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));