MySQL HeatWave User Guide

10.3.8 ML_EMBED_TABLE

The ML_EMBED_TABLE routine runs multiple embedding generations in a batch, in parallel.

Note

In versions older than MySQL 9.2.1, to alter an existing table or create a new table, MySQL requires you to set the sql-require-primary-key system variable to 0.

This routine is available as of MySQL 9.0.1-u1.

This topic contains the following sections:

To learn about the privileges you need to run this routine, see Section 7.3, “MySQL HeatWave GenAI Roles and Privileges”.

ML_EMBED_TABLE Syntax

mysql> CALL sys.ML_EMBED_TABLE('InputTableColumn', 'OutputTableColumn'[, options]);
    
options: JSON_OBJECT(keyvalue[, keyvalue] ...)
keyvalue: 
{
  'model_id', {'ModelID'}
  |'truncate', {true|false}
  |'batch_size', BatchSize
  |'details_column', 'ErrorDetailsColumnName'
}

Following are ML_EMBED_TABLE parameters:

  • InputTableColumn: specifies the names of the input database, table, and column that contains the text to encode. The InputTableColumn is specified in the following format: DBName.TableName.ColumnName.

    • The specified input table can be an internal or external table.

    • The specified input table must already exist, must not be empty, and must have a primary key.

    • The input column must already exist and must contain text or varchar values.

    • The input column must not be a part of the primary key and must not have NULL values or empty strings.

    • There must be no backticks used in the DBName, TableName, or ColumnName and there must be no period used in the DBName or TableName.

  • OutputTableColumn: specifies the names of the database, table, and column where the generated embeddings are stored. The OutputTableColumn is specified in the following format: DBName.TableName.ColumnName.

    • The specified output table must be an internal table.

    • If the specified output table already exists, then it must be the same as the input table. And, the specified output column must not already exist in the input table. A new VECTOR column is added to the table. External tables are read only. So if input table is an external table, then it cannot be used to store the output.

    • If the specified output table doesn't exist, then a new table is created. The new output table has key columns which contains the same primary key values as the input table and a VECTOR column that stores the generated embeddings.

    • There must be no backticks used in the DBName, TableName, or ColumnName and there must be no period used in the DBName or TableName.

  • options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:

    • model_id: specifies the embedding model to use for encoding the text. Possible values are:

      • multilingual-e5-small:

        • As of MySQL 9.3.0, used as the default embedding model.

        • As of MySQL 9.2.1, can be used for encoding text in any supported language.

          In earlier versions of MySQL, can be used for encoding text or files in supported languages other than English.

        • This embedding model is available as of MySQL 9.0.1-u1.

      • all_minilm_l12_v2:

        • In MySQL 9.2.2 and earlier versions, used as the default model embedding.

        • As of MySQL 9.2.1, can be used for encoding text in any supported language.

          In earlier versions of MySQL, can be used for encoding text in English only.

      • cohere.embed-english-v3.0:

        • As of MySQL 9.2.1, can be used for encoding text in any supported language.

          In earlier versions of MySQL, can be used for encoding text in English only.

        • This embedding model is available as of MySQL 9.0.1-u1.

      • cohere.embed-multilingual-v3.0:

        • As of MySQL 9.2.1, can be used for encoding text in any supported language.

          In earlier versions of MySQL, can be used for encoding text in supported languages other than English.

        • This embedding model is available as of MySQL 9.0.1-u1.

      • As of MySQL 9.4.0, to use an OCI Generative AI service Dedicated AI Cluster model, specify the model endpoint OCID. For more information, see Creating an Endpoint in Generative AI.

      To view the lists of available embedding models, see MySQL HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models. To view the list of supported languages, see Languages.

    • truncate: specifies whether to truncate inputs longer than the maximum token size. Default value is true.

    • batch_size: specifies the batch size for the routine. This parameter is supported for internal tables only. Default value is 1000. Possible values are integer values between 1 and 1000.

    • details_column: specifies a name for the output table column that is created for adding details of errors encountered for rows that aren't processed successfully by the routine. Ensure that a column by the specified name does not already exist in the table. Default value is details.

      This parameter is available as of MySQL 9.3.0.

Syntax Examples

Generate embeddings for text stored in demo_db.input_table.Input using the all_minilm_l12_v2 embedding model, and save the generated embeddings in the output table demo_db.output_table.Output:

mysql> CALL sys.ML_EMBED_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", JSON_OBJECT("model_id", "all_minilm_l12_v2"));

See Also

Generate Vector Embeddings - Run Batch Queries