MySQL HeatWave User Guide
As of MySQL 9.2.1, MySQL HeatWave GenAI lets you use tables containing your
own vector embeddings to run retrieval-augmented generation
(RAG) with vector search. The ML_RAG
and ML_RAG_TABLE
routines let you specify the
table column names to use as filters when finding relevant tables
for context retrieval.
In addition to the specified column names, the
ML_RAG
and ML_RAG_TABLE
routines use the name of the embedding model used to embed the
input query to find relevant embedding tables for context
retrieval.
The following sections in this topic describe how you can use your own embedding table for context retrieval:
Review the MySQL HeatWave GenAI requirements and privileges.
Create a table that satisfies the following criteria:
To qualify as a valid embedding table, the table must contain the following columns:
A string column containing the text segments.
A vector column containing the vector embeddings of the text segments.
A comment on the vector column must specify the name of the embedding model used to generate the vector embeddings.
The following is an example of a valid embedding table that can be used for context retrieval:
mysql> CREATE TABLE demo_table (
    id INT AUTO_INCREMENT,
    demo_text TEXT,
    string_embedding TEXT,
    demo_embedding VECTOR(3) COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=demo_embedding_model',
    PRIMARY KEY (id));
mysql> INSERT INTO demo_table (demo_text, string_embedding) VALUES('MySQL is an open-source RDBMS that is widely used for its scalability, reliability, and security.', '[0,1,0]');
mysql> INSERT INTO demo_table (demo_text, string_embedding) VALUES('AI refers to the development of machines that can think and act like humans.', '[0,0,1]');
mysql> INSERT INTO demo_table (demo_text, string_embedding) VALUES('ML is a subset of AI that uses algorithms and statistical models to improve performance on tasks by learning from data.', '[0,1,1]');
mysql> UPDATE demo_table SET demo_embedding=STRING_TO_VECTOR(string_embedding);
mysql> ALTER TABLE demo_table DROP COLUMN string_embedding;
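As an optional sanity check, you can inspect the table definition to confirm that the vector column carries the embedding-model comment. This assumes the demo_table created above:
mysql> SHOW CREATE TABLE demo_table\G
The Create Table output should include the COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=demo_embedding_model' clause on the demo_embedding column.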
To learn how to generate vector embeddings and embedding tables using MySQL HeatWave GenAI, see Generating Vector Embeddings.
If you want to use an inbuilt vector store table along with your own embedding table, complete the steps to set up the vector store.
To Run Batch Queries, add the natural-language queries to a column in a new or existing table. To use the name of an embedding model that is not available in MySQL HeatWave for running RAG, also add the vector embeddings of the input queries to a column of the input table.
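For illustration, an input table for batch queries embedded with an external model might look like the following minimal sketch. The table and column names (demo_query_table, query_text, query_embedding) and the three-dimensional embeddings are placeholders only, not predefined names:
mysql> CREATE TABLE demo_query_table (
    id INT AUTO_INCREMENT,
    query_text TEXT,
    query_embedding VECTOR(3),
    PRIMARY KEY (id));
mysql> INSERT INTO demo_query_table (query_text, query_embedding) VALUES('What is MySQL?', STRING_TO_VECTOR('[0,1,0]'));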
To create and store the sample embedding tables required for running the steps in this topic, you can create and use a new database demo_db:
mysql>CREATE DATABASE demo_db;
mysql>USE demo_db;
To use an embedding table containing vector embeddings from an
embedding model that is available in MySQL HeatWave, you can set the
vector_store_columns
parameter to specify the column names that the
ML_RAG
routine uses to filter tables for context
retrieval. However, because the inbuilt vector store tables use
only the predefined column names, if you change a column name
used for filtering tables, the inbuilt vector store tables are
filtered out and are not used for context retrieval.
The example in this section uses the following table:
mysql>CREATE TABLE demo_minilm_table (id INT AUTO_INCREMENT, demo_text_column TEXT, primary key (id));
mysql>INSERT INTO demo_minilm_table (demo_text_column) VALUES('MySQL is an open-source RDBMS that is widely used for its scalability, reliability, and security.');
mysql>INSERT INTO demo_minilm_table (demo_text_column) VALUES('AI refers to the development of machines that can think and act like humans.');
mysql>INSERT INTO demo_minilm_table (demo_text_column) VALUES('ML is a subset of AI that uses algorithms and statistical models to improve performance on tasks by learning from data.');
mysql>CALL sys.ML_EMBED_TABLE('demo_db.demo_minilm_table.demo_text_column', 'demo_db.demo_minilm_table.demo_embedding_column', JSON_OBJECT('model_id', 'all_minilm_l12_v2'));
To run RAG, perform the following steps:
To change the column names used to filter tables for context retrieval, set the routine options as shown below:
mysql> SET @options = JSON_OBJECT(
    "vector_store_columns", JSON_OBJECT("segment", "TextSegmentColumnName", "segment_embedding", "VectorEmbeddingColumnName"),
    "embed_model_id", "EmbeddingModelName",
    "model_options", JSON_OBJECT("language", "Language")
);
Replace the following:
TextSegmentColumnName: the name of the embedding table column that contains the text segments in natural language. Default value is segment.
VectorEmbeddingColumnName: the name of the embedding table column that contains the vector embeddings of the natural-language text segments. Default value is segment_embedding.
EmbeddingModelName: the name of the embedding model to use to generate the vector embeddings for the input query. The routine uses this embedding model name to find relevant tables for context retrieval. Default value is minilm if the output language is set to English and multilingual-e5-small if the output language is set to a language other than English. For possible values, see MySQL HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models.
Language: the two-letter ISO 639-1 code for the language you want to use for generating the output. The model_options parameter language is required only if you want to use a language other than English. Default language is en, which is English. To view the list of supported languages, see Languages.
For example:
mysql> SET @options = JSON_OBJECT(
"vector_store_columns", JSON_OBJECT("segment", "demo_text_column", "segment_embedding", "demo_embedding_column"),
"embed_model_id", "all_minilm_l12_v2", "model_options", JSON_OBJECT("language", "en")
);
In this example, all embedding tables containing a string
column demo_text_column
and a vector
column demo_embedding_column
, which
contains vector embeddings from
all_minilm_l12_v2
, are used for context
retrieval.
Similarly, you can use the vector_store_columns parameter to specify the following column names for the routine to filter relevant tables for context retrieval:
document_name: name of a column containing the document names. This column can be of any data type supported by MySQL HeatWave. Default value is document_name.
document_id: name of an integer column containing the document IDs. Default value is document_id.
metadata: name of a JSON column containing additional table metadata. Default value is metadata.
segment_number: name of an integer column containing the segment numbers. Default value is segment_number.
These columns are optional; if you do not set these values, the routine does not use these columns to filter tables.
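For example, a hypothetical options object that also filters on a document-name column and a metadata column might look like the following sketch; demo_document_name and demo_metadata are placeholder column names introduced here for illustration, not predefined values:
mysql> SET @options = JSON_OBJECT(
    "vector_store_columns", JSON_OBJECT(
        "segment", "demo_text_column",
        "segment_embedding", "demo_embedding_column",
        "document_name", "demo_document_name",
        "metadata", "demo_metadata"),
    "embed_model_id", "all_minilm_l12_v2"
);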
To define your natural-language query, set the
@query
variable:
SET @query="AddYourQuery
";
Replace AddYourQuery
with your
natural-language query.
For example:
mysql> SET @query="What is AutoML?";
To retrieve the augmented prompt and generate the output,
use the ML_RAG
routine:
mysql> CALL sys.ML_RAG(@query,@output,@options);
Print the output:
mysql> SELECT JSON_PRETTY(@output);
The output is similar to the following:
| { "text": " AutoML is a machine learning technique that uses algorithms and models to automate the process of selecting the best features, tuning hyperparameters, and building predictive models. It allows users to easily create accurate machine learning models without requiring extensive expertise in data science or programming.", "license": "Your use of this llama model is subject to your Oracle agreements and this llama license agreement: https://downloads.mysql.com/docs/LLAMA_32_3B_INSTRUCT-license.pdf", "citations": [ { "segment": "AI refers to the development of machines that can think and act like humans.", "distance": 0.733, "document_name": "" }, { "segment": "MySQL is an open-source RDBMS that is widely used for its scalability, reliability, and security.", "distance": 0.8234, "document_name": "" }, { "segment": "Machine learning is a subset of AI that uses algorithms and statistical models to improve performance on tasks by learning from data.", "distance": 0.8282, "document_name": "" } ], "vector_store": [ "`demo_db`.`demo_minilm_table`" ], "retrieval_info": { "method": "n_citations", "threshold": 0.8282 } } |
The vector_store
section lists the name
of the embedding table that is used to retrieve context
for generating the output.
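If you only want to check which tables were used, you can extract that field from the output with a standard MySQL JSON function, for example:
mysql> SELECT JSON_EXTRACT(@output, '$.vector_store');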
To use a table containing vector embeddings from an embedding
model that is not available in MySQL HeatWave, the
ML_RAG
routine lets you provide the vector
embedding of the input query and the name of the embedding
model that you used to embed both the input query and the
vector embeddings stored in your embedding table. When you
provide the vector embedding of the input query, the routine
skips embedding the query and proceeds with the similarity
search, context retrieval, and RAG. However, in this case, you
cannot use the inbuilt vector store tables for context
retrieval.
The example in this section uses the following table:
mysql> CREATE TABLE demo_table (
    id INT AUTO_INCREMENT,
    demo_text TEXT,
    string_embedding TEXT,
    demo_embedding VECTOR(3) COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=demo_embedding_model',
    PRIMARY KEY (id));
mysql> INSERT INTO demo_table (demo_text, string_embedding) VALUES('MySQL is an open-source RDBMS that is widely used for its scalability, reliability, and security.', '[0,1,0]');
mysql> INSERT INTO demo_table (demo_text, string_embedding) VALUES('AI refers to the development of machines that can think and act like humans.', '[0,0,1]');
mysql> INSERT INTO demo_table (demo_text, string_embedding) VALUES('ML is a subset of AI that uses algorithms and statistical models to improve performance on tasks by learning from data.', '[0,1,1]');
mysql> UPDATE demo_table SET demo_embedding=STRING_TO_VECTOR(string_embedding);
mysql> ALTER TABLE demo_table DROP COLUMN string_embedding;
To run RAG using a table that contains vector embeddings from an embedding model that is not available in MySQL HeatWave, perform the following steps:
Provide the vector embedding of the input query:
SET @query_embedding = to_base64(string_to_vector('VectorEmbeddingOfTheQuery'));
Replace VectorEmbeddingOfTheQuery with the vector embedding of your input query.
For example:
SET @query_embedding = to_base64(string_to_vector('[0,1,0]'));
To specify column names for the ML_RAG
routine to find relevant tables for context retrieval, set
the routine options:
mysql> SET @options = JSON_OBJECT(
    "vector_store_columns", JSON_OBJECT("segment", "TextSegmentColumnName", "segment_embedding", "VectorEmbeddingColumnName"),
    "embed_model_id", "EmbeddingModelName",
    "query_embedding", @query_embedding,
    "model_options", JSON_OBJECT("language", "Language")
);
Replace the following:
TextSegmentColumnName: the name of the embedding table column that contains the text segments in natural language.
VectorEmbeddingColumnName: the name of the embedding table column that contains the vector embeddings of the natural-language text segments.
EmbeddingModelName: the name of the embedding model that you used to generate the vector embeddings for the input query and the embedding tables.
Language: the two-letter ISO 639-1 code for the language you want to use for generating the output. The model_options parameter language is required only if you want to use a language other than English. Default language is en, which is English. To view the list of supported languages, see Languages.
For example:
mysql> SET @options = JSON_OBJECT(
"vector_store_columns", JSON_OBJECT("segment", "demo_text", "segment_embedding", "demo_embedding"),
"embed_model_id", "demo_embedding_model",
"query_embedding", @query_embedding,
"model_options", JSON_OBJECT("language", "en")
);
In this example, embedding tables containing a string
column demo_text
and a vector column
demo_embedding
, which contains vector
embeddings from demo_embedding_model
, are used for context retrieval.
Similarly, you can use the
vector_store_columns
parameter to
specify the following column names for the routine to
filter relevant tables for context retrieval:
document_name: name of a column containing the document names. This column can be of any data type supported by MySQL HeatWave.
document_id: name of an integer column containing the document IDs.
metadata: name of a JSON column containing additional table metadata.
segment_number: name of an integer column containing the segment numbers.
These columns are optional; if you do not set these values, the routine does not use these columns to filter tables.
To define your natural-language query, set the
@query
variable:
SET @query="AddYourQuery
";
Replace AddYourQuery
with your
natural-language query.
For example:
mysql> SET @query="What is AutoML?";
To retrieve the augmented prompt, use the
ML_RAG
routine:
mysql> CALL sys.ML_RAG(@query,@output,@options);
Print the output:
mysql> SELECT JSON_PRETTY(@output);
The output is similar to the following:
| { "text": " AutoML is a subfield of machine learning that focuses on automating the process of building and training machine learning models. It uses techniques such as algorithm selection, hyperparameter tuning, and model ensembling to automatically generate high-performing models with minimal human intervention. AutoML has the potential to democratize access to machine learning by making it easier for non-experts to build and deploy models.", "license": "Your use of this llama model is subject to your Oracle agreements and this llama license agreement: https://downloads.mysql.com/docs/LLAMA_32_3B_INSTRUCT-license.pdf", "citations": [ { "segment": "MySQL is an open-source RDBMS that is widely used for its scalability, reliability, and security.", "distance": 0.0, "document_name": "" }, { "segment": "Machine learning is a subset of AI that uses algorithms and statistical models to improve performance on tasks by learning from data.", "distance": 0.2929, "document_name": "" }, { "segment": "AI refers to the development of machines that can think and act like humans.", "distance": 1.0, "document_name": "" } ], "vector_store": [ "`demo_db`.`demo_table`" ], "retrieval_info": { "method": "n_citations", "threshold": 1.0 } } |
The vector_store
section lists the name
of the embedding table that is used to retrieve context
for generating the output.
By default, the ML_RAG
routine uses all
predefined columns and column names available in the inbuilt
vector store table to filter tables for context retrieval.
This means that if your embedding table does not contain all
columns that are available in an inbuilt vector store table,
then your embedding table is filtered out and is not used for
context retrieval by the routine.
Therefore, if you want to use an inbuilt vector store table along with your own embedding table for context retrieval, your embedding table must satisfy the following additional requirements:
Since the inbuilt vector store tables, which are set up using Asynchronous Load or Auto Parallel Load, use predefined column names, the column names in your embedding tables must match the predefined inbuilt vector store table column names as given below:
segment: name of the mandatory string column containing the text segments.
segment_embedding: name of the mandatory vector column containing the vector embeddings of the text segments.
document_name: name of the optional column containing the document names. This column can be of any data type supported by MySQL HeatWave.
document_id: name of the optional integer column containing the document IDs.
metadata: name of the optional JSON column containing metadata for the table.
segment_number: name of the optional integer column containing the segment numbers.
The vector embeddings in your embedding table must be from the same embedding model as the vector store table.
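One way to check which embedding model generated the embeddings in each of your tables is to inspect the vector column comments, assuming the tables follow the GENAI_OPTIONS comment convention shown earlier in this topic:
mysql> SELECT TABLE_NAME, COLUMN_NAME, COLUMN_COMMENT
       FROM INFORMATION_SCHEMA.COLUMNS
       WHERE TABLE_SCHEMA = 'demo_db' AND COLUMN_COMMENT LIKE '%EMBED_MODEL_ID%';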
The example in this section uses the vector store table
demo_embeddings
, created in the section
Ingesting Files Using the URI with Asynchronous Load,
along with the following table:
mysql>CREATE TABLE demo_e5_table (id INT AUTO_INCREMENT, segment TEXT, primary key (id));
mysql>INSERT INTO demo_e5_table (segment) VALUES('MySQL is an open-source RDBMS that is widely used for its scalability, reliability, and security.');
mysql>INSERT INTO demo_e5_table (segment) VALUES('AI refers to the development of machines that can think and act like humans.');
mysql>INSERT INTO demo_e5_table (segment) VALUES('Machine learning is a subset of AI that uses algorithms and statistical models to improve performance on tasks by learning from data.');
mysql>CALL sys.ML_EMBED_TABLE('demo_db.demo_e5_table.segment', 'demo_db.demo_e5_table.segment_embedding', JSON_OBJECT('model_id', 'multilingual-e5-small'));
To run RAG using an inbuilt vector store table and your embedding table, perform the following steps:
Set the routine options:
If your embedding table contains all of the mandatory and optional columns used by the inbuilt vector store table, set the routine options as shown below:
mysql> SET @options = JSON_OBJECT(
    "embed_model_id", "EmbeddingModelName",
    "model_options", JSON_OBJECT("language", "Language")
);
EmbeddingModelName: the name of the embedding model to use to generate the vector embeddings for the input query. The routine uses this embedding model name to find relevant tables for context retrieval. As of MySQL 9.3.0, the default value is multilingual-e5-small. For possible values, see MySQL HeatWave In-Database Embedding Models and OCI Generative AI Service Embedding Models.
Language: the two-letter ISO 639-1 code for the language you want to use for generating the output. The model_options parameter language is required only if you want to use a language other than English. Default language is en, which is English. To view the list of supported languages, see Languages.
For example:
mysql> SET @options = JSON_OBJECT("embed_model_id", "multilingual-e5-small", "model_options", JSON_OBJECT("language", "en"));
If your embedding table contains the same mandatory columns as an inbuilt vector store table, similar to demo_e5_table, which are:
A text column with the name segment.
A vector column with the name segment_embedding.
Then set the routine options as shown below:
mysql> SET @options = JSON_OBJECT(
    "vector_store_columns", JSON_OBJECT("segment", "segment", "segment_embedding", "segment_embedding"),
    "embed_model_id", "EmbeddingModelName",
    "model_options", JSON_OBJECT("language", "Language")
);
For example:
mysql> SET @options = JSON_OBJECT(
"vector_store_columns", JSON_OBJECT("segment", "segment", "segment_embedding", "segment_embedding"),
"embed_model_id", "multilingual-e5-small",
"model_options", JSON_OBJECT("language", "en")
);
In this example, both embedding tables and vector
store tables that contain a string column
segment
and a vector column
segment_embedding
, which contains
vector embeddings from
multilingual-e5-small
, are used for
context retrieval.
To define your natural-language query, set the
@query
variable:
SET @query="AddYourQuery
";
Replace AddYourQuery
with your
natural-language query.
For example:
mysql> SET @query="What is AutoML?";
To retrieve the augmented prompt and generate the output,
use the ML_RAG
routine:
mysql> CALL sys.ML_RAG(@query,@output,@options);
Print the output:
mysql> SELECT JSON_PRETTY(@output);
The output is similar to the following:
| { "text": " AutoML is a machine learning technique that uses algorithms to automatically generate and optimize models for specific tasks, without the need for manual intervention. It combines the power of machine learning with the ease of use of traditional programming tools, allowing users to quickly and easily create accurate models for their data.", "license": "Your use of this llama model is subject to your Oracle agreements and this llama license agreement: https://downloads.mysql.com/docs/LLAMA_32_3B_INSTRUCT-license.pdf", "citations": [ { "segment": "| { \"text\": \" AutoML is a machine learning technique that uses algorithms to automatically generate and optimize models for specific tasks, without the need for manual intervention. It combines the power of machine learning with the ease of use of traditional programming tools, allowing users to quickly and easily create accurate models for their data.\", \"citations\": [ {\n \"segment\": \"What is MySQL?\", \"distance\": 0.7121, \"document_name\": \"\" },\n {", "distance": 0.0725, "document_name": "" }, { "segment": "| { \"text\": \" AutoML is a machine learning technique that automates the process of selecting, training, and evaluating machine learning models. It involves using algorithms and techniques to automatically identify the best model for a given dataset and optimize its hyperparameters without requiring manual intervention from data analysts or ML practitioners. AutoML can be used in various stages of the machine learning pipeline, including data preprocessing, feature engineering, model", "distance": 0.0743, "document_name": "" }, { "segment": "| { \"text\": \" AutoML is a subfield of machine learning that focuses on automating the process of building and training machine learning models. It involves using algorithms and techniques to automatically select features, tune hyperparameters, and evaluate model performance, without requiring human intervention. AutoML can be used for a variety of tasks, including classification, regression, clustering, and anomaly detection.\", \"citations\": [ {", "distance": 0.0762, "document_name": "" } ], "vector_store": [ "`demo_db`.`demo_embeddings`", "`demo_db`.`demo_e5_table`" ], "retrieval_info": { "method": "n_citations", "threshold": 0.0762 } } |
The vector_store
section lists the names of the vector store table,
demo_embeddings
, and the embedding table,
demo_e5_table
, that are used to retrieve
context for generating the output.
To run multiple RAG queries in parallel, use the
ML_RAG_TABLE
routine. This method is faster than running the
ML_RAG
routine multiple times.
To run the steps in this section, you can use the same sample
table demo_e5_table
as in the section
Use Your Embedding Table With a Vector Store Table, and
create the following table to store the input queries for batch
processing:
mysql>CREATE TABLE input_table (id INT AUTO_INCREMENT, Input TEXT, primary key (id));
mysql>INSERT INTO input_table (Input) VALUES('What is HeatWave Lakehouse?');
mysql>INSERT INTO input_table (Input) VALUES('What is HeatWave AutoML?');
mysql>INSERT INTO input_table (Input) VALUES('What is HeatWave GenAI?');
To run batch queries using ML_RAG_TABLE
,
perform the following steps:
To specify column names for the
ML_RAG_TABLE
routine to find relevant
tables for context retrieval, set the routine options:
mysql> SET @options = JSON_OBJECT(
    "vector_store_columns", JSON_OBJECT("segment", "TextSegmentColumnName", "segment_embedding", "VectorEmbeddingColumnName"),
    "embed_model_id", "EmbeddingModelName",
    "model_options", JSON_OBJECT("language", "Language")
);
Replace the following:
TextSegmentColumnName: the name of the embedding table column that contains the text segments in natural language. If multiple tables contain a string column with the same name, they are all used for context retrieval. Default value is segment.
VectorEmbeddingColumnName: the name of the embedding table column that contains the vector embeddings of the natural-language text segments. If multiple tables contain a vector column with the same name that contains embeddings from the specified embedding model, they are all used for context retrieval. Default value is segment_embedding.
EmbeddingModelName: the name of the embedding model to use to generate the vector embeddings for the input query. The routine uses this embedding model name to find tables generated using the same model for context retrieval. Default value is minilm if the output language is set to English and multilingual-e5-small if the output language is set to a language other than English.
Language: the two-letter ISO 639-1 code for the language you want to use for generating the output. The model_options parameter language is required only if you want to use a language other than English. Default language is en, which is English. To view the list of supported languages, see Languages.
For example:
mysql> SET @options = JSON_OBJECT(
"vector_store_columns", JSON_OBJECT("segment", "segment", "segment_embedding", "segment_embedding"),
"embed_model_id", "multilingual-e5-small",
"model_options", JSON_OBJECT("language", "en")
);
In this example, embedding tables and vector store tables
containing a string column segment
and a vector column segment_embedding
, which contains vector
embeddings from multilingual-e5-small
, are used for context retrieval. Since the inbuilt vector
store tables use predefined column names, if you change
these column names to any value other than the default
values, then the vector store tables are filtered out and
are not used for context retrieval.
To learn more about the available routine options, see ML_RAG_TABLE Syntax.
Similarly, you can use the
vector_store_columns
parameter to
specify the following column names for the routine to
filter relevant tables for context retrieval:
document_name: name of a column containing the document names. This column can be of any data type supported by MySQL HeatWave. Default value is document_name.
document_id: name of an integer column containing the document IDs. Default value is document_id.
metadata: name of a JSON column containing additional table metadata. Default value is metadata.
segment_number: name of an integer column containing the segment numbers. Default value is segment_number.
These columns are optional; if you do not set these values, the routine does not use these columns to filter tables.
If you are using an embedding model that is not available
in MySQL HeatWave, then you must also provide the vector
embeddings of the input queries. You can specify the name of
the input table column that contains the vector embeddings
of the input queries by using the
embed_column
parameter. However, in
this case, you cannot use the inbuilt vector store tables
for context retrieval.
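The following is a minimal sketch of how such options might be assembled. It assumes that embed_column is supplied as a key in the options JSON and that query_embedding is a column in your input table holding the query embeddings (as in the hypothetical demo_query_table shown earlier); see ML_RAG_TABLE Syntax for the exact option details:
mysql> SET @options = JSON_OBJECT(
    "vector_store_columns", JSON_OBJECT("segment", "demo_text", "segment_embedding", "demo_embedding"),
    "embed_model_id", "demo_embedding_model",
    "embed_column", "query_embedding"
);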
In the ML_RAG_TABLE
routine, specify
the table columns containing the input queries and for
storing the generated outputs:
mysql> CALL sys.ML_RAG_TABLE("InputDBName.InputTableName.InputColumn", "OutputDBName.OutputTableName.OutputColumn", @options);
Replace the following:
InputDBName: the name of the database that contains the table column where your input queries are stored.
InputTableName: the name of the table that contains the column where your input queries are stored.
InputColumn: the name of the column that contains the input queries.
OutputDBName: the name of the database that contains the table where you want to store the generated outputs. This can be the same as the input database.
OutputTableName: the name of the table where you want to create a new column to store the generated outputs. This can be the same as the input table. If the specified table doesn't exist, a new table is created.
OutputColumn: the name for the new column where you want to store the output generated for the input queries.
For example:
mysql> CALL sys.ML_RAG_TABLE("demo_db.input_table.Input", "demo_db.output_table.Output", @options);
View the contents of the output table:
mysql> SELECT * FROM output_table\G
*************************** 1. row ***************************
id: 1
Output: { "text": "HeatWave Lakehouse is a feature of the HeatWave platform that enables query processing on data resident in Object Storage. The source data is read from Object Storage, transformed to the memory optimized HeatWave format, stored in the HeatWave persistence storage layer in Object Storage, and then loaded to HeatWave cluster memory for in-memory query processing. It allows you to create tables which point to external data sources and uses the Lakehouse Engine as the primary engine with Rapid as the secondary engine for data storage.",
"error": null,
"license": "Your use of this llama model is subject to your Oracle agreements and this llama license agreement: https://downloads.mysql.com/docs/LLAMA_32_3B_INSTRUCT-license.pdf",
"citations": [
{
"segment": "See: Chapter 4, HeatWave GenAI.\n1.5 HeatWave Lakehouse The Lakehouse feature of HeatWave enables query processing on data resident in Object Storage. The source data is read from Object Storage, transformed to the memory optimized HeatWave format, stored in the HeatWave persistence storage layer in Object Storage, and then loaded to HeatWave cluster memory.\n• Provides in-memory query processing on data resident in Object Storage.\n• Data is not loaded into the MySQL InnoDB storage layer.",
"distance": 0.1032,
"document_name": ""
},
{
"segment": "The Lakehouse Engine enables you to create tables which point to external data sources.\nFor HeatWave Lakehouse, lakehouse is the primary engine, and rapid is the secondary engine.\n5.1.3 Data Storage",
"distance": 0.106,
"document_name": ""
},
{
"segment": "The Lakehouse feature of HeatWave enables query processing on data resident in Object Storage. The source data is read from Object Storage, transformed to the memory optimized HeatWave format, stored in the HeatWave persistence storage layer in Object Storage, and then loaded to HeatWave cluster memory.\n• Provides in-memory query processing on data resident in Object Storage.\n• Data is not loaded into the MySQL InnoDB storage layer.",
"distance": 0.1063,
"document_name": ""
}
],
"vector_store": [
"`demo_db`.`demo_embeddings`",
"`demo_db`.`demo_e5_table`"
],
"retrieval_info": {
"method": "n_citations",
"threshold": 0.1063
}
}
*************************** 2. row ***************************
id: 2
Output: { "text": "HeatWave AutoML is a feature of MySQL HeatWave that makes it easy to use machine learning, whether you are a novice user or an experienced ML practitioner. It analyzes the characteristics of the data and creates an optimized machine learning model that can be used to generate predictions and explanations. The data and models never leave MySQL HeatWave, saving time and effort while keeping the data and models secure. HeatWave AutoML is optimized for HeatWave shapes and scaling, and all HeatWave AutoML makes it easy to use machine learning, whether you are a novice user or an experienced ML practitioner. You provide the data, and HeatWave AutoML analyzes the characteristics of the data and creates an optimized machine learning model that you can use to generate predictions and explanations. An ML model makes predictions by identifying patterns in your data and applying those patterns to unseen data. HeatWave AutoML explanations help you understand how predictions are made,",
"error": null,
"citations": [
{
"segment": "| HeatWave AutoML is a feature of MySQL HeatWave that makes it easy to use machine learning, whether you are a novice user or an experienced ML practitioner. It analyzes the characteristics of the data and creates an optimized machine learning model that can be used to generate predictions and explanations. The data and models never leave MySQL HeatWave, saving time and effort while keeping the data and models secure. HeatWave AutoML is optimized for HeatWave shapes and scaling, and all",
"distance": 0.0561,
"document_name": ""
},
{
"segment": "HeatWave AutoML makes it easy to use machine learning, whether you are a novice user or an experienced ML practitioner. You provide the data, and HeatWave AutoML analyzes the characteristics of the data and creates an optimized machine learning model that you can use to generate predictions and explanations. An ML model makes predictions by identifying patterns in your data and applying those patterns to unseen data. HeatWave AutoML explanations help you understand how predictions are made,",
"distance": 0.0598,
"document_name": ""
},
{
"segment": "2. HeatWave AutoML analyzes the training data, trains an optimized machine learning model, and stores the model in a model catalog on the MySQL DB System. See Section 3.14.1, “The Model Catalog”.\n115",
"distance": 0.0669,
"document_name": ""
}
],
"vector_store": [
"`demo_db`.`demo_embeddings`",
"`demo_db`.`demo_e5_table`"
],
"retrieval_info": {
"method": "n_citations",
"threshold": 0.0669
}
}
*************************** 3. row ***************************
id: 3
Output: { "text": "HeatWave GenAI is a feature of HeatWave that allows users to communicate with unstructured data in HeatWave using natural-language queries. It uses a familiar SQL interface which makes it easy to use for content generation, summarization, and retrieval-augmented generation (RAG).",
"error": null,
"citations": [
{
"segment": "4.1 HeatWave GenAI Overview HeatWave GenAI is a feature of HeatWave that lets you communicate with unstructured data in HeatWave using natural-language queries. It uses a familiar SQL interface which makes it is easy to use for content generation, summarization, and retrieval-augmented generation (RAG).",
"distance": 0.0521,
"document_name": ""
},
{
"segment": "HeatWave Chat also provides a graphical interface integrated with the Visual Studio Code plugin for MySQL Shell.\nBenefits\nHeatWave GenAI lets you integrate generative AI into the applications, providing an integrated end-to-end pipeline including vector store generation, vector search with RAG, and an inbuilt chatbot.\nSome key benefits of using HeatWave GenAI are described below:",
"distance": 0.0781,
"document_name": ""
},
{
"segment": "HeatWave GenAI retrieves content from the vector store and provides it as context to the LLM along with the query. This process of generating an augmented prompt is called retrieval-augmented generation (RAG), and it helps HeatWave GenAI produce more contextually relevant, personalised, and accurate results.\n• HeatWave Chat",
"distance": 0.0811,
"document_name": ""
}
],
"vector_store": [
"`demo_db`.`demo_embeddings`",
"`demo_db`.`demo_e5_table`"
],
"retrieval_info": {
"method": "n_citations",
"threshold": 0.0811
}
}
As of MySQL 9.3.0, the output table generated using the
ML_RAG_TABLE
routine contains additional details for error reporting.
If the routine fails to generate output for specific rows,
details of the errors encountered and the default values used
are added for those rows in the output column.
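For example, to review any reported errors for individual rows, you can extract the error field from the output column, assuming the output JSON includes the error field as shown in the sample output above:
mysql> SELECT id, JSON_EXTRACT(Output, '$.error') AS error FROM output_table;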
If you created a new database for testing the steps in this topic, ensure that you delete the database to avoid being billed for it:
mysql> DROP DATABASE demo_db;
Learn how to Start a Conversational Chat.