MySQL AI User Guide
The VECTOR_STORE_LOAD routine generates vector embeddings for the specified files or folders, and loads the embeddings into a new vector store table.
This routine creates an asynchronous task that loads the vector store table in the background. It also returns a query that you can run to track the status of that load task.
To learn about the privileges you need to run this routine, see Section 5.3, “Required Privileges for using GenAI”.
mysql> CALL sys.VECTOR_STORE_LOAD('URI'[, options]);

options: JSON_OBJECT(keyvalue[, keyvalue] ...)

keyvalue:
{
  'format', 'Format'
  |'schema_name', 'SchemaName'
  |'table_name', 'TableName'
  |'language', 'Language'
  |'embed_model_id', 'ModelID'
  |'description', 'Description'
  |'ocr', {true|false}
}
The following are the VECTOR_STORE_LOAD parameters:
URI: specifies a single uniform resource identifier (URI)
for a file or folder to be ingested into the vector store,
or a JSON array of URIs for multiple files or folders to be
ingested into the vector store.
A URI is treated as one of the following:
A glob pattern, if it contains at least one unescaped ? or * character.
A prefix, if it is not a pattern and ends with a / character, like a folder path.
A file path, if it is neither a glob pattern nor a prefix.
options: specifies optional parameters
as key-value pairs in JSON format. It can include the
following parameters:
format: specifies the format of
files to be loaded. Default value is
auto_unstructured, which means
all supported types of files are loaded. Possible
values are pdf,
pptx, ppt,
txt, html,
docx, doc, and
auto_unstructured.
schema_name: specifies the name
of the schema where the vector embeddings are to be
loaded. By default, this procedure uses the current
schema from the session.
table_name: specifies the name of
the vector store table to create. By default, the
routine generates a unique table name with format
vector_store_data_x, where
x is a counter.
language: specifies the text
content language used in the files to be ingested
into the vector store. To set the value of the
language parameter, use the
two-letter ISO 639-1 code for the
language.
Default value is en.
To view the list of supported languages, see Section 5.4, “Supported LLM, Embedding Model, and Languages”.
embed_model_id: specifies the
embedding model to use for encoding the text.
Default value is
multilingual-e5-small.
To view the list of available embedding models, see In-Database Embedding Model.
description: specifies a
description of the document collection being loaded.
Default value is NULL.
ocr: specifies whether to enable
or disable Optical Character Recognition (OCR).
If set to false, OCR is disabled. Default
value is true, which means OCR is
enabled by default.
Specifying the file to ingest, using the current database, auto-generated name for the vector store table, and default values for all options:
mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', NULL);
Specifying the file to ingest, using the current database, and specifying the name of the vector store table to be created:
mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', '{"table_name": "demo_embeddings"}');
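The syntax above describes the options argument as a JSON_OBJECT, so the same kind of call can presumably be written with JSON_OBJECT() instead of a JSON string literal. The following is a minimal sketch under that assumption. The table name pdf_embeddings is hypothetical, and the format and ocr values are chosen only to illustrate the documented keys:

```sql
-- Sketch only: pass options with JSON_OBJECT() rather than a JSON string.
-- 'pdf_embeddings' is a hypothetical table name used for illustration.
CALL sys.VECTOR_STORE_LOAD(
  'file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf',
  JSON_OBJECT(
    'table_name', 'pdf_embeddings',  -- name of the vector store table to create
    'format', 'pdf',                 -- load PDF files only
    'ocr', false                     -- disable Optical Character Recognition
  )
);
```

Keys that are omitted, such as language and embed_model_id here, fall back to the default values listed in the parameter descriptions.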
Specifying additional options such as the schema name, table
name, language, and table description in
VECTOR_STORE_LOAD, and using a glob pattern to select the files to ingest:
mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/german_files/de*', '{"schema_name": "demo_db", "table_name": "german_embeddings", "language": "de", "description": "Vector store table containing German PDF files."}');
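The URI parameter also accepts a prefix (a path ending in /) or a JSON array of URIs. The following sketch illustrates both forms by reusing the directories from the examples above. Passing the array as JSON text in a string literal is an assumption made for illustration, and the table name mixed_embeddings is hypothetical:

```sql
-- Prefix: the URI is not a pattern and ends with '/', so it is treated
-- like a folder path.
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/', NULL);

-- JSON array of URIs: one file path and one prefix in a single call.
-- Assumption: the array is supplied as JSON text in a string literal.
-- 'mixed_embeddings' is a hypothetical table name.
CALL sys.VECTOR_STORE_LOAD(
  '["file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf", "file:///var/lib/mysql-files/german_files/"]',
  JSON_OBJECT('table_name', 'mixed_embeddings')
);
```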
Tracking the progress of a load task by using the task
query displayed as output for the
VECTOR_STORE_LOAD routine:
mysql> SELECT mysql_tasks.task_status_brief("TaskID");
+------------------------------------------------------------------------------------------+
| mysql_tasks.task_status_brief("TaskID")                                                  |
+------------------------------------------------------------------------------------------+
| {"data": null, "status": "COMPLETED", "message": "Execution finished.", "progress": 100} |
+------------------------------------------------------------------------------------------+
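After the task reports a COMPLETED status, it can be useful to confirm that the vector store table exists and contains data. The following sketch uses only standard MySQL statements. It assumes that the earlier example that loads into demo_db.german_embeddings has finished, and it makes no assumption about the column layout of the table:

```sql
-- Assumes the load into demo_db.german_embeddings has completed.
-- A nonzero count suggests that embeddings were written to the table.
SELECT COUNT(*) AS loaded_rows FROM demo_db.german_embeddings;

-- List tables in the current schema that follow the default
-- vector_store_data_x naming pattern. The underscores are escaped so
-- that LIKE matches them literally.
SHOW TABLES LIKE 'vector\_store\_data\_%';
```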