MySQL AI User Guide
The VECTOR_STORE_LOAD routine generates vector embeddings for the specified files or folders and loads the embeddings into a new vector store table.
To learn about the privileges you need to run this routine, see Section 5.3, “Required Privileges for using GenAI”.
mysql> CALL sys.VECTOR_STORE_LOAD('URI'[, options]);

options: JSON_OBJECT(keyvalue[, keyvalue]...)

keyvalue: {
    'format', 'Format'
  | 'schema_name', 'SchemaName'
  | 'table_name', 'TableName'
  | 'language', 'Language'
  | 'embed_model_id', 'ModelID'
  | 'description', 'Description'
  | 'ocr', {true|false}
}
The following are the VECTOR_STORE_LOAD parameters:
URI: specifies the uniform resource identifier (URI) of the files or folders to ingest into the vector store. A URI is interpreted as one of the following:
A glob pattern, if it contains at least one unescaped ? or * character.
A prefix, if it is not a glob pattern and ends with a / character, like a folder path.
A file path, if it is neither a glob pattern nor a prefix.
options: specifies optional parameters as key-value pairs in JSON format. It can include the following parameters:
format: specifies the format of the files to be loaded. The default value is auto_unstructured, which means that all supported file types are loaded. Possible values are pdf, pptx, ppt, txt, html, docx, doc, and auto_unstructured.
schema_name: specifies the name of the schema where the vector embeddings are to be loaded. By default, the routine uses the current schema of the session.
table_name: specifies the name of the vector store table to create. By default, the routine generates a unique table name with the format vector_store_data_x, where x is a counter.
language: specifies the language of the text content in the files to ingest into the vector store. Set the value using the two-letter ISO 639-1 code for the language. The default value is en. To view the list of supported languages, see Section 5.4, “Supported LLM, Embedding Model, and Languages”.
embed_model_id: specifies the embedding model to use for encoding the text. The default value is multilingual-e5-small. To view the list of available embedding models, see In-Database Embedding Model.
description: specifies a description of the document collection being loaded. The default value is NULL.
ocr: specifies whether to enable or disable Optical Character Recognition (OCR). Set it to false to disable OCR. The default value is true, which means that OCR is enabled.
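The sketch below shows how the prefix and glob-pattern forms of URI might be passed. It assumes standard glob semantics, in which ? matches exactly one character. The report-?.pdf file names and both table names are hypothetical. Only the file:// scheme and the demo-directory folder come from the examples in this topic.

```sql
-- Prefix: the URI ends with '/', so it is treated as a folder and the
-- supported files under it are ingested (hypothetical table name).
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/',
    '{"table_name": "demo_folder_embeddings"}');

-- Glob pattern: the URI contains an unescaped '?', which (assuming standard
-- glob semantics) matches one character, for example the hypothetical
-- files report-1.pdf and report-2.pdf (hypothetical table name).
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/report-?.pdf',
    '{"table_name": "demo_report_embeddings"}');
```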
Specifying the file to ingest, using the current database, an auto-generated name for the vector store table, and default values for all options:
mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', NULL);
Specifying the file to ingest, using the current database, and specifying the name of the vector store table to be created:
mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf', '{"table_name": "demo_embeddings"}');
Specifying additional options, such as the schema name, table name, language, format, and table description, in VECTOR_STORE_LOAD:
mysql> CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/german_files/de*', '{"schema_name": "demo_db", "table_name": "german_embeddings", "language": "de", "format": "pdf", "description": "Vector store table containing German PDF files."}');
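The examples above pass options as a JSON string literal and do not use the ocr or embed_model_id parameters. The sketch below covers those parameters and assumes that the demo-directory folder contains PDF files. The table names are hypothetical. The second call builds the options with JSON_OBJECT, as shown in the syntax. The final statement is one way to find the name that the routine generates when table_name is omitted. It assumes that the current schema is the schema the routine loaded into.

```sql
-- Load only the PDF files under the folder (prefix URI) and disable OCR.
-- Per the syntax, ocr takes an unquoted JSON boolean (hypothetical table name).
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/',
    '{"table_name": "demo_pdf_embeddings", "format": "pdf", "ocr": false}');

-- Build the options with JSON_OBJECT instead of a string literal.
-- multilingual-e5-small and en are the documented defaults, so naming them
-- here only makes the choices explicit (hypothetical table name).
CALL sys.VECTOR_STORE_LOAD('file:///var/lib/mysql-files/demo-directory/heatwave-en.pdf',
    JSON_OBJECT('table_name', 'demo_e5_embeddings',
                'embed_model_id', 'multilingual-e5-small',
                'language', 'en'));

-- When table_name is omitted, list the generated vector_store_data_x
-- tables in the current schema.
SHOW TABLES LIKE 'vector_store_data%';
```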