10 Embedding
Explore embedding as a machine learning technique that transforms data in numeric dimensions that are represented as vectors to enable content similarity search and other applications.
10.1 About Vector Embeddings
Transformer models, also known as embedding models, are used to convert various types of data, such as words, sentences, documents, images, and more, into numerical vectors that capture their semantics.
These vectors are represented as points in a multidimensional space, where the proximity of points reflects the semantic similarity of the data they represent. Put differently, vector embeddings are a way of representing various types of data, like text, images, videos, or music, as points in a multidimensional space. The locations of these points and their proximity to others are semantically meaningful. This transformation enables machine learning algorithms to process and analyze data more effectively, and compute various distance metrics to find similar content. Creating vector embeddings involves training machine learning models, often neural networks, on large data sets to learn patterns and relationships within the data. This process transforms the data into numerical vectors, each uniquely representing a data point in a high-dimensional vector space. Applications of vector embeddings span a wide range of fields, particularly in natural language processing (NLP), search engines, and recommendation systems to name a few.
10.2 Pretrained Models for Generating Embeddings
Many pretrained models exist that generate embeddings for various data types, such as words, text sentences, images, and so on. These pretrained models often require pre-processing or post-processing operations or both.
As an example, most models for creating sentence embeddings from text require a pre-processing step called tokenization. Tokenization is a process to convert a sequence of text into smaller parts, called tokens. The embedding model then processes the tokens as input. Further post-processing might also be necessary for the output of these pretrained sentence transformers. One such post-processing operation is pooling. Pooling in text embeddings is a technique used to aggregate and reduce the dimensionality of individual word or token embeddings within a text sequence. This process involves combining the features of multiple embeddings to form a single, fixed-size representation of the entire text. For example, pooling methods can be employed to perform aggregation functions such as mean, max, or others. Another post-processing operation is normalization. Normalization in text embeddings is a process that adjusts the individual embeddings to have a uniform scale or distribution. This step involves transforming the embeddings so that they conform to a specific structure, often aiming to have a consistent length or scale across the data set.
Therefore, you need to use pretrained models that are augmented with
pre-processing and post-processing operations to generate embeddings. This document
illustrates examples that use the my_embedding_model.onnx
model as an
augmented ONNX format
model.
If you want to download and convert a pretrained model to an ONNX format model and
augment the model with pre-processing and post-processing steps, see Import ONNX Models and Generate Embeddings.
10.3 Data Types for ONNX Embedding Models
ONNX defines its own data types. When you import ONNX models into Oracle Database, their data types are automatically mapped to SQL data types.
10.3.1 Attribute Data Type for ONNX Embedding Models
For a text embedding model, the input is a string. Therefore, the
supported data type are VARCHAR2
, CLOB
,
NVARCHAR2
, and NCLOB
. This means that there is a limit
on the size of input strings to 4000 bytes (32767 bytes if the maximum string size is set to
extended).
The USER_MINING_MODEL_ATTRIBUTES
view reports the SQL data
types for the input of a model.
USER_MINING_MODEL_ATTRIBUTES
view reports selected attributes for a
model. The input data type of an embedding model here is VARCHAR2
.
SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;
The
output is as
follows:MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL INPUT_STRING TEXT VARCHAR2
DOC_MODEL ORA$ONNXTARGET VECTOR VECTOR VECTOR(128,FLOA
T32)
10.3.2 Target Data Type for ONNX Embedding Models
The output of a text embedding model is an embedding vector.
Therefore, the target data type is VECTOR
. Use the
VECTOR_EMBEDDING
SQL scoring function to generate vectors
from an embedding model.
For more detail on VECTOR
data type, see Create Tables Using the
VECTOR Data Type. To learn more about
VECTOR_EMBEDDING
SQL operator, see Oracle Database SQL
Language Reference.
10.4 Examples: Static Data Dictionary Views
You can use the Oracle Machine Learning static data dictionary views to view information such as available models, attributes of an ONNX embedding model and others. Values to support ONNX embedding models have been added.
Database administrator (DBA) and USER versions of the views are also available.
This section lists the examples of the impacted data dictionary views of ONNX embedding model.
10.4.1 Example: ALL_MINING_MODEL_ATTRIBUTES
You, as a current user, can view the attributes of a machine
learning model by querying the ALL_MINING_MODEL_ATTRIBUTES
view.
Here is an example of the model attributes of an embedding model. The name of the
ONNX embedding model is DOC_MODEL
:
SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;
The output is as follows:
MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL INPUT_STRING TEXT VARCHAR2
DOC_MODEL ORA$ONNXTARGET VECTOR VECTOR VECTOR(128,FLOA
T32)
10.4.2 Example: ALL_MINING_MODELS
You can check machine learning models available to you as a current user by
querying the ALL_MINING_MODELS
view.
Here is an example of model details of an embedding model. The name of the ONNX
embedding model is DOC_MODEL
:
SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;
The output is as follows:
MODEL_NAME MINING_FUNCTION ALGORITHM ALGORITHM_ MODEL_SIZE
-------------------- ------------------------------ -------------------- ---------- ----------
DOC_MODEL EMBEDDING ONNX NATIVE 17762137
10.5 Scoring: Generate Vector Embeddings
After importing the ONNX embedding model into the Database, you can
generate embedding vectors using the VECTOR_EMBEDDING
SQL scoring function.
The VECTOR_EMBEDDING
SQL scoring function returns
VECTOR(dimension, type)
. The embedding models define the number of
dimensions of the output vector of the VECTOR_EMBEDDING
operator. To
learn about the VECTOR_EMBEDDING
SQL scoring operator, see VECTOR_EMBEDDING.
Example
The following example generates vector embeddings with "hello" as the
input, utilizing the pretrained ONNX format model
my_embedding_model.onnx
imported into the Database. For
complete example, see Import ONNX Models into Oracle Database End-to-End Example.
SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
--------------------------------------------------------------------------------
[-9.76553112E-002,-9.89954844E-002,7.69771636E-003,-4.16760892E-003,-9.69305634E-002,
-3.01141385E-002,-2.63396613E-002,-2.98553891E-002,5.96499592E-002,4.13885899E-002,
5.32859489E-002,6.57707453E-002,-1.47056757E-002,-4.18472625E-002,4.1588001E-002,
-2.86354572E-002,-7.56499246E-002,-4.16395674E-003,-1.52879998E-001,6.60010576E-002,
-3.9013084E-002,3.15719917E-002,1.2428958E-002,-2.47651711E-002,-1.16851285E-001,
-7.82847106E-002,3.34323719E-002,8.03267583E-002,1.70483496E-002,-5.42407483E-002,
6.54291287E-002,-4.81935125E-003,6.11041225E-002,6.64106477E-003,-5.47
Note:
You can define the outputs explicitly in the metadata or implicitly. The system assumes a single output for a model if you don't specify the output in the metadata.
If a scoring function does not comply as per the description in Supported SQL Scoring Functions, you will receive an ORA-40290 error when performing the scoring operation on your data. Additionally, any unsupported scoring functions will raise the ORA-40290 error.
See Also:
A complete list of SQL scoring functions supported for ONNX models, in Oracle Machine Learning for SQL User’s Guide.10.5.1 Treatment of Missing Data During Scoring
ONNX does not support representation for non-existent values; that is,
there is no equivalent to NULL
for SQL.
Further, if the input values are not specified, then the ONNX embedding models fail to run.
- Absent attribute: If fewer attributes are used for scoring than were
specified during model import (input), then you receive an error when you
perform scoring. That is, if at least one of the input value is not specified in
the
USING
clause of a scoring operator with ONNX model, then the query will not compile. NULL
attribute: If any of the attributes has aNULL
value, then the scoring operator does not perform inference of the model with the ONNX Runtime and returns aNULL
result immediately. If you want to change this behavior, then provide an appropriate replacement to the NULL value, either by using anNVL
expression as input attribute (for example,NVL(input_attribute, default_value) AS input_attribute);
) or by specifying a default value for this input attribute using the JSON metadata when importing the model.
10.6 Import ONNX Models into Oracle Database End-to-End Example
Learn to import a pretrained embedding model that is in ONNX format and generate vector embeddings.
Follow the steps below to import a pertained ONNX formatted embedding model into the Oracle Database.
Prepare Your Data Dump Directory
Prepare your data dump directory and provide the necessary access and
privileges to dmuser
.
- Choose from:
-
If you already have a pretrained ONNX embedding model, store it in your working folder.
-
If you do not have pretrained embedding model in ONNX format, perform the steps listed in Convert Pretrained Models to ONNX Format.
-
-
Login to SQL*Plus as
SYSDBA
in your PDB.CONN sys/<password>@pdb as sysdba;
-
Grant the
DB_DEVELOPER_ROLE
todmuser
.GRANT DB_DEVELOPER_ROLE TO dmuser identified by <password>;
-
Grant
CREATE MINING MODEL
privilege todmuser
.GRANT create mining model TO dmuser;
-
Set your working folder as the data dump directory (
DM_DUMP
) to load the ONNX embedding model.CREATE OR REPLACE DIRECTORY DM_DUMP as '<work directory path>';
-
Grant
READ
permissions on theDM_DUMP
directory todmuser
.GRANT READ ON DIRECTORY dm_dump TO dmuser;
-
Grant
WRITE
permissions on theDM_DUMP
directory todmuser
.GRANT WRITE ON DIRECTORY dm_dump TO dmuser;
-
Drop the model if it already exits.
exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);
Import ONNX Model Into the Database
You created a data dump directory and now you load the ONNX
model into the Database. Use the
DBMS_VECTOR.LOAD_ONNX_MODEL
procedure to load the model. The
DBMS_VECTOR.LOAD_ONNX_MODEL
procedure facilitates the process of
importing ONNX format model into the Oracle Database. In this example, the procedure loads
an ONNX model file, named my_embedding_model.onnx
from the
DM_DUMP
directory, into the Database as doc_model
,
specifying its use for embedding tasks.
-
Connect as
dmuser
.CONN dmuser/<password>@<pdbname>;
-
Load the ONNX model into the Database.
If the ONNX model to be imported already includes an output tensor named
embeddingOutput
and an input string tensor nameddata
, JSON metadata is unnecessary. Embedding models converted from OML4Py follow this convention and can be imported without the JSON metadata.EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL( 'DM_DUMP', 'my_embedding_model.onnx', 'doc_model');
Alternately, you can load the ONNX embedding model by specifying the JSON metadata.
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL( 'DM_DUMP', 'my_embedding_model.onnx', 'doc_model', JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input": {"input": ["DATA"]}}'));
LOAD_ONNX_MODEL
declares these parameters:
-
DM_DUMP
: specifies the directory name of the data dump.Note:
Ensure that theDM_DUMP
directory is defined. -
my_embedding_model
: is aVARCHAR2
type parameter that specifies the name of the ONNX model. -
doc_model
: This parameter is a user-specified name under which the model is stored in the Oracle Database. -
The JSON metadata associated with the ONNX model is declared as:
"function" : "embedding"
: Indicates the function name for text embedding model."embeddingOutput" : "embedding"
: Specifies the output variable which contains the embedding results. -
"input": {"input": ["DATA"]}
: Specifies a JSON object ("input"
) that describes the input expected by the model. It specifies that there is an input named"input"
, and its value should be an array with one element,"DATA"
. This indicates that the model expects a single string input to generate embeddings.
See LOAD_ONNX_MODEL Procedure to learn about the PL/SQL procedure.
Query Model Statistics
Note:
DOC_MODEL is the user-specified name of the embedding text model.-
Query
USER_MINING_MODEL_ATTRIBUTES
view.SELECT model_name, attribute_name, attribute_type, data_type, vector_info FROM user_mining_model_attributes WHERE model_name = 'DOC_MODEL' ORDER BY ATTRIBUTE_NAME;
To learn about
USER_MINING_MODEL_ATTRIBUTES
view, see USER_MINING_MODEL_ATTRIBUTES. -
Query
USER_MINING_MODELS
view.SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM, ALGORITHM_TYPE, MODEL_SIZE FROM user_mining_models WHERE model_name = 'DOC_MODEL' ORDER BY MODEL_NAME;
To learn about
USER_MINING_MODELS
view, see USER_MINING_MODELS. -
Check model statistics by viewing the model detail views. Query the
DM$VMDOC_MODEL
view.SELECT * FROM DM$VMDOC_MODEL ORDER BY NAME;
To learn about model details views for ONNX embedding models, see Model Details Views for ONNX Models.
-
Query the
DM$VPDOC_MODEL
model detail view.SELECT * FROM DM$VPDOC_MODEL ORDER BY NAME;
-
Query the
DM$VJDOC_MODEL
model detail view.SELECT * FROM DM$VJDOC_MODEL;
Generate Embeddings
Apply the model and generate vector embeddings for your input. Here, the input is hello.
Generate vector embeddings using the VECTOR_EMBEDDING
function.
SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
To learn about the VECTOR_EMBEDDING
SQL
function, see VECTOR_EMBEDDING. You can use the
UTL_TO_EMBEDDING
function in the DBMS_VECTOR_CHAIN
PL/SQL package to generate vector embeddings generically through REST
endpoints. To explore
these functions, see the example Convert Text String to Embedding.
Example: Importing a Pretrained ONNX Model to Oracle Database
The following presents a comprehensive step-by-step example of importing ONNX embedding and generating vector embeddings.
conn sys/<password>@pdb as sysdba
grant db_developer_role to dmuser identified by dmuser;
grant create mining model to dmuser;
create or replace directory DM_DUMP as '<work directory path>';
grant read on directory dm_dump to dmuser;
grant write on directory dm_dump to dmuser;
>conn dmuser/<password>@<pdbname>;
–- Drop the model if it exits
exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);
-- Load Model
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(
'DM_DUMP',
'my_embedding_model.onnx',
'doc_model',
JSON('{"function" : "embedding", "embeddingOutput" : "embedding"}'));
/
--check the attributes view
set linesize 120
col model_name format a20
col algorithm_name format a20
col algorithm format a20
col attribute_name format a20
col attribute_type format a20
col data_type format a20
SQL> SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;
OUTPUT:
MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL INPUT_STRING TEXT VARCHAR2
DOC_MODEL ORA$ONNXTARGET VECTOR VECTOR VECTOR(128,FLOA
T32)
SQL> SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;
OUTPUT:
MODEL_NAME MINING_FUNCTION ALGORITHM ALGORITHM_ MODEL_SIZE
-------------------- ------------------------------ -------------------- ---------- ----------
DOC_MODEL EMBEDDING ONNX NATIVE 17762137
SQL> select * from DM$VMDOC_MODEL ORDER BY NAME;
OUTPUT:
NAME VALUE
---------------------------------------- ----------------------------------------
Graph Description Graph combining g_8_torch_jit and torch_
jit
g_8_torch_jit
torch_jit
Graph Name g_8_torch_jit_torch_jit
Input[0] input:string[1]
Output[0] embedding:float32[?,128]
Producer Name onnx.compose.merge_models
Version 1
6 rows selected.
SQL> select * from DM$VPDOC_MODEL ORDER BY NAME;
OUTPUT:
NAME VALUE
---------------------------------------- ----------------------------------------
batching False
embeddingOutput embedding
SQL> select * from DM$VJDOC_MODEL;
OUTPUT:
METADATA
--------------------------------------------------------------------------------
{"function":"embedding","embeddingOutput":"embedding","input":{"input":["DATA"]}}
--apply the model
SQL> SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
--------------------------------------------------------------------------------
[-9.76553112E-002,-9.89954844E-002,7.69771636E-003,-4.16760892E-003,-9.69305634E-002,
-3.01141385E-002,-2.63396613E-002,-2.98553891E-002,5.96499592E-002,4.13885899E-002,
5.32859489E-002,6.57707453E-002,-1.47056757E-002,-4.18472625E-002,4.1588001E-002,
-2.86354572E-002,-7.56499246E-002,-4.16395674E-003,-1.52879998E-001,6.60010576E-002,
-3.9013084E-002,3.15719917E-002,1.2428958E-002,-2.47651711E-002,-1.16851285E-001,
-7.82847106E-002,3.34323719E-002,8.03267583E-002,1.70483496E-002,-5.42407483E-002,
6.54291287E-002,-4.81935125E-003,6.11041225E-002,6.64106477E-003,-5.47
Oracle AI Vector Search SQL Scenario
To learn how you can chunk database-concepts23ai.pdf and oracle-ai-vector-search-users-guide.pdf, generate vector embeddings, and perform similarity search using vector indexes, see Quick Start SQL.
10.6.1 Alternate Method to Import ONNX Models
Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL
procedure to
import the model and declare the input name. The following procedure uses a PL/SQL helper block
that facilitates the process of importing ONNX format model into the Oracle Database. The
function reads the model file from the server's file system and imports it into the Database.
DBMS_DATA_MINING
PL/SQL package.
-
Connect as
dmuser
.CONN dmuser/<password>@<pdbname>;
-
Run the following helper PL/SQL block:
DECLARE m_blob BLOB default empty_blob(); m_src_loc BFILE ; BEGIN DBMS_LOB.createtemporary (m_blob, FALSE); m_src_loc := BFILENAME('DM_DUMP', 'my_embedding_model.onnx'); DBMS_LOB.fileopen (m_src_loc, DBMS_LOB.file_readonly); DBMS_LOB.loadfromfile (m_blob, m_src_loc, DBMS_LOB.getlength (m_src_loc)); DBMS_LOB.CLOSE(m_src_loc); DBMS_DATA_MINING.import_onnx_model ('doc_model', m_blob, JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input": {"input": ["DATA"]}}')); DBMS_LOB.freetemporary (m_blob); END; /
The code sets up a
BLOB
object and aBFILE
locator, creates a temporaryBLOB
for storing themy_embedding_model.onnx
file from theDM_DUMP
directory, and reads its contents into theBLOB
. It then closes the file and uses the content to import an ONNX model into the database with specified metadata, before releasing the temporaryBLOB
resources.
IMPORT_ONNX_MODEL
procedure is as
follows: DBMS_DATA_MINING.IMPORT_ONNX_MODEL(model_data, model_name,
metadata)
. This procedure loads IMPORT_ONNX_MODEL
from the
DBMS_DATA_MINING
package to import the ONNX model into the Database using
the name provided in model_name
, the BLOB content in m_blob
,
and the associated metadata
.
-
doc_model
: This parameter is a user-specified name under which the imported model is stored in the Oracle Database. -
m_blob
: This is a model data inBLOB
that holds the ONNX representation of the model. -
"function" : "embedding"
: Indicates the function name for text embedding model. -
"embeddingOutput" : "embedding"
: Specifies the output variable which contains the embedding results. -
"input": {"input": ["DATA"]}
: Specifies a JSON object ("input"
) that describes the input expected by the model. It specifies that there is an input named"input"
, and its value should be an array with one element,"DATA"
. This indicates that the model expects a single string input to generate embeddings.
Alternately, the
DBMS_DATA_MINING.IMPORT_ONNX_MODEL
procedure can also accept a
BLOB
argument representing an ONNX file stored and loaded from OCI Object
Storage. The following is an example to load an ONNX model stored in an OCI Object
Storage.
DECLARE
model_source BLOB := NULL;
BEGIN
-- get BLOB holding onnx model
model_source := DBMS_CLOUD.GET_OBJECT(
credential_name => 'myCredential',
object_uri => 'https://objectstorage.us-phoenix -1.oraclecloud.com/' ||
'n/namespace -string/b/bucketname/o/myONNXmodel.onnx');
DBMS_DATA_MINING.IMPORT_ONNX_MODEL(
"myonnxmodel",
model_source,
JSON('{ function : "embedding" })
);
END;
/
This PL/SQL block starts by initializing a model_source
variable as a BLOB
type, initially set to NULL. It then retrieves an ONNX
model from Oracle Cloud Object Storage using the DBMS_CLOUD.GET_OBJECT
procedure, specifying the credentials (OBJ_STORE_CRED
) and the URI of the
model. The ONNX model resides in a specific bucket named bucketname
in this
case, and is accessible through the provided URL. Then, the script loads the ONNX model into
the model_source
BLOB. The
DBMS_DATA_MINING.IMPORT_ONNX_MODEL
procedure then imports this model into
the Oracle Database as myonnxmodel
. During the import, a JSON metadata
specifies the model's function as embedding
, for embedding
operations.
See IMPORT_ONNX_MODEL Procedure and GET_OBJECT Procedure and Function to learn about the PL/SQL procedure.
Example: Importing a Pretrained ONNX Model to Oracle Database
The following presents a comprehensive step-by-step example of importing ONNX embedding and generating vector embeddings.
conn sys/<password>@pdb as sysdba
grant db_developer_role to dmuser identified by dmuser;
grant create mining model to dmuser;
create or replace directory DM_DUMP as '<work directory path>';
grant read on directory dm_dump to dmuser;
grant write on directory dm_dump to dmuser;
>conn dmuser/<password>@<pdbname>;
–- Drop the model if it exits
exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);
-- Load Model
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(
'DM_DUMP',
'my_embedding_model.onnx',
'doc_model',
JSON('{"function" : "embedding", "embeddingOutput" : "embedding"}'));
/
--Alternately, load the model
EXECUTE DBMS_DATA_MINING.IMPORT_ONNX_MODEL(
'my_embedding_model.onnx',
'doc_model',
JSON('{"function" : "embedding",
"embeddingOutput" : "embedding",
"input": {"input": ["DATA"]}}')
);
--check the attributes view
set linesize 120
col model_name format a20
col algorithm_name format a20
col algorithm format a20
col attribute_name format a20
col attribute_type format a20
col data_type format a20
SQL> SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;
OUTPUT:
MODEL_NAME ATTRIBUTE_NAME ATTRIBUTE_TYPE DATA_TYPE VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL INPUT_STRING TEXT VARCHAR2
DOC_MODEL ORA$ONNXTARGET VECTOR VECTOR VECTOR(128,FLOA
T32)
SQL> SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;
OUTPUT:
MODEL_NAME MINING_FUNCTION ALGORITHM ALGORITHM_ MODEL_SIZE
-------------------- ------------------------------ -------------------- ---------- ----------
DOC_MODEL EMBEDDING ONNX NATIVE 17762137
SQL> select * from DM$VMDOC_MODEL ORDER BY NAME;
OUTPUT:
NAME VALUE
---------------------------------------- ----------------------------------------
Graph Description Graph combining g_8_torch_jit and torch_
jit
g_8_torch_jit
torch_jit
Graph Name g_8_torch_jit_torch_jit
Input[0] input:string[1]
Output[0] embedding:float32[?,128]
Producer Name onnx.compose.merge_models
Version 1
6 rows selected.
SQL> select * from DM$VPDOC_MODEL ORDER BY NAME;
OUTPUT:
NAME VALUE
---------------------------------------- ----------------------------------------
batching False
embeddingOutput embedding
SQL> select * from DM$VJDOC_MODEL;
OUTPUT:
METADATA
--------------------------------------------------------------------------------
{"function":"embedding","embeddingOutput":"embedding","input":{"input":["DATA"]}}
--apply the model
SQL> SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
--------------------------------------------------------------------------------
[-9.76553112E-002,-9.89954844E-002,7.69771636E-003,-4.16760892E-003,-9.69305634E-002,
-3.01141385E-002,-2.63396613E-002,-2.98553891E-002,5.96499592E-002,4.13885899E-002,
5.32859489E-002,6.57707453E-002,-1.47056757E-002,-4.18472625E-002,4.1588001E-002,
-2.86354572E-002,-7.56499246E-002,-4.16395674E-003,-1.52879998E-001,6.60010576E-002,
-3.9013084E-002,3.15719917E-002,1.2428958E-002,-2.47651711E-002,-1.16851285E-001,
-7.82847106E-002,3.34323719E-002,8.03267583E-002,1.70483496E-002,-5.42407483E-002,
6.54291287E-002,-4.81935125E-003,6.11041225E-002,6.64106477E-003,-5.47