Embedding

8.1 About Vector Embeddings

Transformer models, also known as embedding models, are used to convert various types of data, such as words, sentences, documents, images, and more, into numerical vectors that capture their semantics.

These vectors are represented as points in a multidimensional space, where the proximity of points reflects the semantic similarity of the data they represent. Put differently, vector embeddings are a way of representing various types of data, like text, images, videos, or music, as points in a multidimensional space. The locations of these points and their proximity to others are semantically meaningful. This transformation enables machine learning algorithms to process and analyze data more effectively, and compute various distance metrics to find similar content. Creating vector embeddings involves training machine learning models, often neural networks, on large data sets to learn patterns and relationships within the data. This process transforms the data into numerical vectors, each uniquely representing a data point in a high-dimensional vector space. Applications of vector embeddings span a wide range of fields, particularly in natural language processing (NLP), search engines, and recommendation systems to name a few.

Parent topic: Embedding

8.2 Pretrained Models for Generating Embeddings

Many pretrained models exist that generate embeddings for various data types, such as words, text sentences, images, and so on. These pretrained models often require pre-processing or post-processing operations or both.

As an example, most models for creating sentence embeddings from text require a pre-processing step called tokenization. Tokenization is a process to convert a sequence of text into smaller parts, called tokens. The embedding model then processes the tokens as input. Further post-processing might also be necessary for the output of these pretrained sentence transformers. One such post-processing operation is pooling. Pooling in text embeddings is a technique used to aggregate and reduce the dimensionality of individual word or token embeddings within a text sequence. This process involves combining the features of multiple embeddings to form a single, fixed-size representation of the entire text. For example, pooling methods can be employed to perform aggregation functions such as mean, max, or others. Another post-processing operation is normalization. Normalization in text embeddings is a process that adjusts the individual embeddings to have a uniform scale or distribution. This step involves transforming the embeddings so that they conform to a specific structure, often aiming to have a consistent length or scale across the data set.

Therefore, you need to use pretrained models that are augmented with pre-processing and post-processing operations to generate embeddings. This document illustrates examples that use the my_embedding_model.onnx model as an augmented ONNX format model. If you want to download and convert a pretrained model to an ONNX format model and augment the model with pre-processing and post-processing steps, see Import ONNX Models and Generate Embeddings.

Parent topic: Embedding

8.3 Data Types for ONNX Embedding Models

ONNX defines its own data types. When you import ONNX models into Oracle Database, their data types are automatically mapped to SQL data types.

Attribute Data Type for ONNX Embedding Models
For a text embedding model, the input is a string. Therefore, the supported data type are VARCHAR2, CLOB, NVARCHAR2, and NCLOB. This means that there is a limit on the size of input strings to 4000 bytes (32767 bytes if the maximum string size is set to extended).
Target Data Type for ONNX Embedding Models
The output of a text embedding model is an embedding vector. Therefore, the target data type is VECTOR. Use the VECTOR_EMBEDDING SQL scoring function to generate vectors from an embedding model.

Parent topic: Embedding

8.3.1 Attribute Data Type for ONNX Embedding Models

For a text embedding model, the input is a string. Therefore, the supported data type are VARCHAR2, CLOB, NVARCHAR2, and NCLOB. This means that there is a limit on the size of input strings to 4000 bytes (32767 bytes if the maximum string size is set to extended).

The USER_MINING_MODEL_ATTRIBUTES view reports the SQL data types for the input of a model.

For example, the USER_MINING_MODEL_ATTRIBUTES view reports selected attributes for a model. The input data type of an embedding model here is VARCHAR2.

SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;

The output is as follows:

MODEL_NAME           ATTRIBUTE_NAME       ATTRIBUTE_TYPE       DATA_TYPE  VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL                INPUT_STRING         TEXT                 VARCHAR2
DOC_MODEL                ORA$ONNXTARGET       VECTOR               VECTOR     VECTOR(128,FLOA
                                                                          T32)

Parent topic: Data Types for ONNX Embedding Models

8.3.2 Target Data Type for ONNX Embedding Models

The output of a text embedding model is an embedding vector. Therefore, the target data type is VECTOR. Use the VECTOR_EMBEDDING SQL scoring function to generate vectors from an embedding model.

For more detail on VECTOR data type, see Create Tables Using the VECTOR Data Type. To learn more about VECTOR_EMBEDDING SQL operator, see Oracle Database SQL Language Reference.

Parent topic: Data Types for ONNX Embedding Models

8.4 Examples: Static Data Dictionary Views

You can use the Oracle Machine Learning static data dictionary views to view information such as available models, attributes of an ONNX embedding model and others. Values to support ONNX embedding models have been added.

Database administrator (DBA) and USER versions of the views are also available.

This section lists the examples of the impacted data dictionary views of ONNX embedding model.

Example: ALL_MINING_MODEL_ATTRIBUTES
You, as a current user, can view the attributes of a machine learning model by querying the ALL_MINING_MODEL_ATTRIBUTES view.
Example: ALL_MINING_MODELS
You can check machine learning models available to you as a current user by querying the ALL_MINING_MODELS view.

Parent topic: Embedding

8.4.1 Example: ALL_MINING_MODEL_ATTRIBUTES

You, as a current user, can view the attributes of a machine learning model by querying the ALL_MINING_MODEL_ATTRIBUTES view.

Here is an example of the model attributes of an embedding model. The name of the ONNX embedding model is DOC_MODEL:

SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;

The output is as follows:


MODEL_NAME           ATTRIBUTE_NAME       ATTRIBUTE_TYPE       DATA_TYPE  VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL                INPUT_STRING         TEXT                 VARCHAR2
DOC_MODEL                ORA$ONNXTARGET       VECTOR               VECTOR     VECTOR(128,FLOA
                                                                          T32)

8.4.2 Example: ALL_MINING_MODELS

You can check machine learning models available to you as a current user by querying the ALL_MINING_MODELS view.

Here is an example of model details of an embedding model. The name of the ONNX embedding model is DOC_MODEL:

SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;

The output is as follows:


MODEL_NAME           MINING_FUNCTION                ALGORITHM            ALGORITHM_ MODEL_SIZE
-------------------- ------------------------------ -------------------- ---------- ----------
DOC_MODEL                EMBEDDING                      ONNX                 NATIVE       17762137

8.5 Scoring: Generate Vector Embeddings

After importing the ONNX embedding model into the Database, you can generate embedding vectors using the VECTOR_EMBEDDING SQL scoring function.

The VECTOR_EMBEDDING SQL scoring function returns VECTOR(dimension, type). The embedding models define the number of dimensions of the output vector of the VECTOR_EMBEDDING operator. To learn about the VECTOR_EMBEDDING SQL scoring operator, see VECTOR_EMBEDDING.

Example

The following example generates vector embeddings with "hello" as the input, utilizing the pretrained ONNX format model my_embedding_model.onnx imported into the Database. For complete example, see Import ONNX Models and Generate Embeddings.

SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
--------------------------------------------------------------------------------
[-9.76553112E-002,-9.89954844E-002,7.69771636E-003,-4.16760892E-003,-9.69305634E-002,
-3.01141385E-002,-2.63396613E-002,-2.98553891E-002,5.96499592E-002,4.13885899E-002,
5.32859489E-002,6.57707453E-002,-1.47056757E-002,-4.18472625E-002,4.1588001E-002,
-2.86354572E-002,-7.56499246E-002,-4.16395674E-003,-1.52879998E-001,6.60010576E-002,
-3.9013084E-002,3.15719917E-002,1.2428958E-002,-2.47651711E-002,-1.16851285E-001,
-7.82847106E-002,3.34323719E-002,8.03267583E-002,1.70483496E-002,-5.42407483E-002,
6.54291287E-002,-4.81935125E-003,6.11041225E-002,6.64106477E-003,-5.47

Note:

You can define the outputs explicitly in the metadata or implicitly. The system assumes a single output for a model if you don't specify the output in the metadata.

If a scoring function does not comply as per the description in Supported SQL Scoring Functions, you will receive an ORA-40290 error when performing the scoring operation on your data. Additionally, any unsupported scoring functions will raise the ORA-40290 error.

Treatment of Missing Data During Scoring
ONNX does not support representation for non-existent values; that is, there is no equivalent to NULL for SQL.

8.5.1 Treatment of Missing Data During Scoring

ONNX does not support representation for non-existent values; that is, there is no equivalent to NULL for SQL.

Further, if the input values are not specified, then the ONNX embedding models fail to run.

Absent attribute: If fewer attributes are used for scoring than were specified during model import (input), then you receive an error when you perform scoring. That is, if at least one of the input value is not specified in the USING clause of a scoring operator with ONNX model, then the query will not compile.
NULL attribute: If any of the attributes has a NULL value, then the scoring operator does not perform inference of the model with the ONNX Runtime and returns a NULL result immediately. If you want to change this behavior, then provide an appropriate replacement to the NULL value, either by using an NVL expression as input attribute (for example, NVL(input_attribute, default_value) AS input_attribute);) or by specifying a default value for this input attribute using the JSON metadata when importing the model.

Parent topic: Scoring: Generate Vector Embeddings

8.6 Import ONNX Models and Generate Embeddings

Learn to import a pretrained embedding model that is in ONNX format and generate vector embeddings.

Follow the steps below to import a pertained ONNX formatted embedding model into the Oracle Database.

Prepare Your Data Dump Directory

Prepare your data dump directory and provide the necessary access and privileges to dmuser.

Choose from:
1. If you already have a pretrained ONNX embedding model, store it in your working folder.
2. If you do not have pretrained embedding model in ONNX format, perform the steps listed in Convert Pretrained Models to ONNX Format.
Login to SQL*Plus as SYSDBA in your PDB.
```
CONN sys/<password>@pdb as sysdba;
```

Grant the DB_DEVELOPER_ROLE to dmuser.


GRANT DB_DEVELOPER_ROLE TO dmuser identified by <password>;

Grant CREATE MINING MODEL privilege to dmuser.
```
GRANT create mining model TO dmuser;
```
Set your working folder as the data dump directory (DM_DUMP) to load the ONNX embedding model.
```
CREATE OR REPLACE DIRECTORY DM_DUMP as '<work directory path>';
```
Grant READ permissions on the DM_DUMP directory to dmuser.
```
GRANT READ ON DIRECTORY dm_dump TO dmuser;
```
Grant WRITE permissions on the DM_DUMP directory to dmuser.
```
GRANT WRITE ON DIRECTORY dm_dump TO dmuser;
```

Drop the model if it already exits.

exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);

Import ONNX Model Into the Database

You created a data dump directory and now you load the ONNX model into the Database. Use the DBMS_VECTOR.LOAD_ONNX_MODEL procedure to load the model. The DBMS_VECTOR.LOAD_ONNX_MODEL procedure facilitates the process of importing ONNX format model into the Oracle Database. In this example, the procedure loads an ONNX model file, named my_embedding_model.onnx from the DM_DUMP directory, into the Database as doc_model, specifying its use for embedding tasks.

Connect as dmuser.
```
CONN dmuser/<password>@<pdbname>;
```
Load the ONNX model into the Database.

If the ONNX model to be imported already includes an output tensor named embeddingOutput and an input string tensor named data, JSON metadata is unnecessary. Embedding models converted from OML4Py follow this convention and can be imported without the JSON metadata.
```
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(
  'DM_DUMP',
 'my_embedding_model.onnx',
 'doc_model');
```
Alternately, you can load the ONNX embedding model by specifying the JSON metadata.
```
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(
	'DM_DUMP', 
	'my_embedding_model.onnx', 
	'doc_model', 
	JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input": {"input": ["DATA"]}}'));
```

The procedure LOAD_ONNX_MODEL declares these parameters:

DM_DUMP: specifies the directory name of the data dump.

Note:
Ensure that the DM_DUMP directory is defined.
my_embedding_model: is a VARCHAR2 type parameter that specifies the name of the ONNX model.
doc_model: This parameter is a user-specified name under which the model is stored in the Oracle Database.
The JSON metadata associated with the ONNX model is declared as:

"function" : "embedding": Indicates the function name for text embedding model.

"embeddingOutput" : "embedding": Specifies the output variable which contains the embedding results.
"input": {"input": ["DATA"]}: Specifies a JSON object ("input") that describes the input expected by the model. It specifies that there is an input named "input", and its value should be an array with one element, "DATA". This indicates that the model expects a single string input to generate embeddings.

See LOAD_ONNX_MODEL Procedure to learn about the PL/SQL procedure.

Query Model Statistics

You can view model attributes and learn about the model by querying machine learning dictionary views and model detail views.

Note:

DOC_MODEL is the user-specified name of the embedding text model.

Query USER_MINING_MODEL_ATTRIBUTES view.

SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;

To learn about USER_MINING_MODEL_ATTRIBUTES view, see USER_MINING_MODEL_ATTRIBUTES.

Query USER_MINING_MODELS view.

SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;

To learn about USER_MINING_MODELS view, see USER_MINING_MODELS.

Check model statistics by viewing the model detail views. Query the DM$VMDOC_MODEL view.
```
SELECT * FROM DM$VMDOC_MODEL ORDER BY NAME;
```
To learn about model details views for ONNX embedding models, see Model Details Views for ONNX Models.

Query the DM$VPDOC_MODEL model detail view.

SELECT * FROM DM$VPDOC_MODEL ORDER BY NAME;

Query the DM$VJDOC_MODEL model detail view.
```
SELECT * FROM DM$VJDOC_MODEL;
```

Generate Embeddings

Apply the model and generate vector embeddings for your input. Here, the input is hello.

Generate vector embeddings using the VECTOR_EMBEDDING function.

SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;

To learn about the VECTOR_EMBEDDING SQL function, see VECTOR_EMBEDDING. You can use the UTL_TO_EMBEDDING function in the DBMS_VECTOR_CHAIN PL/SQL package to generate vector embeddings generically through REST endpoints. To explore these functions, see the example Convert Text String to Embedding.

Example: Importing a Pretrained ONNX Model to Oracle Database

The following presents a comprehensive step-by-step example of importing ONNX embedding and generating vector embeddings.

conn sys/<password>@pdb as sysdba
grant db_developer_role to dmuser identified by dmuser;
grant create mining model to dmuser;
 
create or replace directory DM_DUMP as '<work directory path>';
grant read on directory dm_dump to dmuser;
grant write on directory dm_dump to dmuser;
>conn dmuser/<password>@<pdbname>;

–- Drop the model if it exits								  
exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);

-- Load Model
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(
	'DM_DUMP', 
	'my_embedding_model.onnx', 
	'doc_model', 
	JSON('{"function" : "embedding", "embeddingOutput" : "embedding"}'));
/
 
--check the attributes view
set linesize 120
col model_name format a20
col algorithm_name format a20
col algorithm format a20
col attribute_name format a20
col attribute_type format a20
col data_type format a20 

SQL> SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;
 
 
OUTPUT:
 
MODEL_NAME           ATTRIBUTE_NAME       ATTRIBUTE_TYPE       DATA_TYPE  VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL                INPUT_STRING         TEXT                 VARCHAR2
DOC_MODEL                ORA$ONNXTARGET       VECTOR               VECTOR     VECTOR(128,FLOA
                                                                          T32)
 
 
 
SQL> SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;
 
OUTPUT:
MODEL_NAME           MINING_FUNCTION                ALGORITHM            ALGORITHM_ MODEL_SIZE
-------------------- ------------------------------ -------------------- ---------- ----------
DOC_MODEL                EMBEDDING                      ONNX                 NATIVE       17762137
 
 
 
SQL> select * from DM$VMDOC_MODEL ORDER BY NAME;
 
OUTPUT:
NAME                                     VALUE
---------------------------------------- ----------------------------------------
Graph Description                        Graph combining g_8_torch_jit and torch_
                                         jit
                                         g_8_torch_jit
 
 
 
                                         torch_jit
 
 
Graph Name                               g_8_torch_jit_torch_jit
Input[0]                                 input:string[1]
Output[0]                                embedding:float32[?,128]
Producer Name                            onnx.compose.merge_models
Version                                  1
 
6 rows selected.
 
 
SQL> select * from DM$VPDOC_MODEL ORDER BY NAME;
 
OUTPUT:
NAME                                     VALUE
---------------------------------------- ----------------------------------------
batching                                 False
embeddingOutput                          embedding
 
 
SQL> select * from DM$VJDOC_MODEL;
 
OUTPUT:
METADATA
--------------------------------------------------------------------------------
{"function":"embedding","embeddingOutput":"embedding","input":{"input":["DATA"]}}
 
 
 
--apply the model
SQL> SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
  
--------------------------------------------------------------------------------
[-9.76553112E-002,-9.89954844E-002,7.69771636E-003,-4.16760892E-003,-9.69305634E-002,
-3.01141385E-002,-2.63396613E-002,-2.98553891E-002,5.96499592E-002,4.13885899E-002,
5.32859489E-002,6.57707453E-002,-1.47056757E-002,-4.18472625E-002,4.1588001E-002,
-2.86354572E-002,-7.56499246E-002,-4.16395674E-003,-1.52879998E-001,6.60010576E-002,
-3.9013084E-002,3.15719917E-002,1.2428958E-002,-2.47651711E-002,-1.16851285E-001,
-7.82847106E-002,3.34323719E-002,8.03267583E-002,1.70483496E-002,-5.42407483E-002,
6.54291287E-002,-4.81935125E-003,6.11041225E-002,6.64106477E-003,-5.47

Oracle AI Vector Search SQL Scenario

To learn how you can chunk database-concepts23ai.pdf and oracle-ai-vector-search-users-guide.pdf, generate vector embeddings, and perform similarity search using vector indexes, see Quick Start SQL.

Alternate Method to Import ONNX Models
Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure to import the model and declare the input name. The following procedure uses a PL/SQL helper block that facilitates the process of importing ONNX format model into the Oracle Database. The function reads the model file from the server's file system and imports it into the Database.

Parent topic: Embedding

8.6.1 Alternate Method to Import ONNX Models

Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure to import the model and declare the input name. The following procedure uses a PL/SQL helper block that facilitates the process of importing ONNX format model into the Oracle Database. The function reads the model file from the server's file system and imports it into the Database.

Perform the following steps to import ONNX model into the Database using DBMS_DATA_MINING PL/SQL package.

Connect as dmuser.
```
CONN dmuser/<password>@<pdbname>;
```

Run the following helper PL/SQL block:

DECLARE
    m_blob BLOB default empty_blob();
    m_src_loc BFILE ;
    BEGIN
    DBMS_LOB.createtemporary (m_blob, FALSE);
    m_src_loc := BFILENAME('DM_DUMP', 'my_embedding_model.onnx');
    DBMS_LOB.fileopen (m_src_loc, DBMS_LOB.file_readonly);
    DBMS_LOB.loadfromfile (m_blob, m_src_loc, DBMS_LOB.getlength (m_src_loc));
    DBMS_LOB.CLOSE(m_src_loc);
    DBMS_DATA_MINING.import_onnx_model ('doc_model', m_blob, JSON('{"function" : "embedding", "embeddingOutput" : "embedding", "input": {"input": ["DATA"]}}'));
    DBMS_LOB.freetemporary (m_blob);
    END;
    /

The code sets up a BLOB object and a BFILE locator, creates a temporary BLOB for storing the my_embedding_model.onnx file from the DM_DUMP directory, and reads its contents into the BLOB. It then closes the file and uses the content to import an ONNX model into the database with specified metadata, before releasing the temporary BLOB resources.

The schema of the IMPORT_ONNX_MODEL procedure is as follows:

DBMS_DATA_MINING.IMPORT_ONNX_MODEL(model_data, model_name,
      metadata)

. This procedure loads IMPORT_ONNX_MODEL from the DBMS_DATA_MINING package to import the ONNX model into the Database using the name provided in model_name, the BLOB content in m_blob, and the associated metadata.

doc_model: This parameter is a user-specified name under which the imported model is stored in the Oracle Database.
m_blob: This is a model data in BLOB that holds the ONNX representation of the model.
"function" : "embedding": Indicates the function name for text embedding model.
"embeddingOutput" : "embedding": Specifies the output variable which contains the embedding results.
"input": {"input": ["DATA"]}: Specifies a JSON object ("input") that describes the input expected by the model. It specifies that there is an input named "input", and its value should be an array with one element, "DATA". This indicates that the model expects a single string input to generate embeddings.

Alternately, the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure can also accept a BLOB argument representing an ONNX file stored and loaded from OCI Object Storage. The following is an example to load an ONNX model stored in an OCI Object Storage.

DECLARE
  model_source BLOB := NULL;
BEGIN
  -- get BLOB holding onnx model 
  model_source := DBMS_CLOUD.GET_OBJECT(
    credential_name => 'myCredential',
    object_uri => 'https://objectstorage.us-phoenix -1.oraclecloud.com/' ||
      'n/namespace -string/b/bucketname/o/myONNXmodel.onnx'); 
 
  DBMS_DATA_MINING.IMPORT_ONNX_MODEL(
    "myonnxmodel",
    model_source,
    JSON('{ function : "embedding" })
  );
END;
/

This PL/SQL block starts by initializing a model_source variable as a BLOB type, initially set to NULL. It then retrieves an ONNX model from Oracle Cloud Object Storage using the DBMS_CLOUD.GET_OBJECT procedure, specifying the credentials (OBJ_STORE_CRED) and the URI of the model. The ONNX model resides in a specific bucket named bucketname in this case, and is accessible through the provided URL. Then, the script loads the ONNX model into the model_source BLOB. The DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure then imports this model into the Oracle Database as myonnxmodel. During the import, a JSON metadata specifies the model's function as embedding, for embedding operations.

See IMPORT_ONNX_MODEL Procedure and GET_OBJECT Procedure and Function to learn about the PL/SQL procedure.

Example: Importing a Pretrained ONNX Model to Oracle Database

The following presents a comprehensive step-by-step example of importing ONNX embedding and generating vector embeddings.

conn sys/<password>@pdb as sysdba
grant db_developer_role to dmuser identified by dmuser;
grant create mining model to dmuser;
 
create or replace directory DM_DUMP as '<work directory path>';
grant read on directory dm_dump to dmuser;
grant write on directory dm_dump to dmuser;
>conn dmuser/<password>@<pdbname>;

–- Drop the model if it exits								  
exec DBMS_VECTOR.DROP_ONNX_MODEL(model_name => 'doc_model', force => true);

-- Load Model
EXECUTE DBMS_VECTOR.LOAD_ONNX_MODEL(
	'DM_DUMP', 
	'my_embedding_model.onnx', 
	'doc_model', 
	JSON('{"function" : "embedding", "embeddingOutput" : "embedding"}'));
/
--Alternately, load the model
EXECUTE DBMS_DATA_MINING.IMPORT_ONNX_MODEL(
       'my_embedding_model.onnx',
	'doc_model', 
	JSON('{"function" : "embedding",
	"embeddingOutput" : "embedding",
	"input": {"input": ["DATA"]}}')
	);
 
--check the attributes view
set linesize 120
col model_name format a20
col algorithm_name format a20
col algorithm format a20
col attribute_name format a20
col attribute_type format a20
col data_type format a20 

SQL> SELECT model_name, attribute_name, attribute_type, data_type, vector_info
FROM user_mining_model_attributes
WHERE model_name = 'DOC_MODEL'
ORDER BY ATTRIBUTE_NAME;
 
 
OUTPUT:
 
MODEL_NAME           ATTRIBUTE_NAME       ATTRIBUTE_TYPE       DATA_TYPE  VECTOR_INFO
-------------------- -------------------- -------------------- ---------- ---------------
DOC_MODEL                INPUT_STRING         TEXT                 VARCHAR2
DOC_MODEL                ORA$ONNXTARGET       VECTOR               VECTOR     VECTOR(128,FLOA
                                                                          T32)
 
 
 
SQL> SELECT MODEL_NAME, MINING_FUNCTION, ALGORITHM,
ALGORITHM_TYPE, MODEL_SIZE
FROM user_mining_models
WHERE model_name = 'DOC_MODEL'
ORDER BY MODEL_NAME;
 
OUTPUT:
MODEL_NAME           MINING_FUNCTION                ALGORITHM            ALGORITHM_ MODEL_SIZE
-------------------- ------------------------------ -------------------- ---------- ----------
DOC_MODEL                EMBEDDING                      ONNX                 NATIVE       17762137
 
 
 
SQL> select * from DM$VMDOC_MODEL ORDER BY NAME;
 
OUTPUT:
NAME                                     VALUE
---------------------------------------- ----------------------------------------
Graph Description                        Graph combining g_8_torch_jit and torch_
                                         jit
                                         g_8_torch_jit
 
 
 
                                         torch_jit
 
 
Graph Name                               g_8_torch_jit_torch_jit
Input[0]                                 input:string[1]
Output[0]                                embedding:float32[?,128]
Producer Name                            onnx.compose.merge_models
Version                                  1
 
6 rows selected.
 
 
SQL> select * from DM$VPDOC_MODEL ORDER BY NAME;
 
OUTPUT:
NAME                                     VALUE
---------------------------------------- ----------------------------------------
batching                                 False
embeddingOutput                          embedding
 
 
SQL> select * from DM$VJDOC_MODEL;
 
OUTPUT:
METADATA
--------------------------------------------------------------------------------
{"function":"embedding","embeddingOutput":"embedding","input":{"input":["DATA"]}}
 
 
 
--apply the model
SQL> SELECT TO_VECTOR(VECTOR_EMBEDDING(doc_model USING 'hello' as data)) AS embedding;
  
--------------------------------------------------------------------------------
[-9.76553112E-002,-9.89954844E-002,7.69771636E-003,-4.16760892E-003,-9.69305634E-002,
-3.01141385E-002,-2.63396613E-002,-2.98553891E-002,5.96499592E-002,4.13885899E-002,
5.32859489E-002,6.57707453E-002,-1.47056757E-002,-4.18472625E-002,4.1588001E-002,
-2.86354572E-002,-7.56499246E-002,-4.16395674E-003,-1.52879998E-001,6.60010576E-002,
-3.9013084E-002,3.15719917E-002,1.2428958E-002,-2.47651711E-002,-1.16851285E-001,
-7.82847106E-002,3.34323719E-002,8.03267583E-002,1.70483496E-002,-5.42407483E-002,
6.54291287E-002,-4.81935125E-003,6.11041225E-002,6.64106477E-003,-5.47

Parent topic: Import ONNX Models and Generate Embeddings