About ONNX

7.1 About ONNX

ONNX (Open Neural Network Exchange) is an open-source format for machine learning models. It is designed for cross-platform compatibility and is supported by major languages and frameworks, which makes it easier to exchange models between them.

The ONNX format allows for model serialization and simplifies the exchange of models across platforms, including cloud, web, edge, and mobile experiences on Microsoft Windows, Linux, macOS, iOS, and Android. ONNX models can also be exported and imported in many languages, such as Python, C++, C#, and Java. The ONNX format is useful for compute-heavy tasks such as training machine learning models and for data processing that often uses trained models. Many leading machine learning frameworks, such as TensorFlow, PyTorch, and scikit-learn, can convert models into the ONNX format.

Once you represent the models in the ONNX format, you can run them with the ONNX Runtime. The architecture of the ONNX Runtime is adaptable: providers can modify or enhance how some operations are implemented to make better use of particular hardware, such as Graphics Processing Units (GPUs), Single Instruction Multiple Data (SIMD) instruction sets, or specialized libraries. To learn more about ONNX Runtime, see https://onnxruntime.ai/docs/.

The ONNX Runtime integration with Oracle Database lets you import ONNX-formatted models, including embedding models. To support embedding models, Oracle Machine Learning has introduced a new machine learning technique called embedding. If you do not have a pretrained model in ONNX format, Oracle offers a Python utility package that downloads a pretrained model, converts it to ONNX format augmented with pre-processing and post-processing operations, and imports the resulting ONNX format model into Oracle Database. To learn more about the Python utility, see Convert Pretrained Models to ONNX Format.

Oracle supports ONNX Runtime version 1.15.1.



7.1.1 Supported Machine Learning Functions for ONNX Runtime

Describes the supported machine learning functions to import pretrained models and perform scoring.

The following are the supported machine learning functions:

Classification
Clustering
Embedding
Regression


7.1.2 Supported Attribute Data Types

Discover the supported ONNX input data types mapped to SQL data types.

Data Type	SQL Type	Supported ONNX Data Type
Numerical	`BINARY_DOUBLE`, `NUMBER`	`float`, `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, `uint64`
Categorical	`VARCHAR`	`string`
Text	`VARCHAR2`, `CLOB`	`string`
Vectors	`VECTOR(<dimension>, float32)`	`float`

The following data types are not supported:

complex64, complex128
float16, bfloat16
fp8
int4, uint4
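
The mapping above can be read as a simple lookup. The following Python sketch is illustrative only: `sql_types_for` and the two sets are hypothetical helpers, not part of any Oracle or ONNX API, and the type names are copied from the table and list above. ONNX types that appear in neither place are treated as unknown rather than guessed.

```python
# Hypothetical helper that encodes the attribute data type table above.
# It is not an Oracle or ONNX API; all names are for illustration only.

NUMERIC_ONNX_TYPES = {
    "float", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
}

UNSUPPORTED_ONNX_TYPES = {
    "complex64", "complex128", "float16", "bfloat16", "fp8", "int4", "uint4",
}


def sql_types_for(onnx_type):
    """Return the SQL types that the table pairs with an ONNX input type."""
    if onnx_type in UNSUPPORTED_ONNX_TYPES:
        raise ValueError(f"ONNX type {onnx_type!r} is listed as not supported")
    if onnx_type == "float":
        # float appears in both the Numerical row and the Vectors row.
        return ["BINARY_DOUBLE", "NUMBER", "VECTOR"]
    if onnx_type in NUMERIC_ONNX_TYPES:
        return ["BINARY_DOUBLE", "NUMBER"]
    if onnx_type == "string":
        # string appears in both the Categorical row and the Text row.
        return ["VARCHAR", "VARCHAR2", "CLOB"]
    raise ValueError(f"ONNX type {onnx_type!r} does not appear in the table")
```

A check like this could run before import to flag a model whose inputs use, for example, `float16`.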


7.1.3 Supported Target Data Types

Discover the supported ONNX target data types mapped to SQL data types.

Different scoring functions are used depending on the machine learning function, and different scoring functions for the same machine learning function can return different data types. A few points to note:

For a classification model, separate rules determine the data type that the PREDICTION function returns; see labels in JSON Metadata Parameters for ONNX Models. If you are using PREDICTION_PROBABILITY, then BINARY_DOUBLE is returned.
For an embedding model, the VECTOR_EMBEDDING function returns a VECTOR type.
For a regression model, VARCHAR is not a valid target type and BINARY_DOUBLE is returned.
For a clustering model, if you are using CLUSTER_PROBABILITY or CLUSTER_DISTANCE, then BINARY_DOUBLE is returned.

To learn more, see JSON Metadata Parameters for ONNX Models.

Machine Learning Function	SQL Function	SQL Type	Supported ONNX Target Output
Regression	`PREDICTION`	`BINARY_DOUBLE`	`regressionOutput`
Classification	`PREDICTION`	`VARCHAR2`	`classificationLabelOutput`
Classification	`PREDICTION`	`NUMBER`	`classificationLabelOutput`
Classification	`PREDICTION_PROBABILITY`	`BINARY_DOUBLE`	`classificationProbOutput`
Classification	`PREDICTION_SET`	`set of ( NUMBER , BINARY_DOUBLE )` `set of (target_type, BINARY_DOUBLE)`	NA
Clustering	`CLUSTER_PROBABILITY`	`BINARY_DOUBLE`	`clusteringProbOutput`
Clustering	`CLUSTER_DISTANCE`	`BINARY_DOUBLE`	`clusteringDistanceOutput`
Clustering	`CLUSTER_SET`	`set of ( NUMBER , BINARY_DOUBLE )`	NA
Embedding	`VECTOR_EMBEDDING`	`VECTOR(n, float32)`	`embeddingOutput`


7.1.4 Custom ONNX Runtime Operations

If you want to customize a pretrained embedding model by adding pre-processing and post-processing operations, Oracle supports tokenization as a pre-processing operation, and pooling and normalization as post-processing operations. These are supported as custom ONNX Runtime operations for version 1.15.1.

Oracle offers a Python utility that augments a pretrained model with tokenization, pooling, and normalization, and converts the augmented model to ONNX format. Models that use any other custom operations fail on import. For details on how to use the Python utility, see Convert Pretrained Models to ONNX Format.


7.1.5 Use PL/SQL Packages to Import Models

Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure or the DBMS_VECTOR.LOAD_ONNX_MODEL procedure to import ONNX format models. You can then use the imported ONNX format models through a scoring function run by the in-database ONNX Runtime.

To import a pretrained ONNX format model, use IMPORT_ONNX_MODEL Procedure or LOAD_ONNX_MODEL Procedure.
To drop an ONNX model, use DROP_ONNX_MODEL. See also DROP_MODEL procedure.
A complete step-by-step example that illustrates these procedures is in Import ONNX Models and Generate Embeddings.

The DBMS_DATA_MINING.RENAME_MODEL procedure is also supported.
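
As a rough sketch of what an import call might look like, the Python snippet below assembles JSON metadata and the text of an anonymous PL/SQL block that calls DBMS_VECTOR.LOAD_ONNX_MODEL. Several things here are assumptions rather than facts taken from this section: the parameter order (directory, file name, model name, metadata), the metadata keys `function` and `input`, and every object name (`DM_DUMP`, `my_embedding_model.onnx`, `doc_model`, and the tensor and column names). Check the LOAD_ONNX_MODEL Procedure and JSON Metadata Parameters for ONNX Models topics for the authoritative details. The snippet only builds a string and does not connect to a database.

```python
import json


def build_load_onnx_model_call(directory, file_name, model_name, metadata):
    """Return the text of a PL/SQL block that imports an ONNX model.

    Assumed (not confirmed by this section) parameter order:
    directory, file_name, model_name, metadata.
    """
    metadata_json = json.dumps(metadata)

    def quote(value):
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        "BEGIN\n"
        "  DBMS_VECTOR.LOAD_ONNX_MODEL(\n"
        f"    {quote(directory)},\n"
        f"    {quote(file_name)},\n"
        f"    {quote(model_name)},\n"
        f"    JSON({quote(metadata_json)}));\n"
        "END;"
    )


# Hypothetical names throughout. "embeddingOutput" is a metadata key named in
# this chapter; "function" and "input" are assumed keys.
metadata = {
    "function": "embedding",
    "embeddingOutput": "embedding",
    "input": {"input": ["DATA"]},
}

plsql = build_load_onnx_model_call(
    "DM_DUMP", "my_embedding_model.onnx", "doc_model", metadata
)
print(plsql)
```

Generating the metadata with `json.dumps` rather than by hand avoids malformed JSON, and the quoting helper keeps a stray apostrophe in a name from breaking the statement.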

Most of the existing Oracle Machine Learning for SQL APIs are available for ONNX models. The exceptions are the partition and cost matrix procedures listed below. Partitioning is not applicable to external pretrained models, so ONNX models do not support the following procedures:

ADD_PARTITION
DROP_PARTITION
ADD_COST_MATRIX
REMOVE_COST_MATRIX

Related Topics

Summary of DBMS_DATA_MINING Subprograms


7.1.6 Supported SQL Scoring Functions

The following table lists the scoring functions supported for in-database scoring of machine learning models imported in the ONNX format.

Machine Learning Technique	Operator	Supported	Return Type
Embedding	`VECTOR_EMBEDDING`	always	`VECTOR(<dimensions>, FLOAT32)`. The number of dimensions of the output vector of a `VECTOR_EMBEDDING` operator is defined by the embedding model.
Regression	`PREDICTION`	always	Data type of the target. For regression, the data type is converted to `BINARY_DOUBLE` SQL type.
Classification	`PREDICTION`	always	Data type of the target.
Classification	`PREDICTION_PROBABILITY`	always	`BINARY_DOUBLE`
Classification	`PREDICTION_SET`	always	Set of `(t, NUMBER, BINARY_DOUBLE)`, where `t` is the data type of the target.
Clustering	`CLUSTER_ID`	only if `clusteringProbOutput` is specified	`NUMBER`
Clustering	`CLUSTER_PROBABILITY`	only if `clusteringProbOutput` is specified	`BINARY_DOUBLE`
Clustering	`CLUSTER_SET`	only if `clusteringProbOutput` is specified	Set of `( NUMBER, BINARY_DOUBLE )`
Clustering	`CLUSTER_DISTANCE`	only if `clusteringDistanceOutput` is specified	`BINARY_DOUBLE`

Note:

You can define the outputs explicitly in the metadata or implicitly.

The metadata must explicitly specify how to find the result in the model output for some SQL scoring functions. For example, CLUSTER_PROBABILITY is supported only if clusteringProbOutput is specified in the metadata.
If a model has only one output and you do not specify it in the metadata, the system automatically assumes that output.
If you use a scoring function in a way that does not comply with the descriptions above, the scoring operation raises an ORA-40290 error. Unsupported scoring functions also raise ORA-40290.

To learn more about classification data types that are returned, see labels and classificationLabelOutput in JSON Metadata Parameters for ONNX Models.
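
The support conditions in the table can be summarized as a lookup from each operator to the metadata key it requires. The Python sketch below is a hypothetical helper, not an Oracle API: `SCORING_RULES` and `supported_operators` are illustrative names, and the lowercase function names used as dictionary keys are an assumption. It covers only outputs that are declared explicitly in the metadata and does not model the implicit single-output case described in the note.

```python
# Hypothetical helper that encodes the "Supported" column of the table above.
# None means the operator is always supported; a string names the metadata
# key that must be present.
SCORING_RULES = {
    "embedding": {"VECTOR_EMBEDDING": None},
    "regression": {"PREDICTION": None},
    "classification": {
        "PREDICTION": None,
        "PREDICTION_PROBABILITY": None,
        "PREDICTION_SET": None,
    },
    "clustering": {
        "CLUSTER_ID": "clusteringProbOutput",
        "CLUSTER_PROBABILITY": "clusteringProbOutput",
        "CLUSTER_SET": "clusteringProbOutput",
        "CLUSTER_DISTANCE": "clusteringDistanceOutput",
    },
}


def supported_operators(ml_function, metadata):
    """List the scoring operators available for a model, sorted by name.

    ml_function: one of the keys of SCORING_RULES.
    metadata: dict of explicitly declared JSON metadata parameters.
    """
    rules = SCORING_RULES[ml_function]
    return sorted(
        operator
        for operator, required_key in rules.items()
        if required_key is None or required_key in metadata
    )
```

For example, a clustering model whose metadata declares only `clusteringDistanceOutput` would, under these rules, support `CLUSTER_DISTANCE` but none of the probability-based operators.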

Cost Matrix Clause

You can specify a cost matrix directly within the PREDICTION and PREDICTION_SET scoring functions. To learn more about cost matrices, see Oracle Machine Learning for SQL Concepts.
