7.1 About ONNX

ONNX is an open-source format designed for machine learning models. It ensures cross-platform compatibility. This format also supports major languages and frameworks, facilitating efficient model exchange.

The ONNX format allows for model serialization and simplifies the exchange of models across platforms, including cloud, web, edge, and mobile experiences on Microsoft Windows, Linux, Mac, iOS, and Android. ONNX models can also be exported and imported in many languages, such as Python, C++, C#, and Java. The ONNX format is useful for compute-heavy tasks, such as training machine learning models and processing data with trained models. Many leading machine learning frameworks, such as TensorFlow, PyTorch, and scikit-learn, can convert models into the ONNX format.

Once you represent the models in the ONNX format, you can run them with the ONNX Runtime. The architecture of the ONNX Runtime is adaptable: providers can modify or enhance how some operations are implemented to make better use of particular hardware, such as Graphics Processing Units (GPUs), Single Instruction Multiple Data (SIMD) instruction sets, or specialized libraries. To learn more about ONNX Runtime, see https://onnxruntime.ai/docs/.

The ONNX Runtime integration with Oracle Database lets you import ONNX-formatted models, including embedding models. To support embedding models, Oracle Machine Learning has introduced a new machine learning technique called embedding. If you do not have a pretrained model in ONNX format, Oracle offers a Python utility package that downloads a pretrained model, converts it to ONNX format augmented with pre-processing and post-processing operations, and imports the ONNX format model to Oracle Database. To learn more about the Python utility, see Convert Pretrained Models to ONNX Format.

Oracle supports ONNX Runtime version 1.15.1.

7.1.1 Supported Machine Learning Functions for ONNX Runtime

Describes the supported machine learning functions to import pretrained models and perform scoring.

The following are the supported machine learning functions:

  • Classification
  • Clustering
  • Embedding
  • Regression

7.1.2 Supported Attribute Data Types

Discover the supported ONNX input data types mapped to SQL data types.

Data Type     SQL Type                      Supported ONNX Data Type
-----------   ---------------------------   ---------------------------------------------------------------
Numerical     BINARY_DOUBLE, NUMBER         float, int8, int16, int32, int64, uint8, uint16, uint32, uint64
Categorical   VARCHAR                       string
Text          VARCHAR2, CLOB                string
Vectors       VECTOR(float32,<dimension>)   float

The following data types are not supported:

  • complex64, complex128

  • float16, bfloat16

  • fp8

  • int4, uint4
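
The mapping in this topic can be expressed as a small lookup. The following is a minimal Python sketch, not part of any Oracle API: the function name sql_types_for and the layout of the sets are illustrative assumptions, and the type names are copied from the table and the unsupported list above.

```python
# ONNX input types from the Numerical row of the table.
NUMERIC_ONNX_TYPES = {
    "float", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
}

# ONNX input types listed as not supported.
UNSUPPORTED_ONNX_TYPES = {
    "complex64", "complex128", "float16", "bfloat16", "fp8", "int4", "uint4",
}


def sql_types_for(onnx_type):
    """Return the SQL types that the table pairs with an ONNX input type.

    Hypothetical helper for illustration only.
    """
    if onnx_type in UNSUPPORTED_ONNX_TYPES:
        raise ValueError(f"{onnx_type} is listed as not supported")
    if onnx_type in NUMERIC_ONNX_TYPES:
        sql_types = ["BINARY_DOUBLE", "NUMBER"]
        # float also appears in the Vectors row of the table.
        if onnx_type == "float":
            sql_types.append("VECTOR(float32,<dimension>)")
        return sql_types
    if onnx_type == "string":
        # Categorical row (VARCHAR) and Text row (VARCHAR2, CLOB).
        return ["VARCHAR", "VARCHAR2", "CLOB"]
    raise ValueError(f"{onnx_type} does not appear in the table")
```

A check of this kind could run before import to flag a model whose inputs use a type such as float16, assuming you have already read the input type names from the model by some other means.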

7.1.3 Supported Target Data Types

Discover the supported ONNX target data types mapped to SQL data types.

Depending on the machine learning function, different scoring functions are used. Different scoring functions for the same machine learning function can produce different data types. A few points to note:

  • Classification models have different rules to determine the type of PREDICTION function to be used. If you are using PREDICTION_PROBABILITY, then BINARY_DOUBLE is returned. See labels in JSON Metadata Parameters for ONNX Models.

  • For an embedding model, the VECTOR_EMBEDDING function returns a VECTOR type.

  • For a regression model, VARCHAR is not a valid target type and BINARY_DOUBLE is returned.

  • For a clustering model, if you are using CLUSTER_PROBABILITY or CLUSTER_DISTANCE, then BINARY_DOUBLE is returned.

To learn more, see JSON Metadata Parameters for ONNX Models.

Machine Learning Function   SQL Function             SQL Type                              Supported ONNX Target Output
-------------------------   ----------------------   -----------------------------------   ----------------------------
Regression                  PREDICTION               BINARY_DOUBLE                         regressionOutput
Classification              PREDICTION               VARCHAR2                              classificationLabelOutput
Classification              PREDICTION               NUMBER                                classificationLabelOutput
Classification              PREDICTION_PROBABILITY   BINARY_DOUBLE                         classificationProbOutput
Classification              PREDICTION_SET           set of (NUMBER, BINARY_DOUBLE)        NA
                                                     set of (target_type, BINARY_DOUBLE)
Clustering                  CLUSTER_PROBABILITY      BINARY_DOUBLE                         clusteringProbOutput
Clustering                  CLUSTER_DISTANCE         BINARY_DOUBLE                         clusteringDistanceOutput
Clustering                  CLUSTER_SET              set of (NUMBER, BINARY_DOUBLE)        NA
Embedding                   VECTOR_EMBEDDING         VECTOR(float32, n)                    embeddingOutput

7.1.4 Custom ONNX Runtime Operations

To customize a pretrained embedding model with pre-processing and post-processing operations, Oracle supports the following custom ONNX Runtime operations for version 1.15.1: tokenization as a pre-processing operation, and pooling and normalization as post-processing operations.

Oracle offers a Python utility that augments a pretrained model with tokenization, pooling, and normalization and converts the model to ONNX format. Models using any other custom operations will fail on import. For details on how to use the Python utility, see Convert Pretrained Models to ONNX Format.
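
To make the two post-processing steps concrete, the following Python sketch shows one common form of each: mean pooling over token vectors, followed by L2 normalization. This is an illustration under stated assumptions, not the Oracle implementation. This topic does not say which pooling or normalization variants the Python utility applies, and the function names mean_pool and l2_normalize are hypothetical.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of the tokens whose mask value is 1.

    Assumption: mean pooling. Other pooling variants exist.
    """
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in totals]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length.

    Assumption: L2 normalization. A zero vector is returned unchanged.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]
```

For example, pooling the token vectors [[3.0, 4.0], [3.0, 4.0], [100.0, 100.0]] with the mask [1, 1, 0] ignores the padded third token and gives [3.0, 4.0], which normalizes to approximately [0.6, 0.8].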

7.1.5 Use PL/SQL Packages to Import Models

Use the DBMS_DATA_MINING.IMPORT_ONNX_MODEL procedure or the DBMS_VECTOR.LOAD_ONNX_MODEL procedure to import ONNX format models. You can then use the imported ONNX format models through a scoring function run by the in-database ONNX Runtime.

The DBMS_DATA_MINING.RENAME_MODEL procedure is also supported.
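
The output parameter names used in this chapter (for example, embeddingOutput or clusteringProbOutput) belong to the JSON metadata described in JSON Metadata Parameters for ONNX Models. The following Python sketch assembles such a metadata document, assuming that it is supplied at import time. The helper build_metadata is hypothetical. The "function" key is an assumption that this topic does not confirm, and the output value in the usage note is a placeholder for a model's own output name.

```python
import json

# Output parameters named in this chapter, grouped by machine learning function.
OUTPUT_KEYS = {
    "regression": {"regressionOutput"},
    "classification": {"classificationLabelOutput", "classificationProbOutput"},
    "clustering": {"clusteringProbOutput", "clusteringDistanceOutput"},
    "embedding": {"embeddingOutput"},
}


def build_metadata(ml_function, outputs):
    """Return a JSON string with the function and its output mappings.

    Hypothetical helper: the "function" key is an assumption.
    """
    allowed = OUTPUT_KEYS.get(ml_function)
    if allowed is None:
        raise ValueError(f"unsupported machine learning function: {ml_function}")
    unknown = set(outputs) - allowed
    if unknown:
        raise ValueError(f"unexpected output parameters: {sorted(unknown)}")
    metadata = {"function": ml_function}
    metadata.update(outputs)
    return json.dumps(metadata)
```

Calling build_metadata("embedding", {"embeddingOutput": "embedding"}) returns a JSON string with the two keys. Check the exact parameter names and the import procedure signatures against the Oracle reference before use.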

Most of the existing Oracle Machine Learning for SQL APIs are available to the ONNX models. As partitioning is not applicable for external pretrained models, ONNX models do not support the following procedures:

  • ADD_PARTITION
  • DROP_PARTITION
  • ADD_COST_MATRIX
  • REMOVE_COST_MATRIX

7.1.6 Supported SQL Scoring Functions

The following table lists the scoring functions supported for in-database scoring of machine learning models imported in the ONNX format.

Machine Learning Technique   Operator                 Supported                                       Return Type
--------------------------   ----------------------   ---------------------------------------------   -----------------------------------------------------------------------
Embedding                    VECTOR_EMBEDDING         always                                          VECTOR(<dimensions>, FLOAT32)
Regression                   PREDICTION               always                                          Data type of the target, converted to the BINARY_DOUBLE SQL type
Classification               PREDICTION               always                                          Data type of the target
Classification               PREDICTION_PROBABILITY   always                                          BINARY_DOUBLE
Classification               PREDICTION_SET           always                                          Set of (t, NUMBER, BINARY_DOUBLE), where t is the data type of the target
Clustering                   CLUSTER_ID               only if clusteringProbOutput is specified       NUMBER
Clustering                   CLUSTER_PROBABILITY      only if clusteringProbOutput is specified       BINARY_DOUBLE
Clustering                   CLUSTER_SET              only if clusteringProbOutput is specified       Set of (NUMBER, BINARY_DOUBLE)
Clustering                   CLUSTER_DISTANCE         only if clusteringDistanceOutput is specified   BINARY_DOUBLE

The number of dimensions of the output vector of a VECTOR_EMBEDDING operator is defined by the embedding model.

Note:

You can define the outputs explicitly in the metadata or implicitly.

  • The metadata must explicitly specify how to find the result in the model output for some SQL scoring functions. For example, CLUSTER_PROBABILITY is supported only if clusteringProbOutput is specified in the metadata.

  • If a model has only one output and you do not specify it in the metadata, the system automatically assumes that output.

  • If a scoring function does not comply with the description provided, you receive an ORA-40290 error when performing the scoring operation on your data. Any unsupported scoring function also raises the ORA-40290 error.

To learn more about classification data types that are returned, see labels and classificationLabelOutput in JSON Metadata Parameters for ONNX Models.
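
The availability rules in this topic can be summarized as a small decision function. The following Python sketch is illustrative only: the function name supported_scoring_functions is hypothetical, and the sketch covers outputs named explicitly in the metadata. It does not model the case where a single model output is assumed implicitly.

```python
def supported_scoring_functions(ml_function, metadata):
    """List the SQL scoring operators available for an imported model.

    ml_function: 'embedding', 'regression', 'classification', or 'clustering'.
    metadata: dict of the output parameters specified for the model.
    """
    # Operators that this topic marks as always supported.
    always = {
        "embedding": ["VECTOR_EMBEDDING"],
        "regression": ["PREDICTION"],
        "classification": [
            "PREDICTION", "PREDICTION_PROBABILITY", "PREDICTION_SET",
        ],
    }
    if ml_function in always:
        return list(always[ml_function])
    if ml_function == "clustering":
        operators = []
        # These three operators depend on clusteringProbOutput.
        if "clusteringProbOutput" in metadata:
            operators += ["CLUSTER_ID", "CLUSTER_PROBABILITY", "CLUSTER_SET"]
        # CLUSTER_DISTANCE depends on clusteringDistanceOutput.
        if "clusteringDistanceOutput" in metadata:
            operators.append("CLUSTER_DISTANCE")
        return operators
    raise ValueError(f"unsupported machine learning function: {ml_function}")
```

For a clustering model whose metadata specifies only clusteringDistanceOutput, the sketch returns just CLUSTER_DISTANCE. Per the note above, calling any other clustering operator on such a model would raise the ORA-40290 error.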

Cost Matrix Clause

Specify a cost matrix directly within the PREDICTION and PREDICTION_SET scoring functions. To learn more about Cost Matrix, see Oracle Machine Learning for SQL Concepts.