7.1 About ONNX

ONNX is an open-source format designed for machine learning models. It provides cross-platform compatibility and supports major languages and frameworks, facilitating efficient model exchange.

The ONNX format allows for model serialization. Models are represented as graphs of common machine learning operations, and these graphs are saved in a portable format called protocol buffers. This simplifies the exchange of models across various platforms, including cloud, web, edge, and mobile experiences on Microsoft Windows, Linux, Mac, iOS, and Android. ONNX models can also be exported and imported in many languages, such as Python, C++, C#, and Java. The ONNX format is useful for compute-heavy tasks such as training machine learning models and data processing that often uses trained models. Many leading machine learning development frameworks, such as TensorFlow, PyTorch, and Scikit-learn, offer the capability to convert models into the ONNX format.

Once you represent models in the ONNX format, you can run them with the ONNX Runtime. The architecture of the ONNX Runtime is adaptable, enabling providers to modify or enhance how some operations are implemented to make better use of particular hardware, such as Graphics Processing Units (GPUs), Single Instruction Multiple Data (SIMD) instruction sets, or specialized libraries. To learn more about ONNX Runtime, see https://onnxruntime.ai/docs/.

The ONNX Runtime integration with Oracle Database allows you to import ONNX-format models, including embedding models. To support embedding models, Oracle Machine Learning has introduced a machine learning technique called embedding. You can use ONNX embedding models in Oracle Database only if they were converted using Oracle's Python utility package. The utility downloads a pretrained model, converts it to the ONNX format augmented with pre-processing and post-processing operations, and imports the resulting model into Oracle Database. For more information on the Python utility, see Convert Pretrained Models to ONNX Format.

Oracle supports ONNX Runtime version 1.20.1.

7.1.1 Initializers

Initializers are named constants stored inside ONNX models that are required during inference.

Most commonly, these are the model parameters such as weights and biases of layers; however, they can also store other fixed constants such as normalization coefficients. Machine learning operations, such as matrix multiplication, use these initializers while running operations.
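The role of initializers can be sketched in plain Python (a conceptual illustration with no ONNX dependency; the layer name and values are made up): the named constants are stored with the model, and an operation such as matrix multiplication reads them by name at inference time.

```python
# Conceptual sketch: initializers are named constants stored with the model;
# operations look them up by name during inference.
initializers = {
    "dense/weight": [[0.5, -1.0], [2.0, 0.25]],  # 2x2 weight matrix
    "dense/bias": [0.1, -0.2],                   # bias vector
}

def dense_layer(x, weights, bias):
    """MatMul followed by Add, the pattern a typical ONNX Gemm node computes."""
    out = []
    for col in range(len(weights[0])):
        acc = sum(x[row] * weights[row][col] for row in range(len(x)))
        out.append(acc + bias[col])
    return out

y = dense_layer([1.0, 2.0],
                initializers["dense/weight"],
                initializers["dense/bias"])
```

In a real ONNX graph, the weights and bias would be `TensorProto` initializers referenced by the node's input names rather than Python lists.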

7.1.2 External Initializers and In-Memory Sharing

A model that uses very large initializers can reach the 2 GB size limit. To solve this problem, ONNX can store initializers in external files, referred to as external initializers or external data.

Advanced models may use large constant tensor values (multi-dimensional arrays) whose cumulative size exceeds 2 GB, which prevents serializing these models into protocol buffers.

Embedding models, crucial for tasks like vector similarity search, have initializers that often account for 95% of their file size and may range from dozens of megabytes to several gigabytes. Until this release, the approach was to load a private copy of these initializers into each session's process memory (PGA), leading to high total memory usage for concurrent workloads.
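The mechanics of external data can be sketched in plain Python (an illustration of the concept, not Oracle's or ONNX's actual file layout; the file name and metadata fields are made up): tensors are written back to back into a binary file, and a runtime later reads only the bytes it needs at a recorded offset.

```python
import json
import os
import struct
import tempfile

# Illustrative sketch: two float32 tensors stored in one external .dat file,
# with offset/length metadata recorded so each can be read back on demand.
# File name and metadata fields are made up for this example.
tmpdir = tempfile.mkdtemp()
dat_path = os.path.join(tmpdir, "model_weights.dat")

weight = [0.5, -1.0, 2.0, 0.25]   # pretend 2x2 weight matrix
bias = [0.1, -0.2]

meta = {}
with open(dat_path, "wb") as f:
    for name, values in [("weight", weight), ("bias", bias)]:
        raw = struct.pack(f"<{len(values)}f", *values)
        meta[name] = {"offset": f.tell(), "length": len(raw), "dtype": "float32"}
        f.write(raw)

# A runtime reads only the initializer it needs, at the recorded offset.
with open(dat_path, "rb") as f:
    f.seek(meta["bias"]["offset"])
    raw = f.read(meta["bias"]["length"])
restored = list(struct.unpack(f"<{len(raw) // 4}f", raw))  # float32 round-trip
```

Because the model file itself then holds only references and offsets, its serialized size stays small regardless of how large the tensor data grows.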

In-Memory Sharing

Oracle's ONNX integration supports the sharing and in-memory population of external initializers, enabling the same model’s initializers to be loaded once into global memory and accessed by all database sessions needing that model. This improves memory efficiency and scalability.

Only ONNX models imported with external initializers are eligible for in-memory population. When enabled, the model's initializers are loaded into a shared global area for use by all qualifying processes.
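The operating-system mechanism this relies on can be illustrated with Python's standard-library shared memory (a conceptual analogy only, not Oracle's implementation): the data is written once into a named shared segment, and another process attaches to it by name instead of making a private copy.

```python
from multiprocessing import shared_memory

# Conceptual analogy: one copy of the "initializers" lives in a shared
# segment; other sessions attach by name instead of copying the data.
payload = b"pretend these bytes are a model's initializers"

# "Population": create the segment once and write the data into it.
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload

# "Another session": attach to the same segment by name, no second copy.
other = shared_memory.SharedMemory(name=shm.name)
seen = bytes(other.buf[:len(payload)])

other.close()
shm.close()
shm.unlink()  # release the shared segment once no one needs it
```

With per-session private copies, memory use grows linearly with the number of concurrent sessions; with a shared segment it stays roughly constant.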

See Enable and Use External Initializers for information on how to enable and use shared external initializers.

7.1.3 Enable and Use External Initializers

Use shared external initializers to reduce memory usage by loading large model data once into global memory. You can verify eligibility, enable sharing, monitor usage, and disable sharing when it is no longer needed.

Before You Begin

Models must use external initializers to qualify for sharing. Verify model compatibility before proceeding with the next steps. Administrators and data scientists must ensure that their model import or conversion pipelines are designed accordingly, leveraging Oracle's OML4Py utilities or their own frameworks to export ONNX models with external initializers and associated metadata. See Support For Large ONNX Format Model Support for details on how to create models with external data.

Follow the steps to verify, enable, and use external initializers:

  1. Check that the ONNX model uses external initializers:
    SELECT model_name, external_data FROM all_mining_models;

    Only models with EXTERNAL_DATA=YES can be enabled for in-memory sharing.

  2. Enable in-memory sharing.
    EXECUTE DBMS_DATA_MINING.INMEMORY_ONNX_MODEL('your_model_name');

    This loads the initializers into shared memory. The INMEMORY status on the model view is set to YES.

  3. Monitor the model usage and memory.
    SELECT * FROM V$IM_ONNX_MODEL WHERE NAME = 'your_model_name';

    Shows metadata, population status, initializer sizes, and pin count. Inference in a session increments PIN_COUNT, indicating active use of the shared memory; a high pin count means that many processes are sharing the model. When a session stops using the model for a while, the pin count is decremented.

  4. Use V$IM_ONNX_SEGMENT for deeper insight when troubleshooting.
    SELECT * FROM V$IM_ONNX_SEGMENT WHERE NAME = 'your_model_name';
  5. Disable and release the memory.
    EXECUTE DBMS_DATA_MINING.INMEMORY_ONNX_MODEL('your_model_name', enable => FALSE);

    Note:

    The system releases the memory only after all sessions remove their pin.
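The pin-count behavior in steps 3 and 5 amounts to reference counting, which can be sketched as follows (a conceptual model only, not Oracle's implementation; the class and method names are made up):

```python
# Conceptual sketch of pin-count bookkeeping: shared memory is released
# only after the last pinning session lets go.
class SharedModel:
    def __init__(self, name):
        self.name = name
        self.pin_count = 0
        self.in_memory = True

    def pin(self):            # a session starts running inference
        self.pin_count += 1

    def unpin(self):          # a session stops using the model
        self.pin_count -= 1

    def disable(self):
        """Request release; memory is freed only when no pins remain."""
        if self.pin_count == 0:
            self.in_memory = False
        return self.in_memory  # True means still held by active sessions

model = SharedModel("your_model_name")
model.pin(); model.pin()          # two sessions run inference
still_held = model.disable()      # release requested, but pins remain
model.unpin(); model.unpin()      # both sessions finish
released = not model.disable()    # last pin gone, memory can be freed
```

This is why the disable call in step 5 may not free memory immediately: sessions still running inference hold pins, and the memory is reclaimed only once the count reaches zero.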

7.1.4 Generate External Initializers with OML4Py

You can generate ONNX models with external initializers and their metadata using OML4Py.

You can import those ONNX models into your database. OML4Py also creates a metadata file that describes each initializer, making it easy for the ONNX runtime or Oracle AI Database to work with your model.

OML4Py enables importing or exporting models with external initializers using the following structure:

  • A .onnx file that holds the ONNX model referencing external data.
  • One or more .dat files containing raw tensor data values.
  • A single .json file describing the metadata of the external initializers. This JSON file includes the name, shape, offset, type, and size for each initializer.
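The three-file layout above might be sketched as follows (the metadata fields — name, shape, offset, type, and size — come from the description above; the file names and exact JSON structure are assumptions for illustration, not OML4Py's actual output):

```python
import json
import struct

# Illustrative sketch: write raw tensor data to a .dat file and describe
# each initializer in a companion .json metadata file. File names and the
# JSON structure are assumptions, not OML4Py's actual output format.
initializers = {
    "encoder.weight": {"shape": [2, 3], "values": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
    "encoder.bias": {"shape": [3], "values": [0.0, 1.0, -1.0]},
}

metadata = []
with open("model_data.dat", "wb") as dat:
    for name, tensor in initializers.items():
        raw = struct.pack(f"<{len(tensor['values'])}f", *tensor["values"])
        metadata.append({
            "name": name,
            "shape": tensor["shape"],
            "offset": dat.tell(),   # where this tensor starts in the .dat file
            "type": "float32",
            "size": len(raw),       # length of the raw data in bytes
        })
        dat.write(raw)

with open("model_data.json", "w") as meta:
    json.dump({"external_data": metadata}, meta, indent=2)
```

The `.onnx` file itself would then reference these tensors by name, letting the runtime locate each one in the `.dat` file via the recorded offset and size.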
