11.9 Support for Large ONNX Format Models
OML4Py supports large ONNX format models outside of the database. A large model is divided into a single ONNX file and one or more external data files. The maximum size of a single ONNX file is 2 GB, so models larger than 2 GB must be split into smaller parts.
You can leverage ONNXPipeline to use external data. External data can be applied in two ways:
- If the model size is less than 1 GB: manually activate the use of external data by setting the parameter use_external_data to True. This works with both preconfigured and non-configured models.
  - By using a preconfigured model:

    ```python
    from oml.utils import ONNXPipeline, ONNXPipelineConfig

    config = ONNXPipelineConfig("Snowflake/snowflake-arctic-embed-s")
    config.use_external_data = True
    pipeline = ONNXPipeline("Snowflake/snowflake-arctic-embed-s", config)
    ```

  - If the model is not configured, you can use templates:

    ```python
    from oml.utils import ONNXPipeline, ONNXPipelineConfig

    config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
    config.use_external_data = True
    pipeline = ONNXPipeline("Snowflake/snowflake-arctic-embed-s", config)
    ```
- If the model size exceeds 1 GB: loading a model larger than 1 GB with quantization disabled defaults to using external data. For models larger than 2 GB with quantization enabled (quantize_model=True), an error is raised instead of activating external data. The following example, which uses the large multilingual-e5 model, defaults to using external data:

  ```python
  from oml.utils import ONNXPipeline, ONNXPipelineConfig

  config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
  pipeline = ONNXPipeline("intfloat/multilingual-e5-large", config)
  pipeline.export2db("multilingual")
  ```
Note: You can always manually activate the use of external data by setting the parameter use_external_data=True for models of any size.
The export2file function will create a zip file instead of a single ONNX file. This zip file will contain:
- An ONNX file: This file corresponds to the model graph.
- One or more .data files: These files contain the tensors and any other external data associated with the model.
- A JSON file: This file records the location of each tensor within the .data files.

For example:
- large_model.onnx
- large_model_data_0.data, large_model_data_1.data
- large_model_external_data.json
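The layout above can be inspected with the standard library alone. In this sketch the member names follow the example file names from the text, but the archive is a stand-in built on the spot and the summarize_export helper is hypothetical, not part of OML4Py:

```python
# Illustrative only: group the members of an export2file-style zip by role.
import json
import zipfile

def summarize_export(zip_path):
    """Return the archive's members grouped as model graph, tensor data, index."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    return {
        "model": [n for n in names if n.endswith(".onnx")],
        "data": sorted(n for n in names if n.endswith(".data")),
        "index": [n for n in names if n.endswith(".json")],
    }

# Build a stand-in archive with the documented layout (contents are dummies).
with zipfile.ZipFile("large_model.zip", "w") as zf:
    zf.writestr("large_model.onnx", b"model graph bytes")
    zf.writestr("large_model_data_0.data", b"tensor bytes")
    zf.writestr("large_model_data_1.data", b"tensor bytes")
    zf.writestr("large_model_external_data.json", json.dumps({}))

summary = summarize_export("large_model.zip")
```

Here summary maps "model" to the single .onnx member, "data" to the two .data members, and "index" to the JSON file, mirroring the three roles listed above.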
Model size support in ONNX pipeline API:
- Less than 1 GB: Supported with optimal performance.
- 1 GB to 2 GB: If quantization is enabled, full support is provided; otherwise, the model is exported using external initializers.
- 0.4 GB to 2 GB (without quantization): A warning is displayed, recommending quantization to improve performance.
- Greater than 2 GB: Supported only with quantization disabled, using external data; with quantization enabled, 2 GB is the maximum supported size.
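The support matrix above can be condensed into a small decision function. The export_strategy helper below is hypothetical, written only to restate the documented thresholds; it is not an OML4Py API:

```python
# Hypothetical helper encoding the documented size-support rules (sizes in GB).
import warnings

def export_strategy(size_gb: float, quantize_model: bool) -> str:
    """Return how a model of the given size is exported by the pipeline."""
    if quantize_model:
        if size_gb > 2:
            # Quantization is limited to models of at most 2 GB.
            raise ValueError("quantization supports models up to 2 GB")
        return "single_file"  # full support with quantization
    if 0.4 <= size_gb <= 2:
        # Documented advisory for unquantized models in this range.
        warnings.warn("consider enabling quantization for better performance")
    if size_gb > 1:
        return "external_data"  # defaults to external data above 1 GB
    return "single_file"  # under 1 GB: external data only if manually enabled
```

For instance, an unquantized 1.5 GB model falls into the external-data path, while a 3 GB model with quantization enabled is rejected outright.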
Parent topic: Import Pretrained Models in ONNX Format