11.9 Support for Large ONNX Format Models

OML4Py supports large ONNX format models outside of the database. The ONNX format limits a single file to 2 GB, so a model larger than 2 GB cannot be stored in one ONNX file. Instead, a large model is divided into a single ONNX file and one or more external data files.

You can use ONNXPipeline with external data. External data can be enabled in two ways:
  • If the model size is less than 1 GB: Manually activate the use of external data by setting the parameter use_external_data to True. This setting works with both preconfigured models and non-configured models.
    • By using a preconfigured model.
      from oml.utils import ONNXPipeline, ONNXPipelineConfig
      config = ONNXPipelineConfig("Snowflake/snowflake-arctic-embed-s")
      config.use_external_data = True
      pipeline = ONNXPipeline("Snowflake/snowflake-arctic-embed-s", config)
    • If the model is not configured, you can use templates.
      from oml.utils import ONNXPipeline, ONNXPipelineConfig
      config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
      config.use_external_data = True
      pipeline = ONNXPipeline("Snowflake/snowflake-arctic-embed-s", config)
  • If the model size exceeds 1 GB: Loading a model larger than 1 GB without quantization enabled defaults to using external data. Loading a model larger than 2 GB with quantization enabled (quantize_model=True) results in an error instead of activating external data. The following example, which uses a large multilingual-e5 model, defaults to using external data:
    from oml.utils import ONNXPipeline, ONNXPipelineConfig
    config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
    pipeline = ONNXPipeline("intfloat/multilingual-e5-large", config)
    pipeline.export2db("multilingual")

Note:

You can always manually activate the use of external data by setting the parameter use_external_data=True for models of any size.

For a large model, the export2file function creates a zip file instead of a single ONNX file. This zip file contains:

  • An ONNX file: This file corresponds to the model graph.
  • One or more .data files: These files contain the tensors and any other external data associated with the model.
  • A JSON file: This file records the location of each tensor within the .data files.
An example of such a zip file would be as follows:
  • large_model.onnx
  • large_model_data_0.data, large_model_data_1.data
  • large_model_external_data.json
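The following sketch shows an export of a large model to a file and an inspection of the resulting archive. It assumes that export2file accepts an export name and an output_dir argument, as in other OML4Py examples, and that the archive is named after the export name; adjust for your environment.

    from oml.utils import ONNXPipeline, ONNXPipelineConfig
    import zipfile

    # Build the pipeline for a model larger than 1 GB; with quantization
    # not enabled, external data is used by default.
    config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
    pipeline = ONNXPipeline("intfloat/multilingual-e5-large", config)

    # Export to a file. For a large model this produces a zip archive
    # (assumed here to be named large_model.zip) rather than a single
    # large_model.onnx file.
    pipeline.export2file("large_model", output_dir=".")

    # List the archive contents with the standard-library zipfile module.
    with zipfile.ZipFile("large_model.zip") as zf:
        for name in zf.namelist():
            print(name)
    # Expected entries, per the example above:
    #   large_model.onnx
    #   large_model_data_0.data, large_model_data_1.data
    #   large_model_external_data.json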
Model size support in the ONNX pipeline API:
  • Less than 1 GB: Supported with optimal performance.
  • 1 GB to 2 GB: If quantization is enabled, full support is provided; otherwise, the model is exported using external initializers.
  • 0.4 GB to 2 GB (without quantization): A warning is displayed, recommending quantization to improve performance.
  • Greater than 2 GB: Supported only with quantization disabled, using external data; with quantization enabled, the maximum supported model size is 2 GB.
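For a model in the 1 GB to 2 GB range, the following sketch shows how quantization might be enabled to get full support and avoid the performance warning. The model name is illustrative, and quantize_model is assumed to be settable on the config object in the same way as use_external_data:

    from oml.utils import ONNXPipeline, ONNXPipelineConfig

    # Illustrative model name; substitute any model between 1 GB and 2 GB.
    config = ONNXPipelineConfig.from_template("text", max_seq_length=512)
    # quantize_model is assumed to be a config attribute, like use_external_data.
    config.quantize_model = True
    pipeline = ONNXPipeline("intfloat/e5-large-v2", config)
    pipeline.export2db("e5_large_quantized")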