Install with HTTP with Models and Advanced Options

This tutorial is a superset of the HTTP with Configuration File tutorial: in addition to that configuration, it shows how to define vector embedding models that do not ship with the container. For a more advanced configuration, you can optionally specify the HTTP port, container version, and/or container name.

Neither an API key nor SSL is configured or used in this tutorial.

The default configuration is used with HTTP port 8080.

Choose Your ONNX Pipeline Models

Many embedding models can be used. In addition to the models shipped with the container, a more extensive list of embedding models known to work with the container can be found in Available Embedding Models.

Once you have chosen your desired ONNX Pipeline model and built it with Oracle Machine Learning Client 2.1 (if you chose a model other than the pre-built options), copy that ONNX file into a directory on the host machine where the container will run.

You must also have a JSON configuration file that lists the desired ONNX Pipeline models to be used in the container.

In this tutorial, the ONNX models and the configuration file are copied into their own directories, and the container runs as a least-privileged container user.

  1. Set up the JSON configuration file.

    In this example, your desired ONNX Pipeline models are in /home/opc/models and your config file is /home/opc/config/config.json. The contents of the example config.json file are as follows:

    {
        "environment": {
            "PRIVATE_AI_LOG_LEVEL": "INFO"
        },
        "ratelimiter": {
            "service_requests_per_min": 3000,
            "monitor_requests_per_min": 60
        },
        "models": [
            {
                "modelname": "tinybert",
                "modelfile": "tinybert.onnx",
                "modelfunction": "EMBEDDING",
                "cache_on_startup": true
            },
            {
                "modelname": "snowflake-arctic-embed-s",
                "modelfile": "snowflake-arctic-embed-s.onnx",
                "modelfunction": "EMBEDDING",
                "cache_on_startup": true
            },
            {
                "modelname": "all-MiniLM-L12-v2",
                "modelfile": "all-MiniLM-L12-v2.onnx",
                "modelfunction": "EMBEDDING",
                "cache_on_startup": true
            }
        ]
    }

    In this configuration file, the modelfile (the ONNX filename) and modelname of the two new ONNX Pipeline embedding models are defined. The modelname does not need to match the ONNX filename, but keeping the two the same makes the models easier to manage when you have many of them. The modelname is an identifier and cannot have a file extension such as .onnx or .zip.

    An existing model that ships with the container is also defined. If no config.json file is used, all models that ship with the container are loaded. If a config.json file is used, only the models that are explicitly defined in the file are loaded. Minimizing the number of explicitly defined models reduces the memory required by the container, as each model needs to be loaded into memory.
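    Before starting the container, you can sanity-check the configuration file yourself. The following Python sketch is illustrative only (it is not part of the product tooling); the field names are taken from the example config.json above, and it verifies that each modelname carries no file extension and that each modelfile is an .onnx file:

```python
import json
import os

def validate_config(text):
    """Basic sanity checks for the example config.json structure above.

    Returns a list of problem descriptions; an empty list means all
    checks passed. Illustrative helper only, not product tooling.
    """
    problems = []
    cfg = json.loads(text)
    for model in cfg.get("models", []):
        name = model.get("modelname", "")
        # modelname is an identifier: it must not carry a file extension
        if os.path.splitext(name)[1] in (".onnx", ".zip"):
            problems.append(f"modelname {name!r} must not have a file extension")
        if not model.get("modelfile", "").endswith(".onnx"):
            problems.append(f"model {name!r}: modelfile should be an .onnx file")
        if model.get("modelfunction") != "EMBEDDING":
            problems.append(f"model {name!r}: unexpected modelfunction")
    return problems

example = """
{
    "models": [
        {"modelname": "tinybert", "modelfile": "tinybert.onnx",
         "modelfunction": "EMBEDDING", "cache_on_startup": true}
    ]
}
"""
print(validate_config(example))  # -> []
```

    Running the same checks against a modelname such as "tinybert.onnx" would report the extension problem before the container ever rejects the file.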

  2. Configure and start the container with the three models provided in the configuration file.
    mkdir /home/opc/privateai
    mkdir /home/opc/models
    mkdir /home/opc/config
    export PRIVATE_DIR=/home/opc/privateai
    ./configSetup.sh -d $PRIVATE_DIR -m /home/opc/models -c /home/opc/config/config.json
    ./containerSetup.sh -d $PRIVATE_DIR --http
    podman ps

    The time it takes to start the container depends on the number and size of ONNX Pipeline models that are in your models directory. It may take minutes if you have many ONNX files.
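    Because startup can take minutes, it may be more convenient to wait for the container programmatically than to re-run podman ps by hand. A minimal sketch, assuming only that some health URL answers with HTTP 200 once the container is up (the exact health endpoint path below is an assumption; adjust it to what your container version exposes):

```python
import time
import urllib.request

def wait_until_ready(probe, timeout_s=300, interval_s=5):
    """Poll probe() until it returns True or timeout_s elapses.

    probe is any callable returning True once the service is up.
    Returns True if the service became ready within the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if probe():
                return True
        except OSError:
            pass  # container not listening yet; keep polling
        time.sleep(interval_s)
    return False

def http_probe(url="http://localhost:8080/v1/health"):
    # The health URL here is an assumption for illustration;
    # verify the actual path against your container's documentation.
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200
```

    Calling wait_until_ready(http_probe) would then block until the container answers or five minutes elapse.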

  3. Optionally specify the container version.

    If multiple versions of the container are available, the latest version will be run by default. If you want to run an older version, you can specify the desired version, for example 25.1.2.0.0.

    export PRIVATE_DIR=/home/opc/privateai
    ./configSetup.sh -d $PRIVATE_DIR -m /home/opc/models -c /home/opc/config/config.json 
    ./containerSetup.sh -d $PRIVATE_DIR --http -v 25.1.2.0.0
  4. Optionally name the container instance.

    You can stop a container by using either its container ID or its name. For example, podman stop privateai.

    If you would like to give the container instance a specific name, use the -n parameter. In this example, container5 is used:

    export PRIVATE_DIR=/home/opc/privateai
    ./configSetup.sh -d $PRIVATE_DIR -m /home/opc/models -c /home/opc/config/config.json 
    ./containerSetup.sh -d $PRIVATE_DIR --http -n container5
  5. Combine previous steps to specify multiple parameters.

    First, stop the running container:

    podman stop container5

    The commands in this step specify the port, container version, and container name:

    export PRIVATE_DIR=/home/opc/privateai
    ./configSetup.sh -d $PRIVATE_DIR -m /home/opc/models -c /home/opc/config/config.json 
    ./containerSetup.sh -d $PRIVATE_DIR --http -p 9000 -v 25.1.2.0.0  -n container5
  6. The curl commands to create vectors and to check the /health, /models, and /metrics endpoints are the same here as in the installation using HTTP with a configuration file.

    The difference is that you will now see the three models that you configured at the /models endpoint.

    Note:

    HTTP port 9000 is used rather than 8080 because the container was configured with -p 9000 in the previous step.
    curl http://localhost:9000/v1/models
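    If you prefer to inspect the model list from a script, the sketch below assumes the /v1/models response follows the OpenAI-compatible convention of a "data" array of objects with an "id" field; verify that shape against your container's actual output:

```python
import json

def model_names(models_response_text):
    """Extract model identifiers from a /v1/models response body.

    Assumes an OpenAI-style payload: {"data": [{"id": ...}, ...]}.
    Check this against the JSON your container actually returns.
    """
    payload = json.loads(models_response_text)
    return [entry["id"] for entry in payload.get("data", [])]

# Hypothetical response body for the three configured models:
sample = ('{"data": [{"id": "tinybert"},'
          ' {"id": "snowflake-arctic-embed-s"},'
          ' {"id": "all-MiniLM-L12-v2"}]}')
print(model_names(sample))
# -> ['tinybert', 'snowflake-arctic-embed-s', 'all-MiniLM-L12-v2']
```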
  7. Specify the embedding model.

    This example vectorizes two input strings with the snowflake-arctic-embed-s embedding model:

    curl -X POST -H "Content-Type: application/json" -d '{"model": "snowflake-arctic-embed-s", "input":["A siamese cat","Standard Poodle running"]}' http://localhost:9000/v1/embeddings

    This example vectorizes a single input string with the tinybert embedding model:

    curl -X POST -H "Content-Type: application/json" -d '{"model": "tinybert", "input":["Standard Poodle running"]}' http://localhost:9000/v1/embeddings
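    To use the returned vectors, for example to compare the two inputs from the first call, you can parse the response and compute cosine similarity. The response shape below (an OpenAI-style "data" array whose entries carry an "embedding" list) and the three-dimensional sample vectors are assumptions for illustration; confirm the shape against your container's actual output:

```python
import json
import math

def embeddings_from_response(text):
    """Pull embedding vectors out of a /v1/embeddings response body.

    Assumes an OpenAI-style payload: {"data": [{"embedding": [...]}, ...]}.
    """
    payload = json.loads(text)
    return [entry["embedding"] for entry in payload["data"]]

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, heavily truncated response for the two inputs:
sample = '{"data": [{"embedding": [0.1, 0.2, 0.3]}, {"embedding": [0.2, 0.1, 0.4]}]}'
vecs = embeddings_from_response(sample)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # -> 0.933
```

    Real embeddings from these models have hundreds of dimensions, but the arithmetic is identical.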