Install with HTTP and Default Models

This is the simplest configuration: no API key and no SSL are used, and the default embedding models are served over HTTP on port 8080.

  1. Define the directory where you want the Private AI Services Container to be installed. A least-privileged Linux user will own the subdirectories that the container uses.

    Note that the --http parameter is written with two dashes.

    cd setup
    mkdir /home/opc/privateai
    export PRIVATE_DIR=/home/opc/privateai
    ./configSetup.sh -d $PRIVATE_DIR
    ./containerSetup.sh -d $PRIVATE_DIR --http
  2. Verify that the container is running.
    podman ps

    After the container starts, it takes a few seconds to load the default embedding models. You can verify that the container is ready by using the /health endpoint:

    curl -i http://localhost:8080/health
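Rather than re-running curl by hand, a short script can poll /health until the models have loaded. The following is a minimal Python sketch using only the standard library; the function name, retry interval, and timeout are our own choices, not part of the product:

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll the /health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container not ready yet; keep polling
        time.sleep(interval)
    return False
```

For example, wait_for_health("http://localhost:8080/health") returns True as soon as the container answers with HTTP 200, and False if it never becomes ready within the timeout.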
  3. Now that the container is ready, determine which vector embedding models are available without any explicit model configuration:
    curl http://localhost:8080/v1/models

    The default models are:
    • clip-vit-base-patch32-txt
    • clip-vit-base-patch32-img
    • all-mpnet-base-v2
    • all-MiniLM-L12-v2
    • multilingual-e5-base
    • multilingual-e5-large
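Because /v1/models follows the OpenAI list format, each entry in the response carries an id field with the model name. A minimal Python sketch of extracting those names; the sample response below is illustrative, not captured from a live container:

```python
import json


def model_ids(models_response: dict) -> list:
    """Extract model names from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


# Illustrative response in the OpenAI list format (not captured from a live container).
sample = json.loads("""
{"object": "list",
 "data": [{"id": "all-MiniLM-L12-v2", "object": "model"},
          {"id": "multilingual-e5-base", "object": "model"}]}
""")
print(model_ids(sample))  # ['all-MiniLM-L12-v2', 'multilingual-e5-base']
```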
  4. You can now create text vector embeddings, explicitly specifying the embedding model to use. For example:
    • Using multilingual-e5-base:

      curl -X POST -H "Content-Type: application/json" -d '{"model": "multilingual-e5-base", "input":["This is a phrase to vectorize"]}' http://localhost:8080/v1/embeddings
    • Using clip-vit-base-patch32-txt:

      curl -X POST -H "Content-Type: application/json" -d '{"model": "clip-vit-base-patch32-txt", "input":["This is a phrase to vectorize"]}' http://localhost:8080/v1/embeddings

    The /v1/embeddings and /v1/models endpoints are examples of the REST endpoints defined by the OpenAI API. The container implements these OpenAI API REST endpoints without requiring an OpenAI account or a connection to the public internet.
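    The same request can be issued programmatically. A sketch in Python using only the standard library, assuming the OpenAI-style response shape in which each vector is returned under data[i].embedding; the helper names are ours:

```python
import json
import urllib.request


def extract_vectors(response: dict) -> list:
    """Pull the vectors out of an OpenAI-style /v1/embeddings response."""
    return [item["embedding"] for item in response["data"]]


def embed(texts: list, model: str,
          base_url: str = "http://localhost:8080") -> list:
    """POST to /v1/embeddings and return one vector per input string."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return extract_vectors(body)
```

    For example, embed(["This is a phrase to vectorize"], "multilingual-e5-base") mirrors the curl command above and returns a list containing one embedding vector.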

  5. You can gather performance metrics using the /metrics endpoint.
    curl http://localhost:8080/metrics/embeddings_call_latency
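    The metrics endpoint can be scripted the same way. Since the exact response format is not shown above, this Python sketch makes no assumption about it and simply fetches the named metric and returns the raw body as text; the function name is ours:

```python
import urllib.request


def fetch_metric(name: str, base_url: str = "http://localhost:8080") -> str:
    """GET /metrics/<name> and return the raw response body as text."""
    with urllib.request.urlopen(f"{base_url}/metrics/{name}") as resp:
        return resp.read().decode("utf-8")
```

    For example, fetch_metric("embeddings_call_latency") retrieves the same data as the curl command above.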

After following these steps, you now know how to do the following with HTTP:

  • List the available embedding models
  • Create vectors based on an embedding model
  • Check the health of the container with the /health endpoint
  • Check the performance metrics with the /metrics endpoint