2 Install the Private AI Services Container
Use the included steps and bash scripts to configure and install the container following best practices.
The Oracle Private AI Services Container is AI infrastructure designed to run on-premises and, optionally, in air-gapped environments. With the vector embedding service, the container creates low-latency vector embeddings free of charge, securely and within the privacy of your own environment. With the vector index service, the container securely offloads the computational cost of vector index creation to an NVIDIA GPU.
The container uses TLS 1.3 and API Keys for security. A PKCS12 keystore is used by default, but JKS is also supported. The container can also work with Security Enhanced Linux (SELinux) in enforcing mode.
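Because the container uses TLS 1.3 with a PKCS12 keystore by default, you typically need to prepare a certificate and keystore before starting it. The sketch below shows one way to do this with OpenSSL; the hostname, filenames, alias, and password are placeholders, and production deployments should use CA-signed certificates rather than self-signed ones.

```shell
# Illustrative only: create a self-signed certificate and bundle it into a
# PKCS12 keystore. All names and the password below are placeholders.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout server.key -out server.crt -days 365 \
  -subj "/CN=privateaivm"
openssl pkcs12 -export -in server.crt -inkey server.key \
  -out keystore.p12 -name privateai -passout pass:changeit
ls -l keystore.p12
```

A JKS keystore can be produced from the same certificate material if your deployment requires it.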
The included scripts, provided for HTTP and HTTP/SSL, offer examples of how to use curl to create vector embeddings, list the loaded embedding models, check on the health of the container, and produce runtime metrics.
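As a hedged illustration of the kind of curl call those scripts make, the sketch below probes a health endpoint over HTTPS. The port, endpoint path, and API-key header are assumptions for illustration only; take the real values from the scripts shipped with the container.

```shell
# Hypothetical values: port, path, and header are placeholders, not the
# container's documented API. Replace them with values from the included scripts.
HOST="https://privateaivm:8443"
API_KEY="your-api-key"
# -k skips certificate verification; acceptable only for self-signed test certs.
curl -sk --connect-timeout 5 \
  -H "Authorization: Bearer ${API_KEY}" \
  "${HOST}/health" || echo "container not reachable at ${HOST}"
```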
Prerequisites
The Oracle Private AI Services Container uses Oracle Linux 8 within the container and works with the following host Linux x86_64 distributions:
- Oracle Linux 8.6+
- Oracle Linux 9
- Oracle Linux 10
The AI Services Container uses TLS 1.3. This means that you need a version of OpenSSL that supports TLS 1.3 to create certificates, for example:
- OpenSSL 1.1.1k+
- OpenSSL 3.0+
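A quick way to confirm the host OpenSSL can create TLS 1.3 certificates is to inspect its version string. This is a minimal sketch; the documented minimums are 1.1.1k+ and 3.0+, and the pattern below only checks the major release lines.

```shell
# Check the installed OpenSSL release line (docs require 1.1.1k+ or 3.0+).
ver=$(openssl version 2>/dev/null | awk '{print $2}')
case "$ver" in
  1.1.1*|3.*) echo "OpenSSL $ver: TLS 1.3 supported" ;;
  "")         echo "OpenSSL not found on this host" ;;
  *)          echo "OpenSSL $ver: upgrade to 1.1.1k+ or 3.0+ for TLS 1.3" ;;
esac
```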
The AI Services Container can be used with the following software, at the listed minimum versions:
- To use the vector embedding service:
  - Podman 3.4.4+, 4.4+, or 5+
  - Kubernetes 1.31.1+
  - Red Hat OpenShift 4.19+
- To use the vector index service:
  - Podman 3.4.4+, 4.4+, or 5+
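Before installing, you can confirm that the host's Podman meets the documented minimum. This is a minimal check, not part of the product's own scripts:

```shell
# Print the installed Podman version (docs require 3.4.4+, 4.4+, or 5+).
if command -v podman >/dev/null 2>&1; then
  podman --version
else
  echo "podman is not installed on this host"
fi
```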
The included examples use Podman with Oracle Linux 8 as the host operating system on OCI. The example OCI VM where the container is installed is called privateaivm.
- Vector Embedding Service:
  - Memory: You must have at least 16 GB of free memory to effectively use the container to create vector embeddings. If you want to use large embedding models or create multiple different types of vectors at the same time, then you need more memory. Insufficient memory can result in reduced performance.
  - CPU cores: Although you can use a single CPU core with the ONNX Runtime to create vector embeddings, more CPU cores improve performance. Multiple CPU cores enable either a single vector or many different vectors to be created at the same time using multi-threading.
  - Disk space: You need at least 22 GB of disk space to effectively use the container. You need additional disk space for each additional embedding model that you use, and more again depending on the log level for long-running containers.
- Vector Index Service:
  - Memory: At least 16 GB of free memory is required to create vector indexes, and Oracle recommends at least twice as much CPU memory as GPU memory (VRAM). Insufficient memory causes index creation to fail.
  - CPU cores: Oracle recommends a minimum of 4 CPU cores for vector index creation. Beyond 16 CPU cores, the performance benefits of additional cores diminish.
  - Disk space: Disk space is only needed for logging when using the vector index service. A minimum of 22 GB of disk space is suggested. More may be needed depending on the log level for long-running containers.
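The resource minimums above can be checked with standard Linux tools. The following sketch reports CPU cores, available memory, and free disk on the root filesystem; adjust the mount point to wherever the container stores its data.

```shell
# Report host resources against the documented minimums
# (16 GB free memory, 4+ CPU cores, 22 GB free disk).
echo "CPU cores:        $(nproc)"
echo "Free memory (MB): $(free -m | awk '/^Mem:/ {print $7}')"
echo "Free disk on /:   $(df -h / | awk 'NR==2 {print $4}')"
```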
Note that the container should not run on the same machine as the Oracle AI Database Server, but on a Linux machine that is close to the database server. The container is designed to offload resource-intensive tasks, such as creating vectors or building vector indexes, from the Oracle AI Database to other machines. This enables the Oracle AI Database to maintain low latency and high throughput without being burdened with these resource-intensive AI infrastructure tasks.
The vector index service of the Private AI Services Container has the following additional prerequisites:
- NVIDIA GPUs only: Turing architecture and above. To see a current list of NVIDIA GPUs along with their compute capability, see https://developer.nvidia.com/cuda/gpus.
- NVIDIA driver version 580.65.06+
- Compute capability 7.5+
- The NVIDIA Container Toolkit must be installed on the host to expose GPUs inside the container. Using the latest version of the Toolkit is suggested, for example 1.19.0 or later. Follow the steps provided in the NVIDIA installation guide and the CDI integration support documentation for Podman.
Use nvidia-smi to ensure your GPU resources meet the listed prerequisites and nvidia-ctk to verify that CDI is working correctly. For example, the latter can be confirmed using the following command:
nvidia-ctk --debug cdi list
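The checks above can be combined into a short guarded script. Assuming both tools are on the host's PATH, it prints the GPU name, driver version, and compute capability, then lists the CDI devices:

```shell
# Confirm the GPU meets the prerequisites (driver 580.65.06+, compute 7.5+).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
else
  echo "nvidia-smi not found: install the NVIDIA driver first"
fi
# Confirm the CDI specification is visible to the container runtime.
if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --debug cdi list
else
  echo "nvidia-ctk not found: install the NVIDIA Container Toolkit"
fi
```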