2 Install the Private AI Services Container
Use the included steps and bash scripts to configure and install the container following best practices.
The Oracle Private AI Services Container is AI infrastructure designed to run on-premises and, optionally, in air-gapped environments. With the vector embedding service, the container creates low-latency vector embeddings free of charge, securely and within the privacy of your own environment. With the vector index service, the container securely offloads the computational cost of vector index creation to an NVIDIA GPU.
The container uses TLS 1.3 and API Keys for security. A PKCS12 keystore is used by default, but JKS is also supported. The container can also work with Security Enhanced Linux (SELinux) in enforcing mode.
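Because the container uses TLS 1.3 with a PKCS12 keystore by default, you typically need to prepare a certificate and keystore before starting it. The sketch below shows one way to do this with OpenSSL; the hostname, filenames, alias, and password are placeholders, and production deployments should use CA-signed certificates rather than self-signed ones.

```shell
# Illustrative only: create a self-signed certificate and bundle it into a
# PKCS12 keystore. All names and the password below are placeholders.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout server.key -out server.crt -days 365 \
  -subj "/CN=privateaivm"
openssl pkcs12 -export -in server.crt -inkey server.key \
  -out keystore.p12 -name privateai -passout pass:changeit
ls -l keystore.p12
```

A JKS keystore can be produced from the same certificate material if your deployment requires it.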
The included scripts, provided for HTTP and HTTP/SSL, offer examples of how to use curl to create vector embeddings, list the loaded embedding models, check on the health of the container, and produce runtime metrics.
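As a hedged illustration of the kind of curl call those scripts make, the sketch below probes a health endpoint over HTTPS. The port, endpoint path, and API-key header are assumptions for illustration only; take the real values from the scripts shipped with the container.

```shell
# Hypothetical values: port, path, and header are placeholders, not the
# container's documented API. Replace them with values from the included scripts.
HOST="https://privateaivm:8443"
API_KEY="your-api-key"
# -k skips certificate verification; acceptable only for self-signed test certs.
curl -sk --connect-timeout 5 \
  -H "Authorization: Bearer ${API_KEY}" \
  "${HOST}/health" || echo "container not reachable at ${HOST}"
```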
Prerequisites
The Oracle Private AI Services Container uses Oracle Linux 8 within the container and works with the following host Linux x86_64 distributions:
- Oracle Linux 8.6+
- Oracle Linux 9
- Oracle Linux 10
The AI Services Container uses TLS 1.3. This means that you need a version of OpenSSL that supports TLS 1.3 to create certificates, for example:
- OpenSSL 1.1.1k+
- OpenSSL 3.0+
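A quick way to confirm the host OpenSSL can create TLS 1.3 certificates is to inspect its version string. This is a minimal sketch; the documented minimums are 1.1.1k+ and 3.0+, and the pattern below only checks the major release lines.

```shell
# Check the installed OpenSSL release line (docs require 1.1.1k+ or 3.0+).
ver=$(openssl version 2>/dev/null | awk '{print $2}')
case "$ver" in
  1.1.1*|3.*) echo "OpenSSL $ver: TLS 1.3 supported" ;;
  "")         echo "OpenSSL not found on this host" ;;
  *)          echo "OpenSSL $ver: upgrade to 1.1.1k+ or 3.0+ for TLS 1.3" ;;
esac
```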
The AI Services Container can be used with the following software, at the listed minimum versions:
- To use the vector embedding service:
  - Podman 3.4.4+, 4.4+, or 5+
  - Kubernetes 1.31.1+
  - Red Hat OpenShift 4.19+
- To use the vector index service:
  - Podman 3.4.4+, 4.4+, or 5+
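Before installing, you can confirm that the host's Podman meets the documented minimum. This is a minimal check, not part of the product's own scripts:

```shell
# Print the installed Podman version (docs require 3.4.4+, 4.4+, or 5+).
if command -v podman >/dev/null 2>&1; then
  podman --version
else
  echo "podman is not installed on this host"
fi
```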
The included examples use Podman with Oracle Linux 8 as the host operating system on OCI. The example OCI VM where the container is installed is called privateaivm.
- Vector Embedding Service:
  - Memory: You must have at least 16 GB of free memory to effectively use the container to create vector embeddings. If you want to use large embedding models or create multiple different types of vectors at the same time, then you need more memory. Insufficient memory can result in reduced performance.
  - CPU cores: Although you can use a single CPU core with the ONNX Runtime to create vector embeddings, more CPU cores improve performance. Multiple CPU cores enable either a single vector or many different vectors to be created at the same time using multi-threading.
  - Disk space: You need at least 22 GB of disk space to effectively use the container. You need additional disk space for each additional embedding model that you use, and more again depending on the log level for long-running containers.
- Vector Index Service:
  - Memory: At least 16 GB of free memory is required to create vector indexes, and Oracle recommends at least twice as much CPU memory as GPU memory (VRAM). Insufficient memory causes index creation to fail.
  - CPU cores: Oracle recommends a minimum of 4 CPU cores for vector index creation. Beyond 16 CPU cores, the performance benefits of additional cores diminish.
  - Disk space: Disk space is only needed for logging when using the vector index service. A minimum of 22 GB of disk space is suggested. More may be needed depending on the log level for long-running containers.
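The resource minimums above can be checked with standard Linux tools. The following sketch reports CPU cores, available memory, and free disk on the root filesystem; adjust the mount point to wherever the container stores its data.

```shell
# Report host resources against the documented minimums
# (16 GB free memory, 4+ CPU cores, 22 GB free disk).
echo "CPU cores:        $(nproc)"
echo "Free memory (MB): $(free -m | awk '/^Mem:/ {print $7}')"
echo "Free disk on /:   $(df -h / | awk 'NR==2 {print $4}')"
```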
Note that the container should not run on the same machine as the Oracle AI Database Server, but on a Linux machine that is close to the database server. The container is designed to offload resource-intensive tasks, such as creating vectors or building vector indexes, from the Oracle AI Database to other machines. This enables the Oracle AI Database to maintain low latency and high throughput without being burdened with these resource-intensive AI infrastructure tasks.
The vector index service of the Private AI Services Container has the following additional prerequisites:
- NVIDIA GPUs only: Turing architecture and above. To see a current list of NVIDIA GPUs along with their compute capability, see https://developer.nvidia.com/cuda/gpus.
- NVIDIA driver version 580.65.06+
- Compute capability 7.5+
- The NVIDIA Container Toolkit must be installed on the host to expose GPUs inside the container. Using the latest version of the Toolkit is suggested, for example 1.19.0 or later. Follow the steps provided in the NVIDIA installation guide and the CDI integration support documentation for Podman.
Use nvidia-smi to ensure your GPU resources meet the listed prerequisites and nvidia-ctk to verify that CDI is working correctly. For example, the latter can be confirmed using the following command:
nvidia-ctk --debug cdi list
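The checks above can be combined into a short guarded script. Assuming both tools are on the host's PATH, it prints the GPU name, driver version, and compute capability, then lists the CDI devices:

```shell
# Confirm the GPU meets the prerequisites (driver 580.65.06+, compute 7.5+).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
else
  echo "nvidia-smi not found: install the NVIDIA driver first"
fi
# Confirm the CDI specification is visible to the container runtime.
if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --debug cdi list
else
  echo "nvidia-ctk not found: install the NVIDIA Container Toolkit"
fi
```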