Deploy the LLM
First, use the OCI Block Volumes service to store data, objects, and unstructured model data. Complete each of the following tasks:
- Create an Instance
- Create a Block Volume
- Attach a Block Volume to an Instance
- Connect to a Block Volume
- Create an OCI Object Storage Bucket
This will deploy a model from OCI Object Storage to an OKE cluster running on OCI.
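If you prefer to script these tasks rather than work through the console, the following minimal sketch uses the OCI Python SDK to create a block volume, attach it to an existing instance, and create an Object Storage bucket. The OCIDs, availability domain, display names, and volume size are placeholder assumptions you must replace with your own values.

```python
import oci

config = oci.config.from_file()  # reads credentials from ~/.oci/config
blockstorage = oci.core.BlockstorageClient(config)
compute = oci.core.ComputeClient(config)
object_storage = oci.object_storage.ObjectStorageClient(config)

COMPARTMENT_ID = "ocid1.compartment.oc1..example"  # placeholder
AVAILABILITY_DOMAIN = "Uocm:US-ASHBURN-AD-1"       # placeholder
INSTANCE_ID = "ocid1.instance.oc1.iad.example"     # placeholder

# Create a block volume to hold the model data.
volume = blockstorage.create_volume(
    oci.core.models.CreateVolumeDetails(
        compartment_id=COMPARTMENT_ID,
        availability_domain=AVAILABILITY_DOMAIN,
        display_name="llm-model-volume",
        size_in_gbs=500,  # illustrative size
    )
).data

# Wait until the volume is AVAILABLE, then attach it to the instance.
oci.wait_until(blockstorage, blockstorage.get_volume(volume.id),
               "lifecycle_state", "AVAILABLE")
compute.attach_volume(
    oci.core.models.AttachParavirtualizedVolumeDetails(
        instance_id=INSTANCE_ID,
        volume_id=volume.id,
    )
)

# Create an Object Storage bucket for the model artifacts.
namespace = object_storage.get_namespace().data
object_storage.create_bucket(
    namespace,
    oci.object_storage.models.CreateBucketDetails(
        name="llm-model-bucket",
        compartment_id=COMPARTMENT_ID,
    ),
)
```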
Create an OKE Cluster
To create an OKE cluster, use the following command, substituting your own compartment and VCN OCIDs for the truncated examples:

```bash
oci ce cluster create --compartment-id ocid1.compartment.oc1..aaaaaaaay______t6q \
  --kubernetes-version v1.24.1 \
  --name amd-mi300x-ai-cluster \
  --vcn-id ocid1.vcn.oc1.iad.aaaaaae___yja
```
Alternatively, you can create the cluster in the OCI Console under Developer Services, Kubernetes Clusters (OKE).
Serve the LLM
Use the `LLM` and `SamplingParams` classes for offline inferencing with a batch of prompts. You can then load and call the model.
The following example serves the Meta Llama 3 70B model, which needs multiple GPUs to run with tensor parallelism. vLLM uses Megatron-LM's tensor parallelism algorithm and Python's multiprocessing to manage the distributed runtime on a single node.
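A minimal sketch of such an example, following the vLLM quickstart pattern; the model ID, `tensor_parallel_size=4`, and the sampling settings are illustrative assumptions to adjust for your hardware and workload:

```python
from vllm import LLM, SamplingParams

prompts = [
    "What is the capital of France?",
    "Explain tensor parallelism in one sentence.",
]

# Sampling settings are illustrative; tune temperature/top_p as needed.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# tensor_parallel_size=4 shards the model across four GPUs on this node.
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=4,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```

Setting `tensor_parallel_size` to the number of GPUs on the node is what lets a 70B-parameter model that exceeds a single GPU's memory be served from one machine.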