Build Generative AI applications using the Llama 2 model on Oracle Cloud Infrastructure
Oracle Cloud Infrastructure Generative AI (OCI Generative AI) is a fully managed service that provides a set of state-of-the-art, customizable large language models (LLMs) that cover a wide range of use cases for text generation.
Meta Llama 2 is an open-source large language model, offered on Oracle Cloud Infrastructure (OCI) as a fully managed pretrained foundational model (meta.llama-2-70b-chat) with 70B parameters. The combined user prompt and response can be up to 4,096 tokens per run. You can quickly build your Generative AI applications on OCI and host the Llama 2 model by procuring a dedicated AI cluster on OCI.
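Because the 4,096-token limit covers the prompt and the response together, an application should budget tokens before each call. A minimal sketch of that check, using a crude whitespace-word approximation (a real application would use the model's actual tokenizer, which counts differently):

```python
def fits_context(prompt: str, max_response_tokens: int,
                 context_limit: int = 4096) -> bool:
    """Roughly check whether prompt + planned response fit the model's context.

    Heuristic: ~1.3 tokens per whitespace-separated word. This is only an
    approximation; use the model's real tokenizer for production budgeting.
    """
    approx_prompt_tokens = int(len(prompt.split()) * 1.3)
    return approx_prompt_tokens + max_response_tokens <= context_limit

# A short prompt leaves plenty of room for a 512-token response.
print(fits_context("Summarize our Q3 sales report in three bullet points.", 512))
```

If the check fails, the application can truncate or summarize the prompt, or reduce the requested response length.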
Architecture
You can use Llama 2 as a pretrained foundational model without worrying about the underlying infrastructure. Run your prompts, adjust the parameters, update your prompts, and rerun the model until you are happy with the results. Then copy the generated code from the Console into your applications. You can also host the Llama 2 model on a dedicated cluster and integrate it with your application through API endpoints.
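As a sketch of that API integration, the snippet below assembles the inference parameters an application might send to a hosted Llama 2 endpoint. The request shape here is illustrative only; the exact classes and field names come from the OCI Python SDK (`oci.generative_ai_inference`) and vary by SDK version, so check the SDK reference before using them:

```python
def build_llama_request(prompt: str, max_tokens: int = 512,
                        temperature: float = 0.7) -> dict:
    """Assemble inference parameters mirroring the Console playground settings.

    Field names are illustrative; consult the OCI Generative AI API reference
    for the exact request schema in your region and SDK version.
    """
    return {
        "modelId": "meta.llama-2-70b-chat",  # pretrained model name from the service
        "prompt": prompt,
        "maxTokens": max_tokens,             # response budget; prompt + response <= 4096
        "temperature": temperature,
        "topP": 0.9,
    }

# With credentials configured (e.g. ~/.oci/config), the call itself would look
# roughly like this (hypothetical compartment/endpoint values, not runnable as-is):
#
#   import oci
#   client = oci.generative_ai_inference.GenerativeAiInferenceClient(
#       oci.config.from_file())
#   response = client.generate_text(...)  # pass the request details here
```

The same parameters appear as sliders in the Console playground, which is why copying the Console-generated code is a convenient starting point.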
In this reference architecture, OCI Object Storage is provisioned for data storage, OCI Data Integration for transformations, OCI Data Science for model building, a vector database for storing embeddings, the OCI Generative AI service with a dedicated AI cluster for hosting, and Oracle APEX for the UI.
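The vector-database step in this pipeline reduces to nearest-neighbor search over embeddings: documents are embedded (for example, in OCI Data Science), stored, and the closest ones to a query embedding are retrieved as context for the model. A self-contained sketch of that retrieval step, using cosine similarity over toy in-memory vectors (a real deployment would use the vector database and high-dimensional embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    """Return the k document ids whose embeddings are closest to the query."""
    ranked = sorted(store.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy store: doc id -> embedding (real embeddings have hundreds of dimensions).
store = {"doc_a": [1.0, 0.0], "doc_b": [0.9, 0.1], "doc_c": [0.0, 1.0]}
print(top_k([1.0, 0.05], store, k=2))  # prints ['doc_a', 'doc_b']
```

The retrieved documents are then concatenated into the prompt sent to the Llama 2 endpoint, which is the standard retrieval-augmented generation (RAG) pattern.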
The following diagram illustrates this reference architecture.
oci-generative-ai-llama-arch-oracle.zip
Advantages of building an LLM application on OCI
Generative AI Service: OCI Generative AI is a fully managed service available via an API to seamlessly integrate these versatile language models into a wide range of use cases, including writing assistance, summarization, and chat.
Dedicated AI Clusters: Dedicated AI clusters are compute resources that you can use for fine-tuning custom models or for hosting dedicated AI endpoints for models. The clusters are dedicated to your models and not shared with users in other tenancies.
Note:
A new AI vector similarity search feature is available in Oracle Database 23ai.

The architecture has the following components:
- Object storage
Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.
- OCI Integration
Oracle Cloud Infrastructure Integration services connect any application and data source, including Salesforce, SAP, Shopify, Snowflake, and Workday, to automate end-to-end processes and centralize management. The broad array of integrations, with prebuilt adapters and low-code customization, simplifies migration to the cloud while streamlining hybrid and multicloud operations.
- OCI Data Science
Oracle Cloud Infrastructure (OCI) Data Science is a fully managed and serverless platform for data science teams to build, train, and manage machine learning models.
- OCI Generative AI
Oracle Cloud Infrastructure Generative AI is a fully managed service that provides a set of state-of-the-art large language models (LLMs) that cover a wide range of use cases for text generation. Use the playground to try out the ready-to-use pretrained models, or create and host your own dedicated Llama 2 model based on your enterprise data on dedicated AI clusters.
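The note above mentions the AI vector similarity search feature in Oracle Database 23ai, which can serve as the vector store in this architecture. As a hedged sketch, the statements below show what storing and querying embeddings looks like with 23ai's `VECTOR` column type and `VECTOR_DISTANCE` function (table and column names are hypothetical; the statements would be executed through a driver such as python-oracledb against a live 23ai database, which is not shown):

```python
# Hypothetical table/column names. VECTOR and VECTOR_DISTANCE are
# Oracle Database 23ai AI Vector Search features; verify the exact
# syntax against the 23ai documentation for your release.

CREATE_DOCS = """
CREATE TABLE docs (
    id        NUMBER PRIMARY KEY,
    body      CLOB,
    embedding VECTOR(768, FLOAT32)
)"""

# Retrieve the five documents whose embeddings are nearest to a bound
# query vector, ranked by cosine distance.
NEAREST_DOCS = """
SELECT id, body
FROM docs
ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)
FETCH FIRST 5 ROWS ONLY"""
```

Keeping embeddings alongside the business data in the same database avoids synchronizing a separate vector store with the source tables.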
Recommendations
- Maintenance and High Availability
This reference architecture uses almost exclusively Oracle-managed PaaS services, so there is no need to install, patch, update, or upgrade software in this solution.
- Scalability and size
This reference architecture uses PaaS services, and most of the services it includes scale out of the box.
- Connectivity
All connections within OCI should be established through a private network; you can use private endpoints to connect to OCI PaaS services.
Considerations
Consider the following points when deploying this reference architecture.
- Security
Dedicated AI clusters in OCI Generative AI are compute resources that you can use for hosting endpoints for Llama 2 LLM models. The clusters are dedicated to your models and not shared with users in other tenancies.
- Resource limits
Consider the best practices, limits by service, and compartment quotas for your tenancy.
Explore More
Review these additional resources to learn more about the features of this reference architecture.
- The Future of Generative AI: What Enterprises Need to Know
- Deploy Llama 2 on Oracle Cloud Infrastructure GPUs
- Quantize and deploy Llama 2 70B on cost-effective NVIDIA A10 Tensor Core GPUs in OCI Data Science
- Multi-GPU multinode fine-tuning Llama 2 on OCI Data Science
- Generative AI Chatbot using Llama 2, Qdrant, RAG, LangChain & Streamlit
- 5 advantages of using an integrated vector database for AI development
- AI Solutions - Fast and Precise Business and Semantic Data Search with AI Vector Search
- Best practices framework for Oracle Cloud Infrastructure
- Oracle Cloud Infrastructure documentation
- Oracle Cloud Cost Estimator