Supported Models for Import
You can import open-source and third-party large language models (LLMs) from Hugging Face and OCI Object Storage buckets into OCI Generative AI. Create endpoints for those models and use them in the Generative AI service to speed up AI initiatives.
OCI Generative AI Imported Model Architecture
The OCI Generative AI service uses Open Model Engine (OME) to deploy and manage imported models. OME acts as the orchestration layer between the GPUs and the inference runtimes.
When you deploy an imported model, OME analyzes the model and pairs it with the more efficient of two runtimes: vLLM (optimized for high throughput) or SGLang (optimized for high performance). The vLLM and SGLang runtime engines run the models on the GPUs.
Some models are heavily optimized for SGLang (such as large-scale LLMs and those requiring RadixAttention for long-context memory), while others have better community kernels in vLLM (such as popular open-source LLMs and multimodal models).
You can import any chat, embedding, or fine-tuned model that Open Model Engine supports through the vLLM or SGLang runtime, but only the models explicitly listed in the Supported Models section are officially supported. Unlisted models might have compatibility issues, so test any unlisted model before using it in production.
For available hardware and the steps to deploy imported models, see Managing Imported Models.
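Because both vLLM and SGLang expose OpenAI-compatible HTTP serving interfaces, requests to an imported model look the same regardless of which runtime OME selects. The following is a minimal sketch of building such a chat request; the model name shown is an illustrative placeholder, not a real OCI resource, and the actual endpoint URL and authentication are defined by your deployment.

```python
import json


def build_chat_request(model: str, user_message: str,
                       max_tokens: int = 256,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completion payload.

    vLLM and SGLang both accept payloads in this shape on their
    OpenAI-compatible endpoints, so client code does not need to
    know which runtime OME paired with the model.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


# Example: serialize a request for a hypothetical imported Llama model.
payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
body = json.dumps(payload)
```

The serialized `body` would be POSTed to the deployment's chat-completions endpoint with your OCI credentials; the exact URL and auth mechanism come from the Managing Imported Models steps referenced above.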
Supported Models
- Alibaba Qwen
Features advanced multilingual and multimodal capabilities.
- Google Gemma
Built for broad language processing needs and high versatility.
- Meta Llama
Enhanced with Grouped Query Attention (GQA) for improved performance.
- Microsoft Phi
Known for efficiency and compactness, designed for scalable and flexible performance.
- Mistral
Includes embedding and chat models. The embedding model is suited for efficient long-context handling.
- OpenAI GptOss
Built with open-weight Mixture-of-Experts (MoE) architecture for efficient reasoning and large-context handling.