Supported Models for Import
You can import open-source and third-party large language models (LLMs) from Hugging Face and OCI Object Storage buckets into OCI Generative AI. Create endpoints for those models and use them in the Generative AI service to speed up AI initiatives.
OCI Generative AI Imported Model Architecture
The OCI Generative AI service uses Open Model Engine (OME) to deploy and manage imported models. OME acts as the orchestration layer between the GPUs and the inference runtimes.
When you deploy an imported model, OME analyzes the model and pairs it with the more efficient of two runtimes: vLLM (optimized for high throughput) or SGLang (optimized for high performance). The vLLM and SGLang runtime engines run the models on the GPUs.
Some models are heavily optimized for SGLang (such as large-scale LLMs and those requiring RadixAttention for long-context memory), while others have better community kernels in vLLM (such as popular open-source LLMs and multimodal models).
You can import any chat, embedding, or fine-tuned model that Open Model Engine supports through the vLLM or SGLang runtime, but only the models explicitly listed in the Supported Models section are officially supported. Unlisted models might have compatibility issues, so test any unlisted model before using it in production.
For available hardware and the steps to deploy imported models, see Managing Imported Models.
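Because both vLLM and SGLang expose OpenAI-compatible HTTP serving interfaces, requests to an imported model look the same regardless of which runtime OME selects. The following is a minimal sketch of building such a chat request; the model name shown is an illustrative placeholder, not a real OCI resource, and the actual endpoint URL and authentication are defined by your deployment.

```python
import json


def build_chat_request(model: str, user_message: str,
                       max_tokens: int = 256,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completion payload.

    vLLM and SGLang both accept payloads in this shape on their
    OpenAI-compatible endpoints, so client code does not need to
    know which runtime OME paired with the model.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


# Example: serialize a request for a hypothetical imported Llama model.
payload = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
body = json.dumps(payload)
```

The serialized `body` would be POSTed to the deployment's chat-completions endpoint with your OCI credentials; the exact URL and auth mechanism come from the Managing Imported Models steps referenced above.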
Supported Models
- Alibaba Qwen
Features advanced multilingual and multimodal capabilities.
- Google Gemma
Built for broad language processing needs and high versatility.
- Meta Llama
Enhanced with Grouped Query Attention (GQA) for improved performance.
- Microsoft Phi
Known for efficiency and compactness, designed for scalable and flexible performance.
- Mistral
Includes embedding and chat models. The embedding model is suited for efficient long-context handling.
- OpenAI GptOss
Built with open-weight Mixture-of-Experts (MoE) architecture for efficient reasoning and large-context handling.