Pretrained Foundational Models in Generative AI

You can use the following pretrained foundational models in OCI Generative AI:

Note

For supported model timelines, see Retiring the Models.
Chat Models (New)

Ask questions and get conversational responses through an AI chatbot.

Model: cohere.command-r-plus v1.2
Available in these regions:
  • US Midwest (Chicago)
  • Germany Central (Frankfurt)
Key features:
  • User prompt can be up to 128,000 tokens, and the response can be up to 4,000 tokens for each run.
  • Optimized for complex tasks; offers advanced language understanding, higher capacity, and more nuanced responses; and can maintain context across its 128,000-token conversation history. Also well suited for question answering, sentiment analysis, and information retrieval.
Model: cohere.command-r-16k v1.2
Available in these regions:
  • US Midwest (Chicago)
  • Germany Central (Frankfurt)
Key features:
  • User prompt can be up to 16,000 tokens, and the response can be up to 4,000 tokens for each run.
  • Optimized for conversational interaction and long-context tasks. Ideal for text generation, summarization, translation, and text-based classification.
Model: meta.llama-3-70b-instruct v1.0
Available in these regions:
  • US Midwest (Chicago)
  • Germany Central (Frankfurt)
Key features:
  • Model has 70 billion parameters.
  • User prompt and response can be up to 8,000 tokens for each run.
  • You can fine-tune this model with your dataset.
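Because the prompt limits above differ per model, a pre-flight length check can catch oversized prompts before a request is sent. The sketch below is illustrative only: the limits are taken from the table, but the whitespace-based token estimate is a naive stand-in for a real tokenizer (Cohere's and Meta's tokenizers count tokens differently), and the model keys are shortened assumptions, not official identifiers.

```python
# Maximum prompt tokens per run, per the chat models table above.
# Note: for meta.llama-3-70b-instruct the 8,000-token figure covers
# prompt and response together.
PROMPT_TOKEN_LIMITS = {
    "cohere.command-r-plus": 128_000,
    "cohere.command-r-16k": 16_000,
    "meta.llama-3-70b-instruct": 8_000,
}


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: word count only. A real tokenizer
    (BPE-based) typically yields more tokens than this."""
    return len(text.split())


def fits_prompt_limit(model: str, prompt: str) -> bool:
    """Return True if the estimated prompt size fits the model's limit."""
    return estimate_tokens(prompt) <= PROMPT_TOKEN_LIMITS[model]
```

In practice you would substitute the model's actual tokenizer for `estimate_tokens` and leave headroom for the response tokens.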
Tip

Learn about chat models.

Generation Models

Provide instructions to generate text or to extract information from your text.

Important

The text generation feature will be removed from the OCI Generative AI playground, API, and CLI when the cohere.command v15.6 and cohere.command-light v15.6 models are retired. Use the chat models instead. For retirement dates, see Retiring the Models.
Model: cohere.command v15.6
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 52 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
  • You can fine-tune this model with your dataset.
Model: cohere.command-light v15.6
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 6 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
  • You can fine-tune this model with your dataset.
Model: meta.llama-2-70b-chat
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 70 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
The Summarization Model

Summarize text in the format, length, and tone that you specify.

Important

The summarization feature will be removed from the OCI Generative AI playground, API, and CLI when the cohere.command v15.6 model is retired. You can summarize text with the chat models instead. For retirement dates, see Retiring the Models.
Model: cohere.command v15.6
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Model has 52 billion parameters.
  • User prompt and response can be up to 4,096 tokens for each run.
Embedding Models

Convert text to vector embeddings to use in applications for semantic searches, text classification, or text clustering.

Model: cohere.embed-english-v3.0
Available in these regions:
  • US Midwest (Chicago)
  • Germany Central (Frankfurt)
Key features:
  • English-language model.
  • Model creates a 1024-dimensional vector for each embedding.
  • Maximum of 96 sentences per run.
  • Maximum of 512 tokens per embedding.
Model: cohere.embed-multilingual-v3.0
Available in these regions:
  • US Midwest (Chicago)
  • Germany Central (Frankfurt)
Key features:
  • Multilingual model.
  • Model creates a 1024-dimensional vector for each embedding.
  • Maximum of 96 sentences per run.
  • Maximum of 512 tokens per embedding.
Model: cohere.embed-english-light-v3.0
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • English-language model. Light models are smaller and faster than the original models.
  • Model creates a 384-dimensional vector for each embedding.
  • Maximum of 96 sentences per run.
  • Maximum of 512 tokens per embedding.
Model: cohere.embed-multilingual-light-v3.0
Available in these regions:
  • US Midwest (Chicago)
Key features:
  • Multilingual model. Light models are smaller and faster than the original models.
  • Model creates a 384-dimensional vector for each embedding.
  • Maximum of 96 sentences per run.
  • Maximum of 512 tokens per embedding.
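Two practical consequences of the limits above: inputs must be sent in batches of at most 96 items per run, and the returned vectors (1024- or 384-dimensional) are typically compared with cosine similarity for semantic search. The sketch below illustrates both under those assumptions; the embedding call itself is out of scope, so it operates on already-computed vectors.

```python
import math

MAX_INPUTS_PER_RUN = 96  # per the embedding model tables above


def batched(texts, size=MAX_INPUTS_PER_RUN):
    """Yield successive chunks no larger than the per-run input limit,
    so each chunk can be sent as one embedding request."""
    for i in range(0, len(texts), size):
        yield texts[i:i + size]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors:
    1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

For a semantic search, you would embed the query, compute `cosine_similarity` against each stored document vector, and return the highest-scoring documents.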