About the Generation Models in Generative AI

Prompt the OCI Generative AI generation models to generate text.

Important

The text generation feature will be removed from the OCI Generative AI playground, API, and CLI when the cohere.command v15.6 and cohere.command-light v15.6 models are retired. Instead, you can use the chat models. For retirement dates, see Retiring the Models.

You can ask questions in natural language and optionally submit text, such as documents, emails, and product reviews, to the generation models. Each model reasons over the submitted text and provides intelligent answers.

Prompt style: Write an email to Susan thanking her for…

Output style for previous prompt: Dear Susan, Thanks for…
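
For example, you can submit a prompt like the previous one through the API. The following is a minimal sketch that assumes the OCI Python SDK's generative_ai_inference client; the service endpoint, compartment OCID, and parameter values are placeholders to replace with your own.

```python
import oci

# Minimal sketch (OCI Python SDK): send one prompt to a generation model.
# The endpoint and compartment OCID below are placeholders.
config = oci.config.from_file()  # reads ~/.oci/config by default
client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

details = oci.generative_ai_inference.models.GenerateTextDetails(
    compartment_id="ocid1.compartment.oc1..example",
    serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
        model_id="cohere.command"
    ),
    inference_request=oci.generative_ai_inference.models.CohereLlmInferenceRequest(
        prompt="Write an email to Susan thanking her for the quarterly report.",
        max_tokens=300,
        temperature=0.7,
    ),
)

response = client.generate_text(details)
print(response.data.inference_response.generated_texts[0].text)
```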

Tip

Unlike the chat models, the text generation models don't keep the context of previous prompts. For follow-up questions in the generation models, you can include the previous responses in the next prompt.
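
Because the generation models are stateless, a follow-up prompt must carry its own context. A simple sketch of that approach:

```python
# Carry context manually: prepend the earlier exchange to the new request.
previous_prompt = "Write an email to Susan thanking her for the quarterly report."
previous_response = "Dear Susan, Thanks for the quarterly report..."
follow_up = "Make the email more formal."

prompt = (f"Earlier request: {previous_prompt}\n"
          f"Earlier response: {previous_response}\n"
          f"New request: {follow_up}")
```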

Following are some example use cases for text generation models:

  • Copy generation: Draft marketing copy, emails, blog posts, product descriptions, documents, and so on.
  • Ask questions: Ask the models to explain concepts, brainstorm ideas, solve problems, and answer questions on information that the models have been trained on.
  • Stylistic conversion: Edit your text or rewrite content in a different style or language.

Selecting a Generation Model

Select a model to generate text based on the model size, your project goal, cost, and the model's responses. Use the examples provided in the playground for each listed model to get a feel for how each model responds to the same prompt, and then decide which model's response style best fits your use case.

cohere.command

A highly performant generation model with 50 billion parameters and great general knowledge of the world. Use this model for tasks ranging from brainstorming to accuracy-focused work such as text extraction and sentiment analysis, and for complex instructions, such as drafting marketing copy, emails, blog posts, and product descriptions that you then review and use.

cohere.command-light

A quick and light generation model. Use this model for tasks that require a basic knowledge of the world and simple instructions, when speed and cost are important. For best results, give the model clear instructions; the more specific your prompt, the better this model performs. For example, instead of the prompt "What is the following tone?", write "What is the tone of this product review? Answer with either the word positive or negative."

meta.llama-2-70b-chat

This 70-billion-parameter model was trained on a dataset of 1.2 trillion tokens that includes text from the internet, books, and other sources. Use this model for text generation, language translation, summarization, question answering based on the content of a given text or topic, and content generation such as articles, blog posts, and social media updates.

Tip

  • If the generation models don't respond well to your use case, you can fine-tune a pretrained generation model with your own dataset. See each generation model's key features to find out which models are available for fine-tuning.

  • Learn to calculate cost with examples.

Generation Model Parameters

When using the generation models, you can vary the output by changing the following parameters.

Maximum output tokens

The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token.
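
For example, with the four-characters-per-token estimate, you can roughly size a prompt or a response budget:

```python
# Rough sizing with the four-characters-per-token estimate above.
prompt = "Write an email to Susan thanking her for the quarterly report."
print(len(prompt) // 4)  # 62 characters -> about 15 tokens
```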

Temperature

The level of randomness used to generate the output text.

Tip

Start with the temperature set to 0 or a value less than 1, and increase the temperature as you regenerate the prompts for more creative output. High temperatures can introduce hallucinations and factually incorrect information.
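
The following is a minimal sketch of the idea behind temperature, not the service's internal implementation: temperature divides the model's scores (logits) before sampling, so low values sharpen the distribution and high values flatten it.

```python
import numpy as np

# Illustrative only: temperature-scaled sampling over a vector of logits.
def sample_with_temperature(logits, temperature):
    rng = np.random.default_rng()
    if temperature == 0:                       # treat 0 as greedy decoding
        return int(np.argmax(logits))
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())      # softmax, shifted for stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```
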
Top k

A sampling method in which the model chooses the next token randomly from the top k most likely tokens. A higher value for k generates more random output, which makes the output text sound more natural. The default value for k is 0 for command models and -1 for Llama models, which means that the models should consider all tokens and not use this method.
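
A minimal sketch of the technique (illustrative, not the service's code):

```python
import numpy as np

# Top-k sampling: keep the k most likely tokens, renormalize, sample from them.
def top_k_sample(probs, k):
    rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    if k <= 0:                       # 0 or -1 disables the method: keep all tokens
        k = len(probs)
    top = np.argsort(probs)[-k:]     # indices of the k most likely tokens
    kept = probs[top] / probs[top].sum()
    return int(rng.choice(top, p=kept))
```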

Top p

A sampling method that controls the cumulative probability of the top tokens to consider for the next token. Assign p a decimal number between 0 and 1 for the probability. For example, enter 0.75 for the top 75 percent to be considered. Set p to 1 to consider all tokens.
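
A minimal sketch of top-p (nucleus) sampling, again illustrative rather than the service's implementation:

```python
import numpy as np

# Top-p sampling: keep the smallest set of most likely tokens whose cumulative
# probability reaches p, renormalize, and sample from that set.
def top_p_sample(probs, p):
    rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]                    # most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # smallest set reaching p
    kept = order[:cutoff]
    return int(rng.choice(kept, p=probs[kept] / probs[kept].sum()))
```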

Stop sequences

A sequence of characters, such as a word, a phrase, a newline (\n), or a period, that tells the model when to stop generating output. If you have more than one stop sequence, the model stops when it reaches any of those sequences.
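
For example, the following small sketch shows the behavior described above, using a newline as the stop sequence:

```python
# Cut generated text at the first occurrence of any configured stop sequence.
def apply_stop_sequences(text, stop_sequences):
    positions = [text.find(s) for s in stop_sequences]
    found = [p for p in positions if p != -1]
    return text[:min(found)] if found else text

print(apply_stop_sequences("Dear Susan,\nThanks for the report.", ["\n"]))
# -> "Dear Susan,"
```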

Frequency penalty

A penalty that is assigned to a token when that token appears frequently. High penalties encourage fewer repeated tokens and produce a more random output.

Presence penalty

A penalty that is assigned to each token when it appears in the output to encourage generating outputs with tokens that haven't been used.
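
One common way to formalize the difference between the two penalties (the service doesn't publish its exact math, so this is a sketch of the general technique): both lower the scores of tokens that have already appeared, but the frequency penalty grows with each repetition while the presence penalty is a flat, one-time cost.

```python
import numpy as np

# Illustrative penalty application; token IDs are assumed to be < len(logits).
def penalize(logits, generated_token_ids, frequency_penalty, presence_penalty):
    logits = np.asarray(logits, dtype=float).copy()
    counts = np.bincount(generated_token_ids, minlength=len(logits))
    logits -= frequency_penalty * counts          # scales with repetition count
    logits -= presence_penalty * (counts > 0)     # applies once per seen token
    return logits
```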

Show likelihoods

Every time a new token is to be generated, a number between -15 and 0 is assigned to all tokens, where tokens with higher numbers are more likely to follow the current token. For example, it's more likely that the word favorite is followed by the word food or book rather than the word zebra. This parameter is available only for the cohere models.
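
Assuming these values are log-likelihoods, which the -15 to 0 range suggests, you can exponentiate them to recover approximate probabilities:

```python
import math

# Convert displayed log-likelihoods back to probabilities (0 = certain).
for log_likelihood in (-0.1, -3.0, -15.0):
    print(f"{log_likelihood:6.1f} -> probability {math.exp(log_likelihood):.7f}")
# -0.1 -> 0.9048374, -3.0 -> 0.0497871, -15.0 -> 0.0000003
```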