About the Generation Models in Generative AI

The following pretrained foundational text generation models are available in Generative AI:

  • cohere.command
  • cohere.command-light
  • meta.llama-2-70b-chat

Example prompt: Write an email to Susan thanking her for…

Example output for the previous prompt: Dear Susan, Thanks for…

Choosing a Generation Model

Choose a text generation model based on your goals.

cohere.command

A highly performant generation model with 50 billion parameters and broad general knowledge of the world. Use this model for tasks that range from brainstorming to accuracy-sensitive work such as text extraction and sentiment analysis, and give it complex instructions to draft your marketing copy, emails, blog posts, and product descriptions, which you can then review and use.

cohere.command-light

A quick and light generation model. Use this model for tasks that require basic knowledge of the world and simple instructions, when speed and cost are important. For best results, give the model clear instructions: the more specific your prompt, the better this model performs. For example, instead of the prompt "What is the following tone?", write "What is the tone of this product review? Answer with either the word positive or negative."

meta.llama-2-70b-chat

A highly performant generation model with 70 billion parameters and broad general knowledge of the world. Use this model for tasks that range from brainstorming to accuracy-sensitive work such as text extraction and sentiment analysis, and give it complex instructions to draft your marketing copy, emails, blog posts, and product descriptions, which you can then review and use.

Following are some example use cases for these models:

  • Copy generation: Draft marketing copy, emails, blog posts, product descriptions, documents, and so on.
  • Ask questions: Ask the models to explain concepts, brainstorm ideas, solve problems, and answer questions about information that they were trained on.
  • Stylistic conversion: Edit your text or rewrite content in a different style or language.

Generation Model Parameters

When using the generation models, you can adjust the output by changing the following parameters.

Maximum output tokens

The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token.
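
For a rough sense of how that heuristic translates into a token budget, here is a minimal sketch (an approximation only; the service's actual tokenizer varies by model and language):

```python
# Rough estimate only: real tokenizers vary by model and by language.
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Estimate the token count of a string using ~4 characters per token."""
    return max(1, len(text) // chars_per_token)

print(estimate_tokens("Write an email to Susan thanking her for the demo."))  # prints 12
```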

Temperature

The level of randomness used to generate the output text.

Tip

Start with the temperature set to 0 or another value less than 1, and increase it as you regenerate the prompts for more creative output. High temperatures can introduce hallucinations and factually incorrect information.
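
As an illustration of what temperature typically does (a sketch of the standard technique, not the service's internal code), logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw logits to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Temperature 0 is typically treated as greedy decoding: all mass on the argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.2))  # nearly deterministic: almost all mass on the first token
print(softmax_with_temperature(logits, 2.0))  # flatter: probabilities move closer to uniform
```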

Top k

A sampling method in which the model chooses the next token randomly from the top k most likely tokens. A higher value for k generates more random output, which makes the output text sound more natural. The default value for k is 0 for command models and -1 for Llama 2 models, which means that the models should consider all tokens and not use this method.
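
A minimal sketch of the standard top-k technique (illustrative, not the service's internal implementation): keep only the k most likely tokens, renormalize their probabilities, and sample among them.

```python
import random

def top_k_sample(token_probs: dict[str, float], k: int) -> str:
    """Sample the next token from the k most likely candidates."""
    if k <= 0:  # 0 (or -1) disables the method: consider all tokens
        k = len(token_probs)
    top = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, probs = zip(*top)
    total = sum(probs)  # renormalize over the kept tokens
    return random.choices(tokens, weights=[p / total for p in probs])[0]

probs = {"food": 0.5, "book": 0.3, "movie": 0.15, "zebra": 0.05}
print(top_k_sample(probs, k=2))  # only "food" or "book" can be chosen
```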

Top p

A sampling method that eliminates low-likelihood tokens by keeping only the most likely tokens whose probabilities sum to p. The default value for p is 0.75, which eliminates the bottom 25 percent of probability mass when choosing the next token.

At each step, the Top p method considers only the most likely tokens whose cumulative probability reaches p. A higher value for p introduces more randomness into the output. Set the value to either 1.0 or 0 to disable this method.

If you're also using the Top k method, then the model considers only the top tokens whose probabilities add up to p and ignores the rest of the k tokens. For example, if k is 20 but the probabilities of only the top 10 tokens add up to p, then only those top 10 tokens are considered.
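
The following sketch shows nucleus (top-p) sampling combined with top k in the order described above; how the service actually orders these operations internally is an assumption:

```python
import random

def top_k_top_p_sample(token_probs: dict[str, float], k: int, p: float) -> str:
    """Apply top k first, then keep the smallest prefix whose probabilities sum to p."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    if k > 0:
        ranked = ranked[:k]  # top k: drop everything outside the k most likely tokens
    kept, cumulative = [], 0.0
    for token, prob in ranked:  # top p: stop once the kept probability mass reaches p
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    tokens, probs = zip(*kept)
    total = sum(probs)  # renormalize over the surviving tokens
    return random.choices(tokens, weights=[pr / total for pr in probs])[0]

probs = {"food": 0.5, "book": 0.3, "movie": 0.15, "zebra": 0.05}
# With k=20 but p=0.75, only "food" and "book" survive (0.5 + 0.3 >= 0.75).
print(top_k_top_p_sample(probs, k=20, p=0.75))
```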

Stop sequences

A sequence of characters—such as a word, a phrase, a newline (\n), or a period—that tells the model when to stop the generated output. If you have more than one stop sequence, then the model stops when it reaches any of those sequences.
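
A minimal sketch of how stop sequences are commonly applied (illustrative; whether the service trims or keeps the stop sequence itself may differ):

```python
def apply_stop_sequences(generated: str, stop_sequences: list[str]) -> str:
    """Truncate the generated text at the earliest stop sequence, if any appears."""
    cut = len(generated)
    for seq in stop_sequences:
        idx = generated.find(seq)
        if idx != -1:
            cut = min(cut, idx)  # stop at whichever sequence appears first
    return generated[:cut]

text = "Dear Susan, Thanks for the demo.\nBest regards"
print(apply_stop_sequences(text, ["\n", "Sincerely"]))  # "Dear Susan, Thanks for the demo."
```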

Frequency penalty

A penalty that is assigned to a token when that token appears frequently. High penalties discourage repeated tokens and produce a more random output.

Presence penalty

A penalty that is assigned to each token that appears in the output, to encourage generating output with tokens that haven't been used. Unlike the frequency penalty, the presence penalty applies equally whether a token has appeared once or many times.
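
Both penalties are commonly applied to the logits of already-generated tokens before the next sampling step, roughly as sketched below (an assumption about the mechanism, not the service's exact formula):

```python
from collections import Counter

def apply_penalties(logits: dict[str, float], generated_tokens: list[str],
                    frequency_penalty: float, presence_penalty: float) -> dict[str, float]:
    """Lower the logits of already-used tokens before the next sampling step."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        if counts[token] > 0:
            logit -= frequency_penalty * counts[token]  # grows with each repetition
            logit -= presence_penalty                   # flat, once the token has appeared
        adjusted[token] = logit
    return adjusted

logits = {"great": 2.0, "good": 1.5, "fine": 1.0}
print(apply_penalties(logits, ["great", "great", "good"], 0.5, 0.4))
# {'great': 0.6, 'good': 0.6, 'fine': 1.0}
```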

Show likelihoods

Every time a new token is about to be generated, a number between -15 and 0 is assigned to all tokens, where tokens with higher numbers are more likely to follow the current token. For example, the word favorite is more likely to be followed by the word food or book than by the word zebra. This parameter is available only for the Cohere models.
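
Because these numbers are log likelihoods, you can exponentiate them to recover probabilities. A small sketch (the sample values here are hypothetical):

```python
import math

# Hypothetical log likelihoods for candidate next tokens after "favorite".
log_likelihoods = {"food": -0.4, "book": -1.2, "zebra": -14.0}

for token, ll in log_likelihoods.items():
    # A log likelihood of 0 means probability 1; -15 is vanishingly unlikely.
    print(f"{token}: {math.exp(ll):.6f}")
```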