About the Generation Models in Generative AI
You can prompt a generation model to generate text. The following are some example use cases for text generation models:
- Copy generation: Draft marketing copy, emails, blog posts, product descriptions, documents, and so on.
- Ask questions: Ask the models to explain concepts, brainstorm ideas, solve problems, and answer questions on information that the models have been trained on.
- Stylistic conversion: Edit text or rewrite content in a different style or language.
- Not available on-demand: All OCI Generative AI pretrained foundational models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
- Can be hosted on clusters: If you host a summarization or a generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. These models, when hosted on a dedicated AI cluster, are available only in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions. For a sketch of a request against a dedicated endpoint, see the example after this list.
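For orientation, here is a minimal sketch of a text generation request against a model hosted on a dedicated AI cluster, assuming the OCI Python SDK's generative_ai_inference client and dedicated serving mode. The OCIDs, region endpoint, prompt, and parameter values are placeholders, and class and field names should be verified against the current SDK reference, since the generation APIs are being retired.

```python
# Minimal sketch (not a definitive implementation): call a generation model
# hosted on a dedicated AI cluster (dedicated serving mode) with the OCI
# Python SDK. The endpoint and compartment OCIDs, region, and prompt are
# placeholders; check class and field names against the SDK documentation.
import oci

config = oci.config.from_file()  # reads ~/.oci/config by default

client = oci.generative_ai_inference.GenerativeAiInferenceClient(
    config,
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)

inference_request = oci.generative_ai_inference.models.CohereLlmInferenceRequest(
    prompt="Draft a short product description for a stainless steel water bottle.",
    max_tokens=300,          # maximum output tokens
    temperature=0.3,         # low randomness for factual marketing copy
    top_k=0,                 # 0 disables top-k sampling for command models
    top_p=0.75,              # consider the top 75 percent of probability mass
    frequency_penalty=0.0,
    presence_penalty=0.0,
    stop_sequences=["\n\n"],
)

details = oci.generative_ai_inference.models.GenerateTextDetails(
    compartment_id="ocid1.compartment.oc1..exampleuniqueID",  # placeholder
    serving_mode=oci.generative_ai_inference.models.DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..exampleuniqueID"  # placeholder
    ),
    inference_request=inference_request,
)

response = client.generate_text(details)
print(response.data)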
Selecting a Generation Model
Select a generation model that you host on a dedicated AI cluster to generate text, based on the model's size, your project goal, cost, and the quality of the model's responses.
- cohere.command (deprecated): A highly performant generation model with 50 billion parameters and broad general knowledge of the world. Use this model for tasks ranging from brainstorming to accuracy-sensitive work such as text extraction and sentiment analysis, and give it complex instructions to draft marketing copy, emails, blog posts, and product descriptions that you then review and use.
- cohere.command-light (deprecated): A quick and light generation model. Use this model for tasks that require a basic knowledge of the world and simple instructions, when speed and cost are important. For best results, give the model clear instructions; the more specific your prompt, the better this model performs. For example, instead of the prompt "What is the following tone?", write "What is the tone of this product review? Answer with either the word positive or negative."
- meta.llama-2-70b-chat (deprecated): This 70 billion-parameter model was trained on a dataset of 1.2 trillion tokens that includes texts from the internet, books, and other sources. Use this model for text generation, language translation, summarization, question answering based on the content of a given text or topic, and content generation such as articles, blog posts, and social media updates.
Generation Model Parameters
When using the generation models, you can vary the output by changing the following parameters.
- Maximum output tokens: The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token.
- Temperature: The level of randomness used to generate the output text. Tip: Start with the temperature set to 0 or a value less than 1, and increase the temperature as you regenerate the prompts for more creative output. High temperatures can introduce hallucinations and factually incorrect information. For a simplified view of how temperature combines with top k, top p, and the penalties during sampling, see the sketch after this list.
- Top k: A sampling method in which the model chooses the next token randomly from the top k most likely tokens. A higher value for k generates more random output, which makes the output text sound more natural. The default value for k is 0 for command models and -1 for Llama models, which means that the model considers all tokens and doesn't use this method.
- Top p: A sampling method that controls the cumulative probability of the top tokens to consider for the next token. Assign p a decimal number between 0 and 1 for the probability. For example, enter 0.75 for the top 75 percent to be considered. Set p to 1 to consider all tokens.
- Stop sequences: A sequence of characters, such as a word, a phrase, a newline (\n), or a period, that tells the model when to stop the generated output. If you have more than one stop sequence, then the model stops when it reaches any of those sequences.
- Frequency penalty: A penalty that's assigned to a token when that token appears frequently. High penalties encourage fewer repeated tokens and produce a more random output.
- Presence penalty: A penalty that's assigned to each token when it appears in the output, to encourage generating outputs with tokens that haven't been used.
- Show likelihoods: Every time a new token is to be generated, a number between -15 and 0 is assigned to all tokens; tokens with higher numbers are more likely to follow the current token. For example, it's more likely that the word favorite is followed by the word food or book rather than the word zebra. This parameter is available only for the cohere models.
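To make the sampling parameters above more concrete, the following is a small, self-contained Python sketch (not OCI service code) of a single decoding step over a toy vocabulary. It applies frequency and presence penalties, temperature scaling, top-k and top-p truncation, and a stop-sequence check in the spirit of the definitions above; the exact order and scaling that a hosted model uses may differ.

```python
# Toy sketch of one decoding step, illustrating the parameters above on a
# tiny vocabulary. Real models apply these settings over tens of thousands
# of tokens; this is only meant to show how the knobs interact.
import math
import random

vocab = ["food", "book", "zebra", ".", "\n"]
logits = [2.0, 1.5, -3.0, 0.5, 0.0]        # raw model scores for the next token
generated = ["favorite", "food"]           # tokens produced so far

temperature = 0.7
top_k = 3                                  # keep only the 3 most likely tokens
top_p = 0.9                                # keep the smallest set covering 90% probability
frequency_penalty = 0.2                    # grows with how often a token already appeared
presence_penalty = 0.1                     # flat penalty if a token appeared at all
stop_sequences = ["\n"]

# 1. Penalize tokens that already appear in the output.
counts = {tok: generated.count(tok) for tok in vocab}
penalized = [
    score - frequency_penalty * counts[tok] - presence_penalty * (counts[tok] > 0)
    for tok, score in zip(vocab, logits)
]

# 2. Temperature-scaled softmax: lower temperature sharpens the distribution.
scaled = [s / temperature for s in penalized]
m = max(scaled)
exp = [math.exp(s - m) for s in scaled]
probs = [e / sum(exp) for e in exp]

# 3. Top k: keep only the k most likely tokens.
ranked = sorted(zip(vocab, probs), key=lambda x: x[1], reverse=True)[:top_k]

# 4. Top p (nucleus): keep the smallest prefix whose cumulative probability >= top_p.
nucleus, cumulative = [], 0.0
for tok, p in ranked:
    nucleus.append((tok, p))
    cumulative += p
    if cumulative >= top_p:
        break

# 5. Renormalize and sample the next token from the surviving candidates.
total = sum(p for _, p in nucleus)
next_token = random.choices([t for t, _ in nucleus],
                            weights=[p / total for _, p in nucleus])[0]
generated.append(next_token)

# 6. Stop if the output now ends with any stop sequence.
text = " ".join(generated)
if any(text.endswith(seq) for seq in stop_sequences):
    print("stop sequence reached:", repr(next_token))
else:
    print("continuing with:", repr(next_token))
```

Lowering the temperature (for example, to 0.2) concentrates almost all of the probability on food and makes the choice nearly deterministic, while raising top_k or top_p widens the candidate pool and makes the output more varied, which mirrors the guidance in the parameter descriptions above.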