Paying for On-Demand Inferencing
On-demand inferencing in OCI Generative AI offers the following benefits:
- Low barrier to start using Generative AI.
- Access to all available Generative AI foundational models.
- Great for experimenting and evaluating the models.
- Pay as you go for transactions, as described in the following section.
With on-demand inferencing, you pay as you go based on the following character counts:
- Chat: prompt length (in characters) + response length (in characters)
- Text generation: prompt length (in characters) + response length (in characters)
- Summarization: prompt length (in characters) + response length (in characters)
- Text Embeddings: input length (in characters)
The following examples calculate on-demand inferencing cost for text generation and text embeddings in OCI Generative AI. For calculating dedicated AI cluster cost, see Paying for Dedicated AI Clusters.
Matching a Foundational Model to a Product
To find the unit price for 10,000 transactions of on-demand inferencing, match the foundational model that you use for inferencing to the product in the following table.
| Capability | Foundational Base Model | Product for On-Demand Inferencing on Pricing Page |
|---|---|---|
| Chat | meta.llama-3-70b-instruct | Oracle Cloud Infrastructure Generative AI - Large Meta |
| Chat | cohere.command-r-plus | Oracle Cloud Infrastructure Generative AI - Large Cohere V2 |
| Chat | cohere.command-r-16k | Oracle Cloud Infrastructure Generative AI - Small Cohere V2 |
| Text Generation | cohere.command | Oracle Cloud Infrastructure Generative AI - Large Cohere |
| Text Generation | cohere.command-light | Oracle Cloud Infrastructure Generative AI - Small Cohere |
| Text Generation | meta.llama2_70b-chat | Oracle Cloud Infrastructure Generative AI - Large Meta |
| Summarization | cohere.command | Oracle Cloud Infrastructure Generative AI - Large Cohere |
| Embedding | cohere.embed | Oracle Cloud Infrastructure Generative AI - Embed Cohere |
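The model-to-product mapping above can be sketched as a simple lookup. This is an illustrative dictionary built only from the table, not an OCI SDK feature; always confirm names against the pricing page.

```python
# Hypothetical lookup table (derived from the documentation table above):
# foundational model name -> on-demand product on the OCI pricing page.
MODEL_TO_PRODUCT = {
    "meta.llama-3-70b-instruct": "Oracle Cloud Infrastructure Generative AI - Large Meta",
    "cohere.command-r-plus": "Oracle Cloud Infrastructure Generative AI - Large Cohere V2",
    "cohere.command-r-16k": "Oracle Cloud Infrastructure Generative AI - Small Cohere V2",
    "cohere.command": "Oracle Cloud Infrastructure Generative AI - Large Cohere",
    "cohere.command-light": "Oracle Cloud Infrastructure Generative AI - Small Cohere",
    "meta.llama2_70b-chat": "Oracle Cloud Infrastructure Generative AI - Large Meta",
    "cohere.embed": "Oracle Cloud Infrastructure Generative AI - Embed Cohere",
}

# Look up the product whose unit price (per 10,000 transactions) applies.
product = MODEL_TO_PRODUCT["cohere.command"]
print(product)
```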
Chat Example
Paul calls the meta.llama-3-70b-instruct model with the following prompt, which is 220 characters long:

Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts.

The response from the model is 1,618 characters long. Paul wants to know the cost for this call. Here are the steps to calculate the cost.
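The calculation can be sketched as follows. Chat is billed on prompt plus response characters, where one character equals one transaction, priced per 10,000 transactions. The unit price below is a hypothetical placeholder; look up the actual "Large Meta" price on the OCI pricing page.

```python
# Hypothetical unit price (USD per 10,000 transactions) for the
# "Large Meta" product -- replace with the value from the pricing page.
PRICE_PER_10K_TRANSACTIONS = 0.015

prompt_chars = 220      # prompt length in characters
response_chars = 1_618  # response length in characters

# Chat billing: prompt + response characters, 1 character = 1 transaction.
total_transactions = prompt_chars + response_chars  # 1,838 transactions
cost = total_transactions / 10_000 * PRICE_PER_10K_TRANSACTIONS
print(f"Cost for this call: ${cost:.6f}")
```

The same arithmetic applies to any chat call: sum the prompt and response character counts, divide by 10,000, and multiply by the product's unit price.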
Text Generation Example
Paul calls the cohere.command model with the following prompt, which is 220 characters long:

Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts.

The response from the model is 1,618 characters long. Paul wants to know the cost for this call. Here are the steps to calculate the cost.
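Text generation is billed the same way as chat: prompt plus response characters, priced per 10,000 transactions. A small helper makes the calculation reusable; the price passed in is a hypothetical placeholder for the "Large Cohere" unit price.

```python
def on_demand_cost(prompt_chars: int, response_chars: int, price_per_10k: float) -> float:
    """Cost of one text-generation call.

    Billing is prompt + response characters, where 1 character = 1
    transaction, priced per 10,000 transactions.
    """
    return (prompt_chars + response_chars) / 10_000 * price_per_10k

# Hypothetical "Large Cohere" unit price -- check the OCI pricing page.
cost = on_demand_cost(prompt_chars=220, response_chars=1_618, price_per_10k=0.010)
print(f"Cost for this call: ${cost:.6f}")
```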
Text Embeddings Example
Gina is converting customer contracts into embeddings for a new semantic search application. On average, Gina ingests 16 documents every hour. Each document is about 1,000 characters long. Gina wants to get an estimate of the monthly bill for generating those embeddings. Here are the steps to calculate the cost.
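The monthly estimate can be sketched as follows. Embeddings are billed on input characters only; the month length (30 days) and the unit price are assumptions, so substitute the actual "Embed Cohere" price from the OCI pricing page.

```python
DOCS_PER_HOUR = 16            # documents ingested per hour, on average
CHARS_PER_DOC = 1_000         # approximate characters per document
HOURS_PER_MONTH = 24 * 30     # assuming a 30-day month

# Embeddings are billed on input characters only (1 character = 1 transaction).
monthly_chars = DOCS_PER_HOUR * CHARS_PER_DOC * HOURS_PER_MONTH  # 11,520,000

# Hypothetical "Embed Cohere" unit price (USD per 10,000 transactions).
PRICE_PER_10K_TRANSACTIONS = 0.001
monthly_cost = monthly_chars / 10_000 * PRICE_PER_10K_TRANSACTIONS
print(f"Estimated monthly embedding cost: ${monthly_cost:.2f}")
```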