OpenAI gpt-oss-120b

OCI Generative AI supports access to the pretrained OpenAI gpt-oss-120b model.

The openai.gpt-oss-120b model is an open-weight, text-only language model designed for powerful reasoning and agentic tasks.

Regions for this Model

Important

For supported regions, endpoint types (on-demand or dedicated AI clusters), and hosting (OCI Generative AI or external calls) for this model, see the Models by Region page. For details about the regions, see the Generative AI Regions page.

Key Features

  • Model Name in OCI Generative AI: openai.gpt-oss-120b
  • Model Size: 117 billion parameters
  • Text Mode Only: Enter text and get text output. Image and file inputs, such as audio, video, and document files, aren't supported.
  • Knowledge: Specialized in advanced reasoning and text-based tasks across a wide range of subjects.
  • Context Length: 128,000 tokens (maximum prompt + response length is 128,000 tokens for each run). In the playground, the response length is capped at 16,000 tokens for each run.
  • Excels at These Use Cases: Because of its training data, this model is especially strong in STEM (science, technology, engineering, and mathematics), coding, and general knowledge. Suitable for high-reasoning, production-level tasks.
  • Function Calling: Yes, through the API.
  • Has Reasoning: Yes.
  • Knowledge Cutoff: June 2024

For key feature details, see the OpenAI gpt-oss documentation.
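
As a minimal sketch of calling this model through the API, the following uses the OCI Python SDK in on-demand mode. It assumes a valid ~/.oci/config profile and the Chicago region inference endpoint; the compartment OCID and prompt text are placeholders.

    import oci

    # Create an inference client from the default OCI config profile.
    config = oci.config.from_file()
    client = oci.generative_ai_inference.GenerativeAiInferenceClient(
        config,
        service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
    )

    # Build a single-turn, text-only chat request.
    content = oci.generative_ai_inference.models.TextContent(
        text="Explain tokenization in two sentences."
    )
    message = oci.generative_ai_inference.models.Message(role="USER", content=[content])
    chat_request = oci.generative_ai_inference.models.GenericChatRequest(
        api_format=oci.generative_ai_inference.models.BaseChatRequest.API_FORMAT_GENERIC,
        messages=[message],
        max_tokens=600,
    )

    # On-demand serving mode references the model by its OCI name.
    details = oci.generative_ai_inference.models.ChatDetails(
        compartment_id="ocid1.compartment.oc1..exampleuniqueID",  # placeholder OCID
        serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
            model_id="openai.gpt-oss-120b"
        ),
        chat_request=chat_request,
    )

    response = client.chat(details)
    print(response.data.chat_response.choices[0].message.content[0].text)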

Dedicated AI Cluster for the Model

Models offered in on-demand mode require no clusters; access them through the Console playground or the API. Models offered in dedicated mode are accessed through endpoints created on dedicated AI clusters. Learn about the Dedicated Mode.
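
In the OCI Python SDK, the serving mode on a chat request is what selects between these two modes. The following is a minimal sketch; the dedicated endpoint OCID is a hypothetical placeholder.

    from oci.generative_ai_inference.models import DedicatedServingMode, OnDemandServingMode

    # On-demand mode: reference the model by name; no cluster is required.
    on_demand = OnDemandServingMode(model_id="openai.gpt-oss-120b")

    # Dedicated mode: reference an endpoint created on a dedicated AI cluster.
    dedicated = DedicatedServingMode(
        endpoint_id="ocid1.generativeaiendpoint.oc1..exampleuniqueID"  # placeholder OCID
    )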

The following lists the hardware unit sizes, available regions, and service limits for dedicated AI clusters. This model isn't available for fine-tuning.

Hardware Unit Size: OAI_A100_80G_X2
  • Available Regions: US Midwest (Chicago)
  • Limit Name: dedicated-unit-a100-80g-count
  • Request Increase by: 2

Hardware Unit Size: OAI_H100_X2
  • Available Regions: Brazil East (Sao Paulo), Germany Central (Frankfurt), India South (Hyderabad), Japan Central (Osaka), UK South (London), US East (Ashburn), US Midwest (Chicago)
  • Limit Name: dedicated-unit-h100-count
  • Request Increase by: 2
Important

  • For hardware pricing, see the Cost estimator.
  • If tenancy limits are insufficient for hosting this model on a dedicated AI cluster, request an increase for the relevant hardware limit. For example, request an increase for the dedicated-unit-h100-count limit by 2. See Requesting a Service Limit Increase.

Cluster Performance Benchmarks

Review the OpenAI gpt-oss-120b cluster performance benchmarks for different use cases.

Model Parameters

To change the model's responses, you can adjust the values of the following parameters in the playground or the API.

Maximum output tokens

The maximum number of tokens that you want the model to generate for each response. Estimate four characters per token. Because you're prompting a chat model, the response depends on the prompt, and each response doesn't necessarily use up the maximum allocated tokens. The maximum prompt + output length is 128,000 tokens for each run. In the playground, the maximum output length is capped at 16,000 tokens for each run.

Tip

For large inputs with difficult problems, set a high value for the maximum output tokens parameter.
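
In the OCI Python SDK, this parameter is assumed to map to the max_tokens field on a GenericChatRequest, as in this sketch (the prompt text is a placeholder):

    from oci.generative_ai_inference.models import (
        BaseChatRequest, GenericChatRequest, Message, TextContent,
    )

    msg = Message(
        role="USER",
        content=[TextContent(text="Solve the following optimization problem step by step.")],
    )
    # Cap the response at 4,000 tokens out of the shared
    # 128,000-token prompt + response budget.
    request = GenericChatRequest(
        api_format=BaseChatRequest.API_FORMAT_GENERIC,
        messages=[msg],
        max_tokens=4000,
    )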
Temperature

The level of randomness used to generate the output text. Min: 0, Max: 2, Default: 1

Tip

Start with the temperature set to 0 or a value less than 1, and increase it as you regenerate the prompts for more creative output. High temperatures can introduce hallucinations and factually incorrect information.
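
Assuming the temperature field on the SDK's GenericChatRequest carries this setting, the following sketches a near-deterministic request next to a more creative one (prompt text is a placeholder):

    from oci.generative_ai_inference.models import (
        BaseChatRequest, GenericChatRequest, Message, TextContent,
    )

    prompt = [Message(role="USER", content=[TextContent(text="Write a product tagline.")])]

    # Near-deterministic output: temperature 0.
    precise = GenericChatRequest(
        api_format=BaseChatRequest.API_FORMAT_GENERIC, messages=prompt, temperature=0,
    )
    # More creative (and more error-prone) output: temperature above 1.
    creative = GenericChatRequest(
        api_format=BaseChatRequest.API_FORMAT_GENERIC, messages=prompt, temperature=1.5,
    )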
Top p

A sampling method that controls the cumulative probability of the top tokens to consider for the next token. Assign p a decimal number between 0 and 1. For example, enter 0.75 to sample only from the tokens whose cumulative probability makes up the top 75 percent. Set p to 1 to consider all tokens. Default: 1
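
A sketch of the same request shape with nucleus sampling, assuming the SDK's top_p field corresponds to this parameter:

    from oci.generative_ai_inference.models import (
        BaseChatRequest, GenericChatRequest, Message, TextContent,
    )

    msg = Message(role="USER", content=[TextContent(text="Suggest a team name.")])
    # Sample only from the smallest set of tokens whose cumulative
    # probability reaches 0.75; top_p=1 would consider all tokens.
    request = GenericChatRequest(
        api_format=BaseChatRequest.API_FORMAT_GENERIC,
        messages=[msg],
        top_p=0.75,
    )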

Frequency penalty

A penalty that's assigned to a token when that token appears frequently. High penalties encourage fewer repeated tokens and produce more varied output. Set to 0 to disable. Default: 0

Presence penalty

A penalty that's assigned to each token that has already appeared in the output, encouraging the model to generate tokens that it hasn't used yet. Set to 0 to disable. Default: 0
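
Both penalties are assumed to map to the frequency_penalty and presence_penalty fields on the SDK's GenericChatRequest; the following sketch nudges the model away from repetition:

    from oci.generative_ai_inference.models import (
        BaseChatRequest, GenericChatRequest, Message, TextContent,
    )

    msg = Message(role="USER", content=[TextContent(text="List ten blog post ideas.")])
    # frequency_penalty scales with how often a token has already appeared;
    # presence_penalty applies once a token has appeared at all.
    # Setting either to 0 disables it.
    request = GenericChatRequest(
        api_format=BaseChatRequest.API_FORMAT_GENERIC,
        messages=[msg],
        frequency_penalty=0.3,
        presence_penalty=0.2,
    )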