QuickStart Guide for Building Agents in OCI Generative AI

Here are some high-level tasks to get you started with building agents in OCI Generative AI.

Call OCI Responses API with OpenAI SDK

  1. In the Console, create a project.
  2. Create an OCI Generative AI API key.
  3. Add the following IAM permission:
    allow group <your-group-name> to manage generative-ai-response 
    in tenancy where ALL 
    { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}
  4. Install the official OpenAI SDK

    Python

    pip install openai
    Note

    You use the official OpenAI SDK, not the OCI SDK, to invoke the Responses API. Also ensure you have the latest version of the OpenAI SDK installed.

    You can find support for other languages on the OpenAI libraries page.

  5. Call the OCI Generative AI Responses API endpoint using the OpenAI SDK.

    Example:

    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed
        api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", # replace with your Generative AI API Key created in Step 2
        project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx"  # replace with your Generative AI Project OCID created in Step 1
    )
    
    response = client.responses.create(
        model="xai.grok-4-1-fast-reasoning",
        input="What is 2x2?"
    )
    
    print(response.output_text) # should output a string like "2 x 2 = **4**."

    If "2 x 2 = **4**" prints out, then the OCI Responses API is working.

Endpoints

All agent APIs share the base URL https://inference.generativeai.${region}.oci.oraclecloud.com and accept API key or IAM session authentication:

  • Responses API: /openai/v1/responses
  • Conversations API: /openai/v1/conversations
  • Files API: /openai/v1/files
  • Vector Store Files API: /openai/v1/vector_stores/{id}/files
  • Vector Store Search: /openai/v1/vector_stores/{id}/search
  • Containers API: /openai/v1/containers

The management (CRUD) APIs use the base URL https://generativeai.${region}.oci.oraclecloud.com and accept IAM session authentication only:

  • Project CRUD: /20231130/generativeAiProjects
  • API Key CRUD: /20231130/apikeys
  • Semantic Store CRUD: /20231130/semanticStores
  • Vector Store CRUD: /20231130/openai/v1/vector_stores
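For illustration, the endpoint URLs above can be assembled programmatically by expanding the ${region} placeholder. The helper names below are assumptions for this sketch, not part of any SDK:

```python
# Hypothetical helpers (not part of any SDK): expand the ${region}
# placeholder from the endpoints above into concrete URLs.

def inference_url(region: str, path: str) -> str:
    """Build an agent API URL, e.g. for the Responses API."""
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com{path}"

def management_url(region: str, path: str) -> str:
    """Build a management (CRUD) API URL; these accept IAM sessions only."""
    return f"https://generativeai.{region}.oci.oraclecloud.com{path}"

print(inference_url("us-chicago-1", "/openai/v1/responses"))
# https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1/responses
```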

OCI IAM Authentication

While a long-lived API key is convenient for quick testing, experimentation, and research, it might not meet the security requirements of every production use. Consult Security for your use case.

The OCI Responses API also natively supports OCI IAM authentication, and this section guides you through using it instead of Generative AI API key authentication.

  1. Install the oci-genai-auth library to get the OCI IAM Auth utilities
    pip install oci-genai-auth

    This library gives you a few helper classes that work with the OpenAI SDK:

    • OciResourcePrincipalAuth
    • OciInstancePrincipalAuth
    • OciSessionAuth
    • OciUserPrincipalAuth
  2. Initialize the OpenAI client class with a custom HTTP Client.

    Make the following change to initialize the OpenAI client.

    Example with OciSessionAuth. Use this when you run the code on your laptop for local development.
    
    from openai import OpenAI
    from oci_openai import OciSessionAuth
    import httpx
    
    client = OpenAI(
        base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed 
        api_key="not-used",
        project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx", # replace with your Generative AI Project OCID created in Step 1 
        http_client=httpx.Client(auth=OciSessionAuth(profile_name="DEFAULT")), # change "DEFAULT" to your profile name
    )
    
    response = client.responses.create(
        model="openai.gpt-oss-20b",
        input="What is 2x2?"
    )
    
    print(response.output_text)

    Here's an example with OciResourcePrincipalAuth. Use this when you run the code in managed OCI services such as OCI Functions or OCI Container Engine for Kubernetes.

    from openai import OpenAI
    from oci_openai import OciResourcePrincipalAuth
    import httpx
    
    client = OpenAI(
        base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed 
        api_key="not-used",
        project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",  # replace with your Generative AI API Project OCID created in Step 1 
        http_client=httpx.Client(auth=OciResourcePrincipalAuth()),
    )

IAM Permissions

If Using IAM Auth

Add the following policy to call the Responses API (CreateResponse, GetResponse, DeleteResponse, GetInputItems). Other operations, such as CreateConversation, require additional statements.

allow group <your-group-name> to manage generative-ai-response in tenancy

Or you can use the following policy, which grants access to all resources offered by the Generative AI service. This general policy is useful in sandbox environments when you use the Responses API alongside other APIs such as the Conversations API, because it eliminates authorization issues while you test. Later, when going to production, you can narrow the policy.

allow any-user to manage generative-ai-family in tenancy

If Using GenAI API Key Auth

You need the following extra policy:

allow group <Your-Group> to manage generative-ai-response 
in tenancy where ALL { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}

Or you can use the following general policy:

allow any-user to manage generative-ai-family in tenancy 
where ALL { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}

Enabling Debug Log

Enabling the debug log prints the raw HTTP request, which contains the "opc-request-id". Reference this ID when reporting problems so we can help troubleshoot your errors.

from openai import OpenAI
import logging

logger = logging.getLogger("openai")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Then create and use your OpenAI client as usual
client = OpenAI(
   ...
)

Calling Models

Models served by a 3rd-party model provider
response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="What is 2x2?"
)

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input="What is 2x2?"
)
Models hosted in the on-demand mode
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is 2x2?"
)
Models hosted in the dedicated mode, using your own dedicated AI cluster
response = client.responses.create(
    model="<dedicated-ai-cluster-endpoint-ocid>",
    input="What is 2x2?"
)
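The two forms of the model parameter can be told apart by shape: dedicated AI cluster endpoints are OCIDs, while on-demand and 3rd-party models use dotted aliases. This helper is an assumption for illustration, not an OCI API:

```python
# Hypothetical helper (not part of any SDK): decide which form of the
# `model` parameter a reference uses. Dedicated AI cluster endpoints are
# OCIDs; on-demand and 3rd-party models use aliases like "openai.gpt-oss-120b".

def is_dedicated_endpoint(model_ref: str) -> bool:
    """True when model_ref looks like a dedicated endpoint OCID."""
    return model_ref.startswith("ocid1.")

print(is_dedicated_endpoint("openai.gpt-oss-120b"))  # False
```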

Streaming Responses

All streamed events
response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What are the shapes of OCI GPUs",
    stream=True
)

for event in response_stream:
    print(event)
Only the delta text tokens
response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What are the shapes of OCI GPUs",
    stream=True
)

for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
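The delta events above can be reassembled into the full output text. Here is a local sketch using stand-in event objects (the Event class and helper are ours, not the SDK's):

```python
# A minimal sketch: collect only the text-delta events from a Responses
# stream into the complete output string. Event is a stand-in shaped like
# the `response.output_text.delta` events shown above.
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    delta: str = ""

def collect_output_text(events) -> str:
    """Concatenate text deltas, ignoring all other event types."""
    return "".join(e.delta for e in events if e.type == "response.output_text.delta")

stream = [
    Event("response.created"),
    Event("response.output_text.delta", "2 x 2 "),
    Event("response.output_text.delta", "= 4"),
    Event("response.completed"),
]
print(collect_output_text(stream))  # 2 x 2 = 4
```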

Structured Output

from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


response = client.responses.parse(
    model="openai.gpt-oss-120b",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    store=False,
    text_format=CalendarEvent,
)

event = response.output_parsed
print(event)

Tracing The Calls

Responses output items

The returned response object includes an output field, which is an array. This array contains one or more items that capture what happened during the Responses API call.

The items in the output array can have different types, such as:

  • message
  • web_search_call
  • file_search_call
  • mcp_call
  • mcp_list_tools

These items provide visibility into what happened during the API call. You can use them for observability, to render execution details in your user interface, or both.
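As a minimal observability sketch, you can tally the output array by item type. The helper is hypothetical; it assumes each item exposes a type field as listed above:

```python
# Tally the items in a response's output array by type to see what
# happened during the call (messages, tool calls, MCP activity, ...).
from collections import Counter

def summarize_output(output_items):
    """Count output items by their type field."""
    return Counter(item["type"] for item in output_items)

output = [
    {"type": "mcp_list_tools"},
    {"type": "mcp_call"},
    {"type": "message"},
]
print(summarize_output(output)["mcp_call"])  # 1
```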

Integrating with observability providers

If you need end-to-end tracing and visibility into metrics such as cost and latency, you can integrate the OCI Responses API with an observability provider.

Many observability providers offer built-in support for the Responses API. The following example uses Langfuse to show how to send Responses API calls to Langfuse for tracing and monitoring.

Step 1: Install the Langfuse SDK

pip install langfuse

Step 2: Obtain Langfuse API keys and set environment variables

LANGFUSE_SECRET_KEY="sk-lf-xxxxxxxxx"
LANGFUSE_PUBLIC_KEY="pk-lf-xxxxxxxxx"
LANGFUSE_BASE_URL="https://us.cloud.langfuse.com"

# your other env variables...
OCI_GENAI_API_KEY=sk-xxxxxxxxx
OCI_GENAI_PROJECT_ID=ocid1.generativeaiproject.oc1.iad.xxxxxxxxx

Step 3: Import OpenAI from the Langfuse SDK

import os

from langfuse.openai import OpenAI # you import OpenAI from langfuse.openai

# Everything else below is your existing code (No changes required)
client = OpenAI(
    base_url="https://inference.generativeai.us-ashburn-1.oci.oraclecloud.com/openai/v1",
    api_key=os.getenv("OCI_GENAI_API_KEY"),
    project=os.getenv("OCI_GENAI_PROJECT_ID"),
)

# OpenAI method calls will now be instrumented by Langfuse automatically
response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_label": "dmcp",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="Summarize the langfuse/langfuse-python repo in 3 sentences",
)

print(response.output_text)

Multimodal Inputs

Image Input as Base64-encoded data URL
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("/path/to/image.png")

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{base64_image}",
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)
Image Input as Internet accessible URL
response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {
                    "type": "input_image",
                    "image_url": "https://picsum.photos/id/237/200/300",
                },
            ],
        }
    ],
)

print(response.output_text)
File Input as File ID
Note

The file ID input feature is supported only with Google Gemini models. For each request, the combined size of all uploaded PDF files must be under 50 MB, and you can provide at most 10 file IDs.
file = client.files.create(
    file=open("/path/to/file.pdf", "rb"),
    purpose="user_data"
)

response = client.responses.create(
    model="google.gemini-2.5-pro", # file ID input requires a Gemini model
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": "What's discussed in the file?",
                },
            ]
        }
    ]
)

print(response.output_text)
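Given the limits in the note above, you may want to validate uploads client-side before creating files. This helper is a hypothetical sketch, not part of the API:

```python
# Enforce the documented file-ID input limits client-side:
# at most 10 file IDs per request, combined PDF size under 50 MB.

MAX_FILE_IDS = 10
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # 50 MB

def check_file_inputs(sizes_bytes: list[int]) -> None:
    """Raise ValueError if the planned request breaks either limit."""
    if len(sizes_bytes) > MAX_FILE_IDS:
        raise ValueError(f"too many file IDs: {len(sizes_bytes)} > {MAX_FILE_IDS}")
    total = sum(sizes_bytes)
    if total >= MAX_TOTAL_BYTES:
        raise ValueError(f"combined size {total} bytes is not under 50 MB")

check_file_inputs([10 * 1024 * 1024, 20 * 1024 * 1024])  # 2 files, 30 MB: OK
```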
File Input as Internet accessible URL
response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what is in this file?"},
                {
                    "type": "input_file",
                    "file_url": "https://www.example.com/letters/2025ltr.pdf",
                },
            ],
        }
    ],
)

print(response.output_text)

Reasoning

Reasoning effort
import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is the answer to 12 * (3 + 9)?",
    reasoning={"effort": "high"},
    store=False,
)

pretty_output = json.dumps(response.to_dict()["output"], indent=4)
print(pretty_output)
Reasoning summary output

If you're building a chatbot, we strongly recommend enabling the reasoning summary. With summaries enabled, users can see reasoning tokens during streaming while the model is thinking.

import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is the answer to 12 * (3 + 9)?",
    reasoning={"summary": "auto"},
    store=False,
)

pretty_output = json.dumps(response.to_dict()["output"], indent=4)
print(pretty_output)

Multi-User-Turn Conversation

OCI Generative AI simplifies building multi-user-turn conversations by managing conversation state for you. We offer two variants: (1) responses chaining and (2) the Conversations API.

Using Responses Chaining
# first turn
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="tell me a joke. keep it short",
)
print("Response 1: ", response1.output_text)


# second turn, chaining to the first turn
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="why is it funny?",
    previous_response_id=response1.id, # chaining to response1
)
print("Response 2: ", response2.output_text)
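The chaining pattern above generalizes to a loop. This helper is our own sketch; it assumes client is the OpenAI client configured earlier:

```python
# A sketch of multi-turn chaining: each request passes the previous
# response's ID so the service can reconstruct the conversation state.

def run_chained_turns(client, model, turns):
    """Send each user turn in order, chaining via previous_response_id."""
    previous_id = None
    outputs = []
    for turn in turns:
        response = client.responses.create(
            model=model,
            input=turn,
            previous_response_id=previous_id,
        )
        previous_id = response.id  # chain the next turn to this response
        outputs.append(response.output_text)
    return outputs
```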
Using Conversations API
# create a conversation upfront
conversation = client.conversations.create(
    metadata={"topic": "demo"}
)
print("Conversation ID: ", conversation.id)

# first turn on the conversation
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="tell me a joke. keep it short",
    conversation=conversation.id,
)
print("Response 1: ", response1.output_text)

# second turn on the same conversation
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="why is it funny?",
    conversation=conversation.id,
)
print("Response 2: ", response2.output_text)