QuickStart Guide for Building Agents in OCI Generative AI

Here are some high-level tasks to get you started with building agents in OCI Generative AI.

Call OCI Responses API with OpenAI SDK

  1. In the Console, create a project.
  2. Create an OCI Generative AI API key.
  3. Add the following IAM permission:
    allow group <your-group-name> to manage generative-ai-response 
    in tenancy where ALL 
    { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}
  4. Install the official OpenAI SDK

    Python

    pip install openai
    Note

    You use the official OpenAI SDK, not the OCI SDK, to invoke the Responses API. Also ensure you have the latest version of the OpenAI SDK installed.

    You can find support for other languages on the OpenAI libraries page.

  5. Call the OCI Generative AI Responses API endpoint using the OpenAI SDK.

    Example:

    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed
        api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", # replace with your Generative AI API Key created in Step 2
        project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx"  # replace with your Generative AI Project OCID created in Step 1
    )
    
    response = client.responses.create(
        model="xai.grok-4-1-fast-reasoning",
        input="What is 2x2?"
    )
    
    print(response.output_text) # should output a string like "2 x 2 = **4**."

    If "2 x 2 = **4**" prints out, then the OCI Responses API is working.

Endpoints

All agent APIs share the base URL https://inference.generativeai.${region}.oci.oraclecloud.com and accept API key or IAM session authentication:

  • Responses API: /openai/v1/responses
  • Conversations API: /openai/v1/conversations
  • Files API: /openai/v1/files
  • Vector Store Files API: /openai/v1/vector_stores/{id}/files
  • Vector Store Search: /openai/v1/vector_stores/{id}/search
  • Containers API: /openai/v1/containers

The management (CRUD) APIs use the base URL https://generativeai.${region}.oci.oraclecloud.com and accept IAM session authentication only:

  • Project CRUD: /20231130/generativeAiProjects
  • API Key CRUD: /20231130/apikeys
  • Semantic Store CRUD: /20231130/semanticStores
  • Vector Store CRUD: /20231130/openai/v1/vector_stores
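For illustration, the endpoint URLs above can be assembled programmatically by expanding the ${region} placeholder. The helper names below are assumptions for this sketch, not part of any SDK:

```python
# Hypothetical helpers (not part of any SDK): expand the ${region}
# placeholder from the endpoints above into concrete URLs.

def inference_url(region: str, path: str) -> str:
    """Build an agent API URL, e.g. for the Responses API."""
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com{path}"

def management_url(region: str, path: str) -> str:
    """Build a management (CRUD) API URL; these accept IAM sessions only."""
    return f"https://generativeai.{region}.oci.oraclecloud.com{path}"

print(inference_url("us-chicago-1", "/openai/v1/responses"))
# https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1/responses
```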

OCI IAM Authentication

While a long-lived API key is convenient for quick testing, experimentation, and research, it might not meet the security requirements of every production use. Consult Security for your use case.

The OCI Responses API also natively supports OCI IAM authentication, and this section guides you through using it instead of Generative AI API key authentication.

  1. Install the oci-genai-auth library to get the OCI IAM Auth utilities
    pip install oci-genai-auth

    This library gives you a few helper classes that work with the OpenAI SDK:

    • OciResourcePrincipalAuth
    • OciInstancePrincipalAuth
    • OciSessionAuth
    • OciUserPrincipalAuth
  2. Initialize the OpenAI client class with a custom HTTP Client.

    Make the following change to initialize the OpenAI client.

    Example with OciSessionAuth. Use this when you run the code on your laptop for local development.
    
    from openai import OpenAI
    from oci_openai import OciSessionAuth
    import httpx
    
    client = OpenAI(
        base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed 
        api_key="not-used",
        project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx", # replace with your Generative AI Project OCID created in Step 1 
        http_client=httpx.Client(auth=OciSessionAuth(profile_name="DEFAULT")), # change "DEFAULT" to your profile name
    )
    
    response = client.responses.create(
        model="openai.gpt-oss-20b",
        input="What is 2x2?"
    )
    
    print(response.output_text)

    Here's an example with OciResourcePrincipalAuth. Use this when you run the code in managed OCI services such as OCI Functions or OCI Container Engine for Kubernetes.

    from openai import OpenAI
    from oci_openai import OciResourcePrincipalAuth
    import httpx
    
    client = OpenAI(
        base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed 
        api_key="not-used",
        project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",  # replace with your Generative AI API Project OCID created in Step 1 
        http_client=httpx.Client(auth=OciResourcePrincipalAuth()),
    )

IAM Permissions

If Using IAM Auth

Add the following policy to call the Responses API (CreateResponse, GetResponse, DeleteResponse, GetInputItems). Other operations, such as CreateConversation, require additional statements.

allow group <your-group-name> to manage generative-ai-response in tenancy

Or you can use the following policy, which grants access to all resources offered by the Generative AI service. This general policy is useful in sandbox environments when you use the Responses API alongside other APIs such as the Conversations API, because it eliminates authorization issues while you test. Later, when going to production, you can narrow the policy.

allow any-user to manage generative-ai-family in tenancy

If Using GenAI API Key Auth

You need the following extra policy:

allow group <Your-Group> to manage generative-ai-response 
in tenancy where ALL { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}

Or you can use the following general policy:

allow any-user to manage generative-ai-family in tenancy 
where ALL { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}

Enabling Debug Log

Enabling the debug log prints the raw HTTP request, which contains the "opc-request-id". Reference this ID when reporting problems so we can help troubleshoot your errors.

from openai import OpenAI
import logging

logger = logging.getLogger("openai")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Then create and use your OpenAI client as usual
client = OpenAI(
   ...
)

Calling Models

Models served by a 3rd-party model provider
response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="What is 2x2?"
)

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input="What is 2x2?"
)
Models hosted in the on-demand mode
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is 2x2?"
)
Models hosted in the dedicated mode, using your own dedicated AI cluster
response = client.responses.create(
    model="<dedicated-ai-cluster-endpoint-ocid>",
    input="What is 2x2?"
)
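The two forms of the model parameter can be told apart by shape: dedicated AI cluster endpoints are OCIDs, while on-demand and 3rd-party models use dotted aliases. This helper is an assumption for illustration, not an OCI API:

```python
# Hypothetical helper (not part of any SDK): decide which form of the
# `model` parameter a reference uses. Dedicated AI cluster endpoints are
# OCIDs; on-demand and 3rd-party models use aliases like "openai.gpt-oss-120b".

def is_dedicated_endpoint(model_ref: str) -> bool:
    """True when model_ref looks like a dedicated endpoint OCID."""
    return model_ref.startswith("ocid1.")

print(is_dedicated_endpoint("openai.gpt-oss-120b"))  # False
```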

Streaming Responses

All streamed events
response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What are the shapes of OCI GPUs",
    stream=True
)

for event in response_stream:
    print(event)
Only the delta text tokens
response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What are the shapes of OCI GPUs",
    stream=True
)

for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
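The delta events above can be reassembled into the full output text. Here is a local sketch using stand-in event objects (the Event class and helper are ours, not the SDK's):

```python
# A minimal sketch: collect only the text-delta events from a Responses
# stream into the complete output string. Event is a stand-in shaped like
# the `response.output_text.delta` events shown above.
from dataclasses import dataclass

@dataclass
class Event:
    type: str
    delta: str = ""

def collect_output_text(events) -> str:
    """Concatenate text deltas, ignoring all other event types."""
    return "".join(e.delta for e in events if e.type == "response.output_text.delta")

stream = [
    Event("response.created"),
    Event("response.output_text.delta", "2 x 2 "),
    Event("response.output_text.delta", "= 4"),
    Event("response.completed"),
]
print(collect_output_text(stream))  # 2 x 2 = 4
```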

Structured Output

from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


response = client.responses.parse(
    model="openai.gpt-oss-120b",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    store=False,
    text_format=CalendarEvent,
)

event = response.output_parsed
print(event)

Tracing The Calls

Responses output items

The returned response object includes an output field, which is an array. This array contains one or more items that capture what happened during the Responses API call.

The items in the output array can have different types, such as:

  • message
  • web_search_call
  • file_search_call
  • mcp_call
  • mcp_list_tools

These items provide visibility into what happened during the API call. You can use them for observability, to render execution details in your user interface, or both.
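As a minimal observability sketch, you can tally the output array by item type. The helper is hypothetical; it assumes each item exposes a type field as listed above:

```python
# Tally the items in a response's output array by type to see what
# happened during the call (messages, tool calls, MCP activity, ...).
from collections import Counter

def summarize_output(output_items):
    """Count output items by their type field."""
    return Counter(item["type"] for item in output_items)

output = [
    {"type": "mcp_list_tools"},
    {"type": "mcp_call"},
    {"type": "message"},
]
print(summarize_output(output)["mcp_call"])  # 1
```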

Integrating with observability providers

If you need end-to-end tracing and visibility into metrics such as cost and latency, you can integrate the OCI Responses API with an observability provider.

Many observability providers offer built-in support for the Responses API. The following example uses Langfuse to show how to send Responses API calls to Langfuse for tracing and monitoring.

Step 1: Install the Langfuse SDK

pip install langfuse

Step 2: Obtain Langfuse API keys and set environment variables

LANGFUSE_SECRET_KEY="sk-lf-xxxxxxxxx"
LANGFUSE_PUBLIC_KEY="pk-lf-xxxxxxxxx"
LANGFUSE_BASE_URL="https://us.cloud.langfuse.com"

# your other env variables...
OCI_GENAI_API_KEY=sk-xxxxxxxxx
OCI_GENAI_PROJECT_ID=ocid1.generativeaiproject.oc1.iad.xxxxxxxxx

Step 3: Import OpenAI from the Langfuse SDK

import os

from langfuse.openai import OpenAI # you import OpenAI from langfuse.openai

# Everything else below is your existing code (No changes required)
client = OpenAI(
    base_url="https://inference.generativeai.us-ashburn-1.oci.oraclecloud.com/openai/v1",
    api_key=os.getenv("OCI_GENAI_API_KEY"),
    project=os.getenv("OCI_GENAI_PROJECT_ID"),
)

# OpenAI method calls will now be instrumented by Langfuse automatically
response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_label": "dmcp",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="Summarize the langfuse/langfuse-python repo in 3 sentences",
)

print(response.output_text)

Multimodal Inputs

Image Input as Base64-encoded data URL
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("/path/to/image.png")

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{base64_image}",
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)
Image Input as Internet accessible URL
response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {
                    "type": "input_image",
                    "image_url": "https://picsum.photos/id/237/200/300",
                },
            ],
        }
    ],
)

print(response.output_text)
File Input as File ID
Note

The file ID input feature is supported only with Google Gemini models. For each request, the combined size of all uploaded PDF files must be under 50 MB, and you can provide at most 10 file IDs.
file = client.files.create(
    file=open("/path/to/file.pdf", "rb"),
    purpose="user_data"
)

response = client.responses.create(
    model="google.gemini-2.5-pro", # file ID input requires a Gemini model
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": "What's discussed in the file?",
                },
            ]
        }
    ]
)

print(response.output_text)
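Given the limits in the note above, you may want to validate uploads client-side before creating files. This helper is a hypothetical sketch, not part of the API:

```python
# Enforce the documented file-ID input limits client-side:
# at most 10 file IDs per request, combined PDF size under 50 MB.

MAX_FILE_IDS = 10
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # 50 MB

def check_file_inputs(sizes_bytes: list[int]) -> None:
    """Raise ValueError if the planned request breaks either limit."""
    if len(sizes_bytes) > MAX_FILE_IDS:
        raise ValueError(f"too many file IDs: {len(sizes_bytes)} > {MAX_FILE_IDS}")
    total = sum(sizes_bytes)
    if total >= MAX_TOTAL_BYTES:
        raise ValueError(f"combined size {total} bytes is not under 50 MB")

check_file_inputs([10 * 1024 * 1024, 20 * 1024 * 1024])  # 2 files, 30 MB: OK
```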
File Input as Internet accessible URL
response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what is in this file?"},
                {
                    "type": "input_file",
                    "file_url": "https://www.example.com/letters/2025ltr.pdf",
                },
            ],
        }
    ],
)

print(response.output_text)

Reasoning

Reasoning effort
import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is the answer to 12 * (3 + 9)?",
    reasoning={"effort": "high"},
    store=False,
)

pretty_output = json.dumps(response.to_dict()["output"], indent=4)
print(pretty_output)
Reasoning summary output

If you're building a chatbot, we strongly recommend enabling the reasoning summary. With summaries enabled, users can see reasoning tokens during streaming while the model is thinking.

import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is the answer to 12 * (3 + 9)?",
    reasoning={"summary": "auto"},
    store=False,
)

pretty_output = json.dumps(response.to_dict()["output"], indent=4)
print(pretty_output)

Multi-User-Turn Conversation

OCI Generative AI simplifies building multi-user-turn conversations by managing conversation state for you. We offer two variants: (1) responses chaining and (2) the Conversations API.

Using Responses Chaining
# first turn
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="tell me a joke. keep it short",
)
print("Response 1: ", response1.output_text)


# second turn, chaining to the first turn
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="why is it funny?",
    previous_response_id=response1.id, # chaining to response1
)
print("Response 2: ", response2.output_text)
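The chaining pattern above generalizes to a loop. This helper is our own sketch; it assumes client is the OpenAI client configured earlier:

```python
# A sketch of multi-turn chaining: each request passes the previous
# response's ID so the service can reconstruct the conversation state.

def run_chained_turns(client, model, turns):
    """Send each user turn in order, chaining via previous_response_id."""
    previous_id = None
    outputs = []
    for turn in turns:
        response = client.responses.create(
            model=model,
            input=turn,
            previous_response_id=previous_id,
        )
        previous_id = response.id  # chain the next turn to this response
        outputs.append(response.output_text)
    return outputs
```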
Using Conversations API
# create a conversation upfront
conversation = client.conversations.create(
    metadata={"topic": "demo"}
)
print("Conversation ID: ", conversation.id)

# first turn on the conversation
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="tell me a joke. keep it short",
    conversation=conversation.id,
)
print("Response 1: ", response1.output_text)

# second turn on the same conversation
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="why is it funny?",
    conversation=conversation.id,
)
print("Response 2: ", response2.output_text)