QuickStart Guide for Building Agents in OCI Generative AI
Here are some high-level tasks to get you started with building agents in OCI Generative AI.
Call OCI Responses API with OpenAI SDK
This section shows how to call the OCI Responses API using the OpenAI SDK.
Endpoints
| API | Base URL | Authentication | Endpoint path |
|---|---|---|---|
| Responses API | https://inference.generativeai.${region}.oci.oraclecloud.com | API key or IAM session | /openai/v1/responses |
| Conversations API | https://inference.generativeai.${region}.oci.oraclecloud.com | API key or IAM session | /openai/v1/conversations |
| Files API | https://inference.generativeai.${region}.oci.oraclecloud.com | API key or IAM session | /openai/v1/files |
| Vector Store Files API | https://inference.generativeai.${region}.oci.oraclecloud.com | API key or IAM session | /openai/v1/vector_stores/{id}/files |
| Vector Store Search | https://inference.generativeai.${region}.oci.oraclecloud.com | API key or IAM session | /openai/v1/vector_stores/{id}/search |
| Containers API | https://inference.generativeai.${region}.oci.oraclecloud.com | API key or IAM session | /openai/v1/containers |
| Project CRUD | https://generativeai.${region}.oci.oraclecloud.com | IAM session only | /20231130/generativeAiProjects |
| API Key CRUD | https://generativeai.${region}.oci.oraclecloud.com | IAM session only | /20231130/apikeys |
| Semantic Store CRUD | https://generativeai.${region}.oci.oraclecloud.com | IAM session only | /20231130/semanticStores |
| Vector Store CRUD | https://generativeai.${region}.oci.oraclecloud.com | IAM session only | /20231130/openai/v1/vector_stores |
OCI IAM Authentication
While a long-lived API key is convenient for quick testing, experimentation, and research, it might not meet the security requirements of every production deployment. Consult Security for your use case.
The OCI Responses API also natively supports OCI IAM authentication, and this section guides you to use it instead of Generative AI API key authentication.
IAM Permissions
If Using IAM Auth
Add the following policy to call the Responses API (CreateResponse, GetResponse, DeleteResponse, GetInputItems). Other APIs, such as CreateConversation, require additional statements.
allow group <your-group-name> to manage generative-ai-response in tenancy
Or you can use the following policy, which grants access to all resources offered by the Generative AI service. This general policy is useful in sandbox environments where you call the Responses API alongside other APIs, such as the Conversations API, because it eliminates authorization issues while you test. Narrow the policy before going to production.
allow any-user to manage generative-ai-family in tenancy
If Using GenAI API Key Auth
You need the following extra policy:
allow group <Your-Group> to manage generative-ai-response
in tenancy where ALL { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}

Or you can use the following general policy:
allow any-user to manage generative-ai-family in tenancy
where ALL { request.principal.type='generativeaiapikey', request.principal.id='<your-api-key-OCID>'}

Enabling Debug Log
Enabling the debug log prints the raw HTTP POST request, which contains the "opc-request-id". Include this ID when contacting support so we can help troubleshoot your errors.
```python
from openai import OpenAI
import logging

logger = logging.getLogger("openai")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Then create and use your OpenAI client as usual
client = OpenAI(
    ...
)
```
Calling Models
- Model served by a 3rd-party model provider

```python
response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="What is 2x2?",
)

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input="What is 2x2?",
)
```

- Models hosted in the on-demand mode

```python
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is 2x2?",
)
```

- Models hosted in the dedicated mode, using your own dedicated AI cluster

```python
response = client.responses.create(
    model="<dedicated-ai-cluster-endpoint-ocid>",
    input="What is 2x2?",
)
```
Streaming Responses
- All streamed events

```python
response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What are the shapes of OCI GPUs",
    stream=True,
)
for event in response_stream:
    print(event)
```

- Only the delta text tokens

```python
response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What are the shapes of OCI GPUs",
    stream=True,
)
for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```
Structured Output
```python
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="openai.gpt-oss-120b",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    store=False,
    text_format=CalendarEvent,
)

event = response.output_parsed
print(event)
```
Tracing The Calls
- Responses output items

  The returned response object includes an `output` field, which is an array. This array contains one or more items that capture what happened during the Responses API call.

  The items in the `output` array can have different types, such as:

  - `message`
  - `web_search_call`
  - `file_search_call`
  - `mcp_call`
  - `mcp_list_tools`

  These items provide visibility into what happened during the API call. You can use them for observability, to render execution details in your user interface, or both.
- Integrating with observability providers
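As a minimal sketch of tracing these output items, you can tally them by type before logging or rendering them. The sample array below is hypothetical; a real one comes from response.to_dict()["output"]:

```python
# Hypothetical output array using item types from the list above;
# a real array comes from response.to_dict()["output"].
output = [
    {"type": "web_search_call", "status": "completed"},
    {"type": "mcp_list_tools", "tools": []},
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "..."}]},
]

def count_item_types(items):
    """Count output items by type, e.g. for logging or UI rendering."""
    counts = {}
    for item in items:
        counts[item["type"]] = counts.get(item["type"], 0) + 1
    return counts

print(count_item_types(output))
# → {'web_search_call': 1, 'mcp_list_tools': 1, 'message': 1}
```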
Multimodal Inputs
- Image Input as Base64-encoded data URL

```python
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("/path/to/image.png")

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                },
            ],
        }
    ],
)
print(response.output_text)
```

- Image Input as Internet accessible URL

```python
response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what's in this image?"},
                {
                    "type": "input_image",
                    "image_url": "https://picsum.photos/id/237/200/300",
                },
            ],
        }
    ],
)
print(response.output_text)
```

- File Input as File ID

  Note: file ID as input is only supported with Google Gemini models. For each request, the combined size of all uploaded PDF files must be under 50 MB, and you can provide a maximum of 10 file IDs in the request.

```python
file = client.files.create(
    file=open("/path/to/file.pdf", "rb"),
    purpose="user_data",
)

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": "What's discussed in the file?",
                },
            ],
        }
    ],
)
print(response.output_text)
```

- File Input as Internet accessible URL

```python
response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what is in this file?"},
                {
                    "type": "input_file",
                    "file_url": "https://www.example.com/letters/2025ltr.pdf",
                },
            ],
        }
    ],
)
print(response.output_text)
```
Reasoning
- Reasoning effort

```python
import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is the answer to 12 * (3 + 9)?",
    reasoning={"effort": "high"},
    store=False,
)
pretty_output = json.dumps(response.to_dict()["output"], indent=4)
print(pretty_output)
```

- Reasoning summary output

  If you're building a chatbot, we strongly recommend enabling the reasoning summary. With it, users can see reasoning tokens during streaming while the model is thinking.

```python
import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="What is the answer to 12 * (3 + 9)?",
    reasoning={"summary": "auto"},
    store=False,
)
pretty_output = json.dumps(response.to_dict()["output"], indent=4)
print(pretty_output)
```
Multi-User-Turn Conversation
OCI Generative AI simplifies building multi-user-turn conversations by managing conversation state for you. We offer two variants to achieve that: (1) Responses chaining and (2) the Conversations API.
- Using Responses Chaining

```python
# first turn
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="tell me a joke. keep it short",
)
print("Response 1: ", response1.output_text)

# second turn, chaining to the first turn
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="why is it funny?",
    previous_response_id=response1.id,  # chaining to response1
)
print("Response 2: ", response2.output_text)
```

- Using Conversations API

```python
# create a conversation upfront
conversation = client.conversations.create(
    metadata={"topic": "demo"}
)
print("Conversation ID: ", conversation.id)

# first turn on the conversation
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="tell me a joke. keep it short",
    conversation=conversation.id,
)
print("Response 1: ", response1.output_text)

# second turn on the same conversation
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="why is it funny?",
    conversation=conversation.id,
)
print("Response 2: ", response2.output_text)
```