Managing Private Endpoints

Private endpoints provide secure, private access to OCI Generative AI models within a virtual cloud network (VCN). You can create private endpoints for on-demand models and for pretrained and custom models hosted on dedicated AI clusters.

About

A private endpoint is a private IP address in a VCN that provides private access to an OCI service. For Generative AI, private endpoints let you access large language models from within a VCN. The service creates the private endpoint in a private subnet that you select and maintains its availability. You control access using routing, security lists, and network security groups (NSGs).

To learn more, see About Private Endpoints and security rules.

When you create a private endpoint in OCI Generative AI, you receive a fully qualified domain name (FQDN) for it, regardless of whether you select Allow Usage In On-Demand Mode. You can use the private endpoint in two ways:

  • Attach it to an endpoint on a dedicated AI cluster to access the cluster (and its hosted models) through the private endpoint.
  • Access on-demand models through it, if you enable Allow Usage In On-Demand Mode.

Regions

Private endpoints are supported for all models listed on the Generative AI Models by Region page in the commercial (OC1), government (OC4), and sovereign (OC19) regions where the models are available.

Access

To access a model through a private endpoint, run a client from a network that has private connectivity to the endpoint subnet, and call the model using the private endpoint FQDN.
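As a quick way to confirm that a client host has private connectivity to the endpoint, the following minimal Python sketch tries to open a TCP connection to the private endpoint FQDN. The helper name can_reach, the port 443 (assuming the endpoint is served over HTTPS), and the timeout value are illustrative assumptions, not part of the Generative AI service. A successful connection only shows that name resolution, routing, and security rules allow the traffic; it does not show that a model call is authorized.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `host` would be the private endpoint FQDN. Port 443 is an assumption (HTTPS).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections,
        # unreachable routes, and timeouts.
        return False
```

Run the check from the host that will call the model, passing the FQDN that you received when you created the private endpoint. A False result usually points to DNS resolution, routing, security lists, or NSG rules on the path to the endpoint subnet.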

Common access paths include:

  • Same VCN: from any subnet in the VCN (subject to routing, NSG rules, and security lists).
  • Peered VCNs: through a local peering gateway (LPG) or a dynamic routing gateway (DRG), for example in a hub-and-spoke topology.
  • On-premises or other private networks: through an Internet Protocol Security (IPSec) VPN or FastConnect to a DRG.
  • Administration: use OCI Bastion to reach a private host in the VCN, then call the endpoint from there.

Note

Ensure that the private endpoint FQDN resolves to the private IP.
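One way to check this from a client host is to resolve the FQDN with the local resolver and confirm that every returned address lies inside the private endpoint's subnet. The sketch below uses only the Python standard library; the function names and the idea of comparing against the subnet CIDR are illustrative assumptions, not an official procedure.

```python
import ipaddress
import socket


def resolved_addresses(fqdn, port=443):
    """Return the distinct IP addresses the local resolver gives for `fqdn`."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def all_in_subnet(addresses, subnet_cidr):
    """True if at least one address is given and every one lies in `subnet_cidr`."""
    network = ipaddress.ip_network(subnet_cidr)
    return bool(addresses) and all(
        ipaddress.ip_address(addr) in network for addr in addresses
    )
```

For example, if the private endpoint is in a subnet with the hypothetical CIDR 10.0.1.0/24, then all_in_subnet(resolved_addresses(fqdn), "10.0.1.0/24") should return True when fqdn is the private endpoint FQDN and DNS is configured correctly. A False result, or a socket.gaierror raised during resolution, suggests that the client is using a resolver that does not know the private record.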

Limits

By default, a tenancy can have up to 5 private endpoints. To create more, request a service limit increase for private-endpoint-count in the Generative AI service.