Model Deployment

Follow these steps to deploy models with AI Quick Actions.

Model Deployment Creation

You can create a Model Deployment from foundation models tagged Ready to Deploy in the Model Explorer, or from fine-tuned models. When you create a Model Deployment in AI Quick Actions, you create an OCI Data Science Model Deployment, a managed resource in the OCI Data Science service. The model is deployed as an HTTP endpoint in OCI.

You need the necessary policies to use Data Science Model Deployment. You can select the compute shape for the model deployment and set up logging to monitor it. Logging is optional but highly recommended to help troubleshoot errors with the model deployment. Enabling logging also requires a policy; see Model Deployment Logs for more information on logs. Under advanced options, you can select the number of instances to deploy and the Load Balancer bandwidth.
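As a sketch of what such policies can look like, the statements below use a placeholder group name (DataScienceGroup) and compartment; the exact statements depend on your tenancy's groups, compartments, and security requirements:

```
allow group DataScienceGroup to manage data-science-model-deployments in compartment <compartment-name>
allow group DataScienceGroup to manage log-groups in compartment <compartment-name>
allow group DataScienceGroup to use log-content in compartment <compartment-name>
```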

See Model Deployment on GitHub for more information and tips on deploying models.

Note

To access model deployments using private endpoints, create a notebook session with the network type set to custom networking. The custom egress must reside in the same VCN and subnet as the private endpoint resource.
  • You can deploy models using three options on the Deploy model page:

    • Deploy single model: Deploy one model on a compute shape.
    • Deploy multi model: Deploy multiple models on a single compute instance.
    • Deploy model stack: Deploy a base model and multiple fine-tuned variants as a stack on a single compute shape.

    This section describes each deployment option.

    1. Navigate to the Model Explorer.
    2. Select the model card for the model you want to deploy.
    3. Select Deploy to deploy the model. The Deploy model page is displayed with the three options. Follow the steps for the deployment option you select:
    4. Deploy Single Model:
      1. Give the deployment a name.
      2. Select a compute shape.
      3. (Optional) Select a log group.
      4. (Optional) Select a predict log and an access log.
      5. (Optional) Select a private endpoint.
        Note

        A private endpoint must be created as a prerequisite for the model deployment resource.

        The private endpoint feature for model deployment is only enabled in the OC1 realm. For other realms, create a service request for Data Science.

        The list for selecting a private endpoint in a model deployment appears in the Console only if a private endpoint exists in the compartment.
      6. Select Show advanced options.
      7. Update the instance count and the Load Balancer bandwidth.
      8. (Optional) Under Inference container, select an inference container.
      9. (Optional) Select Inference mode.
      10. Select Deploy.
    5. Deploy Multi Model: Only the vLLM container is supported. Service-managed models can be combined in one deployment.
      1. Enter a deployment name.
      2. Select the models to deploy.
      3. Select a compute shape suitable for the models.
      4. (Optional) Configure log groups, predict and access logs, or private endpoints.
      5. Select Show advanced options.
      6. Update the instance count and the Load Balancer bandwidth.
      7. (Optional) Under Inference container, select an inference container.
      8. (Optional) Select Inference mode.
      9. Select Deploy.
    6. Deploy Model Stack: Only the vLLM container is supported.
      1. Enter a deployment name.
      2. Select the base model.
      3. Select the fine-tuned weights or variants.
      4. Select a compute shape.
      5. (Optional) Configure log groups, predict and access logs, or private endpoints.

        Logging is recommended for tracking and troubleshooting deployment operations.

      6. Select Show advanced options.
      7. Update the instance count and the Load Balancer bandwidth.
      8. (Optional) Under Inference container, select an inference container.
      9. (Optional) Select Inference mode.
      10. Select Deploy.
    7. Under AI Quick Actions, select Deployments.
      The list of model deployments is shown. For the deployment you just created, wait for its Lifecycle state to become Active before selecting it.
    8. Scroll to display the Inference Window.
    9. Enter text in Prompt to test the model.
    10. (Optional) Adjust the model parameters as appropriate.
    11. Select Generate.
      The output is displayed in Response.
  • For a complete list of parameters and values for AI Quick Actions CLI commands, see AI Quick Actions CLI.

  • This task can't be performed using the API.

Invoke Model Deployment in AI Quick Actions

You can invoke a model deployment in AI Quick Actions from the CLI or the Python SDK.

For more information, see the section on model deployment tips in GitHub.
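As a minimal sketch of a programmatic invocation, the Python below builds an invoke request. The URL pattern and the payload fields (prompt, max_tokens) are assumptions that vary by inference container, so check the deployment's details page for the actual endpoint and request schema:

```python
# Minimal sketch of invoking a model deployment endpoint from Python.
# ASSUMPTIONS: the invoke URL pattern and the payload fields below are
# illustrative; check the deployment's details page for the real
# endpoint and the request schema of your inference container.
import json

def build_invoke_request(region, deployment_ocid, prompt, max_tokens=250):
    """Build the URL and JSON body for a completion-style request."""
    url = (f"https://modeldeployment.{region}.oci.customer-oci.com"
           f"/{deployment_ocid}/predict")
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens})
    return url, body

url, body = build_invoke_request(
    "us-ashburn-1",
    "ocid1.datasciencemodeldeployment.oc1..example",  # placeholder OCID
    "What is model deployment?",
)

# Sending the request needs OCI request signing (not in the standard
# library). With the oci SDK installed, a config-file signer can be used:
#   import oci, requests
#   config = oci.config.from_file()
#   signer = oci.signer.Signer(
#       tenancy=config["tenancy"], user=config["user"],
#       fingerprint=config["fingerprint"],
#       private_key_file_location=config["key_file"])
#   requests.post(url, data=body, auth=signer,
#                 headers={"Content-Type": "application/json"})
```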

Model Artifacts

Where to find model artifacts.

When a model is downloaded into a Model Deployment instance, it's downloaded to the /opt/ds/model/deployed_model/<object_storage_folder_name_and_path> folder.
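As a quick illustration, the artifact location can be computed from the base path above; the folder name used here is a hypothetical stand-in for <object_storage_folder_name_and_path>:

```python
# Sketch of where AI Quick Actions places downloaded model artifacts.
# The base path comes from the documentation; the folder argument below
# stands in for <object_storage_folder_name_and_path>.
from pathlib import Path

ARTIFACT_ROOT = Path("/opt/ds/model/deployed_model")

def artifact_dir(object_storage_folder_and_path):
    """Return the directory holding a model's downloaded files."""
    return ARTIFACT_ROOT / object_storage_folder_and_path

# Example with a hypothetical folder name:
model_dir = artifact_dir("my-bucket-folder/llama3")
# On a live instance, list the files with: sorted(model_dir.iterdir())
```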

Using Model Deployments in Autonomous Database Select AI

You can make model deployments created with AI Quick Actions available for natural language querying with Oracle Autonomous Database Select AI.

Prerequisites

  • A completed model deployment and its Model Deployment OCID.
  • Autonomous Database instance with Select AI enabled. See Select AI with Autonomous Database.
  • Required Oracle Cloud Infrastructure (OCI) permissions to create credentials and profiles.
  1. In the Autonomous Database, create a credential for accessing the model deployment.
    BEGIN
        DBMS_CLOUD.create_credential(
            credential_name   => '<CREDENTIAL_NAME>',
            user_ocid         => '<USER_OCID>',
            tenancy_ocid      => '<TENANCY_OCID>',
            private_key       => '<PRIVATE_KEY>',
            fingerprint       => '<FINGERPRINT>'
        );
    END;
    /

    Replace each placeholder with specific values:

    • <CREDENTIAL_NAME>: Name for the credential
    • <USER_OCID>: OCI user OCID
    • <TENANCY_OCID>: OCI tenancy OCID
    • <PRIVATE_KEY>: API private key in PEM format
    • <FINGERPRINT>: API public key fingerprint

    See Managing Credentials for details.

  2. Create a Select AI profile to connect the autonomous database to your deployed model.

    BEGIN
        DBMS_CLOUD_AI.CREATE_PROFILE(
            profile_name => '<PROFILE_NAME>',
            attributes => '
    {
      "credential_name": "<CREDENTIAL_NAME>",
      "model": "<MODEL_NAME>",
      "provider": "openai",
      "provider_endpoint": "<MODEL_DEPLOYMENT_OCID>",
      "conversation": "",
      "object_list": [
        {"owner": "ADMIN", "name": "customers"}
      ]
    }'
        );
    END;
    /

    Replace the placeholders:
    • <PROFILE_NAME>: Name of the profile.
    • <CREDENTIAL_NAME>: Name of the credential from step 1.
    • <MODEL_NAME>: Name of the deployed model (for example, odsc_2025llm).
    • <MODEL_DEPLOYMENT_OCID>: Model Deployment OCID.
    • Update "object_list" to reflect the schema and table you want to expose.