Creating a Model Deployment with Autoscaling Using a Custom Metric

Learn how to create a model deployment with an autoscaling policy using a custom metric.

    1. On the Projects list page, select the project that you want to add the model deployment to. If you need help finding the list page or the project, see Listing Projects.
    2. On the project details page, select Model deployments.
    3. Select Create model deployment.
    4. On the Create model deployment page, enter the following information.
      • Autoscaling configuration (Optional): Select Enable autoscaling and enter the following information.
        • Minimum number of instances
        • Maximum number of instances
        • Cooldown period in seconds
        • Scaling metric type

          To use the custom scaling metric option, select Custom, and then specify the scale-in and scale-out queries.

          Important

Include the following dimension filter in each MQL query to reference the model deployment's OCID: {resourceId = "MODEL_DEPLOYMENT_OCID"}
        • Scale-in threshold in percentage
        • Scale-out threshold in percentage
        • Advanced options (Optional): Autoscale the load balancer. Set the maximum bandwidth to a value greater than the minimum bandwidth, but no more than twice the minimum bandwidth.
          • Scale-in instance count step
          • Scale-out instance count step
      • Update other fields as needed. For field descriptions, see Creating a Model Deployment.
    5. Select Create.
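The scale-in and scale-out queries from the autoscaling step can be assembled programmatically. The following is a minimal sketch: the `build_queries` helper, the `MemoryUtilization` metric, the one-minute interval, and the threshold values are illustrative choices (matching the CLI example later in this topic), not service defaults.

```python
def build_queries(deployment_ocid: str,
                  metric: str = "MemoryUtilization",
                  scale_in_pct: int = 10,
                  scale_out_pct: int = 65) -> dict:
    """Return the custom scale-in and scale-out MQL query strings.

    Each query must reference the model deployment through the
    resourceId dimension, as noted above.
    """
    # The required dimension filter: {resourceId = '<OCID>'}
    dimension = f"{{resourceId = '{deployment_ocid}'}}"
    return {
        "scale_in": f"{metric}[1m]{dimension}.grouping().mean() < {scale_in_pct}",
        "scale_out": f"{metric}[1m]{dimension}.grouping().mean() > {scale_out_pct}",
    }

queries = build_queries("MODEL_DEPLOYMENT_OCID")
print(queries["scale_in"])
# MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() < 10
```

The resulting strings can be pasted into the Custom scaling metric fields in the console, or into the `query` fields of the JSON configuration shown in the CLI example.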
  • Use the oci data-science model-deployment create command and required parameters to create a model deployment:

    oci data-science model-deployment create --required-param-name variable-name ... [OPTIONS]

    For example, to deploy a model:

    oci data-science model-deployment create \
    --compartment-id <MODEL_DEPLOYMENT_COMPARTMENT_OCID> \
    --model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE> \
    --project-id <PROJECT_OCID> \
    --display-name <MODEL_DEPLOYMENT_NAME>

    Use this model deployment JSON configuration file:

    {
      "deploymentType": "SINGLE_MODEL",
      "modelConfigurationDetails": {
        "modelId": "ocid1.datasciencemodel.oc1.iad.amaaaaaav66vvnias2wuzfkwmkkmxficse3pty453vs3xtwlmwvsyrndlx2q",
        "instanceConfiguration": {
          "instanceShapeName": "VM.Standard.E4.Flex",
          "modelDeploymentInstanceShapeConfigDetails": {
            "ocpus": 1,
            "memoryInGBs": 16
          }
        },
        "scalingPolicy": {
          "policyType": "AUTOSCALING",
          "coolDownInSeconds": 650,
          "isEnabled": true,
          "autoScalingPolicies": [
            {
              "autoScalingPolicyType": "THRESHOLD",
              "initialInstanceCount": 1,
              "maximumInstanceCount": 2,
              "minimumInstanceCount": 1,
              "rules": [
                {
                  "metricExpressionRuleType": "CUSTOM_EXPRESSION",
                  "scaleInConfiguration": {
                    "scalingConfigurationType": "QUERY",
                    "pendingDuration": "PT5M",
                    "instanceCountAdjustment": 1,
                    "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() < 10"
                  },
                  "scaleOutConfiguration": {
                    "scalingConfigurationType": "QUERY",
                    "pendingDuration": "PT3M",
                    "instanceCountAdjustment": 1,
                    "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() > 65"
                  }
                }
              ]
            }
          ]
        },
        "bandwidthMbps": 10,
        "maximumBandwidthMbps": 20
      }
    }
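    Before passing a configuration file like this to the CLI, it can help to sanity-check the constraints described earlier: the instance counts must satisfy minimum ≤ initial ≤ maximum, and the maximum bandwidth must exceed the minimum bandwidth without going beyond twice its value. The following sketch is illustrative; `validate_config` is not part of the OCI CLI or SDK, and the inline dict mirrors only the fields being checked.

    ```python
    # Illustrative subset of the configuration file above, limited to the
    # fields that the checks below inspect.
    config = {
        "modelConfigurationDetails": {
            "bandwidthMbps": 10,
            "maximumBandwidthMbps": 20,
            "scalingPolicy": {
                "autoScalingPolicies": [
                    {
                        "minimumInstanceCount": 1,
                        "initialInstanceCount": 1,
                        "maximumInstanceCount": 2,
                    }
                ]
            },
        }
    }

    def validate_config(config: dict) -> None:
        """Raise ValueError if the configuration breaks a documented constraint."""
        details = config["modelConfigurationDetails"]
        bw = details["bandwidthMbps"]
        max_bw = details["maximumBandwidthMbps"]
        # Maximum bandwidth: more than the minimum, at most twice the minimum.
        if not (bw < max_bw <= 2 * bw):
            raise ValueError("maximumBandwidthMbps must exceed bandwidthMbps "
                             "and be at most twice its value")
        for policy in details["scalingPolicy"]["autoScalingPolicies"]:
            lo = policy["minimumInstanceCount"]
            hi = policy["maximumInstanceCount"]
            init = policy["initialInstanceCount"]
            if not (lo <= init <= hi):
                raise ValueError("initialInstanceCount must lie between the "
                                 "minimum and maximum instance counts")

    validate_config(config)  # passes for the values shown above
    ```

    A dict like this can be serialized with `json.dump` and passed to the CLI through `--model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE>`.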

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.

  • Use the CreateModelDeployment operation to create a model deployment using the custom scaling metric type.