Custom Scaling Metric Type to Configure Autoscaling

Use the custom metric type option to configure autoscaling.

Use the custom scaling metric option to build an MQL query from any of the Model Deployment Metrics emitted by the model deployment resource, and then use that query to configure autoscaling. This approach lets you create more sophisticated queries, such as joining several queries with the AND (&&) and OR (||) operators, using different aggregation functions, and incorporating an evaluation window of your choice. This option gives you greater control over the scaling conditions, enabling a more tailored and precise setup.

When formulating an MQL query, include {resourceId = "MODEL_DEPLOYMENT_OCID"} in the query, as shown in the examples provided. While processing the request, the service replaces the MODEL_DEPLOYMENT_OCID placeholder with the actual resource OCID, which lets it retrieve the exact set of metrics associated with the resource.
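
For example, the following compound scale-out query joins two utilization conditions with the || operator, so that either condition can trigger scaling. The metric choice and thresholds here are illustrative; adjust them for the workload:

CpuUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 70 || MemoryUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 60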

Testing Custom Metric MQL Queries

Follow these steps to test and complete the queries.

  1. Follow the steps in Viewing Model Deployment Metrics to view metrics.
  2. Click the metric chart for the metric you want to use.
  3. Click Options.
  4. Click View Query in MQL Explorer.
  5. Click Edit Queries.
  6. Select Advanced Mode.
  7. In the Query code editor, update and test the query for scaling-out and scaling-in operations.
    Use these tested queries to create model deployments with autoscaling capabilities.

Example Queries

The following are sample queries for metrics you can use to enable autoscaling.
Note

These queries are provided for reference and can be customized for specific use cases, or used without modification.
Sample Queries for Model Deployment Metrics

PredictRequestCount

Use this metric and the following queries for scaling in response to predict request volume.

Scale out:

PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() > 100

Scale in:

PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() < 5

If no predict calls are made, then no metrics are emitted. In such cases, incorporate the absent() function into the alarm query. The following is an example scale-in query for scenarios where minimal or no predict calls are made:

PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().absent() == 1 || PredictRequestCount[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().sum() < 2

If the total count of prediction requests to the model deployment exceeds 100 within a one-minute window and this condition persists for the specified pending duration, a scale-out operation is triggered. Similarly, if the cumulative count is less than 5, or there are no requests at all, for the pending duration, a scale-in operation begins.

PredictLatency

Use this metric and the following queries for scaling based on predict request latencies.

Scale out:

PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.groupBy(result).percentile(.99) > 120

Scale in:

PredictLatency[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.groupBy(result).percentile(.99) < 20

The query evaluates the 99th percentile of PredictLatency for the model deployment over a one-minute period. If this latency value exceeds 120 milliseconds and persists for the pending duration, the condition is met and a scale-out operation is triggered. Conversely, if the 99th percentile is less than 20 milliseconds for the pending duration, a scale-in operation starts.

PredictResponse - Success Rate

Use this metric and the following queries for scaling based on predict response success rate.

Scale out:

(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", result = "Success"}.grouping().mean() * 100) / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 95

Scale in:

(PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID", result = "Success"}.grouping().mean() * 100) / PredictResponse[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 95

The query evaluates the percentage of successful PredictResponse results compared to all PredictResponse results within a one-minute interval for the model deployment. If this percentage is less than 95 and persists for the pending duration, a scale-out operation is triggered. Conversely, if the percentage is more than 95 for the pending duration, a scale-in operation starts.

Creating a Model Deployment with Autoscaling Using a Custom Metric

Learn how to create a model deployment with an autoscaling policy using a custom metric.

    1. Use the Console to sign in to a tenancy with the necessary policies.
    2. Open the navigation menu and click Analytics & AI. Under Machine Learning, click Data Science.
    3. Select the compartment that contains the project with the model deployments.

      All projects in the compartment are listed.

    4. Click the name of the project.

      The project details page opens and lists the notebook sessions.

    5. Under Resources, click Model deployments.

      A tabular list of model deployments in the project is displayed.

    6. Click Create Model Deployment.
    7. Follow the steps in Creating a Model Deployment to configure the model deployment.
    8. Under Autoscaling configuration, select Enable autoscaling.
      Several lists and fields are displayed to let you configure autoscaling.
    9. Select Custom from the Scaling metric type list.
    10. Populate Scale-in custom metric query and Scale-out custom metric query with your tested MQL queries, as shown in the example after these steps.
      Important

      Include {resourceId = "MODEL_DEPLOYMENT_OCID"} in each query. The actual resource OCID is used instead of "MODEL_DEPLOYMENT_OCID" when the query is run.
    11. Click Create.
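
    For example, a scale-out and scale-in query pair based on the CpuUtilization metric might look like the following. The metric choice and thresholds are illustrative; tune them to the workload.

    Scale out:

    CpuUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 65

    Scale in:

    CpuUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 15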
  • Use the oci data-science model-deployment create command and required parameters to create a model deployment:

    oci data-science model-deployment create --required-param-name variable-name ... [OPTIONS]
    For example, deploy a model:
    oci data-science model-deployment create \
    --compartment-id <MODEL_DEPLOYMENT_COMPARTMENT_OCID> \
    --model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE> \
    --project-id <PROJECT_OCID> \
    --display-name <MODEL_DEPLOYMENT_NAME>
    Use a model deployment JSON configuration file similar to the following:
    {
      "deploymentType": "SINGLE_MODEL",
      "modelConfigurationDetails": {
        "modelId": "ocid1.datasciencemodel.oc1.iad.amaaaaaav66vvnias2wuzfkwmkkmxficse3pty453vs3xtwlmwvsyrndlx2q",
        "instanceConfiguration": {
          "instanceShapeName": "VM.Standard.E4.Flex",
          "modelDeploymentInstanceShapeConfigDetails": {
            "ocpus": 1,
            "memoryInGBs": 16
          }
        },
        "scalingPolicy": {
          "policyType": "AUTOSCALING",
          "coolDownInSeconds": 650,
          "isEnabled": true,
          "autoScalingPolicies": [
            {
              "autoScalingPolicyType": "THRESHOLD",
              "initialInstanceCount": 1,
              "maximumInstanceCount": 2,
              "minimumInstanceCount": 1,
              "rules": [
                {
                  "metricExpressionRuleType": "CUSTOM_EXPRESSION",
                  "scaleInConfiguration": {
                    "scalingConfigurationType": "QUERY",
                    "pendingDuration": "PT5M",
                    "instanceCountAdjustment": 1,
                    "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() < 10"
                  },
                  "scaleOutConfiguration": {
                    "scalingConfigurationType": "QUERY",
                    "pendingDuration": "PT3M",
                    "instanceCountAdjustment": 1,
                    "query": "MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}.grouping().mean() > 65"
                  }
                }
              ]
            }
          ]
        },
        "bandwidthMbps": 10,
        "maximumBandwidthMbps": 20
      }
    }

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.

  • Use the CreateModelDeployment operation to create a model deployment using the custom scaling metric type.
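
    The following is a minimal sketch of calling this operation through the OCI Python SDK, using the same autoscaling configuration as the CLI example. It assumes the SDK's data science model classes mirror the REST type names (for example, CustomMetricExpressionRule and CustomExpressionQueryScalingConfiguration); verify the class names against your installed SDK version.

    import oci
    from oci.data_science.models import (
        AutoScalingPolicy,
        CreateModelDeploymentDetails,
        CustomExpressionQueryScalingConfiguration,
        CustomMetricExpressionRule,
        InstanceConfiguration,
        ModelConfigurationDetails,
        ModelDeploymentInstanceShapeConfigDetails,
        SingleModelDeploymentConfigurationDetails,
        ThresholdBasedAutoScalingPolicyDetails,
    )

    # Authenticate with the default profile in ~/.oci/config.
    client = oci.data_science.DataScienceClient(oci.config.from_file())

    # Custom-expression rule: the service substitutes the actual resource OCID
    # for MODEL_DEPLOYMENT_OCID when the queries are evaluated.
    rule = CustomMetricExpressionRule(
        scale_in_configuration=CustomExpressionQueryScalingConfiguration(
            query="MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}"
                  ".grouping().mean() < 10",
            pending_duration="PT5M",
            instance_count_adjustment=1,
        ),
        scale_out_configuration=CustomExpressionQueryScalingConfiguration(
            query="MemoryUtilization[1m]{resourceId = 'MODEL_DEPLOYMENT_OCID'}"
                  ".grouping().mean() > 65",
            pending_duration="PT3M",
            instance_count_adjustment=1,
        ),
    )

    scaling_policy = AutoScalingPolicy(
        cool_down_in_seconds=650,
        is_enabled=True,
        auto_scaling_policies=[
            ThresholdBasedAutoScalingPolicyDetails(
                initial_instance_count=1,
                minimum_instance_count=1,
                maximum_instance_count=2,
                rules=[rule],
            )
        ],
    )

    details = CreateModelDeploymentDetails(
        display_name="<MODEL_DEPLOYMENT_NAME>",
        project_id="<PROJECT_OCID>",
        compartment_id="<MODEL_DEPLOYMENT_COMPARTMENT_OCID>",
        model_deployment_configuration_details=SingleModelDeploymentConfigurationDetails(
            model_configuration_details=ModelConfigurationDetails(
                model_id="<MODEL_OCID>",
                instance_configuration=InstanceConfiguration(
                    instance_shape_name="VM.Standard.E4.Flex",
                    model_deployment_instance_shape_config_details=(
                        ModelDeploymentInstanceShapeConfigDetails(
                            ocpus=1, memory_in_gbs=16
                        )
                    ),
                ),
                scaling_policy=scaling_policy,
                bandwidth_mbps=10,
            ),
        ),
    )

    response = client.create_model_deployment(details)
    print(response.data.id)  # OCID of the new model deployment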