Creating a Model Deployment

After you store a model in the model catalog, you can deploy it as an HTTP endpoint by using the model deployments resource. Ensure that you have created the necessary policies, authentication, and authorization for your model deployments.

You can create a model deployment using these interfaces:

Using the Console

  1. Log in to your tenancy using the Console with the necessary policies.
  2. Open the navigation menu and click Analytics & AI. Under Machine Learning, click Data Science.
  3. Select the compartment that contains the project you want to use.
  4. Click the name of a project.
  5. Click Create model deployment.
  6. Select the compartment to contain the model deployment.
  7. (Optional) Enter a unique name for the model deployment (limit of 255 characters). If you don't provide a name, a name is automatically generated for you.

    For example:

    modeldeployment20200108222435
  8. (Optional) Enter a description for the model deployment.
  9. Select an active model to deploy from the model catalog by clicking Select.
    1. Find a model by using the default model list, or by clicking Using OCID and searching for the model by its OCID.
    2. Select the model.
    3. Click Submit.
    Important

    Model artifacts larger than 6 GB are not supported for deployment. Select a smaller model artifact to deploy.
  10. Select a Compute shape by clicking Select.
    1. Select one of the supported Compute shapes.
    2. Select the shape that best suits how you want to use the resource. For the AMD shape, you can use the default or set the number of OCPUs and memory.
    3. Click Submit.
  11. Select the number of instances to replicate the model across.
  12. (Optional) If you configured access or predict logging, you can:
    1. For access logs, select a compartment, log group, and log name.
    2. For predict logs, select a compartment, log group, and log name.
    3. Click Submit.
  13. (Optional) Click Show Advanced Options to set load balancing and tags.
    1. Select the load balancing bandwidth in Mbps, or use the default of 10 Mbps.

      Tips for load balancing:

      If you know the common payload size and the expected requests per second, you can use the following formula to estimate the load balancer bandwidth that you need. We recommend adding an extra 20% to allow for estimation errors and sporadic peak traffic. A short calculation sketch follows these steps.

      (Payload size in KB) * (Estimated requests per second) * 8 / 1024

      For example, if the payload is 1,024 KB and you estimate 120 requests per second, then the recommended load balancer bandwidth would be: (1024 * 120 * 8 / 1024) * 1.2 = 1152 Mbps.

      Remember that the maximum supported payload size is 10 MB, which is particularly relevant when dealing with image payloads.

      Important

      If the request payload size exceeds the allocated load balancer bandwidth, the request is rejected with a 429 status code.

    2. Add tags to easily locate and track the model deployment by selecting a tag namespace, then entering the key and value. To add more than one tag, click +Additional Tags.

      Tagging describes the various tags that you can use to organize and find resources, including cost-tracking tags.

  14. Click Create.
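
As a companion to the load balancing tip in the advanced options step, here is a minimal Python sketch of the bandwidth estimate. The function name is illustrative only; the sample values are the ones from the example above.

    def estimate_bandwidth_mbps(payload_kb, requests_per_second, headroom=1.2):
        # (Payload size in KB) * (Estimated requests per second) * 8 / 1024
        # yields Mbps; the 1.2 factor adds the recommended 20% allowance
        # for estimation errors and sporadic peak traffic.
        return payload_kb * requests_per_second * 8 / 1024 * headroom

    # Sample values from the tip: 1,024 KB payloads at 120 requests per second.
    print(estimate_bandwidth_mbps(1024, 120))  # 1152.0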

Using the OCI Python SDK

We have developed an OCI Python SDK model deployment example that includes authentication.

Important

Artifacts larger than 6 GB are not supported for deployment. Select a smaller model artifact to deploy.
Note

You must upgrade the OCI SDK to version 2.33.0 or later before creating a deployment with the Python SDK. Use this command:

pip install --upgrade oci
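
The following is a minimal sketch of creating a deployment with the Python SDK. It assumes a default profile in ~/.oci/config; all OCID, shape, and name values are placeholders that you must replace.

    import oci
    from oci.data_science.models import (
        CreateModelDeploymentDetails,
        FixedSizeScalingPolicy,
        InstanceConfiguration,
        ModelConfigurationDetails,
        SingleModelDeploymentConfigurationDetails,
    )

    # Authenticate with the default profile in ~/.oci/config.
    config = oci.config.from_file()
    client = oci.data_science.DataScienceClient(config)

    # Mirror the SINGLE_MODEL JSON configuration shown in the CLI section.
    deployment_config = SingleModelDeploymentConfigurationDetails(
        model_configuration_details=ModelConfigurationDetails(
            model_id="<YOUR_MODEL_OCID>",
            instance_configuration=InstanceConfiguration(
                instance_shape_name="<YOUR_VM_SHAPE>"
            ),
            scaling_policy=FixedSizeScalingPolicy(instance_count=1),
            bandwidth_mbps=10,
        )
    )

    response = client.create_model_deployment(
        CreateModelDeploymentDetails(
            display_name="<MODEL_DEPLOYMENT_NAME>",
            project_id="<PROJECT_OCID>",
            compartment_id="<MODEL_DEPLOYMENT_COMPARTMENT_OCID>",
            model_deployment_configuration_details=deployment_config,
        )
    )
    print(response.data.id)  # OCID of the new model deployment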

Using the OCI CLI

You can use the OCI CLI to create a model deployment as in this example.

  1. Deploy the model with:
    oci data-science model-deployment create \
    --compartment-id <MODEL_DEPLOYMENT_COMPARTMENT_OCID> \
    --model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE> \
    --project-id <PROJECT_OCID> \
    --category-log-details file://<OPTIONAL_LOGGING_CONFIGURATION_FILE> \
    --display-name <MODEL_DEPLOYMENT_NAME>
  2. Use this model deployment JSON configuration file:
    {
      "deploymentType": "SINGLE_MODEL",
      "modelConfigurationDetails": {
        "bandwidthMbps": <YOUR_BANDWIDTH_SELECTION>,
        "instanceConfiguration": {
          "instanceShapeName": "<YOUR_VM_SHAPE>"
        },
        "modelId": "<YOUR_MODEL_OCID>",
        "scalingPolicy": {
          "instanceCount": <YOUR_INSTANCE_COUNT>,
          "policyType": "FIXED_SIZE"
        }
      }
    }

    If you specify a flex shape, you must include the modelDeploymentInstanceShapeConfigDetails object, as in this example:

    {
      "deploymentType": "SINGLE_MODEL",
      "modelConfigurationDetails": {
        "modelId": "ocid1.datasciencemodel.oc1.iad........",
        "instanceConfiguration": {
          "instanceShapeName": "VM.Standard.E4.Flex",
          "modelDeploymentInstanceShapeConfigDetails": {
            "ocpus": 1,
            "memoryInGBs": 16
          }
        },
        "scalingPolicy": {
          "policyType": "FIXED_SIZE",
          "instanceCount": 1
        },
        "bandwidthMbps": 10
      },
      "streamConfigurationDetails": {
        "inputStreamIds": null,
        "outputStreamIds": null
      }
    }
  3. (Optional) Use this logging JSON configuration file:
    {
      "access": {
        "logGroupId": "<YOUR_LOG_GROUP_OCID>",
        "logId": "<YOUR_LOG_OCID>"
      },
      "predict": {
        "logGroupId": "<YOUR_LOG_GROUP_OCID>",
        "logId": "<YOUR_LOG_OCID>"
      }
    }