Creating a Model Deployment

After you store a Data Science model in the model catalog, you can deploy it as a model deployment, which makes the model available as an HTTP endpoint.

Considerations

Consider using a custom container when creating a model deployment.

You can create and run model deployments with these networking options:

  • Default Networking: Service-managed networking comes in two variants: without internet access and with internet access.
    • Default networking without internet access provides connectivity to other OCI services only.
    • Default networking with internet access provides connectivity to both other OCI services and the internet.
  • Custom Networking lets you configure networking in your tenancy, giving you full control over the VCN, subnets, routing, and access policies.

The instructions on this page cover all networking options.

    1. On the Projects list page, select the project in which you want to create the model deployment. If you need help finding the list page or the project, see Listing Projects.
    2. On the project details page, select Model deployments.
    3. Select Create model deployment.
    4. On the Create model deployment page, enter the following information.
      • Compartment
      • Name (Optional): Enter a unique name for the model deployment (limit of 255 characters). If you don't provide a name, a name is automatically generated. Example: modeldeployment20200108222435
      • Description (Optional): Enter a description (limit of 400 characters) for the model deployment.
      • Custom environment variable key (Optional): Enter a custom environment variable key.
      • Value (Optional): Enter the value for the key.
      • Models: Select Select to open the Select models panel, select the relevant option, and then select Submit to close the panel.
        Important

        Model artifacts that exceed 400 GB aren't supported for deployment. Select a smaller model artifact for deployment.
        • Single Model: Find the model with either Select a model compartment (specify compartment and project) or Using OCID, then select the model from the list.
        • Model Groups: Specify compartment and project, then select the model group from the list.
      • Change the Compute shape by selecting Change shape. Then, follow these steps in the Select compute shape panel.
        • Select an instance type.
        • Select a shape series.
        • Select one of the supported Compute shapes in the series. Select the shape that best suits how you want to use the resource.
        • Expand the selected shape to configure OCPUs and memory.
          • Number of OCPUs
          • Amount of memory (GB): For each OCPU, select up to 64 GB of memory and a maximum total of 512 GB. The minimum amount of memory allowed is either 1 GB or a value matching the number of OCPUs, whichever is greater.
          • Enable Burstable Shape: Select if using burstable VMs, and then for Baseline utilization per OCPU, select the baseline percentage of OCPU that you typically use. Burstable shapes generally support baselines of 12.5% and 50%, but model deployments support only 50%.
        • Select Select shape.
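The OCPU and memory limits for flexible shapes described above can be expressed as a quick local check. The following sketch is illustrative only; it is not part of any OCI SDK, and the function name is made up for this example.

```python
def valid_flex_memory(ocpus: int, memory_gb: int) -> bool:
    """Check an OCPU/memory combination against the limits on this page:
    up to 64 GB per OCPU, a 512 GB total maximum, and a minimum of either
    1 GB or one GB per OCPU, whichever is greater."""
    min_gb = max(1, ocpus)
    max_gb = min(64 * ocpus, 512)
    return min_gb <= memory_gb <= max_gb
```

For example, 8 OCPUs with 512 GB is valid (8 * 64 = 512), while 4 OCPUs with 2 GB is not, because the minimum for 4 OCPUs is 4 GB.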
      • Number of instances: Enter the number of instances to provision for the model deployment. The model is replicated on each instance.
      • Autoscaling configuration (Optional): Select Enable autoscaling and enter the following information.
        • Minimum number of instances
        • Maximum number of instances
        • Cooldown period in seconds
        • Scaling metric type

          To use the custom scaling metric option, select Custom, and then specify the scale-in and scale-out queries.

          Important

          Include the following text in each MQL query to reference the resource OCID: {resourceId = "MODEL_DEPLOYMENT_OCID"}
        • Scale-in threshold in percentage
        • Scale-out threshold in percentage
        • Advanced options (Optional): Autoscale the load balancer. Set the maximum bandwidth to a value greater than the minimum bandwidth, and no more than twice the minimum bandwidth.
          • Scale-in instance count step
          • Scale-out instance count step
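For the custom scaling metric option, each MQL query must include the resource OCID dimension shown in the Important callout above. The sketch below builds a matching pair of scale-out and scale-in queries; the metric name CpuUtilization, the one-minute interval, the thresholds, and the OCID are all placeholder assumptions for illustration.

```python
# Hypothetical model deployment OCID; substitute your own.
MODEL_DEPLOYMENT_OCID = "ocid1.datasciencemodeldeployment.oc1..example"

# Assumed metric and thresholds; use the metric and values you actually monitor.
scale_out_query = (
    f'CpuUtilization[1m]{{resourceId = "{MODEL_DEPLOYMENT_OCID}"}}.mean() > 70'
)
scale_in_query = (
    f'CpuUtilization[1m]{{resourceId = "{MODEL_DEPLOYMENT_OCID}"}}.mean() < 20'
)
```

Both queries carry the required {resourceId = "MODEL_DEPLOYMENT_OCID"} clause so that the autoscaler evaluates metrics for this deployment only.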
      • Networking resources: Select the relevant option.
        • Default Networking: The workload is attached by using a secondary VNIC to a preconfigured, service-managed VCN and subnet. Without internet access, the subnet restricts traffic to Oracle services only, reached through a service gateway. With internet access, the subnet also allows egress to the public internet through a NAT gateway.

          If you need access only to the public internet and OCI services, we recommend using this option. It doesn't require you to create networking resources or write policies for networking permissions.

        • Default networking with internet: Allows outbound internet access through the Data Science NAT gateway.
          Note

          You can't use Default networking with internet in disconnected realms and Oracle development tenancies. If your tenancy or compartment has a Data Science security zone policy that denies public network access (for example, deny model_deploy_public_network; see Data Science security zone policy), the service-managed public internet access option is disabled. If you try to use this option, you receive a 404 NotAuthorizedOrNotFound error.
        • Custom Networking: Select the VCN and subnet (by compartment) that you want to use.

          For egress access to the public internet, use a private subnet with a route to a NAT gateway.

          Note

          • You must use custom networking to use a file storage mount.
          • Switching from custom networking to managed networking isn't supported after creation.
          • If you see the banner The specified subnet is not accessible. Select a different subnet., then create a policy that allows Data Science to use custom networking. See Policies.
      • Endpoint type: Select the relevant option.
        • Public endpoint: Enables data access to a managed instance from outside a virtual cloud network (VCN).
        • Private endpoint: Specifies the private endpoint to use for the model deployment. Deployments that use private networking or private endpoints can't enable service-managed public internet access.
          • Private endpoint compartment
          • Private endpoint
      • Logging (Optional): Select Select to open the Select logging panel, enter the following information, and then select Submit to close the panel.
        Note

        To use logging, you must have already configured access or predict logs.
        • For access logs, select a compartment, log group, and log name.
        • For predict logs, select a compartment, log group, and log name.
      • Set your BYOC environment (under Use a Custom Container Image) (Optional): Select Select to open the Set your BYOC environment panel, enter the following information, and select Select again to close the panel.
        • Repository compartment
        • Repository: Enter the repository that contains the custom image.
        • Image: Enter the custom image to use at runtime.
        • Digest: Enter the image digest. For example: sha256:<digest>. The digest must match the exact image you're deploying.
        • Entrypoint: Enter one or more entry point files to run when the container starts, such as /opt/script/entrypoint.sh. Don't use quotation marks at the end.
        • Server port: Enter the port for the inference web server. The default is 8080. Valid range: 1024–65535, except 24224, 8446, 8447.
        • Health check port: Enter the port for the container health check. The default is the server port. Valid range: 1024–65535, except 24224, 8446, 8447.
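The port constraints for the BYOC server and health check ports can be captured in a small check. This is an illustrative sketch of the rule stated above, not an OCI SDK function.

```python
# Ports excluded per the limits above.
RESERVED_PORTS = {24224, 8446, 8447}

def valid_container_port(port: int) -> bool:
    """Return True if the port can be used for the BYOC inference
    web server or health check: 1024-65535, excluding reserved ports."""
    return 1024 <= port <= 65535 and port not in RESERVED_PORTS
```

The default server port of 8080 passes the check, while well-known ports such as 443 and the reserved port 24224 do not.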
      • Load balancing bandwidth (under Advanced options) (Optional): Select the load balancing bandwidth in Mbps, or use the 10 Mbps default.

        Tips for load balancing:

        If you know the common payload size and the frequency of requests per second, you can use the following formula to estimate the bandwidth of the load balancer that you need. We recommend that you add an extra 20% to account for estimation errors and sporadic peak traffic.

        (Payload size in KB) * (Estimated requests per second) * 8 / 1024 = bandwidth in Mbps

        For example, if the payload is 1,024 KB and you estimate 120 requests per second, then the recommended load balancer bandwidth would be (1024 * 120 * 8 / 1024) * 1.2 = 1152 Mbps.

        Remember that the maximum supported payload size is 10 MB when dealing with image payloads.

        If the request payload size exceeds the bandwidth allocated to the load balancer, the request is rejected with a 429 (Too Many Requests) status code.
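The bandwidth estimate above, including the recommended 20% headroom, can be computed as follows. The function name and defaults are made up for this sketch.

```python
def estimated_bandwidth_mbps(payload_kb: float, requests_per_sec: float,
                             headroom: float = 0.2) -> float:
    """Load balancer bandwidth estimate in Mbps, from the formula above:
    payload (KB) * requests/sec * 8 bits / 1024, plus fractional headroom."""
    return payload_kb * requests_per_sec * 8 / 1024 * (1 + headroom)
```

Using the worked example from this page, a 1,024 KB payload at 120 requests per second gives 960 Mbps before headroom and 1152 Mbps with the 20% margin.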

      • Tags (under Advanced options) (Optional): Add tags to the model deployment. If you have permissions to create a resource, then you also have permissions to apply free-form tags to that resource. To apply a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you're not sure whether to apply tags, skip this option or ask an administrator. You can apply tags later.
    5. Select Create.
  • Use the oci data-science model-deployment create command and required parameters to create a model deployment:

    oci data-science model-deployment create --compartment-id <compartment-id> ... [OPTIONS]

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.

  • Use the CreateModelDeployment operation to create a model deployment.
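As a rough illustration of what the CreateModelDeployment operation accepts, the sketch below builds a minimal JSON request body for a single-model deployment on a fixed number of instances. Field names follow the public API reference, but treat this as an unofficial outline: the OCIDs and shape name are placeholders, and optional sections such as logging, bandwidth, and autoscaling are omitted.

```python
import json

# Placeholder OCIDs and shape; substitute real values from your tenancy.
body = {
    "displayName": "example-deployment",
    "compartmentId": "ocid1.compartment.oc1..example",
    "projectId": "ocid1.datascienceproject.oc1..example",
    "modelDeploymentConfigurationDetails": {
        "deploymentType": "SINGLE_MODEL",
        "modelConfigurationDetails": {
            "modelId": "ocid1.datasciencemodel.oc1..example",
            "instanceConfiguration": {"instanceShapeName": "VM.Standard.E4.Flex"},
            "scalingPolicy": {"policyType": "FIXED_SIZE", "instanceCount": 1},
        },
    },
}
print(json.dumps(body, indent=2))
```

The same structure is what the CLI expects for its complex JSON parameters and what the SDKs build from their model classes.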