Implement ML Use Cases as ML Application Packages

Providers developing a new ML Application Implementation need to create a new application package corresponding to the implementation.

The application package provides a standard way to package ML functionality that's independent of environment and region. This makes it a portable solution that can be used in any tenancy, region, or environment. Infrastructure dependencies (for example, VCN and Log OCIDs) that are specific to a region or environment are provided as package arguments during the upload process.

Packages contain all the implementation details for an ML Application, such as Terraform for instance components and application components, a descriptor containing implementation version information, a configuration schema, and more. These packages can be uploaded or deployed to existing ML Application Implementation resources. When a new version of a package is uploaded, the ML Applications service automatically creates a new ML Application Implementation Version and starts an upgrade of all ML Application Instances that use the package.

The package contains components implementing the ML use case. Two types of component exist:
Application components
These are the resources that need to be created per ML Application Implementation. Provisioning new ML Application Implementations involves creating corresponding application components. Application components are common to all instances of the ML Application Implementation and aren't created or re-created when new ML Application Instances are provisioned.
Instance components
These are the resources that need to be created per ML Application Instance. Provisioning new ML Application Instances involves creating corresponding instance components. Each instance of the ML Application Implementation gets its own instance components.

The Terraform configuration for all application components is in the application_components directory of the application package. Similarly, the Terraform configuration for all instance components is in the instance_components directory.

To make the distinction between application components and instance components clearer, suppose that providers want to develop a solution (an ML Application Implementation) for an ML Applications use case that involves the following parts:
Training and deploying a model
Providers write a machine learning algorithm that trains an ML model based on some training data. Providers use Jobs for training the model, storing it in the Model Catalog, and then deploying the model.
Data to be used for training
The model is trained on customer (consumer) data, which resides in an Object Storage bucket in the consumer tenancy. The ML job loads data from the consumer Object Storage bucket into the provider Object Storage bucket.

In this example, the ML job is an application component and the Terraform configuration for creating the ML job is part of the application_components directory in the application package. As the actual training happens on consumer data, the training is triggered when a new ML Application Instance of the ML Application Implementation is provisioned. When a new ML Application Instance is created, a new job run is created and triggered, which loads data from the consumer Object Storage bucket into the provider Object Storage bucket, trains the model, stores the model in the model catalog, and then deploys the model. A new job run needs to be created for every instance (customer). The job run is an instance component. Also, the target Object Storage bucket is created for each instance, so it's an instance component. Similarly, the model deployment is also an instance component.

The Terraform configuration for both application components and instance components can be parameterized. All parameters required for provisioning ML Applications and ML Application Instances can be specified in the descriptor.yaml file. For example, the Docker image used by the job run, or the Data Science project under which the job is created, can be parameterized. Parameters that belong to the application components and are required when provisioning new implementations are specified under packageArguments in the descriptor.yaml file. In general, packageArguments can be used for providing environment-specific values such as infrastructure OCIDs and some environment-specific scaling values.

Similarly, the name of the source Object Storage bucket (in the consumer tenancy) is needed when creating an ML Application Instance and can differ from instance to instance (consumer to consumer). It's therefore a parameter whose value the consumer provides during ML Application Instance creation. All such parameters are defined under configurationSchema in the descriptor.yaml file.
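The relationship between declared parameters and the values supplied for them can be sketched in Python. This is a minimal illustration, not the ML Applications service's own logic: the function name resolve_values, the sample parameter names, and the sample values are hypothetical, and treating validationRegexp as a whole-value match is an assumption. The mandatory, defaultValue, and validationRegexp properties are the ones described in the Package Descriptor File section.

```python
import re


def resolve_values(declared: dict, supplied: dict) -> dict:
    """Resolve supplied values against declared parameters.

    `declared` mirrors a packageArguments or configurationSchema map:
    each name maps to properties such as mandatory, defaultValue, and
    validationRegexp. Raises ValueError on the first problem found.
    """
    resolved = {}
    for name, props in declared.items():
        mandatory = props.get("mandatory", True)  # descriptor default is true
        if name in supplied:
            value = supplied[name]
        elif mandatory:
            raise ValueError(f"missing mandatory parameter: {name}")
        elif "defaultValue" in props:
            value = props["defaultValue"]
        else:
            continue  # optional and no default: leave it unset

        pattern = props.get("validationRegexp")
        # Assumption: the whole value must match the expression.
        if pattern is not None and re.fullmatch(pattern, value) is None:
            raise ValueError(f"value for {name} fails validationRegexp")
        resolved[name] = value
    return resolved


# Hypothetical declarations and values, for illustration only.
declared = {
    "data_science_project_id": {
        "type": "ocid",
        "description": "Project under which the job is created",
    },
    "job_shape": {
        "type": "string",
        "description": "Compute shape for the training job",
        "mandatory": False,
        "defaultValue": "VM.Standard2.1",
        "validationRegexp": r"VM\..+",
    },
}
resolved = resolve_values(
    declared,
    {"data_science_project_id": "ocid1.datascienceproject.oc1..example"},
)
print(resolved)
```

Here the optional job_shape falls back to its default, while omitting data_science_project_id would raise an error because mandatory defaults to true.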

Thus the final structure of an Application package directory looks similar to this:

  • <ml-app-package-name>-<version>.zip
    • application_components: the directory with all application component definitions.
    • instance_components: the directory with all instance component definitions.
    • descriptor.yaml: the package descriptor file.
    • *.trigger.yaml: the trigger definition file.

Some important notes on the Application package structure:

  • Both Application components and Instance components must be defined in the corresponding directories.
  • The application_components and instance_components directories are optional. An Application package without an application_components or instance_components directory is valid.
  • The directories must be named exactly (lowercase) as application_components and instance_components.
  • Components whose Terraform config isn't present under the application_components directory aren't considered application components.
  • Components whose Terraform config isn't present under the instance_components directory aren't considered instance components.
  • At the moment, not all OCI resources are supported as application or instance components.
  • A Data Science Job is the supported application component, while Data Science model, model deployment, job run, Object Storage bucket, and object are the supported instance components.
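The naming rules above can be checked before upload with a short script. The following Python sketch inspects the entry names of a package archive. The function names are hypothetical, and two behaviors are assumptions rather than documented rules: that descriptor.yaml sits at the archive root, and that any other top-level entry is worth reporting.

```python
import fnmatch
import zipfile

ALLOWED_DIRS = ("application_components", "instance_components")


def check_package_layout(names: list[str]) -> list[str]:
    """Return a list of layout problems for the given archive entry names."""
    problems = []
    top_level = {name.split("/", 1)[0] for name in names if name}
    # Assumption: descriptor.yaml is expected at the package root.
    if "descriptor.yaml" not in top_level:
        problems.append("descriptor.yaml not found at the package root")
    for entry in sorted(top_level):
        if entry in ALLOWED_DIRS or entry == "descriptor.yaml":
            continue
        if fnmatch.fnmatchcase(entry, "*.trigger.yaml"):
            continue
        if entry.lower() in ALLOWED_DIRS:
            # Right name, wrong case: the directory would not be recognized.
            problems.append(f"{entry}: directory name must be lowercase")
        else:
            problems.append(f"{entry}: not part of the documented layout")
    return problems


def check_package(path: str) -> list[str]:
    """Run the layout check against a package .zip file on disk."""
    with zipfile.ZipFile(path) as archive:
        return check_package_layout(archive.namelist())


# Hypothetical entry names, for illustration only.
print(check_package_layout([
    "descriptor.yaml",
    "Application_Components/job.tf",
    "instance_components/bucket.tf",
    "training.trigger.yaml",
]))
```

Because both component directories are optional, a package that contains only descriptor.yaml passes this check.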

The next section describes the schema of the package descriptor file in more detail.

ML Applications Building Blocks

ML Applications are built by using other OCI resources. The following table lists allowed resource types:
Allowed Resource Types

Component type: Application components
Allowed OCI resources:
  • Data Science
    • Job
    • Pipeline
    • Model
  • Data Flow
    • Data Flow Application
Notes:

Multitenant components are shared across all ML Application Instances within an implementation.

Data Science:

  • Jobs and Pipelines are commonly used as application components, defining workflows or tasks performed by the application. When a workflow or task is triggered for a customer, a new Pipeline Run or Job Run is created, typically with customer-specific parameters provided by the trigger.
  • Models are used as application components when a pretrained, out-of-the-box model is available for the application to use.

Data Flow Applications can be used to transform large

They can be used as steps within a pipeline.

When a pipeline containing a Data Flow step is run, it automatically creates and manages a new run of the Data Flow Application associated with that step. The Data Flow run is treated like any other step in the pipeline: when it completes successfully, the pipeline continues its run and begins later steps as part of the pipeline's orchestration.

For more information, see Data Flow Integration.

Component type: Instance components
Allowed OCI resources:
  • Data Science
    • Model
    • Model deployment
    • Scheduler
      • Schedules
  • Object Storage
    • Bucket
    • Object
Note:

ML Application triggers aren't OCI resources, but they can be used as instance components.

Triggers are the entry points for workflows (such as training) defined in your applications. They define the conditions under which a workflow is started and ensure that the workflow is started with the identity of the ML Application Instance (datasciencemlappinstance resource principal).

Notes:

Single-tenant resources are created uniquely for each ML Application Instance (SaaS customer).

  • Models are used as instance components when a new model is trained specifically for each customer using their data.
  • Model Deployments serve as instance components to expose customer-specific models as services.
  • Buckets function as customer-specific storage for ingested, transformed, or processed data.
  • Objects are typically used for storing configurations specific to the customer.
  • Schedules enable periodic execution of workflows based on a defined interval. They're linked to ML Application Triggers which they invoke at scheduled intervals.
Note

ML Applications doesn't impose limits on the number of components you can use. While an application might require one pipeline, one trigger, one model, and one model deployment, you can build more complex applications, such as those with several pipelines, triggers, models, and model deployments. For example, five pipelines with five triggers, three models, and three model deployments. Also, ML Applications can be created without pipelines or model deployments, if they're not needed.

Package Descriptor File

The following is a schema for the descriptor:
descriptorSchemaVersion
  • description: The schema version of the package descriptor, which allows the schema to evolve. It has a major and a minor version (for example, "1.0"), where the major version is increased for backward-incompatible changes and the minor version for backward-compatible changes.
  • required: true
  • type: string
description
  • description: The description of the ML Application Implementation packaged as the specific ML Applications package. This value is shown as a description field in ML Applications implementation.
  • required: false
  • type: string
mlApplicationVersion
  • description: The version of the ML Applications contract (that is, the version field of the ML Applications Version resource) that the package implements.
    Note

    This is a placeholder that's reserved to be used in the future when the ML Applications Version resource is introduced. The provided value is ignored.
  • required: true
  • type: string
packageVersion
  • description: The version of the ML Applications package. This value is shown as a Package version field in ML Application Implementation.
  • required: true
  • type: string
packageArguments
  • description: The list of supported arguments. Arguments can be used for providing environment-specific values such as infrastructure OCIDs and some environment-specific scaling values.
  • type: map (the argument name maps to the properties of the argument)
  • required: false
  • argument properties:
    • type
      • type: enum (string or ocid)
      • required: true
      • description: The type of the argument value.
    • mandatory
      • type: Boolean (true or false)
      • required: false (default is true)
      • description: Whether the specific argument is mandatory or not.
    • description
      • type: string
      • required: true
      • description: The argument description.
    • validationRegexp
      • type: string
      • required: false
      • description: The regular expression used for validation of argument value.
    • defaultValue
      • type: string
      • required: false
      • description: The value used if the argument or configuration schema property isn't specified (it can be specified only when mandatory is false).
configurationSchema
  • description: The schema of the configuration which the consumer must provide as metadata of the ML Application Instance. This value is shown as a configurationSchema field in ML Application Implementation.
  • type: map (the configuration property name maps to the properties of the configuration property)
  • required: false
  • argument properties:
    • type
      • type: enum (string or secret)
      • required: true
      • description: The type of the configuration value.
    • mandatory
      • type: Boolean (true or false)
      • required: false (default is true)
      • description: Whether the specific configuration property is mandatory or not.
    • description
      • type: string
      • required: true
      • description: The configuration property description.
    • validationRegexp
      • type: string
      • required: false
      • description: The regular expression used for validation of configuration value.
    • sampleValue
      • type: string
      • required: true
      • description: The sample value used for validation of instance components.
    • defaultValue
      • type: string
      • required: false
      • description: The value used if argument or configuration schema property isn't specified (mandatory must be false).
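The schema rules above can be turned into a simple pre-upload check. The following Python sketch validates a descriptor that has already been loaded into a dictionary (for example, by a YAML parser of your choice). The function name validate_descriptor and the sample descriptor values are hypothetical, and the check covers only the rules listed in this section. It isn't the validation that the ML Applications service performs.

```python
import re

REQUIRED_FIELDS = ("descriptorSchemaVersion", "mlApplicationVersion", "packageVersion")
ALLOWED_TYPES = {
    "packageArguments": {"string", "ocid"},
    "configurationSchema": {"string", "secret"},
}


def validate_descriptor(descriptor: dict) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not isinstance(descriptor.get(field), str):
            errors.append(f"{field}: required string is missing")

    version = descriptor.get("descriptorSchemaVersion")
    if isinstance(version, str) and not re.fullmatch(r"\d+\.\d+", version):
        errors.append("descriptorSchemaVersion: expected <major>.<minor>")

    for section, allowed in ALLOWED_TYPES.items():
        for name, props in (descriptor.get(section) or {}).items():
            where = f"{section}.{name}"
            if props.get("type") not in allowed:
                errors.append(f"{where}: type must be one of {sorted(allowed)}")
            if not props.get("description"):
                errors.append(f"{where}: description is required")
            if section == "configurationSchema" and "sampleValue" not in props:
                errors.append(f"{where}: sampleValue is required")
            # mandatory defaults to true; defaultValue needs mandatory: false.
            if "defaultValue" in props and props.get("mandatory", True):
                errors.append(f"{where}: defaultValue requires mandatory: false")
    return errors


# Hypothetical descriptor content, for illustration only.
descriptor = {
    "descriptorSchemaVersion": "1.0",
    "mlApplicationVersion": "1.0",
    "packageVersion": "0.1.0",
    "packageArguments": {
        "subnet_id": {"type": "ocid", "description": "Subnet for job runs"},
    },
    "configurationSchema": {
        "source_bucket": {
            "type": "string",
            "description": "Consumer bucket with training data",
            "sampleValue": "training-data",
        },
    },
}
print(validate_descriptor(descriptor))  # prints [] for this sample
```

Returning a list of messages instead of raising on the first problem lets a provider see every violation in one pass.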

Mandatory Terraform Attributes

All Terraform definitions of Data Science jobs must ensure that the related job runs are automatically deleted when the job is deleted.
resource "oci_datascience_job" "ingestion_job" {
  ...
  delete_related_job_runs = true
  ...
}
All Terraform definitions of Data Science pipelines must ensure that the related pipeline runs are automatically deleted when the pipeline is deleted.
resource "oci_datascience_pipeline" "ingestion_pipeline" {
  ...
  delete_related_pipeline_runs = true
  ...
}
Note

If the delete_related_job_runs or delete_related_pipeline_runs attribute isn't specified correctly, deletion of the ML Application Implementation version is blocked. The provider needs to remove the run resources to unblock the deletion.
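A package can be scanned for this problem before upload. The following Python sketch reports job and pipeline resources that don't set the required attribute to true. The function name find_missing_delete_flags is hypothetical, and the scan relies on a regular expression and naive brace counting rather than a real HCL parser, so braces inside strings or comments can confuse it.

```python
import re

REQUIRED_ATTRIBUTE = {
    "oci_datascience_job": "delete_related_job_runs",
    "oci_datascience_pipeline": "delete_related_pipeline_runs",
}
# Matches resource headers with quoted or unquoted labels.
RESOURCE_HEADER = re.compile(r'resource\s+"?(\w+)"?\s+"?(\w+)"?\s*\{')


def find_missing_delete_flags(terraform_source: str) -> list[str]:
    """Return "<type>.<name>" for each resource lacking its required flag."""
    missing = []
    for match in RESOURCE_HEADER.finditer(terraform_source):
        resource_type, resource_name = match.groups()
        attribute = REQUIRED_ATTRIBUTE.get(resource_type)
        if attribute is None:
            continue  # not a job or pipeline definition
        # Walk forward to the brace that closes this resource block.
        depth, end = 1, match.end()
        while depth > 0 and end < len(terraform_source):
            char = terraform_source[end]
            depth += (char == "{") - (char == "}")
            end += 1
        body = terraform_source[match.end():end]
        if not re.search(rf"^\s*{attribute}\s*=\s*true\b", body, re.MULTILINE):
            missing.append(f"{resource_type}.{resource_name}")
    return missing


# Hypothetical Terraform source, for illustration only.
sample = '''
resource "oci_datascience_job" "ingestion_job" {
  delete_related_job_runs = true
}
resource "oci_datascience_pipeline" "ingestion_pipeline" {
  display_name = "ingestion"
}
'''
print(find_missing_delete_flags(sample))
```

For this sample, only the pipeline is reported, because its block doesn't set delete_related_pipeline_runs.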

Tenant Isolation and OCI SDK Version

Tenant isolation ensures the segregation of data and workloads for each customer. The ML Applications service propagates the resource principal (identity) of ML Application Instances to workloads (Pipeline or Job Runs) started by ML Application triggers.

The propagation of the ML Application Instance resource principal requires corresponding support in the OCI SDKs:

  • Python SDK: Version 2.126.4 or later
  • Java SDK: Version 3.44.4 or later
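A workload can verify the Python SDK requirement at startup. The following sketch compares the installed version of the oci distribution against the minimum listed above. The function names are hypothetical, and the parser assumes purely numeric version parts (a suffix such as "rc1" would raise ValueError). The Java SDK isn't covered here.

```python
from importlib import metadata

MIN_PYTHON_SDK = "2.126.4"  # minimum Python SDK version listed above


def parse_version(text: str) -> tuple:
    """Turn "2.126.4" into (2, 126, 4); assumes purely numeric parts."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = MIN_PYTHON_SDK) -> bool:
    """Compare versions numerically, so "2.99.0" sorts below "2.126.4"."""
    return parse_version(installed) >= parse_version(minimum)


def installed_sdk_ok() -> bool:
    """Check the installed `oci` distribution; False if it isn't installed."""
    try:
        return meets_minimum(metadata.version("oci"))
    except metadata.PackageNotFoundError:
        return False


print(meets_minimum("2.126.3"))  # False: one patch release too old
print(meets_minimum("2.130.0"))  # True
```

Comparing integer tuples rather than strings avoids the trap where "2.99.0" would sort after "2.126.4" in a plain text comparison.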