Process media by using serverless job management and ephemeral compute workers

Processing large media files can be a resource-intensive operation that requires large compute shapes for timely, efficient processing. When media processing requests are ad hoc and on demand, leaving instances idle while they wait for new work is not cost-effective.

By using Oracle Cloud Infrastructure (OCI) serverless capabilities, including OCI Functions and OCI NoSQL, you can quickly create a management system that processes media content with ephemeral OCI Compute workers.

Architecture

This reference architecture shows how you can process digital media by using OCI Compute instances. It takes a file (a "media asset"), performs a compute-intensive operation on it (for example, ffmpeg transcoding or another processing function), and directs the result to an output bucket that a subsequent activity can pick up and use.

As part of the job creation, a worker instance is launched to process the uploaded object. The worker then processes the job by reading the video from the source object storage bucket, transcoding the video, and uploading the new version to the destination object storage for consumption. Upon successful upload to the destination object storage bucket, the worker instance terminates itself.
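
The worker lifecycle described above can be sketched as follows. This is a minimal illustration, not the code from the reference implementation: the `download`, `transcode`, `upload`, `notify`, and `terminate` hooks are hypothetical stand-ins for Object Storage transfers, a subprocess call, a Notifications message, and an instance termination request, and the H.264/AAC target format is an assumption.

```python
from typing import Callable, List


def build_ffmpeg_command(source_path: str, output_path: str) -> List[str]:
    """Return an ffmpeg argument list that transcodes to H.264 video and AAC audio."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", source_path,   # input file
        "-c:v", "libx264",   # video codec (assumed target format)
        "-c:a", "aac",       # audio codec (assumed target format)
        output_path,
    ]


def run_worker(
    job_id: str,
    download: Callable[[str], str],
    transcode: Callable[[List[str]], int],
    upload: Callable[[str], None],
    notify: Callable[[str], None],
    terminate: Callable[[], None],
) -> str:
    """Process one job and return its final state.

    Every callable is a hypothetical hook supplied by the caller.
    """
    try:
        source_path = download(job_id)              # read from the source bucket
        output_path = source_path + ".out.mp4"
        exit_code = transcode(build_ffmpeg_command(source_path, output_path))
        if exit_code != 0:
            raise RuntimeError(f"ffmpeg exited with code {exit_code}")
        upload(output_path)                         # write to the destination bucket
    except Exception as error:
        # Workers send a notification for jobs that cannot be processed.
        notify(f"job {job_id} could not be processed: {error}")
        return "FAILED"
    terminate()                                     # only after a successful upload
    return "COMPLETED"
```

In this sketch the instance terminates only after a successful upload, which matches the description above. What should happen to the instance after a failure is not specified here, so the sketch only sends a notification and returns.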

Job management is performed by OCI Functions, and job state is stored in an OCI NoSQL table. One compute worker instance is launched per job using preemptible capacity. If preemptible capacity is not available, the worker is launched using on-demand capacity. A regular health check retries any jobs that were not processed because quotas or limits were reached. All job management is logged centrally, and workers send notifications for jobs that cannot be processed.
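
The launch order described above (preemptible first, then on-demand, then wait for a retry) might look like the following sketch. The `launch` hook, the `CapacityUnavailable` exception, and the state names are hypothetical; a real function would call the OCI Compute API and write the result to the NoSQL table.

```python
from typing import Callable


class CapacityUnavailable(Exception):
    """Hypothetical error for a launch that fails due to capacity, quotas, or limits."""


def launch_worker(job: dict, launch: Callable[[dict, bool], str]) -> dict:
    """Try preemptible capacity first, then on-demand; queue the job if both fail.

    `launch(job, preemptible)` is a hypothetical hook that returns an instance ID
    or raises CapacityUnavailable.
    """
    for preemptible in (True, False):
        try:
            instance_id = launch(job, preemptible)
        except CapacityUnavailable:
            continue  # fall through to the next capacity type
        return {
            **job,
            "state": "RUNNING",
            "instance_id": instance_id,
            "capacity": "preemptible" if preemptible else "on-demand",
        }
    # Neither capacity type was available: leave the job for the periodic retry.
    return {**job, "state": "QUEUED", "instance_id": None, "capacity": None}
```

Returning a new job record instead of mutating the input keeps the function easy to test and makes the state that would be written back to the job table explicit.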

The following diagram illustrates this reference architecture.

Description of the illustration proc_med_arch.png

proc_med_arch-oracle.zip

This architecture has the following OCI resources:
  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Object Storage

    The Oracle Cloud Infrastructure Object Storage service is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. The Object Storage service can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos.

    Object Storage buckets are used as the location for original (source) and processed media (destination).

  • Preemptible Compute Instances

    Preemptible instances behave the same as regular compute instances, but the capacity is reclaimed when it's needed elsewhere, and the instances are terminated. If your workloads are fault-tolerant and can withstand interruptions, then preemptible instances can reduce your costs. For example, you can use preemptible instances to optimize costs for workloads that can tolerate interruptions, such as tests that can be stopped and resumed later.

    The architecture first attempts to launch the ephemeral compute workers as preemptible instances.

  • Functions

    Oracle Functions is a fully managed, multitenant, highly scalable, on-demand, Functions-as-a-Service platform. It is powered by the Fn Project open source engine. Functions enable you to deploy your code, and either call it directly or trigger it in response to events. Oracle Functions uses Docker containers hosted in Oracle Cloud Infrastructure Registry.

    Functions written in Python create and monitor the media processing jobs.

  • NoSQL

    OCI NoSQL is a database service that offers on-demand throughput and storage-based provisioning and supports JSON, table, and key-value data types, all with flexible transaction guarantees.

    A NoSQL table is used to track the status of each processing job. Job management functions and ephemeral compute workers use the NoSQL table to communicate asynchronously during the media processing workflow.

  • Logging

    The Oracle Cloud Infrastructure Logging service is a highly scalable and fully managed single pane of glass for all the logs in your tenancy. Logging provides access to logs from Oracle Cloud Infrastructure resources. These logs include critical diagnostic information that describes how resources are performing and being accessed.

    A log group is used as a centralized location for messages related to job management. Job creation events, worker launch status, worker preemption, and job retry queue processing are all logged.

  • Health Checks

    The Oracle Cloud Infrastructure Health Checks service provides users with high frequency external monitoring to determine the availability and performance of any publicly facing service, including hosted websites, API endpoints, or externally facing load balancers. By using Health Checks, users can ensure that they are immediately aware of any availability issue affecting their customers.

    Health checks are used in conjunction with an API gateway to periodically check for queued or interrupted jobs that can be retried.
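
The periodic retry that the health check triggers can be sketched as a sweep over the job records held in the NoSQL table. The state names, the `attempts` field, the `relaunch` hook, and the `max_attempts` cap are all assumptions made for illustration; the source does not describe the table schema or a retry limit.

```python
from typing import Callable, Iterable, List

# Hypothetical states that mark a job as eligible for another launch attempt.
RETRYABLE_STATES = {"QUEUED", "INTERRUPTED"}


def retry_pending_jobs(
    jobs: Iterable[dict],
    relaunch: Callable[[dict], bool],
    max_attempts: int = 3,
) -> List[dict]:
    """Return updated job records after one retry sweep.

    `relaunch(job)` is a hypothetical hook that returns True when a worker
    was launched for the job.
    """
    updated = []
    for job in jobs:
        if job["state"] not in RETRYABLE_STATES:
            updated.append(job)  # running, completed, or failed jobs are untouched
            continue
        attempts = job.get("attempts", 0)
        if attempts >= max_attempts:
            updated.append({**job, "state": "FAILED"})
            continue
        launched = relaunch(job)
        updated.append({
            **job,
            "attempts": attempts + 1,
            "state": "RUNNING" if launched else job["state"],
        })
    return updated
```

Each health check request would run one sweep, so the health check interval determines how long a queued or interrupted job waits before the next attempt.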

Additionally, this architecture has the following components not directly depicted above:
  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • IAM Policy

    Oracle Cloud Infrastructure Identity and Access Management (IAM) lets you control who has access to your cloud resources. A policy is a document that specifies who can access which resources, and how. Access is granted at the group and compartment level, which means you can write a policy that gives a group a specific type of access within a specific compartment, or to the tenancy itself. If you give a group access to the tenancy, the group automatically gets the same type of access to all the compartments inside the tenancy.

  • Events

    Oracle Cloud Infrastructure Events enables you to create automation based on the state changes of resources throughout your tenancy. Use Events to allow your development teams to automatically respond when a resource changes its state.

  • Notifications

    The Oracle Cloud Infrastructure Notifications service broadcasts messages to distributed components through a publish-subscribe pattern, delivering secure, highly reliable, low-latency, and durable messages for applications hosted on Oracle Cloud Infrastructure and externally. Use Notifications to get notified when event rules are triggered or alarms are breached, or to directly publish a message.

  • Custom Image

    Oracle Cloud Infrastructure uses images to launch instances. You specify an image to use when you launch an instance. You can create a custom image of a bare metal instance’s boot disk and use it to launch other instances. Instances you launch from your image include the customizations, configuration, and software installed when you created the image.

  • API Gateway

    Oracle API Gateway service enables you to publish APIs with private endpoints that are accessible from within your network, and which you can expose to the public internet if required. The endpoints support API validation, request and response transformation, CORS, authentication and authorization, and request limiting.

Recommendations

Use the following recommendations as a starting point when you process media by using serverless job management and ephemeral compute workers. Your requirements might differ from the architecture described here.
  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Job Tracking Retention

    OCI NoSQL can set a Time-to-Live (TTL) on table rows. Use TTL to set a job retention period that suits the type of processing being performed.

  • Health Checks

    OCI Health Checks are time based. Set an interval that suits the type of processing being performed.

  • Ephemeral Workers

    Workers should be launched from custom images designed specifically for the type of media processing needed. Instances should start processing automatically once they are running and should not allow any external connections to be established to them.
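
As an illustration of the job tracking retention recommendation, the following sketch builds a table definition with a default row TTL. The table name and columns are hypothetical, and the statement assumes the `USING TTL` clause of the Oracle NoSQL `CREATE TABLE` statement; check the DDL against the current NoSQL documentation before you use it.

```python
def job_table_ddl(table_name: str, ttl_value: int, ttl_unit: str = "DAYS") -> str:
    """Build a CREATE TABLE statement whose rows expire after the given TTL.

    The column layout is a hypothetical job-tracking schema.
    """
    if ttl_unit not in ("HOURS", "DAYS"):
        raise ValueError("ttl_unit must be 'HOURS' or 'DAYS'")
    if ttl_value <= 0:
        raise ValueError("ttl_value must be a positive integer")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} ("
        "job_id STRING, state STRING, details JSON, "
        "PRIMARY KEY (job_id)"
        f") USING TTL {ttl_value} {ttl_unit}"
    )


# Example: keep job records for one week (an assumed retention period).
ddl = job_table_ddl("media_jobs", 7)
```

A short TTL keeps the table small for high-volume, short-lived jobs, while a longer TTL preserves job history for auditing.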

Considerations

Consider the following points when deploying this reference architecture.

  • OCI Logging

    OCI Logging integrates with many OCI services and is used for simple logging from within an OCI Function.

  • OCI Application Performance Monitoring

    OCI Application Performance Monitoring can be used if features like distributed tracing for complex function interactions are required.

  • Job Tracking

    OCI NoSQL is a lightweight method for tracking job status. More advanced database services can be substituted when additional features are needed.

  • Preemptible Compute

    The ephemeral workers are launched using preemptible capacity. Preemptible capacity costs 50% less than on-demand capacity in all regions, but not all compute shapes are supported.
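
To make the preemptible pricing trade-off concrete, the following sketch compares the compute cost of a job on each capacity type. The hourly rate and job durations are hypothetical values chosen for the arithmetic, not published prices; only the 50% discount comes from the text above.

```python
PREEMPTIBLE_DISCOUNT = 0.5  # preemptible capacity costs 50% less than on-demand


def worker_cost(hours: float, on_demand_rate: float, preemptible: bool) -> float:
    """Return the compute cost of one worker run at the given on-demand hourly rate."""
    discount = PREEMPTIBLE_DISCOUNT if preemptible else 0.0
    return hours * on_demand_rate * (1 - discount)


# Hypothetical job: 2 hours of transcoding at an on-demand rate of 1.0 per hour.
on_demand_total = worker_cost(2.0, 1.0, preemptible=False)

# Same job, but a preemptible worker is reclaimed after 1 hour and the job is
# restarted from scratch on a second preemptible worker that runs to completion.
preempted_total = worker_cost(1.0, 1.0, preemptible=True) + worker_cost(2.0, 1.0, preemptible=True)
```

Under these assumed numbers, a job that is preempted once and restarted still costs less than a single on-demand run, although it finishes later.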

Deploy

The Terraform code for this reference architecture is available as a sample stack in Oracle Cloud Infrastructure Resource Manager. You can also download the code from GitHub, and customize it to suit your specific requirements.

  • Deploy using the sample stack in Oracle Cloud Infrastructure Resource Manager:
    1. Click Deploy to Oracle Cloud.

      If you aren't already signed in, enter the tenancy and user credentials.

    2. Select the region where you want to deploy the stack.
    3. Follow the on-screen prompts and instructions to create the stack.
    4. After creating the stack, click Terraform Actions, and select Plan.
    5. Wait for the job to be completed, and review the plan.

      To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

    6. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
  • Deploy using the Terraform code in GitHub:
    1. Go to GitHub.
    2. Clone or download the repository to your local computer.
    3. Follow the instructions in the README document.

Explore More

Learn more about media processing using serverless job management and ephemeral compute workers.

Review these additional resources:

Acknowledgements

  • Authors: Lawrence Gabriel, Sathya Mohankalyan