Process Files on Oracle Cloud Infrastructure Object Storage with a Scalable Cloud Native Flow

Introduction

In our applications, we often need to process large quantities of files. In the past, this was done in batch form, but with new technologies and the advent of the cloud, we can now transform many serial processes into parallel ones. Message queues, Kubernetes clusters, and event-driven architectures are some of the technologies and architectures widely used to get the best out of high-volume processing.

Oracle Cloud Infrastructure (OCI) provides services that enable scalability and cost reduction. Let’s explore some of these cloud native services.

In this tutorial, we will see a very common way of processing large numbers of files: applications deposit their files in an OCI Object Storage bucket, and each deposit generates an event that triggers a function, which writes the URL of the file to OCI Streaming.

Note: We could imagine a simpler solution in which a source application writes the file contents directly to OCI Streaming and our application just reads them. However, it is not a good practice to transfer large volumes of data through a Kafka queue. Instead, our approach uses a pattern called Claim-Check: rather than sending the file through the message queue, we send a reference to the file, and we delegate reading the file to the application that is in charge of processing it.
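To make the Claim-Check idea concrete, here is a minimal, self-contained Python sketch. It is not the code from the tutorial package: an in-memory dictionary stands in for the Object Storage bucket, a list stands in for the stream, and the names `InMemoryStore`, `produce`, and `consume` are hypothetical. It only illustrates that a small reference travels through the queue while the payload stays in storage.

```python
import json
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for an object storage bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def get(self, name: str) -> bytes:
        return self._objects[name]


def produce(store: InMemoryStore, queue: List[bytes], name: str, data: bytes) -> None:
    """Store the (possibly large) payload and enqueue only a claim check."""
    store.put(name, data)
    claim_check = {"object": name, "size": len(data)}
    queue.append(json.dumps(claim_check).encode("utf-8"))


def consume(store: InMemoryStore, queue: List[bytes]) -> List[bytes]:
    """Read claim checks from the queue and fetch each payload from storage."""
    payloads = []
    while queue:
        claim_check = json.loads(queue.pop(0).decode("utf-8"))
        payloads.append(store.get(claim_check["object"]))
    return payloads


if __name__ == "__main__":
    store = InMemoryStore()
    queue: List[bytes] = []
    big_file = b"x" * 1_000_000  # 1 MB payload stays in the store
    produce(store, queue, "reports/big.csv", big_file)
    print(len(queue[0]))  # the queued message is only a few dozen bytes
    print(len(consume(store, queue)[0]))  # the consumer still gets the full file
```

The message size stays constant regardless of file size, which is what keeps the stream lightweight; the cost is an extra read from storage on the consumer side.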

This tutorial will feature these components: OCI Object Storage, Events Service, Functions and Streaming.

At the end of this chain, an application consumes the stream. However, we will not discuss how the file itself is processed.

img.png

Objectives

Prerequisites

Task 1: Create the OCI Streaming Instance

OCI Streaming is a fully managed, Kafka-compatible streaming service. We can develop applications using the Kafka APIs and common Kafka SDKs available on the market. In this task, we will create an OCI Streaming instance and configure it so that applications can both publish and consume high volumes of data.

  1. Log in to the OCI Console and navigate to Analytics & AI, Streams.

  2. Select Compartment and click Create Stream.

    create-stream.png

  3. Enter a Stream Name for the stream instance and keep the default values for the other parameters. Click Create to initialize the instance and wait for the Active status.

    save-create-stream.png

    Note:

    • When creating the stream, you can select Auto-Create a default stream pool, so that a default stream pool is created automatically.

    • You can create your stream instance in a private subnet. In this case, pay attention to the function in Task 4: it must be in the same private subnet, or in a subnet that has access to the private subnet of the stream instance. Check your VCN, subnets, security lists, service gateway, and other security components to be sure that your function can access the OCI Streaming instance.

  4. Click the DefaultPool link.

    default-pool-option.png

  5. Click Kafka Connection Settings to view the connection settings. Note down this information, because it is required in the next tasks.

    stream-conn-settings.png

    kafka-conn.png
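As an illustration of how the Kafka Connection Settings are typically used, the sketch below maps them to a librdkafka-style configuration dictionary (the format accepted by the `confluent-kafka` Python package). The function name and placeholder values are hypothetical, and the `<tenancy>/<user>/<stream pool OCID>` username format is an assumption based on the usual SASL connection string; copy the exact string shown in your Console, since it can differ (for example, with identity domains).

```python
from typing import Dict


def oci_streaming_kafka_config(
    bootstrap_servers: str,
    tenancy_name: str,
    user_name: str,
    stream_pool_ocid: str,
    auth_token: str,
) -> Dict[str, str]:
    """Map the values from Kafka Connection Settings to librdkafka-style keys."""
    return {
        # Bootstrap Servers, e.g. cell-1.streaming.<region>.oci.oraclecloud.com:9092
        "bootstrap.servers": bootstrap_servers,
        # The Kafka endpoint uses SASL over TLS with the PLAIN mechanism
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        # Assumed username format: <tenancy>/<user>/<stream pool OCID>;
        # copy the exact string shown in your SASL Connection String
        "sasl.username": f"{tenancy_name}/{user_name}/{stream_pool_ocid}",
        # The password is an OCI auth token, not the Console password
        "sasl.password": auth_token,
    }


if __name__ == "__main__":
    # Hypothetical placeholder values; replace them with your own settings
    config = oci_streaming_kafka_config(
        "cell-1.streaming.us-ashburn-1.oci.oraclecloud.com:9092",
        "mytenancy",
        "myuser",
        "ocid1.streampool.oc1.iad.example",
        "<auth-token>",
    )
    for key, value in config.items():
        print(key, "=", "****" if key == "sasl.password" else value)
```

With `confluent-kafka` installed, this dictionary could be passed to `confluent_kafka.Producer(config)`, or, after adding a `group.id`, to `confluent_kafka.Consumer(config)`. The stream name is used as the Kafka topic name.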

Task 2: Create an OCI Object Storage Bucket

We need to create a bucket. Buckets are logical containers for storing objects, so all files used for this demo will be stored in this bucket.

  1. Open the OCI Console and navigate to Storage, Buckets. In the Buckets section, select the same compartment as the OCI Streaming instance created in Task 1.

    select-compartment.png

  2. Click Create Bucket and enter a Bucket Name. Keep the other parameters with the default values and click Create.

    create-bucket.png

    The new bucket is now listed.

    buckets-dataflow.png

    Note: Review the OCI IAM Policies for the bucket. You need to set up the policies if you want to use these buckets in your demo applications. For more information, see Overview of Object Storage and OCI IAM Policies.

Task 3: Activate OCI Object Storage Bucket for OCI Events Services

We need to enable the bucket to emit events. Open your bucket details page, find Emit Object Events, click its Edit link, and enable it.

img_8.png
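Once events are enabled, each object change produces a JSON event in CloudEvents format. The sketch below extracts the object reference from such an event. The field names (`eventType`, `data.resourceName`, `data.additionalDetails.bucketName`, `data.additionalDetails.namespace`) reflect our understanding of the Object Storage event schema and should be verified against a real event; the sample event is abridged, its values are hypothetical, and `parse_object_event` is an illustrative name, not part of any OCI SDK.

```python
import json
from typing import NamedTuple


class ObjectEvent(NamedTuple):
    event_type: str
    namespace: str
    bucket: str
    object_name: str


def parse_object_event(raw: bytes) -> ObjectEvent:
    """Extract the object reference from an Object Storage event (CloudEvents JSON)."""
    event = json.loads(raw)
    data = event["data"]
    details = data["additionalDetails"]
    return ObjectEvent(
        event_type=event["eventType"],
        namespace=details["namespace"],
        bucket=details["bucketName"],
        object_name=data["resourceName"],
    )


# Abridged, hypothetical example of an event for a newly created object
SAMPLE_EVENT = b"""{
  "eventType": "com.oraclecloud.objectstorage.createobject",
  "source": "ObjectStorage",
  "data": {
    "resourceName": "input/orders.csv",
    "additionalDetails": {"bucketName": "my-bucket", "namespace": "mynamespace"}
  }
}"""

if __name__ == "__main__":
    print(parse_object_event(SAMPLE_EVENT))
```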

Task 4: Create OCI Functions

To execute this task, download the code from here: OCI_Streaming_Claim_Check.zip.
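The function in the package is the reference implementation. As a hedged illustration of its core step only, the sketch below turns an object reference into a claim-check record, a `(key, value)` pair ready to publish to the stream. It assumes the native Object Storage URL form `https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/<object>`, and the JSON field names and function names are hypothetical; the actual function may use a different message format.

```python
import json
from typing import Tuple
from urllib.parse import quote


def object_url(region: str, namespace: str, bucket: str, object_name: str) -> str:
    """Build the native Object Storage URL for an object (object name URL-encoded)."""
    return (
        f"https://objectstorage.{region}.oraclecloud.com"
        f"/n/{namespace}/b/{bucket}/o/{quote(object_name, safe='')}"
    )


def claim_check_record(
    region: str, namespace: str, bucket: str, object_name: str, event_type: str
) -> Tuple[bytes, bytes]:
    """Return a (key, value) pair to publish: a reference to the file, not its contents."""
    value = {
        "eventType": event_type,
        "bucket": bucket,
        "object": object_name,
        "url": object_url(region, namespace, bucket, object_name),
    }
    # Keying by object name sends all events for the same object to the same partition
    return object_name.encode("utf-8"), json.dumps(value).encode("utf-8")


if __name__ == "__main__":
    key, value = claim_check_record(
        "us-ashburn-1", "mynamespace", "my-bucket", "input/orders 2024.csv",
        "com.oraclecloud.objectstorage.createobject",
    )
    print(key)
    print(value)
```

Inside a function, the pair would then be sent with a Kafka producer configured for OCI Streaming, for example `producer.produce(stream_name, key=key, value=value)` followed by `producer.flush()` with `confluent-kafka`.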

Task 5: Configure the OCI Events

Let’s configure an event rule that triggers your function, which obtains the bucket information and sends it to OCI Streaming.

  1. Select the same compartment for the rule and click Create Rule.

    img_10.png

  2. Enter the following information.

    1. In the Rules Condition section.

      • Condition: Event Type.
      • Service Name: Object Storage.
      • Event Type: Object-Create, Object-Delete, Object-Update.
    2. In the Action section.

      • Action Type: Functions.
      • Function Compartment: <your function compartment name>.
      • Function Application: <your function app, in this example ocistreaming-app>.
      • Function: fn_stream.

    img_9.png
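Selecting these event types in the Console corresponds, as far as we understand, to a JSON rule condition that lists the matching `eventType` values. The sketch below shows that condition together with a simplified local matcher that is handy for testing your function's assumptions offline. `RULE_CONDITION` and `rule_matches` are hypothetical names, and the matcher is an approximation, not the Events service's actual matching logic (which also supports attribute filters).

```python
import json
from typing import Any, Dict

# Condition assumed to correspond to Object - Create, Object - Delete, and Object - Update
RULE_CONDITION: Dict[str, Any] = {
    "eventType": [
        "com.oraclecloud.objectstorage.createobject",
        "com.oraclecloud.objectstorage.deleteobject",
        "com.oraclecloud.objectstorage.updateobject",
    ]
}


def rule_matches(condition: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified local check: is the event's type in the rule's list of types?"""
    return event.get("eventType") in condition.get("eventType", [])


if __name__ == "__main__":
    print(json.dumps(RULE_CONDITION, indent=2))
    print(rule_matches(RULE_CONDITION, {"eventType": "com.oraclecloud.objectstorage.createobject"}))
```

Because the rule also fires on Object-Delete, the stream will receive references to objects that no longer exist, so the consuming application should check the event type before trying to read the file.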

Task 6: Test the Event Circuit

Note: For private networks, the test code must be run on a bastion host connected to the same private subnet as your OCI Streaming instance.

In the OCI_Streaming_Claim_Check.zip source code package, we can find a folder named monitoring and a file named consume.py. We can use this code to monitor the stream and test whether the solution works correctly.

First, configure the code with your stream connection parameters.

img_11.png

After configuring your stream parameters, run the code and verify the complete circuit: bucket, event, function, and stream.

img_12.png
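The consume.py script in the package is what the tutorial uses for monitoring. As a hedged sketch of what a consuming application might do with each message, the function below decodes claim-check messages and returns the URLs of objects to fetch, skipping malformed messages and delete events. It assumes JSON messages with hypothetical `eventType` and `url` fields; adjust it to the format your function actually emits. It takes any iterable of raw message values, so it can be tested without a broker.

```python
import json
from typing import Iterable, List


def urls_to_process(raw_values: Iterable[bytes]) -> List[str]:
    """Decode claim-check messages and return the URLs of objects to fetch."""
    urls = []
    for raw in raw_values:
        try:
            message = json.loads(raw)
        except ValueError:
            continue  # skip malformed messages instead of stopping the consumer
        if not isinstance(message, dict):
            continue
        if message.get("eventType", "").endswith(".deleteobject"):
            continue  # nothing to read for deleted objects
        url = message.get("url")
        if url:
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = [
        b'{"eventType": "com.oraclecloud.objectstorage.createobject", "url": "https://example.com/a"}',
        b"not json",
    ]
    print(urls_to_process(sample))
```

With `confluent-kafka`, the raw values would come from a loop such as `msg = consumer.poll(1.0)`, skipping iterations where `msg is None or msg.error()` and passing `msg.value()` to the decoder.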

Acknowledgments

More Learning Resources

Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.

For product documentation, visit Oracle Help Center.