Note:
- This tutorial requires access to Oracle Cloud. To sign up for a free account, see Get started with Oracle Cloud Infrastructure Free Tier.
- It uses example values for Oracle Cloud Infrastructure credentials, tenancy, and compartments. When completing your lab, substitute these values with ones specific to your cloud environment.
Process Files on Oracle Cloud Infrastructure Object Storage with a Scalable Cloud Native Flow
Introduction
In our applications, we often need to process large quantities of files. In the past, this was done in batch form, but with new technologies and the advent of the cloud, we can now transform many serial processes into parallel ones. Message queues, Kubernetes clusters, and event-driven architectures are some of the technologies and architectures widely used to get the best out of large-volume processing.
Oracle Cloud Infrastructure (OCI) has resources to allow scalability and cost reduction. Let’s explore the cloud native services.
- OCI Object Storage enables customers to securely store any type of data in its native format. With built-in redundancy, OCI Object Storage is ideal for building modern applications that require scale and flexibility, as it can be used to consolidate multiple data sources for analysis, backup, or archival purposes.
- OCI Streaming service is an Apache Kafka-compatible, serverless, real-time event streaming platform for developers and data scientists. Streaming is fully integrated with OCI, Database, GoldenGate and Integration Cloud. The service also offers out-of-the-box integrations for hundreds of third-party products in categories such as DevOps, databases, big data, and SaaS applications.
- OCI Events Service tracks changes made to resources using events that comply with the Cloud Native Computing Foundation (CNCF) cloud events standard. Developers can respond to changes made in real time by triggering code with OCI Functions, recording to OCI Streaming, or sending alerts using OCI Notifications.
- OCI Functions is a serverless computing service that allows developers to build, run, and scale applications without managing any infrastructure. OCI Functions has native integrations with other OCI services and SaaS applications. OCI Functions is based on the open source Fn project, so developers can create applications that can be easily ported to other cloud and on-premises environments. Functions-based code typically runs for short periods of time, is stateless, and executes for a single logic purpose. Customers only pay for the resources they use.
- Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) is a managed Kubernetes service that simplifies large-scale, enterprise-grade Kubernetes operations. It reduces the time, cost, and effort required to manage complex Kubernetes infrastructure. OKE lets you deploy Kubernetes clusters to ensure reliable operations on the control plane and worker nodes with automatic scaling, updates, and security patches. Additionally, OKE offers a fully serverless Kubernetes experience with virtual nodes.
In this tutorial, we will look at a very common way of processing large numbers of files: applications deposit their files in a bucket in OCI Object Storage, and each deposit generates an event that triggers a function, which writes the URL of the file to OCI Streaming.
Note: We could imagine a simpler solution in which a source application writes the content of the files to OCI Streaming and our application just reads that content, but it is not a good practice to transfer large volumes of data through a Kafka queue. Instead, our approach uses a pattern called Claim-Check: rather than sending the file through the message queue, we send a reference to the file and delegate reading the file to the application in charge of processing it.
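To make the Claim-Check idea concrete, here is a minimal Python sketch. It is not the tutorial's Java code: the `ClaimCheckMessage` class, the `build_claim_check` helper, the choice of the object name as the key, and the example URL are all assumptions made for illustration. It only shows that the record placed on the stream carries a reference, not the file content.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClaimCheckMessage:
    """Hypothetical record shape: a key plus a reference to the file."""
    key: str    # assumption: the object name is used as the record key
    value: str  # the URL of the file, never the file content itself

    def encode(self) -> Tuple[bytes, bytes]:
        # Kafka records are byte arrays, so both fields are UTF-8 encoded.
        return self.key.encode("utf-8"), self.value.encode("utf-8")


def build_claim_check(object_name: str, object_url: str) -> ClaimCheckMessage:
    """Build the small message that stands in for the (large) file."""
    return ClaimCheckMessage(key=object_name, value=object_url)


# Stand-in for a large uploaded file (5 MiB of dummy data).
file_content = b"x" * (5 * 1024 * 1024)

message = build_claim_check(
    "invoices/2024-03.csv",
    "https://objectstorage.us-ashburn-1.oraclecloud.com"
    "/n/mytenancy/b/mybucket/o/invoices/2024-03.csv",
)
key_bytes, value_bytes = message.encode()
print(len(value_bytes), "bytes go on the stream instead of", len(file_content))
```

The consumer later uses the URL in `value` to read the file directly from OCI Object Storage, so the stream only ever carries small records.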
This tutorial will feature these components: OCI Object Storage, Events Service, Functions and Streaming.
At the end of this chain, an application consumes the streaming queue; however, we will not discuss how the file is processed.
Objectives
- Implement a scalable event architecture that will allow processing a large number of files through the use of OCI Object Storage, Events Service, Functions and Streaming.
Prerequisites
- VCN, subnet(s), and all security settings configured for the bucket, function, and streaming.
- Oracle Cloud Infrastructure Identity and Access Management (OCI IAM) user configured to manage buckets, events service, functions, and streaming.
Task 1: Create the OCI Streaming Instance
OCI Streaming is a Kafka-like managed streaming service. We can develop applications using the Kafka APIs and the common SDKs available in the market. In this task, we will create an OCI Streaming instance that applications can use to both publish and consume a high volume of data.
- Log in to the OCI Console, click Analytics & AI, and then click Streams.
- Select Compartment and click Create Stream.
- Enter the Stream Name for the stream instance and keep the default values for the other parameters. Click Create to initialize the instance and wait for the Active status.
  Note:
  - In the stream creation process, we can select Auto-Create a default stream pool, so that a default pool is created automatically.
  - You can create your stream instance in a private subnet. In this case, pay attention to the function in Task 4: it must be on the same private subnet or on a subnet that has access to the private subnet of the stream instance. Check your VCN, subnets, security lists, service gateway, and other security components. Be sure that your function can access the OCI Streaming instance.
- Click the DefaultPool link.
- Click Kafka Connection Settings and view the connection settings. Note down the information, as it is required in the next tasks.
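As a rough guide to how the Kafka Connection Settings values are typically used by a Kafka client, the sketch below assembles them into a configuration dictionary. The `kafka_client_config` helper and all sample values are hypothetical. The property names follow the librdkafka convention, and the `cell-1` bootstrap host is an assumption: always copy the exact bootstrap server and user name string shown in your console.

```python
from typing import Dict


def kafka_client_config(region: str, tenancy_name: str, user_name: str,
                        stream_pool_ocid: str, auth_token: str) -> Dict[str, str]:
    """Assemble Kafka client settings for an OCI Streaming stream pool.

    Assumption: the endpoint follows the cell-1.streaming.<region> form;
    prefer the value displayed under Kafka Connection Settings.
    """
    return {
        "bootstrap.servers": f"cell-1.streaming.{region}.oci.oraclecloud.com:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The SASL user name combines tenancy, user, and stream pool OCID.
        "sasl.username": f"{tenancy_name}/{user_name}/{stream_pool_ocid}",
        # The password is an OCI IAM auth token, not the console password.
        "sasl.password": auth_token,
    }


config = kafka_client_config(
    region="us-ashburn-1",
    tenancy_name="mytenancy",                           # placeholder
    user_name="streaming.user@example.com",             # placeholder
    stream_pool_ocid="ocid1.streampool.oc1..example",   # placeholder
    auth_token="<your auth token>",                     # placeholder
)
print(config["bootstrap.servers"])
```

Keeping these values in one place makes it easier to reuse them for both the producer inside the function and the consumer used for testing.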
Task 2: Create an OCI Object Storage Bucket
We need to create a bucket. Buckets are logical containers for storing objects, so all files used for this demo will be stored in this bucket.
- Open the OCI Console and navigate to Storage, Buckets. In the Buckets section, select the Compartment. Use the same compartment as the OCI Streaming instance created in Task 1.
- Click Create Bucket and enter a Bucket Name. Keep the default values for the other parameters and click Create.
We can see the bucket created.
Note: Review the OCI IAM Policies for the bucket. You need to set up the policies if you want to use these buckets in your demo applications. For more information, see Overview of Object Storage and OCI IAM Policies.
Task 3: Activate OCI Object Storage Bucket for OCI Events Services
We need to enable the bucket to emit events. Open your bucket details page, find the Emit Object Events setting, click its Edit link, and activate it.
Task 4: Create OCI Functions
To execute the following task, download the code: OCI_Streaming_Claim_Check.zip.
- Understand the Code
  There are two code files: the main code, HelloFunction.java, and the OCI Streaming producer code, Producer.java.
  - HelloFunction.java.
    In this part of the code, we need to capture data coming from the OCI Events Service, so there are three sources.
    - Context: This property comes from the RuntimeContext, and we use the REGION variable.
    - Event Data: OCI Events Service produces data as resourceName.
    - Additional Event Details Data: OCI Events Service for OCI Object Storage produces data as namespace and bucketName.
    We can assemble the OCI Object Storage file URL.
    And the main code can pass the URL to the OCI Streaming producer.
  - Producer.java.
    This is the Message class structure used to produce the Kafka information for the Claim-Check pattern. It contains only key and value.
    And this is the basic code to produce to the stream.
- Build and deploy the OCI Function
  In this step, we use the OCI CLI to create the OCI Functions application and deploy the code into your OCI tenancy. To create a function, see Functions: Get Started using the CLI, follow the steps, and look for the Java option. Create your function with the following information.
    Application: ocistreaming-app (follow the link Functions: Get Started using CLI)
    fn create app ocistreaming-app --annotation oracle.com/oci/subnetIds='["<the same OCID of your streaming subnet>"]'
    Context Variable: REGION=<your streaming region name, ex: us-ashburn-1>
    fn config app ocistreaming-app REGION=us-ashburn-1
  Remember the compartment where you deployed your function. You will need this information to configure the OCI Events Service.
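The URL-building step described for HelloFunction.java can be sketched in Python as follows. This is an illustration only, not the packaged Java code: the `object_url_from_event` helper, the sample event, and the decision to percent-encode the object name are assumptions. The field names (`resourceName`, `namespace`, `bucketName`) are the ones listed above, and `region` stands in for the REGION context variable.

```python
import json
from urllib.parse import quote


def object_url_from_event(event_json: str, region: str) -> str:
    """Build the Object Storage URL of the object named in an event."""
    event = json.loads(event_json)
    data = event["data"]                  # Event Data
    details = data["additionalDetails"]   # Additional Event Details Data
    return (
        f"https://objectstorage.{region}.oraclecloud.com"
        f"/n/{quote(details['namespace'], safe='')}"
        f"/b/{quote(details['bucketName'], safe='')}"
        # Assumption: keep "/" in object names and encode everything else.
        f"/o/{quote(data['resourceName'], safe='/')}"
    )


# Trimmed, hypothetical example of an Object Storage create event.
sample_event = json.dumps({
    "eventType": "com.oraclecloud.objectstorage.createobject",
    "data": {
        "resourceName": "reports/march 2024.csv",
        "additionalDetails": {
            "namespace": "mytenancy",
            "bucketName": "mybucket",
        },
    },
})

url = object_url_from_event(sample_event, region="us-ashburn-1")
print(url)
```

The resulting string is what the function would hand to the producer as the message value.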
Task 5: Configure the OCI Events
Let’s configure an Event Rule to trigger your function to obtain the bucket information and send it to OCI Streaming.
- Select the same compartment for the rule and click Create Rule.
- Enter the following information.
  - In the Rules Condition section.
    - Condition: Event Type.
    - Service Name: Object Storage.
    - Event Type: Object-Create, Object-Delete, Object-Update.
  - In the Action section.
    - Action Type: Functions.
    - Function Compartment: <your function compartment name>.
    - Function Application: <your function app, in this example ocistreaming-app>.
    - Function: fn_stream.
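For reference, the rule condition selected above is usually represented as a small JSON document listing event types. The sketch below assumes the three Object Storage event type identifiers shown in the code and uses a hypothetical `rule_matches` helper that imitates only the event-type comparison locally. Check the rule logic displayed in your console for the exact condition that was generated.

```python
import json

# Assumption: identifiers behind Object-Create, Object-Delete, Object-Update.
RULE_CONDITION = {
    "eventType": [
        "com.oraclecloud.objectstorage.createobject",
        "com.oraclecloud.objectstorage.deleteobject",
        "com.oraclecloud.objectstorage.updateobject",
    ]
}


def rule_matches(condition: dict, event: dict) -> bool:
    """Local imitation of the rule: match on the event type only."""
    return event.get("eventType") in condition["eventType"]


print(json.dumps(RULE_CONDITION, indent=2))

created = {"eventType": "com.oraclecloud.objectstorage.createobject"}
bucket_created = {"eventType": "com.oraclecloud.objectstorage.createbucket"}
print(rule_matches(RULE_CONDITION, created))
print(rule_matches(RULE_CONDITION, bucket_created))
```

Because the rule also matches delete events, the function is triggered when an object is removed; a consumer may want to skip URLs that no longer resolve.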
Task 6: Test your Circuit of Events
Note: For private networks, the test code needs to be executed on a bastion host connected to the same private subnet as your OCI Streaming instance.
In the OCI_Streaming_Claim_Check.zip source code package, we can find a folder named monitoring and a file named consume.py. We can use this code to monitor and test whether the solution works correctly.
We need to configure the code.
After configuring your stream parameters, you can run the code and verify the whole circuit: bucket, event, function, and streaming.
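On the consuming side, each message value is expected to be an Object Storage URL. The following Python sketch is not the packaged consume.py: `ObjectRef` and `parse_claim_check` are hypothetical names, and the URL layout (`/n/<namespace>/b/<bucket>/o/<object>`) is an assumption about what the function writes. It shows one way a consumer might turn the reference back into the pieces needed to fetch the file.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlparse


class ObjectRef(NamedTuple):
    region: str
    namespace: str
    bucket: str
    object_name: str


def parse_claim_check(value: bytes) -> ObjectRef:
    """Split an Object Storage URL taken from a stream message into its parts."""
    url = urlparse(value.decode("utf-8"))
    # Expected host: objectstorage.<region>.oraclecloud.com
    host_parts = (url.hostname or "").split(".")
    if len(host_parts) < 4 or host_parts[0] != "objectstorage":
        raise ValueError(f"unexpected host: {url.hostname!r}")
    # Expected path: /n/<namespace>/b/<bucket>/o/<object>; the object name
    # may itself contain "/", so split at most six times.
    parts = url.path.split("/", 6)
    if len(parts) != 7 or (parts[1], parts[3], parts[5]) != ("n", "b", "o"):
        raise ValueError(f"unexpected path: {url.path!r}")
    return ObjectRef(
        region=host_parts[1],
        namespace=unquote(parts[2]),
        bucket=unquote(parts[4]),
        object_name=unquote(parts[6]),
    )


ref = parse_claim_check(
    b"https://objectstorage.us-ashburn-1.oraclecloud.com"
    b"/n/mytenancy/b/mybucket/o/reports/march%202024.csv"
)
print(ref)
```

With the namespace, bucket, and object name recovered, the processing application can read the file from OCI Object Storage using whichever SDK or tool it prefers.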
Acknowledgments
- Author - Cristiano Hoshikawa (Oracle LAD A-Team Solution Engineer)
More Learning Resources
Explore other labs on docs.oracle.com/learn or access more free learning content on the Oracle Learning YouTube channel. Additionally, visit education.oracle.com/learning-explorer to become an Oracle Learning Explorer.
For product documentation, visit Oracle Help Center.
F93582-01
March 2024
Copyright © 2024, Oracle and/or its affiliates.