Process Bulk Data Using OCI Data Integration and Oracle Integration Cloud Services

Process or integrate bulk data from external sources into target systems or applications.

Consider this scenario: You receive data in bulk from an external source (e.g., customers, suppliers, employees, or products). Before it reaches your end systems or applications, the data needs to be orchestrated, enriched, combined, or organized. To accomplish this, the flow must integrate with two or more intermediate applications or services, or apply complex transformations to the data. The process may add attributes to the data after calling or orchestrating various third-party applications (based on REST, SOAP, etc.). This transactional data may also need complex transformations (JSON or XML), look-ups, or cross-references.

This scenario can be easily implemented with two cloud services: OCI Data Integration (OCI DI) and Oracle Integration Cloud (OIC). OCI DI addresses your data integration or "Extract, Transform, Load" (ETL) needs, and OIC addresses your application integration and enterprise-grade connectivity needs, regardless of which applications you are connecting or where they reside.

Architecture

This reference architecture represents a use case for using OCI DI and OIC to process bulk data.

This reference architecture also addresses the challenge of processing Apache Parquet, Apache Avro, and Microsoft Excel files in OIC by routing them through OCI DI. For example, to process financial reporting data (e.g., accounts payable, accounts receivable, general ledgers, cash flows, assets and liabilities, or revenue), OCI DI converts these file formats into comma-separated value (CSV) files, which are then processed by OIC.

The following diagram illustrates this reference architecture.



oci-bulk-data-integration-architecture-diagram-oracle.zip

Here is an explanation of the steps shown in the above reference architecture:

  1. External sources (e.g., custom applications, non-Oracle applications, Oracle databases running on third-party clouds, third-party cloud services, and on-premises databases and applications) upload or drop the bulk data load file into an OCI Object Storage bucket.
  2. The OCI Events service (part of OCI Observability & Management) watches for an object or file uploaded into the OCI Object Storage bucket.
  3. The OCI Events service triggers an action that invokes OCI Functions with the bucket name and the file name.
  4. OCI Functions receives the event and invokes the OCI DI pipeline with input parameters: bucket name and file name.
  5. OCI DI pipeline reads the bulk data load file from the OCI Object Storage bucket and splits the single, large data file into numerous, smaller files. It then uploads the split files into the OCI Object Storage bucket.
  6. Another instance of the OCI Events service watches for split files uploaded into the OCI Object Storage bucket.
  7. The OCI Events service triggers an action that invokes OCI Functions with the bucket name and the file name of each split file.
  8. OCI Functions receives each event and invokes the OIC integration flow with the bucket name and the file name as input parameters.
  9. OIC integration reads each file from the OCI Object Storage bucket.
  10. Based on the requirement, the OIC integration orchestrates and enriches the data by invoking one or more intermediate applications or systems. It then performs complex transformations, look-ups, and cross-references, and finally delivers the data to downstream systems or applications.
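
Two parts of this flow can be sketched in plain Python to make them concrete: reading the bucket name and file name out of an event (steps 3-4 and 7-8) and splitting one large file into smaller ones (step 5). This is only an illustration under stated assumptions, not the way OCI Functions or OCI DI implement these steps. The event layout (`data.resourceName` and `data.additionalDetails.bucketName`) is assumed to follow the Object Storage event shape, and the function names, bucket name, file name, and chunk size are all hypothetical.

```python
import csv
import io


def extract_bucket_and_object(event: dict) -> tuple[str, str]:
    """Return (bucket_name, object_name) from an Object Storage event payload.

    Assumes the payload carries the object name in data.resourceName and the
    bucket name in data.additionalDetails.bucketName.
    """
    data = event["data"]
    return data["additionalDetails"]["bucketName"], data["resourceName"]


def split_csv(text: str, rows_per_file: int) -> list[str]:
    """Split CSV text into chunks of at most rows_per_file data rows.

    The header row is repeated at the top of every chunk so that each split
    file can be processed on its own. Input with no data rows yields no chunks.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_file):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(body[start:start + rows_per_file])
        chunks.append(out.getvalue())
    return chunks


# Hypothetical sample event, shaped like an Object Storage "create object" event.
sample_event = {
    "eventType": "com.oraclecloud.objectstorage.createobject",
    "data": {
        "resourceName": "suppliers.csv",
        "additionalDetails": {"bucketName": "bulk-inbound", "namespace": "example"},
    },
}

bucket, name = extract_bucket_and_object(sample_event)
parts = split_csv("id,name\n1,a\n2,b\n3,c\n", rows_per_file=2)
print(bucket, name, len(parts))
```

In a real deployment the function would pass the extracted bucket and file names on to the OCI DI pipeline or the OIC integration flow. Those calls are omitted here because they depend on how the pipeline and integration are exposed in your tenancy.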

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Data Integration

    OCI Data Integration is a fully managed, multi-tenant service that helps data engineers and "extract, transform, and load" (ETL) developers with common ETL tasks such as ingesting data from a variety of data assets; cleansing, transforming, and reshaping that data; and efficiently loading it to target data assets.

  • Oracle Integration Cloud

    With Oracle Integration Cloud, you have the power to integrate your cloud and on-premises applications, automate business processes, gain insight into your business processes, develop visual applications, use an SFTP-compliant file server to store and retrieve files, and exchange business documents with a B2B trading partner.

  • Events

    OCI Events Service tracks resource changes using events that comply with the Cloud Native Computing Foundation (CNCF) CloudEvents standard. Developers can respond to changes in real-time by triggering code with Functions, writing to Streaming, or sending alerts using Notifications.

  • Functions

    OCI Functions is a serverless platform that lets developers create, run, and scale applications without managing any infrastructure. Functions integrates with OCI, platform services, and SaaS applications. Because Functions is based on the open source Fn Project, developers can create applications that can be easily ported to other cloud and on-premises environments. Code based on Functions typically runs for short durations, and customers pay only for the resources they use.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.
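
The subnet rule described under "Virtual cloud network (VCN) and subnets" (each subnet is a contiguous range that lies inside the VCN and does not overlap the other subnets) can be checked before provisioning. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the CIDR values are hypothetical, it assumes IPv4 CIDR blocks only, and it is not part of any OCI tooling.

```python
import ipaddress


def subnets_are_valid(vcn_cidrs, subnet_cidrs):
    """Return True if every subnet lies inside one of the VCN CIDR blocks
    and no two subnets overlap each other."""
    vcn_nets = [ipaddress.ip_network(c) for c in vcn_cidrs]
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]

    # Every subnet must fall entirely within some VCN CIDR block.
    for subnet in subnets:
        if not any(subnet.subnet_of(vcn) for vcn in vcn_nets):
            return False

    # No pair of subnets may share addresses.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                return False
    return True


# Hypothetical plan: one VCN block with two non-overlapping /24 subnets.
print(subnets_are_valid(["10.0.0.0/16"], ["10.0.1.0/24", "10.0.2.0/24"]))
```

A plan such as `["10.0.1.0/24", "10.0.1.128/25"]` fails the check because the second range sits inside the first.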

Acknowledgments

  • Authors: Pavan Rajalbandi
  • Contributors: John Sulyok