Implement a real-time multicloud data analytics architecture across regions

When they move to the cloud, organizations often want to keep using their existing analytics platforms for all their data analytics requirements.

This multicloud solution describes a customer-inspired data analytics architecture in which the Oracle E-Business Suite application runs on Oracle Cloud Infrastructure (OCI) in the US, and the Oracle E-Business Suite data is replicated in near real time to Microsoft Azure in Europe for analysis in Azure Synapse Analytics.

The analytics tools and data sources are connected by a dedicated private network that provides low latency and high bandwidth for data replication. Oracle Cloud Infrastructure GoldenGate (OCI GoldenGate) replicates the data. The multicloud data analytics solution addressed the customer requirements by:

  • Migrating the on-premises Oracle Database to Oracle Base Database Service, which provides the benefits of maximum database uptime, performance, scalability, security, and productivity.
  • Keeping the analytics stack in Microsoft Azure, which eliminates the need for significant changes to configurations or integrations for downstream consumers.
  • Using OCI GoldenGate to replicate change data from Oracle Database to Azure Data Lake Storage Gen2 and Azure Synapse Analytics in near real time.

Architecture

This reference architecture shows how you can enable private, low-latency connectivity between the data analytics tools in a Microsoft Azure region and the data source in a remote OCI region.

A FastConnect partner connects Azure ExpressRoute and Oracle Cloud Infrastructure FastConnect to join the two remote cloud networks. Traffic from the virtual network (VNet) on Microsoft Azure traverses this private interconnection to the virtual cloud network (VCN) on OCI.
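Because traffic is routed privately between the Azure VNet and the OCI VCN, their address spaces must not overlap. The following minimal Python sketch uses only the standard `ipaddress` module to check a proposed address plan for overlapping CIDR blocks; the network names and CIDR blocks are hypothetical placeholders for your own plan.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(networks: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return every pair of CIDR blocks, from different networks, that overlap."""
    blocks = [
        (name, ipaddress.ip_network(cidr))
        for name, cidrs in networks.items()
        for cidr in cidrs
    ]
    overlaps = []
    for (name_a, net_a), (name_b, net_b) in combinations(blocks, 2):
        # Only cross-network conflicts matter for routing between the two clouds.
        if name_a != name_b and net_a.overlaps(net_b):
            overlaps.append((name_a, str(net_a), name_b, str(net_b)))
    return overlaps


# Hypothetical address plan; the second OCI block deliberately collides with Azure.
plan = {
    "azure-vnet": ["10.1.0.0/16"],
    "oci-vcn": ["10.0.0.0/16", "10.1.128.0/20"],
}

for a, net_a, b, net_b in find_overlaps(plan):
    print(f"{a} {net_a} overlaps {b} {net_b}")
```

Running a check like this before provisioning the circuits is cheaper than renumbering a VCN or VNet after resources are deployed in it.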

In this example, the Oracle E-Business Suite production database is deployed on Oracle Base Database Service. The solution also applies when Oracle Autonomous Database or Oracle Exadata Database Service is the backend database.

Because the source is Oracle Database and the targets are Azure Synapse Analytics and Azure Data Lake Storage Gen2, OCI GoldenGate is deployed in a separate subnet with the following two deployments:
  1. Oracle deployment for capturing data from Oracle E-Business Suite database.
  2. Big Data deployment to apply the data captured from Oracle E-Business Suite database to Azure Synapse.

OCI GoldenGate captures data from Oracle Database and replicates it to Azure Data Lake Storage Gen2 and Azure Synapse Analytics in near real time through FastConnect. OCI GoldenGate replication to Azure Synapse Analytics uses a stage-and-merge data flow: change data is staged in micro-batches in a temporary location, Azure Data Lake Storage Gen2, and then merged into the Azure Synapse target table.
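The stage-and-merge idea can be illustrated with a small in-memory Python sketch. This is not how OCI GoldenGate is implemented: the `Change` record format is hypothetical, a list stands in for the micro-batch files in Azure Data Lake Storage Gen2, and a dictionary stands in for the Azure Synapse target table. It assumes each change carries the full after-image of the row.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    op: str              # "I" (insert), "U" (update), or "D" (delete); hypothetical format
    key: int             # primary key of the source row
    row: Optional[dict]  # full after-image of the row; None for deletes


@dataclass
class StageAndMerge:
    batch_size: int
    # Stand-in for the Azure Synapse target table, keyed by primary key.
    target: Dict[int, dict] = field(default_factory=dict)
    # Stand-in for the micro-batch staged in Azure Data Lake Storage Gen2.
    staged: List[Change] = field(default_factory=list)

    def capture(self, change: Change) -> None:
        """Stage one change record and merge once the micro-batch is full."""
        self.staged.append(change)
        if len(self.staged) >= self.batch_size:
            self.merge()

    def merge(self) -> None:
        """Merge the staged micro-batch into the target table, then clear the stage."""
        latest: Dict[int, Change] = {}
        for change in self.staged:       # changes arrive in commit order, so the
            latest[change.key] = change  # last change per key wins
        for key, change in latest.items():
            if change.op == "D":
                self.target.pop(key, None)
            else:                        # inserts and updates both become upserts
                self.target[key] = change.row
        self.staged.clear()


sm = StageAndMerge(batch_size=3)
sm.capture(Change("I", 1, {"amount": 100}))
sm.capture(Change("I", 2, {"amount": 200}))
sm.capture(Change("U", 1, {"amount": 150}))  # third record fills the batch and triggers a merge
print(sm.target)                             # {1: {'amount': 150}, 2: {'amount': 200}}
sm.capture(Change("D", 2, None))             # staged until the next merge
sm.merge()                                   # for example, a flush triggered by a timer
print(sm.target)                             # {1: {'amount': 150}}
```

Collapsing each micro-batch to the last change per key before merging is what makes batching worthwhile: the target receives one set-based merge per batch instead of one statement per source change.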

The following diagram illustrates this reference architecture.



Diagram file: oci-multicloud-db-analytics-azure-arch-oracle.zip

The architecture has the following components:

Oracle Cloud Infrastructure components

  • Autonomous Transaction Processing

    Oracle Autonomous Transaction Processing is a self-driving, self-securing, self-repairing database service that is optimized for transaction processing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • FastConnect

    Oracle Cloud Infrastructure FastConnect provides an easy way to create a dedicated, private connection between your data center and Oracle Cloud Infrastructure. FastConnect provides higher-bandwidth options and a more reliable networking experience when compared with internet-based connections.

  • Availability domain

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Virtual cloud network (VCN) and subnet

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.

  • Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Oracle Cloud Infrastructure GoldenGate

    Oracle Cloud Infrastructure GoldenGate is a fully managed service that ingests data from sources on premises or in any cloud. It uses GoldenGate change data capture (CDC) technology for non-intrusive, efficient data capture, and it delivers the data in real time and at scale to targets such as Oracle Autonomous Data Warehouse, so that relevant information is available to consumers as quickly as possible.
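When more than one rule in a VCN route table matches a destination, the most specific rule (longest prefix match) is used. The following Python sketch illustrates that selection for a hypothetical route table on the OCI GoldenGate subnet: traffic to the Azure VNet goes to the DRG, and everything else uses a default route. The CIDR blocks and the NAT gateway target are illustrative assumptions, not a published configuration for this architecture.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical route rules: (destination CIDR block, route target).
ROUTE_RULES: List[Tuple[str, str]] = [
    ("10.1.0.0/16", "drg"),        # Azure VNet address space, via the DRG and FastConnect
    ("0.0.0.0/0", "nat-gateway"),  # default route for all other destinations
]


def select_target(destination: str, rules: List[Tuple[str, str]] = ROUTE_RULES) -> Optional[str]:
    """Return the target of the most specific rule that matches the destination IP."""
    ip = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in rules
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no rule matches, so the traffic is not routed
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


print(select_target("10.1.4.20"))   # drg
print(select_target("192.0.2.10"))  # nat-gateway
```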

Microsoft Azure components

  • Virtual network (VNet) and subnet

    A VNet is a virtual network that you define in Azure. A VNet can have multiple non-overlapping address spaces (CIDR blocks), and you can add address spaces after you create the VNet. You can segment a VNet into subnets. A VNet is scoped to a single region, and its subnets span all the availability zones in that region. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VNet. Use VNets to isolate your Microsoft Azure resources logically at the network level.

  • ExpressRoute

    Azure ExpressRoute lets you set up a private connection between a VNet and another network, such as your on-premises network or a network in another cloud provider. ExpressRoute is a more reliable and faster alternative to typical internet connections, because the traffic over ExpressRoute does not traverse the public internet.

  • Virtual network gateway

    A virtual network gateway allows traffic between an Azure VNet and a network outside Azure, either over the public internet or using ExpressRoute, depending on the gateway type that you specify.

  • Route table

    Route tables direct traffic between Azure subnets, VNets, and networks outside Azure.

  • Network security group

    A network security group contains rules to control network traffic between the Azure resources within a VNet. Each rule specifies the source or destination, port, protocol, and direction of network traffic that's allowed or denied.

  • Azure Synapse Analytics

    Azure Synapse Analytics is an analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It allows querying data on your terms, using either serverless or dedicated options, at scale. Azure Synapse brings these concepts together with a unified experience to ingest, explore, prepare, transform, manage, and serve data for immediate BI and machine learning needs.

  • Azure Data Lake Storage Gen2

    Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage. For example, Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Since these capabilities are built on Blob storage, you also get low-cost tiered storage with high availability and disaster recovery capabilities.
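Azure evaluates network security group rules in priority order, lowest number first, and stops at the first rule that matches. The following Python sketch models only that first-match behavior for a hypothetical set of inbound custom rules that admit the OCI GoldenGate subnet on ports 443 (HTTPS) and 1433 (SQL) and deny other traffic from the OCI VCN. The rule names, CIDR blocks, and port choices are assumptions, and Azure's built-in default rules are deliberately not modeled.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NsgRule:
    name: str
    priority: int        # lower numbers are evaluated first (100 to 4096 for custom rules)
    source: str          # source address prefix as a CIDR block
    port: Optional[int]  # destination port; None matches any port
    action: str          # "Allow" or "Deny"


def evaluate(rules: List[NsgRule], source_ip: str, port: int) -> Optional[Tuple[str, str]]:
    """Return (rule name, action) for the first matching rule in priority order.

    Returns None when no custom rule matches; Azure's default rules, which this
    sketch doesn't model, would then decide.
    """
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if ip in ipaddress.ip_network(rule.source) and rule.port in (None, port):
            return rule.name, rule.action  # processing stops at the first match
    return None


# Hypothetical inbound rules, listed out of order to show that priority, not position, decides.
RULES = [
    NsgRule("DenyOtherOciTraffic", 4000, "10.0.0.0/16", None, "Deny"),
    NsgRule("AllowGoldenGateHttps", 100, "10.0.2.0/24", 443, "Allow"),
    NsgRule("AllowGoldenGateSql", 110, "10.0.2.0/24", 1433, "Allow"),
]

print(evaluate(RULES, "10.0.2.15", 1433))  # ('AllowGoldenGateSql', 'Allow')
print(evaluate(RULES, "10.0.3.15", 1433))  # ('DenyOtherOciTraffic', 'Deny')
```

The explicit deny rule matters: Azure's default inbound rules allow traffic from the `VirtualNetwork` service tag, which also covers address ranges connected through a virtual network gateway such as ExpressRoute.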

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.

  • Provisioning

    Choose FastConnect and Azure ExpressRoute virtual circuits that are large enough to support the bandwidth needs of the workload.

    Provision the Oracle database in an OCI VCN and subnet that are connected to the OCI dynamic routing gateway (DRG) and FastConnect.

    Configure routing, security lists, and network security groups (NSGs) on OCI to allow network traffic from Azure Synapse Analytics to Oracle Database.

    For an Oracle database with a private endpoint, specify the VCN configuration so that traffic is allowed only from the specified VCN. This blocks access to the database from all public IP addresses and other VCNs.
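A rough circuit size can be estimated from the peak volume of change data that OCI GoldenGate must ship per hour. The following Python sketch converts that volume into a sustained bandwidth with a headroom factor and picks the smallest circuit that covers it. The 90 GB/hour input, the 2x default headroom, and the list of circuit sizes are hypothetical; replace them with your own measurements and the sizes that your FastConnect partner and ExpressRoute SKU actually offer.

```python
from typing import List, Optional


def required_bandwidth_mbps(change_gb_per_hour: float, headroom: float = 2.0) -> float:
    """Estimate the sustained bandwidth, in Mbps, needed to ship change data.

    change_gb_per_hour: peak change-data volume in decimal gigabytes per hour.
    headroom: multiplier for bursts, protocol overhead, and other traffic on the circuit.
    """
    megabits_per_hour = change_gb_per_hour * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits_per_hour / 3600 * headroom


def smallest_sufficient(required_mbps: float, options_mbps: List[int]) -> Optional[int]:
    """Return the smallest circuit size that meets the requirement, or None if none does."""
    candidates = [size for size in sorted(options_mbps) if size >= required_mbps]
    return candidates[0] if candidates else None


need = required_bandwidth_mbps(90)                 # hypothetical peak of 90 GB/hour
print(need)                                        # 400.0
print(smallest_sufficient(need, [1000, 10000]))    # 1000 (hypothetical circuit sizes)
```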

Considerations

Consider the following points when deploying this reference architecture.

  • Cost

    Oracle Cloud Infrastructure FastConnect: The cost of FastConnect is the same across all the Oracle Cloud Infrastructure regions. There are no separate ingress or egress data charges.

    Azure ExpressRoute: The cost of Azure ExpressRoute varies by region. Azure offers several ExpressRoute SKUs. Oracle recommends the Local SKU because it has no separate ingress or egress charges; it starts at a minimum bandwidth of 1 Gbps. The Standard and Premium SKUs offer lower bandwidth options, but they incur separate egress charges with a metered data plan.

    Auto scaling of OCPUs (Oracle CPUs) in Oracle Autonomous Transaction Processing lets the database handle peak workloads when required. Because you can provision for the baseline workload instead of the peak, auto scaling can also substantially reduce license costs.

  • Performance

    For the customer use case in this reference architecture, the requirement was near real-time data replication from the primary database on OCI to the Azure endpoints. OCI GoldenGate kept the customer's heterogeneous, multicloud big data stores up to date with near real-time data from their operational and analytical production systems, which enabled near real-time analysis.

  • Networking

    Oracle Interconnect for Microsoft Azure can also be used as an alternative network solution, but it's available only in specific paired Azure and OCI regions. For more information, see Learn what Azure and OCI regions support OracleDB for Azure in the Explore More section.

    If the OCI and Azure regions that you use don't support Oracle Interconnect for Microsoft Azure, you can carry the traffic over the backbone of either cloud provider. If you use the OCI backbone, use an OCI region that supports Oracle Interconnect for Microsoft Azure as an intermediary region, and connect it to the region that doesn't support the interconnect by using a remote peering connection (RPC).

    Note:

    If you use the OCI backbone with an RPC, you must configure custom routing at the DRG level to route traffic from the intermediary region to the other region, which doesn't support Oracle Interconnect for Microsoft Azure. To use the Azure backbone, compare the ExpressRoute Local, Standard, and Premium SKUs to make sure that the SKU you choose can connect the two regions inside Azure. You can also consider a FastConnect partner that offers Layer 3 services, such as Megaport Cloud Router.
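To confirm that replication stays near real time, you can measure end-to-end lag as the time between a change's commit on the source database and its arrival in the Azure target. The following Python sketch computes the median and 95th-percentile lag with the nearest-rank method; how you collect the paired timestamps is an assumption left open here, and the sample values are hypothetical.

```python
import math
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def lag_seconds(samples: Iterable[Tuple[datetime, datetime]]) -> List[float]:
    """samples: (source_commit_time, target_apply_time) pairs for the same changes."""
    return [(applied - committed).total_seconds() for committed, applied in samples]


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Smallest value such that at least pct percent of the values are <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical samples: five changes committed a minute apart, applied 4 to 30 seconds later.
start = datetime(2024, 1, 1, 12, 0, 0)
delays = [4, 6, 5, 30, 7]
samples = [
    (start + timedelta(minutes=i), start + timedelta(minutes=i, seconds=d))
    for i, d in enumerate(delays)
]

lags = lag_seconds(samples)
print("median lag:", nearest_rank_percentile(lags, 50), "s")  # median lag: 6.0 s
print("p95 lag:", nearest_rank_percentile(lags, 95), "s")     # p95 lag: 30.0 s
```

Tracking a high percentile alongside the median shows whether occasional slow micro-batches break a near real-time target even when typical lag looks healthy.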
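The cost argument for auto scaling can be made concrete with simple arithmetic: compare the OCPU-hours of a database sized for its peak with one sized for its baseline that scales up only when needed. The following Python sketch does that comparison under stated assumptions: auto scaling can use up to three times the base OCPUs, and each hour is billed for the OCPUs actually used but never fewer than the base. Check both assumptions against current Oracle Autonomous Database documentation and pricing; the demand profile is hypothetical.

```python
from typing import List, Tuple


def ocpu_hours(hourly_demand: List[int], base_ocpus: int, max_scale: int = 3) -> Tuple[int, int]:
    """Return (OCPU-hours if provisioned for peak, OCPU-hours with auto scaling).

    Assumes auto scaling can use up to max_scale x base_ocpus and that each hour is
    billed for the OCPUs used, but never fewer than base_ocpus. Demand above the cap
    is clipped here; in practice it would show up as slower processing.
    """
    fixed = max(hourly_demand) * len(hourly_demand)
    cap = base_ocpus * max_scale
    scaled = sum(max(base_ocpus, min(demand, cap)) for demand in hourly_demand)
    return fixed, scaled


# Hypothetical 8-hour profile: 2 OCPUs of steady load with a 2-hour peak of 6 OCPUs.
demand = [2, 2, 2, 2, 6, 6, 2, 2]
print(ocpu_hours(demand, base_ocpus=2))  # (48, 24)
```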

Acknowledgments

  • Author: Shrinidhi Kulkarni
  • Contributors: Wei Han, Atefeh Yousefi Attaei