Deploy a Multicloud Analytics Pipeline with Microsoft Azure Synapse and Oracle Autonomous Database

A cloud deployment reflects an enterprise’s heterogeneous IT environment. When migrating to the cloud, enterprises want to optimize cost and performance and to use best-of-breed services. A multicloud, split-stack data analytics pipeline meets these needs by connecting Azure Synapse Analytics directly to Oracle Autonomous Database on Shared Exadata Infrastructure through a private interconnect, providing real-time business insight.

This architecture uses Azure integration runtime (IR) to create a private endpoint for Azure Synapse Analytics. The Synapse traffic is routed through the private Oracle Interconnect for Azure to the private endpoint of Oracle Autonomous Database on Shared Exadata Infrastructure on OCI.

The following are some of the benefits:
  • The multicloud data analytics pipeline delivers real-time business insights
  • Oracle Autonomous Database provides a machine-learning-driven managed service with a low total cost of ownership (TCO)
  • Oracle Interconnect for Azure provides a private, dedicated, high-bandwidth, low-latency network connection
  • Azure Synapse Analytics brings together data integration, enterprise data warehousing, and big data analytics

Architecture

This architecture shows a typical multicloud deployment with Oracle E-Business Suite on Oracle Cloud Infrastructure (OCI) and Azure Synapse Analytics on Microsoft Azure.

The Oracle E-Business Suite full stack is deployed on OCI. The production Oracle E-Business Suite data is replicated to Oracle Autonomous Data Warehouse in real time using Oracle Cloud Infrastructure GoldenGate. Azure Synapse Analytics accesses the data warehouse directly through Oracle Interconnect for Azure, which provides a dedicated, high-bandwidth, low-latency connection between Azure and OCI.
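
The replication step can be pictured as a stream of change records applied to the warehouse copy in order. The following Python sketch is only a conceptual illustration of change data capture. It does not use any GoldenGate API, and the `ChangeRecord` type, the `apply_changes` function, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured row change (hypothetical shape, not a GoldenGate format)."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new column values; None for deletes


def apply_changes(target: Dict[int, dict], changes: Iterable[ChangeRecord]) -> Dict[int, dict]:
    """Apply change records, in order, to an in-memory stand-in for the target table."""
    for change in changes:
        if change.op == "insert":
            target[change.key] = dict(change.row or {})
        elif change.op == "update":
            # Merge the changed columns into the existing row, creating it if missing.
            target.setdefault(change.key, {}).update(change.row or {})
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return target


warehouse = apply_changes({}, [
    ChangeRecord("insert", 1, {"status": "BOOKED", "amount": 100}),
    ChangeRecord("insert", 2, {"status": "BOOKED", "amount": 250}),
    ChangeRecord("update", 1, {"status": "SHIPPED"}),
    ChangeRecord("delete", 2),
])
print(warehouse)  # {1: {'status': 'SHIPPED', 'amount': 100}}
```

Because only the changes are shipped, the target stays current without repeated full extracts from the production database.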

The following diagram illustrates this reference architecture.

[Diagram: multicloud-data-analytics-pipeline-azure.png]

multicloud-data-analytics-pipeline-azure-oracle.zip

On-premises applications and users connect to both clouds through VPN or a dedicated connection, such as Oracle Cloud Infrastructure FastConnect or Azure ExpressRoute.

The private interconnection between OCI and Azure uses a private IP address or endpoint to route the traffic. The Autonomous Database with a private endpoint is deployed in an OCI region that is interconnected with Azure. Azure Synapse Analytics is a platform-as-a-service (PaaS) offering that doesn’t have a private endpoint for Oracle databases. However, Azure offers a self-hosted Integration Runtime (IR) that you can deploy on a virtual machine (VM) to bridge Oracle Autonomous Database and Azure Synapse. Because the Azure managed virtual network (VNet) for the Synapse workspace can’t attach directly to the Oracle Database for Azure VNet, a self-managed VNet is required to deploy the IR.
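
As an illustration of what the IR host needs in order to reach the database, the following Python sketch assembles an Oracle Net connect descriptor for a TLS (TCPS) connection. The host name, service name, and default port 1522 are placeholder assumptions, not values from this architecture, and `build_connect_descriptor` is a hypothetical helper. Take the real values from the connection details of your own Autonomous Database.

```python
def build_connect_descriptor(host: str, service_name: str, port: int = 1522) -> str:
    """Return an Oracle Net connect descriptor for a TLS (TCPS) connection."""
    if not host or not service_name:
        raise ValueError("host and service_name are required")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return (
        "(description="
        f"(address=(protocol=tcps)(port={port})(host={host}))"
        f"(connect_data=(service_name={service_name}))"
        "(security=(ssl_server_dn_match=yes))"
        ")"
    )


# Placeholder values only; the private endpoint host resolves to a private IP in the VCN.
descriptor = build_connect_descriptor(
    host="example-private-endpoint.adb.example-region.oraclecloud.com",
    service_name="exampledb_high.adb.oraclecloud.com",
)
print(descriptor)
```

The host in the descriptor must resolve to the private endpoint from the IR host, so name resolution across the interconnect is part of the setup.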

The network bandwidth of a single self-hosted integration runtime host is limited and might not be enough to transfer a high volume of data from the autonomous database to Azure Synapse Analytics within the required window. We recommend deploying multiple IRs, both to combine their bandwidth and to provide high availability.
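
A rough sizing calculation can show whether one IR host is enough for a given transfer window. The Python sketch below is a back-of-the-envelope estimate under stated assumptions: throughput scales linearly with the number of hosts, the per-host rate is the sustained effective rate rather than the nominal NIC speed, and the workload figures are invented for illustration.

```python
import math


def transfer_hours(data_gb: float, effective_gbps: float) -> float:
    """Hours needed to move data_gb gigabytes at effective_gbps gigabits per second."""
    if data_gb < 0 or effective_gbps <= 0:
        raise ValueError("data_gb must be >= 0 and effective_gbps must be > 0")
    gigabits = data_gb * 8
    return gigabits / effective_gbps / 3600


def ir_hosts_needed(data_gb: float, per_host_gbps: float,
                    window_hours: float, spare_hosts: int = 1) -> int:
    """IR hosts required to finish within the window, plus spares for availability."""
    if window_hours <= 0:
        raise ValueError("window_hours must be > 0")
    single_host_hours = transfer_hours(data_gb, per_host_gbps)
    for_throughput = max(1, math.ceil(single_host_hours / window_hours))
    return for_throughput + spare_hosts


# Hypothetical workload: 4,500 GB to move in a 2-hour window at 2 Gbps per host.
print(transfer_hours(4500, 2.0))        # 5.0 hours on a single host
print(ir_hosts_needed(4500, 2.0, 2.0))  # 4 hosts: 3 for throughput, 1 spare
```

The spare host keeps the transfer within the window if one IR host fails, which is the availability side of the recommendation.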

The architecture has the following components on OCI:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Bastion service

    Oracle Cloud Infrastructure Bastion provides restricted and time-limited secure access to resources that don't have public endpoints and that require strict resource access controls, such as bare metal and virtual machines, Oracle MySQL Database Service, Autonomous Transaction Processing (ATP), Oracle Container Engine for Kubernetes (OKE), and any other resource that allows Secure Shell Protocol (SSH) access. With Oracle Cloud Infrastructure Bastion service, you can enable access to private hosts without deploying and maintaining a jump host. In addition, you gain improved security posture with identity-based permissions and a centralized, audited, and time-bound SSH session. Oracle Cloud Infrastructure Bastion removes the need for a public IP for bastion access, eliminating the hassle and potential attack surface when providing remote access.

  • Load balancer

    The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end.

  • Oracle E-Business Suite

    Oracle E-Business Suite is a suite of integrated business applications that enable organizations to make better decisions, reduce costs, and increase performance. Products provide solutions for customer relationship management, service management, financial management, human capital management, project portfolio management, advanced procurement, supply chain management, value chain planning, and value chain execution.

  • Oracle E-Business Suite Cloud Manager

    Oracle E-Business Suite Cloud Manager is a web-based application that drives all the principal automation flows for Oracle E-Business Suite on Oracle Cloud Infrastructure, including provisioning new environments, performing lifecycle management activities on those environments, and restoring environments from on-premises.

    Oracle E-Business Suite Cloud Manager was designed to simplify the diverse tasks Oracle E-Business Suite database administrators (DBAs) perform on a daily basis, with the goal of reducing the effort needed to perform them.

  • Oracle Cloud Infrastructure GoldenGate

    Oracle Cloud Infrastructure GoldenGate is a fully managed service that ingests data from sources residing on premises or in any cloud. It uses GoldenGate change data capture (CDC) technology to capture data efficiently and nonintrusively and to deliver it to Oracle Autonomous Data Warehouse in real time and at scale, so that relevant information is available to consumers as quickly as possible.

  • Autonomous Database

    Oracle Autonomous Database is a fully managed, preconfigured database environment that you can use for transaction processing and data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.

  • Data Safe

    Oracle Data Safe is a fully integrated, regional cloud service that provides a complete set of features for protecting sensitive and regulated data in Oracle databases. Data Safe also supports on-premises databases, Oracle Exadata Database Service on Cloud@Customer, and multicloud deployments. All Oracle Database customers can reduce the risk of a data breach and simplify compliance by using Oracle Data Safe to assess configuration and user risk, monitor and audit user activity, and discover, classify, and mask sensitive data.

  • Object storage

    Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can seamlessly scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly and frequently. Use archive storage for "cold" storage that you retain for long periods of time and rarely access.

  • Audit

    The Oracle Cloud Infrastructure Audit service automatically records calls to all supported Oracle Cloud Infrastructure public application programming interface (API) endpoints as log events. Currently, all services support logging by Oracle Cloud Infrastructure Audit.

  • Logging

    Logging is a highly scalable and fully managed service that provides access to the following types of logs from your resources in the cloud:
    • Audit logs: Logs related to events emitted by the Audit service.
    • Service logs: Logs emitted by individual services such as API Gateway, Events, Functions, Load Balancing, Object Storage, and VCN flow logs.
    • Custom logs: Logs that contain diagnostic information from custom applications, other cloud providers, or an on-premises environment.

  • Policy

    An Oracle Cloud Infrastructure Identity and Access Management policy specifies who can access which resources, and how. Access is granted at the group and compartment level, which means you can write a policy that gives a group a specific type of access within a specific compartment, or to the tenancy.

  • Identity and Access Management (IAM)

    Oracle Cloud Infrastructure Identity and Access Management (IAM) is the access control plane for Oracle Cloud Infrastructure (OCI) and Oracle Cloud Applications. The IAM API and the user interface enable you to manage identity domains and the resources within the identity domain. Each OCI IAM identity domain represents a standalone identity and access management solution or a different user population.

  • Dynamic routing gateway (DRG)

    The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, and between a VCN and a network outside the region, such as a VCN in another Oracle Cloud Infrastructure region, an on-premises network, or a network in another cloud provider.

  • Internet gateway

    The internet gateway allows traffic between the public subnets in a VCN and the public internet.

  • Service gateway

    The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and never traverses the internet.

  • Web Application Firewall (WAF)

    Oracle Cloud Infrastructure Web Application Firewall (WAF) is a payment card industry (PCI) compliant, region-based and edge enforcement service that is attached to an enforcement point, such as a load balancer or a web application domain name. WAF protects applications from malicious and unwanted internet traffic. WAF can protect any internet-facing endpoint, providing consistent rule enforcement across a customer's applications.

  • Route table

    Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.

  • Network security group (NSG)

    A network security group (NSG) acts as a virtual firewall for your cloud resources. With the zero-trust security model of Oracle Cloud Infrastructure, all traffic is denied by default, and you control the network traffic inside a VCN by adding rules. An NSG consists of a set of ingress and egress security rules that apply to only a specified set of VNICs in a single VCN.

  • Security list

    For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.
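
The deny-by-default behavior of NSG and security list rules described above can be modeled in a few lines. The following Python sketch is a simplified illustration, not the OCI API: the `IngressRule` type and `is_allowed` function are hypothetical, the rules are stateless matches on source CIDR, protocol, and port, and the CIDR block and port are assumed example values.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    """Simplified allow rule (hypothetical model, not the OCI API shape)."""
    source_cidr: str
    protocol: str
    port: int


def is_allowed(rules: Iterable[IngressRule], source_ip: str, protocol: str, port: int) -> bool:
    """Deny by default; allow only if at least one rule matches the packet."""
    source = ip_address(source_ip)
    return any(
        source in ip_network(rule.source_cidr)
        and rule.protocol == protocol
        and rule.port == port
        for rule in rules
    )


# Assumed values: 10.1.0.0/24 stands in for the Azure subnet that hosts the IR VMs,
# and 1522 stands in for the database listener port.
rules = [IngressRule("10.1.0.0/24", "tcp", 1522)]
print(is_allowed(rules, "10.1.0.15", "tcp", 1522))    # True: IR host in the allowed subnet
print(is_allowed(rules, "203.0.113.9", "tcp", 1522))  # False: no rule matches this source
print(is_allowed([], "10.1.0.15", "tcp", 1522))       # False: no rules means all traffic is denied
```

In this model, allowing Azure Synapse traffic to the database amounts to adding one narrow ingress rule for the IR subnet and leaving everything else denied.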

The architecture has the following components on Azure:

  • Azure ExpressRoute

    Microsoft Azure ExpressRoute lets you set up a private connection between a VNet and another network, such as your on-premises network or a network in another cloud provider. ExpressRoute is a more reliable and faster alternative to typical internet connections, because the traffic over ExpressRoute doesn't traverse the public internet.

  • Microsoft Azure VNet

    Microsoft Azure Virtual Network (VNet) is the fundamental building block for your private network in Azure. VNet enables many types of Azure resources, such as Azure virtual machines (VMs), to securely communicate with each other, the internet, and on-premises networks.

  • Integration Runtime

    Integration Runtime provides data integration capabilities across different networks with publicly accessible endpoints.

    Microsoft Azure services, such as Azure Synapse Analytics, use Integration Runtime for data integration.

  • Azure Synapse Analytics

    Azure Synapse Analytics is a Microsoft service that provides analytics for data warehouses and big data systems.

  • Azure Active Directory

    Azure Active Directory is a Microsoft service that stores information about objects on the network and makes this information easy for administrators and users to find and use (such as accounts, privileges, security policies, DNS). Azure Active Directory uses a structured data store as the basis for a logical, hierarchical organization of directory information.
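
For the Integration Runtime component above, one common way to move a large table in parallel is to split it into independent key ranges that separate copy threads can read. The Python sketch below shows only the range arithmetic. The names `split_key_range` and `order_id` are hypothetical, a numeric key column is assumed, and the sketch does not call any Synapse or Integration Runtime API.

```python
from typing import List, Tuple


def split_key_range(lower: int, upper: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower, upper] into up to `parts` contiguous inclusive ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")
    total = upper - lower + 1
    parts = min(parts, total)            # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    start = lower
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Hypothetical key column with values from 1 to 1,000,000, read by 4 threads.
for low, high in split_key_range(1, 1_000_000, 4):
    print(f"WHERE order_id BETWEEN {low} AND {high}")
```

Ranges of equal width balance the work only when key values are evenly distributed; a skewed key would call for ranges based on row counts instead.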

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.
  • Provisioning
    • Provision a larger virtual machine (VM) with higher network bandwidth on Azure to host Integration Runtime (IR) and enable parallel threads for data transfer between Oracle Autonomous Data Warehouse and Azure Synapse Analytics.
    • Provision more than one VM on Microsoft Azure for the self-hosted integration runtime. Multiple hosts avoid a single point of failure, and their combined bandwidth provides the required throughput for the data transfer between Oracle Autonomous Data Warehouse and Azure Synapse Analytics.
    • Deploy the Azure IR hosts in different Azure availability zones to achieve maximum availability.
    • Choose the right size for the Oracle Cloud Infrastructure FastConnect and Azure ExpressRoute virtual circuits to support the bandwidth needs of the workload.
    • Provision the Oracle Autonomous Data Warehouse using the Oracle Cloud Infrastructure (OCI) virtual cloud network (VCN)/subnet that is connected to the OCI Dynamic Routing Gateway (DRG) and OCI FastConnect.
    • Configure routing and security lists or network security groups (NSGs) on OCI to allow Azure Synapse Analytics network traffic to reach Oracle Autonomous Data Warehouse.
    • For Oracle Autonomous Database on Shared Exadata Infrastructure private endpoint, specify the VCN configuration to allow traffic only from the specified VCN. This blocks access to the database from all public IPs or VCNs.
  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.
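
The CIDR guidance above can be checked before any network is created. The following Python sketch uses the standard `ipaddress` module to flag overlapping blocks and blocks outside the RFC 1918 private ranges. The network names and CIDR values are invented for illustration, only IPv4 is considered, and reading "standard private IP address space" as RFC 1918 is an assumption.

```python
from ipaddress import ip_network
from itertools import combinations
from typing import Dict, List, Tuple

# RFC 1918 private IPv4 ranges.
RFC1918 = [ip_network("10.0.0.0/8"), ip_network("172.16.0.0/12"), ip_network("192.168.0.0/16")]


def find_overlaps(networks: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of network names whose CIDR blocks overlap."""
    parsed = {name: ip_network(cidr) for name, cidr in networks.items()}
    return [(a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])]


def is_rfc1918(cidr: str) -> bool:
    """True if the IPv4 block lies entirely inside one of the RFC 1918 ranges."""
    net = ip_network(cidr)
    return net.version == 4 and any(net.subnet_of(block) for block in RFC1918)


# Hypothetical address plan for the three networks that will be privately connected.
plan = {
    "oci-vcn": "10.0.0.0/16",
    "azure-vnet": "10.1.0.0/16",
    "on-premises": "10.0.128.0/20",
}
print(find_overlaps(plan))                        # [('oci-vcn', 'on-premises')]
print([n for n, c in plan.items() if not is_rfc1918(c)])  # []
```

In this invented plan, the on-premises block falls inside the VCN block, so one of the two would need to change before setting up the private connections.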

Considerations

When deploying this architecture, consider the following:

  • Packaged Apps

    This architecture uses Oracle E-Business Suite as an example. It also applies to other packaged applications, such as PeopleSoft, JD Edwards EnterpriseOne, Siebel, or any third-party application built on Oracle Database.

  • Oracle Autonomous Data Warehouse
    • Size the Autonomous Data Warehouse database with the compute and storage that best suit the workload.
    • Enable auto scaling for the Autonomous Data Warehouse database to support any additional workloads.
    • Enable automatic backups and select a retention period that meets your business requirements.
    • Enable Oracle Autonomous Data Guard to maintain a standby (peer) database that provides data protection and disaster recovery.
  • Data replication
    • If Oracle E-Business Suite has a disaster recovery (DR) instance, then you can replicate the data from the DR instance to Autonomous Data Warehouse to offload the workload from the production Oracle E-Business Suite database.
    • As an alternative to Oracle GoldenGate, you may use Oracle Cloud Infrastructure Data Integration to replicate the data from the Oracle E-Business Suite database to the Autonomous Data Warehouse.

Acknowledgments

  • Authors: Wei Han, Niranjan Mohapatra, Ejaz Akram