Establish a Multicloud Data Solution Between OCI and Microsoft Azure
Organizations can establish an enterprise data lakehouse or data warehouse to store live and archived data in one centralized location.
This approach simplifies creating a centralized data store that serves as a comprehensive solution for all data analytics needs.
With a multicloud data analytics solution, organizations can run analytics efficiently on a central data lakehouse or data warehouse that integrates various data sources, including Fusion SaaS, flat files, on-premises and cloud databases, Salesforce, and e-commerce websites.
The ultimate objective is to create a centralized repository of data that business units have extracted and analyzed, improving end-to-end business visibility and providing data-driven insights. Benefits include:
- Unified data analytics pipeline
Streamline access to data from various cloud and on-premises sources, such as databases and object stores.
- Ease of integration
Integrate data across diverse systems, formats, APIs, applications, and devices without manual coding, while supporting secure collaboration and compliance with security protocols.
- High-performance analytics
Query data efficiently, leading to faster decisions and improved customer service.
- Cost, security, and availability
Minimize CapEx and OpEx while achieving optimal cost-effectiveness, performance, security, and availability.
Architecture
This reference architecture illustrates an enterprise multicloud data pipeline. The pipeline collects data from various sources, formats it, and transfers it to the enterprise data lake or data warehouse. It covers batch integration, data integration, and real-time integration scenarios.
Oracle Interconnect for Microsoft Azure links Azure ExpressRoute and Oracle Cloud Infrastructure (OCI) FastConnect to connect the two cloud networks privately.
Traffic from the Azure virtual network (VNet) is routed over this private interconnection to the OCI virtual cloud network (VCN).
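Traffic is routed privately between the Azure VNet and the OCI VCN, so their address spaces generally must not overlap. The following minimal Python sketch uses only the standard `ipaddress` module to check a planned set of CIDR blocks for overlaps before you configure the interconnection. The function name `find_overlaps` and the CIDR values in the usage example are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named CIDR blocks whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


if __name__ == "__main__":
    # Hypothetical address plan for the two clouds.
    plan = {
        "azure-vnet": "10.1.0.0/16",
        "oci-vcn": "10.2.0.0/16",
        "oci-vcn-extra": "10.1.128.0/20",  # deliberately overlaps the VNet
    }
    print(find_overlaps(plan))  # [('azure-vnet', 'oci-vcn-extra')]
```

An empty result means the plan has no conflicting ranges. Any reported pair must be re-addressed before traffic can be routed between those networks without NAT.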
The following diagram illustrates this reference architecture.
OCI Data Integration connects to on-premises and cloud sources and extracts data from them using native adapters. It accesses Oracle SaaS applications through the BI Cloud Connector (BICC), transforms the data, and loads it into an OCI data lake (Oracle Autonomous Database or OCI Object Storage) through adapters.
Oracle application integration services collect real-time data from diverse source systems through native adapters. These sources include Oracle SaaS applications, internet-of-things (IoT) devices, streaming services, social media, on-premises systems, and other cloud providers. The services then run transformation and orchestration processes before loading the data into an OCI data lake (Oracle Autonomous Database or OCI Object Storage) through adapters.
OCI GoldenGate captures data from Oracle Autonomous Database and replicates it over OCI FastConnect, in near real time, to Azure Data Lake Storage Gen2 and Azure Synapse Analytics. For Synapse, the change data is first staged in micro-batches in Azure Data Lake Storage Gen2 and then merged into the Synapse target table.
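The stage-then-merge pattern can be illustrated with a small in-memory Python sketch. It does not reproduce how OCI GoldenGate or Azure Synapse Analytics implement replication. The sketch makes these assumptions:

- Each change record carries a hypothetical operation code (`I`, `U`, or `D`), a primary key, a source commit sequence number, and a full row image.
- A dictionary stands in for the target table.

The sketch applies one staged micro-batch to that table.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Change:
    op: str      # hypothetical op codes: "I" insert, "U" update, "D" delete
    key: Any     # primary key of the affected row
    seq: int     # position of the change in source commit order
    row: dict = field(default_factory=dict)  # assumed to be a full row image


def merge_micro_batch(target: dict, batch: list[Change]) -> dict:
    """Apply one staged micro-batch of changes to a target table keyed by primary key."""
    # Collapse the staged batch so only the last change per key survives,
    # similar to deduplicating staged rows before a MERGE statement.
    latest: dict[Any, Change] = {}
    for change in sorted(batch, key=lambda c: c.seq):
        latest[change.key] = change

    for key, change in latest.items():
        if change.op == "D":
            target.pop(key, None)           # delete if present
        elif change.op in ("I", "U"):
            target[key] = dict(change.row)  # upsert: insert or overwrite the row
        else:
            raise ValueError(f"unknown operation {change.op!r}")
    return target
```

For example, consider a table holding key 1 that receives a batch of three changes: an update to key 1, an insert of key 2, and a later delete of key 1. After the merge, only key 2 remains. Collapsing the batch first means that each key is written to the target at most once per micro-batch.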
Flow of events
- Data extraction and transfer
- Customer data is transferred from the data source to OCI Object Storage either directly or via default, source-specific drivers.
- On-premises flat files are moved to OCI Object Storage either with the customer's Python script or through an FTP connection, which also provides connectivity to Oracle Integration Cloud Service.
- Data is uploaded securely, in its raw form, to encrypted OCI Object Storage buckets.
- Data ingestion and transformation
- OCI Data Integration retrieves data from OCI Object Storage and other sources, following the proposed architecture flow. It transforms the data with Apache Spark according to business needs, then stores the transformed data in OCI Object Storage and in Oracle Autonomous Database.
- This process follows the Delta Lake architecture, which provides ACID transactions and compressed storage. The data is now structured, queryable, and ready for further analytics.
- OCI Logging manages all processing logs.
- Orchestration and scheduling
- OCI Data Integration manages the data flow processes and schedules OCI Data Flow applications and OCI Data Science notebooks as needed.
- For flexibility, developers can also run Data Flow applications from the UI or from Data Science notebooks.
- Data archival
- Customer-defined OCI Object Storage lifecycle policies automate data archival. Based on predefined rules, they move data to lower-cost storage tiers or delete outdated data, which supports efficient data management and compliance with retention policies.
- With these policies, customers can optimize storage costs while keeping control over data retention and aligning it with legal and regulatory requirements.
- Data replication to Azure
- OCI GoldenGate is used for data replication to Azure via a dedicated network established with Oracle Interconnect for Microsoft Azure.
- OCI GoldenGate integrates closely with Azure Data Lake Storage Gen2 and Azure Synapse Analytics to load data into them.
- Data analysis and reporting
- Business intelligence tools, such as Oracle Analytics Cloud and Power BI, can connect to OCI Object Storage or Oracle Autonomous Database.
- These tools retrieve the transformed data and produce user-friendly dashboards that show key business performance indicators (KPIs).
- These dashboards surface insights from the data that support well-informed decision-making.
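The architecture does not specify the customer's Python script for moving on-premises flat files. The following minimal sketch shows one possible form, with these assumptions:

- The OCI Python SDK (the `oci` package) is installed and configured through the default `~/.oci/config` file.
- The helper names (`raw_object_name`, `content_md5`, `upload_flat_file`), the `raw/<source>/YYYY/MM/DD/` naming scheme, and the bucket and source names are hypothetical.

The script sends a base64-encoded MD5 digest with each upload so that Object Storage rejects a body corrupted in transit. Object Storage also encrypts stored objects at rest by default.

```python
import base64
import hashlib
from datetime import date
from pathlib import Path


def raw_object_name(source: str, file_name: str, day: date) -> str:
    """Build a date-partitioned object name for a raw landing file (hypothetical convention)."""
    return f"raw/{source}/{day:%Y/%m/%d}/{file_name}"


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, the format used by the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def upload_flat_file(path: Path, bucket: str, source: str) -> str:
    """Upload one local flat file to OCI Object Storage and return its object name."""
    import oci  # requires the OCI Python SDK: pip install oci

    config = oci.config.from_file()  # reads ~/.oci/config, DEFAULT profile
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data

    data = path.read_bytes()
    name = raw_object_name(source, path.name, date.today())
    client.put_object(
        namespace,
        bucket,
        name,
        data,
        content_md5=content_md5(data),  # the service rejects the upload on a digest mismatch
    )
    return name
```

A call such as `upload_flat_file(Path("orders.csv"), "landing-bucket", "erp")` would land the file under `raw/erp/` for the current date. This sketch reads each file fully into memory. For very large files, the SDK's `UploadManager` class, which supports multipart uploads, is a better fit.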
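Lifecycle policies for data archival are configured in OCI Object Storage itself, not in application code. To make the rule evaluation concrete, the following Python sketch models a simplified version of it. This is not the service's policy schema. The assumptions are:

- Each hypothetical `LifecycleRule` names an action (`ARCHIVE` or `DELETE`), an object-name prefix, and a minimum age in days.
- An object gets the action of the matching rule with the largest age threshold it has passed. That precedence rule is this sketch's choice, not documented service behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class LifecycleRule:
    name: str
    action: str     # "ARCHIVE" or "DELETE" in this simplified model
    prefix: str     # only objects whose names start with this prefix are affected
    age_days: int   # the rule applies once the object is at least this old


def applicable_action(object_name: str, last_modified: datetime,
                      rules: list[LifecycleRule], now: datetime) -> Optional[str]:
    """Return the action for one object, or None if no rule applies yet."""
    age = now - last_modified
    matching = [
        r for r in rules
        if object_name.startswith(r.prefix) and age >= timedelta(days=r.age_days)
    ]
    if not matching:
        return None
    # Assumption of this sketch: the rule with the largest age threshold wins.
    return max(matching, key=lambda r: r.age_days).action


if __name__ == "__main__":
    rules = [
        LifecycleRule("archive-raw", "ARCHIVE", "raw/", 90),
        LifecycleRule("delete-raw", "DELETE", "raw/", 365),
    ]
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for days in (30, 120, 400):
        print(days, applicable_action("raw/erp/orders.csv", now - timedelta(days=days), rules, now))
```

With these two rules, a 30-day-old raw file stays in standard storage, a 120-day-old file is archived, and a 400-day-old file is deleted. Objects outside the `raw/` prefix are never touched.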
The architecture has the following components:
- Tenancy
A tenancy is a secure and isolated partition that Oracle sets up within Oracle Cloud when you sign up for Oracle Cloud Infrastructure. You can create, organize, and administer your resources in Oracle Cloud within your tenancy. A tenancy is synonymous with a company or organization. Usually, a company will have a single tenancy and reflect its organizational structure within that tenancy. A single tenancy is usually associated with a single subscription, and a single subscription usually only has one tenancy.
- Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
- Compartment
Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform.
- Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain shouldn't affect the other availability domains in the region.
- Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
- ExpressRoute
Azure ExpressRoute lets you set up a private connection between a VNet and another network, such as your on-premises network or a network in another cloud provider.
Azure ExpressRoute is a more reliable and faster alternative to typical internet connections because the traffic over Azure ExpressRoute does not traverse the public internet.
- Autonomous Database
Oracle Autonomous Database is a fully managed, preconfigured database environment that you can use for transaction processing and data warehousing workloads. You do not need to configure or manage any hardware, or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.
- Object storage
Object storage provides quick access to large amounts of structured and unstructured data of any content type, including database backups, analytic data, and rich content such as images and videos. You can safely and securely store and then retrieve data directly from the internet or from within the cloud platform. You can scale storage without experiencing any degradation in performance or service reliability. Use standard storage for "hot" storage that you need to access quickly, immediately, and frequently. Use archive storage for "cold" storage that you retain for long periods of time and seldom or rarely access.
- Data integration
Oracle Cloud Infrastructure Data Integration is a fully managed, serverless, cloud-native service that extracts, loads, transforms, cleanses, and reshapes data from a variety of data sources into target Oracle Cloud Infrastructure services, such as Autonomous Data Warehouse and Oracle Cloud Infrastructure Object Storage. ETL (extract transform load) leverages fully-managed scale-out processing on Spark, and ELT (extract load transform) leverages full SQL push-down capabilities of the Autonomous Data Warehouse in order to minimize data movement and to improve the time to value for newly ingested data. Users design data integration processes using an intuitive, codeless user interface that optimizes integration flows to generate the most efficient engine and orchestration, automatically allocating and scaling the execution environment. Oracle Cloud Infrastructure Data Integration provides interactive exploration and data preparation and helps data engineers protect against schema drift by defining rules to handle schema changes.
- Oracle GoldenGate Cloud Service
Oracle GoldenGate Cloud Service is a fully managed service that ingests data from sources on premises or in any cloud. It uses GoldenGate change data capture (CDC) technology for non-intrusive, efficient data capture. It delivers the data to Oracle Autonomous Data Warehouse in real time and at scale, making relevant information available to consumers as quickly as possible.
- Oracle Integration
Oracle Integration provides pre-built connectivity to SaaS and on-premises applications, run-ready process automation templates, and a low-code visual builder for web and mobile application development. It gives you native access to events in Oracle Cloud ERP, HCM, and CX. Connect app-specific analytic silos to simplify requisition-to-receipt, recruit-to-pay, lead-to-invoice, and other critical processes, providing your IT and business leaders with end-to-end visibility.
- Azure Synapse Analytics
Azure Synapse Analytics is an analytics service that brings together data integration, enterprise data warehousing, and big data analytics. It allows querying data on your terms, using either serverless or dedicated options, at scale. Azure Synapse Analytics brings these concepts together with a unified experience to ingest, explore, prepare, transform, manage, and serve data for immediate BI and machine learning needs.
- Azure Data Lake Storage Gen2
Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. Data Lake Storage Gen2 converges the capabilities of Azure Data Lake Storage Gen1 with Azure Blob Storage.
For example, Azure Data Lake Storage Gen2 provides file system semantics, file-level security, and scale. Since these capabilities are built on Blob storage, you also get low-cost tiered storage with high availability and disaster recovery capabilities.
- Azure Application Gateway
Azure Application Gateway is a web traffic (OSI layer 7) load balancer that enables you to manage traffic to your web applications. Traditional load balancers operate at the transport layer (OSI layer 4 - TCP and UDP) and route traffic based on source IP address and port, to a destination IP address and port. Azure Application Gateway can make routing decisions based on additional attributes of an HTTP request; for example URI path or host headers.
For example, you can route traffic based on the incoming URL. If /images is in the incoming URL, you can route traffic to a specific set of servers (known as a pool) configured for images. If /video is in the URL, that traffic is routed to another pool that's optimized for videos.
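Path-based routing to backend pools can be sketched in a few lines of Python. This illustrates the concept only; it is not Application Gateway's matching engine. In this sketch:

- The function name `select_pool` and the pool names are hypothetical.
- Each rule maps a path prefix such as `/images/` to a backend pool.
- The longest matching prefix wins, and unmatched requests go to a default pool.

```python
from urllib.parse import urlsplit


def select_pool(url: str, path_rules: dict[str, str], default_pool: str) -> str:
    """Pick a backend pool for a request URL by longest matching path prefix."""
    path = urlsplit(url).path or "/"
    matches = [prefix for prefix in path_rules if path.startswith(prefix)]
    if not matches:
        return default_pool
    return path_rules[max(matches, key=len)]


if __name__ == "__main__":
    rules = {"/images/": "image-pool", "/video/": "video-pool"}
    for url in ("https://contoso.example/images/logo.png",
                "https://contoso.example/video/intro.mp4",
                "https://contoso.example/index.html"):
        print(url, "->", select_pool(url, rules, "default-pool"))
```

A layer 4 load balancer sees only addresses and ports, so it could not make this decision. The routing key here is the HTTP path, which is available only at layer 7.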
Recommendations
- Provisioning
- Select the appropriate size for the OCI FastConnect and Azure ExpressRoute virtual circuits to meet the bandwidth requirements of the workload.
- Deploy the Oracle Database within the OCI VCN and subnet that is linked to the OCI Dynamic Routing Gateway (DRG) and OCI FastConnect.
- Configure route rules and security lists or network security groups (NSGs) in OCI so that Azure Synapse Analytics network traffic can reach the Oracle Database.
- When you configure the Oracle Database with a private endpoint, define the VCN settings so that only traffic from the designated VCN is allowed, blocking access from any public IP addresses or other VCNs.
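Sizing the FastConnect and ExpressRoute virtual circuits starts from the volume of data that the workload must move within a given window. The following Python sketch shows the arithmetic, with these assumptions:

- The daily change volume, the transfer window, the 1.5 headroom factor, and the candidate circuit sizes are hypothetical inputs. Replace them with measurements and with the bandwidth options actually offered in your regions.
- Overhead beyond the headroom factor is ignored.

```python
def required_mbps(data_gb: float, window_hours: float, headroom: float = 1.5) -> float:
    """Average throughput in megabits per second needed to move data_gb within window_hours.

    Uses decimal units (1 GB = 8,000 megabits). The headroom multiplier is an
    assumed allowance for traffic peaks and protocol overhead.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    megabits = data_gb * 8_000
    return megabits / (window_hours * 3_600) * headroom


def smallest_circuit(required: float, sizes_mbps: list[int]) -> int:
    """Return the smallest candidate circuit size (in Mbps) that covers the requirement."""
    for size in sorted(sizes_mbps):
        if size >= required:
            return size
    raise ValueError(f"no candidate circuit covers {required:.0f} Mbps")
```

For example, moving 500 GB in a 4-hour window needs about 278 Mbps on average, or about 417 Mbps with 1.5x headroom. Among hypothetical candidates of 1, 2, 5, and 10 Gbps, the smallest that fits is the 1 Gbps circuit. Size both circuits for the same requirement, because the slower of the two limits the end-to-end path.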
Considerations
Consider the following points when deploying this reference architecture.
- Cost
OCI FastConnect: The price for OCI FastConnect remains consistent across all OCI regions, with no additional fees for data ingress or egress.
Azure ExpressRoute: The pricing for Azure ExpressRoute differs depending on the region.
- Performance
In this reference architecture, the customer required near real-time data replication from the primary database on OCI to Azure endpoints for their use case. By utilizing OCI GoldenGate, the customer ensured that their heterogeneous and multicloud big data reservoirs were consistently updated with real-time data from both operational and analytical production systems, facilitating real-time analysis.
- Networking
Oracle Interconnect for Microsoft Azure is an alternative network solution that is available only in specific paired Azure and OCI regions. To find out which Azure and OCI regions support Oracle Database Service for Microsoft Azure, see Oracle Database Service for Azure Regional Availability in Explore More.
If your OCI and Azure regions do not support Oracle Interconnect for Microsoft Azure, you can carry the traffic over each cloud provider's backbone. If you use the OCI backbone, route traffic through an intermediary OCI region that supports Oracle Interconnect for Microsoft Azure. Connect that intermediary region to the unsupported region with a remote peering connection (RPC).