Deploy a Data Lake Leveraging Power BI on Oracle Database@Azure

Many businesses leverage Microsoft Power BI with data lakes on Microsoft Azure to derive actionable business insights.

You can expand these capabilities by using a medallion architecture that includes Azure Data Factory, Azure Data Lake Storage, Azure Compute, Oracle Database@Azure (either a fully managed Oracle Autonomous Database or a co-managed Oracle Exadata Database Service instance), and Power BI to address several key data challenges faced by customers:

  • Data Silos and Integration: Azure Data Factory ingests data from diverse sources into a unified data lake, breaking down silos and providing a single source of truth.
  • Data Quality and Consistency: Autonomous Data Warehouse in the Curation Layer ensures clean, consistent, and high-quality data through deduplication and quality rules, reducing errors and enhancing decision-making.
  • Scalability and Performance: Azure's scalable compute resources and Autonomous Data Warehouse's serverless architecture or Oracle Exadata Database Service handle large-scale data processing efficiently, while maintaining optimal performance as data volumes and user adoption (concurrency) grow.
  • Complex Transformations: Azure Compute and Autonomous Data Warehouse or Oracle Exadata Database Service perform complex transformations and analytics efficiently, reducing processing time and focusing on insights.
  • Cost Management: The serverless and pay-as-you-go models for Azure services and Autonomous Data Warehouse or Oracle Exadata Database Service optimize costs, ensuring that you only pay for what you use.
  • Data Governance and Compliance: Structured data management layers facilitate better governance, traceability, and regulatory compliance.
  • Built-in Analytics: Users are able to apply analytics directly to their data by using built-in features such as artificial intelligence (AI), machine learning (ML), graph, spatial, and text analytics.

Typical use cases include:

  • Retail Analytics: Integrates data from online sales, in-store transactions, and customer feedback, optimizing inventory and marketing strategies.
  • Financial Services: Analyzes transaction data for fraud detection and regulatory compliance, mitigating risks.
  • Healthcare Analytics: Integrates patient data from EHRs, lab results, and wearable devices, improving patient care and health management.

This architecture enables enterprise customers across industries to leverage data effectively to empower their business users to make informed decisions to drive better business outcomes.

Logical Architecture

The analytical data lake can ingest data from multiple sources and can provide business insights by using Power BI running on Microsoft Azure.

  • Data Sources: The analytical data lake can ingest data from multiple sources. Azure Data Factory can ingest data from Microsoft SQL Server and Azure Blob Storage. Oracle Database@Azure can ingest data from Oracle Cloud ERP, Oracle Cloud Infrastructure Object Storage, Azure Cosmos Database, Azure SQL Database, various types of table storage data (Azure, PostgresSQL, Azure MariaDB), and other types of on-premises relational databases.
  • Data Tier: Oracle Database@Azure ingests source data from Azure Data Lake Storage in conjunction with Azure Data Factory.
  • Consumption Tier: Oracle Database@Azure provides insights to Microsoft Power BI running on Microsoft Azure.

The following diagram illustrates the functional architecture:



data-lake-db-azure-process-oracle.zip

Medallion Architecture

This section demonstrates how you can deploy Oracle Database@Azure as the data warehouse within the Azure medallion architecture.

The medallion architecture is a data management framework that structures data handling in a data lakehouse into distinct stages (bronze, silver, and gold), representing the different stages of data processing:

  • Bronze stage: Data from various sources is ingested, validated, and curated.
  • Silver stage: The data is stored and processed for analytics and reporting.
  • Gold stage: Refined data is delivered for analysis and reporting.

The following diagram illustrates the architecture:



data-lake-db-azure-medallion-oracle.zip

The medallion stages are further divided into the following deployment areas:

  • Ingestion Framework: Ingests data from various data sources using Azure Data Factory. Raw data is stored in Azure Data Lake Storage Gen 2 and Delta Lake. This framework ensures data consistency and accuracy across source and sink systems. This framework constitutes a robust set of scripts to ensure quality by using audit, balance and control mechanisms across platforms.
  • Validation: Raw data is ingested into Oracle Autonomous Data Warehouse Serverless or Oracle Exadata Database Service for deduplication and data quality check. This workflow performs basic cleansing masking of PII and PHI data along with validation of raw files through a rules-driven framework to perform schema checks. The validation framework can be implemented using Azure Data Factory.
  • Rejection Workflow: Any record that is rejected during the ingestion stage due to validation errors or other processing errors is staged on a separate Azure Data Lake Storage path. Automated email notifications using Logic App are sent to the support team based on defined software license agreements (SLAs). Standardized data remains in Oracle Autonomous Data Warehouse Serverless or Oracle Exadata Database Service.
  • Orchestration: A scheduling system manages data processing jobs, scheduling, and job dependencies. Azure Data Factory can be used for the orchestration of ETL jobs. The Orchestration stage includes Oracle Autonomous Data Warehouse Serverless or Oracle Exadata Database Service, Delta Lake, and Azure Data Lake Storage Gen 2.
  • Reporting/Analytics: The reporting stage includes Power BI and data services such as external feeds and data monetization.

The architecture has the following infrastructure components:

  • Region

    An Azure region is a geographical area in which one or more physical Azure data centers, called availability zones, reside. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

    Azure and OCI regions are localized geographic areas. For Oracle Database@Azure, an Azure region is connected to an OCI region, with availability zones (AZs) in Azure connected to availability domains (ADs) in OCI. Azure and OCI region pairs are selected to minimize distance and latency.

  • Availability zone

    An availability zone is a physically separate data center within a region designed to be available and fault-tolerant. Availability zones are close enough to have low-latency connections to other availability zones.

  • Virtual network (VNet) and subnet

    A VNet is a virtual network that you define in Azure. A VNet can have multiple non-overlapping CIDR blocks subnets that you can add after your create the VNet. You can segment a VNet into subnets, which can be scoped to a region or to an availability zones. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VNet. Use VNet to isolate your Azure resources logically at the network level.

  • Azure ExpressRoute

    Azure ExpressRoute lets you set up a private connection between a VNet and another network, such as your on-premises network or a network in another cloud provider. ExpressRoute is a more reliable and faster alternative to typical internet connections, because the traffic over ExpressRoute doesn't traverse the public internet.

  • Virtual network gateway

    A virtual network gateway allows traffic between an Azure VNet and a network outside Azure, either over the public internet or using ExpressRoute, depending on the gateway type that you specify.

  • Route table

    Route tables direct traffic between Azure subnets, VNets, and networks outside Azure.

  • Network security group

    A network security group contains rules to control network traffic between the Azure resources within a VNet. Each rule specifies the source or destination, port, protocol, and direction of network traffic that's allowed or denied.

  • Site-to-site VPN

    Provides a site-to-site IPSec VPN between your on-premises network and your VCN over a secure, encrypted connection.

The architecture has the following Oracle products and services:

  • Oracle Database@Azure

    Oracle Database@Azure is the Oracle Database service (Oracle Exadata Database Service on Dedicated Infrastructure and Oracle Autonomous Database Serverless) running on Oracle Cloud Infrastructure (OCI), deployed in Microsoft Azure data centers. The service offers features and price parity with OCI. Purchase the service on Azure Marketplace.

    Oracle Database@Azure integrates Oracle Exadata Database Service, Oracle Real Application Clusters (Oracle RAC), and Oracle Data Guard technologies into the Azure platform. Users manage the service on the Azure console and with Azure automation tools. The service is deployed in Azure Virtual Network (VNet) and integrated with the Azure identity and access management system. The OCI and Oracle Database generic metrics and audit logs are natively available in Azure. The service requires users to have an Azure subscription and an OCI tenancy.

    Autonomous Database is built on Oracle Exadata infrastructure, is self-managing, self-securing, and self-repairing, helping eliminate manual database management and human errors. Autonomous Database enables development of scalable AI-powered apps with any data using built-in AI capabilities using your choice of large language model (LLM) and deployment location.

    Both Oracle Exadata Database Service and Oracle Autonomous Database Serverless are easily provisioned through the native Azure Portal, enabling access to the broader Azure ecosystem.

  • Oracle Cloud Infrastructure Object Storage

    Oracle Cloud Infrastructure Object Storage stores content as objects within storage containers called buckets. Each object includes the data and the descriptive metadata to enable easy retrieval and management.

The architecture has the following Microsoft products and services:

  • Azure Data Factory

    Azure Data Factory (ADF) is a cloud-based data integration service provided by Microsoft Azure. It is designed to create, schedule, and orchestrate data workflows at scale. ADF allows organizations to ingest, transform, and move data from various sources to destinations, enabling data-driven decision-making and analytics.

  • Azure Functions

    Azure Functions is a serverless solution that allows you to write less code, maintain less infrastructure, and save on costs.

  • Azure Service Bus

    Azure service bus is a fully managed enterprise message broker with queues and publish-subscribe topics.

  • Azure Monitor

    Azure Monitor provides base-level infrastructure metrics and logs for most Azure services. Azure diagnostic logs are emitted by a resource and provide rich, frequent data about the operation of that resource. Azure Data Factory (ADF) can write diagnostic logs in Azure Monitor.

  • Azure DevOps (CICD)

    Azure Data Factory integrates with both GitHub and Azure DevOps Git to enable source control, release management and CI/CD. With source control, developers can collaborate, track and save their changes to a branch of choice (in this case, it will be DEV branch). These changes will be merged into the main branch and deployed to the higher-level environments (QA, UAT, Prod), where it will also be tested and Validated

  • Azure Active Directory

    Big data platform on Azure services can integrate with existing Azure Active Directory services to provide fine grain security control. This enables organization to reuse existing single-sign on process to access data & services on Azure cloud.

  • Azure ML

    Azure ML is a GUI-based integrated development environment for constructing and implementing Machine Learning workflow on Azure.

  • Power BI

    Power BI is Microsoft’s suite of business analytics tools designed to help organizations visualize data, share insights, and make informed decisions. It enables users to connect to a wide range of data sources, create interactive dashboards and reports, and collaborate across teams. Power BI is part of the Microsoft Power Platform and integrates seamlessly with other Microsoft products like Excel, Teams, and Azure services.

    Key Components of Power BI:

    • Power BI Service: A cloud-based platform where users can share, publish, and collaborate on reports and dashboards to enable real-time data monitoring and sharing across organizations.
    • Power BI Mobile: Mobile apps for iOS, Android, and Windows devices to access dashboards and reports on the go.
    • Power BI Embedded: A service that allows developers to integrate Power BI visuals and analytics into custom applications, portals, or websites.
  • Azure SQL Server

    Azure SQL Server is a cloud-based relational database service provided by Microsoft Azure. It is part of the Azure SQL family and offers a fully managed platform as a service (PaaS) environment for hosting and managing SQL Server databases in the cloud. Azure SQL Server is highly scalable, secure, and provides features for high availability, automated backups, and disaster recovery.

  • Azure PostgreSQL

    Azure Database for PostgreSQL is a managed relational database service provided by Microsoft Azure. It is designed to simplify the deployment, scaling, and management of PostgreSQL databases in the cloud, while offering high availability, security, and scalability. Azure Database for PostgreSQL allows developers to focus on building applications without worrying about infrastructure management.

  • Azure Cosmos

    Azure Cosmos DB is a globally distributed, multimodel database service provided by Microsoft Azure. It is designed for building highly responsive, scalable, and globally available applications. Cosmos DB supports multiple data models and APIs, making it a versatile choice for diverse application scenarios.

  • Azure BLOB Storage

    Azure Blob Storage is Microsoft Azure's object storage solution for the cloud, designed to store vast amounts of unstructured data. It is highly scalable, secure, and cost-effective, making it ideal for various use cases, including application development, data archiving, and media storage.

  • Delta Lake

    Delta Lake is an open-source storage layer that brings reliability to data lakes. It is designed to provide ACID (Atomicity, Consistency, Isolation, Durability) transactions, scalable metadata handling, and unifies streaming and batch data processing. When integrated with Azure Data Lake Service, Delta Lake enhances the capabilities of Azure's data storage and analytics services.

  • Azure Data Lake Storage

    Azure Data Lake Storage (ADLS) is a highly scalable and secure data storage service optimized for big data analytics. It combines the scalability of object storage with the performance and security features of a hierarchical file system, making it ideal for managing large-scale data and enabling advanced analytics and machine learning.

    Comparison to Azure Blob Storage

    Feature Azure Data Lake Storage Azure Blob Storage
    Use Case Big data analytics, hierarchical data General-purpose object storage
    Name Space Hierarchical (file system-like) Flat (no directory structure)
    Integration Optimized for big data frameworks General-purpose use and application storage
    Access Control File-level ACLs and RBAC RBAC and SAS tokens

Recommendations

Use the following recommendations as a starting point to deploy a data analytics pipeline on Microsoft Azure by using Oracle Database@Azure.Your requirements might differ from those described here.
  • Ensure that you have access to an Azure subscription and directory.
  • Ensure that you have access to an Oracle Cloud Infrastructure (OCI) tenancy.
  • Ensure that you have an active Oracle Database@Azure multicloud link between the Azure and OCI (this connection is created by default when you provision Oracle Database@Azure and is managed by Oracle).
  • Ensure that you have non-overlapping CIDR blocks between any Azure VNets and OCI VCNs.
  • Ensure that you have adequate Oracle Exadata Database Service or Oracle Autonomous Database Serverless service limit prior to provisioning.

Considerations

When you deploy a data analytics pipeline on Microsoft Azure by using Oracle Database@Azure, consider the following.

  • Disaster Recovery (DR)

    Disaster recovery is not represented in this architecture and is the responsibility of the customer.

  • Networking Setup for Oracle Database@Azure
    • Azure environment: Uses an Azure Virtual Network (VNet) for networking and creates virtual network interface cards (VNICs) inside a pre-created delegated subnet.
    • OCI environment: Oracle Database@Azure connects to a client subnet within an Oracle Cloud Infrastructure (OCI) virtual cloud network (VCN).
  • Client Connectivity

    This network setup allows client connectivity from Azure resources.

  • Oracle Autonomous Database Serverless Networking
    • Use a VNet with 1 database subnet in Azure. Use a /27 subnet, but allow for planned future growth.
    • Create and delegate the database subnet to Oracle Database@Azure before deploying Oracle Autonomous Database Serverless.
    • Reuse (or share) the delegated subnet with additional databases.
  • Azure ExpressRoute

    The Azure ExpressRoute cost varies from one region to another. Azure has more than one SKU available for an express route; Oracle recommends using the Local setting, because it has no separate ingress or egress charges, and it starts at the minimum bandwidth of 1 Gbps. The Standard and Premium configurations offer lower bandwidth, but incur separate egress charges in a metered setup.

  • Oracle Autonomous Database Serverless
    • Integration

      Oracle Autonomous Database has built-in integration services for over 100 applications, databases, and object stores. Use Data Studio's graphical user interface to design data transformations for data integration with Autonomous Database.

    • Analytics

      Oracle Autonomous Database has built-in analytics services that allow you to apply a range of analytics (graph, spatial, machine learning and AI), directly to your data, reducing analytic latency to quickly provide actionable insights.

    • Cost

      Provisioning and basic management functions for an Oracle Autonomous Database Serverless instance is done from the Oracle Database@Azure page. Additional management functions are available within the Oracle Cloud Infrastructure (OCI) portal available by using the link to the specific Autonomous Database from the Oracle Database@Azure page.

      Control resource use using the following configuration options:
      • ECPU count allows setting the ECPU count from 2 to 512.
      • Compute auto scaling allows you to automatically scale its computing allocation up to 512. By default, this is selected.
      • Storage allows setting the storage allocation from 1 TB to 383 TB or 20 GB to 393216 GB.
      • Storage unit size allows you to select whether your storage is allocated in GB or TB.
      • Storage auto scaling allows you to scale its storage allocation automatically up to 383 TB or 393216 GB. By default, this is not selected
    • Performance
      • High availability (HA) is provided for Oracle Autonomous Database Serverless.
      • Check and measure the network latency as part of application performance testing.
      • Consider region affinity. Network latency between applications and databases hosted in different cloud data centers must be less than 10 ms. We recommend selecting nearby application and database regions to achieve optimal end-to-end performance.
    • Availability

      Oracle Autonomous Database Serverless in available in multiple Microsoft Azure regions.

      See the service license agreement (SLA) link in the Explore More section.

  • Oracle Exadata Database Service on Dedicated Infrastructure
    • Cost
      • Deployed infrastructure has consistent cost and can be shut down at any time (Minimum 48 hours is charged).
      • Run time costs are determined by the number of OCPUs assigned to the VM which is scalable.
      • Licensing options include both Bring Your Own License (BYOL) and License Included.
      • Oracle Support Rewards are available for BYOL
    • Performance
      • Customers experience the same performance that they would experience with any other Exadata deployment (Oracle Exadata Database Service, Oracle Exadata Database Service on Cloud@Customer, or on-premises).
      • Latency (one way/round trip) from an Azure VM of any type to the Oracle Database@Azure service is solely the province of Azure cloud services. The goal is .5ms, but can vary due to Azure networking considerations.
      • Oracle does not charge any data egress fees for Oracle Database@Azure, but depending on Azure architecture Microsoft may charge data movement fees.
    • Availability

      Exadata deployments have a 99.99% service level objective (SLO). Higher numbers can be reached by using a maximum availability architecture (MAA) which can include setting up a disaster recovery site and using backup and restore capabilities such as Oracle Database Zero Data Loss Autonomous Recovery Service.

Acknowledgments

  • Authors: Amrita Mukherjee, CCSP
  • Contributors: : Wei Han, Roger Simon, Tammy Bednar, Martin Gubar, Roy Rodan, Robert Lies