This diagram shows the components and stages in the medallion architecture
for a data lakehouse.
Enterprise data management for the architecture is provided by Microsoft Purview.
Infrastructure and security services provided for the architecture include monitoring,
DevOps and CI/CD, identity and access management, encryption, and multi-region
disaster recovery failover.
Data sources include source systems, on-premises relational database management systems
(RDBMS), cloud RDBMS, internet of things (IoT) devices, and other unstructured data
sources.
The medallion architecture divides source data movement into distinct stages
listed across the top of the diagram:
- Bronze stage: Data from various sources is ingested, validated, and
curated.
- Silver stage: The data is stored and processed for analytics and
reporting.
- Gold stage: Refined data is delivered for analysis and reporting.
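The flow through the three stages can be sketched in plain Python (a minimal illustration only; the function names and record fields are assumptions, not part of any Azure or Oracle API):

```python
# Minimal sketch of the medallion flow: bronze (ingest), silver (validate and
# de-duplicate), gold (refine for reporting). All names are illustrative.

def bronze_ingest(raw_rows):
    """Bronze: land raw source rows as-is, tagged with their stage."""
    return [dict(row, _stage="bronze") for row in raw_rows]

def silver_process(bronze_rows):
    """Silver: drop invalid and duplicate rows to prepare for analytics."""
    seen, silver = set(), []
    for row in bronze_rows:
        key = row.get("id")
        if key is not None and key not in seen:  # reject nulls and duplicates
            seen.add(key)
            silver.append(dict(row, _stage="silver"))
    return silver

def gold_deliver(silver_rows):
    """Gold: produce a refined, reporting-ready summary."""
    return {"row_count": len(silver_rows), "_stage": "gold"}

source = [{"id": 1, "value": 10}, {"id": 1, "value": 10}, {"id": None}]
report = gold_deliver(silver_process(bronze_ingest(source)))
```

In a real deployment each function would read from and write to the storage services shown in the diagram rather than in-memory lists.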
Within these stages, component groups are further identified by whether they
provide compute or storage functionality:
- Compute: Data engineering pipelines that process and transform data. These
pipelines play a critical role in preparing data for analysis and reporting by
applying transformation rules such as de-duplication, data quality checks, and
star schema data modeling.
- Storage: Data is ingested, stored, and managed as the foundation for
data retrieval by services such as Azure Data Lake Storage, Oracle Database@Azure, and SQL pools.
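As one example of such a transformation rule, flat records can be modeled into a star schema by splitting them into a dimension table and a fact table (a hypothetical sketch; the column names and table layout are illustrative assumptions):

```python
# Hypothetical star-schema modeling step: split flat sales records into a
# customer dimension plus a fact table that references it by key.

def to_star_schema(rows):
    dim_customer = {}   # customer_id -> de-duplicated dimension attributes
    fact_sales = []     # measures plus the foreign key only
    for row in rows:
        cid = row["customer_id"]
        # Dimension table: one row per customer.
        dim_customer.setdefault(cid, {"customer_id": cid, "name": row["name"]})
        # Fact table: keep the measure and the key, drop the attributes.
        fact_sales.append({"customer_id": cid, "amount": row["amount"]})
    return list(dim_customer.values()), fact_sales

rows = [
    {"customer_id": 1, "name": "Ann", "amount": 50},
    {"customer_id": 1, "name": "Ann", "amount": 25},
    {"customer_id": 2, "name": "Bo", "amount": 10},
]
dims, facts = to_star_schema(rows)
```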
The medallion stages are further divided into the following deployment areas
through which data moves sequentially:
- Azure SQL Database (compute): Ingests data using Azure Data
Factory.
- Landing - raw zone view (storage): Files are stored in Azure Data Lake Storage.
- Raw - raw zone view (storage): The Ingestion Framework stage manages
files and data changes in Azure Data Lake Storage by using Delta Lake and the
monitoring service.
- Curation (compute): The Validation stage ingests raw data into Oracle Autonomous Data Warehouse Serverless or Oracle Exadata Database
Service for deduplication and data quality checks.
- Data Lake - curated (storage): In the Rejection Workflow stage, data
governance ensures that any record rejected during the ingestion stage due to
validation errors or other processing errors is staged on a separate Azure Data
Lake Storage path. The DevOps and CI/CD service provides input to this stage.
- Standardized (compute): In the Rejection Workflow stage, data governance
ensures that any record rejected during the ingestion stage due to validation
errors or other processing errors is staged on a separate Azure Data Lake
Storage path. The DevOps and CI/CD service provides input to this stage.
- Data Warehouse - Consumption Layer (storage): In the Orchestration
stage, a scheduling system manages data processing jobs, scheduling, and job
dependencies. Azure Data Factory can be used to orchestrate ETL jobs. The
Orchestration stage includes Oracle Autonomous Data Warehouse Serverless or Oracle Exadata Database
Service, Delta Lake, and Azure Data Lake Storage Gen2.
- Reporting/Analytics: This stage includes Power BI and data services
such as external feeds and data monetization.
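The Rejection Workflow described above can be sketched as a validation step that routes failing records to a separate storage path (plain Python standing in for Azure Data Lake Storage; the path layout and validation rule are assumptions for illustration):

```python
# Sketch of the rejection workflow: records that fail validation are staged
# on a separate data lake path instead of being dropped. Paths are illustrative.

CURATED_PATH = "datalake/curated/"    # assumed path layout
REJECTED_PATH = "datalake/rejected/"  # assumed path layout

def is_valid(record):
    """Illustrative validation rule: require a non-null id and amount."""
    return record.get("id") is not None and record.get("amount") is not None

def route_records(records):
    """Return a mapping of target path -> records staged on that path."""
    staged = {CURATED_PATH: [], REJECTED_PATH: []}
    for record in records:
        target = CURATED_PATH if is_valid(record) else REJECTED_PATH
        staged[target].append(record)
    return staged

batch = [{"id": 1, "amount": 9.5}, {"id": None, "amount": 3.0}, {"id": 2}]
staged = route_records(batch)
```

Staging rejects on their own path keeps the curated zone clean while preserving the failed records for later inspection and reprocessing.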