This architecture shows how a data lakehouse on Oracle Cloud Infrastructure (OCI) is used to create a modern data platform that ingests, processes, stores, serves, and visualizes data from structured and unstructured sources.
Architectural components are divided into four stages and presented as a functional data flow:
- Data Producers: Streaming data producers include Kafka producers for price and exchange rate data and event producers for trade data. Unstructured data is pushed directly to object storage (the "bronze" data lake). Scheduled or event-triggered data producers include batch data from file storage and reference data from database systems.
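To make the streaming-producer stage concrete, here is a minimal sketch of the kind of message a price producer might publish. The field names, topic, and schema are assumptions for illustration, not a prescribed format; a real Kafka producer client would publish the serialized bytes to a topic.

```python
import json

def make_price_event(symbol: str, price: float, ts: str, currency: str = "USD") -> dict:
    """Build a price-tick message value; the schema here is illustrative only."""
    return {"symbol": symbol, "price": price, "currency": currency, "ts": ts}

# A Kafka producer would serialize the value to bytes before publishing it
# to a topic such as "prices" (topic name assumed for this sketch).
event = make_price_event("ORCL", 112.35, "2024-01-15T09:30:00Z")
payload = json.dumps(event).encode("utf-8")
```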
- Ingest/Load: Persistent streaming data is passed on to "bronze" object storage. Streaming data is also processed to detect anomalous trades and to produce real-time fund insights. Scheduled or event-triggered data is processed by Data Integration and passed on to "bronze" object storage.
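The anomaly fan-out at the ingest stage can be sketched as a simple split of the trade stream: normal trades continue to bronze storage while flagged trades feed the alerting path. The notional-limit rule and field names below are hypothetical stand-ins for whatever detection logic the stream processor applies.

```python
def flag_anomalous(trades: list, limit: float = 1_000_000.0):
    """Split trades into (normal, anomalous) using a hypothetical notional limit.

    Normal trades would continue to the bronze data lake; anomalous trades
    would be routed to the alerting/notification path.
    """
    normal, anomalous = [], []
    for trade in trades:
        (anomalous if trade["notional"] > limit else normal).append(trade)
    return normal, anomalous

# Sample stream of trade events (illustrative data).
trades = [
    {"id": 1, "notional": 5_000.0},
    {"id": 2, "notional": 2_500_000.0},
    {"id": 3, "notional": 42_000.0},
]
normal, anomalous = flag_anomalous(trades)
```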
- Persist/Transform/Compute:
- Streaming data is processed by Kafka Connect to produce real-time fund insights and streaming analytics. Service Connector Hub routes anomalous trade data to Oracle Cloud Infrastructure Notifications to deliver insights to users.
- Oracle Cloud Infrastructure Events, Oracle Functions, and OCI Vision provide OCR and text extraction for fax images from "bronze" object storage and pass the resulting data on to "silver" object storage.
- Within the data lakehouse, Oracle Cloud Infrastructure Data Flow cleanses data from "bronze" object storage and passes it on to "silver" object storage. Data Flow also processes data from "silver" object storage and passes it on to "gold" object storage. Oracle Autonomous Data Warehouse (ADW) and Oracle Cloud Infrastructure Data Catalog provide "gold" data for end users and analytics.
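The bronze-to-silver-to-gold progression described above follows the common medallion pattern: raw records are cleansed into a conformed silver layer, then aggregated into a gold layer ready for serving. The sketch below models that flow in plain Python; the record schema and aggregation (per-fund totals) are assumptions for illustration, where a real implementation would be a Data Flow (Spark) job.

```python
def to_silver(bronze_rows: list) -> list:
    """Cleanse raw bronze records: drop rows missing required fields and
    rows whose amount cannot be parsed; normalize fund names and types."""
    silver = []
    for row in bronze_rows:
        if "fund" not in row or "amount" not in row:
            continue
        try:
            silver.append({"fund": str(row["fund"]).upper(),
                           "amount": float(row["amount"])})
        except (TypeError, ValueError):
            continue
    return silver

def to_gold(silver_rows: list) -> dict:
    """Aggregate cleansed rows into per-fund totals for the serving layer."""
    totals = {}
    for row in silver_rows:
        totals[row["fund"]] = totals.get(row["fund"], 0.0) + row["amount"]
    return totals

# Illustrative bronze records, including malformed ones that cleansing drops.
bronze = [
    {"fund": "alpha", "amount": "100.0"},
    {"fund": "alpha", "amount": 50},
    {"fund": "beta", "amount": "not-a-number"},
    {"ticker": "ORCL"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```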
- Serve/Visualize: End users access streaming anomalous trade data or use OpenSearch for real-time analytics and insights. Analysts can use Oracle Analytics Cloud or third-party analytics tools to explore data from the data lakehouse, and data scientists can use Oracle Cloud Infrastructure Data Science to work with the same data.
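As one example of the serving stage, a dashboard querying OpenSearch for recent anomalous trades might issue a query-DSL body like the following. The index fields (`fund`, `ts`) and result size are assumptions for illustration; only the general `bool`/`filter`/`range` structure comes from the OpenSearch query DSL.

```python
def anomalous_trades_query(fund: str, since: str) -> dict:
    """Build an OpenSearch query-DSL body for recent anomalous trades.

    Field names ("fund", "ts") are hypothetical; the bool-filter structure
    is standard OpenSearch query DSL.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"fund": fund}},
                    {"range": {"ts": {"gte": since}}},
                ]
            }
        },
        "sort": [{"ts": {"order": "desc"}}],
        "size": 50,
    }

# e.g. the last 15 minutes of anomalous trades for one fund
query = anomalous_trades_query("ALPHA", "now-15m")
```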