Enterprise Data Warehousing - a Predictive Maintenance Example
You have access to streaming data in real time, so how can you apply advanced analytics and data science capabilities to understand the context for an actionable event, gain insight, and create a response?
Use Oracle Cloud Infrastructure Object Storage and Oracle Cloud Infrastructure Data Flow to process streaming event and log data for predictive analysis and machine learning.
This reference architecture positions the technology solution within the overall business context. At a conceptual level, the solution ingests streaming event and log data, refines and persists it, and applies data science and machine learning to turn actionable events into insight and response.
Architecture
This architecture uses data science and machine learning to analyze streaming and log data to provide context and insight for actionable events.
The following diagram illustrates this reference architecture (download: analysis-streamed-data-architecture-oracle.zip).
The architecture focuses on the following logical divisions:
- Data refinery
Ingests and refines the data for use in each of the data layers in the architecture. The shape is intended to illustrate the differences in processing costs for storing and refining data at each level and for moving data between them.
- Data persistence platform (curated information layer)
Facilitates access and navigation of the data to show the current business view. For relational technologies, data may be logically or physically structured in simple relational, longitudinal, dimensional, or OLAP forms. For non-relational data, this layer contains one or more pools of data, either output from an analytical process or data optimized for a specific analytical task.
- Access and interpretation
Abstracts the logical business view of the data for the consumers. This abstraction facilitates agile approaches to development, migration to the target architecture, and the provision of a single reporting layer from multiple federated sources.
The architecture has the following components:
- Data streaming
Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable storage solution for ingesting continuous, high-volume streams of data that you can consume and process in real time. Streaming can be used for messaging, high-volume application logs, operational telemetry, web click-stream data, or other publish-subscribe messaging model use cases in which data is produced and processed continually and sequentially.
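As a sketch of the ingest side, the following Python helper encodes key/value event pairs into the base64 entries that the Streaming PutMessages API expects. The SDK call shown in the trailing comment assumes the OCI Python SDK (`oci` package); the stream OCID and service endpoint are placeholders, not values from this architecture.

```python
import base64
import json

def encode_stream_messages(events):
    """Encode (key, value) event pairs into the base64-encoded
    key/value entries that the Streaming PutMessages API expects."""
    entries = []
    for key, value in events:
        entries.append({
            "key": base64.b64encode(key.encode("utf-8")).decode("ascii"),
            "value": base64.b64encode(
                json.dumps(value).encode("utf-8")).decode("ascii"),
        })
    return entries

# Publishing the encoded entries with the OCI Python SDK (requires OCI
# credentials; the endpoint and stream OCID below are placeholders):
#
#   import oci
#   config = oci.config.from_file()
#   client = oci.streaming.StreamClient(
#       config, service_endpoint="<stream endpoint>")
#   details = oci.streaming.models.PutMessagesDetails(
#       messages=[oci.streaming.models.PutMessagesDetailsEntry(**e)
#                 for e in encode_stream_messages(events)])
#   client.put_messages("<stream OCID>", details)
```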
- Service connectors
Oracle Cloud Infrastructure Service Connector Hub is a cloud message bus platform that orchestrates data movement between services in Oracle Cloud Infrastructure (OCI). Data is moved using service connectors. A service connector specifies the source service that contains the data to be moved, the tasks to perform on the data, and the target service to which the data is delivered when the specified tasks are completed.
You can use Oracle Cloud Infrastructure Service Connector Hub to quickly build a logging aggregation framework for SIEM systems. An optional task might be a function task to process data from the source or a log filter task to filter log data from the source.
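For illustration, the core of such a filtering function task could be as simple as the following Python logic, which keeps only records at or above a given severity. The record shape used here (`{"data": {"level": ...}}`) is a simplified assumption for the example, not the exact OCI Logging schema.

```python
def filter_error_logs(records, min_level="ERROR"):
    """Keep only log records at or above the given severity level.
    The record shape ({"data": {"level": ...}}) is an assumed,
    simplified stand-in for OCI Logging entries."""
    levels = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}
    threshold = levels[min_level]
    return [
        r for r in records
        if levels.get(r.get("data", {}).get("level", "INFO"), 1) >= threshold
    ]
```

Inside an actual function task, this filter would run over each batch of records the service connector hands to the function before they are delivered to the target (for example, a SIEM bucket).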
- Data flow
Oracle Cloud Infrastructure Data Flow is a fully managed big data service that lets you run Apache Spark applications with no infrastructure to deploy or manage. It lets you deliver big data and AI applications faster because you can focus on your applications without getting distracted by operations. Data Flow applications are reusable templates consisting of a Spark application, its dependencies, default parameters, and a default run-time resource specification.
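The refinement step such a Spark application performs can be sketched as a plain Python function applied to each raw event line; malformed lines return `None` so the job can drop them. The event field names and the Object Storage paths in the trailing comment are illustrative assumptions, not names from this architecture.

```python
import json

def parse_event(line):
    """Parse one raw JSON event line into (device_id, ts, reading).
    Returns None for malformed lines so a Spark job can filter them out.
    The field names ("device", "ts", "reading") are assumed examples."""
    try:
        e = json.loads(line)
        return (e["device"], e["ts"], float(e["reading"]))
    except (ValueError, KeyError, TypeError):
        return None

# In a Data Flow (Apache Spark) application, this function would run
# over the raw files landed in Object Storage, for example:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("refine-events").getOrCreate()
#   raw = spark.sparkContext.textFile("oci://raw-events@<namespace>/")
#   refined = raw.map(parse_event).filter(lambda r: r is not None)
#   refined.toDF(["device", "ts", "reading"]).write.parquet(
#       "oci://curated@<namespace>/readings/")
```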
- Object storage
Oracle Cloud Infrastructure Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. Oracle Cloud Infrastructure Object Storage can store an unlimited amount of unstructured data of any content type, including analytic data. You can safely and securely store or retrieve data directly from the internet or from within the cloud platform. Multiple management interfaces let you easily start small and scale seamlessly, without experiencing any degradation in performance or service reliability.
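One common layout convention (an assumption for this example, not an Object Storage requirement) is to write raw events under date-partitioned object names so that downstream Spark jobs can prune whole days when reading:

```python
from datetime import datetime, timezone

def object_name(stream, event_time, suffix="json"):
    """Build a date-partitioned (Hive-style) object name for a raw
    event file. The "raw/<stream>/year=/month=/day=" prefix scheme is
    an assumed naming convention for illustration."""
    return (
        f"raw/{stream}/year={event_time.year}/month={event_time.month:02d}/"
        f"day={event_time.day:02d}/{int(event_time.timestamp())}.{suffix}"
    )

# Uploading with the OCI Python SDK would then look roughly like
# (bucket and namespace are placeholders):
#
#   client.put_object("<namespace>", "<bucket>",
#                     object_name("telemetry", datetime.now(timezone.utc)),
#                     payload_bytes)
```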
- Data science
Data Science provides infrastructure, open source technologies, libraries, and packages, and data science tools for data science teams to build, train, and manage machine learning (ML) models in Oracle Cloud Infrastructure. The collaborative and project-driven workspace provides an end-to-end cohesive user experience and supports the lifecycle of predictive models.
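As a toy stand-in for a trained model, the following scores each sensor reading by how far it deviates from a rolling baseline; a real predictive-maintenance workflow would train, evaluate, and deploy an ML model in a Data Science notebook session instead.

```python
from statistics import mean, stdev

def anomaly_scores(readings, window=20):
    """Score each reading by how many standard deviations it sits from
    the mean of the preceding window of readings. A score of 0.0 is
    emitted until enough history exists. This is a simple illustrative
    baseline, not the service's modeling approach."""
    scores = []
    for i, x in enumerate(readings):
        hist = readings[max(0, i - window):i]
        if len(hist) < 2:
            scores.append(0.0)
            continue
        s = stdev(hist)
        scores.append(abs(x - mean(hist)) / s if s > 0 else 0.0)
    return scores
```

A large score on the latest reading is the kind of actionable event that, in this architecture, would trigger a maintenance response.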
Considerations
When processing streaming data and a broad range of enterprise data resources for business analysis and machine learning, consider these implementation options.
| Guidance | Data Refinery | Data Persistence Platform | Access and Interpretation |
|---|---|---|---|
| Recommended | Oracle Cloud Infrastructure Streaming and Oracle Cloud Infrastructure Service Connector Hub | Oracle Cloud Infrastructure Object Storage | Oracle Cloud Infrastructure Data Science |
| Other options | Oracle Cloud Infrastructure Data Integration | Oracle Database Exadata Cloud Service | Third-party tools |
| Rationale | Oracle Cloud Infrastructure Streaming is a fully managed service. Service Connector Hub is a cloud message bus platform that orchestrates data movement between services in OCI. | Oracle Cloud Infrastructure Object Storage stores unlimited data in raw format. | Data Science is a fully managed, self-service platform for data science teams to build, train, and manage machine learning (ML) models in Oracle Cloud Infrastructure, providing infrastructure and data science tools. |
Deploy
The code required to deploy this reference architecture is available in GitHub. You can pull the code into Oracle Cloud Infrastructure Resource Manager with a single click, create the stack, and deploy it. Alternatively, download the code from GitHub to your computer, customize the code, and deploy the architecture by using the Terraform command line interface (CLI).
- Deploy using the sample stack in Oracle Cloud Infrastructure Resource Manager:
  - Click Deploy to Oracle Cloud.
  - If you aren't already signed in, enter the tenancy and user credentials.
  - Review and accept the terms and conditions.
  - Select the region where you want to deploy the stack.
  - Follow the on-screen prompts and instructions to create the stack.
  - After creating the stack, click Terraform Actions, and select Plan.
  - Wait for the job to be completed, and review the plan. To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.
  - If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
- Deploy using the Terraform CLI with the code in GitHub:
  - Go to GitHub.
  - Clone or download the repository to your local computer.
  - Follow the instructions in the README document.
Change Log
This log lists significant changes:
- December 6, 2021