Enterprise Data Warehousing - a Predictive Maintenance Example

You have access to streaming data in real time, so how can you apply advanced analytics and data science capabilities to understand the context for an actionable event, gain insight, and create a response?

Use Oracle Cloud Infrastructure Object Storage and Oracle Cloud Infrastructure Data Flow to process streaming event and log data for predictive analysis and machine learning.

This reference architecture positions the technology solution within the overall business context:



At a conceptual level, the technology solution addresses the problem as follows:



Architecture

This architecture uses data science and machine learning to analyze streaming and log data to provide context and insight for actionable events.

The following diagram illustrates this reference architecture.



The architecture focuses on the following logical divisions:

  • Data refinery

    Ingests and refines the data for use in each of the data layers in the architecture. In the diagram, the shape of this division illustrates the differences in processing costs for storing and refining data at each level and for moving data between levels.

  • Data persistence platform (curated information layer)

    Facilitates access to and navigation of the data to show the current business view. For relational technologies, data may be logically or physically structured in simple relational, longitudinal, dimensional, or OLAP forms. For non-relational data, this layer contains one or more pools of data, either output from an analytical process or data optimized for a specific analytical task.

  • Access and interpretation

    Abstracts the logical business view of the data for the consumers. This abstraction facilitates agile approaches to development, migration to the target architecture, and the provision of a single reporting layer from multiple federated sources.
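The three divisions above can be pictured as stages in a small pipeline. The following is a minimal sketch in plain Python, not Oracle code: the function names (`refine`, `curate`, `business_view`), the event fields (`machine_id`, `ts`, `temp_c`), and the sample values are hypothetical stand-ins chosen for a predictive maintenance scenario.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical raw telemetry as it might arrive from a stream (JSON strings).
RAW_EVENTS = [
    '{"machine_id": "m1", "ts": 1, "temp_c": 70.0}',
    '{"machine_id": "m1", "ts": 2, "temp_c": 74.0}',
    'not-json',                                        # malformed record
    '{"machine_id": "m2", "ts": 1, "temp_c": 55.0}',
    '{"machine_id": "m2", "ts": 2}',                   # missing reading
]


def refine(raw_events):
    """Data refinery: parse raw events and drop records that cannot be used."""
    refined = []
    for line in raw_events:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed input instead of failing the whole batch
        if isinstance(event, dict) and "machine_id" in event and "temp_c" in event:
            refined.append(event)
    return refined


def curate(refined):
    """Curated information layer: one summary row per machine."""
    by_machine = defaultdict(list)
    for event in refined:
        by_machine[event["machine_id"]].append(event["temp_c"])
    return {
        machine: {
            "readings": len(temps),
            "avg_temp_c": mean(temps),
            "max_temp_c": max(temps),
        }
        for machine, temps in by_machine.items()
    }


def business_view(curated, machine_id):
    """Access and interpretation: a stable view that hides how data is stored."""
    row = curated.get(machine_id)
    if row is None:
        return None
    return {"machine": machine_id, "average_temperature": row["avg_temp_c"]}


curated = curate(refine(RAW_EVENTS))
print(business_view(curated, "m1"))
# {'machine': 'm1', 'average_temperature': 72.0}
```

Consumers call only `business_view`, so the curated layer could change its storage format without affecting reports. That is the kind of abstraction the access and interpretation division is meant to provide.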

The architecture has the following components:

  • Data streaming

    Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable storage solution for ingesting continuous, high-volume streams of data that you can consume and process in real time. Streaming can be used for messaging, high-volume application logs, operational telemetry, web click-stream data, or other publish-subscribe messaging model use cases in which data is produced and processed continually and sequentially.

  • Kafka Connect

    Kafka Connect is an open source framework for streaming data scalably and reliably between Apache Kafka and other systems. It connects Kafka and services such as Oracle Cloud Infrastructure Streaming service with external sources.

  • Data flow

    Oracle Cloud Infrastructure Data Flow is a fully managed big data service that lets you run Apache Spark applications with no infrastructure to deploy or manage. It lets you deliver big data and AI applications faster because you can focus on your applications without getting distracted by operations. Data Flow applications are reusable templates consisting of a Spark application, its dependencies, default parameters, and a default run-time resource specification.

  • Object storage

    Oracle Cloud Infrastructure Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. Oracle Cloud Infrastructure Object Storage can store an unlimited amount of unstructured data of any content type, including analytic data. You can safely and securely store or retrieve data directly from the internet or from within the cloud platform. Multiple management interfaces let you easily start small and scale seamlessly, without experiencing any degradation in performance or service reliability.

  • Data science

    Oracle Cloud Infrastructure Data Science provides infrastructure, open source technologies, libraries, packages, and data science tools that data science teams use to build, train, and manage machine learning (ML) models in Oracle Cloud Infrastructure. The collaborative and project-driven workspace provides a cohesive end-to-end user experience and supports the lifecycle of predictive models.
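To make the flow between these components more concrete, here is a minimal sketch in plain Python that needs no Oracle services. The in-memory `Stream` class only imitates the append-and-read-by-offset behavior of a publish-subscribe stream, and `flag_anomalies` is a simple rolling z-score check standing in for a trained model. All names, sample readings, and thresholds are assumptions for illustration; this is not the OCI SDK and not a prescribed modeling method.

```python
from collections import deque
from statistics import mean, pstdev


class Stream:
    """In-memory stand-in for a stream: an append-only log read by offset."""

    def __init__(self):
        self._log = []

    def publish(self, key, value):
        """Append a message and return its offset."""
        self._log.append((key, value))
        return len(self._log) - 1

    def read(self, offset, limit=10):
        """Return up to `limit` messages from `offset`, plus the next offset."""
        batch = self._log[offset:offset + limit]
        return batch, offset + len(batch)


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings far from the mean of the previous `window` readings.

    A reading is flagged when it lies more than `threshold` population
    standard deviations from the rolling mean.
    """
    history = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(index)
        history.append(value)
    return flagged


# Producer side: hypothetical vibration readings from one pump.
stream = Stream()
for vibration in [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 5.0, 1.0]:
    stream.publish("pump-7", vibration)

# Consumer side: read sequentially in small batches, tracking the offset.
offset, values = 0, []
while True:
    batch, offset = stream.read(offset, limit=3)
    if not batch:
        break
    values.extend(value for _key, value in batch)

print(flag_anomalies(values))
# [6]
```

In the architecture itself, the stream would be the Streaming service, the batch read and feature step would run as a Spark application in Data Flow, and the scoring logic would be a model built and managed in Data Science. The sketch only shows how the pieces hand data to each other.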

Considerations

When processing streaming data and a broad range of enterprise data resources for business analysis and machine learning, consider these implementation options.

  • Data refinery

    Recommended: Oracle Cloud Infrastructure Streaming service and Kafka Connect

    Other options: Oracle Cloud Infrastructure Data Integration

    Rationale: Oracle Cloud Infrastructure Streaming service is a fully managed service. Use the Apache Kafka VM image for enterprise Kafka.

  • Data persistence platform

    Recommended: Oracle Cloud Infrastructure Object Storage

    Other options: Oracle Database Exadata Cloud Service

    Rationale: Oracle Cloud Infrastructure Object Storage stores unlimited data in raw format.

  • Access and interpretation

    Recommended: Oracle Cloud Infrastructure Data Science

    Other options: Third-party tools

    Rationale: Data Science is a fully managed, self-service platform for data science teams to build, train, and manage machine learning (ML) models in Oracle Cloud Infrastructure. The Data Science service provides infrastructure and data science tools.

Deploy

The Terraform code for this reference architecture is available on GitHub.

  1. Go to GitHub.
  2. Follow the instructions in the README document.

Explore More

Learn more about the features of this architecture.

Best practices framework for Oracle Cloud Infrastructure