Enterprise Data Warehousing - an HCM Data Enrichment Example

Human resource data is often distributed in multiple systems across the enterprise and can't easily be integrated and analyzed to produce actionable insights.

Enrich enterprise application data from PeopleSoft Human Capital Management with raw data and event data from other sources to produce actionable and predictive insights.

This reference architecture positions the technology solution within the overall business context:



As departments consolidate data from multiple sources into data marts to gain targeted insights, the enterprise data warehouse must adapt so that it can leverage those data marts along with other structured and unstructured sources. This is especially true for human resource information as the enterprise itself changes.

Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources to facilitate querying and analyzing historical data. Combining warehoused data with streaming and transaction data is essential for machine learning and predictive analysis.

At a conceptual level, the technology solution addresses the problem as follows:



Architecture

This architecture collects and combines application data and streaming event data for analysis and machine learning to provide actionable insights.
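
As a minimal sketch of what combining application data with event data can look like, the following Python example attaches per-employee event counts to employee records. The record shapes, field names, and event types are hypothetical and stand in for PeopleSoft HCM data and streaming events; they are not taken from the reference architecture.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    # Hypothetical subset of an HCM employee record.
    employee_id: str
    department: str


@dataclass(frozen=True)
class Event:
    # Hypothetical event from a streaming source.
    employee_id: str
    event_type: str


def enrich(employees, events):
    """Return one dict per employee with event counts by type attached."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        counts[event.employee_id][event.event_type] += 1
    return [
        {
            "employee_id": e.employee_id,
            "department": e.department,
            "event_counts": dict(counts[e.employee_id]),
        }
        for e in employees
    ]


employees = [Employee("E1", "Finance"), Employee("E2", "Sales")]
events = [
    Event("E1", "training_completed"),
    Event("E1", "training_completed"),
    Event("E2", "badge_swipe"),
    Event("E9", "badge_swipe"),  # no matching HCM record, so not in the output
]
enriched = enrich(employees, events)
```

The enriched records keep the application data as the backbone and treat events as additional attributes, which is one simple way to prepare features for later analysis or machine learning.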



The architecture focuses on the following logical divisions:

  • Data refinery

    Ingests and refines the data for use in each of the data layers in the architecture. In the architecture diagram, the shape of this division illustrates the differences in processing costs for storing and refining data at each level and for moving data between them.

  • Data persistence platform (curated information layer)

    Facilitates access and navigation of the data to show the current business view. For relational technologies, data may be logically or physically structured in simple relational, longitudinal, dimensional, or OLAP forms. For non-relational data, this layer contains one or more pools of data, either output from an analytical process or data optimized for a specific analytical task.

  • Access and interpretation

    Abstracts the logical business view of the data for the consumers. This abstraction facilitates agile approaches to development, migration to the target architecture, and the provision of a single reporting layer from multiple federated sources.
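
The abstraction described for the access and interpretation division can be illustrated with a database view. The sketch below uses SQLite from the Python standard library purely as a stand-in; the table names, column names, and values are hypothetical. Consumers query one logical view while the underlying sources keep their own physical layouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical curated sources with different physical layouts.
    CREATE TABLE hcm_headcount (dept TEXT, headcount INTEGER);
    CREATE TABLE contractor_pool (department_name TEXT, contractors INTEGER);

    INSERT INTO hcm_headcount VALUES ('Finance', 40), ('Sales', 25);
    INSERT INTO contractor_pool VALUES ('Finance', 5), ('Sales', 10);

    -- The view is the logical business view that consumers query.
    CREATE VIEW workforce AS
    SELECT h.dept AS department,
           h.headcount + c.contractors AS total_workers
    FROM hcm_headcount AS h
    JOIN contractor_pool AS c ON c.department_name = h.dept;
    """
)

# Consumers only reference the view, never the underlying tables.
rows = conn.execute(
    "SELECT department, total_workers FROM workforce ORDER BY department"
).fetchall()
```

Because reports depend only on the view, the underlying tables could be restructured or migrated without changing the queries that consumers run, provided the view is kept in step.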

The architecture has the following components:

  • Data integration

    Oracle Data Integrator provides comprehensive data integration, from high-volume, high-performance batch loads, to event-driven, trickle-feed integration processes, to SOA-enabled data services. Its declarative design approach makes development and maintenance faster and simpler, and its extract-load-transform (E-LT) approach is designed to maximize performance when executing data transformation and validation processes.

  • Data streaming

    Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable storage solution for ingesting continuous, high-volume streams of data that you can consume and process in real time. Streaming can be used for messaging, high-volume application logs, operational telemetry, web click-stream data, or other publish-subscribe messaging model use cases in which data is produced and processed continually and sequentially.

  • Kafka Connect

    Kafka Connect is an open source framework for streaming data scalably and reliably between Apache Kafka and other systems. It connects Kafka, and services such as Oracle Cloud Infrastructure Streaming service, with external sources.

  • Autonomous data warehouse

    Oracle Autonomous Data Warehouse is a fully managed, preconfigured database environment. You do not need to configure or manage any hardware, or install any software. After provisioning, you can scale the number of CPU cores or the storage capacity of the database at any time without impacting availability or performance.

  • Object storage

    Oracle Cloud Infrastructure Object Storage is an internet-scale, high-performance storage platform that offers reliable and cost-efficient data durability. Oracle Cloud Infrastructure Object Storage can store an unlimited amount of unstructured data of any content type, including analytic data. You can safely and securely store or retrieve data directly from the internet or from within the cloud platform. Multiple management interfaces let you easily start small and scale seamlessly, without experiencing any degradation in performance or service reliability.

  • Analytics

    Oracle Analytics Cloud is a scalable and secure public cloud service that provides a full set of capabilities to explore and perform collaborative analytics for you, your workgroup, and your enterprise.

    With Oracle Analytics Cloud you also get flexible service management capabilities, including fast setup, easy scaling and patching, and automated lifecycle management.

  • Machine learning

    Oracle Machine Learning provides powerful new machine learning capabilities tightly integrated in Oracle Autonomous Database, with new support for Python. Upcoming integration with Oracle Cloud Infrastructure Data Science will enable data scientists to develop models using both open source and scalable in-database algorithms. Uniquely, bringing algorithms to the data in Oracle Database speeds time to results by reducing data preparation and movement.

  • Data science

    Data Science provides infrastructure, open source technologies, libraries, packages, and data science tools that data science teams use to build, train, and manage machine learning (ML) models in Oracle Cloud Infrastructure. The collaborative and project-driven workspace provides an end-to-end cohesive user experience and supports the lifecycle of predictive models.
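
The extract-load-transform (E-LT) pattern mentioned for data integration can be sketched in a few lines. The example below shows the general pattern only, not how Oracle Data Integrator implements it: raw rows are loaded unchanged into a staging table, and the cleaning and validation then run as SQL inside the target database. SQLite stands in for the warehouse, and all table names, column names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system extract.
raw_rows = [
    ("E1", " finance ", "52000"),
    ("E2", "SALES", "61000"),
    ("E3", "Sales", "not-a-number"),  # fails validation below
]

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE stg_employee (emp_id TEXT, dept TEXT, salary TEXT)")
target.execute("CREATE TABLE dim_employee (emp_id TEXT, dept TEXT, salary INTEGER)")

# Load: copy the extracted rows into staging unchanged.
target.executemany("INSERT INTO stg_employee VALUES (?, ?, ?)", raw_rows)

# Transform: clean and validate with SQL running inside the target database.
target.execute(
    """
    INSERT INTO dim_employee (emp_id, dept, salary)
    SELECT emp_id, UPPER(TRIM(dept)), CAST(salary AS INTEGER)
    FROM stg_employee
    WHERE salary <> '' AND salary NOT GLOB '*[^0-9]*'
    """
)

loaded = target.execute(
    "SELECT emp_id, dept, salary FROM dim_employee ORDER BY emp_id"
).fetchall()
```

Loading before transforming means the transformation work is done by the database engine that already holds the data, rather than by a separate transformation server between source and target.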

Recommendations

Use the following recommendations as a starting point to collect and combine application data and streaming event data for analysis and machine learning.

Your requirements might differ from the architecture described here.

  • Oracle Autonomous Data Warehouse

    This architecture uses Oracle Autonomous Data Warehouse on shared infrastructure. Enable auto scaling so that database workloads can use up to three times the base processing power.

    Consider using Oracle Autonomous Data Warehouse on dedicated infrastructure if you want the self-service database capability within a private database cloud environment running on the public cloud.
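
For capacity planning, the auto scaling ceiling is simple arithmetic. The helper below is a hypothetical sketch that assumes processing power is measured in CPU cores and applies the "up to three times" figure from the recommendation above; it is not part of any Oracle SDK.

```python
# Multiplier taken from the recommendation text ("up to three times").
AUTO_SCALING_MULTIPLIER = 3


def max_cpu_cores(base_cores: int, auto_scaling: bool) -> int:
    """Return the most CPU cores a workload could use (hypothetical helper)."""
    if base_cores < 1:
        raise ValueError("base_cores must be at least 1")
    if auto_scaling:
        return base_cores * AUTO_SCALING_MULTIPLIER
    return base_cores


ceiling_with_auto_scaling = max_cpu_cores(4, auto_scaling=True)
ceiling_without_auto_scaling = max_cpu_cores(4, auto_scaling=False)
```

Under these assumptions, a database provisioned with 4 cores could use up to 12 cores at peak with auto scaling enabled, and stays at 4 cores without it.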

Considerations

When collecting and combining application data and streaming event data for analysis and machine learning, consider the following implementation options.

Recommended

  • Data Refinery

    Oracle Cloud Infrastructure Data Integration, Oracle Cloud Infrastructure Streaming service, and Kafka Connect

  • Data Persistence Platform

    Oracle Cloud Infrastructure Object Storage and Oracle Autonomous Data Warehouse

  • Access & Interpretation

    Oracle Analytics Cloud, Oracle Cloud Infrastructure Data Science, and Oracle Machine Learning

Other Options

  • Data Refinery

    Oracle Data Integrator

  • Data Persistence Platform

    Oracle Database Exadata Cloud Service

  • Access & Interpretation

    Third-party tools

Rationale

  • Data Refinery

    Oracle Data Integrator provides two specialized knowledge modules (KMs) for transforming PeopleSoft data structures in the form of datastores.

    Oracle Cloud Infrastructure Streaming service is a fully managed service.

    Apache Kafka VM Image for enterprise Kafka.

  • Data Persistence Platform

    Oracle Cloud Infrastructure Object Storage stores unlimited data in raw format.

    Oracle Autonomous Data Warehouse is an easy-to-use, fully autonomous database that scales elastically, delivers fast query performance, and requires no database administration. It also offers direct access to the data in object storage through external tables.

  • Access & Interpretation

    Oracle Analytics Cloud is fully managed and tightly integrated with the Curated Data Layer (Oracle Autonomous Data Warehouse).

    Data Science is a fully managed, self-service platform for data science teams to build, train, and manage machine learning (ML) models in Oracle Cloud Infrastructure. The Data Science service provides infrastructure and data science tools.

Deploy

The Terraform code for this reference architecture is available on GitHub.

  1. Go to GitHub.
  2. Follow the instructions in the README document.

Explore More

Learn more about the features of this architecture.

Best practices framework for Oracle Cloud Infrastructure
