Learn About Loading the Data Warehouse for Business Analytics

To leverage your bulk, transactional, and transformed data for business analytics, learn how to use Oracle Cloud or Oracle Cloud at Customer data integration methods to load the data warehouse, and then connect business analytics tools to the data warehouse for visualization and business intelligence.

Data warehouses are an essential tool for enterprises to manage their most important business data as a source for business intelligence. The data in relational databases is highly structured and must often be transformed into the desired structure before it’s loaded into the data warehouse.

Data integration is the process of extracting the source data, and then transferring and transforming that data for the data warehouse. There are two common methods:
  • Extract, load, transform (E-LT) loads data into the data warehouse and then transforms it there.

  • Extract, transform, load (ETL) transfers the data to an intermediate service for transformation before it’s loaded into the data warehouse.

The choice of a method depends on the amount of transformation that needs to be performed, which affects the processing time and computing resources required. E-LT is the more direct method, but there may be circumstances where it’s cost-effective to off-load the transformation processing from the data warehouse platform to a data lake.
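The practical difference between the two methods is where the transformation runs. As a minimal, vendor-neutral sketch of E-LT (using Python’s built-in sqlite3 module in place of a real data warehouse, with hypothetical table and column names), the raw rows are loaded first and then transformed by the warehouse’s own SQL engine:

```python
import sqlite3

# Hypothetical raw source rows: (order_id, amount_in_cents, country_code)
source_rows = [(1, 1250, "us"), (2, 9900, "de"), (3, 450, "us")]

# An in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")

# E-LT step 1: extract and load the raw data as-is into a staging table.
warehouse.execute(
    "CREATE TABLE staging_orders (order_id INT, amount_cents INT, country TEXT)")
warehouse.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", source_rows)

# E-LT step 2: transform inside the warehouse, using its own SQL engine.
warehouse.execute("""
    CREATE TABLE orders AS
    SELECT order_id,
           amount_cents / 100.0 AS amount_dollars,
           UPPER(country)       AS country
    FROM staging_orders
""")

for row in warehouse.execute("SELECT * FROM orders ORDER BY order_id"):
    print(row)
```

In the ETL variant of the same pipeline, the `SELECT` transformation would run on an intermediate platform and only its result would be loaded, which is what makes ETL attractive when transformation work is heavy enough to be worth off-loading from the warehouse.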

Description of the illustration data-warehouse-process-flow.png
The E-LT method requires the following components:
  • Data integration: Processes and transforms heterogeneous data sources in real time or in batches while ensuring data quality

  • Data warehouse: Provides scalable, dynamic data warehousing to support modern analysis and data management

  • Analytics tools: Provide data storytelling with visualization, what-if analysis, and collaboration from your desktop or mobile device

The ETL method requires the following additional components:
  • Storage: Provides scalable data storage as part of a data lake for historical and raw bulk data

  • Big data management: Executes Apache Spark workloads to process data at scale

You can implement this solution either on Oracle Cloud or on Oracle Cloud at Customer. The Oracle Cloud architecture describes the integration of multiple Oracle Cloud services. The Oracle Cloud at Customer architecture uses the Oracle Cloud at Customer infrastructure to enable the cloud services that run in the customer’s data center. You usually decide whether to deploy a solution on Oracle Cloud or on Oracle Cloud at Customer during the purchasing process, prior to implementation. The same basic pattern for loading a data warehouse for business analytics applies to both the Oracle Cloud and Oracle Cloud at Customer architectures. However, the components and some integration points differ.

Architecture for Loading the Data Warehouse on Oracle Cloud at Customer

Within an Oracle Cloud at Customer environment, data moves from one or more sources into the data warehouse through the extract, load, and transform (E-LT) and extract, transform, and load (ETL) data integration processes. Oracle Analytics Cloud then accesses the data warehouse to provide business intelligence and data visualization.

The Oracle Database Exadata Cloud at Customer platform is the host for the data warehouse. ETL offload is performed by using the Cloudera distribution of Apache Hadoop and other components in Oracle Big Data Cloud at Customer. Oracle Data Integration Platform Cloud and Oracle Analytics Cloud run on the Oracle Cloud at Customer hardware and all components reside in the customer’s data center.

When the data source is a database, Oracle Data Integration Platform Cloud captures database transactions in real time and replicates those records in the data warehouse with minimal transformation. When dealing with bulk data sources, such as extracts from business applications, Oracle Data Integration Platform Cloud extracts the data directly from the source systems. Oracle Data Integration Platform Cloud then loads the data into the data warehouse and uses the E-LT method to orchestrate the transformation.
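The real-time replication path can be pictured as replaying captured change records against the warehouse copy of a table. The following is a highly simplified sketch of that pattern; the record format, operation names, and table are illustrative inventions, not the actual Oracle Data Integration Platform Cloud formats, and sqlite3 again stands in for the warehouse:

```python
import sqlite3

# Hypothetical captured database transactions, in commit order.
change_records = [
    {"op": "insert", "id": 1, "name": "Ada", "status": "active"},
    {"op": "insert", "id": 2, "name": "Bo",  "status": "active"},
    {"op": "update", "id": 2, "name": "Bo",  "status": "closed"},
    {"op": "delete", "id": 1},
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INT PRIMARY KEY, name TEXT, status TEXT)")

def apply_change(db, rec):
    """Replay one captured change record against the warehouse table."""
    if rec["op"] == "insert":
        db.execute("INSERT INTO customers VALUES (?, ?, ?)",
                   (rec["id"], rec["name"], rec["status"]))
    elif rec["op"] == "update":
        db.execute("UPDATE customers SET name = ?, status = ? WHERE id = ?",
                   (rec["name"], rec["status"], rec["id"]))
    elif rec["op"] == "delete":
        db.execute("DELETE FROM customers WHERE id = ?", (rec["id"],))

# Applying the records in order keeps the warehouse copy in sync.
for rec in change_records:
    apply_change(warehouse, rec)

print(list(warehouse.execute("SELECT * FROM customers")))
```

Because each record is applied with minimal transformation, the warehouse table converges to the current state of the source table, which is the goal of the replication path described above.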

To off-load transformation processing using the ETL method, Oracle Data Integrator, a component of Oracle Data Integration Platform Cloud that resides on Oracle Big Data Cloud at Customer, extracts bulk data sources and stores them in the Hadoop Distributed File System (HDFS). Transformation processing is orchestrated by Oracle Data Integrator in Apache Spark on Oracle Big Data Cloud at Customer. Oracle Data Integrator then loads the transformed data into the data warehouse by generating code for, and orchestrating, technologies such as Spark, Apache Sqoop, or Oracle Loader for Hadoop (OLH).
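The ETL off-load pattern can be sketched in the same miniature style: transform the bulk extract on an intermediate platform, then load only the result into the warehouse. Here plain Python stands in for a Spark job on Oracle Big Data Cloud at Customer, and the file layout, table, and columns are hypothetical:

```python
import sqlite3

# Hypothetical bulk extract, as CSV-like lines landed in HDFS.
extracted = [
    "1001,2024-01-05,  widget ,3",
    "1002,2024-01-06,  gadget ,0",   # zero-quantity row, to be filtered out
    "1003,2024-01-07,  widget ,5",
]

def transform(lines):
    """Parse, clean, and filter on the intermediate platform
    (stand-in for Spark transformation processing)."""
    for line in lines:
        order_id, order_date, product, qty = line.split(",")
        qty = int(qty)
        if qty > 0:                  # drop empty orders before loading
            yield (int(order_id), order_date, product.strip(), qty)

# Load only the transformed result into the warehouse (sqlite3 stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (order_id INT, order_date TEXT, product TEXT, qty INT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", transform(extracted))

print(warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```

The warehouse never sees the raw, unfiltered extract, which is exactly the trade-off that distinguishes ETL from the E-LT path: cleaner load at the cost of an extra processing tier.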


Description of the illustration data-warehouse-cloud-customer.png

About Required Services for Loading the Data Warehouse on Oracle Cloud at Customer

You should be familiar with provisioning and using the recommended services and technologies. In an Oracle Cloud at Customer environment, loading the data warehouse for business analytics uses the following services and technologies:
  • Oracle Database Exadata Cloud at Customer 

  • Oracle Data Integration Platform Cloud  

    • Standard Edition for bulk data movement use cases

    • Enterprise Edition for bulk and real-time data replication use cases

    • Governance Edition for bulk, real-time, and data quality use cases

  • Oracle Big Data Cloud at Customer, for the extract, transform, and load (ETL) off-load process

See Learn how to get Oracle Cloud services for Oracle Solutions to get the cloud services you need.

About Roles and Privileges

You don’t need any special users or roles to load the data warehouse for business analytics on Oracle Cloud at Customer.

Considerations for Loading the Data Warehouse for Business Analytics

To help ensure successful service integration and data analysis, keep the following consideration in mind: this document doesn’t provide information about provisioning and using the services referenced here. You should have some familiarity with those services and technologies and their documentation.