Learn About Loading the Data Warehouse for Business Analytics

If you want to leverage your bulk, transactional, and transformed data for business analytics, then you’ll want to learn how to use Oracle Cloud or Oracle Cloud at Customer data integration methods to load the data warehouse, and then connect your business analytics tools to the data warehouse for visualization and business intelligence.

Data warehouses are an essential tool for enterprises to manage their most important business data as a source for business intelligence. The data in relational databases is highly structured and must often be transformed into the desired structure before it’s loaded into the data warehouse.

Data integration is the process of extracting the source data, and then transferring and transforming that data for the data warehouse. There are two common methods:
  • Extract, load, transform (E-LT) loads data into the data warehouse and then transforms it there.

  • Extract, transform, load (ETL) transfers the data to an intermediate service for transformation before it’s loaded into the data warehouse.

The choice of a method depends on the amount of transformation that needs to be performed, which affects the processing time and computing resources required. E-LT is the more direct method, but there may be circumstances where it’s cost-effective to off-load the transformation processing from the data warehouse platform to a data lake.
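For illustration, the following Python sketch contrasts the two orderings. It is a minimal sketch, not a definitive implementation: the table names, column names, and database connection are hypothetical, and the python-oracledb driver is used only to show where the transformation runs in each method.

  import oracledb

  def elt_flow(conn):
      """E-LT: load the raw rows first, then transform inside the warehouse."""
      with conn.cursor() as cur:
          # Load: raw rows land in a staging table unchanged.
          cur.execute("INSERT INTO stage_orders SELECT * FROM ext_orders")
          # Transform: the warehouse itself reshapes the data with SQL.
          cur.execute("""
              INSERT INTO dw_orders (order_id, order_day, amount_usd)
              SELECT order_id, TRUNC(order_ts), amount * fx_rate
              FROM stage_orders""")
      conn.commit()

  def etl_flow(rows, conn):
      """ETL: transform outside the warehouse first, then load the result."""
      transformed = [
          (r["order_id"], r["order_ts"].date(), r["amount"] * r["fx_rate"])
          for r in rows
      ]
      with conn.cursor() as cur:
          cur.executemany(
              "INSERT INTO dw_orders (order_id, order_day, amount_usd) "
              "VALUES (:1, :2, :3)",
              transformed)
      conn.commit()

In the ETL case, the transformation work runs on separate compute (in this architecture, Apache Spark against data staged in a data lake rather than plain Python), which is what makes it possible to off-load that processing from the data warehouse platform.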

Figure: Data warehouse process flow (data-warehouse-process-flow.png)
The E-LT method requires the following components:
  • Data integration: Processes and transforms heterogeneous data sources in real time or in batches while ensuring data quality

  • Data warehouse: Provides scalable, dynamic data warehousing to support modern analysis and data management

  • Analytics tools: Provide data storytelling with visualization, what-if analysis, and collaboration from your desktop or mobile device

The ETL method requires the following additional components:
  • Storage: Provides scalable data storage as part of a data lake for historical and raw bulk data

  • Big data management: Executes Apache Spark workloads to process data at scale

You can implement this solution either on Oracle Cloud or on Oracle Cloud at Customer. The Oracle Cloud architecture describes the integration of multiple Oracle Cloud services. The Oracle Cloud at Customer architecture uses the Oracle Cloud at Customer infrastructure to enable the cloud services that run in the customer’s data center. You usually decide whether to deploy a solution on Oracle Cloud or on Oracle Cloud at Customer during the purchasing process, prior to implementation. The same basic pattern for loading a data warehouse for business analytics applies to both the Oracle Cloud and Oracle Cloud at Customer architectures; however, the components and some integration points differ.

Architecture for Loading the Data Warehouse on Oracle Cloud

Within an Oracle Cloud environment, data moves from one or more sources through the extract, load, and transform (E-LT) or extract, transform, and load (ETL) data integration processes into the data warehouse. Oracle Analytics Cloud then accesses the data in the warehouse to provide business intelligence and data visualization.

When the data source is a database in the customer’s environment, the Oracle Data Integration Platform Cloud remote agent is used to capture database transactions in real time and replicate those transactions in the data warehouse with minimal transformation. The remote agent resides inside the customer’s firewall, near the source data, and is responsible for capturing the source records and transferring them to the cloud.

When dealing with bulk data sources, such as extracts from business applications, Oracle Data Integration Platform Cloud captures the data directly without using the remote agent. Oracle Data Integration Platform Cloud can then either load the data into the data warehouse and orchestrate its transformation (the E-LT method), or it can deliver the data to object storage where it can be transformed by using Apache Spark in Oracle Big Data Cloud (the ETL method).

After the data is transformed in Oracle Big Data Cloud, it’s written back to object storage. Because object storage is shared between Oracle Big Data Cloud and Oracle Autonomous Data Warehouse, Oracle Autonomous Data Warehouse can then ingest the transformed data.
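The following PySpark sketch illustrates that off-load step. It is a minimal sketch under stated assumptions: the bucket paths, the oci:// URI scheme, and the column names are placeholders, and the actual object-storage URI scheme depends on the connector configured for the Spark cluster.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.appName("dw-offload-transform").getOrCreate()

  # Read the raw delimited extract that Oracle Data Integration Platform Cloud
  # delivered to object storage (path and URI scheme are placeholders).
  raw = spark.read.option("header", True).csv("oci://raw-bucket/orders/*.csv")

  # Example transformation: cast the measures, derive a daily grain, aggregate.
  daily = (
      raw.withColumn("amount", F.col("amount").cast("double"))
         .withColumn("order_day", F.to_date("order_ts"))
         .groupBy("order_day", "region")
         .agg(F.sum("amount").alias("total_amount"))
  )

  # Write the result back to object storage as delimited files so that the
  # data warehouse can ingest it from the shared bucket.
  daily.write.option("header", True).mode("overwrite").csv(
      "oci://curated-bucket/orders_daily/")

  spark.stop()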


Figure: Data warehouse architecture on Oracle Cloud (data-warehouse-public-cloud.png)

About Required Services for Loading the Data Warehouse on Oracle Cloud

You should be familiar with provisioning and using the recommended services and technologies. In an Oracle Cloud environment, loading the data warehouse for business analytics uses the following services and features:
  • Oracle Autonomous Data Warehouse (provisioned in Oracle Cloud Infrastructure)

  • Oracle Data Integration Platform Cloud (provisioned in Oracle Cloud Infrastructure Classic)

    • Standard Edition for bulk data movement use cases

    • Enterprise Edition for bulk and real-time data replication use cases

    • Governance Edition for bulk, real-time, and data quality use cases

  • Oracle Data Integration Platform Cloud Remote Agent (downloadable component)

  • Oracle Cloud Infrastructure Object Storage (provisioned in Oracle Cloud Infrastructure)

  • Oracle Big Data Cloud - for the Oracle Cloud off-load process (provisioned in Oracle Cloud Infrastructure)

  • Oracle Analytics Cloud

See Learn how to get Oracle Cloud services for Oracle Solutions to get the cloud services you need.

In addition, it uses the following technologies:

  • Virtual network computing (VNC) server and client: Graphical desktop sharing system to manage configuration files on the Oracle Data Integration Platform Cloud host instance. There are a number of freely available applications. See Connecting to ODI Studio with VNC Server.

  • Java Development Kit (JDK): To use Oracle Data Integration Platform Cloud remote agents for real-time data replication using Oracle GoldenGate, you need to install the JDK on the host before you configure the remote agent. Oracle Data Integration Platform Cloud is certified for Java 1.8 and later.

  • Oracle SQL Developer: To connect to Oracle Autonomous Data Warehouse using an Oracle Wallet, you need to download and install Oracle SQL Developer. Oracle SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and cloud deployments.

    SQL Developer can connect to Oracle Autonomous Data Warehouse using an Oracle Wallet and contains enhancements for key Oracle Autonomous Data Warehouse features. Oracle SQL Developer provides support for wallet files using the Cloud Wallet Connection Type (SQL Developer 18.2 shows this as Cloud PDB). Oracle recommends that you use version 18.2 (or later); however, earlier versions of SQL Developer will work with Oracle Autonomous Data Warehouse. A programmatic wallet connection is sketched after this list.
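For illustration only, the following Python sketch shows the same wallet-based connection made programmatically with the python-oracledb driver rather than SQL Developer. The user name, service alias, wallet directory, and passwords are placeholders.

  import oracledb

  # Connect by using the unzipped client credentials (wallet) directory.
  # The dsn is a service alias from the tnsnames.ora file inside the wallet.
  conn = oracledb.connect(
      user="dwh_user",
      password="<password>",
      dsn="myadw_high",
      config_dir="/opt/oracle/wallet",
      wallet_location="/opt/oracle/wallet",
      wallet_password="<wallet password>",
  )

  with conn.cursor() as cur:
      cur.execute("SELECT sysdate FROM dual")
      print(cur.fetchone())

  conn.close()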

About Roles and Privileges

To integrate the services needed to load and analyze data, you need certain user roles and privileges. The steps for creating the roles and privileges are provided in later sections.

  • To create a wallet with client access credentials for Oracle Autonomous Data Warehouse, you must have an administrator account for Oracle Autonomous Data Warehouse.

  • To accept data from Oracle Data Integration Platform Cloud, you must create a user in Oracle Autonomous Data Warehouse and grant it the dwrole role (see the sketch after this list).

  • To allow Oracle GoldenGate to connect to Oracle Autonomous Data Warehouse, you must unlock the predefined ggadmin user in Oracle Autonomous Data Warehouse.

  • To replicate data from Oracle GoldenGate to Oracle Autonomous Data Warehouse, you must create a user in Oracle Autonomous Data Warehouse.
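For illustration, the following Python sketch runs the corresponding SQL statements as the Oracle Autonomous Data Warehouse ADMIN user. It is a minimal sketch: the user names, passwords, and connection details are placeholders, and your security policies may require different settings.

  import oracledb

  # Connect as the ADMIN user of Oracle Autonomous Data Warehouse
  # (connection details and passwords are placeholders).
  admin = oracledb.connect(
      user="ADMIN",
      password="<admin password>",
      dsn="myadw_high",
      config_dir="/opt/oracle/wallet",
      wallet_location="/opt/oracle/wallet",
      wallet_password="<wallet password>",
  )

  with admin.cursor() as cur:
      # Target user that accepts data from Oracle Data Integration Platform Cloud.
      # The passwords below are placeholders.
      cur.execute('CREATE USER dipc_target IDENTIFIED BY "StrongPassword#1"')
      cur.execute("GRANT dwrole TO dipc_target")
      cur.execute("ALTER USER dipc_target QUOTA UNLIMITED ON data")

      # Unlock the predefined GoldenGate user so that replication can connect.
      cur.execute('ALTER USER ggadmin IDENTIFIED BY "StrongPassword#2" ACCOUNT UNLOCK')

  admin.close()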

Considerations for Loading the Data Warehouse for Business Analytics

To help ensure successful service integration and data analysis, keep the following considerations in mind:

  • Information about provisioning and using the services referenced here is not provided in this document. You should have some familiarity with the services and technologies and their documentation.

  • The Oracle Autonomous Data Warehouse DBMS_CLOUD package currently supports only delimited files for upload (see the sketch after this list).
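For illustration, the following Python sketch loads a delimited file from object storage into an existing table by calling DBMS_CLOUD.COPY_DATA. The credential name, object-storage URI, table name, and connection details are placeholders, and the credential is assumed to have been created beforehand with DBMS_CLOUD.CREATE_CREDENTIAL.

  import oracledb

  # Connect as the warehouse user that owns the target table
  # (connection details are placeholders).
  conn = oracledb.connect(
      user="dipc_target",
      password="<password>",
      dsn="myadw_high",
      config_dir="/opt/oracle/wallet",
      wallet_location="/opt/oracle/wallet",
      wallet_password="<wallet password>",
  )

  # The table, credential, and object-storage URI below are placeholders.
  copy_data = """
  BEGIN
    DBMS_CLOUD.COPY_DATA(
      table_name      => 'ORDERS_DAILY',
      credential_name => 'OBJ_STORE_CRED',
      file_uri_list   => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/curated-bucket/o/orders_daily/*.csv',
      format          => json_object('type' value 'csv', 'skipheaders' value '1')
    );
  END;"""

  with conn.cursor() as cur:
      cur.execute(copy_data)

  conn.close()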