1 Big Data Integration with Oracle Data Integrator

This chapter provides an overview of Big Data integration using Oracle Data Integrator. It also provides a compatibility matrix of the supported Big Data technologies.

This chapter includes the following sections: Overview of Hadoop Data Integration and Big Data Knowledge Modules Matrix.

Overview of Hadoop Data Integration

Oracle Data Integrator, combined with Hadoop, can be used to design integration flows that process large volumes of data from non-relational data sources.

Apache Hadoop is designed to handle and process data that typically comes from non-relational data sources, in volumes beyond what relational databases can handle.

You can use Oracle Data Integrator to design the 'what' of an integration flow and assign knowledge modules to define the 'how' of the flow, using an extensible range of mechanisms. The 'how' determines the technology that executes the flow, such as Oracle, Teradata, Hive, Spark, or Pig.
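To illustrate the separation between the 'what' and the 'how', the following Python sketch models a logical flow that stays the same while interchangeable knowledge modules decide how it is executed. The `Mapping` class, the `KM_GENERATORS` registry, and the generated statements are hypothetical simplifications for illustration only; they are not part of the Oracle Data Integrator SDK, and real KMs generate far more elaborate code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Mapping:
    """The 'what': a logical flow, independent of any execution engine (hypothetical model)."""
    source: str
    target: str
    columns: Tuple[str, ...]


def hive_km(m: Mapping) -> str:
    # Hypothetical 'how': express the flow as a HiveQL INSERT ... SELECT statement.
    cols = ", ".join(m.columns)
    return f"INSERT INTO TABLE {m.target} SELECT {cols} FROM {m.source}"


def spark_km(m: Mapping) -> str:
    # Hypothetical 'how': express the same flow as a line of PySpark DataFrame code.
    cols = ", ".join(repr(c) for c in m.columns)
    return f"spark.table({m.source!r}).select({cols}).write.insertInto({m.target!r})"


# Hypothetical registry: technology name -> code generator standing in for a KM.
KM_GENERATORS: Dict[str, Callable[[Mapping], str]] = {
    "Hive": hive_km,
    "Spark": spark_km,
}


def generate(mapping: Mapping, technology: str) -> str:
    """Return the code produced when the mapping is assigned the KM for `technology`."""
    try:
        return KM_GENERATORS[technology](mapping)
    except KeyError:
        raise ValueError(f"no knowledge module registered for {technology}") from None


if __name__ == "__main__":
    flow = Mapping("sales_raw", "sales_clean", ("id", "amount"))
    for tech in ("Hive", "Spark"):
        print(tech, "->", generate(flow, tech))
```

Only the `technology` argument changes between the two calls; the `Mapping` itself is untouched. This mirrors the idea that assigning a different KM changes how a flow runs without redesigning what the flow does.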

Employing familiar and easy-to-use tools and preconfigured knowledge modules (KMs), Oracle Data Integrator lets you do the following:

Big Data Knowledge Modules Matrix

The Big Data Knowledge Modules Matrix lists the Big Data Loading and Integration KMs provided by Oracle Data Integrator.

Depending on the source and target technologies, you can use the KMs shown in the following table in your integration projects. You can also combine these KMs. For example, to read data from SQL into Spark and then write it to HDFS, you can first load the data from SQL into Spark using LKM SQL to Spark, and then use LKM Spark to HDFS to continue.
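One way to picture combining KMs is to treat the matrix as a graph in which technologies are nodes and KMs are edges; a combination of KMs is then a path from the source technology to the target technology. The Python sketch below assumes a small, hand-written subset containing only the KMs named in this chapter. `KM_MATRIX` and `km_chain` are hypothetical helpers for illustration and do not describe how Oracle Data Integrator itself selects KMs.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Illustrative subset of the KM matrix, limited to KMs named in this chapter.
# Each entry maps (source technology, target technology) to a KM name.
KM_MATRIX: Dict[Tuple[str, str], str] = {
    ("SQL", "Spark"): "LKM SQL to Spark",
    ("Spark", "HDFS"): "LKM Spark to HDFS",
    ("HDFS", "Spark"): "LKM HDFS to Spark",
    ("HDFS", "Hive"): "LKM HDFS File to Hive",
    ("File", "SQL"): "LKM File to SQL SQOOP",
}


def km_chain(source: str, target: str) -> Optional[List[str]]:
    """Return the shortest sequence of KMs that moves data from source to target,
    or None if the subset above has no such path (breadth-first search)."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        tech, chain = queue.popleft()
        if tech == target:
            return chain
        # Follow every KM whose source is the current technology.
        for (src, dst), km in KM_MATRIX.items():
            if src == tech and dst not in visited:
                visited.add(dst)
                queue.append((dst, chain + [km]))
    return None


if __name__ == "__main__":
    print(km_chain("SQL", "HDFS"))
```

For SQL to HDFS, the search returns LKM SQL to Spark followed by LKM Spark to HDFS, which is the combination described above. Breadth-first search is used only because it finds the path with the fewest hops; the actual choice of KMs in a project depends on the matrix and your requirements.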

The Big Data KMs whose names start with LKM File (for example, LKM File to SQL SQOOP) support both OS files and HDFS files, as described in this matrix. Additional KMs whose names start with LKM HDFS (for example, LKM HDFS to Spark and LKM HDFS File to Hive) support HDFS files only. Unlike the other KMs, however, they offer additional capabilities: for example, complex data can be described in an HDFS data store and used in a mapping through the Flatten component.
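The naming convention above can be expressed as a small rule. The Python sketch below is a hypothetical helper that infers file-system support from a KM name prefix; it is only a heuristic based on the convention described here, and the matrix remains the authoritative reference for what each KM supports.

```python
from typing import FrozenSet, List


def supported_file_systems(km_name: str) -> FrozenSet[str]:
    """Infer which file systems a loading KM reads from, based on its name prefix
    (hypothetical heuristic for illustration)."""
    if km_name.startswith("LKM File"):
        # KMs such as LKM File to SQL SQOOP handle both OS files and HDFS files.
        return frozenset({"OS", "HDFS"})
    if km_name.startswith("LKM HDFS"):
        # KMs such as LKM HDFS to Spark handle HDFS files only,
        # but add capabilities such as complex data used through the Flatten component.
        return frozenset({"HDFS"})
    # Other KMs (for example, LKM SQL to Spark) do not read files by this rule.
    return frozenset()


def kms_for_location(location: str, candidates: List[str]) -> List[str]:
    """Keep only the candidate KMs that can read files stored at `location`."""
    return [km for km in candidates if location in supported_file_systems(km)]


if __name__ == "__main__":
    kms = ["LKM File to SQL SQOOP", "LKM HDFS to Spark", "LKM HDFS File to Hive"]
    print(kms_for_location("OS", kms))
    print(kms_for_location("HDFS", kms))
```

Files on the local OS leave only the LKM File KMs as candidates, while files already in HDFS can use either family; choosing an LKM HDFS KM in that case gives access to its additional capabilities.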

The following table shows the Big Data Loading and Integration KMs that Oracle Data Integrator provides to integrate data between different source and target technologies.

The following table shows the Big Data Reverse-Engineering KMs provided by Oracle Data Integrator.

Table 1-2 Big Data Reverse-Engineering Knowledge Modules

Technology    Knowledge Module
HBase         RKM HBase
Hive          RKM Hive
Cassandra     RKM Cassandra