1 Big Data Integration with Oracle Data Integrator

This chapter provides an overview of Big Data integration using Oracle Data Integrator. It also provides a compatibility matrix of the supported Big Data technologies.

This chapter includes the following sections:

1.1 Overview of Hadoop Data Integration

Apache Hadoop is designed to handle and process data that typically comes from non-relational sources, in volumes beyond what relational databases can handle.

Oracle Data Integrator lets you design the 'what' of an integration flow and then assign knowledge modules to define the 'how' of the flow, choosing from an extensible range of mechanisms. The 'how' is the technology that executes the flow, for example Oracle, Teradata, Hive, Spark, or Pig.

Using familiar, easy-to-use tools and preconfigured knowledge modules (KMs), Oracle Data Integrator lets you do the following:

1.2 Big Data Knowledge Modules Matrix

Depending on the source and target technologies, you can use the KMs shown in the following table in your integration projects. You can also combine these KMs. For example, to read data from SQL into Spark, you can first load the data into HDFS using LKM SQL to File SQOOP Direct, and then use LKM File to Spark to continue.

The following table shows the Big Data KMs that Oracle Data Integrator provides to integrate data between different source and target technologies.

Table 1-1 Big Data Knowledge Modules

Source        Target        Knowledge Module

OS File       HDFS File     -
OS File       Hive          LKM File to Hive LOAD DATA Direct
OS File       HBase         -
OS File       Pig           LKM File to Pig
OS File       Spark         LKM File to Spark

Generic SQL   HDFS File     LKM SQL to File SQOOP Direct
Generic SQL   Hive          LKM SQL to Hive SQOOP
Generic SQL   HBase         LKM SQL to HBase SQOOP Direct
Generic SQL   Pig           -
Generic SQL   Spark         -

HDFS File     OS File       -
HDFS File     Generic SQL   LKM File to SQL SQOOP
HDFS File     Oracle SQL    LKM File to Oracle OLH-OSCH Direct
HDFS File     HDFS File     -
HDFS File     Hive          LKM File to Hive LOAD DATA Direct
HDFS File     HBase         -
HDFS File     Pig           LKM File to Pig
HDFS File     Spark         LKM File to Spark

Hive          OS File       LKM Hive to File Direct
Hive          Generic SQL   LKM Hive to SQL SQOOP
Hive          Oracle SQL    LKM Hive to Oracle OLH-OSCH Direct
Hive          HDFS File     LKM Hive to File Direct
Hive          Hive          IKM Hive Append
Hive          HBase         LKM Hive to HBase Incremental Update HBASE-SERDE Direct
Hive          Pig           LKM Hive to Pig
Hive          Spark         LKM Hive to Spark

HBase         OS File       -
HBase         Generic SQL   LKM HBase to SQL SQOOP
HBase         Oracle SQL    -
HBase         HDFS File     -
HBase         Hive          LKM HBase to Hive HBASE-SERDE
HBase         HBase         -
HBase         Pig           LKM HBase to Pig
HBase         Spark         -

Pig           OS File       LKM Pig to File
Pig           Generic SQL   LKM SQL to Pig SQOOP
Pig           Oracle SQL    -
Pig           HDFS File     LKM Pig to File
Pig           Hive          LKM Pig to Hive
Pig           HBase         LKM Pig to HBase
Pig           Pig           -
Pig           Spark         -

Spark         OS File       LKM Spark to File
Spark         Generic SQL   -
Spark         Oracle SQL    -
Spark         HDFS File     LKM Spark to File
Spark         Hive          LKM Spark to Hive
Spark         HBase         -
Spark         Pig           -
Spark         Spark         -
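The idea of combining KMs, as in the SQL-to-Spark example earlier in this section, can be sketched as a lookup over Table 1-1. The following Python is only an illustration and is not part of Oracle Data Integrator: the names `KM_MATRIX` and `find_km_chain` are hypothetical, and choosing the chain with the fewest hops is an assumption made for this sketch. In a real project the choice of intermediate technology also depends on factors such as data volume and cluster setup.

```python
from collections import deque

# (source, target) -> knowledge module, transcribed from Table 1-1.
# Pairs that the table marks with "-" are left out.
KM_MATRIX = {
    ("OS File", "Hive"): "LKM File to Hive LOAD DATA Direct",
    ("OS File", "Pig"): "LKM File to Pig",
    ("OS File", "Spark"): "LKM File to Spark",
    ("Generic SQL", "HDFS File"): "LKM SQL to File SQOOP Direct",
    ("Generic SQL", "Hive"): "LKM SQL to Hive SQOOP",
    ("Generic SQL", "HBase"): "LKM SQL to HBase SQOOP Direct",
    ("HDFS File", "Generic SQL"): "LKM File to SQL SQOOP",
    ("HDFS File", "Oracle SQL"): "LKM File to Oracle OLH-OSCH Direct",
    ("HDFS File", "Hive"): "LKM File to Hive LOAD DATA Direct",
    ("HDFS File", "Pig"): "LKM File to Pig",
    ("HDFS File", "Spark"): "LKM File to Spark",
    ("Hive", "OS File"): "LKM Hive to File Direct",
    ("Hive", "Generic SQL"): "LKM Hive to SQL SQOOP",
    ("Hive", "Oracle SQL"): "LKM Hive to Oracle OLH-OSCH Direct",
    ("Hive", "HDFS File"): "LKM Hive to File Direct",
    ("Hive", "Hive"): "IKM Hive Append",
    ("Hive", "HBase"): "LKM Hive to HBase Incremental Update HBASE-SERDE Direct",
    ("Hive", "Pig"): "LKM Hive to Pig",
    ("Hive", "Spark"): "LKM Hive to Spark",
    ("HBase", "Generic SQL"): "LKM HBase to SQL SQOOP",
    ("HBase", "Hive"): "LKM HBase to Hive HBASE-SERDE",
    ("HBase", "Pig"): "LKM HBase to Pig",
    ("Pig", "OS File"): "LKM Pig to File",
    ("Pig", "Generic SQL"): "LKM SQL to Pig SQOOP",
    ("Pig", "HDFS File"): "LKM Pig to File",
    ("Pig", "Hive"): "LKM Pig to Hive",
    ("Pig", "HBase"): "LKM Pig to HBase",
    ("Spark", "OS File"): "LKM Spark to File",
    ("Spark", "HDFS File"): "LKM Spark to File",
    ("Spark", "Hive"): "LKM Spark to Hive",
}


def find_km_chain(source, target, matrix=KM_MATRIX):
    """Return the shortest list of (source, target, km) hops, or None.

    A direct KM is returned as a single hop. Otherwise a breadth-first
    search looks for the chain with the fewest intermediate technologies.
    """
    if (source, target) in matrix:
        return [(source, target, matrix[(source, target)])]

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        tech, hops = queue.popleft()
        for (src, dst), km in matrix.items():
            if src != tech or dst in visited:
                continue
            path = hops + [(src, dst, km)]
            if dst == target:
                return path
            visited.add(dst)
            queue.append((dst, path))
    return None  # no combination of listed KMs connects the two


if __name__ == "__main__":
    # Generic SQL has no direct KM to Spark, so the sketch goes through HDFS.
    for src, dst, km in find_km_chain("Generic SQL", "Spark"):
        print(f"{src} -> {dst}: {km}")
```

For the pair Generic SQL and Spark, the sketch reproduces the two-step route described in the text: a Sqoop-based load into HDFS followed by LKM File to Spark.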