This chapter provides an overview of Big Data integration using Oracle Data Integrator. It also provides a compatibility matrix of the supported Big Data technologies.
Apache Hadoop is designed to handle and process data that typically comes from non-relational data sources, in volumes beyond what relational databases can handle.
You can use Oracle Data Integrator to design the 'what' of an integration flow, and assign knowledge modules to define the 'how' of the flow through an extensible range of mechanisms. The 'how' determines which technology carries out the flow, for example Oracle, Teradata, Hive, Spark, or Pig.
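To illustrate the separation of the 'what' from the 'how', the following Python sketch renders one logical mapping as either HiveQL or Spark Python by swapping the module that generates the code. All names here (`LogicalMapping`, `KnowledgeModule`, `HiveQLModule`, `SparkPythonModule`, and the tables `sales_raw` and `sales_clean`) are hypothetical. This is not Oracle Data Integrator's API or its actual KM mechanism, only a minimal model of the idea under those assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


# Hypothetical illustration only; this is not the Oracle Data Integrator API.
@dataclass(frozen=True)
class LogicalMapping:
    """The 'what': which columns move from a source to a target, with an optional filter."""
    source: str
    target: str
    columns: Tuple[str, ...]
    filter_expr: Optional[str] = None


class KnowledgeModule(Protocol):
    """The 'how': turns a logical mapping into code for one technology."""
    def generate(self, m: LogicalMapping) -> str: ...


class HiveQLModule:
    """Hypothetical module that emits a HiveQL INSERT ... SELECT statement."""
    def generate(self, m: LogicalMapping) -> str:
        cols = ", ".join(m.columns)
        where = f" WHERE {m.filter_expr}" if m.filter_expr else ""
        return f"INSERT INTO TABLE {m.target} SELECT {cols} FROM {m.source}{where}"


class SparkPythonModule:
    """Hypothetical module that emits equivalent PySpark DataFrame code."""
    def generate(self, m: LogicalMapping) -> str:
        cols = ", ".join(repr(c) for c in m.columns)
        code = f"df = spark.table({m.source!r})"
        if m.filter_expr:
            # DataFrame.filter accepts a SQL expression string.
            code += f".filter({m.filter_expr!r})"
        code += f".select({cols})"
        code += f"\ndf.write.insertInto({m.target!r})"
        return code


# The same 'what', rendered by two different 'hows'.
mapping = LogicalMapping("sales_raw", "sales_clean", ("id", "amount"), "amount > 0")
for km in (HiveQLModule(), SparkPythonModule()):
    print(km.generate(mapping))
# INSERT INTO TABLE sales_clean SELECT id, amount FROM sales_raw WHERE amount > 0
# df = spark.table('sales_raw').filter('amount > 0').select('id', 'amount')
# df.write.insertInto('sales_clean')
```

The mapping object never changes. Only the module chosen to generate code does, which is the design point the paragraph above describes.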
Using familiar, easy-to-use tools and preconfigured knowledge modules (KMs), Oracle Data Integrator lets you do the following:
Load data into Hadoop directly from files or SQL databases.
For more information, see Section 4.1, "Integrating Hadoop Data".
Validate and transform data within Hadoop, and make the data available in various forms such as Hive, HBase, or HDFS.
For more information, see Section 4.15, "Validating and Transforming Data Within Hive".
Load the processed data from Hadoop into an Oracle database, a SQL database, or files.
For more information, see Section 4.1, "Integrating Hadoop Data".
Execute integration projects as Oozie workflows on Hadoop.
For more information, see Section 5.1, "Executing Oozie Workflows with Oracle Data Integrator".
Audit Oozie workflow execution logs from within Oracle Data Integrator.
For more information, see Section 5.5, "Auditing Hadoop Logs".
Generate code in different languages for Hadoop, such as HiveQL, Pig Latin, or Spark Python.
For more information, see Section 6.8, "Generating Code in Different Languages".
Depending on the source and target technologies, you can use the KMs shown in the following table in your integration projects. You can also use a combination of these KMs. For example, to read data from SQL into Spark, you can first load the data into HDFS using LKM SQL to File Direct, and then use LKM File to Spark to continue.
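Choosing a combination of KMs amounts to finding a path through the source/target pairs that have a KM. The following Python sketch does this with a breadth-first search over a small, hypothetical KM matrix that contains only the two KMs named in the SQL-to-Spark example. The function `find_km_chain` and the `KM_MATRIX` dictionary are illustrative assumptions, not part of Oracle Data Integrator; in ODI you assign the KMs yourself when you design the flow.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical, partial KM matrix: (source technology, target technology) -> KM name.
# Only the two KMs from the SQL-to-Spark example are included.
KM_MATRIX: Dict[Tuple[str, str], str] = {
    ("Generic SQL", "HDFS File"): "LKM SQL to File Direct",
    ("HDFS File", "Spark"): "LKM File to Spark",
}


def find_km_chain(source: str, target: str,
                  matrix: Dict[Tuple[str, str], str]) -> Optional[List[str]]:
    """Return the shortest list of KMs leading from source to target, or None.

    Breadth-first search: each technology is a node, and each KM is an
    edge from its source technology to its target technology.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        tech, chain = queue.popleft()
        if tech == target:
            return chain
        for (src, dst), km in matrix.items():
            if src == tech and dst not in visited:
                visited.add(dst)
                queue.append((dst, chain + [km]))
    return None  # no combination of the listed KMs connects the two technologies


print(find_km_chain("Generic SQL", "Spark", KM_MATRIX))
# ['LKM SQL to File Direct', 'LKM File to Spark']
```

Breadth-first search returns the chain with the fewest hops, which mirrors the preference for staging data as few times as possible. A real matrix would have to be filled in from Table 1-1.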
The following table shows the Big Data KMs that Oracle Data Integrator provides to integrate data between different source and target technologies.
Table 1-1 Big Data Knowledge Modules
Source | Target | Knowledge Module |
---|---|---|
OS File | HDFS File | - |
OS File | Hive | |
OS File | HBase | - |
OS File | Pig | |
OS File | Spark | |
Generic SQL | HDFS File | |
Generic SQL | Hive | |
Generic SQL | HBase | |
Generic SQL | Pig | - |
Generic SQL | Spark | - |
HDFS File | OS File | - |
HDFS File | Generic SQL | |
HDFS File | Oracle SQL | |
HDFS File | HDFS File | - |
HDFS File | Hive | |
HDFS File | HBase | - |
HDFS File | Pig | |
HDFS File | Spark | |
Hive | OS File | |
Hive | Generic SQL | |
Hive | Oracle SQL | |
Hive | HDFS File | |
Hive | Hive | |
Hive | HBase | |
Hive | Pig | |
Hive | Spark | |
HBase | OS File | - |
HBase | Generic SQL | |
HBase | Oracle SQL | - |
HBase | HDFS File | - |
HBase | Hive | |
HBase | HBase | - |
HBase | Pig | |
HBase | Spark | - |
Pig | OS File | |
Pig | Generic SQL | |
Pig | Oracle SQL | - |
Pig | HDFS File | |
Pig | Hive | |
Pig | HBase | |
Pig | Pig | - |
Pig | Spark | - |
Spark | OS File | |
Spark | Generic SQL | - |
Spark | Oracle SQL | - |
Spark | HDFS File | |
Spark | Hive | |
Spark | HBase | - |
Spark | Pig | - |
Spark | Spark | - |