This chapter includes the following sections:
The Big Data Configurations wizard provides a single entry point to set up multiple Hadoop technologies. You can quickly create data servers, physical schemas, and logical schemas, and set a context for Hadoop technologies such as Hadoop, HBase, Oozie, Spark, Hive, and Pig.
The default metadata for different distributions, such as properties, host names, and port numbers, and the default values for environment variables are pre-populated for you. This helps you easily create the data servers, along with the physical and logical schemas, without in-depth knowledge of these technologies.
After all the technologies are configured, you can validate the settings against the data servers to test the connection status.
Note:
If you do not want to use the Big Data Configurations wizard, you can set up the data servers for the Big Data technologies manually using the information mentioned in the subsequent sections.
To run the Big Data Configurations Wizard:
The following table describes the options that you need to set on the General Settings panel of the Big Data Configurations wizard.
Table 3-1 General Settings Options
Option | Description |
---|---|
Prefix |
Specify a prefix. This prefix is attached to the data server name, logical schema name, and physical schema name. |
Distribution |
Select a distribution, either Manual or CDH <version>. |
Base Directory |
Specify the base directory. This base directory is automatically populated in all other panels of the wizard. Note: This option appears only when the selected distribution is not Manual. |
Distribution Type |
Select a distribution type, either Normal or Kerberized. |
Technologies |
Select the technologies that you want to configure. Note: Data server creation panels are displayed only for the selected technologies. |
The following table describes the options that you must specify to create an HDFS data server.
Note:
Only the fields required or specific for defining an HDFS data server are described.
Table 3-2 HDFS Data Server Definition
Option | Description |
---|---|
Name |
Type a name for the data server. This name appears in Oracle Data Integrator. |
User/Password |
User name with its password. |
Hadoop Data Server |
Hadoop data server that you want to associate with the HDFS data server. |
Additional Classpath |
Specify additional classpaths. |
The following table describes the options that you must specify to create an HBase data server.
Note: Only the fields required or specific for defining an HBase data server are described.
Table 3-3 HBase Data Server Definition
Option | Description |
---|---|
Name |
Type a name for the data server. This name appears in Oracle Data Integrator. |
HBase Quorum |
Quorum of the HBase installation. For example, |
User/Password |
User name with its password. |
Hadoop Data Server |
Hadoop data server that you want to associate with the HBase data server. |
Additional Classpath |
By default, the following classpaths are added:
Specify the additional classpaths, if required. |
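The HBase quorum is the ZooKeeper quorum used by the HBase installation. A hypothetical value, with placeholder host names and the common ZooKeeper client port, might look like this:

```
zkhost1.example.com:2181,zkhost2.example.com:2181,zkhost3.example.com:2181
```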
The following table describes the options that you must specify to create a Kafka data server.
Note:
Only the fields required or specific for defining a Kafka data server are described.
Table 3-4 Kafka Data Server Definition
Option | Description |
---|---|
Name |
Type a name for the data server. This name appears in Oracle Data Integrator. |
User/Password |
User name with its password. |
Hadoop Data Server |
Hadoop data server that you want to associate with the Kafka data server. |
Additional Classpath |
The following additional classpaths are added by default:
If required, you can add more classpaths. Note: This field appears only when you are creating the Kafka data server using the Big Data Configurations wizard. |
The following table describes the Kafka data server properties that you need to add on the Properties tab when creating a new Kafka data server.
Table 3-5 Kafka Data Server Properties
Key | Value |
---|---|
metadata.broker.list |
Valid values are PLAINTEXT and SASL_PLAINTEXT. SASL_PLAINTEXT is used for a Kerberized Kafka server. The default value is PLAINTEXT. |
oracle.odi.prefer.dataserver.packages |
Retrieves the topic and message from the Kafka server. The address is oracle.odi. |
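As an illustration, the entries on the Properties tab might look like the following name=value pairs; the broker host names and ports shown here are placeholders, not defaults:

```
metadata.broker.list=broker1.example.com:9092,broker2.example.com:9092
oracle.odi.prefer.dataserver.packages=oracle.odi
```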
To create and initialize the Hadoop data server:
The following table describes the fields that you must specify on the Definition tab when creating a new Hadoop data server.
Note: Only the fields required or specific for defining a Hadoop data server are described.
Table 3-6 Hadoop Data Server Definition
Field | Description |
---|---|
Name |
Name of the data server that appears in Oracle Data Integrator. |
Data Server |
Physical name of the data server. |
User/Password |
Hadoop user with its password. If a password is not provided, only simple authentication is performed using the user name on HDFS and Oozie. |
Authentication Method |
Select one of the following authentication methods:
|
HDFS Node Name URI |
URI of the HDFS node name.
|
Resource Manager/Job Tracker URI |
URI of the resource manager or the job tracker.
|
ODI HDFS Root |
Path of the ODI HDFS root directory.
|
Additional Class Path |
Specify additional classpaths. Add the following additional classpaths:
|
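As a sketch, the connection fields above might be filled in as follows for a single-node test cluster; the host names, ports, and path are placeholder values, so check your cluster configuration for the real ones:

```
HDFS Node Name URI:               hdfs://localhost:8020
Resource Manager/Job Tracker URI: localhost:8032
ODI HDFS Root:                    /user/odi
```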
The following table describes the properties that you can configure in the Properties tab when defining a new Hadoop data server.
Note: These properties can be inherited by other Hadoop technologies, such as Hive or HDFS. To inherit these properties, you must select the configured Hadoop data server when creating data servers for other Hadoop technologies.
Table 3-7 Hadoop Data Server Properties Mandatory for Hadoop and Hive
Property | Description/Value |
---|---|
HADOOP_HOME |
Location of the Hadoop installation directory. For example, |
HADOOP_CONF |
Location of Hadoop configuration files such as core-default.xml, core-site.xml, and hdfs-site.xml. For example, |
HIVE_HOME |
Location of the Hive installation directory. For example, |
HIVE_CONF |
Location of Hive configuration files such as hive-site.xml. For example, |
HADOOP_CLASSPATH |
|
HADOOP_CLIENT_OPTS |
|
ODI_ADDITIONAL_CLASSPATH |
|
HIVE_SESSION_JARS |
|
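As a minimal sketch for a typical Linux package layout (all paths here are assumptions; substitute the locations used by your distribution), the mandatory properties might be set as follows:

```shell
# Hypothetical installation and configuration paths; adjust for your cluster.
HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF=/etc/hadoop/conf
HIVE_HOME=/usr/lib/hive
HIVE_CONF=/etc/hive/conf
# The Hive client jars and the Hadoop configuration directory go on the classpath.
HADOOP_CLASSPATH="$HIVE_HOME/lib/*:$HADOOP_CONF"
echo "$HADOOP_CLASSPATH"
```

In the data server definition these are entered as key/value properties on the Properties tab rather than exported shell variables; the shell form above is only for illustration.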
Table 3-8 Hadoop Data Server Properties Mandatory for HBase (In addition to base Hadoop and Hive Properties)
Property | Description/Value |
---|---|
HBASE_HOME |
Location of the HBase installation directory. For example, |
HADOOP_CLASSPATH |
|
ODI_ADDITIONAL_CLASSPATH |
|
HIVE_SESSION_JARS |
$HBASE_HOME/hbase.jar:$HBASE_HOME/lib/hbase-sep-api-*.jar:$HBASE_HOME/lib/hbase-sep-impl-*hbase*.jar:$HBASE_HOME/lib/hbase-sep-impl-common-*.jar:$HBASE_HOME/lib/hbase-sep-tools-*.jar:$HIVE_HOME/lib/hive-hbase-handler-*.jar
Note: Follow the steps for Hadoop Security models, such as Apache Sentry, to allow the Hive ADD JAR call used inside ODI Hive KMs:
|
Table 3-9 Hadoop Data Server Properties Mandatory for Oracle Loader for Hadoop (In addition to base Hadoop and Hive properties)
Property | Description/Value |
---|---|
OLH_HOME |
Location of the OLH installation. For example, |
OLH_FILES |
|
ODCH_HOME |
Location of the OSCH (Oracle SQL Connector for HDFS) installation. For example, |
HADOOP_CLASSPATH |
In order to work with OLH, the Hadoop jars in the |
OLH_JARS |
Comma-separated list of all JAR files required for custom input formats, Hive, Hive SerDes, and so forth, used by Oracle Loader for Hadoop. All filenames have to be expanded without wildcards. For example:
|
OLH_SHAREDLIBS |
|
ODI_ADDITIONAL_CLASSPATH |
|
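For illustration only, the two installation locations might be recorded as follows; both paths are hypothetical, not documented defaults:

```
OLH_HOME=/opt/oracle/oraloader
ODCH_HOME=/opt/oracle/osch
```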
Table 3-10 Hadoop Data Server Properties Mandatory for SQOOP (In addition to base Hadoop and Hive properties)
Property | Description/Value |
---|---|
SQOOP_HOME |
Location of the Sqoop installation directory. For example, |
SQOOP_LIBJARS |
Location of the SQOOP library jars. For example, |
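A hypothetical example for a package-style install (both paths are placeholders; point them at your actual Sqoop installation):

```
SQOOP_HOME=/usr/lib/sqoop
SQOOP_LIBJARS=/usr/lib/sqoop/lib
```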
Create a Hadoop physical schema using the standard procedure, as described in Creating a Physical Schema in Administering Oracle Data Integrator.
Create a logical schema for this physical schema using the standard procedure, as described in Creating a Logical Schema in Administering Oracle Data Integrator, and associate it with a given context.
You must configure the Oracle Data Integrator agent to execute Hadoop jobs.
To configure the Oracle Data Integrator agent:
If you want to use Oracle Loader for Hadoop, you must install and configure Oracle Loader for Hadoop on your Oracle Data Integrator agent computer.
To install and configure Oracle Loader for Hadoop:
To run the Oracle Data Integrator agent on a Hadoop cluster that is protected by Kerberos authentication, you must configure a Kerberos-secured cluster.
To use a Kerberos-secured cluster:
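Whatever the exact steps for your distribution, the operating system user running the agent generally needs a valid Kerberos ticket before jobs are submitted. A minimal sketch, with a hypothetical principal and keytab path:

```shell
# Hypothetical principal and keytab; replace with your cluster's values.
PRINCIPAL="odi@EXAMPLE.COM"
KEYTAB="/etc/security/keytabs/odi.keytab"
# Obtain a ticket non-interactively if the Kerberos client is available.
if command -v kinit >/dev/null 2>&1 && [ -f "$KEYTAB" ]; then
  kinit -k -t "$KEYTAB" "$PRINCIPAL"
  klist   # verify the ticket was granted
else
  echo "kinit unavailable or keytab missing; configure the Kerberos client first"
fi
```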
For executing Hadoop jobs on the local agent of an Oracle Data Integrator Studio installation, follow the configuration steps in Configuring the Oracle Data Integrator Agent to Execute Hadoop Jobs, with the following change: copy the JAR files into the Oracle Data Integrator userlib directory.
For example:
Linux: $USER_HOME/.odi/oracledi/userlib directory
Windows: C:\Users\<USERNAME>\AppData\Roaming\odi\oracledi\userlib directory
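On Linux, the copy step above can be sketched as follows; the source directory for the Hadoop client JARs is an assumption and varies by distribution:

```shell
# Hypothetical source of the Hadoop client JARs; adjust for your distribution.
HADOOP_CLIENT_DIR="${HADOOP_CLIENT_DIR:-/usr/lib/hadoop/client}"
USERLIB_DIR="$HOME/.odi/oracledi/userlib"
mkdir -p "$USERLIB_DIR"
# Copy only if the source directory exists on this machine.
if [ -d "$HADOOP_CLIENT_DIR" ]; then
  cp "$HADOOP_CLIENT_DIR"/*.jar "$USERLIB_DIR"/
fi
echo "userlib: $USERLIB_DIR"
```

ODI Studio typically needs a restart after the copy so that the local agent picks up the new userlib JARs.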