A Hive Knowledge Modules

This appendix provides information about the Hive knowledge modules.

This appendix includes the following sections:

A.1 LKM SQL to Hive SQOOP

This KM integrates data from a JDBC data source into Hive.

It executes the following steps:

  1. Create a Hive staging table.

  2. Create a SQOOP configuration file, which contains the upstream query.

  3. Execute SQOOP to extract the source data and import it into Hive.

  4. Drop the Hive staging table.

This is a direct load LKM, and any IKM assigned to the target will be ignored.
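
As context for step 2, the following is a minimal sketch of the kind of free-form query a SQOOP import typically expects; the table and column names are illustrative, and the exact statement ODI generates from the mapping is not shown here. SQOOP requires the literal $CONDITIONS token so it can append its split predicates:

SELECT EMPNO, ENAME, SAL
FROM EMPLOYEE
WHERE $CONDITIONS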

The following table describes the options for LKM SQL to Hive SQOOP.


Table A-1 LKM SQL to Hive SQOOP

Option Description

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true.

SQOOP_PARALLELISM

Number of SQOOP parallel mappers

Specifies the degree of parallelism, that is, the number of mapper processes used for extraction.

When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined.

SPLIT_BY

Target column name for splitting the source data.

Specifies the unqualified target column name to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM.

To achieve equally sized data chunks the split column should contain homogeneously distributed values.

For calculating the data chunk boundaries a query similar to SELECT MIN(EMPNO), MAX(EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index.
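
As an illustration, with SQOOP_PARALLELISM = 4 and boundaries MIN(EMPNO) = 1000 and MAX(EMPNO) = 2000, the range is divided into four roughly equal chunks, each read by one mapper with a predicate similar to the following (illustrative only, not the exact predicates SQOOP generates):

WHERE EMPNO >= 1000 AND EMPNO < 1250
WHERE EMPNO >= 1250 AND EMPNO < 1500
WHERE EMPNO >= 1500 AND EMPNO < 1750
WHERE EMPNO >= 1750 AND EMPNO <= 2000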

BOUNDARY_QUERY

Query to retrieve min/max value for calculating data chunks using SPLIT_BY column.

For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column (KM option SPLIT_BY) are retrieved. In certain situations these may not be the best boundaries, or retrieving them this way may not be the most performant approach. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range is divided into SQOOP_PARALLELISM chunks for parallel extraction.

Example for hard-coded ranges for an Oracle source:

SELECT 1000, 2000 FROM DUAL

To preserve context independence, regular table names should be inserted through odiRef.getObjectName calls.

For example:

SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%>

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

USE_GENERIC_JDBC_CONNECTOR

Use SQOOP's generic JDBC connector?

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

Extra optional properties added to the Hadoop properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

Extra optional properties added to the SQOOP properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.

Extra optional properties added to the SQOOP connector properties section of the SQOOP configuration file.


A.2 LKM SQL to File SQOOP Direct

This KM extracts data from a JDBC data source into an HDFS file.

It executes the following steps:

  1. Create a SQOOP configuration file, which contains the upstream query.

  2. Execute SQOOP to extract the source data and store it as an HDFS file

This is a direct load LKM and must be used without any IKM.

Note:

The entire target directory will be removed prior to extraction.

The following table describes the options for LKM SQL to File SQOOP Direct.


Table A-2 LKM SQL to File SQOOP Direct

Option Description

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true.

SQOOP_PARALLELISM

Number of SQOOP parallel mappers

Specifies the degree of parallelism, that is, the number of mapper processes used for extraction.

When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined.

SPLIT_BY

Target column name for splitting the source data.

Specifies the unqualified target column name to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM.

To achieve equally sized data chunks the split column should contain homogeneously distributed values.

For calculating the data chunk boundaries a query similar to SELECT MIN(EMPNO), MAX(EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index.

BOUNDARY_QUERY

Query to retrieve min/max value for calculating data chunks using SPLIT_BY column.

For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column (KM option SPLIT_BY) are retrieved. In certain situations these may not be the best boundaries, or retrieving them this way may not be the most performant approach. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range is divided into SQOOP_PARALLELISM chunks for parallel extraction.

Example for hard-coded ranges for an Oracle source:

SELECT 1000, 2000 FROM DUAL

To preserve context independence, regular table names should be inserted through odiRef.getObjectName calls.

For example:

SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%>

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

USE_GENERIC_JDBC_CONNECTOR

Use SQOOP's generic JDBC connector?

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

Extra optional properties added to the Hadoop properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

Extra optional properties added to the SQOOP properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.

Extra optional properties added to the SQOOP connector properties section of the SQOOP configuration file.


A.3 LKM SQL to HBase SQOOP Direct

This KM extracts data from a JDBC data source and imports the data into HBase.

It executes the following steps:

  1. Create a SQOOP configuration file, which contains the upstream query.

  2. Execute SQOOP to extract the source data and import it into HBase.

This is a direct load LKM and must be used without any IKM.

The following table describes the options for LKM SQL to HBase SQOOP Direct.


Table A-3 LKM SQL to HBase SQOOP Direct

Option Description

CREATE_TARG_TABLE

Create target table?

Check this option, if you wish to create the target table.

TRUNCATE

Replace existing target data?

Set this option to true, if you wish to replace any existing target table content with the new data.

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true.

SQOOP_PARALLELISM

Number of SQOOP parallel mappers

Specifies the degree of parallelism, that is, the number of mapper processes used for extraction.

When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined.

SPLIT_BY

Target column name for splitting the source data.

Specifies the unqualified target column name to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM.

To achieve equally sized data chunks the split column should contain homogeneously distributed values.

For calculating the data chunk boundaries a query similar to SELECT MIN(EMPNO), MAX(EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index.

BOUNDARY_QUERY

Query to retrieve min/max value for calculating data chunks using SPLIT_BY column.

For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column (KM option SPLIT_BY) are retrieved. In certain situations these may not be the best boundaries, or retrieving them this way may not be the most performant approach. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range is divided into SQOOP_PARALLELISM chunks for parallel extraction.

Example for hard-coded ranges for an Oracle source:

SELECT 1000, 2000 FROM DUAL

To preserve context independence, regular table names should be inserted through odiRef.getObjectName calls.

For example:

SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%>

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

USE_GENERIC_JDBC_CONNECTOR

Use SQOOP's generic JDBC connector?

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

Extra optional properties added to the Hadoop properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

Extra optional properties added to the SQOOP properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.

Extra optional properties added to the SQOOP connector properties section of the SQOOP configuration file.


A.4 LKM File to SQL SQOOP

This KM integrates data from HDFS files into a JDBC target.

It executes the following steps:

  1. Create a SQOOP configuration file

  2. Load data using SQOOP into a work table on RDBMS

  3. Drop the work table.

The following table describes the options for LKM File to SQL SQOOP.


Table A-4 LKM File to SQL SQOOP

Option Description

SQOOP_PARALLELISM

Number of SQOOP parallel mappers.

Specifies the degree of parallelism, that is, the number of mapper processes used for extraction.

When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined.

WORK_TABLE_OPTIONS

Work table options.

Use this option if you wish to override the standard technology-specific work table options. When left blank, the following default values are used:

Oracle: NOLOGGING

DB2 UDB: NOT LOGGED INITIALLY

Teradata: no fallback, no before journal, no after journal

TERADATA_WORK_TABLE_TYPE

Teradata work table type.

Use SET or MULTISET table for work table.

TERADATA_OUTPUT_METHOD

Teradata Load Method.

Specifies the way the Teradata Connector will load the data. Valid values are:

  • batch.insert: multiple JDBC connections using batched prepared statements (simplest to start with)

  • multiple.fastload: multiple FastLoad connections

  • internal.fastload: single coordinated FastLoad connections (most performant)

Please see Cloudera's Teradata Connectors User Guide for more details.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

USE_GENERIC_JDBC_CONNECTOR

Use SQOOP's generic JDBC connector?

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

Extra optional properties added to the Hadoop properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

Extra optional properties added to the SQOOP properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.

Extra optional properties added to the SQOOP connector properties section of the SQOOP configuration file.


A.5 LKM Hive to SQL SQOOP

This KM integrates data from Hive into a JDBC target.

It executes the following steps:

  1. Unload data into HDFS

  2. Create a SQOOP configuration file

  3. Load data using SQOOP into a work table on RDBMS

  4. Drop the work table

The following table describes the options for LKM Hive to SQL SQOOP.


Table A-5 LKM Hive to SQL SQOOP

Option Description

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

SQOOP_PARALLELISM

Number of SQOOP parallel mappers.

Specifies the degree of parallelism, that is, the number of mapper processes used for extraction.

When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined.

WORK_TABLE_OPTIONS

Work table options.

Use this option if you wish to override the standard technology-specific work table options. When left blank, the following default values are used:

Oracle: NOLOGGING

DB2 UDB: NOT LOGGED INITIALLY

Teradata: no fallback, no before journal, no after journal

TERADATA_WORK_TABLE_TYPE

Teradata work table type.

Use SET or MULTISET table for work table.

TERADATA_OUTPUT_METHOD

Teradata Load Method.

Specifies the way the Teradata Connector will load the data. Valid values are:

  • batch.insert: multiple JDBC connections using batched prepared statements (simplest to start with)

  • multiple.fastload: multiple FastLoad connections

  • internal.fastload: single coordinated FastLoad connections (most performant)

Please see Cloudera's Teradata Connectors User Guide for more details.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

USE_GENERIC_JDBC_CONNECTOR

Use SQOOP's generic JDBC connector?

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

Extra optional properties added to the Hadoop properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

Extra optional properties added to the SQOOP properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.

Extra optional properties added to the SQOOP connector properties section of the SQOOP configuration file.


A.6 LKM HBase to SQL SQOOP

This KM integrates data from HBase into a JDBC target.

It executes the following steps:

  1. Create a SQOOP configuration file

  2. Create a Hive table definition for the HBase table

  3. Unload data from Hive (HBase) using SQOOP into a work table on RDBMS

  4. Drop the work table.

The following table describes the options for LKM HBase to SQL SQOOP.


Table A-6 LKM HBase to SQL SQOOP

Option Description

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging. Default: true.

HIVE_STAGING_LSCHEMA

Logical schema name for Hive-HBase-SerDe table.

The unloading of HBase data is done via Hive. This KM option defines the Hive database that will be used for creating the Hive HBase-SerDe table for unloading the HBase data.

SQOOP_PARALLELISM

Number of SQOOP parallel mappers.

Specifies the degree of parallelism, that is, the number of mapper processes used for extraction.

When SQOOP_PARALLELISM > 1, SPLIT_BY must be defined.

WORK_TABLE_OPTIONS

Work table options.

Use this option if you wish to override the standard technology-specific work table options. When left blank, the following default values are used:

Oracle: NOLOGGING

DB2 UDB: NOT LOGGED INITIALLY

Teradata: no fallback, no before journal, no after journal

TERADATA_WORK_TABLE_TYPE

Teradata work table type.

Use SET or MULTISET table for work table.

TERADATA_OUTPUT_METHOD

Teradata Load Method.

Specifies the way the Teradata Connector will load the data. Valid values are:

  • batch.insert: multiple JDBC connections using batched prepared statements (simplest to start with)

  • multiple.fastload: multiple FastLoad connections

  • internal.fastload: single coordinated FastLoad connections (most performant)

Please see Cloudera's Teradata Connectors User Guide for more details.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

USE_GENERIC_JDBC_CONNECTOR

Use SQOOP's generic JDBC connector?

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector may provide a solution.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

Extra optional properties added to the Hadoop properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

Extra optional properties added to the SQOOP properties section of the SQOOP configuration file.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.

Extra optional properties added to the SQOOP connector properties section of the SQOOP configuration file.


A.7 LKM HDFS File to Hive

This KM loads data only from HDFS files into Hive. The files can be in JSON, Avro, Parquet, or delimited format, including files with complex data.


Table A-7 LKM HDFS File to Hive

Option Description

STOP_ON_FILE_NOT_FOUND

This checkbox option defines whether the KM should stop, if no input file is found.

OVERRIDE_ROW_FORMAT

This option allows you to override the entire Hive row format definition of the staging table or the target table.
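
For illustration, a simple override for a tab-delimited file might look like the following (an assumed example, not taken from the KM):

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'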

DELETE_TEMPORARY_OBJECTS

Set this option to No, if you want to retain the temporary objects (tables, files and scripts) after integration.


A.8 LKM HDFS File to Hive (Direct)

This KM loads data only from HDFS files, directly into the Hive target table, bypassing the staging table for better performance.


Table A-8 LKM HDFS File to Hive (Direct)

Option Description

STOP_ON_FILE_NOT_FOUND

This checkbox option defines whether the KM should stop, if no input file is found.

OVERRIDE_ROW_FORMAT

This option allows you to override the entire Hive row format definition of the staging table or the target table.

DELETE_TEMPORARY_OBJECTS

Set this option to No, if you want to retain the temporary objects (tables, files and scripts) after integration.

CREATE_TARG_TABLE

Create target table?

Check this option, if you wish to create the target table.

TRUNCATE

Replace existing target data?

Set this option to true, if you wish to replace any existing target table content with the new data.


A.9 IKM Hive Append

This KM integrates data into a Hive target table in append or replace (truncate) mode.
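
Conceptually, the two modes correspond to Hive's two insert forms, sketched below with hypothetical table names; this illustrates the semantics only and is not the code the KM generates:

-- Append mode: new rows are added to the existing table content
INSERT INTO TABLE target_table SELECT * FROM source_data;

-- Replace (truncate) mode: existing content is overwritten
INSERT OVERWRITE TABLE target_table SELECT * FROM source_data;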

The following table describes the options for IKM Hive Append.


Table A-9 IKM Hive Append

Option Description

CREATE_TARG_TABLE

Create target table.

Check this option if you wish to create the target table.

TRUNCATE

Replace all target table data.

Set this option to true, if you wish to replace the target table content with the new data.


Note:

If the target Hive table has a column containing a complex type, it must not be left unmapped. Hive does not allow setting null values for complex columns.

A.10 IKM Hive Incremental Update

This IKM integrates data incrementally into a Hive target table. The KM should be assigned to the Hive target node.

The target data store's integration type must be set to Incremental Update for this KM to appear in the list of available KMs for assignment.
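
As a conceptual illustration only (not the statements this KM generates), one common HiveQL pattern for an incremental update reconciles existing rows with new or changed rows through a full outer join on the key; all table and column names below are hypothetical:

-- Keep the incoming value when present, otherwise the existing one
INSERT OVERWRITE TABLE emp_reconciled
SELECT
  COALESCE(s.empno, t.empno) AS empno,
  COALESCE(s.ename, t.ename) AS ename,
  COALESCE(s.sal,   t.sal)   AS sal
FROM emp_target t
FULL OUTER JOIN emp_updates s ON t.empno = s.empno;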


Table A-10 IKM Hive Incremental Update

Option Description

CREATE_TARG_TABLE

Create target table.

Check this option if you wish to create the target table.

TRUNCATE

Replace all target table data.

Set this option to true, if you wish to replace the target table content with the new data.


A.11 LKM File to Hive LOAD DATA

Integration from a flat file staging area to Hive using Hive's LOAD DATA command.

This KM executes the following steps:

  1. Create a flow table in Hive

  2. Declare data files to Hive (LOAD DATA command)

  3. Load data from Hive staging table into target table

The KM can handle filename wildcards (*, ?).
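
As an illustration of steps 2 and 3, the Hive statements involved typically resemble the following; the file path and table names are hypothetical, and the actual statements are generated by the KM:

-- Step 2: declare the data file to the Hive staging table
LOAD DATA INPATH '/user/odi/incoming/webshop_01.log' INTO TABLE c$_webshop_stage;
-- (use LOAD DATA LOCAL INPATH ... when the file is outside the cluster)

-- Step 3: load from the staging table into the target table
INSERT INTO TABLE webshop_target SELECT * FROM c$_webshop_stage;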

The following table describes the options for LKM File to Hive LOAD DATA.


Table A-11 LKM File to Hive LOAD DATA

Option Description

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

EXTERNAL_TABLE

Preserve file in original location?

Defines whether to declare the target/staging table as externally managed.

Default: false

For non-external tables Hive manages all data files. That is, it will *move* any data files into <hive.metastore.warehouse.dir>/<table_name>. For external tables Hive does not move or delete any files. It will load data from the location given by the ODI schema.

If EXTERNAL_TABLE is set to true:

All files in the directory given by the physical data schema will be loaded. So any filename or wildcard information from the source DataStore's resource name will be ignored.

The directory structure and file names must comply with Hive's directory organization for tables, e.g. for partitioning and clustering.

The directory and its files must reside in HDFS.

No Hive LOAD-DATA-statements are submitted and thus loading of files to a specific partition (using a target-side expression) is not possible.

FILE_IS_LOCAL

Is this a local file?

Defines whether the source file is to be considered local (= outside of the current Hadoop cluster).

Default: true

If FILE_IS_LOCAL is set to true, the data file(s) are copied into the Hadoop cluster first.

If FILE_IS_LOCAL is set to false, the data file(s) are moved into the Hadoop cluster and therefore will no longer be available at their source location. If the source file is already in HDFS, FILE_IS_LOCAL=false results in just a file rename, and is therefore a very fast operation. This option only applies if EXTERNAL_TABLE is set to false.

STOP_ON_FILE_NOT_FOUND

Stop if no input file was found?

This checkbox option defines whether the KM should stop, if no input file has been found.

OVERRIDE_ROW_FORMAT

Custom row format clause.

This option allows you to override the entire Hive row format definition of the staging table (in case USE_STAGE_TABLE is set to true) or the target table (in case USE_STAGE_TABLE is set to false). It contains the text to be used for the row format definition.

Example for reading Apache Combined WebLog files:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' <EOL>WITH SERDEPROPERTIES ( <EOL> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")" )


A.12 LKM File to Hive LOAD DATA Direct

Direct integration from a flat file into Hive without any staging using Hive's LOAD DATA command.

This is a direct load LKM and must be used without any IKM.

The KM can handle filename wildcards (*, ?).

The following table describes the options for LKM File to Hive LOAD DATA Direct.


Table A-12 LKM File to Hive LOAD DATA Direct

Option Description

CREATE_TARG_TABLE

Create target table.

Check this option if you wish to create the target table.

TRUNCATE

Replace all target table data.

Set this option to true, if you wish to replace the target table content with the new data.

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

EXTERNAL_TABLE

Preserve file in original location?

Defines whether to declare the target/staging table as externally managed.

Default: false

For non-external tables Hive manages all data files. That is, it will *move* any data files into <hive.metastore.warehouse.dir>/<table_name>. For external tables Hive does not move or delete any files. It will load data from the location given by the ODI schema.

If EXTERNAL_TABLE is set to true:

All files in the directory given by the physical data schema will be loaded. So any filename or wildcard information from the source DataStore's resource name will be ignored.

The directory structure and file names must comply with Hive's directory organization for tables, e.g. for partitioning and clustering.

The directory and its files must reside in HDFS.

No Hive LOAD-DATA-statements are submitted and thus loading of files to a specific partition (using a target-side expression) is not possible.

FILE_IS_LOCAL

Is this a local file?

Defines whether the source file is to be considered local (= outside of the current Hadoop cluster).

Default: true

If FILE_IS_LOCAL is set to true, the data file(s) are copied into the Hadoop cluster first.

If FILE_IS_LOCAL is set to false, the data file(s) are moved into the Hadoop cluster and therefore will no longer be available at their source location. If the source file is already in HDFS, FILE_IS_LOCAL=false results in just a file rename, and is therefore a very fast operation. This option only applies if EXTERNAL_TABLE is set to false.

STOP_ON_FILE_NOT_FOUND

Stop if no input file was found?

This checkbox option defines whether the KM should stop, if no input file has been found.

OVERRIDE_ROW_FORMAT

Custom row format clause.

This option allows you to override the entire Hive row format definition of the staging table (in case USE_STAGE_TABLE is set to true) or the target table (in case USE_STAGE_TABLE is set to false). It contains the text to be used for the row format definition.

Example for reading Apache Combined WebLog files:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' <EOL>WITH SERDEPROPERTIES ( <EOL> "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")" )


A.13 LKM HBase to Hive HBASE-SERDE

This LKM provides read access to an HBase table from Hive.

This is achieved by defining a temporary Hive load table definition that represents all relevant columns of the HBase source table.
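
A minimal sketch of such a Hive-on-HBase table definition is shown below; the table name, columns, and column-family mapping are illustrative, and the KM generates its own temporary definition:

CREATE EXTERNAL TABLE hbase_emp_hive (key STRING, ename STRING, sal STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:ename,cf1:sal")
TBLPROPERTIES ("hbase.table.name" = "EMP");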

A.14 LKM Hive to HBase Incremental Update HBASE-SERDE Direct

This LKM loads data from Hive into HBase and supports inserting new rows as well as updating existing data.

This is a direct load LKM and must be used without any IKM.

The following table describes the options for LKM Hive to HBase Incremental Update HBASE-SERDE Direct.


Table A-13 LKM Hive to HBase Incremental Update HBASE-SERDE Direct

Option Description

CREATE_TARG_TABLE

Create target table.

Check this option if you wish to create the target table.

TRUNCATE

Replace all target table data.

Set this option to true, if you wish to replace the target table content with the new data.

HBASE_WAL

Disable Write-Ahead-Log.

HBase uses a Write-Ahead-Log to protect against data loss. For better performance, WAL can be disabled. Please note that this setting applies to all Hive commands executed later in this session.

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.


A.15 LKM Hive to File Direct

This LKM unloads data from Hive into flat files.

This is a direct load LKM and must be used without any IKM.
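
Conceptually, the unload resembles a Hive directory export such as the following sketch; the path and query are illustrative, and the statement actually issued is generated by the KM and honors the FILE_IS_LOCAL and STORED_AS options described below:

-- Unload query results into an HDFS directory (add LOCAL after OVERWRITE for a local target directory)
INSERT OVERWRITE DIRECTORY '/user/odi/export/emp'
SELECT empno, ename, sal FROM emp;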

The following table describes the options for LKM Hive to File Direct.


Table A-14 LKM Hive to File Direct

Option Description

FILE_IS_LOCAL

Is this a local file?

Defines whether the target file is to be considered local (outside of the current Hadoop cluster).

STORED_AS

File format.

Defines whether the target file is to be stored as plain text file (TEXTFILE) or compressed (SEQUENCEFILE).


A.16 XKM Hive Sort

This XKM sorts data using an expression.

The following table describes the options for XKM Hive Sort.


Table A-15 XKM Hive Sort

Option Description

SORT_MODE

Select the mode the SORT operator will generate code for.


A.17 LKM File to Oracle OLH-OSCH

This KM integrates data from an HDFS file into an Oracle staging table using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).

The KM can handle filename wildcards (*, ?).

The following table describes the options for LKM File to Oracle OLH-OSCH.


Table A-16 LKM File to Oracle OLH-OSCH

Option Description

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

OLH_OUTPUT_MODE

How to transfer data into Oracle?

This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY|DP_OSCH, and OSCH.

  • JDBC output mode: The data is inserted using a number of direct insert JDBC connections.

    In very rare cases JDBC mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • OCI output mode: The data is inserted using a number of direct insert OCI connections in direct path mode.

    For direct loading (no C$ table), the target table must be partitioned. For standard loading, FLOW_TABLE_OPTIONS must explicitly specify partitioning, e.g. PARTITION BY HASH(COL1) PARTITIONS 4.

    In very rare cases OCI mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • DP_COPY output mode: OLH creates a number of DataPump export files. These files are transferred by a "hadoop fs -copyToLocal" command to the local path specified by EXT_TAB_DIR_LOCATION. Please note that the path must be accessible by the Oracle Database engine. Once the copy job is complete, the files are loaded into the target using an external table.

REJECT_LIMIT

Max number of errors for OLH/EXTTAB.

Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors, or an integer value (for example, 10 to allow 10 rejections).

This value is used in OLH job definitions as well as in external table definitions.

EXT_TAB_DIR_LOCATION

Directory for ext tab data files.

File system path of the external table.

Note:

  • Only applicable, if OLH_OUTPUT_MODE = DP_* or OSCH

  • For OLH_OUTPUT_MODE = DP_*: this path must be accessible both from the ODI agent and from the target database engine.

  • For OLH_OUTPUT_MODE = DP_*: the name of the external directory object is the I$ table name.

  • For OLH_OUTPUT_MODE = DP_COPY: ODI agent will use hadoop-fs command to copy dp files into this directory.

  • For OLH_OUTPUT_MODE = DP_*|OSCH: this path will contain any external table log/bad/dsc files.

  • ODI agent will remove any files from this directory during clean up before launching OLH/OSCH.

WORK_TABLE_OPTIONS

Option for Flow table creation.

Use this option to specify the attributes for the integration table at create time; these attributes can be used to increase performance.

This option is set by default to NOLOGGING.

This option may be left empty.

OVERRIDE_INPUTFORMAT

Class name of InputFormat.

By default the InputFormat class is derived from the source DataStore/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat.

Default: <empty>.

Cannot be used with OLH_OUTPUT_MODE=OSCH.

For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat

See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression.

EXTRA_OLH_CONF_PROPERTIES

Optional extra OLH properties.

Allows adding extra parameters to OLH. E.g. for changing the default OLH date format:

<property>

<name>oracle.hadoop.loader.defaultDateFormat</name>

<value>yyyy-MM-dd HH:mm:ss</value>

</property>

Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details) the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>

Cannot be used with OLH_OUTPUT_MODE=OSCH

Example (loading apache weblog file format):

When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details:

<property>

<name>oracle.hadoop.loader.input.regexPattern</name>

<value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value>

<description>RegEx for Apache WebLog format</description>

</property>

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).


A.18 LKM File to Oracle OLH-OSCH Direct

This KM integrates data from an HDFS file into an Oracle target using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).

The KM can handle filename wildcards (*, ?).

This is a direct load LKM (no staging) and must be used without any IKM.

The following table describes the options for LKM File to Oracle OLH-OSCH Direct.


Table A-17 LKM File to Oracle OLH-OSCH Direct

Option Description

CREATE_TARG_TABLE

Create target table.

Check this option if you wish to create the target table.

TRUNCATE

Replace all target table data.

Set this option to true, if you wish to replace the target table content with the new data.

DELETE_ALL

Delete all rows.

Set this option to true, if you wish to delete all rows from the target table before loading the new data.

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

OLH_OUTPUT_MODE

How to transfer data into Oracle?

This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY|DP_OSCH, and OSCH.

  • JDBC output mode: The data is inserted using a number of direct insert JDBC connections.

    In very rare cases JDBC mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • OCI output mode: The data is inserted using a number of direct insert OCI connections in direct path mode.

    For direct loading (no C$ table), the target table must be partitioned. For standard loading, FLOW_TABLE_OPTIONS must explicitly specify partitioning, e.g. PARTITION BY HASH(COL1) PARTITIONS 4.

    In very rare cases OCI mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • DP_COPY output mode: OLH creates a number of DataPump export files. These files are transferred by a "hadoop fs -copyToLocal" command to the local path specified by EXT_TAB_DIR_LOCATION. Please note that the path must be accessible by the Oracle Database engine. Once the copy job is complete, the files are loaded into the target using an external table.

REJECT_LIMIT

Max number of errors for OLH/EXTTAB.

Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors, or an integer value (for example, 10 to allow 10 rejections).

This value is used in OLH job definitions as well as in external table definitions.

EXT_TAB_DIR_LOCATION

Directory for ext tab data files.

File system path of the external table.

Note:

  • Only applicable, if OLH_OUTPUT_MODE = DP_* or OSCH

  • For OLH_OUTPUT_MODE = DP_*: this path must be accessible both from the ODI agent and from the target database engine.

  • For OLH_OUTPUT_MODE = DP_*: the name of the external directory object is the I$ table name.

  • For OLH_OUTPUT_MODE = DP_COPY: ODI agent will use hadoop-fs command to copy dp files into this directory.

  • For OLH_OUTPUT_MODE = DP_*|OSCH: this path will contain any external table log/bad/dsc files.

  • ODI agent will remove any files from this directory during clean up before launching OLH/OSCH.

WORK_TABLE_OPTIONS

Option for Flow table creation.

Use this option to specify the attributes for the integration table at create time; these attributes can be used to increase performance.

This option is set by default to NOLOGGING.

This option may be left empty.

OVERRIDE_INPUTFORMAT

Class name of InputFormat.

By default the InputFormat class is derived from the source DataStore/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat.

Default: <empty>.

Cannot be used with OLH_OUTPUT_MODE=OSCH.

For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat

See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression.

EXTRA_OLH_CONF_PROPERTIES

Optional extra OLH properties.

Allows adding extra parameters to OLH. E.g. for changing the default OLH date format:

<property>

<name>oracle.hadoop.loader.defaultDateFormat</name>

<value>yyyy-MM-dd HH:mm:ss</value>

</property>

Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details) the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>

Cannot be used with OLH_OUTPUT_MODE=OSCH

Example (loading apache weblog file format):

When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details:

<property>

<name>oracle.hadoop.loader.input.regexPattern</name>

<value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value>

<description>RegEx for Apache WebLog format</description>

</property>

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).


A.19 LKM Hive to Oracle OLH-OSCH

This KM integrates data from a Hive query into an Oracle staging table using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).

The following table describes the options for LKM Hive to Oracle OLH-OSCH.


Table A-18 LKM Hive to Oracle OLH-OSCH

Option Description

USE_HIVE_STAGING_TABLE

Use intermediate Hive staging table?

By default the Hive source data is materialized in a Hive staging table prior to extraction by OLH. If USE_HIVE_STAGING_TABLE is set to false, OLH accesses the Hive source data directly.

USE_HIVE_STAGING_TABLE=0 is only possible if all of these conditions are true:

  • Only a single source table

  • No transformations, filters, joins.

  • No datasets

USE_HIVE_STAGING_TABLE=0 provides better performance by avoiding an extra data transfer step.

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

OLH_OUTPUT_MODE

How to transfer data into Oracle?

This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY|DP_OSCH, and OSCH.

  • JDBC output mode: The data is inserted using a number of direct insert JDBC connections.

    In very rare cases JDBC mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • OCI output mode: The data is inserted using a number of direct insert OCI connections in direct path mode.

    For direct loading (no C$ table), the target table must be partitioned. For standard loading, FLOW_TABLE_OPTIONS must explicitly specify partitioning, e.g. PARTITION BY HASH(COL1) PARTITIONS 4.

    In very rare cases OCI mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • DP_COPY output mode: OLH creates a number of DataPump export files. These files are transferred by a "hadoop fs -copyToLocal" command to the local path specified by EXT_TAB_DIR_LOCATION. Please note that the path must be accessible by the Oracle Database engine. Once the copy job is complete, the files are loaded into the target using an external table.

REJECT_LIMIT

Max number of errors for OLH/EXTTAB.

Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors, or an integer value (for example, 10 to allow 10 rejections).

This value is used in OLH job definitions as well as in external table definitions.

EXT_TAB_DIR_LOCATION

Directory for ext tab data files.

File system path of the external table.

Note:

  • Only applicable, if OLH_OUTPUT_MODE = DP_* or OSCH

  • For OLH_OUTPUT_MODE = DP_*: this path must be accessible both from the ODI agent and from the target database engine.

  • For OLH_OUTPUT_MODE = DP_*: the name of the external directory object is the I$ table name.

  • For OLH_OUTPUT_MODE = DP_COPY: ODI agent will use hadoop-fs command to copy dp files into this directory.

  • For OLH_OUTPUT_MODE = DP_*|OSCH: this path will contain any external table log/bad/dsc files.

  • ODI agent will remove any files from this directory during clean up before launching OLH/OSCH.

WORK_TABLE_OPTIONS

Option for Flow table creation.

Use this option to specify the attributes for the integration table at create time; these attributes can be used to increase performance.

This option is set by default to NOLOGGING.

This option may be left empty.

OVERRIDE_INPUTFORMAT

Class name of InputFormat.

By default the InputFormat class is derived from the source DataStore/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat.

Default: <empty>.

Cannot be used with OLH_OUTPUT_MODE=OSCH.

For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat

See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression.

EXTRA_OLH_CONF_PROPERTIES

Optional extra OLH properties.

Allows adding extra parameters to OLH. E.g. for changing the default OLH date format:

<property>

<name>oracle.hadoop.loader.defaultDateFormat</name>

<value>yyyy-MM-dd HH:mm:ss</value>

</property>

Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details) the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>

Cannot be used with OLH_OUTPUT_MODE=OSCH

Example (loading apache weblog file format):

When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details:

<property>

<name>oracle.hadoop.loader.input.regexPattern</name>

<value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value>

<description>RegEx for Apache WebLog format</description>

</property>

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).


A.20 LKM Hive to Oracle OLH-OSCH Direct

This KM integrates data from a Hive query into an Oracle target using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for Hadoop (OSCH).

This is a direct load LKM and must be used without any IKM.

The following table describes the options for LKM Hive to Oracle OLH-OSCH Direct.


Table A-19 LKM Hive to Oracle OLH-OSCH Direct

Option Description

CREATE_TARG_TABLE

Create target table.

Check this option if you wish to create the target table.

TRUNCATE

Replace all target table data.

Set this option to true, if you wish to replace the target table content with the new data.

DELETE_ALL

Delete all rows.

Set this option to true, if you wish to delete all rows from the target table before loading the new data.

USE_HIVE_STAGING_TABLE

Use intermediate Hive staging table?

By default the Hive source data is materialized in a Hive staging table prior to extraction by OLH. If USE_HIVE_STAGING_TABLE is set to false, OLH accesses the Hive source data directly.

USE_HIVE_STAGING_TABLE=0 is only possible if all of these conditions are true:

  • Only a single source table

  • No transformations, filters, joins.

  • No datasets

USE_HIVE_STAGING_TABLE=0 provides better performance by avoiding an extra data transfer step.

DELETE_TEMPORARY_OBJECTS

Delete temporary objects at end of mapping.

Set this option to NO, if you wish to retain temporary objects (tables, files and scripts) after integration. Useful for debugging.

OLH_OUTPUT_MODE

How to transfer data into Oracle?

This option specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY|DP_OSCH, and OSCH.

  • JDBC output mode: The data is inserted using a number of direct insert JDBC connections.

    In very rare cases JDBC mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • OCI output mode: The data is inserted using a number of direct insert OCI connections in direct path mode.

    For direct loading (no C$ table), the target table must be partitioned. For standard loading, FLOW_TABLE_OPTIONS must explicitly specify partitioning, e.g. PARTITION BY HASH(COL1) PARTITIONS 4.

    In very rare cases OCI mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • DP_COPY output mode: OLH creates a number of DataPump export files. These files are transferred by a "hadoop fs -copyToLocal" command to the local path specified by EXT_TAB_DIR_LOCATION. Please note that the path must be accessible by the Oracle Database engine. Once the copy job is complete, the files are loaded into the target using an external table.

REJECT_LIMIT

Max number of errors for OLH/EXTTAB.

Enter the maximum number of errors allowed in the file. Examples: UNLIMITED to accept all errors, or an integer value (for example, 10 to allow 10 rejections).

This value is used in OLH job definitions as well as in external table definitions.

EXT_TAB_DIR_LOCATION

Directory for ext tab data files.

File system path of the external table.

Note:

  • Only applicable, if OLH_OUTPUT_MODE = DP_* or OSCH

  • For OLH_OUTPUT_MODE = DP_*: this path must be accessible both from the ODI agent and from the target database engine.

  • For OLH_OUTPUT_MODE = DP_*: the name of the external directory object is the I$ table name.

  • For OLH_OUTPUT_MODE = DP_COPY: ODI agent will use hadoop-fs command to copy dp files into this directory.

  • For OLH_OUTPUT_MODE = DP_*|OSCH: this path will contain any external table log/bad/dsc files.

  • ODI agent will remove any files from this directory during clean up before launching OLH/OSCH.

WORK_TABLE_OPTIONS

Option for Flow table creation.

Use this option to specify the attributes for the integration table at create time; these attributes can be used to increase performance.

This option is set by default to NOLOGGING.

This option may be left empty.

OVERRIDE_INPUTFORMAT

Class name of InputFormat.

By default the InputFormat class is derived from the source DataStore/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat.

Default: <empty>.

Cannot be used with OLH_OUTPUT_MODE=OSCH.

For example, for reading custom file formats like web log files the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat

See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression.

EXTRA_OLH_CONF_PROPERTIES

Optional extra OLH properties.

Allows adding extra parameters to OLH. E.g. for changing the default OLH date format:

<property>

<name>oracle.hadoop.loader.defaultDateFormat</name>

<value>yyyy-MM-dd HH:mm:ss</value>

</property>

Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details) the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Default: <empty>

Cannot be used with OLH_OUTPUT_MODE=OSCH

Example (loading apache weblog file format):

When OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details:

<property>

<name>oracle.hadoop.loader.input.regexPattern</name>

<value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value>

<description>RegEx for Apache WebLog format</description>

</property>

MAPRED_OUTPUT_BASE_DIR

MapReduce Output Directory.

This option specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

TEMP_DIR

Local directory for temporary files.

Directory used for storing temporary files like the sqoop script, stdout and stderr redirects.

Leave blank to use the system's default temp directory (<?=System.getProperty("java.io.tmpdir")?>).


A.21 RKM Hive

RKM Hive reverses these metadata elements:

  • Hive tables and views as data stores.

    Specify the reverse mask in the Mask field, and then select the tables and views to reverse. The Mask field in the Reverse Engineer tab filters reverse-engineered objects based on their names. The Mask field cannot be empty and must contain at least the percent sign (%).

  • Hive columns as attributes with their data types.

  • Information about buckets, partitioning, clusters, and sort columns is set in the respective flex fields in the data store or column metadata.

A.22 RKM HBase

RKM HBase reverses these metadata elements:

  • HBase tables as data stores.

    Specify the reverse mask in the Mask field, and then select the tables to reverse. The Mask field in the Reverse Engineer tab filters reverse-engineered objects based on their names. The Mask field cannot be empty and must contain at least the percent sign (%).

  • HBase columns as attributes with their data types.

  • HBase unique row key as attribute called key.

Note:

This RKM uses the oracle.odi.km logger for logging. You can enable logging by changing log level for oracle.odi.km logger to TRACE:16 in ODI-logging-config.xml as shown below:

<logger name="oracle.odi.km" level="TRACE:16" useParentHandlers="true"/>
<logger name="oracle.odi.studio.message.logger.proxy" level="TRACE:16" useParentHandlers="false"/>

For more information about logging configuration in ODI, please see Runtime Logging for ODI components in Administering Oracle Data Integrator.

The following table describes the options for RKM HBase.


Table A-20 RKM HBase Options

Option Description

SCAN_MAX_ROWS

Specifies the maximum number of rows to be scanned during reversing of a table. The default value is 10000.

SCAN_START_ROW

Specifies the key of the row to start the scan on. By default the scan will start on the first row. The row key is specified as a Java expression returning an instance of org.apache.hadoop.hbase.util.Bytes. Example: Bytes.toBytes("EMP000001").

SCAN_STOP_ROW

Specifies the key of the row to stop the scan on. By default the scan will run to the last row of the table or until SCAN_MAX_ROWS is reached. The row key is specified as a Java expression returning an instance of org.apache.hadoop.hbase.util.Bytes. Example: Bytes.toBytes("EMP000999").

Only applies if SCAN_START_ROW is specified.

SCAN_ONLY_FAMILY

Restricts the scan to column families whose names match this pattern. The SQL LIKE wildcards percent (%) and underscore (_) can be used. By default all column families are scanned.


A.23 IKM File to Hive (Deprecated)

Note: This KM is deprecated and only used for backward compatibility.

IKM File to Hive (Load Data) supports:

  • One or more input files. To load multiple source files, enter an asterisk or a question mark as a wildcard character in the resource name of the file DataStore (for example, webshop_*.log).

  • File formats:

    • Fixed length

    • Delimited

    • Customized format

  • Loading options:

    • Immediate or deferred loading

    • Overwrite or append

    • Hive external tables

The following table describes the options for IKM File to Hive (Load Data). See the knowledge module for additional details.


Table A-21 IKM File to Hive Options

Option Description

CREATE_TARG_TABLE

Check this option, if you wish to create the target table. If USE_STAGING_TABLE is set to false, note that data will only be read correctly if the target table definition, particularly the row format and file format details, is correct.

TRUNCATE

Set this option to true, if you wish to replace the target table/partition content with the new data. Otherwise the new data will be appended to the target table. If TRUNCATE and USE_STAGING_TABLE are set to false, all source file names must be unique and must not collide with any data files already loaded into the target table.

FILE_IS_LOCAL

Defines whether the source file is to be considered local (outside of the current Hadoop cluster). If this option is set to true, the data file(s) are copied into the Hadoop cluster first. The file has to be accessible by the Hive server through the local or shared file system. If this option is set to false, the data file(s) are moved into the Hadoop cluster and therefore will no longer be available at their source location. If the source file is already in HDFS, setting this option to false results in just a file rename, and the operation is therefore very fast.

This option only applies, if EXTERNAL_TABLE is set to false.

EXTERNAL_TABLE

Defines whether to declare the target/staging table as externally managed. For non-external tables Hive manages all data files. That is, it will move any data files into <hive.metastore.warehouse.dir>/<table_name>. For external tables Hive does not move or delete any files. It will load data from the location given by the ODI schema.

If this option is set to true:

  • All files in the directory given by the physical data schema will be loaded. So any filename or wildcard information from the source DataStore's resource name will be ignored.

  • The directory structure and file names must comply with Hive's directory organization for tables, for example, for partitioning and clustering.

  • The directory and its files must reside in HDFS.

  • No Hive LOAD-DATA-statements are submitted and thus loading of files to a specific partition (using a target-side expression) is not possible.

USE_STAGING_TABLE

Defines whether an intermediate staging table will be created.

A Hive staging table is required if:

  • Target table is partitioned, but data spreads across partitions

  • Target table is clustered

  • Target table (partition) is sorted, but input file is not

  • Target table is already defined and target table definition does not match the definition required by the KM

  • Target column order does not match source file column order

  • There are any unmapped source columns

  • There are any unmapped non-partition target columns

  • The source is a fixed length file and the target has non-string columns

In case none of the above is true, this option can be turned off for better performance.

DELETE_TEMPORARY_OBJECTS

Removes temporary objects, such as tables, files, and scripts after integration. Set this option to No if you want to retain the temporary files, which might be useful for debugging.

DEFER_TARGET_LOAD

Defines whether the file(s), which have been declared to the staging table, should be loaded into the target table now or during a later execution. Permitted values are START, NEXT, END, or <empty>.

This option only applies if USE_STAGING_TABLE is set to true.

The typical use case for this option is when there are multiple files, each of them requires data redistribution or sorting, and the files are gathered by calling the interface several times. For example, the interface is used in a package that retrieves (many small) files from different locations, and the location, stored in a variable, is to be used in a target partition column. In this case the first interface execution has DEFER_TARGET_LOAD set to START, the next interface executions have DEFER_TARGET_LOAD set to NEXT, and the last interface has it set to END, as illustrated below. The interfaces having DEFER_TARGET_LOAD set to START/NEXT just load the data file into HDFS (but not yet into the target table) and can be executed in parallel to accelerate file upload to the cluster.
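
For example, a hypothetical three-file package would run the same mapping three times with these settings:

Execution 1 (file from location A): DEFER_TARGET_LOAD = START
Execution 2 (file from location B): DEFER_TARGET_LOAD = NEXT
Execution 3 (file from location C): DEFER_TARGET_LOAD = END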

OVERRIDE_ROW_FORMAT

Allows overriding the entire Hive row format definition of the staging table (if USE_STAGING_TABLE is set to true) or the target table (if USE_STAGING_TABLE is set to false). It contains the text to be used for the row format definition. Example for reading Apache Combined WebLog files:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s"
)
STORED AS TEXTFILE

The list of columns in the source DataStore must match the list of input groups in the regular expression (same number of columns and appropriate data types). If USE_STAGING_TABLE is set to false, the number of target columns must match the number of columns returned by the SerDe (in the above example, the number of groups in the regular expression); the number of source columns is ignored, but at least one column must be mapped to the target. All source data is mapped into the target table structure according to the column order: the SerDe's first column is mapped to the first target column, the SerDe's second column to the second target column, and so on. If USE_STAGING_TABLE is set to true, the source DataStore must have as many columns as the SerDe returns. Only data of mapped columns will be transferred.

STOP_ON_FILE_NOT_FOUND

Defines whether the KM should stop if the input file is not found.

HIVE_COMPATIBILE

Specifies the Hive version compatibility. The values permitted for this option are 0.7 and 0.8.

  • 0.7: Simulates the append behavior. Must be used for Hive 0.7 (CDH3).

  • 0.8: Uses Hive's append feature, which provides better performance. Requires Hive 0.8 (CDH4) or later.


A.24 LKM HBase to Hive (HBase-SerDe) [Deprecated]

Note: This KM is deprecated and only used for backward compatibility.

LKM HBase to Hive (HBase-SerDe) supports:

  • A single source HBase table.

The following table describes the options for LKM HBase to Hive (HBase-SerDe). See the knowledge module for additional details.


Table A-22 LKM HBase to Hive (HBase-SerDe) Options

Option Description

DELETE_TEMPORARY_OBJECTS

Deletes temporary objects such as tables, files, and scripts post data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.


A.25 IKM Hive to HBase Incremental Update (HBase-SerDe) [Deprecated]

Note: This KM is deprecated and only used for backward compatibility.

IKM Hive to HBase Incremental Update (HBase-SerDe) supports:

  • Filters, Joins, Datasets, Transformations and Aggregations in Hive

  • Inline views generated by IKM Hive Transform

  • Inline views generated by IKM Hive Control Append

The following table describes the options for IKM Hive to HBase Incremental Update (HBase-SerDe). See the knowledge module for additional details.


Table A-23 IKM Hive to HBase Incremental Update (HBase-SerDe) Options

Option Description

CREATE_TARG_TABLE

Creates the HBase target table.

TRUNCATE

Replaces the target table content with the new data. If this option is set to false, the new data is appended to the target table.

DELETE_TEMPORARY_OBJECTS

Deletes temporary objects such as tables, files, and scripts post data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.

HBASE_WAL

Enables or disables the Write-Ahead-Log (WAL) that HBase uses to protect against data loss. For better performance, WAL can be disabled.


A.26 IKM SQL to Hive-HBase-File (SQOOP) [Deprecated]

Note: This KM is deprecated and only used for backward compatibility.

IKM SQL to Hive-HBase-File (SQOOP) supports:

  • Mappings on staging

  • Joins on staging

  • Filter expressions on staging

  • Datasets

  • Lookups

  • Derived tables

The following table describes the options for IKM SQL to Hive-HBase-File (SQOOP). See the knowledge module for additional details.


Table A-24 IKM SQL to Hive-HBase-File (SQOOP) Options

Option Description

CREATE_TARG_TABLE

Creates the target table. This option is applicable only if the target is Hive or HBase.

TRUNCATE

Replaces any existing target table content with the new data. For Hive and HBase targets, the target data is truncated. For File targets, the target directory is removed, and this option must be set to true.

SQOOP_PARALLELISM

Specifies the degree of parallelism. More precisely the number of mapper processes used for extraction.

If SQOOP_PARALLELISM option is set to greater than 1, SPLIT_BY option must be defined.

SPLIT_BY

Specifies the target column to be used for splitting the source data into n chunks for parallel extraction, where n is SQOOP_PARALLELISM. To achieve equally sized data chunks the split column should contain homogeneously distributed values. For calculating the data chunk boundaries a query similar to SELECT MIN(EMP.EMPNO), MAX(EMP.EMPNO) from EMPLOYEE EMP is used. To avoid an extra full table scan the split column should be backed by an index.

BOUNDARY_QUERY

For splitting the source data into chunks for parallel extraction, the minimum and maximum values of the split column are retrieved (KM option SPLIT_BY). In certain situations these may not be the best boundaries or not the most efficient way to retrieve the boundaries. In such cases this KM option can be set to a SQL query returning one row with two columns: the lowest and the highest value to be used for the split column. This range will be divided into SQOOP_PARALLELISM chunks for parallel extraction. Example of hard-coded ranges for an Oracle source:

SELECT 1000, 2000 FROM DUAL

For preserving context independence, regular table names should be inserted through odiRef.getObjectName calls. For example:

SELECT MIN(EMPNO), MAX(EMPNO) FROM <%=odiRef.getObjectName("EMP")%>

TEMP_DIR

Specifies the directory used for storing temporary files, such as sqoop script, stdout and stderr redirects. Leave this option blank to use system's default temp directory:

<?=System.getProperty("java.io.tmp")?>

MAPRED_OUTPUT_BASE_DIR

Specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

DELETE_TEMPORARY_OBJECTS

Deletes temporary objects such as tables, files, and scripts after data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.

USE_HIVE_STAGING_TABLE

Loads data into the Hive work table before loading into the Hive target table. Set this option to false to load data directly into the target table.

Setting this option to false is only possible, if all these conditions are true:

  • All target columns are mapped

  • The existing Hive table uses the standard Hive row separator (\n) and column delimiter (\01)

Setting this option to false provides better performance by avoiding an extra data transfer step.

This option is applicable only if the target technology is Hive.

USE_GENERIC_JDBC_CONNECTOR

Specifies whether to use the generic JDBC connector if a connector for the target technology is not available.

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector can be used.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.


A.27 IKM Hive Control Append (Deprecated)

Note: This KM is deprecated and only used for backward compatibility.

This knowledge module validates and controls the data, and integrates it into a Hive target table in truncate/insert (append) mode. Invalid data is isolated in an error table and can be recycled. IKM Hive Control Append supports inline view mappings that use either this knowledge module or IKM Hive Transform.

The following table describes the options for IKM Hive Control Append.


Table A-25 IKM Hive Control Append Options

Option Description

FLOW_CONTROL

Activates flow control.

RECYCLE_ERRORS

Recycles data rejected from a previous control.

STATIC_CONTROL

Controls the target table after having inserted or updated target data.

CREATE_TARG_TABLE

Creates the target table.

TRUNCATE

Replaces the target table content with the new data. Setting this option to true provides better performance.

DELETE_TEMPORARY_OBJECTS

Removes the temporary objects, such as tables, files, and scripts after data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.

HIVE_COMPATIBILE

Specifies the Hive version compatibility. The values permitted for this option are 0.7 and 0.8.

  • 0.7: Simulates the append behavior. Must be used for Hive 0.7 (CDH3).

  • 0.8: Uses Hive's append feature, which provides better performance. Requires Hive 0.8 (CDH4) or later.


A.28 CKM Hive (Deprecated)

Note: This KM is deprecated and only used for backward compatibility.

This knowledge module checks data integrity for Hive tables. It verifies the validity of the constraints of a Hive data store and diverts the invalid records to an error table. You can use CKM Hive for static control and flow control. You must also define these constraints on the stored data.

The following table describes the options for this check knowledge module.


Table A-26 CKM Hive Options

Option Description

DROP_ERROR_TABLE

Drops error table before execution. When this option is set to YES, the error table will be dropped each time a control is performed on the target table. This means that any rejected records, identified and stored during previous control operations, will be lost. Otherwise previous rejects will be preserved. In addition to the error table, any table called <error table>_tmp will also be dropped.

HIVE_COMPATIBILE

Specifies the Hive version compatibility. The values permitted for this option are 0.7 and 0.8.

  • 0.7: Simulates the append behavior. Must be used for Hive 0.7 (CDH3).

  • 0.8: Uses Hive's append feature, which provides better performance. Requires Hive 0.8 (CDH4) or later.


A.29 IKM Hive Transform (Deprecated)

Note: This KM is deprecated and only used for backward compatibility.

This knowledge module performs transformations. It uses a shell script to transform the data, and then integrates it into a Hive target table using replace mode. The knowledge module supports inline view mappings and can be used as an inline-view for IKM Hive Control Append.

The transformation script must read the input columns in the order defined by the source data store. Only mapped source columns are streamed into the transformations. The transformation script must provide the output columns in the order defined by the target data store.

The following table describes the options for this integration knowledge module.


Table A-27 IKM Hive Transform Options

Option Description

CREATE_TARG_TABLE

Creates the target table.

DELETE_TEMPORARY_OBJECTS

Removes the temporary objects, such as tables, files, and scripts post data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.

TRANSFORM_SCRIPT_NAME

Defines the file name of the transformation script. This transformation script is used to transform the input data into the output structure. Both local and HDFS paths are supported, for example:

Local script location: file:///tmp/odi/script1.pl

HDFS script location: hdfs://namenode:nnPort/tmp/odi/script1.pl

Ensure that the following requirements are met:

  • The path/file must be accessible by both the ODI agent and the Hive server. Read access for the Hive server is required because it is the Hive server that executes the resulting MapReduce job that invokes the script.

  • If TRANSFORM_SCRIPT is set (ODI creates the script file during mapping execution), the path/file must be writable by the ODI agent, as it is the ODI agent that writes the script file using the HDFS Java API.

When the KM option TRANSFORM_SCRIPT is set, the following paragraphs provide some configuration help:

  • For HDFS script locations:

    The script file created is owned by the ODI agent user and receives the group of the owning directory. See the Hadoop HDFS Permissions Guide for more details. The standard configuration to cover the above two requirements for HDFS scripts is to ensure that the group of the HDFS script directory includes the ODI agent user (let's assume oracle) as well as the Hive server user (let's assume hive). Assuming that the group hadoop includes oracle and hive, the sample command below adjusts the ownership of the HDFS script directory:

    Log on as the hdfs user, then run:

    hdfs dfs -chown oracle:hadoop /tmp/odi/myscriptdir

  • For local script locations:

    The script file created is owned by the ODI agent user and receives the ODI agent user's default group, unless the SGID bit has been set on the script directory. If the SGID bit has been set, the file will be owned by the group of the script directory instead. The standard configuration to cover the above two requirements for local scripts is similar to the HDFS configuration and uses the SGID bit:

    chown oracle:hadoop /tmp/odi/myscriptdir
    chmod g+s /tmp/odi/myscriptdir

TRANSFORM_SCRIPT

Defines the transformation script content. This transformation script is then used to transform the input data into the output structure. If left blank, the file given in TRANSFORM_SCRIPT_NAME must already exist. If not blank, the script file is created.

Script example (1-to-1 transformation):

#! /usr/bin/csh -f
cat

All mapped source columns are spooled as tab-separated data into this script via stdin. This UNIX script then transforms the data and writes it out as tab-separated data on stdout. The script must provide as many output columns as there are target columns. A slightly richer example sketch is shown below.
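
As a sketch only, and assuming the second mapped source column is a string that should be upper-cased (the column layout is hypothetical), a transformation script could look like this:

#! /bin/sh
# Reads tab-separated rows from stdin (assumes at least two columns per row),
# upper-cases the second column, and writes tab-separated rows to stdout.
# The number of output columns must still match the number of target columns.
awk 'BEGIN { FS = OFS = "\t" } { $2 = toupper($2); print }'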

TRANSFORM_SCRIPT_MODE

Unix/HDFS file permissions for script file in octal notation with leading zero. For example, full permissions for owner and group: 0770.

Warning: Using wider permissions like 0777 poses a security risk.

See also KM option description for TRANSFORM_SCRIPT_NAME for details on directory permissions.

PRE_TRANSFORM_DISTRIBUTE

Provides an optional, comma-separated list of source column names, which enables the knowledge module to distribute the data before the transformation script is applied. See the combined example following POST_TRANSFORM_SORT.

PRE_TRANSFORM_SORT

Provides an optional, comma-separated list of source column names, which enables the knowledge module to sort the data before the transformation script is applied.

POST_TRANSFORM_DISTRIBUTE

Provides an optional, comma-separated list of target column names, which enables the knowledge module to distribute the data after the transformation script is applied.

POST_TRANSFORM_SORT

Provides an optional, comma-separated list of target column names, which enables the knowledge module to sort the data after the transformation script is applied.
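
For example, with hypothetical source columns DEPTNO and HIREDATE and hypothetical target columns DEPARTMENT and HIRE_YEAR, the four distribute/sort options could be set as follows:

PRE_TRANSFORM_DISTRIBUTE: DEPTNO
PRE_TRANSFORM_SORT: DEPTNO, HIREDATE
POST_TRANSFORM_DISTRIBUTE: DEPARTMENT
POST_TRANSFORM_SORT: DEPARTMENT, HIRE_YEAR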


A.30 IKM File-Hive to Oracle (OLH-OSCH) [Deprecated]

Note: This KM is deprecated and only used for backward compatibility.

IKM File-Hive to Oracle (OLH-OSCH) integrates data from an HDFS file or Hive source into an Oracle database target using Oracle Loader for Hadoop (OLH) and/or Oracle SQL Connector for HDFS (OSCH). Using the mapping configuration and the selected options, the knowledge module generates an appropriate Oracle Database target instance. Hive and Hadoop versions must follow the requirements.


The following table describes the options for this integration knowledge module.


Table A-28 IKM File - Hive to Oracle (OLH-OSCH)

Option Description

OLH_OUTPUT_MODE

Specifies how to load the Hadoop data into Oracle. Permitted values are JDBC, OCI, DP_COPY, DP_OSCH, and OSCH.

  • JDBC output mode: The data is inserted using a number of direct insert JDBC connections. In very rare cases JDBC mode may result in duplicate records in target table due to Hadoop trying to restart tasks.

  • OCI output mode: The data is inserted using a number of direct insert OCI connections in direct path mode. If USE_ORACLE_STAGING_TABLE is set to false, the target table must be partitioned. If USE_ORACLE_STAGING_TABLE is set to true, FLOW_TABLE_OPTIONS must explicitly specify partitioning, for example, "PARTITION BY HASH(COL1) PARTITIONS 4". In very rare cases OCI mode may result in duplicate records in the target table due to Hadoop trying to restart tasks.

  • DP_COPY output mode: OLH creates a number of DataPump export files. These files are transferred by a "Hadoop fs -copyToLocal" command to the local path specified by EXT_TAB_DIR_LOCATION. Please note that the path must be accessible by the Oracle Database engine. Once the copy job is complete, an external table is defined in the target database, which accesses the files from EXT_TAB_DIR_LOCATION.

  • DP_OSCH output mode: OLH creates a number of DataPump export files. After the export phase an external table is created on the target database, which accesses these output files directly via OSCH. Please note that the path must be accessible by the Oracle Database engine. Once the copy job is complete, an external table is defined in the target database, which accesses the files from EXT_TAB_DIR_LOCATION.

  • OSCH output mode: In OSCH mode loading, OLH is bypassed. ODI creates an external table on the target database, which accesses the input files through OSCH. Please note that only delimited and fixed length files can be read. No support for loading from Hive or custom Input Formats such as RegexInputFormat, as there is no OLH pre-processing.

REJECT_LIMIT

Specifies the maximum number of errors for OLH and external tables. Examples: UNLIMITED to accept all errors; an integer value (for example, 10 to allow 10 rejections). This value is used in job definitions as well as in external table definitions.

CREATE_TARG_TABLE

Creates the target table.

TRUNCATE

Replaces the target table content with the new data.

DELETE_ALL

Deletes all the data in the target table.

USE_HIVE_STAGING_TABLE

Materializes Hive source data before extraction by OLH. If this option is set to false, OLH directly accesses the Hive source data. Setting this option to false is only possible if all these conditions are true:

  • Only a single source table

  • No transformations, filters, joins

  • No datasets

Setting this option to false provides better performance by avoiding an extra data transfer step.

This option is applicable only if the source technology is Hive.

USE_ORACLE_STAGING_TABLE

Uses an intermediate Oracle database staging table.

The extracted data is made available to Oracle by an external table. If USE_ORACLE_STAGING_TABLE is set to true (default), the external table is created as a temporary (I$) table. This I$ table data is then inserted into the target table. Setting this option to false is only possible, if all these conditions are true:

  • OLH_OUTPUT_MODE is set to JDBC or OCI

  • All source columns are mapped

  • All target columns are mapped

  • No target-side mapping expressions

Setting this option to false provides better performance by avoiding an extra data transfer step, but may lead to partial data being loaded into the target table, as OLH loads data in multiple transactions.

EXT_TAB_DIR_LOCATION

Specifies the file system path of the external table. Please note the following:

  • Only applicable, if OLH_OUTPUT_MODE = DP_*|OSCH

  • For OLH_OUTPUT_MODE = DP_*: this path must be accessible both from the ODI agent and from the target database engine.

  • For OLH_OUTPUT_MODE = DP_*: the name of the external directory object is the I$ table name.

  • For OLH_OUTPUT_MODE = DP_COPY: the ODI agent will use the hadoop fs command to copy the dp files into this directory.

  • For OLH_OUTPUT_MODE = DP_*|OSCH: this path will contain any external table log/bad/dsc files.

  • ODI agent will remove any files from this directory during clean up before launching OLH/OSCH.

TEMP_DIR

Specifies the directory used for storing temporary files, such as sqoop script, stdout and stderr redirects. Leave this option blank to use system's default temp directory:

<?=System.getProperty("java.io.tmp")?>

MAPRED_OUTPUT_BASE_DIR

Specifies an HDFS directory, where the job will create subdirectories for temporary files/datapump output files.

FLOW_TABLE_OPTIONS

Specifies the attributes for the integration table at create time; used for increasing performance. This option is set by default to NOLOGGING. This option may be left empty.

DELETE_TEMPORARY_OBJECTS

Removes temporary objects, such as tables, files, and scripts post data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.

OVERRIDE_INPUTFORMAT

By default the InputFormat class is derived from the source DataStore/Technology (DelimitedTextInputFormat or HiveToAvroInputFormat). This option allows the user to specify the class name of a custom InputFormat. Cannot be used with OLH_OUTPUT_MODE=OSCH.

For example, for reading custom file formats like web log files, the OLH RegexInputFormat can be used by assigning the value: oracle.hadoop.loader.lib.input.RegexInputFormat

See KM option EXTRA_OLH_CONF_PROPERTIES for details on how to specify the regular expression.

EXTRA_OLH_CONF_PROPERTIES

Particularly when using custom InputFormats (see KM option OVERRIDE_INPUTFORMAT for details) the InputFormat may require additional configuration parameters. These are provided in the OLH configuration file. This KM option allows adding extra properties to the OLH configuration file. Cannot be used with OLH_OUTPUT_MODE=OSCH.

Example (loading the Apache web log file format): When the OLH RegexInputFormat is used for reading custom file formats, this KM option specifies the regular expression and other parsing details:

<property>
  <name>oracle.hadoop.loader.input.regexPattern</name>
  <value>([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")</value>
  <description>RegEx for Apache WebLog format</description>
</property>


A.31 IKM File-Hive to SQL (SQOOP) [Deprecated]

Note: This KM is deprecated and only used for backward compatibility.

IKM File-Hive to SQL (SQOOP) supports:

  • Filters, Joins, Datasets, Transformations and Aggregations in Hive

  • Inline views generated by IKM Hive Control Append

  • Inline views generated by IKM Hive Transform

  • Hive-HBase source tables using LKM HBase to Hive (HBase SerDe)

  • File source data (delimited file format only)

The following table describes the options for this integration knowledge module.


Table A-29 IKM File-Hive to SQL (SQOOP)

Option Description

CREATE_TARG_TABLE

Creates the target table.

TRUNCATE

Replaces the target datastore content with new data. If this option is set to false, the new data is appended to the target datastore.

DELETE_ALL

Deletes all the rows in the target datastore.

SQOOP_PARALLELISM

Specifies the degree of parallelism. More precisely the number of mappers used during SQOOP export and therefore the number of parallel JDBC connections.

USE_TARGET_STAGING_TABLE

By default the source data is staged into a target-side staging table, before it is moved into the target table. If this option is set to false, SQOOP loads the source data directly into the target table, which provides better performance and less need for tablespace in target RDBMS by avoiding an extra data transfer step.

For File sources setting this option to false is only possible, if all these conditions are met:

  • All source columns must be mapped

  • Source and target columns have the same order

  • The first file column must map to the first target column

  • No mapping gaps

  • Only 1-to-1 mappings (no expressions)

Please note the following:

  • SQOOP uses multiple writers, each having their own JDBC connection to the target. Every writer uses multiple transactions for inserting the data. This means that in case USE_TARGET_STAGING_TABLE is set to false, changes to the target table are no longer atomic and writer failures can lead to partially updated target tables.

  • The Teradata Connector for SQOOP always creates an extra staging table during load. This connector staging table is independent of the KM option.

USE_GENERIC_JDBC_CONNECTOR

Specifies whether to use the generic JDBC connector if a connector for the target technology is not available.

For certain technologies SQOOP provides specific connectors. These connectors take care of SQL-dialects and optimize performance. When there is a connector for the respective target technology, this connector should be used. If not, the generic JDBC connector can be used.

FLOW_TABLE_OPTIONS

When creating the target-side work table, RDBMS-specific table options can improve performance. By default this option is empty and the knowledge module will use the following table options:

  • For Oracle: NOLOGGING

  • For DB2: NOT LOGGED INITIALLY

  • For Teradata: no fallback, no before journal, no after journal

Any explicit value overrides these defaults.
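
For example, an explicit value for an Oracle target (an illustrative setting, not a recommendation) could be:

NOLOGGING PARALLEL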

TEMP_DIR

Specifies the directory used for storing temporary files, such as sqoop script, stdout and stderr redirects. Leave this option blank to use system's default temp directory:

<?=System.getProperty("java.io.tmp")?>

MAPRED_OUTPUT_BASE_DIR

Specifies an HDFS directory where SQOOP creates subdirectories for temporary files. A subdirectory named after the work table is created here to hold the temporary data.

DELETE_TEMPORARY_OBJECTS

Deletes temporary objects such as tables, files, and scripts after data integration. Set this option to NO if you want to retain the temporary objects, which might be useful for debugging.

TERADATA_PRIMARY_INDEX

Primary index for the target table. Teradata uses the primary index to spread data across AMPs. It is important that the chosen primary index has a high cardinality (many distinct values) to ensure evenly spread data to allow maximum processing performance. Please follow Teradata's recommendation on choosing a primary index.

This option is applicable only to Teradata targets.
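
For example, assuming a hypothetical target table with a high-cardinality column CUSTOMER_ID, this option could be set to:

CUSTOMER_ID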

TERADATA_FLOW_TABLE_TYPE

Type of the Teradata flow table, either SET or MULTISET.

This option is applicable only to Teradata targets.

TERADATA_OUTPUT_METHOD

Specifies the way the Teradata Connector will load the data. Valid values are:

  • batch.insert: multiple JDBC connections using batched prepared statements (simplest to start with)

  • multiple.fastload: multiple FastLoad connections

  • internal.fastload: single coordinated FastLoad connections (most performant)

This option is applicable only to Teradata targets.

EXTRA_HADOOP_CONF_PROPERTIES

Optional generic Hadoop properties.

EXTRA_SQOOP_CONF_PROPERTIES

Optional SQOOP properties.

EXTRA_SQOOP_CONNECTOR_CONF_PROPERTIES

Optional SQOOP connector properties.