Class TableHiveInputFormat<K,V>
- All Implemented Interfaces:
InputFormat<K,V>
Note that although this class is a version 1 InputFormat, it also creates and manages an instance of a version 2 InputFormat, so that it can exploit and reuse the mechanisms provided by the Hadoop integration classes (in package oracle.kv.hadoop.table).
- Note on Logging - Two loggers are currently employed by this class:
- One logger based on Log4j version 1, accessed via the org.apache.commons.logging wrapper.
- One logger based on the Log4j2 API.
-
Nested Class Summary
Nested Classes
Modifier and Type: protected static final class
Description: Local class, intended as a convenient return type data structure, that associates the comparison operation(s) specified in a given predicate with a corresponding column (field) name.
-
Constructor Summary
Constructors -
Method Summary
Method: getRecordReader(InputSplit split, JobConf job, Reporter reporter)
Description: Returns the RecordReader for the given InputSplit.
Method: getSplits
Description: For the current query, returns an array containing the input splits to use when satisfying the query.
-
Constructor Details
-
TableHiveInputFormat
public TableHiveInputFormat()
-
-
Method Details
-
getRecordReader
public RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Returns the RecordReader for the given InputSplit.
Note that the RecordReader that is returned is based on version 1 of MapReduce, but wraps and delegates to a YARN based (MapReduce version 2) RecordReader. This is done because the RecordReader provided for Hadoop integration is YARN based, whereas the Hive infrastructure requires a version 1 RecordReader.
Additionally, note that when query execution occurs via a MapReduce job, this method is invoked by backend processes running on each DataNode in the Hadoop cluster, where the splits are distributed to each DataNode. When the query is simple enough to be executed by the Hive infrastructure from data in the metastore (that is, without MapReduce), this method is invoked by the frontend Hive processes, once for each split. For example, if there are 6 splits and the query is executed via a MapReduce job employing only 3 DataNodes, then each DataNode will invoke this method twice, once for each of its 2 splits. On the other hand, if MapReduce is not employed, then the Hive frontend will invoke this method 6 separate times, once per split. In either case, when this method is invoked, the given Version 1 split has already been populated with a fully populated Version 2 split, and the state of that encapsulated Version 2 split can be used to construct the necessary Version 1 RecordReader encapsulating a fully functional Version 2 RecordReader, as required by YARN.
- Specified by:
getRecordReader in interface InputFormat<K,V>
- Throws:
IOException
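The wrap-and-delegate arrangement described for getRecordReader can be illustrated with a minimal, dependency-free sketch. Every type below (V1StyleReader, V2StyleReader, ValueHolder, V1OverV2Reader, ListBackedV2Reader) is a hypothetical stand-in invented for illustration. None of them is the Hadoop, Hive, or oracle.kv API, and the actual reader classes used by TableHiveInputFormat may be structured differently.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative only: all nested types are hypothetical stand-ins, not Hadoop classes.
final class ReaderAdapterDemo {

    // Stand-in for a version 2 style reader: advance, then ask for the current pair.
    interface V2StyleReader<K, V> {
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
    }

    // Mutable holder that the version 1 style reader fills in place.
    static final class ValueHolder<T> {
        T item;
    }

    // Stand-in for a version 1 style reader: fills caller-supplied holders.
    interface V1StyleReader<K, V> {
        boolean next(ValueHolder<K> key, ValueHolder<V> value);
    }

    // Version 1 style reader that wraps and delegates to a version 2 style reader.
    static final class V1OverV2Reader<K, V> implements V1StyleReader<K, V> {
        private final V2StyleReader<K, V> delegate;

        V1OverV2Reader(V2StyleReader<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean next(ValueHolder<K> key, ValueHolder<V> value) {
            if (!delegate.nextKeyValue()) {
                return false; // the wrapped reader is exhausted
            }
            key.item = delegate.getCurrentKey();
            value.item = delegate.getCurrentValue();
            return true;
        }
    }

    // Toy version 2 style reader over an in-memory list; the key is the row index.
    static final class ListBackedV2Reader implements V2StyleReader<Integer, String> {
        private final Iterator<String> rows;
        private int index = -1;
        private String current;

        ListBackedV2Reader(List<String> rows) {
            this.rows = rows.iterator();
        }

        @Override
        public boolean nextKeyValue() {
            if (!rows.hasNext()) {
                return false;
            }
            current = rows.next();
            index++;
            return true;
        }

        @Override
        public Integer getCurrentKey() {
            return index;
        }

        @Override
        public String getCurrentValue() {
            return current;
        }
    }

    // Drains the adapter the way a version 1 caller would, using reusable holders.
    static String readAll(List<String> rows) {
        V1StyleReader<Integer, String> reader =
            new V1OverV2Reader<>(new ListBackedV2Reader(rows));
        ValueHolder<Integer> key = new ValueHolder<>();
        ValueHolder<String> value = new ValueHolder<>();
        StringBuilder out = new StringBuilder();
        while (reader.next(key, value)) {
            out.append(key.item).append('=').append(value.item).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(List.of("a", "b"))); // prints 0=a;1=b;
    }
}
```

The caller only ever sees the version 1 style interface, while all record retrieval is done by the wrapped version 2 style reader, which is the division of labor the method description outlines.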
-
getSplits
For the current query, returns an array containing the input splits to use when satisfying the query.
Implementation Note: This method first calls V1V2TableUtil.getInputFormat() which, in addition to constructing a TableInputFormat instance, also populates the splitMap. The splitMap constructed by V1V2TableUtil.getInputFormat() is then retrieved, and the keySet of that splitMap is used to populate the array returned by this method.
- Specified by:
getSplits in interface InputFormat<K,V>
- Throws:
IOException
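The implementation note for getSplits can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: V1Split, V2Split, and buildSplitMap are hypothetical stand-ins, and the sketch assumes the splitMap associates each version 1 split (key) with its version 2 counterpart (value). The source states only that the keySet populates the returned array.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the split types and map-building helper are hypothetical.
final class SplitMapSketch {

    // Stand-in for a version 1 split (what the Hive infrastructure expects).
    static final class V1Split {
        final String id;

        V1Split(String id) {
            this.id = id;
        }
    }

    // Stand-in for the version 2 split associated with a version 1 split.
    static final class V2Split {
        final String id;

        V2Split(String id) {
            this.id = id;
        }
    }

    // Hypothetical analogue of the step that populates the splitMap.
    static Map<V1Split, V2Split> buildSplitMap(int count) {
        Map<V1Split, V2Split> splitMap = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            splitMap.put(new V1Split("split-" + i), new V2Split("split-" + i));
        }
        return splitMap;
    }

    // The returned array is populated from the keySet of the splitMap.
    static V1Split[] getSplits(Map<V1Split, V2Split> splitMap) {
        return splitMap.keySet().toArray(new V1Split[0]);
    }

    public static void main(String[] args) {
        V1Split[] splits = getSplits(buildSplitMap(6));
        System.out.println(splits.length); // prints 6
    }
}
```

A LinkedHashMap is used here only so that the order of the resulting array is predictable in the example. Whether the real splitMap preserves insertion order is not stated in the source.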
-