Class TableHiveInputFormat<K,V>
java.lang.Object
    oracle.kv.hadoop.hive.table.TableHiveInputFormat<K,V>

All Implemented Interfaces:
    InputFormat<K,V>
public class TableHiveInputFormat<K,V> extends Object implements InputFormat<K,V>
A Hadoop MapReduce version 1 InputFormat class for reading data from an Oracle NoSQL Database when processing a Hive query against data written to that database using the Table API.

Note that whereas this class is an instance of a version 1 InputFormat class, in order to exploit and reuse the mechanisms provided by the Hadoop integration classes (in package oracle.kv.hadoop.table), this class also creates and manages an instance of a version 2 InputFormat.

Note on Logging
Two loggers are currently employed by this class:
- One logger based on Log4j version 1, accessed via the org.apache.commons.logging wrapper.
- One logger based on the Log4j2 API.
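The version-1-wraps-version-2 arrangement described above can be sketched as follows. This is a minimal, self-contained illustration in which the Hadoop and Oracle NoSQL types are replaced by hypothetical stand-in classes; it shows only the create-and-delegate pattern, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

public class V1WrapsV2Sketch {

    // Hypothetical stand-in for a MapReduce version 2 (YARN) InputFormat,
    // playing the role of the Hadoop integration classes being reused.
    static class V2InputFormat {
        List<String> getSplits() {
            List<String> splits = new ArrayList<>();
            splits.add("split-0");
            splits.add("split-1");
            return splits;
        }
    }

    // Hypothetical stand-in for the version 1 InputFormat that Hive requires.
    // It creates and manages a version 2 instance internally.
    static class V1InputFormat {
        private final V2InputFormat v2 = new V2InputFormat();

        String[] getSplits() {
            // Reuse the version 2 mechanism, then adapt to the v1 contract.
            List<String> v2Splits = v2.getSplits();
            return v2Splits.toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        V1InputFormat fmt = new V1InputFormat();
        System.out.println(fmt.getSplits().length); // prints 2
    }
}
```

The design point being illustrated: the version 1 class owns a version 2 instance and adapts its results, rather than reimplementing the split and record-reading logic.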
Nested Class Summary

protected static class TableHiveInputFormat.ColumnPredicateInfo
    Local class, intended as a convenient return-type data structure, that associates the comparison operation(s) specified in a given predicate with a corresponding column (field) name.
Constructor Summary

TableHiveInputFormat()
Method Summary

RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
    Returns the RecordReader for the given InputSplit.

InputSplit[] getSplits(JobConf job, int numSplits)
    For the current query, returns an array containing the input splits to use when satisfying the query.
Method Detail
-
getRecordReader
public RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Returns the RecordReader for the given InputSplit.

Note that the RecordReader that is returned is based on version 1 of MapReduce, but wraps and delegates to a YARN-based (MapReduce version 2) RecordReader. This is done because the RecordReader provided for Hadoop integration is YARN based, whereas the Hive infrastructure requires a version 1 RecordReader.

Additionally, note that when query execution occurs via a MapReduce job, this method is invoked by backend processes running on each DataNode in the Hadoop cluster to which the splits are distributed. When the query is simple enough to be executed by the Hive infrastructure from data in the metastore -- that is, without MapReduce -- this method is invoked by the frontend Hive processes, once for each split. For example, if there are 6 splits and the query is executed via a MapReduce job employing only 3 DataNodes, then each DataNode will invoke this method twice, once for each of its 2 splits. On the other hand, if MapReduce is not employed, then the Hive frontend will invoke this method 6 separate times, once per split. In either case, when this method is invoked, the given version 1 split already encapsulates a fully populated version 2 split, and the state of that encapsulated version 2 split can be exploited to construct the necessary version 1 RecordReader wrapping a fully functional version 2 RecordReader, as required by YARN.

Specified by:
    getRecordReader in interface InputFormat<K,V>
Throws:
    IOException
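The wrap-and-delegate relationship between the returned version 1 RecordReader and the encapsulated version 2 RecordReader might be sketched as follows. The reader classes here are hypothetical stand-ins (the real MapReduce types additionally involve key/value holders and a TaskAttemptContext); only the delegation between the two calling conventions is illustrated.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RecordReaderDelegation {

    // Stand-in for a version 2 (YARN) reader: advance, then read the current value.
    static class V2Reader {
        private final Iterator<String> it;
        private String current;
        V2Reader(List<String> rows) { this.it = rows.iterator(); }
        boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        String getCurrentValue() { return current; }
    }

    // Stand-in for a version 1 reader: next(...) fills a caller-supplied holder.
    // It wraps and delegates to the version 2 reader, as described above.
    static class V1Reader {
        private final V2Reader v2;
        V1Reader(V2Reader v2) { this.v2 = v2; }
        boolean next(StringBuilder value) {
            if (!v2.nextKeyValue()) return false;
            value.setLength(0);
            value.append(v2.getCurrentValue());
            return true;
        }
    }

    public static void main(String[] args) {
        V1Reader r = new V1Reader(new V2Reader(Arrays.asList("a", "b")));
        StringBuilder v = new StringBuilder();
        while (r.next(v)) System.out.println(v); // prints "a" then "b"
    }
}
```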
-
getSplits
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
For the current query, returns an array containing the input splits to use when satisfying the query.

Implementation Note: this method first calls V1V2TableUtil.getInputFormat() which, in addition to constructing a TableInputFormat instance, also populates the splitMap. The splitMap constructed by V1V2TableUtil.getInputFormat() is then retrieved, and the keySet of that splitMap is used to populate the array returned by this method.

Specified by:
    getSplits in interface InputFormat<K,V>
Throws:
    IOException
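The pattern described in the implementation note -- populate a map whose keys are the version 1 splits, then return that map's keySet as an array -- can be sketched with hypothetical stand-ins. The map contents and names below are invented for illustration and are not the real oracle.kv types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GetSplitsSketch {

    // Hypothetical stand-in for the splitMap described above: each version 1
    // split (key) is associated with its fully populated version 2 split (value).
    static Map<String, String> splitMap() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("v1-split-0", "v2-split-0");
        m.put("v1-split-1", "v2-split-1");
        m.put("v1-split-2", "v2-split-2");
        return m;
    }

    // Mirrors the documented pattern: retrieve the map, then use its keySet
    // to populate the returned array.
    static String[] getSplits() {
        Map<String, String> map = splitMap();
        return map.keySet().toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(getSplits().length); // prints 3
    }
}
```

Because each version 1 key already carries its version 2 counterpart in the map, a later getRecordReader call can recover the version 2 state from the split it is handed.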