Class TableHiveInputFormat<K,V>
- java.lang.Object
  - oracle.kv.hadoop.hive.table.TableHiveInputFormat<K,V>
- All Implemented Interfaces:
InputFormat<K,V>
public class TableHiveInputFormat<K,V> extends Object implements InputFormat<K,V>
A Hadoop MapReduce version 1 InputFormat class for reading data from an Oracle NoSQL Database when processing a Hive query against data written to that database using the Table API.
Note that although this class is a version 1 InputFormat, it also creates and manages an instance of a version 2 InputFormat so that it can reuse the mechanisms provided by the Hadoop integration classes (in package oracle.kv.hadoop.table).
Note on Logging
Two loggers are currently employed by this class:
- One logger based on Log4j version 1, accessed via the org.apache.commons.logging wrapper.
- One logger based on the Log4j2 API.
Constructor Summary
- TableHiveInputFormat()
Method Summary
All Methods (Instance Methods, Concrete Methods)
- RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter): Returns the RecordReader for the given InputSplit.
- InputSplit[] getSplits(JobConf job, int numSplits): Returns an array containing the input splits for the given job.
Method Detail
getRecordReader
public RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Returns the RecordReader for the given InputSplit.
Note that the RecordReader returned by this method is based on version 1 of MapReduce, but wraps and delegates to a YARN-based (MapReduce version 2) RecordReader. This is done because the RecordReader provided for Hadoop integration is YARN-based, whereas the Hive infrastructure requires a version 1 RecordReader.
Additionally, note that when query execution occurs via a MapReduce job, this method is invoked by backend processes running on each DataNode in the Hadoop cluster, where the splits are distributed across the DataNodes. When the query is simple enough to be executed by the Hive infrastructure from data in the metastore (that is, without MapReduce), this method is invoked by the frontend Hive processes, once for each split. For example, if there are 6 splits and the query is executed via a MapReduce job employing only 3 DataNodes, then each DataNode will invoke this method twice, once for each of its 2 splits. On the other hand, if MapReduce is not employed, then the Hive frontend will invoke this method 6 separate times, once per split. In either case, when this method is invoked, the given version 1 split already encapsulates a fully populated version 2 split, and the state of that encapsulated version 2 split can be used to construct the necessary version 1 RecordReader encapsulating a fully functional version 2 RecordReader, as required by YARN.
- Specified by:
getRecordReader in interface InputFormat<K,V>
- Throws:
IOException
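The wrap-and-delegate arrangement described for getRecordReader can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop or Oracle NoSQL classes: V1StyleReader, V2StyleReader, ListBackedV2Reader, V1ToV2ReaderAdapter, and ReaderAdapterDemo are hypothetical stand-ins that only mimic the calling conventions of the two MapReduce generations (a version 1 reader fills a caller-supplied holder in next(); a version 2 reader advances with nextKeyValue() and exposes the current value separately). The real reader classes are likely to differ in detail.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a version 2 style reader: advance, then fetch.
interface V2StyleReader {
    boolean nextKeyValue() throws IOException;
    String getCurrentValue();
    void close() throws IOException;
}

// Hypothetical stand-in for a version 1 style reader: next() fills a holder.
interface V1StyleReader {
    boolean next(StringBuilder value) throws IOException;
    void close() throws IOException;
}

// Toy version 2 style reader backed by an in-memory list (assumption: real
// readers would fetch rows from the store instead).
final class ListBackedV2Reader implements V2StyleReader {
    private final Iterator<String> rows;
    private String current;

    ListBackedV2Reader(List<String> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean nextKeyValue() {
        if (!rows.hasNext()) {
            current = null;
            return false;
        }
        current = rows.next();
        return true;
    }

    @Override
    public String getCurrentValue() {
        return current;
    }

    @Override
    public void close() {
        // Nothing to release in this toy reader.
    }
}

// Version 1 style facade that delegates every call to a version 2 style reader.
final class V1ToV2ReaderAdapter implements V1StyleReader {
    private final V2StyleReader delegate;

    V1ToV2ReaderAdapter(V2StyleReader delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean next(StringBuilder value) throws IOException {
        if (!delegate.nextKeyValue()) {
            return false;
        }
        // Copy the delegate's current value into the caller-supplied holder.
        value.setLength(0);
        value.append(delegate.getCurrentValue());
        return true;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}

final class ReaderAdapterDemo {
    // Drains a version 1 style reader the way a version 1 caller would.
    static List<String> readAll(V1StyleReader reader) {
        List<String> out = new ArrayList<>();
        StringBuilder holder = new StringBuilder();
        try {
            while (reader.next(holder)) {
                out.add(holder.toString());
            }
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        V2StyleReader v2 = new ListBackedV2Reader(Arrays.asList("row-1", "row-2", "row-3"));
        System.out.println(readAll(new V1ToV2ReaderAdapter(v2)));
    }
}
```

The caller only ever sees the version 1 style interface, while all row retrieval happens in the wrapped version 2 style reader.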
getSplits
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Returns an array containing the input splits for the given job.
Implementation Note: when this method calls V1V2TableUtil.getInputFormat() to retrieve the TableInputFormat instance to use for a given query, only the very first call (after the query has been entered on the command line and the input info for the job has been reset) constructs an instance of TableInputFormat; all additional calls made while that query is executing return the instance created by that first call. In addition to constructing the TableInputFormat instance, that first call also populates the splitMap by calling getSplits() on the newly created instance. Because the retrieved splits are already in the splitMap, further calls to TableInputFormat.getSplits() are not only unnecessary but should be avoided, since each one makes additional remote calls to KVStore. In particular, avoid calls such as V1V2TableUtil.getInputFormat().getSplits(), which may result in two successive calls to TableInputFormat.getSplits(). Instead, use a two-step process like the following to retrieve and return the desired splits:
1. Call V1V2TableUtil.getInputFormat(), which always returns the same instance of TableInputFormat when called repeatedly.
2. Call V1V2TableUtil.getSplitMap(), then retrieve and return the desired splits from the returned map.
- Specified by:
getSplits in interface InputFormat<K,V>
- Throws:
IOException
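The caching behavior in the implementation note above can be modeled with a small, self-contained sketch. SketchInputFormat and SketchTableUtil are hypothetical stand-ins for TableInputFormat and V1V2TableUtil; their method signatures, the Integer keys of the split map, and the remoteCalls counter are assumptions made for illustration and are not taken from the real classes. The sketch only shows why chaining getInputFormat().getSplits() can trigger two remote retrievals, while the two-step process triggers one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TableInputFormat; counts simulated remote calls.
final class SketchInputFormat {
    static int remoteCalls = 0;

    List<String> getSplits() {
        remoteCalls++; // stands in for a remote call to the store
        return Arrays.asList("split-0", "split-1", "split-2");
    }
}

// Hypothetical stand-in for V1V2TableUtil: one cached instance per query.
final class SketchTableUtil {
    private static SketchInputFormat inputFormat;
    private static final Map<Integer, String> SPLIT_MAP = new LinkedHashMap<>();

    // The first call constructs the instance and populates the split map;
    // later calls return the same instance without touching the store.
    static synchronized SketchInputFormat getInputFormat() {
        if (inputFormat == null) {
            inputFormat = new SketchInputFormat();
            List<String> splits = inputFormat.getSplits();
            for (int i = 0; i < splits.size(); i++) {
                SPLIT_MAP.put(i, splits.get(i));
            }
        }
        return inputFormat;
    }

    static synchronized Map<Integer, String> getSplitMap() {
        return Collections.unmodifiableMap(SPLIT_MAP);
    }

    // Models the reset of the job's input info when a new query is entered.
    static synchronized void reset() {
        inputFormat = null;
        SPLIT_MAP.clear();
        SketchInputFormat.remoteCalls = 0;
    }
}

final class SplitRetrievalDemo {
    // Two-step process: ensure the instance exists, then read the cached map.
    static List<String> recommended() {
        SketchTableUtil.getInputFormat();
        return new ArrayList<>(SketchTableUtil.getSplitMap().values());
    }

    // Chained call: on a fresh query this retrieves the splits twice.
    static List<String> discouraged() {
        return SketchTableUtil.getInputFormat().getSplits();
    }

    public static void main(String[] args) {
        SketchTableUtil.reset();
        recommended();
        System.out.println("two-step remote calls: " + SketchInputFormat.remoteCalls);

        SketchTableUtil.reset();
        discouraged();
        System.out.println("chained remote calls: " + SketchInputFormat.remoteCalls);
    }
}
```

In this model the two-step path costs one simulated remote call per query no matter how often it is repeated, whereas the chained call adds one more on every invocation.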