public class TableHiveInputFormat<K,V> extends Object implements InputFormat<K,V>
Note that although this class is a version 1 InputFormat, it also creates and manages an instance of a version 2 InputFormat, so that it can reuse the mechanisms provided by the Hadoop integration classes (in package oracle.kv.hadoop.table).
Constructor and Description |
---|
TableHiveInputFormat() |
Modifier and Type | Method and Description |
---|---|
RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) Returns the RecordReader for the given InputSplit. |
InputSplit[] | getSplits(JobConf job, int numSplits) Returns an array containing the input splits for the given job. |
public RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Note that the returned RecordReader is based on version 1 of MapReduce, but wraps and delegates to a YARN-based (MapReduce version 2) RecordReader. This is done because the RecordReader provided for Hadoop integration is YARN-based, whereas the Hive infrastructure requires a version 1 RecordReader.
Specified by: getRecordReader in interface InputFormat<K,V>
Throws: IOException
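The wrap-and-delegate arrangement described for getRecordReader can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop or Oracle NoSQL classes: V2StyleReader, V1StyleReader, Holder, V1OverV2Reader and ListBackedV2Reader are hypothetical, simplified stand-ins invented for this illustration, loosely modeled on the two reader styles (version 2 advances and then exposes the current pair; version 1 fills objects supplied by the caller).

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a version 2 style reader: advance, then read the current pair. */
interface V2StyleReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
}

/** Hypothetical mutable holder, standing in for the reusable objects a version 1 caller passes in. */
final class Holder<T> {
    T value;
}

/** Hypothetical stand-in for a version 1 style reader: next() fills the caller's holders. */
interface V1StyleReader<K, V> {
    boolean next(Holder<K> key, Holder<V> value);
}

/** Version 1 style reader that wraps a version 2 style reader and delegates every read to it. */
final class V1OverV2Reader<K, V> implements V1StyleReader<K, V> {
    private final V2StyleReader<K, V> delegate;

    V1OverV2Reader(V2StyleReader<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean next(Holder<K> key, Holder<V> value) {
        if (!delegate.nextKeyValue()) {
            return false; // the wrapped reader is exhausted
        }
        // Copy the wrapped reader's current pair into the caller-supplied holders.
        key.value = delegate.getCurrentKey();
        value.value = delegate.getCurrentValue();
        return true;
    }
}

/** Hypothetical in-memory version 2 style reader, used only to exercise the wrapper. */
final class ListBackedV2Reader<K, V> implements V2StyleReader<K, V> {
    private final Iterator<Map.Entry<K, V>> rows;
    private Map.Entry<K, V> current;

    ListBackedV2Reader(List<Map.Entry<K, V>> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean nextKeyValue() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    @Override
    public K getCurrentKey() {
        return current.getKey();
    }

    @Override
    public V getCurrentValue() {
        return current.getValue();
    }
}
```

A caller written against the version 1 style interface never sees the wrapped reader; it only calls next() until it returns false. The real classes carry additional responsibilities (progress reporting, position, closing resources) that this sketch leaves out.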
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Implementation Note: when this method calls V1V2TableUtil.getInputFormat() to retrieve the TableInputFormat instance to use for a given query, only the very first call (after the query has been entered on the command line and the input info for the job has been reset) constructs an instance of TableInputFormat; all additional calls made while that query is executing return the instance created by that first call.
In addition to constructing the TableInputFormat instance, that first call to V1V2TableUtil.getInputFormat() also populates the splitMap, by calling getSplits() on the newly created instance. Because the retrieved splits are already in the splitMap, further calls to TableInputFormat.getSplits() are unnecessary and should be avoided: any such call results in remote calls to the KVStore, which can be very costly. Thus, one should never make a call such as V1V2TableUtil.getInputFormat().getSplits(), as such a call may result in two successive calls to TableInputFormat.getSplits().
To retrieve and return the desired splits, use a two-step process like the following instead:
1. Call V1V2TableUtil.getInputFormat(), which always returns the same instance of TableInputFormat when called repeatedly.
2. Call V1V2TableUtil.getSplitMap(), then retrieve and return the desired splits from the returned map.
Specified by: getSplits in interface InputFormat<K,V>
Throws: IOException
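The caching behavior and the two-step lookup described in the implementation note for getSplits can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: FakeTableInputFormat, FakeTableUtil and SplitLookup are hypothetical stand-ins for TableInputFormat, V1V2TableUtil and the calling code, the real method signatures and the real key and value types of the splitMap are not shown in this page, and a simple counter takes the place of the remote calls to the KVStore.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for TableInputFormat; counts each costly getSplits() call. */
final class FakeTableInputFormat {
    static int remoteCalls = 0;

    List<String> getSplits() {
        remoteCalls++; // stands in for the remote calls to the KVStore
        return List.of("split-0", "split-1");
    }
}

/** Hypothetical stand-in for V1V2TableUtil: the first call builds the format and fills the map. */
final class FakeTableUtil {
    private static FakeTableInputFormat inputFormat;
    private static final Map<Integer, String> splitMap = new LinkedHashMap<>();

    static synchronized FakeTableInputFormat getInputFormat() {
        if (inputFormat == null) {
            // Only the very first call constructs the instance and retrieves the splits.
            inputFormat = new FakeTableInputFormat();
            List<String> splits = inputFormat.getSplits();
            for (int i = 0; i < splits.size(); i++) {
                splitMap.put(i, splits.get(i));
            }
        }
        return inputFormat;
    }

    static synchronized Map<Integer, String> getSplitMap() {
        return Collections.unmodifiableMap(splitMap);
    }

    /** Assumed reset hook, standing in for the job input info being reset between queries. */
    static synchronized void reset() {
        inputFormat = null;
        splitMap.clear();
    }
}

/** Hypothetical caller that follows the two-step process. */
final class SplitLookup {
    static List<String> splitsForQuery() {
        FakeTableUtil.getInputFormat();                                // step 1: ensure the map is populated
        return new ArrayList<>(FakeTableUtil.getSplitMap().values()); // step 2: read the cached splits
    }
}
```

In this sketch, calling SplitLookup.splitsForQuery() any number of times during one query increments the counter once, whereas FakeTableUtil.getInputFormat().getSplits() increments it on every call, which mirrors the reason the note gives for avoiding the chained call.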
Copyright (c) 2011, 2015 Oracle and/or its affiliates. All rights reserved.