public final class V1V2TableUtil extends Object
Modifier and Type | Method and Description |
---|---|
static InputFormat<PrimaryKey,Row> | getInputFormat(JobConf jobConf, int queryBy, String whereClause, Integer shardKeyPartitionId) |
static InputFormat<PrimaryKey,Row> | getInputFormat(JobConf jobConf, TableHiveInputSplit inputSplit, int queryBy, String whereClause, Integer shardKeyPartitionId) For the current Hive query, constructs and returns a YARN based InputFormat class that will be used when processing the query. |
static Map<TableHiveInputSplit,TableInputSplit> | getSplitMap(JobConf jobConf, int queryBy, String whereClause, Integer shardKeyPartitionId) |
static Map<TableHiveInputSplit,TableInputSplit> | getSplitMap(JobConf jobConf, TableHiveInputSplit inputSplit, int queryBy, String whereClause, Integer shardKeyPartitionId) For the current Hive query, returns a singleton collection that maps each version 1 InputSplit for the query to its corresponding version 2 InputSplit. |
static void | resetInputJobInfoForNewQuery() Clears and resets the information related to the current job's input classes. |
public static Map<TableHiveInputSplit,TableInputSplit> getSplitMap(JobConf jobConf, TableHiveInputSplit inputSplit, int queryBy, String whereClause, Integer shardKeyPartitionId) throws IOException
Implementation Note: When the getInputFormat method of this class is called to retrieve the TableInputFormat instance, only the very first call constructs an instance of TableInputFormat; every later call returns the instance created by that first call. More importantly, in addition to constructing the TableInputFormat instance, that first call also constructs and populates the Map returned by this method, by calling the getSplits method on the newly created instance.
Because the first call to the getInputFormat method of this class has
already called TableInputFormat.getSplits and placed the retrieved
splits in the Map returned here, no additional calls to
TableInputFormat.getSplits are needed, and such calls should be
avoided: every call to TableInputFormat.getSplits results in remote
calls to the KVStore, which can be very costly.
One should therefore NEVER make a call such as
getInputFormat().getSplits()
because it may result in two successive calls to
TableInputFormat.getSplits. To avoid that situation, this method
calls only getInputFormat (not getSplits()) to construct and populate
the Map it returns.
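The caching behaviour in this note can be illustrated with a minimal, self-contained sketch. It does not use the real TableInputFormat or KVStore: the class CachedSplitsSketch, the stand-in FakeInputFormat, the String-typed splits, the getSplitsCalls counter, and the reset method are all hypothetical names introduced only for illustration. It assumes the first getInputFormat call creates the instance and fills the map, and that getSplitMap obtains the map by calling getInputFormat alone.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the "construct once, populate the map once" pattern. */
public final class CachedSplitsSketch {

    /** Counts how often the (simulated) costly getSplits call runs. */
    public static int getSplitsCalls = 0;

    /** Stand-in for TableInputFormat; not the real class. */
    public static final class FakeInputFormat {
        /** Simulates the remote calls to the store. */
        public String[] getSplits() {
            getSplitsCalls++;
            return new String[] {"split-0", "split-1"};
        }
    }

    private static FakeInputFormat inputFormat; // created by the first call only
    private static final Map<String, String> splitMap = new LinkedHashMap<>();

    /** First call builds the format and fills the map; later calls reuse both. */
    public static synchronized FakeInputFormat getInputFormat() {
        if (inputFormat == null) {
            inputFormat = new FakeInputFormat();
            for (String v2Split : inputFormat.getSplits()) {
                // Hypothetical v1 key mapped to its v2 split.
                splitMap.put("v1:" + v2Split, v2Split);
            }
        }
        return inputFormat;
    }

    /** Calls getInputFormat() only, never getSplits(), to obtain the map. */
    public static synchronized Map<String, String> getSplitMap() {
        getInputFormat();
        return Collections.unmodifiableMap(splitMap);
    }

    /** Clears the cached state so the next query starts fresh. */
    public static synchronized void reset() {
        inputFormat = null;
        splitMap.clear();
    }

    public static void main(String[] args) {
        getInputFormat();
        getInputFormat();
        Map<String, String> map = getSplitMap();
        System.out.println("splits in map: " + map.size());
        System.out.println("costly getSplits calls: " + getSplitsCalls);
    }
}
```

In this sketch the counter stays at one no matter how many times getInputFormat or getSplitMap is called, whereas a caller writing getInputFormat().getSplits() would trigger a second simulated remote call.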
Throws: IOException
public static Map<TableHiveInputSplit,TableInputSplit> getSplitMap(JobConf jobConf, int queryBy, String whereClause, Integer shardKeyPartitionId) throws IOException
Throws: IOException
public static InputFormat<PrimaryKey,Row> getInputFormat(JobConf jobConf, TableHiveInputSplit inputSplit, int queryBy, String whereClause, Integer shardKeyPartitionId) throws IOException
Throws: IOException
public static InputFormat<PrimaryKey,Row> getInputFormat(JobConf jobConf, int queryBy, String whereClause, Integer shardKeyPartitionId) throws IOException
Throws: IOException
public static void resetInputJobInfoForNewQuery()
This method must be called before each new query is entered on the command line, to reset the splits as well as the InputFormats participating in the job. Note that the Hive infrastructure and BigDataSQL employ different code paths when initializing the query state set in TableStorageHandlerBase. For a Hive-only query, the path consists of decomposePredicate followed by configureJobProperties; for a BigDataSQL query, it consists of configureJobProperties followed by decomposePredicate. Because neither method is reliably called first, the reset cannot be tied to either of them; instead, this method must be invoked after processing of the current query has completed, for example in the close method of the record reader.
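The reset-on-close approach can be sketched as follows. This is an illustration under stated assumptions, not the actual TableStorageHandlerBase or record reader code: ResetOnCloseSketch, queryState, SketchRecordReader, and resetForNewQuery are hypothetical names, and the two static methods merely mimic the fact that decomposePredicate and configureJobProperties may run in either order.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: reset per-query state when the record reader closes. */
public final class ResetOnCloseSketch {

    /** Stand-in for the cached splits and InputFormats of the current query. */
    public static final List<String> queryState = new ArrayList<>();

    /** Mimics predicate handling; either of these two methods may run first. */
    public static void decomposePredicate(String whereClause) {
        queryState.add("predicate:" + whereClause);
    }

    /** Mimics job configuration. */
    public static void configureJobProperties(String table) {
        queryState.add("table:" + table);
    }

    /** Plays the role of resetInputJobInfoForNewQuery in this sketch. */
    public static void resetForNewQuery() {
        queryState.clear();
    }

    /** Hypothetical record reader whose close() performs the reset. */
    public static final class SketchRecordReader implements Closeable {
        @Override
        public void close() {
            // Runs after query processing, regardless of initialization order.
            resetForNewQuery();
        }
    }

    public static void main(String[] args) {
        // Hive-style order: decomposePredicate, then configureJobProperties.
        decomposePredicate("id > 10");
        configureJobProperties("users");
        try (SketchRecordReader reader = new SketchRecordReader()) {
            System.out.println("entries during query: " + queryState.size());
        }
        System.out.println("state cleared: " + queryState.isEmpty());

        // BigDataSQL-style order: configureJobProperties, then decomposePredicate.
        configureJobProperties("users");
        decomposePredicate("id > 10");
        try (SketchRecordReader reader = new SketchRecordReader()) {
            System.out.println("entries during query: " + queryState.size());
        }
        System.out.println("state cleared: " + queryState.isEmpty());
    }
}
```

Placing the reset in close() means it does not matter which initialization method ran first; the state is empty by the time the next query begins.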
Copyright (c) 2011, 2017 Oracle and/or its affiliates. All rights reserved.