Class V1V2TableUtil


  • public final class V1V2TableUtil
    extends Object
    Utility class that provides static convenience methods for managing the interactions between version 1 and version 2 (YARN) MapReduce classes.
    • Method Detail

      • getSplitMap

        public static Map<TableHiveInputSplit,TableInputSplit> getSplitMap(JobConf jobConf,
                                                                                 TableHiveInputSplit inputSplit,
                                                                                 int queryBy,
                                                                                 String whereClause,
                                                                                 Integer shardKeyPartitionId)
                                                                          throws IOException
        For the current Hive query, returns a singleton Map that associates each version 1 InputSplit of the query with its corresponding version 2 InputSplit. If this is the first call since the query was entered on the command line and the job's input information was reset (using resetInputJobInfoForNewQuery), this method constructs and populates the returned Map; otherwise, it returns the previously constructed Map.

        Implementation Note: when the getInputFormat method of this class is called to retrieve the TableInputFormat instance, only the VERY FIRST call to getInputFormat constructs an instance of TableInputFormat; all subsequent calls return the instance created by that first call. More importantly, besides constructing the TableInputFormat instance, that first call to getInputFormat also constructs and populates the Map returned by this method, which it does by calling the getSplits method on the newly created TableInputFormat instance.

        Because the first call to getInputFormat has already called TableInputFormat.getSplits and placed the retrieved splits in the Map returned here, no further calls to TableInputFormat.getSplits are necessary. Such calls are not only unnecessary but should be avoided, because every call to TableInputFormat.getSplits results in remote calls to the KVStore, which can be very costly. In particular, one should NEVER make a call such as getInputFormat().getSplits(), since it may result in two successive calls to TableInputFormat.getSplits (one inside getInputFormat, then the explicit one). To avoid this, this method calls only getInputFormat (not getSplits()) to construct and populate the Map it returns.

        Throws:
        IOException
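        The cost argument above can be illustrated with a small, self-contained sketch. The classes are hypothetical stand-ins, not the real TableInputFormat or V1V2TableUtil: FakeTableInputFormat.getSplits merely counts its invocations in place of the remote KVStore calls, and splits are plain strings. Under those assumptions, requesting the split map any number of times costs one round of remote calls, while getInputFormat().getSplits() adds a redundant second one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the caching pattern; not the real V1V2TableUtil. */
class SplitCacheSketch {
    private static FakeTableInputFormat inputFormat;                        // built on the first call only
    private static final Map<String, String> splitMap = new LinkedHashMap<>(); // v1 split -> v2 split

    static synchronized FakeTableInputFormat getInputFormat() {
        if (inputFormat == null) {
            inputFormat = new FakeTableInputFormat();
            // The first call populates the map with exactly one getSplits call.
            for (String v2 : inputFormat.getSplits()) {
                splitMap.put(v2.replace("v2", "v1"), v2);
            }
        }
        return inputFormat;
    }

    static synchronized Map<String, String> getSplitMap() {
        getInputFormat(); // enough to build the map; getSplits() is deliberately not called here
        return splitMap;
    }

    public static void main(String[] args) {
        getSplitMap();
        getSplitMap();
        System.out.println(FakeTableInputFormat.remoteCalls); // prints 1
        getInputFormat().getSplits(); // anti-pattern: a redundant second round of remote calls
        System.out.println(FakeTableInputFormat.remoteCalls); // prints 2
    }
}

/** Hypothetical stand-in for TableInputFormat; getSplits models the costly remote KVStore calls. */
class FakeTableInputFormat {
    static int remoteCalls = 0; // number of simulated round trips to the store

    List<String> getSplits() {
        remoteCalls++;
        List<String> splits = new ArrayList<>();
        splits.add("v2-split-0");
        splits.add("v2-split-1");
        return splits;
    }
}
```

        The point of the design is that getSplitMap relies on the side effect of the first getInputFormat call instead of issuing its own getSplits request.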
      • getInputFormat

        public static InputFormat<PrimaryKey,Row> getInputFormat(JobConf jobConf,
                                                                       TableHiveInputSplit inputSplit,
                                                                       int queryBy,
                                                                       String whereClause,
                                                                       Integer shardKeyPartitionId)
                                                                throws IOException
        For the current Hive query, constructs and returns a YARN-based InputFormat that will be used when processing the query. This method also constructs and populates a singleton Map whose entries are key/value pairs in which each key is a version 1 split for the returned InputFormat, and each value is that key's corresponding version 2 split. Note that both the InputFormat and the Map are constructed only on the first call to this method for the given query; all subsequent calls return the original objects until the resetInputJobInfoForNewQuery method of this utility is called.
        Throws:
        IOException
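        A minimal sketch of the lifecycle described above, assuming hypothetical stand-in types (a plain Object in place of the InputFormat and String splits) rather than the real Hadoop and Oracle NoSQL classes. QueryInputState is an illustrative name only; its methods mirror the behavior stated here: the first call builds both objects, later calls return the same instances, and only a reset causes them to be rebuilt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the per-query singletons; not the real V1V2TableUtil. */
class QueryInputState {
    private static Object inputFormat;            // stands in for the YARN-based InputFormat
    private static Map<String, String> splitMap;  // version 1 split -> version 2 split
    static int constructions = 0;                 // how many times the objects were built

    static synchronized Object getInputFormat() {
        if (inputFormat == null) {                // first call for this query
            inputFormat = new Object();
            Map<String, String> m = new HashMap<>();
            m.put("v1-split-0", "v2-split-0");
            splitMap = Collections.unmodifiableMap(m);
            constructions++;
        }
        return inputFormat;                       // later calls reuse the original instance
    }

    static synchronized Map<String, String> getSplitMap() {
        getInputFormat();                         // builds the map on the first call only
        return splitMap;
    }

    static synchronized void resetInputJobInfoForNewQuery() {
        inputFormat = null;                       // the next call starts from scratch
        splitMap = null;
    }

    public static void main(String[] args) {
        Object first = getInputFormat();
        System.out.println(first == getInputFormat()); // true: same instance
        resetInputJobInfoForNewQuery();
        System.out.println(first == getInputFormat()); // false: rebuilt after reset
        System.out.println(constructions);             // 2
    }
}
```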
      • resetInputJobInfoForNewQuery

        public static void resetInputJobInfoForNewQuery()
        Clears and resets the information related to the current job's input classes.

        This method must be called before each new query is entered on the command line, to reset the splits as well as the InputFormats participating in the job. Note that the Hive infrastructure and BigDataSQL follow different code paths when initializing the query state set in TableStorageHandlerBase: for a Hive-only query, decomposePredicate is called followed by configureJobProperties, whereas for a BigDataSQL query, configureJobProperties is called followed by decomposePredicate. Because neither method is guaranteed to run first, the reset cannot reliably be performed at the start of query setup; instead, this method must be invoked after processing of the current query has completed, for example in the close method of the record reader.
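        The recommended placement can be sketched as follows. All names here are hypothetical: SplitCache models the per-query cache with a single String split per query, and QueryRecordReader stands in for the record reader whose close method performs the reset. Under these assumptions, closing the reader after each query gives the next query fresh splits, while skipping the reset lets a later query see the previous query's cached splits.

```java
import java.util.ArrayList;
import java.util.List;

class ResetOnCloseDemo {
    public static void main(String[] args) {
        try (QueryRecordReader r = new QueryRecordReader("id > 10")) {
            System.out.println(r.splits()); // [split for: id > 10]
        }
        try (QueryRecordReader r = new QueryRecordReader("id < 5")) {
            System.out.println(r.splits()); // [split for: id < 5]
        }
        // Without a reset between queries, the second query reuses stale splits.
        QueryRecordReader first = new QueryRecordReader("a");
        QueryRecordReader second = new QueryRecordReader("b");
        System.out.println(second.splits()); // [split for: a]
    }
}

/** Hypothetical per-query cache modeled on the behavior described above. */
class SplitCache {
    private static List<String> splits; // built on the first call for a query

    static synchronized List<String> getSplits(String whereClause) {
        if (splits == null) {
            splits = new ArrayList<>();
            splits.add("split for: " + whereClause);
        }
        return splits; // later calls return the cached list, even for a different query
    }

    static synchronized void resetInputJobInfoForNewQuery() {
        splits = null;
    }
}

/** Hypothetical record reader that resets the per-query state when the query finishes. */
class QueryRecordReader implements AutoCloseable {
    private final List<String> splits;

    QueryRecordReader(String whereClause) {
        this.splits = SplitCache.getSplits(whereClause);
    }

    List<String> splits() {
        return splits;
    }

    @Override
    public void close() {
        // Runs after the query has completed, so it does not depend on the order in
        // which decomposePredicate and configureJobProperties were called.
        SplitCache.resetInputJobInfoForNewQuery();
    }
}
```

        Because close runs only once the query has finished, the reset works the same way for both setup orderings; the last query in main shows the stale result that appears when no reset happens.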