Oracle NoSQL Database
version 11gR2.2.0.26

oracle.kv.hadoop
Class KVInputFormatBase<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<K,V>
      extended by oracle.kv.hadoop.KVInputFormatBase<K,V>
Direct Known Subclasses:
KVAvroInputFormat, KVInputFormat

public abstract class KVInputFormatBase<K,V>
extends org.apache.hadoop.mapreduce.InputFormat<K,V>

This is the base class for Oracle NoSQL Database InputFormat classes. Key and value types are determined by the specific subclass.

Parameters may be passed using either the static setters on this class or through the Hadoop JobContext configuration parameters. The recognized parameters are those named in the Method Detail entries for the setters.

Internally, the KVInputFormatBase class utilizes KVStore.storeIterator to retrieve records. You should refer to the javadoc for that method for information about the various parameters.
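As a rough sketch of the second route for passing parameters (the Hadoop configuration), the property names documented in the method details can be collected and rendered as generic options. It uses only the JDK. The class name, the store name "mystore", and the helper host "kvhost1:5000" are hypothetical placeholders, and it assumes the job's driver accepts Hadoop generic options of the form -D name=value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KVJobPropertiesSketch {

    /** Example settings; the store name and helper host are placeholder values. */
    static Map<String, String> sampleProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("oracle.kv.kvstore", "mystore");     // counterpart of setKVStoreName
        props.put("oracle.kv.hosts", "kvhost1:5000");  // counterpart of setKVHelperHosts
        props.put("oracle.kv.batchSize", "0");         // 0 selects the internal default
        return props;
    }

    /** Renders each entry as the two tokens "-D" and "name=value", in insertion order. */
    static List<String> toGenericOptions(Map<String, String> props) {
        List<String> options = new ArrayList<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            options.add("-D");
            options.add(entry.getKey() + "=" + entry.getValue());
        }
        return options;
    }

    public static void main(String[] args) {
        // Prints the options on one line, ready to append to a job command line.
        System.out.println(String.join(" ", toGenericOptions(sampleProperties())));
    }
}
```

The same names and values could instead be set on the job's Configuration before submission, or replaced by calls to the static setters.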

KVInputFormatBase creates one split per Oracle NoSQL DB partition. The location value for each split is an array of hosts holding the partition. If the consistency passed to KVInputFormatBase is NONE_REQUIRED (the default), then InputSplit.getLocations() will return an array of the names of the master and the replica(s) which contain the partition. If the consistency is ABSOLUTE, then locations will only contain the name of the master. This means that if Hadoop job trackers are running on the nodes named in the location array, Hadoop will generally attempt to run the subtasks for a particular partition on those nodes where the data is stored and replicated. Hadoop and Oracle NoSQL DB administrators should be careful about co-locating Oracle NoSQL DB and Hadoop processes, since they may compete for resources.
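The location rule described above can be modelled in a few lines. This is an illustrative sketch, not the library's implementation: the ReadConsistency enum is a hypothetical stand-in for oracle.kv.Consistency, and the host names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitLocationsSketch {

    /** Hypothetical stand-in for the two consistency settings discussed above. */
    enum ReadConsistency { NONE_REQUIRED, ABSOLUTE }

    /**
     * Returns the hosts a split for one partition would report:
     * master plus replicas for NONE_REQUIRED, master only for ABSOLUTE.
     */
    static String[] locations(ReadConsistency consistency, String master, List<String> replicas) {
        List<String> hosts = new ArrayList<>();
        hosts.add(master);
        if (consistency == ReadConsistency.NONE_REQUIRED) {
            hosts.addAll(replicas);
        }
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("node2", "node3");
        System.out.println(Arrays.toString(locations(ReadConsistency.NONE_REQUIRED, "node1", replicas)));
        System.out.println(Arrays.toString(locations(ReadConsistency.ABSOLUTE, "node1", replicas)));
    }
}
```

With ABSOLUTE, every subtask for a partition is steered toward a single host, so data locality is traded against spreading load across the replicas.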

Partitions in Oracle NoSQL DB are considered to be roughly equal in size; therefore InputSplit.getLength() always returns 1.

A simple example demonstrating the Oracle NoSQL DB Hadoop oracle.kv.hadoop.KVInputFormat class can be found in the KVHOME/examples/hadoop directory. It reads data from the store in a Map/Reduce job and counts the number of records for each major key. The javadoc for that program describes the simple Map/Reduce processing as well as how to invoke the program in Hadoop.

Since:
2.0

Method Summary
static void setBatchSize(int batchSize)
          Specifies the suggested number of keys to fetch during each network round trip by the InputFormat.
static void setConsistency(Consistency consistency)
          Specifies the read consistency associated with the lookup of the child KV pairs.
static void setDepth(Depth depth)
          Specifies whether the parent, and only its children or all of its descendants, are returned.
static void setDirection(Direction direction)
          Specifies the order in which records are returned by the InputFormat.
static void setFormatterClassName(String formatterClassName)
          Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords.
static void setKVHelperHosts(String[] kvHelperHosts)
          Set the KV Helper host:port pair(s) for this InputFormat to operate on.
static void setKVStoreName(String kvStoreName)
          Set the KV Store name for this InputFormat to operate on.
static void setParentKey(Key parentKey)
          Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat.
static void setSubRange(KeyRange subRange)
          Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range.
static void setTimeout(long timeout)
          Specifies an upper bound on the time interval for processing a particular KV retrieval.
static void setTimeoutUnit(TimeUnit timeoutUnit)
          Specifies the unit of the timeout parameter.
 
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader, getSplits
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

setKVStoreName

public static void setKVStoreName(String kvStoreName)
Set the KV Store name for this InputFormat to operate on. This is equivalent to passing the oracle.kv.kvstore Hadoop property.

Parameters:
kvStoreName - the KV Store name

setKVHelperHosts

public static void setKVHelperHosts(String[] kvHelperHosts)
Set the KV Helper host:port pair(s) for this InputFormat to operate on. This is equivalent to passing the oracle.kv.hosts Hadoop property.

Parameters:
kvHelperHosts - array of hostname:port strings of any hosts in the KV Store.
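Because each entry must be a hostname:port string, it can help to check the array before handing it to setKVHelperHosts. The following is a minimal sketch using only the JDK. The class and method names are hypothetical, and the checks are an assumption about what counts as well formed, not a description of the validation the library performs.

```java
class HelperHostsSketch {

    /** Extracts the port from one "hostname:port" entry, rejecting malformed input. */
    static int portOf(String hostPort) {
        if (hostPort == null) {
            throw new IllegalArgumentException("helper host entry is null");
        }
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected hostname:port but got: " + hostPort);
        }
        // NumberFormatException is an IllegalArgumentException, so callers see one exception type.
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range in: " + hostPort);
        }
        return port;
    }

    /** Checks every entry and returns the same array for convenient chaining. */
    static String[] checked(String[] helperHosts) {
        if (helperHosts == null || helperHosts.length == 0) {
            throw new IllegalArgumentException("at least one helper host is required");
        }
        for (String entry : helperHosts) {
            portOf(entry);
        }
        return helperHosts;
    }

    public static void main(String[] args) {
        String[] hosts = checked(new String[] {"kvhost1:5000", "kvhost2:5000"});
        System.out.println(hosts.length + " helper hosts accepted");
    }
}
```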

setDirection

public static void setDirection(Direction direction)
Specifies the order in which records are returned by the InputFormat. Only Direction.FORWARD is allowed.

Parameters:
direction - the direction to retrieve data

setBatchSize

public static void setBatchSize(int batchSize)
Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to passing the oracle.kv.batchSize Hadoop property.

Parameters:
batchSize - the suggested number of keys to fetch during each network round trip.

setParentKey

public static void setParentKey(Key parentKey)
Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. null will result in fetching all keys in the store. If non-null, the major key path must be a partial path and the minor key path must be empty. This is equivalent to passing the oracle.kv.parentKey Hadoop property.

Parameters:
parentKey - the parentKey

setSubRange

public static void setSubRange(KeyRange subRange)
Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. It may be null. This is equivalent to passing the oracle.kv.subRange Hadoop property.

Parameters:
subRange - the sub range.

setDepth

public static void setDepth(Depth depth)
Specifies whether the parent, and only its children or all of its descendants, are returned. If null, Depth.PARENT_AND_DESCENDANTS is implied. This is equivalent to passing the oracle.kv.depth Hadoop property.

Parameters:
depth - the depth.

setConsistency

public static void setConsistency(Consistency consistency)
Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to passing the oracle.kv.consistency Hadoop property.

Parameters:
consistency - the consistency

setTimeout

public static void setTimeout(long timeout)
Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made to not exceed the specified limit. If zero, the default request timeout is used. This is equivalent to passing the oracle.kv.timeout Hadoop property.

Parameters:
timeout - the timeout

setTimeoutUnit

public static void setTimeoutUnit(TimeUnit timeoutUnit)
Specifies the unit of the timeout parameter. It may be null only if timeout is zero. This is equivalent to passing the oracle.kv.timeoutUnit Hadoop property.

Parameters:
timeoutUnit - the timeout unit
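The timeout and its unit are set separately but are interpreted together. The sketch below models the stated rules with java.util.concurrent.TimeUnit: a zero timeout means the default request timeout, and the unit may be null only in that case. The class and method names are hypothetical, and rejecting negative values is an assumption rather than documented behavior.

```java
import java.util.concurrent.TimeUnit;

class TimeoutSketch {

    /**
     * Converts a timeout/unit pair to milliseconds.
     * Returns 0 when the timeout is zero, meaning "use the default request timeout".
     */
    static long toMillis(long timeout, TimeUnit unit) {
        if (timeout < 0) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        if (timeout == 0) {
            return 0; // unit is irrelevant here and may be null
        }
        if (unit == null) {
            throw new IllegalArgumentException("timeoutUnit is required when timeout is non-zero");
        }
        return unit.toMillis(timeout);
    }

    public static void main(String[] args) {
        System.out.println(toMillis(30, TimeUnit.SECONDS));
        System.out.println(toMillis(0, null));
    }
}
```

When using the setters, calling setTimeout and setTimeoutUnit together keeps the pair consistent.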

setFormatterClassName

public static void setFormatterClassName(String formatterClassName)
Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords.

Parameters:
formatterClassName - the name of the class implementing AvroFormatter.

Copyright (c) 2011, 2013 Oracle and/or its affiliates. All rights reserved.