public abstract class KVInputFormatBase<K,V> extends org.apache.hadoop.mapreduce.InputFormat<K,V>
Parameters may be passed using either the static setters on this class or through the Hadoop JobContext configuration parameters. The following parameters are recognized:
oracle.kv.kvstore- the KV Store name for this InputFormat to operate on. This is equivalent to the setKVStoreName method.
oracle.kv.hosts- one or more hostname:port pairs separated by commas naming hosts in the KV Store. This is equivalent to the setKVHelperHosts method.
oracle.kv.batchSize- Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to the setBatchSize method.
oracle.kv.parentKey- Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. A null value results in fetching all keys in the store. If non-null, the major key path must be a partial path and the minor key path must be empty. This is equivalent to the setParentKey method.
oracle.kv.subRange- Specifies a sub range that further restricts the range under the parentKey to the major path components in this sub range. It may be null. This is equivalent to the setSubRange method.
oracle.kv.depth- Specifies whether the parent, only its children, or all descendants are returned. If null, Depth.PARENT_AND_DESCENDANTS is implied. This is equivalent to the setDepth method.
oracle.kv.consistency- Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to the setConsistency method.
oracle.kv.timeout- Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made to not exceed the specified limit. If zero, the default request timeout is used. This value is always in milliseconds. This is equivalent to the setTimeout method.
oracle.kv.formatterClass- Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords. This is only meaningful when
One case where specifying a value for this parameter is useful is when you are using Oracle Loader for Hadoop (OLH) to read Avro records from Oracle NoSQL Database. Since the Avro records (the NoSQL Database record values) are passed directly to OLH, the NoSQL Database keys are not available for mapping into the target Oracle Database table. However, the formatter class is passed both the NoSQL Database key and value so a new Avro record containing both the value and key can be created and returned to be passed to OLH.
This is equivalent to the setFormatterClassName method.
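The parameter names above can be collected as plain string key/value pairs before they are handed to the job configuration. The following is a minimal sketch only: KvInputParams, joinHosts and build are hypothetical names that are not part of the Oracle NoSQL Database API, and the decimal string encoding of the numeric values is an assumption rather than something this page states.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Oracle NoSQL Database API.
final class KvInputParams {

    // Joins hostname:port pairs with commas, the form described for oracle.kv.hosts.
    static String joinHosts(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one hostname:port pair is required");
        }
        for (String hp : hostPorts) {
            int colon = hp.lastIndexOf(':');
            // Reject entries with no host part or no port part.
            if (colon <= 0 || colon == hp.length() - 1) {
                throw new IllegalArgumentException("expected hostname:port, got: " + hp);
            }
        }
        return String.join(",", hostPorts);
    }

    // Builds a subset of the documented parameters; 0 selects the defaults
    // for batchSize and timeout as described in the list above.
    static Map<String, String> build(String storeName, List<String> hostPorts,
                                     int batchSize, long timeoutMillis) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("oracle.kv.kvstore", storeName);
        params.put("oracle.kv.hosts", joinHosts(hostPorts));
        // Assumption: numeric values are written as decimal strings.
        params.put("oracle.kv.batchSize", Integer.toString(batchSize));
        params.put("oracle.kv.timeout", Long.toString(timeoutMillis));
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params =
            build("kvstore", Arrays.asList("node1:5000", "node2:5000"), 0, 0L);
        params.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real job, each entry would typically be copied into the Hadoop configuration, for example with Configuration.set(String, String), before the job is submitted. The store name and host names shown here are placeholders.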
Internally, the KVInputFormatBase class utilizes
KVStore.storeIterator to retrieve records. You should refer to the javadoc
for that method for information about the various parameters.
KVInputFormatBase creates one split per Oracle NoSQL DB partition. The
location value for each split is an array of hosts holding the partition.
If the consistency passed to the InputFormat is NONE_REQUIRED (the
default), then
InputSplit.getLocations() will return an array of the names of the master
and the replica(s) which contain the partition. If the consistency is
ABSOLUTE, then the locations will only
contain the name of the master. This means that if Hadoop job trackers are
running on the nodes named in the
location array, Hadoop will
generally attempt to run the subtasks for a particular partition on those
nodes where the data is stored and replicated. Hadoop and Oracle NoSQL DB
administrators should be careful about co-location of Oracle NoSQL DB and
Hadoop processes since they may compete for resources.
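The locality rule described above can be modeled in a few lines. This is an illustrative sketch, not the class's actual implementation: SplitLocationsSketch and its boolean flag are hypothetical, and the flag stands in for whether the chosen consistency lets replicas serve reads (as with NONE_REQUIRED, the default).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of how a split's location array could be derived.
final class SplitLocationsSketch {

    // Returns the master alone when only the master may serve reads,
    // otherwise the master followed by every replica holding the partition.
    static String[] locations(String master, List<String> replicas,
                              boolean replicasMayServeReads) {
        List<String> hosts = new ArrayList<>();
        hosts.add(master);
        if (replicasMayServeReads) {
            hosts.addAll(replicas);
        }
        return hosts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("node2", "node3");
        System.out.println(Arrays.toString(locations("node1", replicas, true)));
        System.out.println(Arrays.toString(locations("node1", replicas, false)));
    }
}
```

A longer location array gives the Hadoop scheduler more candidate nodes for data-local execution, which is why the choice of consistency also affects task placement.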
Partitions in Oracle NoSQL DB are considered to be roughly equal in size;
InputSplit.getLength() always returns
A simple example demonstrating the Oracle NoSQL DB Hadoop
oracle.kv.hadoop.InputFormat class reading data from Hadoop in a Map/Reduce
job and counting the number of records for each major key in the store can
be found in the
KVHOME/examples/hadoop directory. The javadoc
for that program describes the simple Map/Reduce processing as well as how
to invoke the program in Hadoop.
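The counting step that the shipped example performs can be illustrated without Hadoop. The sketch below is not the program in KVHOME/examples/hadoop: MajorKeyCountSketch is a hypothetical class, and it assumes keys are available as path strings in which "/-/" separates the major path from the minor path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the map and reduce steps of a record count per major key.
final class MajorKeyCountSketch {

    // Assumption: "/-/" separates the major path from the minor path.
    static String majorPath(String fullKeyPath) {
        int sep = fullKeyPath.indexOf("/-/");
        return sep < 0 ? fullKeyPath : fullKeyPath.substring(0, sep);
    }

    // Counts how many records share each major path.
    static Map<String, Integer> countByMajorKey(List<String> keyPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String keyPath : keyPaths) {
            counts.merge(majorPath(keyPath), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
            "/users/alice/-/email",
            "/users/alice/-/phone",
            "/users/bob/-/email");
        System.out.println(countByMajorKey(keys));
    }
}
```

In a Map/Reduce job the same logic would be split in two: the mapper would emit each major path with a count of one, and the reducer would sum the counts per major path.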
|Modifier and Type|Method and Description|
|static void|setBatchSize: Specifies the suggested number of keys to fetch during each network round trip by the InputFormat.|
|static void|setConsistency: Specifies the read consistency associated with the lookup of the child KV pairs.|
|static void|setDepth: Specifies whether the parent, only its children, or all descendants are returned.|
|static void|setDirection: Specifies the order in which records are returned by the InputFormat.|
|static void|setKVHelperHosts: Set the KV Helper host:port pair(s) for this InputFormat to operate on.|
|static void|setKVStoreName: Set the KV Store name for this InputFormat to operate on.|
|static void|setParentKey: Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat.|
|static void|setSubRange: Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range.|
|static void|setTimeout: Specifies an upper bound on the time interval for processing a particular KV retrieval.|
|static void|setTimeoutUnit: Specifies the unit of the timeout parameter.|
public static void setKVStoreName(String kvStoreName)
kvStoreName- the KV Store name
public static void setKVHelperHosts(String[] kvHelperHosts)
kvHelperHosts- array of hostname:port strings of any hosts in the KV Store.
public static void setDirection(Direction direction)
direction- the direction to retrieve data
public static void setBatchSize(int batchSize)
batchSize- the suggested number of keys to fetch during each network round trip.
public static void setParentKey(Key parentKey)
parentKey- the parentKey
public static void setSubRange(KeyRange subRange)
subRange- the sub range.
public static void setDepth(Depth depth)
depth- the depth.
public static void setConsistency(Consistency consistency)
consistency- the consistency
public static void setTimeout(long timeout)
timeout- the timeout
public static void setTimeoutUnit(TimeUnit timeoutUnit)
timeoutUnit- the timeout unit
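Because the oracle.kv.timeout configuration parameter is always expressed in milliseconds while the setters accept a value and a separate unit, a caller may need to convert between the two forms. A minimal sketch, assuming a hypothetical helper class TimeoutMillisSketch (how the class itself combines the two setter values is not stated here):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts a (timeout, unit) pair to milliseconds.
final class TimeoutMillisSketch {

    static long toMillis(long timeout, TimeUnit unit) {
        if (timeout < 0) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        // Zero stays zero, which selects the default request timeout.
        return unit.toMillis(timeout);
    }

    public static void main(String[] args) {
        System.out.println(toMillis(5, TimeUnit.SECONDS)); // prints 5000
    }
}
```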
public static void setFormatterClassName(String formatterClassName)
Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords.
formatterClassName- the name of the class implementing AvroFormatter.
Copyright (c) 2011, 2013 Oracle and/or its affiliates. All rights reserved.