public abstract class KVInputFormatBase<K,V>
extends org.apache.hadoop.mapreduce.InputFormat<K,V>
Parameters may be passed using either the static setters on this class or through the Hadoop JobContext configuration parameters. The following parameters are recognized:
oracle.kv.kvstore - the KV Store name for this InputFormat to operate on. This is equivalent to the setKVStoreName(java.lang.String) method.
oracle.kv.hosts - one or more hostname:port pairs separated by commas naming hosts in the KV Store. This is equivalent to the setKVHelperHosts(java.lang.String[]) method.
oracle.kv.batchSize - Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to the setBatchSize(int) method.
oracle.kv.parentKey - Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. A null value results in fetching all keys in the store. If non-null, the major key path must be a partial path and the minor key path must be empty. This is equivalent to the setParentKey(oracle.kv.Key) method.
oracle.kv.subRange - Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. It may be null. This is equivalent to the setSubRange(oracle.kv.KeyRange) method.
oracle.kv.depth - Specifies whether the parent and only children or all descendants are returned. If null, Depth.PARENT_AND_DESCENDANTS is implied. This is equivalent to the setDepth(oracle.kv.Depth) method.
oracle.kv.consistency - Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to the setConsistency(oracle.kv.Consistency) method.
oracle.kv.timeout - Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made to not exceed the specified limit. If zero, the default request timeout is used. This value is always in milliseconds. This is equivalent to the setTimeout(long) and setTimeoutUnit(java.util.concurrent.TimeUnit) methods.
oracle.kv.formatterClass - Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords. This is only meaningful when KVAvroInputFormat is used.
One case where specifying a value for this parameter is useful is when you are using Oracle Loader for Hadoop (OLH) to read Avro records from Oracle NoSQL Database. Since the Avro records (the NoSQL Database record values) are passed directly to OLH, the NoSQL Database keys are not available for mapping into the target Oracle Database table. However, the formatter class is passed both the NoSQL Database key and value, so it can create and return a new Avro record containing both, which is then passed to OLH.
This is equivalent to the setFormatterClassName(java.lang.String) method.
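As a rough illustration of the property names listed above, the following sketch collects a few of them into a java.util.Properties object. It is only a stand-in: in a real job the same name/value pairs would be placed in the Hadoop job configuration, or supplied through the static setters. The class name KvInputProperties, the build method, and the sample store name and hosts are hypothetical and exist only for this example.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of oracle.kv or Hadoop. */
public class KvInputProperties {

    /**
     * Builds name/value pairs for some of the recognized parameters.
     * Hosts are joined with commas, and the timeout is converted to
     * milliseconds because oracle.kv.timeout is always in milliseconds.
     */
    public static Properties build(String storeName, String[] hosts,
                                   int batchSize, long timeout, TimeUnit unit) {
        Properties props = new Properties();
        props.setProperty("oracle.kv.kvstore", storeName);
        props.setProperty("oracle.kv.hosts", String.join(",", hosts));
        props.setProperty("oracle.kv.batchSize", Integer.toString(batchSize));
        props.setProperty("oracle.kv.timeout", Long.toString(unit.toMillis(timeout)));
        return props;
    }

    public static void main(String[] args) {
        // Sample values only; batchSize 0 asks for the internal default.
        Properties props = build("mystore",
                new String[] {"node1:5000", "node2:5000"},
                0, 5, TimeUnit.SECONDS);
        props.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Parameters left out of the sketch (parentKey, subRange, depth, consistency, formatterClass) are omitted because their string encodings are not described here.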
Internally, the KVInputFormatBase class uses KVStore.storeIterator to retrieve records. Refer to the javadoc for that method for information about the various parameters.
KVInputFormatBase creates one split per Oracle NoSQL DB partition. The location value for each split is an array of hosts holding the partition. If the consistency passed to KVInputFormatBase is NONE_REQUIRED (the default), then InputSplit.getLocations() will return an array of the names of the master and the replica(s) which contain the partition. If the consistency is ABSOLUTE, then locations will contain only the name of the master. This means that if Hadoop job trackers are running on the nodes named in the location array, Hadoop will generally attempt to run the subtasks for a particular partition on those nodes where the data is stored and replicated. Hadoop and Oracle NoSQL DB administrators should be careful about co-location of Oracle NoSQL DB and Hadoop processes since they may compete for resources.
Partitions in Oracle NoSQL DB are considered to be roughly equal in size; therefore InputSplit.getLength() always returns 1.
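The locality rules above can be modeled with a small sketch. This is not the library's implementation: the class SplitLocationSketch, the simplified ReadConsistency enum, and the choice to list the master first are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the split locality rules; not the library's code. */
public class SplitLocationSketch {

    /** Simplified stand-in for the two consistency settings discussed above. */
    public enum ReadConsistency { NONE_REQUIRED, ABSOLUTE }

    /** Returns the host names a split would advertise for one partition. */
    public static String[] locations(String master, List<String> replicas,
                                     ReadConsistency consistency) {
        List<String> hosts = new ArrayList<>();
        hosts.add(master);
        // Replicas are only eligible when any node may serve the read.
        if (consistency == ReadConsistency.NONE_REQUIRED) {
            hosts.addAll(replicas);
        }
        return hosts.toArray(new String[0]);
    }

    /** Partitions are treated as roughly equal in size, so every split reports 1. */
    public static long length() {
        return 1L;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("host2", "host3");
        // prints [host1, host2, host3]
        System.out.println(Arrays.toString(
                locations("host1", replicas, ReadConsistency.NONE_REQUIRED)));
        // prints [host1]
        System.out.println(Arrays.toString(
                locations("host1", replicas, ReadConsistency.ABSOLUTE)));
    }
}
```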
A simple example that uses the Oracle NoSQL DB Hadoop oracle.kv.hadoop.InputFormat class to read data in a Hadoop Map/Reduce job and count the number of records for each major key in the store can be found in the KVHOME/examples/hadoop directory. The javadoc for that program describes the simple Map/Reduce processing as well as how to invoke the program in Hadoop.
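The counting logic that such a job performs can be sketched without Hadoop. The code below is not the program shipped in KVHOME/examples/hadoop; it is an in-memory stand-in. It assumes keys are available as path strings in which "/-/" separates the major path from the minor path, and the class name MajorKeyCountSketch and the sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory version of a count-per-major-key job. */
public class MajorKeyCountSketch {

    /** Map step: keep everything before the assumed "/-/" separator. */
    public static String majorPath(String keyPath) {
        int sep = keyPath.indexOf("/-/");
        return sep < 0 ? keyPath : keyPath.substring(0, sep);
    }

    /** Reduce step: add one per record to the total for its major path. */
    public static Map<String, Integer> countByMajorKey(List<String> keyPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String keyPath : keyPaths) {
            counts.merge(majorPath(keyPath), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "/users/alice/-/email",
                "/users/alice/-/phone",
                "/users/bob/-/email",
                "/config");
        // prints {/config=1, /users/alice=2, /users/bob=1}
        System.out.println(countByMajorKey(keys));
    }
}
```

In an actual job the map step would run once per record delivered by the InputFormat, and the framework would group the emitted major paths before the reduce step sums them.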
Modifier and Type | Method and Description |
---|---|
static void | setBatchSize(int batchSize): Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. |
static void | setConsistency(Consistency consistency): Specifies the read consistency associated with the lookup of the child KV pairs. |
static void | setDepth(Depth depth): Specifies whether the parent and only children or all descendants are returned. |
static void | setDirection(Direction direction): Specifies the order in which records are returned by the InputFormat. |
static void | setFormatterClassName(String formatterClassName): Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords. |
static void | setKVHelperHosts(String[] kvHelperHosts): Set the KV Helper host:port pair(s) for this InputFormat to operate on. |
static void | setKVStoreName(String kvStoreName): Set the KV Store name for this InputFormat to operate on. |
static void | setParentKey(Key parentKey): Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. |
static void | setSubRange(KeyRange subRange): Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. |
static void | setTimeout(long timeout): Specifies an upper bound on the time interval for processing a particular KV retrieval. |
static void | setTimeoutUnit(TimeUnit timeoutUnit): Specifies the unit of the timeout parameter. |
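public static void setKVStoreName(String kvStoreName)
Set the KV Store name for this InputFormat to operate on. This is equivalent to setting the oracle.kv.kvstore Hadoop property.
Parameters:
kvStoreName - the KV Store name

public static void setKVHelperHosts(String[] kvHelperHosts)
Set the KV Helper host:port pair(s) for this InputFormat to operate on. This is equivalent to setting the oracle.kv.hosts Hadoop property.
Parameters:
kvHelperHosts - array of hostname:port strings of any hosts in the KV Store.

public static void setDirection(Direction direction)
Specifies the order in which records are returned by the InputFormat.
Parameters:
direction - the direction to retrieve data

public static void setBatchSize(int batchSize)
Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. This is equivalent to setting the oracle.kv.batchSize Hadoop property.
Parameters:
batchSize - the suggested number of keys to fetch during each network round trip.

public static void setParentKey(Key parentKey)
Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. This is equivalent to setting the oracle.kv.parentKey Hadoop property.
Parameters:
parentKey - the parentKey

public static void setSubRange(KeyRange subRange)
Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. This is equivalent to setting the oracle.kv.subRange Hadoop property.
Parameters:
subRange - the sub range.

public static void setDepth(Depth depth)
Specifies whether the parent and only children or all descendants are returned. This is equivalent to setting the oracle.kv.depth Hadoop property.
Parameters:
depth - the depth.

public static void setConsistency(Consistency consistency)
Specifies the read consistency associated with the lookup of the child KV pairs. This is equivalent to setting the oracle.kv.consistency Hadoop property.
Parameters:
consistency - the consistency

public static void setTimeout(long timeout)
Specifies an upper bound on the time interval for processing a particular KV retrieval. This is equivalent to setting the oracle.kv.timeout Hadoop property.
Parameters:
timeout - the timeout

public static void setTimeoutUnit(TimeUnit timeoutUnit)
Specifies the unit of the timeout parameter. This corresponds to the oracle.kv.timeout Hadoop property, whose value is always in milliseconds.
Parameters:
timeoutUnit - the timeout unit

public static void setFormatterClassName(String formatterClassName)
Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords.
Parameters:
formatterClassName - the name of the class implementing AvroFormatter.

Copyright (c) 2011, 2013 Oracle and/or its affiliates. All rights reserved.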