Class KVInputFormatBase<K,V>
- Direct Known Subclasses:
KVInputFormat
Parameters may be passed using either the static setters on this class or the Hadoop JobContext configuration parameters. The following parameters are recognized:
oracle.kv.kvstore
- the KV Store name for this InputFormat to operate on. This is equivalent to the setKVStoreName(java.lang.String) method.
oracle.kv.hosts
- one or more hostname:port pairs, separated by commas, naming hosts in the KV Store. This is equivalent to the setKVHelperHosts(java.lang.String[]) method.
oracle.kv.batchSize
- Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to the setBatchSize(int) method.
oracle.kv.parentKey
- Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. A null value results in fetching all keys in the store. If non-null, the major key path must be a partial path and the minor key path must be empty. This is equivalent to the setParentKey(oracle.kv.Key) method.
oracle.kv.subRange
- Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. It may be null. This is equivalent to the setSubRange(oracle.kv.KeyRange) method.
oracle.kv.depth
- Specifies whether the parent and only its children, or all descendants, are returned. If null, Depth.PARENT_AND_DESCENDANTS is implied. This is equivalent to the setDepth(oracle.kv.Depth) method.
oracle.kv.consistency
- Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to the setConsistency(oracle.kv.Consistency) method.
oracle.kv.timeout
- Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made not to exceed the specified limit. If zero, the default request timeout is used. This value is always in milliseconds. This is equivalent to the setTimeout(long) and setTimeoutUnit(java.util.concurrent.TimeUnit) methods.
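To illustrate the configuration-property route, here is a minimal sketch of a hypothetical helper (KvInputProperties, not part of the Oracle NoSQL DB API) that collects three of the recognized keys into a java.util.Map. It joins the hostname:port pairs with commas, as oracle.kv.hosts expects, and passes the batch size through unchanged, where 0 requests the internal default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: gathers a few oracle.kv.* input properties as strings. */
final class KvInputProperties {
    private KvInputProperties() {}

    static Map<String, String> toProperties(String storeName, String[] helperHosts, int batchSize) {
        if (helperHosts == null || helperHosts.length == 0) {
            // oracle.kv.hosts needs one or more hostname:port pairs.
            throw new IllegalArgumentException("at least one helper host is required");
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put("oracle.kv.kvstore", storeName);
        // Multiple hostname:port pairs are separated by commas.
        props.put("oracle.kv.hosts", String.join(",", helperHosts));
        // 0 asks the InputFormat to use an internally determined default.
        props.put("oracle.kv.batchSize", Integer.toString(batchSize));
        return props;
    }
}
```

In a Hadoop driver, each entry could then be copied into the job's Configuration with Configuration.set(String, String); calling the static setters on this class is the documented alternative.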
Internally, the KVInputFormatBase class uses KVStore.storeIterator to retrieve records. Refer to the javadoc for that method for information about the various parameters.
KVInputFormatBase creates one split per Oracle NoSQL DB partition. The location value for each split is an array of hosts holding the partition. If the consistency passed to KVInputFormatBase is NONE_REQUIRED (the default), then InputSplit.getLocations() returns an array of the names of the master and the replica(s) that contain the partition. Alternatively, if the consistency is NONE_REQUIRED_NO_MASTER, the returned array contains only the names of the replica(s), not the master. Finally, if the consistency is ABSOLUTE, the returned array contains only the name of the master. This means that if Hadoop task trackers are running on the nodes named in the returned location array, Hadoop will generally attempt to run the subtasks for a particular partition on the nodes where that data is stored and replicated. Hadoop and Oracle NoSQL DB administrators should be careful about co-locating Oracle NoSQL DB and Hadoop processes, since they may compete for resources.
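The location rules above can be modeled in a few lines. This is a hypothetical sketch: SplitLocations and its ReadConsistency enum are local stand-ins rather than oracle.kv.Consistency, and listing the master before the replicas is an assumption, since the text does not specify an order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how a split's locations depend on the read consistency. */
final class SplitLocations {
    private SplitLocations() {}

    /** Local stand-in for the three consistency settings described above. */
    enum ReadConsistency { NONE_REQUIRED, NONE_REQUIRED_NO_MASTER, ABSOLUTE }

    static String[] locationsFor(String master, List<String> replicas, ReadConsistency consistency) {
        List<String> hosts = new ArrayList<>();
        switch (consistency) {
            case NONE_REQUIRED:
                // Master and replicas can all serve the reads (master listed first by assumption).
                hosts.add(master);
                hosts.addAll(replicas);
                break;
            case NONE_REQUIRED_NO_MASTER:
                // Reads must avoid the master, so only replicas are reported.
                hosts.addAll(replicas);
                break;
            case ABSOLUTE:
                // Absolute consistency is served by the master only.
                hosts.add(master);
                break;
        }
        return hosts.toArray(new String[0]);
    }
}
```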
Partitions in Oracle NoSQL DB are considered to be roughly equal in size; therefore InputSplit.getLength() always returns 1.
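A hypothetical split model makes the one-split-per-partition rule and the constant length concrete. PartitionSplit is illustrative only, and numbering partitions from 1 is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical split model: one split per partition, each reporting length 1. */
final class PartitionSplit {
    private final int partitionId;

    PartitionSplit(int partitionId) {
        this.partitionId = partitionId;
    }

    int partitionId() {
        return partitionId;
    }

    /** Partitions are treated as roughly equal in size, so every split reports 1. */
    long getLength() {
        return 1L;
    }

    /** Creates one split for each partition numbered 1..partitionCount. */
    static List<PartitionSplit> forPartitions(int partitionCount) {
        List<PartitionSplit> splits = new ArrayList<>();
        for (int id = 1; id <= partitionCount; id++) {
            splits.add(new PartitionSplit(id));
        }
        return splits;
    }
}
```

Because Hadoop runs one map task per input split, the number of map tasks in such a job follows the partition count rather than the volume of data in each partition.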
A simple example demonstrating the Oracle NoSQL DB Hadoop oracle.kv.hadoop.InputFormat class reading data from Oracle NoSQL DB in a Hadoop Map/Reduce job and counting the number of records for each major key in the store can be found in the KVHOME/examples/hadoop directory. The javadoc for that program describes the simple Map/Reduce processing as well as how to invoke the program in Hadoop.
- Since:
- 2.0
-
Method Summary
Modifier and Type | Method | Description
static void | setBatchSize(int batchSize) | Specifies the suggested number of keys to fetch during each network round trip by the InputFormat.
static void | setConsistency(Consistency consistency) | Specifies the read consistency associated with the lookup of the child KV pairs.
static void | setDepth(Depth depth) | Specifies whether the parent and only its children, or all descendants, are returned.
static void | setKVHelperHosts(String[] kvHelperHosts) | Set the KV Helper host:port pair(s) for this InputFormat to operate on.
static void | setKVSecurity(String kvStoreSecurity) | Allows KVStore security to be set.
static void | setKVStoreName(String kvStoreName) | Set the KV Store name for this InputFormat to operate on.
static void | setParentKey(Key parentKey) | Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat.
static void | setSubRange(KeyRange subRange) | Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range.
static void | setTimeout(long timeout) | Specifies an upper bound on the time interval for processing a particular KV retrieval.
static void | setTimeoutUnit(TimeUnit timeoutUnit) | Specifies the unit of the timeout parameter.
Methods inherited from class org.apache.hadoop.mapreduce.InputFormat
createRecordReader
-
Method Details
-
setKVStoreName
public static void setKVStoreName(String kvStoreName)
Set the KV Store name for this InputFormat to operate on. This is equivalent to passing the oracle.kv.kvstore Hadoop property.
- Parameters:
kvStoreName
- the KV Store name
-
setKVHelperHosts
public static void setKVHelperHosts(String[] kvHelperHosts)
Set the KV Helper host:port pair(s) for this InputFormat to operate on. This is equivalent to passing the oracle.kv.hosts Hadoop property.
- Parameters:
kvHelperHosts
- array of hostname:port strings of any hosts in the KV Store.
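Because each entry must be a hostname:port string, a driver might check its arguments before calling setKVHelperHosts. The validator below is a hypothetical sketch: HelperHostCheck is not part of the Oracle NoSQL DB API, it accepts decimal ports from 1 to 65535, and it does not validate the hostname itself.

```java
/** Hypothetical validator for the hostname:port strings passed to setKVHelperHosts. */
final class HelperHostCheck {
    private HelperHostCheck() {}

    /** Returns true if the entry has a non-empty host and a port in 1..65535. */
    static boolean isValid(String entry) {
        if (entry == null) {
            return false;
        }
        int colon = entry.lastIndexOf(':');
        if (colon <= 0) {
            return false; // no colon, or an empty hostname
        }
        String port = entry.substring(colon + 1);
        if (!port.matches("[0-9]{1,5}")) {
            return false; // port must be 1 to 5 ASCII digits, no sign
        }
        int value = Integer.parseInt(port);
        return value >= 1 && value <= 65535;
    }

    /** Throws if the array is empty or any entry is not a hostname:port pair. */
    static void requireValid(String[] helperHosts) {
        if (helperHosts == null || helperHosts.length == 0) {
            throw new IllegalArgumentException("at least one hostname:port pair is required");
        }
        for (String host : helperHosts) {
            if (!isValid(host)) {
                throw new IllegalArgumentException("not a hostname:port pair: " + host);
            }
        }
    }
}
```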
-
setBatchSize
public static void setBatchSize(int batchSize)
Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to passing the oracle.kv.batchSize Hadoop property.
- Parameters:
batchSize
- the suggested number of keys to fetch during each network round trip.
-
setParentKey
public static void setParentKey(Key parentKey)
Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. A null value results in fetching all keys in the store. If non-null, the major key path must be a partial path and the minor key path must be empty. This is equivalent to passing the oracle.kv.parentKey Hadoop property.
- Parameters:
parentKey
- the parent key
-
setSubRange
public static void setSubRange(KeyRange subRange)
Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. It may be null. This is equivalent to passing the oracle.kv.subRange Hadoop property.
- Parameters:
subRange
- the sub range.
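One way to picture a sub range is as bounds on the path component immediately below the parent key. The class below is a hypothetical model, not oracle.kv.KeyRange; it treats a null bound as unbounded and compares components with String.compareTo, an assumption that is adequate for ASCII components but may not match the store's own ordering for other text.

```java
import java.util.List;

/** Hypothetical model of a sub range applied to the component just below the parent key. */
final class SubRangeModel {
    private final String start;          // null means unbounded below
    private final boolean startInclusive;
    private final String end;            // null means unbounded above
    private final boolean endInclusive;

    SubRangeModel(String start, boolean startInclusive, String end, boolean endInclusive) {
        this.start = start;
        this.startInclusive = startInclusive;
        this.end = end;
        this.endInclusive = endInclusive;
    }

    /** True if majorPath lies strictly below parent and its next component is inside the range. */
    boolean matches(List<String> parent, List<String> majorPath) {
        if (majorPath.size() <= parent.size()
                || !majorPath.subList(0, parent.size()).equals(parent)) {
            return false;
        }
        String next = majorPath.get(parent.size());
        if (start != null) {
            int c = next.compareTo(start);
            if (c < 0 || (c == 0 && !startInclusive)) {
                return false;
            }
        }
        if (end != null) {
            int c = next.compareTo(end);
            if (c > 0 || (c == 0 && !endInclusive)) {
                return false;
            }
        }
        return true;
    }
}
```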
-
setDepth
public static void setDepth(Depth depth)
Specifies whether the parent and only its children, or all descendants, are returned. If null, Depth.PARENT_AND_DESCENDANTS is implied. This is equivalent to passing the oracle.kv.depth Hadoop property.
- Parameters:
depth
- the depth.
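The effect of each depth setting can be sketched as a test on how many path components a key has below the parent. DepthModel is hypothetical; its enum reuses the names of the four constants oracle.kv.Depth is understood to define, purely for readability, and its null fallback mirrors the PARENT_AND_DESCENDANTS default stated above.

```java
import java.util.List;

/** Hypothetical model of depth selection relative to a parent path. */
final class DepthModel {
    private DepthModel() {}

    /** Local stand-in named after the oracle.kv.Depth constants. */
    enum DepthKind { CHILDREN_ONLY, DESCENDANTS_ONLY, PARENT_AND_CHILDREN, PARENT_AND_DESCENDANTS }

    /** A null depth falls back to PARENT_AND_DESCENDANTS. */
    static DepthKind orDefault(DepthKind depth) {
        return depth == null ? DepthKind.PARENT_AND_DESCENDANTS : depth;
    }

    /** True if path is selected for the given parent and depth. */
    static boolean selects(List<String> parent, List<String> path, DepthKind depth) {
        if (path.size() < parent.size() || !path.subList(0, parent.size()).equals(parent)) {
            return false; // neither the parent nor anything below it
        }
        int extra = path.size() - parent.size(); // components below the parent
        switch (orDefault(depth)) {
            case CHILDREN_ONLY:          return extra == 1;
            case DESCENDANTS_ONLY:       return extra >= 1;
            case PARENT_AND_CHILDREN:    return extra <= 1;
            case PARENT_AND_DESCENDANTS: return true;
            default: throw new AssertionError(depth);
        }
    }
}
```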
-
setConsistency
public static void setConsistency(Consistency consistency)
Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to passing the oracle.kv.consistency Hadoop property.
- Parameters:
consistency
- the consistency
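The two rules for this parameter, a null fallback and a ban on version- and time-based consistency, can be expressed as a small pre-flight check. ConsistencyCheck and its Kind enum are hypothetical stand-ins, not oracle.kv.Consistency, and the caller supplies whatever default it assumes the store uses.

```java
/** Hypothetical pre-flight check mirroring the consistency restrictions described above. */
final class ConsistencyCheck {
    private ConsistencyCheck() {}

    /** Local stand-ins for the kinds of read consistency discussed in this document. */
    enum Kind { ABSOLUTE, NONE_REQUIRED, NONE_REQUIRED_NO_MASTER, TIME, VERSION }

    /** Null selects the supplied default; time- and version-based kinds are rejected. */
    static Kind resolve(Kind requested, Kind storeDefault) {
        if (requested == null) {
            return storeDefault;
        }
        if (requested == Kind.TIME || requested == Kind.VERSION) {
            throw new IllegalArgumentException(
                "Version- and Time-based consistency may not be used: " + requested);
        }
        return requested;
    }
}
```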
-
setTimeout
public static void setTimeout(long timeout)
Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made not to exceed the specified limit. If zero, the default request timeout is used. The unit of this value is set with setTimeoutUnit(java.util.concurrent.TimeUnit). This is equivalent to passing the oracle.kv.timeout Hadoop property.
- Parameters:
timeout
- the timeout
-
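setTimeoutUnit
public static void setTimeoutUnit(TimeUnit timeoutUnit)
Specifies the unit of the timeout parameter. It may be null only if timeout is zero. Together with setTimeout(long), this is equivalent to passing the oracle.kv.timeout Hadoop property, whose value is always in milliseconds.
- Parameters:
timeoutUnit
- the timeout unit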
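Since oracle.kv.timeout is always in milliseconds, a timeout and unit pair maps onto it with TimeUnit.toMillis. TimeoutProperty below is a hypothetical sketch of that conversion, including the rule that the unit may be null only when the timeout is zero.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical conversion from setTimeout/setTimeoutUnit values to an oracle.kv.timeout value. */
final class TimeoutProperty {
    private TimeoutProperty() {}

    /** Returns the millisecond value; zero means "use the default request timeout". */
    static long toMillis(long timeout, TimeUnit unit) {
        if (unit == null) {
            if (timeout != 0) {
                throw new IllegalArgumentException("timeoutUnit may be null only if timeout is zero");
            }
            return 0L;
        }
        // TimeUnit.toMillis truncates, so sub-millisecond remainders are dropped.
        return unit.toMillis(timeout);
    }
}
```

One consequence of the truncation: a nonzero timeout shorter than one millisecond converts to 0, which this property would interpret as the default request timeout rather than a very short one.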
-
setKVSecurity
public static void setKVSecurity(String kvStoreSecurity)
Allows KVStore security to be set. The kvStoreSecurity file is a property file that uses the format supported by the CLI tools. This security file, and any wallet or password store needed to support it, must be distributed on the Hadoop cluster.
- Since:
- 3.0
-