public abstract class KVInputFormatBase<K,V> extends InputFormat<K,V>
Parameters may be passed using either the static setters on this class or through the Hadoop JobContext configuration parameters. The following parameters are recognized:
oracle.kv.kvstore - the KV Store name for this InputFormat to operate on. This is equivalent to the setKVStoreName(java.lang.String) method.
oracle.kv.hosts - one or more hostname:port pairs, separated by commas, naming hosts in the KV Store. This is equivalent to the setKVHelperHosts(java.lang.String[]) method.
oracle.kv.batchSize - Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. If 0, an internally determined default is used. This is equivalent to the setBatchSize(int) method.
oracle.kv.parentKey - Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. A null value results in fetching all keys in the store. If non-null, the major key path must be a partial path and the minor key path must be empty. This is equivalent to the setParentKey(oracle.kv.Key) method.
oracle.kv.subRange - Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. It may be null. This is equivalent to the setSubRange(oracle.kv.KeyRange) method.
oracle.kv.depth - Specifies whether the parent and only its children, or all descendants, are returned. If null, Depth.PARENT_AND_DESCENDANTS is implied. This is equivalent to the setDepth(oracle.kv.Depth) method.
oracle.kv.consistency - Specifies the read consistency associated with the lookup of the child KV pairs. Version- and Time-based consistency may not be used. If null, the default consistency is used. This is equivalent to the setConsistency(oracle.kv.Consistency) method.
oracle.kv.timeout - Specifies an upper bound on the time interval for processing a particular KV retrieval. A best effort is made to not exceed the specified limit. If zero, the default request timeout is used. When set through this property, the value is always in milliseconds. This is equivalent to the setTimeout(long) and setTimeoutUnit(java.util.concurrent.TimeUnit) methods.
oracle.kv.formatterClass - Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords. This is only meaningful when KVAvroInputFormat is used.
One case where specifying a value for this parameter is useful is when you are using Oracle Loader for Hadoop (OLH) to read Avro records from Oracle NoSQL Database. Since the Avro records (the NoSQL Database record values) are passed directly to OLH, the NoSQL Database keys are not available for mapping into the target Oracle Database table. However, the formatter class is passed both the NoSQL Database key and value, so a new Avro record containing both the value and key can be created and returned to be passed to OLH.
This is equivalent to the setFormatterClassName(java.lang.String) method.
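The property names and value formats listed above can be illustrated with a minimal sketch. It assumes a plain java.util.Map as a stand-in for the Hadoop job configuration so that it runs with only the JDK; in a real job the same name/value pairs would be set on the job's configuration. The class name KvJobProperties, the store name, and the host names are hypothetical, and rendering the numeric values as decimal strings is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the Oracle NoSQL Database API.
final class KvJobProperties {

    // Builds name/value pairs for a subset of the recognized properties.
    static Map<String, String> build(String storeName, String[] helperHosts,
                                     int batchSize, long timeout, TimeUnit unit) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("oracle.kv.kvstore", storeName);
        // oracle.kv.hosts takes hostname:port pairs separated by commas.
        props.put("oracle.kv.hosts", String.join(",", helperHosts));
        // A batch size of 0 selects the internally determined default.
        props.put("oracle.kv.batchSize", Integer.toString(batchSize));
        // The property value is always in milliseconds, so convert first.
        props.put("oracle.kv.timeout", Long.toString(unit.toMillis(timeout)));
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = build("mystore",
            new String[] {"host1:5000", "host2:5000"}, 0, 30, TimeUnit.SECONDS);
        props.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The conversion with TimeUnit.toMillis mirrors the relationship between the setTimeout(long) and setTimeoutUnit(java.util.concurrent.TimeUnit) setters and the millisecond-valued oracle.kv.timeout property.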
Internally, the KVInputFormatBase class uses KVStore.storeIterator to retrieve records. Refer to the javadoc for that method for information about the various parameters.
KVInputFormatBase creates one split per Oracle NoSQL DB partition. The location value for each split is an array of hosts holding the partition. If the consistency passed to KVInputFormatBase is NONE_REQUIRED (the default), then InputSplit.getLocations() will return an array of the names of the master and the replica(s) which contain the partition. Alternatively, if the consistency is NONE_REQUIRED_NO_MASTER, then the array returned will contain only the names of the replica(s), not the master. Finally, if the consistency is ABSOLUTE, then the array returned will contain only the name of the master. This means that if Hadoop task trackers are running on the nodes named in the returned location array, Hadoop will generally attempt to run the subtasks for a particular partition on those nodes where the data is stored and replicated. Hadoop and Oracle NoSQL DB administrators should be careful about co-location of Oracle NoSQL DB and Hadoop processes since they may compete for resources.
Partitions in Oracle NoSQL DB are considered to be roughly equal in size; therefore InputSplit.getLength() always returns 1.
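The relationship between consistency and split locations described above can be modeled as follows. This is only a sketch of the stated rules, not the library's implementation: the class SplitLocations, the local ReadConsistency enum, and the node names are hypothetical, and placing the master first in the array is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the location rules; not the library's implementation.
final class SplitLocations {

    // Local stand-in for the three consistency values named in the text.
    enum ReadConsistency { NONE_REQUIRED, NONE_REQUIRED_NO_MASTER, ABSOLUTE }

    // Returns the host names a split would report for one partition.
    static String[] locations(ReadConsistency consistency, String master,
                              List<String> replicas) {
        List<String> hosts = new ArrayList<>();
        // The master is listed unless the consistency excludes it.
        if (consistency != ReadConsistency.NONE_REQUIRED_NO_MASTER) {
            hosts.add(master);
        }
        // Replicas are listed unless only the master may serve reads.
        if (consistency != ReadConsistency.ABSOLUTE) {
            hosts.addAll(replicas);
        }
        return hosts.toArray(new String[0]);
    }

    // Partitions are treated as roughly equal in size, so every split reports 1.
    static long length() {
        return 1L;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("node2", "node3");
        for (ReadConsistency c : ReadConsistency.values()) {
            System.out.println(c + " -> "
                + Arrays.toString(locations(c, "node1", replicas)));
        }
    }
}
```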
A simple example demonstrating the Oracle NoSQL DB Hadoop oracle.kv.hadoop.InputFormat class reading data from the store in a Hadoop Map/Reduce job and counting the number of records for each major key in the store can be found in the KVHOME/examples/hadoop directory. The javadoc for that program describes the simple Map/Reduce processing as well as how to invoke the program in Hadoop.
Modifier and Type | Method and Description |
---|---|
static void | setBatchSize(int batchSize): Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. |
static void | setConsistency(Consistency consistency): Specifies the read consistency associated with the lookup of the child KV pairs. |
static void | setDepth(Depth depth): Specifies whether the parent and only its children, or all descendants, are returned. |
static void | setFormatterClassName(String formatterClassName): Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords. |
static void | setKVHelperHosts(String[] kvHelperHosts): Set the KV Helper host:port pair(s) for this InputFormat to operate on. |
static void | setKVSecurity(String kvStoreSecurity): Allows KVStore security to be set. |
static void | setKVStoreName(String kvStoreName): Set the KV Store name for this InputFormat to operate on. |
static void | setParentKey(Key parentKey): Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. |
static void | setSubRange(KeyRange subRange): Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. |
static void | setTimeout(long timeout): Specifies an upper bound on the time interval for processing a particular KV retrieval. |
static void | setTimeoutUnit(TimeUnit timeoutUnit): Specifies the unit of the timeout parameter. |
Methods inherited from class InputFormat: createRecordReader, getSplits
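public static void setKVStoreName(String kvStoreName)
Set the KV Store name for this InputFormat to operate on. Equivalent to the oracle.kv.kvstore Hadoop property.
Parameters: kvStoreName - the KV Store name
public static void setKVHelperHosts(String[] kvHelperHosts)
Set the KV Helper host:port pair(s) for this InputFormat to operate on. Equivalent to the oracle.kv.hosts Hadoop property.
Parameters: kvHelperHosts - array of hostname:port strings of any hosts in the KV Store.
public static void setBatchSize(int batchSize)
Specifies the suggested number of keys to fetch during each network round trip by the InputFormat. Equivalent to the oracle.kv.batchSize Hadoop property.
Parameters: batchSize - the suggested number of keys to fetch during each network round trip.
public static void setParentKey(Key parentKey)
Specifies the parent key whose "child" KV pairs are to be returned by the InputFormat. Equivalent to the oracle.kv.parentKey Hadoop property.
Parameters: parentKey - the parentKey
public static void setSubRange(KeyRange subRange)
Specifies a sub range to further restrict the range under the parentKey to the major path components in this sub range. Equivalent to the oracle.kv.subRange Hadoop property.
Parameters: subRange - the sub range.
public static void setDepth(Depth depth)
Specifies whether the parent and only its children, or all descendants, are returned. Equivalent to the oracle.kv.depth Hadoop property.
Parameters: depth - the depth.
public static void setConsistency(Consistency consistency)
Specifies the read consistency associated with the lookup of the child KV pairs. Equivalent to the oracle.kv.consistency Hadoop property.
Parameters: consistency - the consistency
public static void setTimeout(long timeout)
Specifies an upper bound on the time interval for processing a particular KV retrieval. Equivalent to the oracle.kv.timeout Hadoop property.
Parameters: timeout - the timeout
public static void setTimeoutUnit(TimeUnit timeoutUnit)
Specifies the unit of the timeout parameter. Related to the oracle.kv.timeout Hadoop property, whose value is always in milliseconds.
Parameters: timeoutUnit - the timeout unit
public static void setFormatterClassName(String formatterClassName)
Specifies the name of a class that implements AvroFormatter to (optionally) format KeyValueVersion instances into Avro IndexedRecords.
Parameters: formatterClassName - the name of the class implementing AvroFormatter.
public static void setKVSecurity(String kvStoreSecurity)
Allows KVStore security to be set.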
Copyright (c) 2011, 2015 Oracle and/or its affiliates. All rights reserved.