A simple example demonstrating how to use the Oracle NoSQL Database Hadoop
class oracle.kv.hadoop.KVInputFormat to read data from the store in a
Map/Reduce job and count the number of records under each major key.
The map() function is passed the Key and Value of each record in the KV
Store and emits a key/value pair whose key is the record's major path
components and whose value is 1. The reduce() function sums these values
for each distinct major path, giving the number of records stored under
that major key. This M/R job is similar to the ubiquitous Hadoop
Map/Reduce WordCount example.
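A minimal in-memory sketch of the counting logic, written in plain Java without Hadoop so it runs standalone. It assumes each key arrives as a canonical Key.toString() string in which the path component "-" (that is, "/-/") separates the major path from the minor path, and that a key with no minor path has no separator. MinorKeyCountSketch and its methods are hypothetical names, not the example's actual Map and Reduce classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone sketch of the CountMinorKeys map/reduce logic.
// Assumption: canonical Key.toString() strings where "/-/" separates the
// major path from the minor path (no separator means no minor path).
class MinorKeyCountSketch {

    // Map step: emit (major path, 1) for one record's key string.
    static Map.Entry<String, Integer> map(String canonicalKey) {
        int sep = canonicalKey.indexOf("/-/");
        String majorPath = (sep < 0) ? canonicalKey : canonicalKey.substring(0, sep);
        return Map.entry(majorPath, 1);
    }

    // Reduce step: group the emitted pairs by major path and sum the 1s.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs map over every key, then reduce over all emitted pairs.
    static Map<String, Integer> countRecordsPerMajorKey(List<String> keys) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String key : keys) {
            pairs.add(map(key));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "/Smith/Bob/-/email",
                "/Smith/Bob/-/phone",
                "/Jones/Ann/-/email");
        // TreeMap prints in sorted key order: {/Jones/Ann=1, /Smith/Bob=2}
        System.out.println(countRecordsPerMajorKey(keys));
    }
}
```

In a real job, Hadoop's shuffle groups the emitted pairs by key before reduce() runs; here TreeMap.merge() does the grouping and the summing in a single step.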
The KV Keys passed to the Map function are in the canonical format described
in the javadoc for the oracle.kv.Key.toString() method.
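To make the canonical format concrete, the sketch below splits a key string into its major and minor component lists. It relies on the same assumption that "/-/" separates the two paths, and it does not decode components, which Key.toString() may escape. In real code, oracle.kv.Key.fromString() followed by getMajorPath() and getMinorPath() is the safer way to recover the components; CanonicalKeySketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split a canonical key string such as
// "/Smith/Bob/-/email" into major [Smith, Bob] and minor [email].
// Assumes "/-/" is the major/minor separator and does NOT decode any
// escaped characters inside the components.
class CanonicalKeySketch {

    private static final String SEPARATOR = "/-/";

    static List<String> majorComponents(String key) {
        int sep = key.indexOf(SEPARATOR);
        return toComponents(sep < 0 ? key : key.substring(0, sep));
    }

    static List<String> minorComponents(String key) {
        int sep = key.indexOf(SEPARATOR);
        // Keep the "/" that follows "-" so toComponents sees "/minor1/...".
        return sep < 0 ? List.of() : toComponents(key.substring(sep + 2));
    }

    // "/a/b" -> [a, b]: drop the leading "/" and split the rest on "/".
    private static List<String> toComponents(String path) {
        return Arrays.asList(path.substring(1).split("/", -1));
    }

    public static void main(String[] args) {
        String key = "/Smith/Bob/-/email/work";
        System.out.println(majorComponents(key)); // [Smith, Bob]
        System.out.println(minorComponents(key)); // [email, work]
    }
}
```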
The KVInputFormat and related classes are located in the lib/kvclient.jar
file, so this jar must be included in the Hadoop classpath at runtime.
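In standard Hadoop usage, HADOOP_CLASSPATH extends the classpath of the client JVM started by bin/hadoop, while -libjars ships jars to the JVMs that run the map and reduce tasks. A quick, hypothetical way to confirm the client side is to try loading KVInputFormat reflectively; the ClasspathCheck helper below is an illustrative name, and it checks only the JVM it runs in.

```java
// Hypothetical helper: reports whether a class is loadable from this
// JVM's classpath. It checks only the JVM it runs in (the client side),
// not the task JVMs that -libjars supplies.
class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: load the class without running static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that is present but whose
            // dependencies (for example Hadoop's own classes) are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "oracle.kv.hadoop.KVInputFormat";
        System.out.println(name + (isOnClasspath(name) ? " is" : " is NOT")
                + " on the client classpath");
    }
}
```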
The arguments to the program are the KV Store name, the helperHost:port
pair, the HDFS output path and, optionally, the login file path.
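A sketch of how a driver might validate these arguments. JobArgs and parse() are hypothetical names, not the example's actual code, and the sketch assumes Hadoop's generic options (such as -libjars) have already been stripped, as ToolRunner does, so only the three or four program arguments remain.

```java
import java.util.Optional;

// Hypothetical sketch of validating the program arguments:
//   <storeName> <helperHost:port> <hdfsOutputPath> [loginFilePath]
// Assumes Hadoop generic options (e.g. -libjars) were already removed.
final class JobArgs {
    final String storeName;
    final String helperHost;
    final int helperPort;
    final String outputPath;
    final Optional<String> loginFile; // only needed for a secured store

    private JobArgs(String storeName, String helperHost, int helperPort,
                    String outputPath, Optional<String> loginFile) {
        this.storeName = storeName;
        this.helperHost = helperHost;
        this.helperPort = helperPort;
        this.outputPath = outputPath;
        this.loginFile = loginFile;
    }

    static JobArgs parse(String[] args) {
        if (args.length < 3 || args.length > 4) {
            throw new IllegalArgumentException("usage: <storeName> "
                    + "<helperHost:port> <hdfsOutputPath> [loginFilePath]");
        }
        String hostPort = args[1];
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected helperHost:port, got " + hostPort);
        }
        // NumberFormatException (an IllegalArgumentException) if the port is not numeric.
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new JobArgs(args[0], hostPort.substring(0, colon), port, args[2],
                args.length == 4 ? Optional.of(args[3]) : Optional.empty());
    }

    public static void main(String[] args) {
        JobArgs a = parse(new String[] {"mystore", "myhost:5000", "/myHDFSoutputdir"});
        // prints: mystore myhost 5000 /myHDFSoutputdir secured=false
        System.out.println(a.storeName + " " + a.helperHost + " " + a.helperPort
                + " " + a.outputPath + " secured=" + a.loginFile.isPresent());
    }
}
```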
For example, if you build this class (and its nested classes) and put them
into myjar.jar, you can invoke it with a command similar to this:
export HADOOP_CLASSPATH=...:KVHOME/lib/kvclient.jar
bin/hadoop jar myjar.jar hadoop.CountMinorKeys \
-libjars KVHOME/lib/kvclient.jar \
mystore myhost:myport /myHDFSoutputdir [mySecurityFilePath]
If you are accessing a secured KV Store using Oracle Wallet, additional
Oracle PKI jars obtained from the EE (Enterprise Edition) package must be
added to the classpath, and the path of the local login file must be passed
as the last program argument. For example:
export HADOOP_CLASSPATH=...:KVEEHOME/lib/kvclient.jar:\
KVEEHOME/lib/oraclepki.jar
export LIBJARS=KVEEHOME/lib/kvclient.jar,KVEEHOME/lib/oraclepki.jar,\
KVEEHOME/lib/osdt_cert.jar,KVEEHOME/lib/osdt_core.jar
bin/hadoop jar myjar.jar hadoop.CountMinorKeys \
-libjars ${LIBJARS} \
mystore myhost:myport /myHDFSoutputdir mySecurityFilePath
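The two variables above use different separators: HADOOP_CLASSPATH is an ordinary Java classpath (":" on Unix), while -libjars expects a comma-separated list. The hypothetical JarLists helper below builds each string from a list of jar names; KVEEHOME is kept as the same placeholder used in the commands, and the jar lists mirror the ones shown there.

```java
import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper contrasting the two list formats used above:
// -libjars takes commas, while a classpath uses File.pathSeparator
// (":" on Unix, ";" on Windows).
class JarLists {

    static String libjars(String libDir, List<String> jars) {
        return jars.stream().map(j -> libDir + "/" + j)
                .collect(Collectors.joining(","));
    }

    static String classpath(String libDir, List<String> jars) {
        return jars.stream().map(j -> libDir + "/" + j)
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        String lib = "KVEEHOME/lib"; // same placeholder as the commands above
        System.out.println("HADOOP_CLASSPATH additions: "
                + classpath(lib, List.of("kvclient.jar", "oraclepki.jar")));
        System.out.println("LIBJARS: " + libjars(lib, List.of(
                "kvclient.jar", "oraclepki.jar", "osdt_cert.jar", "osdt_core.jar")));
    }
}
```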