A basic example demonstrating how to use the class
oracle.kv.hadoop.table.TableInputFormat to access the rows of a table in an
Oracle NoSQL Database from within a Hadoop MapReduce job, in order to count
the number of rows in the table.
The map() function is passed the PrimaryKey and Row of each row in the
table and outputs a key/value pair whose key is built from the names of the
fields that make up the PrimaryKey and whose value is 1. Because every row
of a table has the same primary key fields, every map output carries the
same key, so the reduce phase, which sums the values for each key, yields
the total number of rows in the table. This MapReduce job is similar to the
ubiquitous Hadoop MapReduce WordCount example.
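The counting logic can be sketched in plain Java, without Hadoop or the Oracle NoSQL client. This is a simulation under stated assumptions, not the example's actual Mapper and Reducer: each row is modeled only by the list of its primary key field names, those names are joined with commas to form the map output key (a hypothetical format; the real example may build the key differently), and a TreeMap stands in for Hadoop's shuffle and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java simulation of the row-counting map and reduce phases.
 * No Hadoop or Oracle NoSQL classes are used; each row is modeled only
 * by the field names of its primary key (a hypothetical model).
 */
public class CountRowsSimulation {

    /** Map phase: emit (joined primary key field names, 1) for each row. */
    static List<Map.Entry<String, Integer>> map(List<List<String>> keyFieldNamesPerRow) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (List<String> fieldNames : keyFieldNamesPerRow) {
            out.add(Map.entry(String.join(",", fieldNames), 1));
        }
        return out;
    }

    /** Shuffle and reduce: group the pairs by key and sum their values. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> mapped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            counts.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three rows of a hypothetical table whose primary key is (id, type).
        List<List<String>> rows = List.of(
            List.of("id", "type"),
            List.of("id", "type"),
            List.of("id", "type"));
        System.out.println(reduce(map(rows))); // prints {id,type=3}
    }
}
```

Since every row contributes the same key, the reduced map has a single entry whose value is the row count, which is exactly why the WordCount-style structure works for counting rows.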
The TableInputFormat and related classes are located in the lib/kvclient.jar
file, so kvclient.jar must be included in the Hadoop classpath at runtime.
The arguments to the program are the kvstore name, the helperHost:port pair,
the name of the table whose rows will be counted, and the HDFS output path
for the MapReduce job.
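The four positional arguments can be validated with a small helper such as the one below. This is a hypothetical sketch: the class and field names are illustrative and not taken from the example, and it assumes that Hadoop's generic options (-libjars, -D...) have already been stripped from the argument array, for instance by Hadoop's ToolRunner. Extra trailing arguments are tolerated so that a login file could later be appended, as the note on security below anticipates.

```java
/**
 * Hypothetical sketch of parsing the positional arguments:
 * store name, helperHost:port, table name, HDFS output path.
 */
public class CountTableRowsArgs {
    final String storeName;
    final String helperHost;
    final int helperPort;
    final String tableName;
    final String outputPath;

    CountTableRowsArgs(String[] args) {
        if (args.length < 4) {
            throw new IllegalArgumentException(
                "usage: <store> <helperHost:port> <table> <outputPath>");
        }
        storeName = args[0];
        // Split "host:port" on the last colon; parseInt rejects a non-numeric port.
        int colon = args[1].lastIndexOf(':');
        if (colon <= 0 || colon == args[1].length() - 1) {
            throw new IllegalArgumentException("expected helperHost:port, got " + args[1]);
        }
        helperHost = args[1].substring(0, colon);
        helperPort = Integer.parseInt(args[1].substring(colon + 1));
        tableName = args[2];
        outputPath = args[3];
    }

    public static void main(String[] args) {
        CountTableRowsArgs parsed = new CountTableRowsArgs(
            new String[] {"mystore", "myhost:5000", "mytable", "/myHDFSoutputdir"});
        System.out.println(parsed.helperHost + " " + parsed.helperPort); // prints myhost 5000
    }
}
```

In the actual job these values would then be handed to the TableInputFormat configuration and the job's output path; that wiring depends on the Hadoop and kvclient.jar APIs and is not shown here.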
For example, if one builds this class (and its nested classes) and puts the
resulting class files into a JAR file named "myjar.jar", this example
MapReduce job can be executed with a command like the following:
export HADOOP_CLASSPATH=...:KVHOME/lib/kvclient.jar
bin/hadoop jar myjar.jar hadoop.table.CountTableRows \
-libjars KVHOME/lib/kvclient.jar \
[-Doracle.kv.primaryKey=fieldName,fieldValue,fieldType,...] \
mystore myhost:myport mytable /myHDFSoutputdir
Note that TableInputFormat does not yet fully support the Oracle NoSQL
Database security model, so this example should not be run against a secure
KV Store. Once the security model is supported in a future release, running
the program against a secure KV Store will also require appending the path
to the appropriate login file to the end of the argument list in the command
above.