See: Description
Class | Description |
---|---|
CountTableRows |
A basic example demonstrating how to use the class
oracle.kv.hadoop.table.TableInputFormat to access the
rows of a table in an Oracle NoSQL Database from within a Hadoop
MapReduce job for the purpose of counting the number of records
in the table. |
CountTableRows.Map |
Nested class used to map input key/value pairs to a set of
intermediate key/value pairs in preparation for the reduce phase
of the associated MapReduce job.
|
CountTableRows.Reduce |
Nested class used to reduce to a smaller set of values, the set of
intermediate values, sharing a key, which were produced by the
mapping phase of the job.
|
KVSecurityCreation |
Standalone program that creates (or deletes) a password file or Oracle
Wallet for use when running the associated example MapReduce job
with a secure KVStore.
|
KVSecurityUtil |
Utility class that provides convenience methods related to the Oracle
NoSQL Database security model.
|
LoadVehicleTable |
Class that creates an example table in a given NoSQL Database store and
then uses the Table API to populate the table with sample records.
|
Apache Hadoop
,
MapReduce
,
and its programming model; that is, become familiar with how to write and deploy
a MapReduce job.
/opt/ondb/kv
—
that is network reachable from the nodes of the Hadoop cluster.
/opt/ondb/example-store
—
to 3 machines (real or virtual) with host names, kv-host-1,
kv-host-2, and kv-host-3; where an admin service,
listening on port 5000, is deployed on each host.
/opt/apps
.
<KVHOME>
and <KVROOT>
environment variables, as well as the store name,
host names, admin port, and example code location described above should allow you to more easily
understand and use the example commands presented below. Combined with the information contained
in the Oracle NoSQL Database Getting Started Guide, along with the
Oracle NoSQL Database Admin Guide and Oracle NoSQL Database Security Guide,
you should then be able to generalize and extend these examples to your own
particular development scenario; substituting the values specific to the
given environment where necessary.
Detailed instructions for deploying a non-secure KVStore are provided in
Appendix A
.
Similarly, Appendix B
provides instructions for deploying a KVStore configured for security.
A Brief Hadoop Primer
Hadoop can be thought of as consisting of two primary components:
The various Hadoop distributions that are available (for example,
Hadoop Distributed File System
(referred to as, HDFS).
MapReduce
programming model; which consists of a Map Phase
that includes a mapping step and a shuffle-and-sort step
that together perform filtering and sorting, and a Reduce Phase
that performs a summary operation on the mapped and sorted results.
Cloudera
or
Hortonworks
)
each provide an infrastructure which orchestrates the MapReduce
processing that is performed. It does this by marshalling the distributed
servers that run the various tasks in parallel, by managing all communications
and data transfers between the various parts of the system, and by providing for
redundancy and fault tolerance.
In addition, the Hadoop infrastructure provides a number of interactive tools — such as a command line interface (the Hadoop CLI) — that provide access to the data stored in HDFS. But the typical way application developers read, write, and process data stored in HDFS is via MapReduce jobs; which are programs that adhere to the Hadoop MapReduce programming model. For more detailed information on Hadoop HDFS and MapReduce, consult the Hadoop MapReduce tutorial.
As indicated above, with the introduction of the Table API, a new set of
interfaces and classes that satisfy the Hadoop MapReduce programming model
have been provided which support writing MapReduce jobs that can be run against
table data contained in a KVStore. These new classes are located in the
oracle.kv.hadoop.table
package, and consist of the following types:
As described below, it is through the specific implementation of the
org.apache.hadoop.mapreduce.InputFormat
,
which specifies how the associated MapReduce job reads its input data (using a Hadoop
RecordReader
),
and splits up the input data into logical sections, each referred to as an
InputSplit
.
org.apache.hadoop.mapreduce.OutputFormat
,
which specifies how the associated MapReduce job writes its output data (using a Hadoop
RecordWriter
).
org.apache.hadoop.mapreduce.RecordReader
,
which specifies how the mapped keys and values are located and retrieved
during MapReduce processing.
org.apache.hadoop.mapreduce.InputSplit
,
which represents the data to be processed by an individual MapReduce
Mapper (one Mapper per
InputSplit
).
InputFormat
class provided in the Oracle NoSQL Database distribution that the Hadoop MapReduce
infrastructure obtains access to a given KVStore and the desired table data
that the store contains.
oracle.kv.hadoop.table.TableInputFormat
oracle.kv.hadoop.table.TableInputSplit
oracle.kv.hadoop.table.TableRecordReader
Currently, Oracle NoSQL Database does not define a subclass of the Hadoop
OutputFormat
class. This means that it is not currently possible to
write data from a MapReduce job into a KVStore.
That is, from within a MapReduce job, you can only retrieve data from
a desired KVStore table and process that data.
The CountTableRows Example
Assuming you installed the separate example distribution under the directory
/opt/apps
, the hadoop.table
example package
would then be contained in the following location:
/opt/apps/kv/examples/
hadoop/table/
CountTableRows.java
LoadVehicleTable.java
KVSecurityCreation.java
KVSecurityUtil.java
In order to run the CountTableRows
example MapReduce job, a
KVStore — either secure or non-secure — must first be deployed,
and a table must be created and populated with data. Thus, before attempting
to execute CountTableRows
, either deploy a non-secure KVStore
using the steps outlined in
Appendix A
,
or start a KVStore configured for security using the steps presented in
Appendix B
.
Once a KVStore has been deployed as described in either
Appendix A
,
or
Appendix B
,
the standalone Java program LoadVehicleTable
can be run
against either type of store to create a table with the name and schema
expected by CountTableRows
, and populate it with rows of
data consistent with the table's schema. Once the table is created and
populated with example data, CountTableRows
can then be
executed to run a MapReduce job that counts the number of rows of data
in the table.
In addition to the LoadVehicleTable
program, the example package also contains the
classes KVSecurityCreation
and KVSecurityUtil
; which
are provided to support running CountTableRows
against a secure
KVStore. The standalone Java program KVSecurityCreation
is provided
as a convenience, and can be run to create (or delete) a password file or Oracle
Wallet — along with associated client side and server side login files —
that CountTableRows
will need to interact with a secure store. And the
KVSecurityUtil
class provides convenient utility methods that
CountTableRows
uses to create and process the various security
artifacts it uses for secure access.
The next sections explain how to compile and execute LoadVehicleTable
to create and populate the required example table in the deployed store; how to
compile and execute KVSecurityCreation
to create or delete any
security credentials that may be needed by CountTableRows
; and
finally, how to compile, build (JAR), and then execute the CountTableRows
MapReduce job on the Hadoop cluster that was deployed for this example.
Schema for the Example Table 'vehicleTable'
To execute the CountTableRows
MapReduce job, a table named
vehicleTable — having the schema shown in the table below — must
be created in the KVStore that was deployed for this example; where the data types
specified in the schema are defined by the Oracle NoSQL Database Table API (see
oracle.kv.table.FieldDef.Type
) .
Field Name | Field Type | |||
---|---|---|---|---|
type | STRING | |||
make | STRING | |||
model | STRING | |||
class | STRING | |||
color | STRING | |||
price | DOUBLE | |||
count | INTEGER | |||
Primary Key Field Names | ||||
type | make | model | class | |
Shard Key Field Names | ||||
type | make | model |
Thus, vehicleTable consists of rows representing a particular vehicle a dealer might have in stock for purchase. Each such row contains fields specifying the "type" of vehicle (for example, car, truck, SUV, etc.), the "make" of the vehicle (Ford, GM, Chrysler, etc.), the "model" (Explorer, Camaro, Lebaron, etc.), the vehicle "class" (4WheelDrive, FrontWheelDrive, etc.), "color", "price", and finally the number of vehicles in stock (the "count").
Although you can enter individual commands in the admin CLI to create a table with the above
schema, the preferred approach is to employ the
Oracle NoSQL Database Data Definition Language (DDL)
to create the desired table. One way to
accomplish this is to follow the instructions presented in the next sections to compile
and execute the LoadVehicleTable
program; which populates the desired
table after using the DDL to create it.
Create and Populate 'vehicleTable' with Example Data
Assuming the KVStore
— either
non-secure
or
secure
—
has been deployed with <KVHOME> equal to /opt/ondb/kv
,
the LoadVehicleTable
program that is supplied with this example as
a convenience can be executed to create and populate the table named vehicleTable.
Before executing LoadVehicleTable
though, that program must first
be compiled. To do this, type the following command from the OS command line:
> cd /opt/apps/kv > javac -classpath /opt/ondb/kv/lib/kvstore.jar:examples examples/hadoop/table/LoadVehicleTable.javawhich should produce the file:
/opt/apps/kv/examples/hadoop/table/LoadVehicleTable.class
In the example command line above, the argument -nops 79
requests that 79 rows be written to the vehicleTable. If more or
less than that number of rows is desired, then the value of the
-nops
parameter should be changed.
If LoadVehicleTable
is executed a second time and the
optional -delete
parameter is specified, then all rows
added by any previous executions of LoadVehicleTable
are
deleted from the table prior to adding the new rows. Otherwise, all
pre-existing rows are left in place, and the number of rows in the table
will be increased by the requested -nops
number of new rows.
Note that because of the way LoadVehicleTable
generates
records, it is possible that a given record has already been added to
the table; either during a previous call to LoadVehicleTable
,
or during the current call. As a result, it is not uncommon for the
number of unique rows added to be less than the number requested. Because
of this, when processing has completed, LoadVehicleTable
will display the number of unique rows that are actually added to the table,
along with the total number of rows currently in the table (from previous
runs).
— Creating and Populating 'vehicleTable' with Example Data in a Secure KVStore —
To execute
To understand the LoadVehicleTable
against a secure KVStore deployed
and provisioned with a non-administrative user employing the steps presented in
Appendix B
,
an additonal parameter must be added to the command line above. That is, type the following:
> cd /opt/apps/kv
> javac -classpath /opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/lib/threetenbp.jar:examples examples/hadoop/table/LoadVehicleTable.java
> cp /opt/ondb/example-store/security/client.trust /tmp
> java -classpath /opt/ondb/kv/lib/kvstore.jar:examples hadoop.table.LoadVehicleTable \
-store example-store -host kv-host-1 -port 5000 -nops 79 \
-security /tmp/example-user-client-pwdfile.login
[-delete]
where the single additonal -security
parameter specifies the
location of the login properties file (associated with a password file
rather than an Oracle Wallet) for the given user (the alias); and
all other parameters are as described for the non-secure case.
-security
parameter for this example,
recall from Appendix B
that a non-administrative user named example-user was created, and
password file based credential files (prefixed with that user name) were generated
for that user and placed under the /tmp
system directory. That is,
the example login and password files generated in
Appendix B
are:
/tmp
client.trust
example-user-client-pwdfile.login
example-user-server.login
example-user.passwd
Note that for this example, the user credential files must be co-located; where it
doesn't matter which directory they are located in, as long as they all reside in the
same directory accessible by the user. It is for this reason that the shared trust file
(client.trust) is copied into /tmp
above. Co-locating
client.trust and example-user.passwd with the login file
(example-user-client-pwdfile.login) allows relative paths to be used for the
values of the oracle.kv.ssl.trustStore
and oracle.kv.auth.pwdfile.file
system properties that are specified in the login file (or oracle.kv.auth.wallet.dir
if a wallet is used to store the user password). If those files are not co-located
with the login file, then absolute paths must be used for those properties.
At this point, the vehicleTable created in the specified KVStore (non-secure or secure)
should be populated with the desired example data. And the
For example, suppose that the 2.6.0 version of Hadoop that is
delivered via the 5.4.8 package provided by
CountTableRows
example
MapReduce job can be run to count the number of rows in that table.
Compile and Build the CountTableRows Example Program
After the example vehicleTable has been created and populated, but before
the example MapReduce job can be executed, the CountTableRows
program
must first be compiled and built for deployment to the Hadoop infrastructure.
In order to compile the CountTableRows
program, a number of Hadoop
JAR files must be installed and available in the build environment for inclusion
in the program classpath. Those JAR files are:
where the <version> token represents the particular version number
of the corresponding JAR file contained in the Hadoop distribution installed
in the build environment.
Cloudera
(cdh) is installed under the <HADOOPROOT>
base directory. And suppose that the classes from that version of Hadoop
use the 1.1.3 version of commons-logging
. Then,
to compile the CountTableRows
program, type the following
at the command line (with the <HADOOPROOT>
token
replaced with the appropriate directory path for your system):
> cd /opt/apps/kv
> javac -classpath <HADOOPROOT>/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/hadoop-common-2.6.0-cdh5.4.8.jar: \
<HADOOPROOT>/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-client-core-2.6.0-cdh5.4.8.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/lib/hadoop-annotations-2.6.0-cdh5.4.8.jar: \
/opt/ondb/kv/lib/kvclient.jar:examples \
examples/hadoop/table/CountTableRows.java
which produces the following files:
/opt/apps/kv/examples/hadoop/table/
CountTableRows.class
CountTableRows$Map.class
CountTableRows$Reduce.class
If your specific environment has a different, compatible Hadoop distribution installed,
then simply replace the versions referenced in the example command line above with the
specific versions that are installed.
If you will be running CountTableRows
against a non-secure
KVStore, then this is all you need; and the resulting class files should be placed in a
JAR file so that the program can be deployed to the example Hadoop cluster. For example,
to create a JAR file containing the class files needed to run CountTableRows
against a non-secure KVStore like that deployed in
Appendix A
,
do the following:
> cd /opt/apps/kv/examples > jar cvf CountTableRows.jar hadoop/table/CountTableRows*.classwhich should produce the file
CountTableRows.jar
in the
/opt/apps/kv/examples
directory, with contents that look like:
0 Fri Feb 20 12:53:24 PST 2015 META-INF/ 68 Fri Feb 20 12:53:24 PST 2015 META-INF/MANIFEST.MF 3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows.class 2623 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Map.class 3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Reduce.classNote that when the command above is used to generate
CountTableRows.jar
,
the utility class KVSecurityUtil
(see below) will not be included in the
resulting JAR file. Since CountTablesRows
does not use that utility class
in the non-secure case, including it in the JAR file is optional.
— Building the Example for a Secure Environment —
If you will be running
To support the secure version of CountTableRows
against a secure
KVStore such as that deployed in
Appendix B
,
then in addition to compiling CountTableRows
as described above,
additional security related artifacts need to be generated and included in
the build; where the additional artifacts include not only compiled class files,
but security credential files as well.
CountTableRows
, the utilitity class
KVSecurityUtil
and the standalone program KVSecurityCreation
should also be compiled. That is,
> cd /opt/apps/kv
> javac -classpath /opt/ondb/kv/lib/kvstore.jar:examples examples/hadoop/table/KVSecurityCreation.java
> javac -classpath /opt/ondb/kv/lib/kvstore.jar:examples examples/hadoop/table/KVSecurityUtil.java
> javac -classpath <HADOOPROOT>/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/hadoop-common-2.6.0-cdh5.4.8.jar: \
<HADOOPROOT>/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-client-core-2.6.0-cdh5.4.8.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/lib/hadoop-annotations-2.6.0-cdh5.4.8.jar: \
/opt/ondb/kv/lib/kvclient.jar:examples \
examples/hadoop/table/CountTableRows.java
which produces the files:
/opt/apps/kv/examples/hadoop/table/
CountTableRows.class
CountTableRows$Map.class
CountTableRows$Reduce.class
KVSecurityUtil.class
KVSecurityCreation.class
Unlike the non-secure case, the build artifacts needed to deploy CountTableRows
in a secure environment include more than just a single JAR file containing the generated class
files. For the secure case, it will be important to package some artifacts for deployment to the
client side of the application that communicates with the KVStore; whereas other artifacts
will need to be packaged for deployment to the server side. Although there are different
ways to achieve this "separation of concerns" when deploying a given application,
Appendix C
presents one particular model you can use to package and deploy the artifacts of an application
(such as CountTableRows
) that will interact with a secure KVStore. With this in mind,
the sections below related to executing CountTableRows
against a secure KVStore
each assume that the application has been built and packaged according to the instructions presented in
Appendix C
.
Executing the CountTableRows Example
If you will be running CountTableRows
against a non-secure KVStore
deployed in the manner described in Appendix A
,
and have compiled and built CountTableRows
in the manner presented
in the previous section
,
then the MapReduce job initiated by CountTableRows
can be deployed
and executed by typing the following at the command line of the Hadoop cluster's
access node (where line breaks are used only for readability):
> cd /opt/apps/kv > export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar > hadoop jar examples/CountTableRows.jar \ hadoop.table.CountTableRows \ -libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar \ example-store \ kv-host-1:5000 \ vehicleTable \ /user/example-user/CountTableRows/vehicleTable/<000N>where the
hadoop
command interpreter's -libjars
argument
is used to include the third party libraries kvclient.jar
and threetenbp.jar
in the classpath
of each MapReduce task executing on the cluster's DataNodes; so that those tasks can
access classes such as,
TableInputSplit
,
TableRecordReader
,
and
org.threeten.bp.DateTimeException
.
Note that in the last argument, the example-user
directory
component corresponds to a directory under the HDFS /user
top-level
directory, and typically corresponds to the user who has initiated the MapReduce job.
This directory is usually created in HDFS by the Hadoop cluster administrator.
Additionally, the <000N> token in that argument represents a string
such as 0000
, 0001
, 0002
, etc.
Although any string can be used for this token, using a different number for
"N" on each execution of the job makes it easier to keep track of results when
multiple executions of the job occur.
— Executing CountTableRows Against a Secure KVStore —
If you will be running CountTableRows
against a secure
KVStore deployed in the manner presented in
Appendix B
,
and if you have compiled, built, and packaged CountTableRows
and all
the necessary artifacts in the manner described in
Appendix C
,
then CountTableRows
can be run against the secure KVStore by typing the
following at the command line of the Hadoop cluster's access node (where line breaks are
used only for readability):
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/apps/kv/examples/CountTableRows-pwdServer.jar
> cd /opt/apps/kv
> hadoop jar examples/CountTableRows-pwdClient.jar \
hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/CountTableRows-pwdServer.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/<000N> \
example-user-client-pwdfile.login \
example-user-server.login
where the mechanism used for storing the user password is a password file;
and the client side artifacts are highlighted in red, and the server side artifacts are
highlighted in purple.
Similarly, if the mechanism used for storing the user password is an Oracle Wallet (available only with the Oracle NoSQL Database Enterprise Edition), you would type the following at the access node's command line:
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/apps/kv/examples/CountTableRows-walletServer.jar > cd /opt/apps/kv > hadoop jar examples/CountTableRows-walletClient.jar \ hadoop.table.CountTableRows \ -libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/CountTableRows-walletServer.jar \ example-store \ kv-host-1:5000 \ vehicleTable \ /user/example-user/CountTableRows/vehicleTable/<000N> \ example-user-client-wallet.login \ example-user-server.loginIn both cases above — password file and Oracle Wallet — notice the additional JAR file (
CountTableRows-pwdServer.jar
or
CountTableRows-walletServer.jar
) specified for both the
HADOOP_CLASSPATH
environment variable and -libjars
paramenter. For a detailed explanation of the use and purpose of that server side
JAR file, as well as a description of the client side JAR file and the two
additional arguments at the end of the command line, refer to
Appendix C
;
specifically, the section on
packaging
for a secure KVStore.
Whether running against a secure or non-secure store, as the job runs, assuming no errors, the output from the job will look like the following:
... 2014-12-04 08:59:47,996 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1344)) - Running job: job_1409172332346_0024 2014-12-04 08:59:54,107 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 0% reduce 0% 2014-12-04 09:00:16,148 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 7% reduce 0% 2014-12-04 09:00:17,368 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 26% reduce 0% 2014-12-04 09:00:18,596 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 56% reduce 0% 2014-12-04 09:00:19,824 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 100% reduce 0% 2014-12-04 09:00:23,487 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 100% reduce 100% 2014-12-04 09:00:23,921 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1383)) - Job job_1409172332346_0024 completed successfully 2014-12-04 09:00:24,117 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1390)) - Counters: 49 File System Counters FILE: Number of bytes read=2771 FILE: Number of bytes written=644463 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=2660 HDFS: Number of bytes written=32 HDFS: Number of read operations=15 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=6 Launched reduce tasks=1 Rack-local map tasks=6 Total time spent by all maps in occupied slots (ms)=136868 Total time spent by all reduces in occupied slots (ms)=2103 Total time spent by all map tasks (ms)=136868 Total time spent by all reduce tasks (ms)=2103 Total vcore-seconds taken by all map tasks=136868 Total vcore-seconds taken by all reduce tasks=2103 Total megabyte-seconds taken by all map tasks=140152832 Total megabyte-seconds taken by all reduce tasks=2153472 Map-Reduce Framework Map input records=79 Map output records=79 Map output bytes=2607 Map output materialized 
bytes=2801 Input split bytes=2660 Combine input records=0 Combine output records=0 Reduce input groups=1 Reduce shuffle bytes=2801 Reduce input records=79 Reduce output records=1 Spilled Records=158 Shuffled Maps =6 Failed Shuffles=0 Merged Map outputs=6 GC time elapsed (ms)=549 CPU time spent (ms)=9460 Physical memory (bytes) snapshot=1888358400 Virtual memory (bytes) snapshot=6424895488 Total committed heap usage (bytes)=1409286144 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=32To see the results of the job — to verify that the program actually counted the correct number of rows in the table — use the Hadoop CLI to display the contents of the MapReduce results file located in HDFS. To do this, type the following at the command line of the Hadoop cluster's access node:
> hadoop fs -cat /user/example-user/CountTableRows/vehicleTable/<000N>/part-r-00000where
example-user
and the <000N>
token should
be replaced with the values you used when the job was run; appropriate to your particular
system. Assuming the table was populated with 79 rows, if the job was successful, then
the output should look like the following:
/type/make/model/class 79where
/type/make/model/class
are the names of the fields making up the
PrimaryKey
of the vehicleTable
; and 79
is the number of rows in the table.
/opt/ondb/kv
.
/disk1/shard
for host kv-host-1, /disk2/shard
for host kv-host-2, and /disk3/shard
for host kv-host-3.
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \ -root /opt/ondb/example-store \ -config config.xml \ -port 5000 \ -host kv-host-1 \ -harange 5002,5005 \ -num_cpus 0 \ -memory_mb 0 \ -capacity 3 \ -storagedir /disk1/shardThen from kv-host-2, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \ -root /opt/ondb/example-store \ -config config.xml \ -port 5000 \ -host kv-host-2 \ -harange 5002,5005 \ -num_cpus 0 \ -memory_mb 0 \ -capacity 3 \ -storagedir /disk2/shardAnd finally from kv-host-3, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \ -root /opt/ondb/example-store \ -config config.xml \ -port 5000 \ -host kv-host-3 \ -harange 5002,5005 \ -num_cpus 0 \ -memory_mb 0 \ -capacity 3 \ -storagedir /disk3/shard
> nohup java -jar /opt/ondb/kv/lib/kvstore.jar start -root /opt/ondb/example-store -config config.xml &which will start both an SNA and an admin service on the associated host.
> java -jar /opt/ondb/kv/lib/kvstore.jar runadmin -host kv-host-1 -port 5000 kv->Next, deploy the store by entering the following commands — either in succession, from the CLI prompt; or from a script, using the CLI command '
load -file <flnm>
'.
configure -name kvstore-db plan deploy-zone -name zn1 -rf 3 -wait plan deploy-sn -znname zn1 -host kv-host-1 -port 5000 -wait plan deploy-admin -sn 1 -port 5001 -wait pool create -name snpool pool join -name snpool -sn sn1 plan deploy-sn -znname zn1 -host kv-host-2 -port 5000 -wait plan deploy-admin -sn 2 -port 5001 -wait pool join -name snpool -sn sn2 plan deploy-sn -znname zn1 -host kv-host-3 -port 5000 -wait plan deploy-admin -sn 3 -port 5001 -wait pool join -name snpool -sn sn3 change-policy -params "loggingConfigProps=oracle.kv.level=INFO;" topology create -name store-layout -pool snpool -partitions 300 plan deploy-topology -name store-layout -plan-name store-deploy-plan -wait
/opt/ondb/kv
.
/disk1/shard
for host kv-host-1, /disk2/shard
for host kv-host-2, and /disk3/shard
for host kv-host-3.
No_Sql_00
.
example-user
.
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \ -root /opt/ondb/example-store \ -config config.xml \ -port 5000 \ -host kv-host-1 \ -harange 5002,5005 \ -num_cpus 0 \ -memory_mb 0 \ -capacity 3 \ -storagedir /disk1/shard \ -store-security configure \ -pwdmgr pwdfile Enter a password for the Java KeyStore:No_Sql_00<RETURN> Re-enter the KeyStore password for verification:No_Sql_00<RETURN> Created files /opt/ondb/example-store/security/store.trust /opt/ondb/example-store/security/store.keys /opt/ondb/example-store/security/store.passwd /opt/ondb/example-store/security/client.trust /opt/ondb/example-store/security/security.xml /opt/ondb/example-store/security/client.securityNote the value of the
-store-security
parameter for
kv-host-1 is configure
.
After executing the command above, use a utility such as scp
to copy the resulting security directory to the other SN hosts; that is,
kv-host-2 and kv-host-3. For example,
> scp -r /opt/ondb/example-store/security example-user@kv-host-2:/opt/ondb/example-store > scp -r /opt/ondb/example-store/security example-user@kv-host-3:/opt/ondb/example-store store.trust 100% 508 0.5KB/s 00:00 store.keys 100% 1215 1.2KB/s 00:00 store.passwd 100% 39 0.0KB/s 00:00 client.trust 100% 508 0.5KB/s 00:00 security.xml 100% 2216 2.2KB/s 00:00 client.security 100% 255 0.3KB/s 00:00After generating and distributing the security configuration files by executing the commands shown above from the first SN host (kv-host-1), login to each of the remaining SN hosts and enable security by typing the commands shown below from the respective host's command line. That is, from kv-host-2, type the following:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \ -root /opt/ondb/example-store \ -config config.xml \ -port 5000 \ -host kv-host-2 \ -harange 5002,5005 \ -num_cpus 0 \ -memory_mb 0 \ -capacity 3 \ -storagedir /disk2/shard \ -store-security enable \ -pwdmgr pwdfileAnd then from kv-host-3, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \ -root /opt/ondb/example-store \ -config config.xml \ -port 5000 \ -host kv-host-3 \ -harange 5002,5005 \ -num_cpus 0 \ -memory_mb 0 \ -capacity 3 \ -storagedir /disk3/shard \ -store-security enable \ -pwdmgr pwdfileFor both commands above, note the value of the
-store-security
parameter
is enable
rather than configure
; as was used
with the first host.
> nohup java -jar /opt/ondb/kv/lib/kvstore.jar start -root /opt/ondb/example-store -config config.xml &which will start both an SNA and an admin service on the associated host.
> java -jar /opt/ondb/kv/lib/kvstore.jar runadmin \ -host kv-host-1 \ -port 5000 \ -security /opt/ondb/example-store/security/client.security Logged in admin as anonymous kv->Next, deploy the store by entering the following commands — either in succession, from the CLI prompt; or from a script, using the CLI command '
load -file <flnm>
'.
configure -name kvstore-db plan deploy-zone -name zn1 -rf 3 -wait plan deploy-sn -znname zn1 -host kv-host-1 -port 5000 -wait plan deploy-admin -sn 1 -port 0 -wait pool create -name snpool pool join -name snpool -sn sn1 plan deploy-sn -znname zn1 -host kv-host-2 -port 5000 -wait plan deploy-admin -sn 2 -port 0 -wait pool join -name snpool -sn sn2 plan deploy-sn -znname zn1 -host kv-host-3 -port 5000 -wait plan deploy-admin -sn 3 -port 0 -wait pool join -name snpool -sn sn3 change-policy -params "loggingConfigProps=oracle.kv.level=INFO;" topology create -name store-layout -pool snpool -partitions 300 plan deploy-topology -name store-layout -plan-name store-deploy-plan -waitBefore you can create and populate a table with data and then execute the example described in this document, an administrative user — named root for the purposes of this document — must be created, and provisioned with the necessary security credentials; so that the store can be administered if necessary. To do this, after deploying the store and topology using the steps just presented, the desired user is first created by typing the command shown below at the CLI prompt. That user is then provisioned with the necessary administrative privileges by following the steps presented in the next sub-sections.
plan create-user -name root -admin -wait
Enter the new password: No_Sql_00<RETURN>
Re-enter the new password: No_Sql_00<RETURN>
Note also that for both cases — password or wallet — two login properties
files are generated; one for client side connections, and one for server side connections.
The only difference between the client side login file and the server side login file is
that the client side login file specifies the username
(the alias)
along with the location of the user's password
— specified by either the oracle.kv.auth.pwdfile
or oracle.kv.auth.wallet.dir
property.
Although optional, the reason for using two login files is to avoid passing private security
information to the server side; as explained in more detail in
Appendix C
.
Additionally, observe that the server side login file (example-user-server.login
)
is identical for both cases. This is because whether a password file or a wallet is used to
store the password, both use the same publicly visible communication transport information.
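For illustration only, the two generated login files might contain properties
along these lines. This is a sketch, not generated output: the exact property
names and values are produced by the KVSecurityCreation utility and depend on
the store's transport configuration, so consult the files it actually writes.

```properties
# example-user-client-pwdfile.login -- client side (password file case):
# names the user and the private password location, plus the public
# transport settings.
oracle.kv.auth.username=example-user
oracle.kv.auth.pwdfile.file=example-user.passwd
oracle.kv.transport=ssl
oracle.kv.ssl.trustStore=client.trust

# example-user-server.login -- server side: public transport settings only.
oracle.kv.transport=ssl
oracle.kv.ssl.trustStore=client.trust
```

Note how the server side file carries no username and no password location; that
is exactly the difference described above.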
At this point, the KVStore has been deployed, configured for secure access, and provisioned with the necessary users and credentials. The table can now be created and populated, and the example can be executed by a user whose password is stored either in a clear text password file or an Oracle Wallet (Enterprise Edition only), demonstrating a run against table data contained in a secure Oracle NoSQL Database store.
A final, important point to note is that the storage mechanism used for the example application's
user password (password file or Oracle Wallet) does not depend on the password
storage mechanism used by the KVStore with which that application will communicate. That is, although
Appendix B
(for convenience) deployed a secure KVStore using a password file rather than a wallet,
the fact that the KVStore placed the passwords it manages in a password file does not
prevent the developer/deployer of a client of that store from storing the client's
user password in an Oracle Wallet, or vice versa. You should therefore view the use of
an Oracle Wallet or a password file by any client application as simply a "safe" place
(for some value of "safe") where the user password can be stored; one that can be
accessed only by the user who owns the wallet or password file. This means that the
choice of password storage mechanism is at the discretion of the application
developer/deployer, no matter what mechanism is used by the KVStore itself.
Appendix C: A Model for Building and Packaging a Secure KVStore Client

With respect to running a MapReduce job against data contained in a secure KVStore,
a particularly important issue to address is the communication of user
credentials to the tasks run on each of the DataNodes on which the Hadoop
infrastructure executes the job. Recall from above that when using the
MapReduce
programming model defined by
Apache Hadoop
,
the tasks executed by a MapReduce job each act as a client of the KVStore. Thus,
if the store is configured for secure access, in order to retrieve the desired data
from the store, each task must have access to the credentials of the user associated
with that data. As described in the Oracle NoSQL Database Security Guide, the
typical mechanism for providing the necessary credentials to a client of a secure store
is to manually install the credentials on the client's local file system;
for example, by employing a utility such as scp. Although that mechanism
is practical for most clients of a secure KVStore, it is extremely impractical for a
MapReduce job. This is because a MapReduce job consists of multiple tasks running in
parallel, in separate address spaces, each with a separate file system that is generally
not under the control of the user. Even assuming that write access is granted by the
Hadoop administrator (a problem in and of itself), manual installation
of the client credentials for every possible user known to the KVStore would need to
occur on the file system of each of the multiple nodes in the Hadoop cluster; something
that may be very difficult to achieve.

To address this issue, the sections below present a model that developers and deployers
can employ to facilitate the communication of each user's credentials to a given MapReduce
job from the client side of the job; that is, from the address space controlled
by the job's client process, owned by the user. As described below, this model consists
of two primary components: a programming model for executing MapReduce jobs that retrieve
and process data contained in tables located in a secure KVStore; and a set of
"best practices" for building, packaging, and deploying those jobs. Although there is
nothing preventing a user from manually installing the necessary security credentials
on all nodes in a given cluster, doing so is not only impractical, but may result in
various security vulnerabilities. Combining this programming model with the deployment
best practices that are presented will help developers and deployers not only avoid the
need to manually pre-install credentials on the DataNodes of the Hadoop cluster, but will
also prevent the sort of security vulnerabilities that can occur with manual installation.
The Programming Model for MapReduce and Oracle NoSQL Database Security
Recall that when executing a MapReduce job, the client application uses mechanisms
provided by the Hadoop infrastructure to initiate the job from a node (referred to
as the Hadoop cluster's access node) that has network access to the node running the
Hadoop cluster's ResourceManager
. If the job will be run against a secure KVStore,
then prior to initiating the job, the client must initialize the job's
TableInputFormat
with three pieces of information: the name of the login file the client will use
to access the store's public security properties; the PasswordCredentials
containing the username and password the client will present to the store during
authentication; and the name of the file containing the store's public trust
credentials. To perform this initialization, the MapReduce client application
— CountTableRows in this case — invokes the setKVSecurity
method defined in TableInputFormat.
Once this initialization has been performed and the job has been initiated, the job uses that
TableInputFormat to create and assign a TableInputSplit
(a split) to each of the Mapper tasks, each of which runs on one of the
DataNodes in the cluster. The TableInputFormat needs the information
initialized by the setKVSecurity method both when it contacts the secure
store to construct the splits, and because each split must itself carry that information
so that the associated task can access the store.
In addition to requiring the initialization of the TableInputFormat
(and thus, its splits) with the information listed above, the model also requires that the
public and private security credentials referenced by that information be communicated to the
TableInputFormat, as well as to the splits, in a secure fashion. How this is
achieved depends on whether that information is being communicated to the
TableInputFormat on the client side of the application, or to the splits on
the server side.
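Both delivery paths ultimately rest on standard Java mechanisms. In particular,
retrieving public security artifacts as classpath resources (the approach the
model uses for the public login and trust content) can be sketched in a
self-contained way. The resource name below matches the server side login file
used elsewhere in this document, but the property content here is illustrative
only; a real file is produced by the KVSecurityCreation utility.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ResourceLoginDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a JAR on the task's classpath: a directory holding
        // a (public) login properties file.
        Path dir = Files.createTempDirectory("classpath-demo");
        Files.write(dir.resolve("example-user-server.login"),
                    "oracle.kv.transport=ssl\n".getBytes(StandardCharsets.UTF_8));

        // A task can retrieve the file as a classpath resource rather than
        // reading a path on the DataNode's local file system.
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            try (InputStream in =
                     loader.getResourceAsStream("example-user-server.login")) {
                Properties props = new Properties();
                props.load(in);
                System.out.println(props.getProperty("oracle.kv.transport"));
            }
        }
    }
}
```

The same lookup works whether the resource sits in a directory or inside a JAR
added to the classpath, which is what makes the JAR-based distribution described
below possible.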
— Communicating Security Credentials to the Splits —
To facilitate communication of the user's security credentials to the splits distributed
to each of the DataNodes of the cluster, the model separates public security
information from the private information (the username and password), and then
stores the private information as part of each split's internal state, rather than on the
local file system of each associated DataNode; which may be vulnerable or difficult/impossible
to secure. For communication of the public contents of the login and trust files
to each such split, the model supports an (optional) mechanism that allows the application
to communicate that information as Java resources that each split retrieves from
the classpath of the split's Java VM. This avoids the need to manually transfer the contents
of those files to each DataNode's local file system, and also avoids the potential security
vulnerabilities that can result from manual installation on those nodes. Note that when
an application wishes to employ this mechanism, it will typically include the necessary
information in a JAR file that is specified to the MapReduce job via the
-libjars hadoop command line directive.

The intent of the mechanism just described is to allow applications to exploit the Hadoop
infrastructure to automatically distribute the public login and trust information to each of
the job's splits via a JAR file added to the classpath on each remote DataNode. But it is important
to note that although this mechanism is used to distribute the application's public
credentials, it must not be used to distribute any of the private information related to
authentication; specifically, the username and password. This is important because a JAR
file that is distributed to the DataNodes in the manner described may be cached on the associated
DataNode's local file system; which might expose a vulnerability. As a result, private
authentication information is only communicated as part of each split's internal state.

The separation of public and private credentials supported by this model not only
prevents caching the private credentials on each DataNode, but also facilitates the ability
to guarantee the confidentiality of that information; via whatever external third party
secure communication mechanism the current Hadoop implementation happens to employ.
This capability is also important to support the execution of Hive queries against a
secure store.

— Communicating Security Credentials to the TableInputFormat —

With respect to the job's TableInputFormat, the programming model supports
different options for communicating the user's security information. This is because the
TableInputFormat operates only on the access node, on the client side of the
job; which means that there is only one file system that needs to be secured. Additionally,
unlike the splits, the TableInputFormat is not sent on the wire. Thus, as
long as only the user is granted read privileges, both the public and private security
information can be installed on the access node's file system without fear of compromise.
For this case, the application would typically use system properties (on the command line)
to specify the fully qualified paths to the login, trust, and password files (or Oracle
Wallet); which the TableInputFormat would then read from the local file
system, retrieving the necessary public and private security information.

A second option for communicating the user's security credentials to the
TableInputFormat is to include the public and private information as
resources in the client side classpath of the Java VM in which the
TableInputFormat runs. This is the option employed by the example presented
in this document, and is similar to what was described above for the splits. This option
demonstrates how an application's build model can be exploited to simplify not only the
application's command line, but also the deployment of secure MapReduce jobs in general.
As was the case with the splits, applications will typically communicate the necessary
security information as Java resources by including that information in a JAR file. But
rather than using the -libjars hadoop command line directive to specify the
JAR file to the server side of the MapReduce job, in this case, because the
TableInputFormat operates on only the client side access node, the JAR file
would simply be added to the HADOOP_CLASSPATH environment variable.

Best Practices: MapReduce Application Packaging for Oracle NoSQL Database Security

To help users achieve the sort of separation of public and private security information
described in previous sections, this section presents a set of (optional) best practices
related to packaging the client application and its necessary artifacts; these practices
are employed by the example featured in this document. Although the use of these packaging
practices is optional, you are encouraged to employ them when working with any
MapReduce jobs of your own that will interact with a KVStore configured for secure
access.

Rather than manually installing the necessary security artifacts (login file, trust file,
password file or Oracle Wallet) on each DataNode in the cluster, users should instead
install those artifacts only on the cluster's single access node; the node from which
the client application is executed. The client application can then retrieve each artifact
from the local environment, repackage the necessary information, and employ mechanisms
provided by the Hadoop infrastructure to transfer that information to the appropriate
components of the MapReduce job that will be executed.

For example, as described in the previous section, your client application can be designed
to retrieve the username and location of the password from the command line, a configuration
file, or a resource in the client classpath; where the location of the user's password is
a locally installed password file or Oracle Wallet (Enterprise Edition only) that can only
be read by the user. After retrieving the username from the command line and the password
from the specified location, the client uses that information to create the user's
PasswordCredentials, which are transferred to each MapReduce task via the
splits that are created by the job's TableInputFormat. Using this model, the
user's PasswordCredentials are never written to the file systems of the
cluster's DataNodes. They are only held in each task's memory. As a result, the integrity
and confidentiality of those credentials only needs to be provided when on the wire; which
can be achieved by using whatever external third party secure communication mechanism the
current Hadoop implementation happens to employ.

With respect to the transfer of the public login and trust artifacts, the client application
can exploit the mechanisms provided by the Hadoop infrastructure to automatically transfer
classpath (JAR) artifacts to the job's tasks. As demonstrated by the
CountTableRows example presented in the body of this document, the client
application's build process can be designed to separate the application's class files from
its public security artifacts. Specifically, the application's class files (and optionally,
the public and private credentials) can be placed in a local (to the access node) JAR file
for inclusion in the classpath of the client itself; while only the public security
artifacts (the public login properties and client trust information) are placed in a
separate JAR file that can be added to the -libjars specification of the
hadoop command line for inclusion in the classpath of each MapReduce task.

— Review: Application Packaging for the Non-Secure Case —

To understand how the packaging model discussed here can be employed when executing an
application against a secure KVStore, it may be helpful to first review how the
CountTableRows example is executed against a non-secure store. Recall from
the previous sections that, for the non-secure case, the following command was executed
to produce a JAR file containing only the class files needed by CountTableRows.

> cd /opt/apps/kv/examples
> jar cvf CountTableRows.jar hadoop/table/CountTableRows*.class

which produces the file CountTableRows.jar, whose contents look like:

0 Fri Feb 20 12:53:24 PST 2015 META-INF/
68 Fri Feb 20 12:53:24 PST 2015 META-INF/MANIFEST.MF
3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows.class
2623 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Map.class
3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Reduce.class

Then the following commands can be used to execute the CountTableRows
example MapReduce job against a non-secure KVStore:

> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar
> cd /opt/ondb/kv
> hadoop jar examples/non_secure_CountTableRows.jar hadoop.table.CountTableRows \
      -libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar \
      example-store \
      kv-host-1:5000 \
      vehicleTable \
      /user/example-user/CountTableRows/vehicleTable/0001

Note that there are three classpaths that must be set when a MapReduce job is
executed. First, the jar specification to the hadoop
command interpreter makes the class files of the main program
(CountTableRows in this case) accessible to the hadoop launcher mechanism,
so that the program can be loaded and executed. Next, the HADOOP_CLASSPATH
environment variable must be set to include any third party libraries that the program
or the Hadoop framework (running on the local access node) may need to load. For the
example above, kvclient.jar is added to HADOOP_CLASSPATH so
that the Hadoop framework's job initiation mechanism on the access node can access
TableInputFormat and its related classes; whereas the
threetenbp.jar library is only required in the -libjars
argument (described next). Finally, the hadoop command interpreter's
-libjars argument is used to include any third party libraries in the
classpath of each MapReduce task executing on the cluster's DataNodes. For the case above,
both kvclient.jar and threetenbp.jar are specified in
-libjars so that each MapReduce task can access classes such as
TableInputSplit and TableRecordReader, as well as the
org.threeten.bp.DateTimeException class from the threetenbp.jar
third party library.

— Application Packaging for the Secure Case —

Compare the non-secure case above with what would be done to run the
CountTableRows MapReduce job against a secure KVStore. For
the secure case, two JAR files are built; one for the classpath on
the client side, and one for the classpaths of the DataNodes on the server side. The
first JAR file will be added to the client side classpath and includes not only the
class files for the application but also the public and private credentials the
client will need to interact with the secure KVStore; including the public and
private credentials in the client side JAR file avoids the inconvenience of having to
specify that information on the command line. The second JAR file will be added
(via the -libjars argument) to the DataNode classpaths on the server
side, and will include only the user's public credentials.

As described in Appendix B, the user's password can be stored in either a clear text
password file or an Oracle Wallet. As a result, how the first JAR is generated depends
on whether a password file or a wallet is used. For example, assuming that a password
file is used and the user's security artifacts are generated using the
KVSecurityCreation program in the manner presented in Appendix B,
to generate both the client side and server side JAR files for the
CountTableRows example application, type the following:

> cd /opt/apps/kv/examples
> jar cvf CountTableRows-pwdClient.jar hadoop/table/CountTableRows*.class hadoop/table/KVSecurityUtil*.class
> cd /opt/ondb/example-store/security
> jar uvf /opt/apps/kv/examples/CountTableRows-pwdClient.jar client.trust
> cd /tmp
> jar uvf /opt/apps/kv/examples/CountTableRows-pwdClient.jar example-user-client-pwdfile.login
> jar uvf /opt/apps/kv/examples/CountTableRows-pwdClient.jar example-user.passwd
> cd /opt/ondb/example-store/security
> jar cvf /opt/apps/kv/examples/CountTableRows-pwdServer.jar client.trust
> cd /tmp
> jar uvf /opt/apps/kv/examples/CountTableRows-pwdServer.jar example-user-server.login

which produces the client side JAR file named CountTableRows-pwdClient.jar,
with contents that look like:

0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
3650 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows.class
2623 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Map.class
437 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Reduce.class
6628 Mon May 04 13:00:52 PDT 2015 hadoop/table/KVSecurityUtil.class
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
322 Mon May 04 11:23:32 PDT 2015 example-user-client-pwdfile.login
34 Mon May 04 11:23:38 PDT 2015 example-user.passwd

and produces the server side JAR file named CountTableRows-pwdServer.jar,
with contents that look like:

0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
255 Mon May 04 11:30:54 PDT 2015 example-user-server.login

Alternatively, if KVSecurityCreation was used to generate wallet based
artifacts for CountTableRows, then the client side and server side
JAR files would be generated by typing:

> cd /opt/apps/kv/examples
> jar cvf CountTableRows-walletClient.jar hadoop/table/CountTableRows*.class hadoop/table/KVSecurityUtil*.class
> cd /opt/ondb/example-store/security
> jar uvf /opt/apps/kv/examples/CountTableRows-walletClient.jar client.trust
> cd /tmp
> jar uvf /opt/apps/kv/examples/CountTableRows-walletClient.jar example-user-client-wallet.login
> jar uvf /opt/apps/kv/examples/CountTableRows-walletClient.jar example-user-wallet.dir
> cd /opt/ondb/example-store/security
> jar cvf /opt/apps/kv/examples/CountTableRows-walletServer.jar client.trust
> cd /tmp
> jar uvf /opt/apps/kv/examples/CountTableRows-walletServer.jar example-user-server.login

each with contents identical or analogous to the contents of the JAR files for
the password case. That is,

0 Mon May 04 13:22:36 PDT 2015 META-INF/
68 Mon May 04 13:22:36 PDT 2015 META-INF/MANIFEST.MF
3650 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows.class
2623 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Map.class
437 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Reduce.class
6628 Mon May 04 13:00:52 PDT 2015 hadoop/table/KVSecurityUtil.class
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
324 Mon May 04 11:30:54 PDT 2015 example-user-client-wallet.login
0 Mon May 04 11:30:54 PDT 2015 example-user-wallet.dir/
3677 Mon May 04 11:31:00 PDT 2015 example-user-wallet.dir/cwallet.sso

and

0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
255 Mon May 04 11:30:54 PDT 2015 example-user-server.login

Finally, in a fashion similar to that described for the non-secure case above, to
execute the CountTableRows MapReduce job — using a password file —
against a secure KVStore, you would type the following:

> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/apps/kv/examples/CountTableRows-pwdServer.jar
> cd /opt/apps/kv
> hadoop jar examples/CountTableRows-pwdClient.jar \
      hadoop.table.CountTableRows \
      -libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/CountTableRows-pwdServer.jar \
      example-store \
      kv-host-1:5000 \
      vehicleTable \
      /user/example-user/CountTableRows/vehicleTable/0001 \
      example-user-client-pwdfile.login \
      example-user-server.login

Similarly, if the application stores its password in an Oracle Wallet, then you
would type:

> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/apps/kv/examples/CountTableRows-walletServer.jar
> cd /opt/apps/kv
> hadoop jar examples/CountTableRows-walletClient.jar \
      hadoop.table.CountTableRows \
      -libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/CountTableRows-walletServer.jar \
      example-store \
      kv-host-1:5000 \
      vehicleTable \
      /user/example-user/CountTableRows/vehicleTable/0001 \
      example-user-client-wallet.login \
      example-user-server.login

When comparing the command lines above with the command line used for the non-secure
case, you should notice that HADOOP_CLASSPATH and -libjars
have both been augmented with the JAR file that contains only the public
login and trust credentials (CountTableRows-pwdServer.jar or
CountTableRows-walletServer.jar); whereas the local classpath
of the client side of the application is augmented — via the jar directive —
with the JAR file that includes both the public and private credentials
(CountTableRows-pwdClient.jar or CountTableRows-walletClient.jar).
The only other difference from the non-secure case is the two additional arguments at
the end of the argument list: example-user-client-pwdfile.login
(or example-user-client-wallet.login)
and example-user-server.login. The values of those arguments specify,
respectively, the names of the client side and server side login files; which will be
retrieved as resources from the corresponding JAR file.

Observe that when you package and execute your MapReduce application in a manner like
that shown in the example above, there is no need to specify the username or password file
(or wallet) on the command line, as that information is included as part of the client side
JAR file. Additionally, the server side JAR file that is transferred from the access node
to the job's DataNodes does not include that private information; which is important because
that transferred JAR file will be cached in the file system of each of those DataNodes.

— Summary —

As the example above demonstrates, the programming model
for MapReduce and Oracle NoSQL Database Security supports (even encourages) the
best practices presented in this section for building, packaging, and deploying
any given MapReduce job that employs the Oracle NoSQL Database Table API to retrieve
and process data in a given KVStore — either secure or non-secure. As a result,
simply generating separate JAR files — a set of JAR files for the secure case,
and one for the non-secure case — allows deployers to conveniently run the job
with or without security.

Note that this model for separating public and private user credentials will also play
an important role when executing Hive queries against table data in a secure KVStore.
Copyright (c) 2011, 2017 Oracle and/or its affiliates. All rights reserved.