| Class | Description |
|---|---|
| CountTableRows | A basic example demonstrating how to use the class oracle.kv.hadoop.table.TableInputFormat to access the rows of a table in an Oracle NoSQL Database from within a Hadoop MapReduce job, for the purpose of counting the number of records in the table. |
| CountTableRows.Map | Nested class used to map input key/value pairs to a set of intermediate key/value pairs in preparation for the reduce phase of the associated MapReduce job. |
| CountTableRows.Reduce | Nested class used to reduce the set of intermediate values sharing a key, produced by the mapping phase of the job, to a smaller set of values. |
| KVSecurityCreation | Standalone program that creates (or deletes) a password file or Oracle Wallet for use when running the associated example MapReduce job with a secure KVStore. |
| KVSecurityUtil | Utility class that provides convenience methods related to the Oracle NoSQL Database security model. |
| LoadVehicleTable | Class that creates an example table in a given NoSQL Database store and then uses the Table API to populate the table with sample records. |
To make the most of this example, you should be familiar with Apache Hadoop, MapReduce, and its programming model; that is, become familiar with how to write and deploy a MapReduce job.

The example also assumes that Oracle NoSQL Database is installed under the directory /opt/ondb/kv — the <KVHOME> — on a system that is network reachable from the nodes of the Hadoop cluster; and that a store named example-store is deployed under the directory /opt/ondb/example-store — the <KVROOT> — to 3 machines (real or virtual) with host names kv-host-1, kv-host-2, and kv-host-3; where an admin service, listening on port 5000, is deployed on each host.

Using the <KVHOME> and <KVROOT> locations, the store name, host names, and admin port described above should allow you to more easily follow the example that is presented. Combined with the information contained in the Oracle NoSQL Database Getting Started Guide, as well as the
Oracle NoSQL Database Admin Guide and Oracle NoSQL Database Security Guide,
you should then be able to generalize and extend the example to your own
particular development scenario; substituting the values specific to the
given environment where necessary.
Detailed instructions for deploying a non-secure KVStore are provided in
Appendix A.
Similarly, Appendix B
provides instructions for deploying a KVStore configured for security.
Hadoop provides distributed storage via the Hadoop Distributed File System
(referred to as HDFS), and distributed processing via the MapReduce
programming model; which consists of a Map Phase
that includes a mapping step and a shuffle-and-sort step
that together perform filtering and sorting, and a Reduce Phase
that performs a summary operation on the mapped and sorted results.
Hadoop distributions (such as those provided by Cloudera or Hortonworks)
each provide an infrastructure which orchestrates the MapReduce
processing that is performed. It does this by marshalling the distributed
servers that run the various tasks in parallel, by managing all communications
and data transfers between the various parts of the system, and by providing for
redundancy and fault tolerance.
In addition, the Hadoop infrastructure provides a number of interactive tools — such as a command line interface (the Hadoop CLI) — that provide access to the data stored in HDFS. But the typical way application developers read, write, and process data stored in HDFS is via MapReduce jobs; which are programs that adhere to the Hadoop MapReduce programming model. For more detailed information on Hadoop HDFS and MapReduce, consult the Hadoop MapReduce tutorial.
As indicated above, with the introduction of the Table API, a new set of
interfaces and classes that satisfy the Hadoop MapReduce programming model
has been provided to support writing MapReduce jobs that can be run against
table data contained in a KVStore. These new classes are located in the
oracle.kv.hadoop.table
package, and consist of the following types:
- A subclass of org.apache.hadoop.mapreduce.InputFormat, which specifies how the associated MapReduce job reads its input data (using a Hadoop RecordReader), and splits up the input data into logical sections, each referred to as an InputSplit.
- A subclass of org.apache.hadoop.mapreduce.OutputFormat, which specifies how the associated MapReduce job writes its output data (using a Hadoop RecordWriter).
- A subclass of org.apache.hadoop.mapreduce.RecordReader, which specifies how the mapped keys and values are located and retrieved during MapReduce processing.
- A subclass of org.apache.hadoop.mapreduce.InputSplit, which represents the data to be processed by an individual MapReduce Mapper (one Mapper per InputSplit).
It is through the InputFormat
class provided in the Oracle NoSQL Database distribution that the Hadoop MapReduce
infrastructure obtains access to a given KVStore and the desired table data
that the store contains. Specifically, the classes provided for this purpose are:

- oracle.kv.hadoop.table.TableInputFormat
- oracle.kv.hadoop.table.TableInputSplit
- oracle.kv.hadoop.table.TableRecordReader
Currently, Oracle NoSQL Database does not define a subclass of the Hadoop
OutputFormat
class. This means that it is not currently possible to
write data from a MapReduce job into a KVStore.
That is, from within a MapReduce job, you can only retrieve data from
a desired KVStore table and process that data.
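To make the relationship between these classes concrete, the following sketch shows the general shape of a row-counting job like CountTableRows: a driver that configures TableInputFormat as the job's input format, a Mapper keyed by the table's PrimaryKey and Row types, and a Reducer that sums the counts. This is a minimal illustration rather than the source of the CountTableRows example shipped in the distribution; in particular, the static TableInputFormat setters (setKVStoreName, setKVHelperHosts, setTableName) and the PrimaryKey/Row key and value types shown here are assumptions that should be verified against the oracle.kv.hadoop.table javadoc.

```java
package hadoop.table;

import java.io.IOException;

import oracle.kv.hadoop.table.TableInputFormat;
import oracle.kv.table.PrimaryKey;
import oracle.kv.table.Row;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/* Hypothetical row-counting job; not the CountTableRows source shipped with the product. */
public class RowCountSketch extends Configured implements Tool {

    /* Each split hands the Mapper one PrimaryKey/Row pair at a time, retrieved from the store. */
    public static class Map extends Mapper<PrimaryKey, Row, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(PrimaryKey key, Row row, Context context)
                throws IOException, InterruptedException {
            /* Emit a single shared key so the Reducer sees one group to sum. */
            context.write(new Text("rowCount"), ONE);
        }
    }

    /* Sums the 1s emitted by all Mappers into a single total. */
    public static class Reduce extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        /* args: <storeName> <host:port> <tableName> <hdfsOutputDir> */
        final Job job = Job.getInstance(getConf(), "RowCountSketch");
        job.setJarByClass(RowCountSketch.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        /* Read the job's input from the KVStore table instead of HDFS. The static
         * setters below are assumptions about the TableInputFormat API; consult the
         * oracle.kv.hadoop.table javadoc for the exact method names. */
        job.setInputFormatClass(TableInputFormat.class);
        TableInputFormat.setKVStoreName(args[0]);
        TableInputFormat.setKVHelperHosts(new String[] { args[1] });
        TableInputFormat.setTableName(args[2]);

        /* The results are still written to HDFS in the usual way. */
        FileOutputFormat.setOutputPath(job, new Path(args[3]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new RowCountSketch(), args));
    }
}
```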
The hadoop.table example package is contained in the following
location within your Oracle NoSQL Database distribution:
/opt/ondb/kv/examples/
hadoop/table/
CountTableRows.java
LoadVehicleTable.java
KVSecurityCreation.java
KVSecurityUtil.java
In order to run the CountTableRows example MapReduce job, a
KVStore — either secure or non-secure — must first be deployed,
and a table must be created and populated with data. Thus, before attempting
to execute CountTableRows, either deploy a non-secure KVStore
using the steps outlined in
Appendix A,
or start a KVStore configured for security using the steps presented in
Appendix B.
Once a KVStore has been deployed as described in either
Appendix A,
or
Appendix B,
the standalone Java program LoadVehicleTable can be run
against either type of store to create a table with the name and schema
expected by CountTableRows, and populate it with rows of
data consistent with the table's schema. Once the table is created and
populated with example data, CountTableRows can then be
executed to run a MapReduce job that counts the number of rows of data
in the table.
In addition to the LoadVehicleTable program, the example package also contains the
classes KVSecurityCreation and KVSecurityUtil; which
are provided to support running CountTableRows against a secure
KVStore. The standalone Java program KVSecurityCreation is provided
as a convenience, and can be run to create (or delete) a password file or Oracle
Wallet — along with associated client side and server side login files —
that CountTableRows will need to interact with a secure store. And the
KVSecurityUtil class provides convenient utility methods that
CountTableRows uses to create and process the various security
artifacts it uses for secure access.
The next sections explain how to compile and execute LoadVehicleTable
to create and populate the required example table in the deployed store; how to
compile and execute KVSecurityCreation to create or delete any
security credentials that may be needed by CountTableRows; and
finally, how to compile, build (JAR), and then execute the CountTableRows
MapReduce job on the Hadoop cluster that was deployed for this example.
Before running the CountTableRows MapReduce job, a table named
vehicleTable — having the schema shown in the table below — must
be created in the KVStore that was deployed for this example; where the data types
specified in the schema are defined by the Oracle NoSQL Database Table API (see
oracle.kv.table.FieldDef.Type).
| Field Name | Field Type |
|---|---|
| type | STRING |
| make | STRING |
| model | STRING |
| class | STRING |
| color | STRING |
| price | DOUBLE |
| count | INTEGER |

Primary Key Field Names: type, make, model, class, color

Shard Key Field Names: type, make, model
Thus, vehicleTable consists of rows representing a particular vehicle a dealer might have in stock for purchase. Each such row contains fields specifying the "type" of vehicle (for example, car, truck, SUV, etc.), the "make" of the vehicle (Ford, GM, Chrysler, etc.), the "model" (Explorer, Camaro, Lebaron, etc.), the vehicle "class" (4WheelDrive, FrontWheelDrive, etc.), "color", "price", and finally the number of vehicles in stock (the "count").
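For reference, the schema above corresponds to a table DDL statement along the following lines, which a program can submit through the KVStore handle. This is a minimal sketch assuming a non-secure store and the KVStore.executeSync method; the exact statement used by LoadVehicleTable may differ.

```java
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;

public class CreateVehicleTableSketch {
    public static void main(String[] args) {
        /* Connect to the (non-secure) store deployed for this example. */
        final KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("example-store", "kv-host-1:5000"));

        /* DDL matching the schema above: the primary key is
         * (type, make, model, class, color), sharded on (type, make, model). */
        store.executeSync(
            "CREATE TABLE IF NOT EXISTS vehicleTable (" +
            "  type STRING, make STRING, model STRING, class STRING," +
            "  color STRING, price DOUBLE, count INTEGER," +
            "  PRIMARY KEY (SHARD(type, make, model), class, color))");

        store.close();
    }
}
```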
Although you can enter individual commands in the admin CLI to create a table with the above
schema, the preferred approach is to employ the
Oracle NoSQL Database Data Definition Language (DDL) to create the desired table. One way to
accomplish this is to follow the instructions presented in the next sections to compile
and execute the LoadVehicleTable program; which populates the desired
table after using the DDL to create it.
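Populating the table through the Table API then follows the pattern sketched below; the single hard-coded row is a placeholder, and LoadVehicleTable's actual row-generation logic (driven by the -nops parameter) is more elaborate.

```java
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.table.Row;
import oracle.kv.table.Table;
import oracle.kv.table.TableAPI;

public class PopulateVehicleTableSketch {
    public static void main(String[] args) {
        final KVStore store = KVStoreFactory.getStore(
            new KVStoreConfig("example-store", "kv-host-1:5000"));
        final TableAPI tableAPI = store.getTableAPI();
        final Table table = tableAPI.getTable("vehicleTable");

        /* Build and write one row; a real loader would loop, varying the values. */
        final Row row = table.createRow();
        row.put("type", "suv");
        row.put("make", "Ford");
        row.put("model", "Explorer");
        row.put("class", "4WheelDrive");
        row.put("color", "red");
        row.put("price", 41471.98);
        row.put("count", 5);
        tableAPI.put(row, null, null); /* default ReturnRow and WriteOptions */

        store.close();
    }
}
```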
Once a KVStore — either non-secure or secure — has been deployed,
the LoadVehicleTable program that is supplied with the example as
a convenience can be executed to create and populate the table named vehicleTable.
Before executing LoadVehicleTable though, that program must first
be compiled. To do this, type the following command from the OS command line:
> cd /opt/ondb/kv
> javac -classpath lib/kvstore.jar:examples examples/hadoop/table/LoadVehicleTable.java

which should produce the file:
/opt/ondb/kv/examples/hadoop/table/LoadVehicleTable.class
— Creating and Populating 'vehicleTable' with Example Data in a Non-Secure KVStore —
To execute LoadVehicleTable to create and then populate the
table named vehicleTable with example data in a KVStore configured
for non-secure access, type the following at the command line of a
node that has network connectivity with a node running the admin service
(for example, kv-host-1 itself):
> cd /opt/ondb/kv
> java -classpath lib/kvstore.jar:examples hadoop.table.LoadVehicleTable \
-store example-store -host kv-host-1 -port 5000 -nops 79 [-delete]
where the parameters -store, -host, -port,
and -nops are required.
In the example command line above, the argument -nops 79 specifies
that 79 rows be written to the vehicleTable. If more or fewer rows
are desired, then the value of the -nops parameter should be changed
accordingly.
If LoadVehicleTable is executed a second time and the
optional -delete parameter is specified, then all rows added by
any previous executions of LoadVehicleTable are deleted from the
table prior to adding the new rows. Otherwise, all pre-existing rows are left
in place, and the number of rows in the table will be increased by the specified
-nops number of new rows.
— Creating and Populating 'vehicleTable' with Example Data in a Secure KVStore —
To execute LoadVehicleTable against a secure KVStore deployed
and provisioned with a non-administrative user employing the steps presented in
Appendix B,
an additional parameter must be added to the command line above. That is, type the following:
> cd /opt/ondb/kv
> javac -classpath lib/kvstore.jar:examples examples/hadoop/table/LoadVehicleTable.java
> cp /opt/ondb/example-store/security/client.trust /tmp
> java -classpath lib/kvstore.jar:examples hadoop.table.LoadVehicleTable \
      -store example-store -host kv-host-1 -port 5000 -nops 79 \
      -security /tmp/example-user-client-pwdfile.login
      [-delete]
where the single additional -security parameter specifies the
location of the login properties file (associated with a password file
rather than an Oracle Wallet) for the given user (the alias); and
all other parameters are as described for the non-secure case.
To understand the -security parameter for this example,
recall from Appendix B
that a non-administrative user named example-user was created, and
password file based credential files (prefixed with that user name) were generated
for that user and placed under the /tmp system directory. That is,
the example login and password files generated in
Appendix B are:
/tmp
client.trust
example-user-client-pwdfile.login
example-user-server.login
example-user.passwd
Note that for this example, the user credential files must be co-located; where it
doesn't matter which directory they are located in, as long as they all reside in the
same directory accessible by the user. It is for this reason that the shared trust file
(client.trust) is copied into /tmp above. Co-locating
client.trust and example-user.passwd with the login file
(example-user-client-pwdfile.login) allows relative paths to be used for the
values of the oracle.kv.ssl.trustStore and oracle.kv.auth.pwdfile.file
system properties that are specified in the login file (or oracle.kv.auth.wallet.dir
if a wallet is used to store the user password). If those files are not co-located
with the login file, then absolute paths must be used for those properties.
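As an illustration, a client side login properties file like example-user-client-pwdfile.login would therefore reference its companion files by relative path; for example (the exact set of properties written by KVSecurityCreation may differ, and any transport/SSL properties copied from client.security are omitted here):

oracle.kv.auth.username=example-user
oracle.kv.auth.pwdfile.file=example-user.passwd
oracle.kv.ssl.trustStore=client.trust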
At this point, the vehicleTable created in the specified KVStore (non-secure or secure)
should be populated with the desired example data. And the CountTableRows example
MapReduce job can be run to count the number of rows in that table.
Before it can be executed, the CountTableRows program
must first be compiled and built for deployment to the Hadoop infrastructure.
In order to compile the CountTableRows program, a number of Hadoop
JAR files must be installed and available in the build environment for inclusion
in the program classpath. Those JAR files are: commons-logging, hadoop-common,
hadoop-mapreduce-client-core, and hadoop-annotations; each at the version matching
your Hadoop installation (the exact file names appear in the compile command below).
For example, suppose that the 2.3.0 version of Hadoop that is
delivered via the 5.1.0 package provided by
Cloudera
(cdh) is installed under the <HADOOPROOT>
base directory. And suppose that the classes from that version of Hadoop
use the 1.1.3 version of commons-logging. Then,
to compile the CountTableRows program, type the following
at the command line (with the <HADOOPROOT> token
replaced with the appropriate directory path for your system):
> cd /opt/ondb/kv
> javac -classpath <HADOOPROOT>/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/hadoop-common-2.3.0-cdh5.1.0.jar: \
<HADOOPROOT>/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-client-core-2.3.0-cdh5.1.0.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/lib/hadoop-annotations-2.3.0-cdh5.1.0.jar: \
lib/kvclient.jar:examples \
examples/hadoop/table/CountTableRows.java
which produces the following files:
/opt/ondb/kv/examples/hadoop/table/
CountTableRows.class
CountTableRows$Map.class
CountTableRows$Reduce.class
If your specific environment has a different, compatible Hadoop distribution installed,
then simply replace the versions referenced in the example command line above with the
specific versions that are installed.
If you will be running CountTableRows against a non-secure
KVStore, then this is all you need; and the resulting class files should be placed in a
JAR file so that the program can be deployed to the example Hadoop cluster. For example,
to create a JAR file containing the class files needed to run CountTableRows
against a non-secure KVStore like that deployed in
Appendix A,
do the following:
> cd /opt/ondb/kv/examples
> jar cvf CountTableRows.jar hadoop/table/CountTableRows*.class

which should produce the file CountTableRows.jar in the
/opt/ondb/kv/examples directory, with contents that look like:
0 Fri Feb 20 12:53:24 PST 2015 META-INF/
68 Fri Feb 20 12:53:24 PST 2015 META-INF/MANIFEST.MF
3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows.class
2623 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Map.class
3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Reduce.class
Note that when the command above is used to generate CountTableRows.jar,
the utility class KVSecurityUtil (see below) will not be included in the
resulting JAR file. Since CountTableRows does not use that utility class
in the non-secure case, including it in the JAR file is optional.
— Building the Example for a Secure Environment —
If you will be running CountTableRows against a secure
KVStore such as that deployed in
Appendix B,
then in addition to compiling CountTableRows as described above,
additional security related artifacts need to be generated and included in
the build; where the additional artifacts include not only compiled class files,
but security credential files as well.
To support the secure version of CountTableRows, the utility class
KVSecurityUtil and the standalone program KVSecurityCreation
should also be compiled. That is,
> cd /opt/ondb/kv
> javac -classpath lib/kvstore.jar:examples examples/hadoop/table/KVSecurityCreation.java
> javac -classpath lib/kvstore.jar:examples examples/hadoop/table/KVSecurityUtil.java
> javac -classpath <HADOOPROOT>/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/hadoop-common-2.3.0-cdh5.1.0.jar: \
<HADOOPROOT>/hadoop/share/hadoop/mapreduce2/hadoop-mapreduce-client-core-2.3.0-cdh5.1.0.jar: \
<HADOOPROOT>/hadoop/share/hadoop/common/lib/hadoop-annotations-2.3.0-cdh5.1.0.jar: \
lib/kvclient.jar:examples \
examples/hadoop/table/CountTableRows.java
which produces the files:
/opt/ondb/kv/examples/hadoop/table/
CountTableRows.class
CountTableRows$Map.class
CountTableRows$Reduce.class
KVSecurityUtil.class
KVSecurityCreation.class
Unlike the non-secure case, the build artifacts needed to deploy CountTableRows
in a secure environment include more than just a single JAR file containing the generated class
files. For the secure case, it will be important to package some artifacts for deployment to the
client side of the application that communicates with the KVStore; whereas other artifacts
will need to be packaged for deployment to the server side. Although there are different
ways to achieve this "separation of concerns" when deploying a given application,
Appendix C
presents one particular model you can use to package and deploy the artifacts of an application
(such as CountTableRows) that will interact with a secure KVStore. With this in mind,
the sections below related to executing CountTableRows against a secure KVStore
each assume that the application has been built and packaged according to the instructions presented in
Appendix C.
If you will be running CountTableRows against a non-secure KVStore
deployed in the manner described in Appendix A,
and have compiled and built CountTableRows in the manner presented
in the previous section,
then the MapReduce job initiated by CountTableRows can be deployed
and executed by typing the following at the command line of the Hadoop cluster's
access node (where line breaks are used only for readability):
> cd /opt/ondb/kv
> hadoop jar examples/CountTableRows.jar \
hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/<000N>
where the hadoop command interpreter's -libjars argument
is used to include the third party library kvclient.jar in the classpath
of each MapReduce task executing on the cluster's DataNodes; so that those tasks can
access classes such as TableInputSplit and TableRecordReader.
Note that in the last argument, the example-user directory
component corresponds to a directory under the HDFS /user top-level
directory, and typically corresponds to the user who has initiated the MapReduce job.
This directory is usually created in HDFS by the Hadoop cluster administrator.
Additionally, the <000N> token in that argument represents a string
such as 0000, 0001, 0002, etc.
Although any string can be used for this token, using a different number for
"N" on each execution of the job makes it easier to keep track of results when
multiple executions of the job occur.
— Executing CountTableRows Against a Secure KVStore —
If you will be running CountTableRows against a secure
KVStore deployed in the manner presented in
Appendix B,
and if you have compiled, built, and packaged CountTableRows and all
the necessary artifacts in the manner described in
Appendix C,
then CountTableRows can be run against the secure KVStore by typing the
following at the command line of the Hadoop cluster's access node (where line breaks are
used only for readability):
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/examples/CountTableRows-pwdServer.jar
> cd /opt/ondb/kv
> hadoop jar examples/CountTableRows-pwdClient.jar \
hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/CountTableRows-pwdServer.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/<000N> \
example-user-client-pwdfile.login \
example-user-server.login
where the mechanism used for storing the user password is a password file.
The client side artifacts in the command above are CountTableRows-pwdClient.jar and the
login file example-user-client-pwdfile.login; the server side artifacts are
CountTableRows-pwdServer.jar and the login file example-user-server.login.
Similarly, if the mechanism used for storing the user password is an Oracle Wallet (available only with the Oracle NoSQL Database Enterprise Edition), you would type the following at the access node's command line:
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/examples/CountTableRows-walletServer.jar
> cd /opt/ondb/kv
> hadoop jar examples/CountTableRows-walletClient.jar \
hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/CountTableRows-walletServer.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/<000N> \
example-user-client-wallet.login \
example-user-server.login
In both cases above — password file and Oracle Wallet — notice the
additional JAR file (CountTableRows-pwdServer.jar or
CountTableRows-walletServer.jar) specified for both the
HADOOP_CLASSPATH environment variable and -libjars
parameter. For a detailed explanation of the use and purpose of that server side
JAR file, as well as a description of the client side JAR file and the two
additional arguments at the end of the command line, refer to
Appendix C;
specifically, the section on
packaging
for a secure KVStore.
Whether running against a secure or non-secure store, as the job runs, assuming no errors, the output from the job will look like the following:
...
2014-12-04 08:59:47,996 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1344)) - Running job: job_1409172332346_0024
2014-12-04 08:59:54,107 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 0% reduce 0%
2014-12-04 09:00:16,148 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 7% reduce 0%
2014-12-04 09:00:17,368 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 26% reduce 0%
2014-12-04 09:00:18,596 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 56% reduce 0%
2014-12-04 09:00:19,824 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 100% reduce 0%
2014-12-04 09:00:23,487 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1372)) - map 100% reduce 100%
2014-12-04 09:00:23,921 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1383)) - Job job_1409172332346_0024 completed successfully
2014-12-04 09:00:24,117 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1390)) - Counters: 49
File System Counters
FILE: Number of bytes read=2771
FILE: Number of bytes written=644463
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2660
HDFS: Number of bytes written=32
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=6
Launched reduce tasks=1
Rack-local map tasks=6
Total time spent by all maps in occupied slots (ms)=136868
Total time spent by all reduces in occupied slots (ms)=2103
Total time spent by all map tasks (ms)=136868
Total time spent by all reduce tasks (ms)=2103
Total vcore-seconds taken by all map tasks=136868
Total vcore-seconds taken by all reduce tasks=2103
Total megabyte-seconds taken by all map tasks=140152832
Total megabyte-seconds taken by all reduce tasks=2153472
Map-Reduce Framework
Map input records=79
Map output records=79
Map output bytes=2607
Map output materialized bytes=2801
Input split bytes=2660
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=2801
Reduce input records=79
Reduce output records=1
Spilled Records=158
Shuffled Maps =6
Failed Shuffles=0
Merged Map outputs=6
GC time elapsed (ms)=549
CPU time spent (ms)=9460
Physical memory (bytes) snapshot=1888358400
Virtual memory (bytes) snapshot=6424895488
Total committed heap usage (bytes)=1409286144
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=32
To see the results of the job
— to verify that the program actually counted the correct number of rows in the table —
use the Hadoop CLI to display the contents of the MapReduce results file located in HDFS.
To do this, type the following at the command line of the Hadoop cluster's access node:
> hadoop fs -cat /user/example-user/CountTableRows/vehicleTable/<000N>/part-r-00000

where example-user and the <000N> token should
be replaced with the values you used when the job was run; appropriate to your particular
system. Assuming the table was populated with 79 rows, if the job was successful, then
the output should look like the following:
/type/make/model/class/color 79

where /type/make/model/class/color are the names of the fields making up the
PrimaryKey
of the vehicleTable; and 79 is the number of rows in the table.
The non-secure deployment described in this appendix assumes that the Oracle NoSQL Database
software is installed under /opt/ondb/kv on each host, and that the storage directory
employed on each host is /disk1/shard for host kv-host-1, /disk2/shard for host kv-host-2,
and /disk3/shard for host kv-host-3.

To generate each Storage Node's boot configuration, first, from kv-host-1, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \
-root /opt/ondb/example-store \
-config config.xml \
-port 5000 \
-admin 5001 \
-host kv-host-1 \
-harange 5002,5005 \
-num_cpus 0 \
-memory_mb 0 \
-capacity 3 \
-storagedir /disk1/shard
Then from kv-host-2, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \
-root /opt/ondb/example-store \
-config config.xml \
-port 5000 \
-admin 5001 \
-host kv-host-2 \
-harange 5002,5005 \
-num_cpus 0 \
-memory_mb 0 \
-capacity 3 \
-storagedir /disk2/shard
And finally from kv-host-3, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \
-root /opt/ondb/example-store \
-config config.xml \
-port 5000 \
-admin 5001 \
-host kv-host-3 \
-harange 5002,5005 \
-num_cpus 0 \
-memory_mb 0 \
-capacity 3 \
-storagedir /disk3/shard
Next, from each of the three hosts, start the store's processes by typing:

> nohup java -jar /opt/ondb/kv/lib/kvstore.jar start -root /opt/ondb/example-store -config config.xml &

which will start both an SNA and an admin service on the associated host.
Once the store's processes are running, start the admin CLI by typing the following (from kv-host-1, for example):

> java -jar /opt/ondb/kv/lib/kvstore.jar runadmin -host kv-host-1 -port 5000
kv->

Next, deploy the store by entering the following commands — either in succession, from the CLI prompt; or from a script, using the CLI command 'load -file <flnm>'.

configure -name example-store
plan deploy-zone -name zn1 -rf 3 -wait
plan deploy-sn -znname zn1 -host kv-host-1 -port 5000 -wait
plan deploy-admin -sn 1 -port 5001 -wait
pool create -name snpool
pool join -name snpool -sn sn1
plan deploy-sn -znname zn1 -host kv-host-2 -port 5000 -wait
plan deploy-admin -sn 2 -port 5001 -wait
pool join -name snpool -sn sn2
plan deploy-sn -znname zn1 -host kv-host-3 -port 5000 -wait
plan deploy-admin -sn 3 -port 5001 -wait
pool join -name snpool -sn sn3
change-policy -params "loggingConfigProps=oracle.kv.level=INFO;"
topology create -name store-layout -pool snpool -partitions 300
plan deploy-topology -name store-layout -plan-name store-deploy-plan -wait
The secure deployment described in this appendix assumes the same installation directory
(/opt/ondb/kv) and storage directories (/disk1/shard for host kv-host-1, /disk2/shard for
host kv-host-2, and /disk3/shard for host kv-host-3) as Appendix A. It also assumes a
password of 123456 (must be at least 6 characters) and a non-administrative user named
example-user.

To configure the store for secure access, first, from kv-host-1, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \
-root /opt/ondb/example-store \
-config config.xml \
-port 5000 \
-admin 5001 \
-host kv-host-1 \
-harange 5002,5005 \
-num_cpus 0 \
-memory_mb 0 \
-capacity 3 \
-storagedir /disk1/shard \
-store-security configure \
-pwdmgr pwdfile
Enter a password for the Java KeyStore:123456<RETURN>
Re-enter the KeyStore password for verification:123456<RETURN>
Created files
/opt/ondb/example-store/security/store.trust
/opt/ondb/example-store/security/store.keys
/opt/ondb/example-store/security/store.passwd
/opt/ondb/example-store/security/client.trust
/opt/ondb/example-store/security/security.xml
/opt/ondb/example-store/security/client.security
Note that the value of the -store-security parameter for
kv-host-1 is configure.
After executing the command above, use a utility such as scp
to copy the resulting security directory to the other SN hosts; that is,
kv-host-2 and kv-host-3. For example,
> scp -r /opt/ondb/example-store/security example-user@kv-host-2:/opt/ondb/example-store
> scp -r /opt/ondb/example-store/security example-user@kv-host-3:/opt/ondb/example-store
store.trust 100% 508 0.5KB/s 00:00
store.keys 100% 1215 1.2KB/s 00:00
store.passwd 100% 39 0.0KB/s 00:00
client.trust 100% 508 0.5KB/s 00:00
security.xml 100% 2216 2.2KB/s 00:00
client.security 100% 255 0.3KB/s 00:00
After generating and distributing the security configuration files by
executing the commands shown above from the first SN host (kv-host-1),
login to each of the remaining SN hosts and enable security by
typing the commands shown below from the respective host's command line.
That is, from kv-host-2, type the following:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \
-root /opt/ondb/example-store \
-config config.xml \
-port 5000 \
-admin 5001 \
-host kv-host-2 \
-harange 5002,5005 \
-num_cpus 0 \
-memory_mb 0 \
-capacity 3 \
-storagedir /disk2/shard \
-store-security enable \
-pwdmgr pwdfile
And then from kv-host-3, type:
> java -jar /opt/ondb/kv/lib/kvstore.jar makebootconfig \
-root /opt/ondb/example-store \
-config config.xml \
-port 5000 \
-admin 5001 \
-host kv-host-3 \
-harange 5002,5005 \
-num_cpus 0 \
-memory_mb 0 \
-capacity 3 \
-storagedir /disk3/shard \
-store-security enable \
-pwdmgr pwdfile
For both commands above, note the value of the -store-security parameter
is enable rather than configure; as was used
with the first host.
Next, as in Appendix A, from each of the three hosts, start the store's processes by typing:

> nohup java -jar /opt/ondb/kv/lib/kvstore.jar start -root /opt/ondb/example-store -config config.xml &

which will start both an SNA and an admin service on the associated host.
> java -jar /opt/ondb/kv/lib/kvstore.jar runadmin \
-host kv-host-1 \
-port 5000 \
-security /opt/ondb/example-store/security/client.security
Logged in admin as anonymous
kv->
Next, deploy the store by entering the following commands — either in succession,
from the CLI prompt; or from a script, using the CLI command 'load -file <flnm>'.
configure -name example-store
plan deploy-zone -name zn1 -rf 3 -wait
plan deploy-sn -znname zn1 -host kv-host-1 -port 5000 -wait
plan deploy-admin -sn 1 -port 5001 -wait
pool create -name snpool
pool join -name snpool -sn sn1
plan deploy-sn -znname zn1 -host kv-host-2 -port 5000 -wait
plan deploy-admin -sn 2 -port 5001 -wait
pool join -name snpool -sn sn2
plan deploy-sn -znname zn1 -host kv-host-3 -port 5000 -wait
plan deploy-admin -sn 3 -port 5001 -wait
pool join -name snpool -sn sn3
change-policy -params "loggingConfigProps=oracle.kv.level=INFO;"
topology create -name store-layout -pool snpool -partitions 300
plan deploy-topology -name store-layout -plan-name store-deploy-plan -wait

After configuring the store and deploying the topology as shown above, create a user named
root, having administrative
privileges. To do this, type the following command at the CLI prompt:
plan create-user -name root -admin -wait
Enter the new password: 123456<RETURN>
Re-enter the new password: 123456<RETURN>
In addition to creating the root user, a non-administrative user should also be created, along with the necessary credentials. This second user is created and provisioned to demonstrate how an application can be run against a table using only the minimum required privileges; as opposed to full, "root" privileges.
— Generate "Root" User and Credentials —
To generate the root user along with the necessary security credentials, login to one of the hosts running the admin service (kv-host-1 for example) and type the following commands at the command line:
> java -jar /opt/ondb/kv/lib/kvstore.jar securityconfig pwdfile create \
-file /opt/ondb/example-store/security/root.passwd
Created
> java -jar /opt/ondb/kv/lib/kvstore.jar securityconfig pwdfile secret \
-file /opt/ondb/example-store/security/root.passwd -set -alias root
Enter the secret value to store: 123456<RETURN>
Re-enter the secret value for verification: 123456<RETURN>
Secret created
OK
> cp /opt/ondb/example-store/security/client.security /opt/ondb/example-store/security/root.login
> echo oracle.kv.auth.username=root >> /opt/ondb/example-store/security/root.login
> echo oracle.kv.auth.pwdfile.file=/opt/ondb/example-store/security/root.passwd >> /opt/ondb/example-store/security/root.login
Note that the contents of the client.security properties file are copied
to the file named root.login. The contents of that file are used when a
client that wishes to connect to the secure KVStore started above must authenticate as
the user named root. For the purposes of this document, this authentication
process will be referred to as logging in to the KVStore; and thus, the properties
file is referred to as a login file (or login properties file). For
convenience, the system properties oracle.kv.auth.username and
oracle.kv.auth.pwdfile.file are inserted into root.login;
which allows one to connect to the store as the root user without having to
specify those properties on the command line.
— Generate Non-Administrative User with Privileges —
To create the non-administrative user, along with the necessary credentials, first login to the admin CLI by typing the following at the command line of a node that has network connectivity with the admin service:
> java -jar /opt/ondb/kv/lib/kvstore.jar runadmin \
-host kv-host-1 \
-port 5000 \
-security /opt/ondb/example-store/security/root.login
Logged in admin as root
kv->
Next, create a custom role (named readwritemodifytables
for example) consisting of the privileges a user would need to create and populate
a table. After creating the necessary role, create a user named example-user
and then grant the new role to that user. To do this, enter the following commands
— either in succession, from the CLI prompt; or from a script, using the CLI
command 'load -file <flnm>'.
execute 'CREATE ROLE readwritemodifytables'
execute 'GRANT SYSDBA TO readwritemodifytables'
execute 'GRANT READ_ANY TO readwritemodifytables'
execute 'GRANT WRITE_ANY TO readwritemodifytables'
execute 'CREATE USER example-user IDENTIFIED BY \"123456\"'
execute 'GRANT readwritemodifytables TO USER example-user'

Note that the name of the user created above is not required to be the same as the OS user name under which the example is executed. The name above and its associated credentials are registered with the KVStore for the purposes of authenticating to the store, and so can be any value you wish to use.
— Generate Non-Administrative User Credentials —
Once the KVStore user example-user and its password have been created,
the KVSecurityCreation convenience program can be used to generate
the public and private credentials needed by that user to connect to the KVStore.
To do this, login to one of the hosts running the admin service
(kv-host-1 for example) and type the following at the command line:
> cd /opt/ondb/kv
> javac -classpath lib/kvstore.jar:examples examples/hadoop/table/KVSecurityCreation.java

which produces the following files:
/opt/ondb/kv/examples/hadoop/table/
KVSecurityUtil.class
KVSecurityCreation.class
Once KVSecurityCreation has been compiled, it can be executed to
generate the desired credential artifacts. If you want to store the password in
a clear text password file, then type the following at the command line:
> cd /opt/ondb/kv
> java -classpath lib/kvstore.jar:examples hadoop.table.KVSecurityCreation -pwdfile example-user.passwd -set -alias example-user
May 04, 2015 11:23:32 AM hadoop.table.KVSecurityUtil removeDir
INFO: removed file [/tmp/example-user.passwd]
May 04, 2015 11:23:32 AM hadoop.table.KVSecurityUtil removeDir
INFO: removed file [/tmp/example-user-client-pwdfile.login]
created login properties file [/tmp/example-user-client-pwdfile.login]
created login properties file [/tmp/example-user-server.login]
created credentials store [/tmp/example-user.passwd]
Enter the secret value to store: 123456<RETURN>
Re-enter the secret value for verification: 123456<RETURN>
Secret created

On the other hand, if you are using an Oracle Wallet (Enterprise Edition only) to store the user's password, then type the following:
> cd /opt/ondb/kv
> java -classpath lib/kvstore.jar:examples hadoop.table.KVSecurityCreation -wallet example-user-wallet.dir -set -alias example-user
May 04, 2015 11:30:54 AM hadoop.table.KVSecurityUtil removeDir
INFO: removed file [/tmp/example-user-wallet.dir/cwallet.sso]
May 04, 2015 11:30:55 AM hadoop.table.KVSecurityUtil removeDir
INFO: removed directory [/tmp/example-user-wallet.dir]
May 04, 2015 11:30:55 AM hadoop.table.KVSecurityUtil removeDir
INFO: removed file [/tmp/example-user-client-wallet.login]
created login properties file [/tmp/example-user-client-wallet.login]
created login properties file [/tmp/example-user-server.login]
created credentials store [/tmp/example-user-wallet.dir]
Enter the secret value to store: 123456<RETURN>
Re-enter the secret value for verification: 123456<RETURN>
Secret created

Compare the artifacts generated when a password file is specified with the artifacts generated when a wallet is specified. When a password file is specified, you should see the following files:
/tmp
example-user-client-pwdfile.login
example-user-server.login
example-user.passwd
And when wallet storage is specified, you should see:
/tmp
example-user-client-wallet.login
example-user-server.login
/example-user-wallet.dir
cwallet.sso
Note that because this is an example for demonstration purposes, the credential
files generated by KVSecurityCreation are placed in the system's
/tmp directory. For your own applications, you may want to
place the credential files you generate in a more permanent location.
Note also that for both cases — password or wallet — two login properties
files are generated; one for client side connections, and one for server side connections.
The only difference between the client side login file and the server side login file is
that the client side login file specifies the username (the alias)
along with the location of the user's password
— specified by either the oracle.kv.auth.pwdfile or oracle.kv.auth.wallet.dir property.
Although optional, the reason for using two login files is to avoid passing private security
information to the server side; as explained in more detail in
Appendix C.
Additionally, observe that the server side login file (example-user-server.login)
is identical for both cases. This is because whether a password file or a wallet is used to
store the password, both use the same publicly visible communication transport information.
At this point, the KVStore has been deployed, configured for secure access, and provisioned with the necessary users and credentials; so that the table can be created and populated, and the example can be executed by a user whose password is stored either in a clear text password file or an Oracle Wallet (Enterprise Edition only) to demonstrate running against table data contained in a secure Oracle NoSQL Database store.
A final, important point to note is that the storage mechanism used for the example application's
user password (password file or Oracle Wallet) does not depend on the password
storage mechanism used by the KVStore with which that application will communicate. That is, although
Appendix B
(for convenience) deployed a secure KVStore using a password file rather than a wallet,
the fact that the KVStore placed the passwords it manages in a password file does not
prevent the developer/deployer of a client of that store from storing the client's
user password in an Oracle Wallet; or vice-versa. You should therefore view the use of
an Oracle Wallet or a password file by any client application as simply a "safe" place
(for some value of "safe") where the user password can be stored; which can be accessed
by only the user who owns the wallet or password file. This means that the choice of
password storage mechanism is at the discretion of the application developer/deployer;
no matter what mechanism is used by the KVStore itself.
With respect to running a MapReduce job against data contained in a secure KVStore,
a particularly important issue to address is related to the communication of user
credentials to the tasks run on each of the DataNodes on which the Hadoop
infrastructure executes the job. Recall from above that when using the
MapReduce
programming model defined by
Apache Hadoop
the tasks executed by a MapReduce job each act as a client of the KVStore. Thus,
if the store is configured for secure access, in order to retrieve the desired data
from the store, each task must have access to the credentials of the user associated
with that data. As described in the Oracle NoSQL Database Security Guide, the
typical mechanism for providing the necessary credentials to a client of a secure store
is to manually install the credentials on the client's local file system;
for example, by employing a utility such as scp. Although that mechanism
is practical for most clients of a secure KVStore, it is extremely impractical for a
MapReduce job. This is because a MapReduce job consists of multiple tasks running in
parallel, in separate address spaces, each with a separate file system that is generally
not under the control of the user. Assuming then, that write access is granted by the
Hadoop administrator (a problem in and of itself), this means that manual installation
of the client credentials for every possible user known to the KVStore would need to
occur on the file system of each of the multiple nodes in the Hadoop cluster; something
that may be very difficult to achieve.
To address this issue, the sections below present a model that developers and deployers can employ to facilitate the communication of each user's credentials to a given MapReduce job from the client side of the job; that is, from the address space controlled by the job's client process, owned by the user. As described below, this model will consist of two primary components: a programming model for executing MapReduce jobs that retrieve and process data contained in tables located in a secure KVStore; and a set of "best practices" for building, packaging, and deploying those jobs. Although there is nothing preventing a user from manually installing the necessary security credentials on all nodes in a given cluster, doing so is not only impractical, but may result in various security vulnerabilities. Combining this programming model with the deployment best practices that are presented will help developers and deployers not only avoid the need to manually pre-install credentials on the DataNodes of the Hadoop cluster, but will also prevent the sort of security vulnerabilities that can occur with manual installation.
A MapReduce job is typically submitted for execution to the Hadoop cluster's
ResourceManager. If the job will be run against a secure KVStore,
then prior to initiating the job, the client must initialize the job's
TableInputFormat
with three pieces of information: the
PasswordCredentials
containing the username and password the client will present to the store during authentication,
along with the names of the public login and trust files described below.
To perform this initialization, the client
— CountTableRows in this case — invokes the setKVSecurity
method defined in
TableInputFormat.
Once this initialization has been performed and the job has been initiated, the job uses that
TableInputFormat
to create and assign a
TableInputSplit
(a split) to each of the Mapper tasks that will run on one of the DataNodes
in the cluster. The
TableInputFormat
needs the information initialized by the setKVSecurity method both to connect to
the secure store itself and to initialize each split it creates with that same information.

In addition to initializing the
TableInputFormat
(and thus, its splits) with the information listed above, the model also requires that the
public and private security credentials referenced by that information be communicated to the
TableInputFormat,
as well as the splits, in a secure fashion. How this is achieved depends on whether that
information is being communicated to the
TableInputFormat
on the client side of the application, or to the splits on the server side.
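As a rough sketch of the client side initialization just described — where the setKVSecurity signature and parameter order shown are assumptions to be checked against the TableInputFormat javadoc — the driver of a job like CountTableRows might perform something like the following before submitting the job:

```java
import oracle.kv.PasswordCredentials;
import oracle.kv.hadoop.table.TableInputFormat;

public final class SecureJobInitSketch {

    /*
     * Client side initialization performed before the job is submitted.
     * The setKVSecurity signature used here is an assumption; consult the
     * oracle.kv.hadoop.table.TableInputFormat javadoc for the actual form.
     */
    public static void initSecurity(String username, char[] password) {
        /* Public artifacts: loaded by each split as classpath resources from
         * the server side JAR (for example, CountTableRows-pwdServer.jar). */
        final String serverLoginFile = "example-user-server.login";
        final String trustFile = "client.trust";

        /* Private credentials: created only on the client side and carried in
         * each split's internal state, never written to DataNode file systems. */
        final PasswordCredentials credentials =
            new PasswordCredentials(username, password);

        TableInputFormat.setKVSecurity(serverLoginFile, credentials, trustFile);
    }

    private SecureJobInitSketch() { }
}
```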
— Communicating Security Credentials to the Splits —
To facilitate communication of the user's security credentials to the splits distributed
to each of the DataNodes of the cluster, the model separates public security
information from the private information (the username and password), and then
stores the private information as part of each split's internal state, rather than on the
local file system of each associated DataNode; which may be vulnerable or difficult/impossible
to secure. For communication of the public contents of the login and trust files
to each such split, the model supports an (optional) mechanism that allows the application
to communicate that information as Java resources that each split retrieves from
the classpath of the split's Java VM. This avoids the need to manually transfer the contents
of those files to each DataNode's local file system, and also avoids the potential security
vulnerabilities that can result from manual installation on those nodes. Note that when
an application wishes to employ this mechanism, it will typically include the necessary
information in a JAR file that is specified to the MapReduce job via the -libjars
hadoop command line directive.
The intent of the mechanism just described is to allow applications to exploit the Hadoop infrastructure to automatically distribute the public login and trust information to each of the job's splits via a JAR file added to the classpath on each remote DataNode. But it is important to note that although this mechanism is used to distribute the application's public credentials, it must not be used to distribute any of the private information related to authentication; specifically, the username and password. This is important because a JAR file that is distributed to the DataNodes in the manner described may be cached on the associated DataNode's local file system; which might expose a vulnerability. As a result, private authentication information is only communicated as part of each split's internal state.
The separation of public and private credentials supported by this model not only prevents caching the private credentials on each DataNode, but also facilitates the ability to guarantee the confidentiality of that information; via whatever external third party secure communication mechanism the current Hadoop implementation happens to employ. This capability is also important to support the execution of Hive queries against a secure store.
— Communicating Security Credentials to the TableInputFormat —
With respect to the job's
TableInputFormat,
the programming model supports different options for communicating the user's security information.
This is because the
TableInputFormat
operates only on the access node, on the client side of the job; which means that there is
only one file system that needs to be secured. Additionally, unlike the splits, the
TableInputFormat
is not sent on the wire. Thus, as long as only the user is granted read privileges,
both the public and private security information can be installed on the access node's
file system without fear of compromise. For this case, the application would typically
use system properties (on the command line) to specify the fully-qualified paths to the
login, trust, and password files (or Oracle Wallet); which the
TableInputFormat
would then read from the local file system, retrieving the necessary public and private
security information.
A second option for communicating the user's security credentials to the
TableInputFormat
is to include the public and private information as resources in the client side
classpath of the Java VM in which the
TableInputFormat
runs. This is the option employed by the example presented in this document, and is
similar to what was described above for the splits. This option demonstrates how an
application's build model can be exploited to simplify not only the application's
command line, but also the deployment of secure MapReduce jobs in general. As was the
case with the splits, applications will typically communicate the necessary security
information as Java resources by including that information in a JAR file. But rather
than using the -libjars hadoop command line directive to specify the
JAR file to the server side of the MapReduce job, in this case, because the
TableInputFormat
operates on only the client side access node, the JAR file would simply be added to
the HADOOP_CLASSPATH environment variable.
Rather than manually installing the necessary security artifacts (login file, trust file, password file or Oracle Wallet) on each DataNode in the cluster, users should instead install those artifacts only on the cluster's single access node; the node from which the client application is executed. The client application can then retrieve each artifact from the local environment, repackage the necessary information, and then employ mechanisms provided by the Hadoop infrastructure to transfer that information to the appropriate components of the MapReduce job that will be executed.
For example, as described in the previous section, your client application can be designed
to retrieve the username and location of the password from the command line, a configuration
file, or a resource in the client classpath; where the location of the user's password is
a locally installed password file or Oracle Wallet (Enterprise Edition only) that can only
be read by the user. After retrieving the username from the command line and the password
from the specified location, the client uses that information to create the user's
PasswordCredentials,
which are transferred to each MapReduce task via the splits that are created by the job's
TableInputFormat.
Using this model, the user's
PasswordCredentials
are never written to the file systems of the cluster's DataNodes. They are only held in each
task's memory. As a result, the integrity and confidentiality of those credentials only
needs to be provided when on the wire; which can be achieved by using whatever external
third party secure communication mechanism the current Hadoop implementation happens
to employ.
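For instance, the client side of the job might recover the username and the password store location from the client login file on its classpath roughly as sketched below; the resource and property names are the ones discussed above, and the step that actually reads the secret out of the password file or wallet is deliberately omitted:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class ClientLoginSketch {

    /*
     * Reads the client side login properties (for example,
     * example-user-client-pwdfile.login) as a classpath resource and returns
     * the username it names. Reading the password itself from the password
     * file or wallet is omitted here.
     */
    public static String usernameFromLogin(String loginResource) throws IOException {
        final Properties props = new Properties();
        try (InputStream in = ClientLoginSketch.class.getClassLoader()
                 .getResourceAsStream(loginResource)) {
            if (in == null) {
                throw new IOException(loginResource + " not found on classpath");
            }
            props.load(in);
        }
        /* Property names taken from the login file discussion above. */
        final String username = props.getProperty("oracle.kv.auth.username");
        final String pwdStore = props.getProperty("oracle.kv.auth.pwdfile.file");
        System.out.println("user=" + username + ", password store=" + pwdStore);
        return username;
    }

    private ClientLoginSketch() { }
}
```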
With respect to the transfer of the public login and trust artifacts, the client application
can exploit the mechanisms provided by the Hadoop infrastructure to automatically transfer
classpath (JAR) artifacts to the job's tasks. As demonstrated by the CountTableRows
example presented in the body of this document, the client application's build process can be
designed to separate the application's class files from its public security artifacts.
Specifically, the application's class files (and optionally, the public and private credentials)
can be placed in a local (to the access node) JAR file for inclusion in the classpath of the
client itself; while only the public security artifacts (the public login properties and client
trust information) are placed in a separate JAR file that can be added to the
-libjars specification of the hadoop command line for inclusion in the classpath
of each MapReduce task.
— Review: Application Packaging for the Non-Secure Case —
To understand how the packaging model discussed here can be employed when executing an
application against a secure KVStore, it may be helpful to first review how the
CountTableRows example is executed against a non-secure store. Recall from
the previous sections, for the non-secure case, the following command was executed
to produce a JAR file containing only the class files needed by CountTableRows.
> cd /opt/ondb/kv/examples
> jar cvf CountTableRows.jar hadoop/table/CountTableRows*.class

which produces the file CountTableRows.jar, whose contents look like:
0 Fri Feb 20 12:53:24 PST 2015 META-INF/
68 Fri Feb 20 12:53:24 PST 2015 META-INF/MANIFEST.MF
3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows.class
2623 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Map.class
3842 Fri Feb 20 12:49:16 PST 2015 hadoop/table/CountTableRows$Reduce.class
Then the following commands can be used to execute the CountTableRows
example MapReduce job against a non-secure KVStore:
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar
> cd /opt/ondb/kv
> hadoop jar examples/CountTableRows.jar hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/0001
Note that there are three classpaths that must be set when a MapReduce job is
executed. First, the jar specification to the hadoop command interpreter
makes the class files of the main program (CountTableRows in this case)
accessible to the hadoop launcher mechanism; so that the program can be loaded and
executed. Next, the HADOOP_CLASSPATH environment variable must be
set to include any third party libraries that the program or the Hadoop framework
(running on the local access node) may need to load. For the example above,
kvclient.jar is added to HADOOP_CLASSPATH so
that the Hadoop framework's job initiation mechanism on the access node can access
TableInputFormat
and its related classes.
Finally, the hadoop command interpreter's -libjars argument is
used to include any third party libraries in the classpath of each MapReduce task
executing on the cluster's DataNodes. Again, for the case above, kvclient.jar
is specified in -libjars so that each MapReduce task can access
classes such as TableInputSplit and TableRecordReader.
— Application Packaging for the Secure Case —
Compare the non-secure case above with what would be done to run the
CountTableRows MapReduce job against a secure KVStore. For
the secure case, two JAR files are built; one for the classpath on
the client side, and one for the classpaths of the DataNodes on the server side. The
first JAR file will be added to the client side classpath and includes not only the
class files for the application but also the public and private credentials the
client will need to interact with the secure KVStore; where including the public and
private credentials in the client side JAR file avoids the inconvenience of having to
specify that information on the command line. The second JAR file will be added
(via the -libjars argument) to the DataNode classpaths on the server
side, and will include only the user's public credentials.
As described in
Appendix B,
the user's password can be stored in either a clear text password file or an Oracle
Wallet. As a result, how the first JAR is generated is dependent on whether a password file
is used or a wallet. For example, assuming that a password file is used and the
user's security artifacts are generated using the KVSecurityCreation
program in the manner presented in
Appendix B,
to generate both the client side and server side JAR files for the CountTableRows
example application, type the following:
> cd /opt/ondb/kv/examples
> jar cvf CountTableRows-pwdClient.jar hadoop/table/CountTableRows*.class hadoop/table/KVSecurityUtil*.class
> cd /opt/ondb/example-store/security
> jar uvf /opt/ondb/kv/examples/CountTableRows-pwdClient.jar client.trust
> cd /tmp
> jar uvf /opt/ondb/kv/examples/CountTableRows-pwdClient.jar example-user-client-pwdfile.login
> jar uvf /opt/ondb/kv/examples/CountTableRows-pwdClient.jar example-user.passwd
> cd /opt/ondb/example-store/security
> jar cvf /opt/ondb/kv/examples/CountTableRows-pwdServer.jar client.trust
> cd /tmp
> jar uvf /opt/ondb/kv/examples/CountTableRows-pwdServer.jar example-user-server.login

which produces the client side JAR file named CountTableRows-pwdClient.jar,
with contents that look like:
0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
3650 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows.class
2623 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Map.class
437 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Reduce.class
6628 Mon May 04 13:00:52 PDT 2015 hadoop/table/KVSecurityUtil.class
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
322 Mon May 04 11:23:32 PDT 2015 example-user-client-pwdfile.login
34 Mon May 04 11:23:38 PDT 2015 example-user.passwd
and produces the server side JAR file named CountTableRows-pwdServer.jar,
with contents that look like:
0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
255 Mon May 04 11:30:54 PDT 2015 example-user-server.login
Alternatively, if KVSecurityCreation was used to generate wallet based
artifacts for CountTableRows, then the client side and server side
JAR files would be generated by typing:
> cd /opt/ondb/kv/examples
> jar cvf CountTableRows-walletClient.jar hadoop/table/CountTableRows*.class hadoop/table/KVSecurityUtil*.class
> cd /opt/ondb/example-store/security
> jar uvf /opt/ondb/kv/examples/CountTableRows-walletClient.jar client.trust
> cd /tmp
> jar uvf /opt/ondb/kv/examples/CountTableRows-walletClient.jar example-user-client-wallet.login
> jar uvf /opt/ondb/kv/examples/CountTableRows-walletClient.jar example-user-wallet.dir
> cd /opt/ondb/example-store/security
> jar cvf /opt/ondb/kv/examples/CountTableRows-walletServer.jar client.trust
> cd /tmp
> jar uvf /opt/ondb/kv/examples/CountTableRows-walletServer.jar example-user-server.login
each with contents identical or analogous to the contents of the JAR files for
the password case. That is,
0 Mon May 04 13:22:36 PDT 2015 META-INF/
68 Mon May 04 13:22:36 PDT 2015 META-INF/MANIFEST.MF
3650 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows.class
2623 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Map.class
437 Mon May 04 13:00:52 PDT 2015 hadoop/table/CountTableRows$Reduce.class
6628 Mon May 04 13:00:52 PDT 2015 hadoop/table/KVSecurityUtil.class
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
324 Mon May 04 11:30:54 PDT 2015 example-user-client-wallet.login
0 Mon May 04 11:30:54 PDT 2015 example-user-wallet.dir/
3677 Mon May 04 11:31:00 PDT 2015 example-user-wallet.dir/cwallet.sso
and
0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
255 Mon May 04 11:30:54 PDT 2015 example-user-server.login
Finally, in a fashion similar to that described for the non-secure case above, to
execute the CountTableRows MapReduce job — using a password file —
against a secure KVStore, you would type the following:
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/examples/CountTableRows-pwdServer.jar
> cd /opt/ondb/kv
> hadoop jar examples/CountTableRows-pwdClient.jar \
hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/CountTableRows-pwdServer.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/0001 \
example-user-client-pwdfile.login \
example-user-server.login
Similarly, if the application stores its password in an Oracle Wallet, then you
would type:
> export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/examples/CountTableRows-walletServer.jar
> cd /opt/ondb/kv
> hadoop jar examples/CountTableRows-walletClient.jar \
hadoop.table.CountTableRows \
-libjars /opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/CountTableRows-walletServer.jar \
example-store \
kv-host-1:5000 \
vehicleTable \
/user/example-user/CountTableRows/vehicleTable/0001 \
example-user-client-wallet.login \
example-user-server.login
When comparing the command lines above with the command line used for the non-secure
case, you should notice that HADOOP_CLASSPATH and -libjars
both have been augmented with the JAR file that contains only the public
login and trust credentials (CountTableRows-pwdServer.jar or
CountTableRows-walletServer.jar); whereas the local classpath
of the client side of the application is augmented — via the jar directive —
with the JAR file that includes both the public and private credentials (CountTableRows-pwdClient.jar
or CountTableRows-walletClient.jar). The only other difference with the
non-secure case is the two additional arguments at the end of the argument list;
example-user-client-pwdfile.login (or example-user-client-wallet.login)
and example-user-server.login. The values of those arguments specify, respectively,
the names of the client side and server side login files; which will be retrieved as
resources from the corresponding JAR file.
Observe that when you package and execute your MapReduce application in a manner like that shown in the example above, there is no need to specify the username or password file (or wallet) on the command line; as that information is included as part of the client side JAR file. Additionally, the server side JAR file that is transferred from the access node to the job's DataNodes does not include that private information; which is important because that transferred JAR file will be cached in the file system of each of those DataNodes.
As the example above demonstrates, the programming model
for MapReduce and Oracle NoSQL Database Security supports (even encourages) the
best practices presented in this section for building, packaging, and deploying
any given MapReduce job that employs the Oracle NoSQL Database Table API to retrieve
and process data in a given KVStore — either secure or non-secure. As a result,
simply generating separate JAR files — a set of JAR files for the secure case,
and one for the non-secure case — allows deployers to conveniently run the job
with or without security.
Note that this model for separating public and private user credentials will play an important role when executing Hive queries against table data in a secure KVStore.
Copyright (c) 2011, 2015 Oracle and/or its affiliates. All rights reserved.