This package supports running Apache Hive or Oracle Big Data SQL queries against data written via the Oracle NoSQL Database Table API.
Class | Description |
---|---|
LoadRmvTable | Creates an example table in a given NoSQL Database store and then uses the Table API to populate the table with sample records. |
This package supports running Apache Hive or Oracle Big Data SQL queries against data written via the Oracle NoSQL Database Table API. Because the Oracle NoSQL Database Hadoop integration classes support running Hadoop MapReduce jobs against data stored in an Oracle NoSQL Database table, it was natural to also provide interfaces and classes that support running Hive queries against such table data (since a typical Hive query generally results in the execution of a MapReduce job). In addition to describing the core interfaces and classes involved in running a Hive query against data from a table located in a given Oracle NoSQL Database store (a KVStore), the information presented below also walks through the steps to take to execute a given set of basic Hive queries against example table data contained in such a store; either secure or non-secure.
Before attempting the examples below, you should satisfy the following prerequisites:
Become familiar with Apache Hive and its programming model; that is, become familiar with how to write and execute a Hive query.
Become familiar with Apache Hadoop; specifically, become familiar with how Hive and Hadoop interact.
Install Oracle NoSQL Database under KVHOME /opt/ondb/kv on a system that is network reachable from the nodes of the Hadoop cluster.
Deploy a KVStore named example-store, with KVROOT /opt/ondb/example-store, to 3 machines (real or virtual) with host names kv-host-1, kv-host-2, and kv-host-3; where an admin service, listening on port 5000, is deployed on each host.
If deploying a secure store, follow the steps presented in Appendix B of the Hadoop/Table API Example to securely connect to the store's admin service and create a KVStore user named example-user; along with the appropriate security artifacts (login file, trust file, and either password file or Oracle Wallet [Enterprise Edition only]).
Install the example code under /opt/apps.
Follow the steps presented in the Hadoop/Table API Example to create and populate a table named vehicleTable with example data consisting of only primitive data types.
Compile and execute the LoadRmvTable program provided with this example, and then follow the steps presented in the sections of Appendix A of this document to create and populate a table named rmvTable with example data consisting of both primitive and non-primitive data types.
Familiarity with the <KVHOME> and <KVROOT> environment variables, as well as the store name, host names, admin port, and example code location described above, should allow you to more easily
understand and use the example commands presented below. Combined with the information contained
in the Oracle NoSQL Database Getting Started Guide, along with the
Oracle NoSQL Database Admin Guide, the Oracle NoSQL Database Security Guide
and the
Hadoop/Table API Example,
you should then be able to generalize and extend these examples to your own
particular development scenario; substituting the values specific to the
given environment where necessary.
Note that detailed instructions for deploying a non-secure KVStore are provided in
Appendix A
of the Hadoop/Table API Example. Similarly,
Appendix B
of the Hadoop/Table API Example provides instructions for deploying a
KVStore configured for security.
A Brief Hive Primer
Paraphrasing Wikipedia, Apache Hive is a data warehouse infrastructure built on top of Apache Hadoop that facilitates querying datasets residing in distributed file systems or data stores such as Hadoop HDFS or Amazon S3. Additionally, Hive provides a pluggable programming model that allows you to specify custom interfaces and classes that support querying data residing in data sources other than HDFS and S3; in particular, data written to an Oracle NoSQL Database table.
To access and analyze data stored in these data sources, Hive provides a
mechanism to project structure onto the data and query the data using
a SQL-like language called the Hive Query Language, or
HQL. Depending on the complexity of a given Hive query,
the Hive infrastructure may construct and deploy a set of MapReduce
jobs to retrieve and process the data or, when possible, it may simply
satisfy the query via the metadata stored in the Hive metastore
(Derby or MySQL).
In addition to the Hive infrastructure itself, Hive provides a convenient client-side command line interface (the Hive CLI), which allows you to interact with the Hive infrastructure to create a Hive external table and then map it to data located in a source like those described above.
As indicated above, Oracle NoSQL Database provides a set of interfaces and classes satisfying the Hive programming model that support running Hive queries against table data contained in a KVStore, either secure or non-secure. These classes are located in the oracle.kv.hadoop.hive.table package, and consist of the following Hive and Hadoop types:
org.apache.hadoop.hive.ql.metadata.HiveStorageHandler, which is the mechanism (pluggable interface) used to specify the location of the data the Hive infrastructure should process, as well as how to process that data; it consists of the following components:
org.apache.hadoop.mapred.InputFormat, which specifies how the associated MapReduce job reads its input data (from an Oracle NoSQL Database table).
org.apache.hadoop.mapred.OutputFormat, which specifies how the associated MapReduce job writes its output.
org.apache.hadoop.hive.serde2.SerDe, which is used to deserialize the table data that is retrieved and sent to the Hive infrastructure and/or Hadoop MapReduce job for processing, and to serialize data input to Hive that will be written to the table (currently not supported).
org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler, which, as described in Appendix D, supports the decomposition of a query's WHERE clause (the predicates) into information that can be passed to the database so that some or all of the search processing can be performed in the database itself rather than on the client side.
org.apache.hadoop.mapred.RecordReader, which specifies how the mapped keys and values are located and retrieved during any MapReduce processing performed as part of executing a Hive query.
org.apache.hadoop.mapred.InputSplit, which represents the data to be processed by an individual Mapper that operates during the MapReduce processing performed as part of executing a Hive query.
It is through the implementation of HiveStorageHandler provided in the Oracle NoSQL Database distribution that the Hive infrastructure obtains access to a given KVStore and the table data on which to run the desired Hive query. That implementation consists of the following classes:
oracle.kv.hadoop.hive.table.TableStorageHandler
oracle.kv.hadoop.hive.table.TableHiveInputFormat
oracle.kv.hadoop.hive.table.TableHiveInputSplit
oracle.kv.hadoop.hive.table.TableHiveRecordReader
oracle.kv.hadoop.hive.table.TableSerDe
The package also provides a number of implementations of the org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector interface that support deserialization of the primitive and non-primitive data types defined by the Oracle NoSQL Database Table API (see below). The following implementations of the ObjectInspector interface are used in the deserialization process to convert the associated data type defined by the Oracle NoSQL Database Table API to its corresponding type in the Hive data model:
oracle.kv.hadoop.hive.table.TableBinaryObjectInspector
oracle.kv.hadoop.hive.table.TableBooleanObjectInspector
oracle.kv.hadoop.hive.table.TableDoubleObjectInspector
oracle.kv.hadoop.hive.table.TableFloatObjectInspector
oracle.kv.hadoop.hive.table.TableIntObjectInspector
oracle.kv.hadoop.hive.table.TableLongObjectInspector
oracle.kv.hadoop.hive.table.TableEnumObjectInspector
oracle.kv.hadoop.hive.table.TableArrayObjectInspector
oracle.kv.hadoop.hive.table.TableMapObjectInspector
oracle.kv.hadoop.hive.table.TableRecordObjectInspector
Note that each data type defined by the Oracle NoSQL Database Table API (see oracle.kv.table.FieldDef.Type) is mapped to only a subset of the types defined by Hive's data model. Specifically, when creating a Hive external table so that you can query the data in a given Oracle NoSQL Database table, the Hive table must be created with a schema consistent with the mappings shown in the following table:
Oracle NoSQL Database Table API | Hive |
---|---|
FieldDef.Type.STRING | STRING |
(no mapping) | VARCHAR |
(no mapping) | CHAR |
FieldDef.Type.BOOLEAN | BOOLEAN |
FieldDef.Type.BINARY | BINARY |
FieldDef.Type.FIXED_BINARY | BINARY |
(no mapping) | TINYINT |
(no mapping) | SMALLINT |
FieldDef.Type.INTEGER | INT |
FieldDef.Type.LONG | BIGINT |
FieldDef.Type.FLOAT | FLOAT |
(no mapping) | DECIMAL |
FieldDef.Type.DOUBLE | DOUBLE |
FieldDef.Type.ENUM | STRING |
(no mapping) | TIMESTAMP |
(no mapping) | DATE |
FieldDef.Type.ARRAY | ARRAY |
FieldDef.Type.MAP | MAP<STRING, data_type> |
FieldDef.Type.RECORD | STRUCT<col_name : data_type, ...> |
(no mapping) | UNIONTYPE<data_type, data_type, ...> |
It is important to understand that when using Hive to query data in an
Oracle NoSQL Database table, the schema of the Hive external table you
create is dependent on the schema of the corresponding Oracle NoSQL Database
table you wish to query. Thus, if you create a Hive external table with
a schema that includes a Hive data type that is not mapped from an Oracle NoSQL Database FieldDef.Type, then an error will occur when any attempt is made to query the table.
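To make the mapping concrete, the following sketch pairs a hypothetical Oracle NoSQL Database table definition (the table name exampleTable and its fields are illustrative only, not part of this example) with the Hive column types that the mapping table above requires:

```sql
-- Hypothetical Oracle NoSQL Database table (Table API DDL):
--   CREATE TABLE exampleTable
--     (id INTEGER, name STRING, balance LONG,
--      tags ARRAY(STRING), PRIMARY KEY (id))
--
-- Per the mapping table above, INTEGER maps to INT, STRING to STRING,
-- LONG to BIGINT, and ARRAY to ARRAY; so the corresponding Hive
-- external table would be declared as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS exampleTable
    (ID INT, NAME STRING, BALANCE BIGINT, TAGS ARRAY<STRING>)
  STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
  TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                 "oracle.kv.hosts" = "kv-host-1:5000",
                 "oracle.kv.tableName" = "exampleTable");
```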
— YARN versus MapReduce version 1 —
Currently, Hadoop deployments include two versions of MapReduce. The first version (referred to as MRv1) is the original version of MapReduce, and consists of interfaces and classes from the Java package org.apache.hadoop.mapred. The newer version of MapReduce is referred to as YARN (Yet Another Resource Negotiator) or, more generally, MRv2; it resides in the package org.apache.hadoop.mapreduce. Unfortunately, the Table API Hive integration classes must address the existence of both versions of MapReduce, for the following reasons:
First, the Oracle NoSQL Database Table API Hadoop integration classes employ MRv2. Second, the Hive infrastructure currently requires that the core classes listed above be subclasses of the previously described MRv1 classes.
For example, because of the incompatibility between MRv1 and MRv2, rather than subclassing the MRv2 based Table API Hadoop integration class TableInputFormat (which would be preferred), the Table API Hive integration class described above (TableHiveInputFormat) actually subclasses the Hadoop MRv1 class org.apache.hadoop.mapred.InputFormat. Thus, to exploit and reuse the mechanisms
provided by the
Table API Hadoop integration
classes, the
Table API Hive integration
classes presented here internally create, manage, and delegate to subclasses of the
appropriate MRv2 based classes.
It is also important to note that because the
Table API Hadoop integration
classes do not currently support writing data from a MapReduce job into
a KVStore, the
Table API Hive integration
classes do not support queries that modify the contents of a table in a KVStore.
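Because of this restriction, only read-only HQL is supported against the mapped tables; a statement that attempts to write through to the KVStore table, such as the hypothetical INSERT sketched below, will fail:

```sql
-- Supported: read-only queries against the mapped table.
SELECT * FROM vehicleTable;
-- Not supported: statements that would modify the KVStore table,
-- such as the following INSERT (values here are hypothetical).
INSERT INTO TABLE vehicleTable
  VALUES ('auto', 'Ford', 'Focus', 'FrontWheelDrive', 'red', 20000.0, 10);
```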
Example 1: Executing Hive Queries Against Primitive Data Types
This first example demonstrates how to execute various Hive queries against a
simple Oracle NoSQL Database table; defined with a schema consisting of
primitive data types only. To run the queries described in this example, after satisfying the prerequisites listed above, you should, if you have not already done so, deploy either a non-secure or a secure KVStore (or both), and then
create and populate
a table named vehicleTable in each such store. If any of the stores you
deploy are configured for secure access, then you must also follow the steps provided in
Appendix C
of this document to configure Hive with the artifacts and environment necessary to interact
with a secure KVStore. Once these initial steps are performed, you can then create a
Hive external table mapped to the KVStore vehicleTable, and execute the
set of example Hive queries (described below) against the data stored in that table.
Create a Hive External Table Mapped to the Oracle NoSQL Database 'vehicleTable'
After satisfying the above prerequisites
related to the Hive programming model and its CLI, you can create a Hive external table
mapped to the Oracle NoSQL Database 'vehicleTable' located in the desired KVStore,
and then query the data in that table. To do this, first execute the Hive interactive CLI
by typing the following at the command line of the Hive client node:
> hive

which should then present the following Hive CLI command prompt:

hive>

At this point, the desired Hive commands (table creation and/or queries) can be executed.
— Creating a Hive External Table Mapped to 'vehicleTable' in a Non-Secure KVStore —
If you will be running your Hive query against a non-secure KVStore
deployed in the manner described in
Appendix A
of the Hadoop/Table API Example, after adding kvclient.jar and threetenbp.jar to the HIVE_AUX_JARS_PATH environment variable of the Hive client (as described in Appendix B of this document), you can create the external Hive table required by this example by typing the following at the Hive CLI command prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTable (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3");
The command above applies the required data model mapping
to create a Hive table named vehicleTable with columns whose types are consistent
with the corresponding fields of the Oracle NoSQL Database table created and populated in the
Hadoop/Table API Example, which was also named vehicleTable. Additionally, you should note the following points:
You may specify more than one host and port in the "oracle.kv.hosts" property, as long as at least one valid administrative host (and port) is included.
The "oracle.kv.hadoop.hosts" property in the example command above specifies the data node hosts of the Hadoop cluster; it is currently used only when the query is run on an Oracle Big Data SQL system.

— Creating a Hive External Table Mapped to 'vehicleTable' in a Secure KVStore —

If you will be running your Hive query against a secure KVStore deployed in the manner described in
Appendix B
of the Hadoop/Table API Example, and if your password storage employs a
password file instead of an Oracle Wallet, then you can create the external Hive
table required by this example by typing the following at the Hive CLI command
prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTablePasswd (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.pwdfile.file" = "/tmp/example-user.passwd");
On the other hand, if your password storage employs an Oracle Wallet instead of a password file, then type the following at the command prompt to create the required Hive table:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTableWallet (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.wallet.dir" = "/tmp/example-user-wallet.dir");
The commands above each apply the required data model mapping to create Hive tables named vehicleTablePasswd and vehicleTableWallet respectively; each has the same structure, schema, and attributes as the vehicleTable created in the non-secure case, and the same additional points noted for the non-secure case also apply here.
Note the additional properties specified in the TBLPROPERTIES directive; that is, oracle.kv.security, oracle.kv.ssl.trustStore, oracle.kv.auth.username, and oracle.kv.auth.pwdfile.file (or oracle.kv.auth.wallet.dir). Each of these additional properties corresponds to one of the artifacts required to access a secure KVStore; the nature of each property and its corresponding value is described in detail in Appendix C of the Hadoop/Table API Example.
The goal of the Big Data SQL product is to allow users to employ
SQL to manage and manipulate data stored in a number of different locations.
Specifically, Big Data SQL is designed to provide SQL access to
data stored in Hadoop HDFS, various NoSQL databases
— including Oracle NoSQL Database — as well as various
relational databases. Big Data SQL achieves this by presenting Hadoop HDFS,
Oracle NoSQL Database, and other NoSQL data sources as enhanced external tables of the Oracle Database. Because you may not have access to either form of an Oracle Big Data SQL system, the example queries presented in this document are executed from the Hive CLI instead.
Execute Example Hive Queries Against the Oracle NoSQL Database 'vehicleTable'
After creating and mapping the Hive tables described above to the Oracle NoSQL Database
'vehicleTable', the data in that table can be queried in the manner shown below by querying
each Hive table. Because three Hive tables were created (vehicleTable, vehicleTablePasswd, and vehicleTableWallet), whose only difference is whether the KVStore is non-secure or secure and, if secure, whether the password is stored in a password file or a wallet, three instances of each example query are shown; one for each of the three Hive tables.
Additionally, note that for any given query, the output will be similar whether the query is executed against the Hive vehicleTable, vehicleTablePasswd, or vehicleTableWallet; therefore, to save space, sample output is shown only once for each example query. Thus, from the Hive command prompt, type the following queries for each Hive table and observe that the output of each query you execute, if successful, looks something like the corresponding sample output shown.

— List the Contents of Each Row in 'vehicleTable' —
hive> SELECT * FROM vehicleTable;
hive> SELECT * FROM vehicleTablePasswd;
hive> SELECT * FROM vehicleTableWallet;
OK
auto Chrysler Imperial FrontWheelDrive white 20743.943359375 5
auto GM Impala 4WheelDrive black 20743.91015625 46
auto GM Impala FrontWheelDrive yellow 20743.5390625 28
truck Ford F250 AllWheelDrive blue 31115.759765625 47
..........
Note that unlike the more complicated example queries below, the query above
does not result in the execution of a MapReduce job.
This is because there is enough metadata in the Hive metastore to satisfy
the query.
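If you want to check in advance whether a given query will launch a MapReduce job, Hive's EXPLAIN command displays the query plan; a plan consisting of only a fetch task indicates that no MapReduce job is needed. For example:

```sql
-- Expected to show a simple fetch plan (no MapReduce stage):
EXPLAIN SELECT * FROM vehicleTable;
-- Expected to show a map/reduce stage for the aggregation:
EXPLAIN SELECT count(type) FROM vehicleTable;
```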
— Count the Total Number of Rows in 'vehicleTable' —

hive> SELECT count(type) FROM vehicleTable;
hive> SELECT count(type) FROM vehicleTablePasswd;
hive> SELECT count(type) FROM vehicleTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2014-12-12 12:04:18,403 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:05:12,431 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 2.26 sec
2014-12-12 12:05:13,816 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.7 sec
2014-12-12 12:05:15,201 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 6.87 sec
2014-12-12 12:05:16,594 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:05:17,980 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:05:19,364 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:05:20,754 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.24 sec
2014-12-12 12:05:22,140 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.24 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 15.24 sec HDFS Read: 4532 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 240 msec
OK
79
Time taken: 89.359 seconds, Fetched: 1 row(s)
— For All Vehicles in 'vehicleTable' Find the Lowest Price —
hive> SELECT min(price) FROM vehicleTable;
hive> SELECT min(price) FROM vehicleTablePasswd;
hive> SELECT min(price) FROM vehicleTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2014-12-12 12:11:10,924 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:12:06,213 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:07,606 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:09,076 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:12:10,464 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:11,849 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:13,238 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:14,629 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
2014-12-12 12:12:16,031 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 15.38 sec HDFS Read: 4532 HDFS Write: 16 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 380 msec
OK
20743.244140625
Time taken: 89.615 seconds, Fetched: 1 row(s)
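The same pattern extends to grouped aggregations; for example, the following hypothetical query (not part of the original example set) computes the lowest price for each vehicle type, and likewise results in a MapReduce job:

```sql
-- Produces one row per distinct value of the type column,
-- e.g. one row each for 'auto', 'suv', and 'truck'.
SELECT type, min(price) FROM vehicleTable GROUP BY type;
```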
— List All GM Vehicles in 'vehicleTable' —
hive> SELECT * FROM vehicleTable WHERE make LIKE "%GM%";
hive> SELECT * FROM vehicleTablePasswd WHERE make LIKE "%GM%";
hive> SELECT * FROM vehicleTableWallet WHERE make LIKE "%GM%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2014-12-12 12:19:24,269 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:20:18,239 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 2.43 sec
2014-12-12 12:20:19,622 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 4.81 sec
2014-12-12 12:20:21,006 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 7.14 sec
2014-12-12 12:20:22,395 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 13.09 sec
2014-12-12 12:20:23,777 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.06 sec
2014-12-12 12:20:25,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.06 sec
..........
Job 0: Map: 6 Cumulative CPU: 16.06 sec HDFS Read: 4532 HDFS Write: 1491 SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 60 msec
OK
suv GM Equinox 4WheelDrive yellow 41486.78125 37
truck GM Sierra 4WheelDrive black 31115.224609375 87
auto GM Corvette FrontWheelDrive yellow 20743.84375 7
auto GM Impala 4WheelDrive black 20743.91015625 46
..........
— List All Silverado (1500 & 2500) Trucks in 'vehicleTable' —
hive> SELECT * FROM vehicleTable WHERE model LIKE "%Silverado%";
hive> SELECT * FROM vehicleTablePasswd WHERE model LIKE "%Silverado%";
hive> SELECT * FROM vehicleTableWallet WHERE model LIKE "%Silverado%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2014-12-12 13:36:24,284 Stage-1 map = 0%, reduce = 0%
2014-12-12 13:37:19,528 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 9.35 sec
2014-12-12 13:37:20,910 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 11.98 sec
2014-12-12 13:37:22,296 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 15.31 sec
2014-12-12 13:37:23,681 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 15.58 sec
2014-12-12 13:37:25,069 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 15.58 sec
..........
Job 0: Map: 6 Cumulative CPU: 15.58 sec HDFS Read: 4532 HDFS Write: 496 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 580 msec
OK
truck GM Silverado2500 4WheelDrive blue 31115.548828125 68
truck GM Silverado2500 4WheelDrive-4cylinder blue-on-green 31114.91015625 17
truck GM Silverado2500 AllWheelDrive white 31115.275390625 36
truck GM Silverado2500 AllWheelDrive yellow 31114.796875 73
truck GM Silverado1500 4WheelDrive black 31114.98046875 64
truck GM Silverado1500 4WheelDrive green 31115.078125 38
truck GM Silverado1500 RearWheelDrive blue 31115.640625 55
truck GM Silverado1500 RearWheelDrive white 31115.517578125 37
..........
Time taken: 83.589 seconds, Fetched: 8 row(s)
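Recall from the description of HiveStoragePredicateHandler above that the WHERE clause of queries such as those shown here may be decomposed so that some or all of the filtering is performed in the database itself. Simple predicates on the table's columns, as in the following hypothetical query, are natural candidates for that treatment; whether a given predicate is actually pushed to the store depends on the analysis described in Appendix D:

```sql
-- Both conjuncts are simple column predicates, so they may be
-- evaluated by the KVStore rather than on the Hive client side:
SELECT * FROM vehicleTable WHERE type = 'truck' AND make = 'GM';
```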
Example 2: Executing Hive Queries Against Non-Primitive Data Types
This second example demonstrates how to execute various Hive queries against an Oracle NoSQL Database table defined with a more complex schema than that employed in Example 1. In this example, a schema is employed that consists of a variety of data types, both primitive and non-primitive. In the same manner as Example 1, you must take the following initial steps to run the queries presented in this example:
Satisfy the prerequisites listed above.
Deploy either a non-secure or a secure KVStore (or both).
Follow the steps presented in Appendix A of this document to create and populate a table named rmvTable in each store that you deploy.
If any of the stores you deploy are configured for secure access, follow the steps presented in Appendix C of this document to configure Hive with the artifacts and environment necessary to interact with a secure KVStore.

After performing each of these initial steps, you can then create a Hive external table mapped to the KVStore rmvTable, and execute the set of example Hive queries (described below) against the data stored in that table.
Create a Hive External Table Mapped to the Oracle NoSQL Database 'rmvTable'
After performing the initial steps listed in the previous section, use the Hive CLI
to create a Hive external table mapped to the Oracle NoSQL Database rmvTable
located in each KVStore that was deployed. That is, type the following at the command line
of the Hive client node:
> hive
which should present the following Hive CLI command prompt:
hive>
At this point, you can execute the Hive commands presented in the next sub-sections
for the non-secure KVStore or the secure KVStore (or both).
— Creating a Hive External Table Mapped to 'rmvTable' in a Non-Secure KVStore —

If you will be running your Hive query against a non-secure KVStore deployed in the manner described in Appendix A of the Hadoop/Table API Example, after adding kvclient.jar and threetenbp.jar to the HIVE_AUX_JARS_PATH environment variable of the Hive client (as described in Appendix B of this document), you can create the external Hive table required by this example by typing the following at the Hive CLI command prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTable (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
COMMENT 'Hive table rmvTable <---> KVStore table rmvTable'
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3");
Noting the same additional points that were noted for each of the cases presented in Example 1, the command above applies the required Oracle NoSQL Database-to-Hive Data Model Mappings to create a Hive table named rmvTable with columns whose types are consistent with the corresponding fields of the Oracle NoSQL Database table — also named rmvTable — that you created and populated when you followed the steps outlined in Appendix A of this document.
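After creating the table, you can confirm that the resulting Hive schema reflects the intended mappings; for example:

```sql
-- Display the column names and Hive types of the mapped table:
DESCRIBE rmvTable;
-- DESCRIBE FORMATTED additionally shows the storage handler class
-- and the TBLPROPERTIES that were supplied at creation time:
DESCRIBE FORMATTED rmvTable;
```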
Note that it is important to know the specific schema of the Oracle NoSQL Database table you wish to query before specifying the schema of that table's corresponding Hive external table. For example, observe that the field named gender in the schema of the Oracle NoSQL Database rmvTable presented in Appendix C is defined as type FieldDef.Type.ENUM, whereas the GENDER field specified in the Hive external table created above is defined as type STRING, as specified by the required data model mapping.
Thus, the combination of the Oracle NoSQL Database-to-Hive Data Model Mapping with the given Oracle NoSQL Database table's specific schema allows you to determine whether a STRING field in a Hive table is mapped from an Oracle NoSQL Database FieldDef.Type.STRING or FieldDef.Type.ENUM.
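In practice, this means that ENUM values are simply queried as strings on the Hive side; for example, in the following hypothetical query the value compared against gender is an ordinary string literal:

```sql
-- gender is FieldDef.Type.ENUM in the KVStore table but STRING in Hive:
SELECT lastname, firstname, gender FROM rmvTable WHERE gender = 'male';
```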
— Creating a Hive External Table Mapped to 'rmvTable' in a Secure KVStore —

If you will be running your Hive query against a secure KVStore deployed in the manner described in Appendix B of the Hadoop/Table API Example, and if your password storage employs a password file instead of an Oracle Wallet, then you can create the external Hive table required by this example by typing the following at the Hive CLI command prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTablePasswd (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
COMMENT 'Hive table rmvTablePasswd <---> KVStore table rmvTable'
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.pwdfile.file" = "/tmp/example-user.passwd");
On the other hand, if your password storage employs an Oracle Wallet instead of a password file, then type the following at the command prompt to create the required Hive table:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTableWallet (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
COMMENT 'Hive table rmvTableWallet <---> KVStore table rmvTable'
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.wallet.dir" = "/tmp/example-user-wallet.dir");
The commands above each combine the required data model mapping with the schema of the rmvTable that you created to produce Hive external tables named rmvTablePasswd and rmvTableWallet respectively; each has the same structure, schema, and attributes as the Hive rmvTable produced in the non-secure case, and the same additional points noted for the non-secure case also apply here.
Note the security-related properties that each command adds to the
TBLPROPERTIES directive; that is, oracle.kv.security,
oracle.kv.ssl.trustStore, oracle.kv.auth.username, and
oracle.kv.auth.pwdfile.file (or oracle.kv.auth.wallet.dir).
Each of these additional properties
corresponds to one of the artifacts required to access a secure KVStore; where
the nature of each property and its corresponding value is described in detail in
Appendix C
of the Hadoop/Table API Example.
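Because a CREATE EXTERNAL TABLE statement that references a missing or unreadable security artifact typically does not fail until query time, it can be worth verifying the artifacts up front. The following shell sketch checks each file; the /tmp paths are simply the example locations used above, so substitute your own:

```shell
# Verify that each security artifact referenced in TBLPROPERTIES exists
# and is readable before creating the Hive table. The /tmp paths below
# are the example locations used in this document; substitute your own.
check_artifacts() {
  rc=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f"
      rc=1
    fi
  done
  return $rc
}

check_artifacts /tmp/hive-nosql.login \
                /tmp/client.trust \
                /tmp/example-user.passwd \
  && echo "all security artifacts present" \
  || echo "fix the artifacts listed above before creating the Hive table"
```

For the wallet-based table, check /tmp/example-user-wallet.dir instead of the password file.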
Execute Example Hive Queries Against the Oracle NoSQL Database 'rmvTable'
After creating and mapping the Hive tables described above to the Oracle NoSQL Database
rmvTable, the data in that table can be queried in the manner shown below by querying
each Hive table. Note that three Hive tables were created (rmvTable,
rmvTablePasswd, and rmvTableWallet), whose only difference
is whether the KVStore is non-secure or secure and, if secure, whether the password is
stored in a password file or a wallet. Therefore, for each type of query that is demonstrated,
three instances of the given example query are shown; one for each of the three Hive tables.
Additionally, note that for any given query, the output of the query will be similar whether
it is executed against the Hive rmvTable, rmvTablePasswd,
or rmvTableWallet. Therefore, to save space, for each example
query that is presented for the three Hive tables, sample output is shown only once.
hive> SELECT * FROM rmvTable;
hive> SELECT * FROM rmvTablePasswd;
hive> SELECT * FROM rmvTableWallet;
OK
49027 GOMEZ CHRISTOPHER 509367447 male S57428836
{"cell":"616-351-0185","home":"213-630-2419","work":"617-227-9840"}
{"number":88072,"street":"Fifth Avenue","unit":6,"city":"Cambridge","state":"OK","zip":49027}
[{"type":"auto","make":"Ford","model":"Taurus","class":"AllWheelDrive","color":"blue","value":20743.234,"tax":566.290283203125,"paid":false},
{"type":"auto","make":"Ford","model":"Taurus","class":"FrontWheelDrive","color":"blue","value":20743.559,"tax":566.2991333007812,"paid":true}]
40719 ROSARIO ANNE 448406765 female S04809975
{"cell":"303-804-1660","home":"408-630-2412","work":"415-804-9515"}
{"number":96581,"street":"Third Avenue","unit":7,"city":"Springfield","state":"RI","zip":40719}
[{"type":"truck","make":"Chrysler","model":"Ram3500","class":"RearWheelDrive","color":"blue","value":31115.26,"tax":849.4465942382812,"paid":true},
{"type":"truck","make":"Chrysler","model":"Ram1500","class":"AllWheelDrive","color":"blue","value":31114.873,"tax":849.43603515625,"paid":false},
{"type":"auto","make":"Ford","model":"Edge","class":"RearWheelDrive","color":"yellow","value":20743.889,"tax":566.3081665039062,"paid":true}]
..........
Note that unlike the more complicated example queries below, the query above
does not result in the execution of a MapReduce job.
This is because there is enough metadata in the Hive metastore to satisfy
the query.
hive> SELECT lastname,firstname,gender,address FROM rmvTable;
hive> SELECT lastname,firstname,gender,address FROM rmvTablePasswd;
hive> SELECT lastname,firstname,gender,address FROM rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 05:39:04,940 Stage-1 map = 0%, reduce = 0%
2015-06-25 05:39:43,752 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 8.38 sec
2015-06-25 05:39:45,242 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 21.73 sec
2015-06-25 05:39:46,727 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 26.72 sec
..........
Job 0: Map: 6 Cumulative CPU: 26.72 sec HDFS Read: 4760 HDFS Write: 4702 SUCCESS
Total MapReduce CPU Time Spent: 26 seconds 720 msec
OK
SNIDER FRANK male {"number":33512,"street":"Summer Street","unit":1,"city":"Arlington","state":"TN","zip":89150}
MILLER ROCH male {"number":25698,"street":"Mullberry Street","unit":6,"city":"Madison","state":"VA","zip":5740}
TATE BENJAMIN male {"number":2894,"street":"View Street","unit":-1,"city":"Clinton","state":"KY","zip":57466}
..........
Time taken: 87.327 seconds, Fetched: 79 row(s)
hive> SELECT firstname,lastname,phoneinfo["home"] FROM rmvTable;
hive> SELECT firstname,lastname,phoneinfo["cell"] FROM rmvTablePasswd;
hive> SELECT firstname,lastname,phoneinfo["work"] FROM rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:08:09,885 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:08:41,094 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.11 sec
2015-06-25 06:08:42,582 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.11 sec
..........
Job 0: Map: 6 Cumulative CPU: 18.11 sec HDFS Read: 4724 HDFS Write: 2141 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 110 msec
OK
CHRISTOPHER GOMEZ 213-630-2419
ANNE ROSARIO 408-630-2412
MEGAN PHELPS 978-541-5710
MICHAEL BRADLEY 313-351-4580
..........
Time taken: 69.173 seconds, Fetched: 79 row(s)
hive> SELECT count(vehicleinfo[0].type) FROM rmvTable;
hive> SELECT count(vehicleinfo[0].type) FROM rmvTablePasswd;
hive> SELECT count(vehicleinfo[0].type) FROM rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2015-06-25 05:53:31,200 Stage-1 map = 0%, reduce = 0%
2015-06-25 05:54:09,856 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 12.12 sec
2015-06-25 05:54:11,346 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 24.45 sec
2015-06-25 05:54:12,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 24.45 sec
2015-06-25 05:54:14,336 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 25.51 sec
2015-06-25 05:54:15,825 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 25.51 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 25.51 sec HDFS Read: 4760 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 25 seconds 510 msec
OK
79
Time taken: 82.824 seconds, Fetched: 1 row(s)
hive> SELECT min(vehicleinfo[0].value) FROM rmvTable;
hive> SELECT min(vehicleinfo[0].value) FROM rmvTablePasswd;
hive> SELECT min(vehicleinfo[0].value) FROM rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2014-12-12 12:11:10,924 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:12:06,213 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:07,606 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:09,076 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:12:10,464 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:11,849 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:13,238 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:14,629 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
2014-12-12 12:12:16,031 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 15.38 sec HDFS Read: 4532 HDFS Write: 16 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 380 msec
OK
20743.244140625
Time taken: 89.615 seconds, Fetched: 1 row(s)
hive> SELECT vehicleinfo[0] FROM rmvTable;
hive> SELECT vehicleinfo[1] FROM rmvTablePasswd;
hive> SELECT vehicleinfo[2] FROM rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:31:54,779 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:32:33,430 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 4.59 sec
2015-06-25 06:32:34,919 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 27.33 sec
2015-06-25 06:32:36,416 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.89 sec
2015-06-25 06:32:37,903 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.89 sec
..........
Job 0: Map: 6 Cumulative CPU: 27.89 sec HDFS Read: 4760 HDFS Write: 5681 SUCCESS
Total MapReduce CPU Time Spent: 27 seconds 890 msec
OK
{"type":"suv","make":"GM","model":"Tahoe","class":"4WheelDrive","color":"black","value":41487.242,"tax":1132.6016845703125,"paid":true}
{"type":"auto","make":"Chrysler","model":"Imperial","class":"AllWheelDrive","color":"red","value":20743.926,"tax":566.3092041015625,"paid":true}
{"type":"auto","make":"Ford","model":"Taurus","class":"RearWheelDrive","color":"blue","value":20744.076,"tax":566.3132934570312,"paid":true}
{"type":"truck","make":"Ford","model":"F150","class":"AllWheelDrive","color":"green","value":31115.299,"tax":849.4476928710938,"paid":false}
..........
Time taken: 82.056 seconds, Fetched: 79 row(s)
hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTable WHERE lastname RLIKE "^[H].*";
hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTablePasswd WHERE lastname RLIKE "^[H].*";
hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTableWallet WHERE lastname RLIKE "^[H].*";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:41:35,578 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:42:14,310 Stage-1 map = 15%, reduce = 0%
2015-06-25 06:42:15,803 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 9.46 sec
2015-06-25 06:42:17,290 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 23.29 sec
2015-06-25 06:42:18,778 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.78 sec
2015-06-25 06:42:20,271 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.78 sec
..........
Job 0: Map: 6 Cumulative CPU: 27.78 sec HDFS Read: 4760 HDFS Write: 1143 SUCCESS
Total MapReduce CPU Time Spent: 27 seconds 780 msec
OK
CINDY HODGES {"number":56758,"street":"Vaughan Avenue","unit":-1,"city":"Madison","state":"NH","zip":79623}
{"type":"truck","make":"Chrysler","model":"Ram1500","class":"RearWheelDrive","color":"black","value":31115.129,"tax":849.4430541992188,"paid":true}
JULIA HOLDEN {"number":56209,"street":"Main Street","unit":1,"city":"Georgetown","state":"CA","zip":62154}
{"type":"auto","make":"Ford","model":"Taurus","class":"FrontWheelDrive","color":"blue","value":20743.8,"tax":566.3057861328125,"paid":true}
PHYLLIS HOGAN {"number":47358,"street":"Park Street","unit":6,"city":"Arlington","state":"AL","zip":77333}
{"type":"suv","make":"Chrysler","model":"Journey","class":"RearWheelDrive","color":"yellow","value":41486.79,"tax":1132.58935546875,"paid":false}
..........
Time taken: 81.171 seconds, Fetched: 9 row(s)
hive> SELECT firstname,lastname,address,vehicleinfo[1] FROM rmvTable WHERE vehicleinfo[1].make LIKE "%GM%";
hive> SELECT firstname,lastname,address,vehicleinfo[1] FROM rmvTablePasswd WHERE vehicleinfo[1].make LIKE "%GM%";
hive> SELECT firstname,lastname,address,vehicleinfo[1] FROM rmvTableWallet WHERE vehicleinfo[1].make LIKE "%GM%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:57:01,117 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:57:32,506 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 9.29 sec
2015-06-25 06:57:34,000 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.8 sec
2015-06-25 06:57:35,484 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.8 sec
..........
Job 0: Map: 6 Cumulative CPU: 18.8 sec HDFS Read: 4724 HDFS Write: 2087 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 800 msec
OK
NANCY STOUT {"number":31126,"street":"Cedar Street","unit":8,"city":"Arlington","state":"MO","zip":731}
{"type":"suv","make":"GM","model":"Equinox","class":"AllWheelDrive","color":"red","value":41486.43,"tax":1132.57958984375,"paid":true}
DONALD COOK {"number":34287,"street":"Third Avenue","unit":1,"city":"Cambridge","state":"NV","zip":67995}
{"type":"suv","make":"GM","model":"Tahoe","class":"AllWheelDrive","color":"black","value":41487.168,"tax":1132.5997314453125,"paid":false}
RANDY MCDOWELL {"number":18391,"street":"Lane Avenue","unit":8,"city":"Concord","state":"NH","zip":42540}
{"type":"auto","make":"GM","model":"Corvette","class":"FrontWheelDrive","color":"black","value":20744.035,"tax":566.3121337890625,"paid":false}
..........
Time taken: 71.794 seconds, Fetched: 17 row(s)
hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTable WHERE vehicleinfo[0].model LIKE "%Silverado%";
hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTablePasswd WHERE vehicleinfo[0].model LIKE "%Silverado%";
hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTableWallet WHERE vehicleinfo[0].model LIKE "%Silverado%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 07:05:40,301 Stage-1 map = 0%, reduce = 0%
2015-06-25 07:06:18,907 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 9.6 sec
2015-06-25 07:06:20,406 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.65 sec
2015-06-25 07:06:21,894 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.65 sec
..........
Job 0: Map: 6 Cumulative CPU: 27.65 sec HDFS Read: 4760 HDFS Write: 1185 SUCCESS
Total MapReduce CPU Time Spent: 27 seconds 650 msec
OK
PHYLLIS MARTINEZ {"number":29978,"street":"Park Street","unit":7,"city":"Burlington","state":"NM","zip":28564}
{"type":"truck","make":"GM","model":"Silverado2500","class":"4WheelDrive","color":"black","value":31115.295,"tax":849.4475708007812,"paid":false}
SHANNON KELLEY {"number":98370,"street":"Maple Street","unit":9,"city":"Boston","state":"TX","zip":16166}
{"type":"truck","make":"GM","model":"Silverado2500","class":"FrontWheelDrive","color":"yellow","value":31115.209,"tax":849.4451904296875,"paid":true}
FRANCES HOLLOWAY {"number":2650,"street":"Second Avenue","unit":6,"city":"Arlington","state":"KY","zip":7182}
{"type":"truck","make":"GM","model":"Silverado2500","class":"FrontWheelDrive","color":"yellow","value":31115.31,"tax":849.447998046875,"paid":true}
..........
Time taken: 79.539 seconds, Fetched: 9 row(s)
hive> SELECT firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid FROM rmvTable WHERE vehicleinfo[0].make LIKE "%Chrysler%";
hive> SELECT firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid FROM rmvTablePasswd WHERE vehicleinfo[0].make LIKE "%Chrysler%";
hive> SELECT firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid FROM rmvTableWallet WHERE vehicleinfo[0].make LIKE "%Chrysler%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 07:19:55,601 Stage-1 map = 0%, reduce = 0%
2015-06-25 07:20:26,973 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.15 sec
2015-06-25 07:20:28,454 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.15 sec
..........
Job 0: Map: 6 Cumulative CPU: 18.15 sec HDFS Read: 4724 HDFS Write: 2164 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 150 msec
OK
ANNE ROSARIO {"number":96581,"street":"Third Avenue","unit":7,"city":"Springfield","state":"RI","zip":40719}
Ram3500 31115.26 849.4465942382812 true
MEGAN PHELPS {"number":12713,"street":"MAC Avenue","unit":4,"city":"Salem","state":"MS","zip":76554}
Ram1500 31115.309 849.4479370117188 true
BRIAN ROWLAND {"number":37868,"street":"First Street","unit":3,"city":"Salem","state":"GA","zip":98106}
Imperial 20744.156 566.3154907226562 true
..........
Time taken: 72.163 seconds, Fetched: 23 row(s)
Example 3: Executing Big Data SQL Queries
Oracle Big Data SQL
(for the purposes of this document, referred to as Big Data SQL)
began as a system consisting of proprietary hardware and software; including an
Oracle Big Data Appliance, an Oracle Exadata Database Machine,
custom software extended to support Big Data SQL functionality, and of course,
Oracle NoSQL Database itself. In addition to providing Big Data SQL in the form of one of
these so-called "fully engineered hardware systems", Oracle now also provides
Big Data SQL functionality in the form of a software solution that
can be installed on other Hadoop systems; with no requirement for an
Oracle Exadata Database Machine.
Big Data SQL provides SQL access to a number of external data sources by presenting
those sources as Oracle External Tables
(external tables) of the Oracle Relational Database Management System
(RDBMS); mapping the external semantics of accessing data from those
sources — horizontal parallelism, location, and schema — to
Oracle RDBMS internals.
The Oracle Big Data Lite Virtual Machine
(referred to in this document as the Big Data Lite VM) is provided to
allow you to evaluate and experiment with Big Data SQL without requiring
you to purchase, install, and manage the real system. Although detailed documentation
is provided when you purchase a Big Data SQL system, as a convenience, this
section walks you through an example that employs the Big Data Lite VM to
demonstrate how to use Big Data SQL to query data stored in a KVStore
— both secure and non-secure. After working with this example on the
Big Data Lite VM, you should be able to extend the concepts that are
demonstrated, and apply that knowledge should you ever have access to a real
Big Data SQL system.
A Brief Big Data SQL Primer
As stated above, Big Data SQL allows SQL access to various external data
sources — specifically, a KVStore — by presenting such data sources as
Oracle External Tables.
To achieve this, a mechanism referred to as an access driver is employed
to access data as if it were a table in the Oracle relational database provided in
the system. Big Data SQL extends the access driver mechanism of external
tables by specifying new access driver types; one for each type of external data
source that will be accessed. Prior to the introduction of Big Data SQL,
the Oracle RDBMS external tables mechanism defined only two access driver types:
ORACLE_LOADER and ORACLE_DATAPUMP.
With the introduction of Big Data SQL, the following new access driver
types are defined: ORACLE_HDFS and ORACLE_HIVE.
Both the ORACLE_HDFS and ORACLE_HIVE access drivers
require the specification of a number of classes that satisfy the Hadoop MapReduce
programming model. Some of those classes are required by both access driver types,
whereas some are required by only the ORACLE_HIVE access driver. The
class types required by both ORACLE_HDFS and ORACLE_HIVE
are the same as those described in the
Hadoop/Table API Example;
whereas the class types required by ORACLE_HIVE but not by
ORACLE_HDFS are:

- org.apache.hadoop.hive.ql.metadata.HiveStorageHandler; for example, the Oracle NoSQL Database TableStorageHandler.
- org.apache.hadoop.hive.serde2.SerDe; for example, the Oracle NoSQL Database TableSerDe.
- org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; for example, instances of the Oracle NoSQL Database classes described previously in this document.

Note that the ORACLE_HDFS access driver can only read data stored in
HDFS files, whereas the ORACLE_HIVE access driver can read data stored
not only in HDFS files, but in other locations as well; for example, a KVStore.
As you will see below, the integration of Oracle NoSQL Database with Hive,
as described above,
plays a prominent role in the integration of Oracle NoSQL Database with
Big Data SQL.
Mapping the Oracle RDBMS Data Model to the Oracle NoSQL Database Table API Data Model
As the examples in this section demonstrate, in order to execute a Big Data SQL query
against data stored in an Oracle NoSQL Database table, a Hive external table
must first be created with a schema mapped from the schema of the desired Oracle NoSQL Database
table, and then a corresponding Oracle RDBMS external table must be created with a schema
mapped from the schema of the Hive table. This is accomplished by applying the mappings
shown in the following table:
Oracle NoSQL Database Table API | Hive | Oracle RDBMS |
---|---|---|
FieldDef.Type.STRING | STRING | VARCHAR2(N) |
 | VARCHAR | |
 | CHAR | |
FieldDef.Type.BOOLEAN | BOOLEAN | VARCHAR2(5) |
FieldDef.Type.BINARY | BINARY | VARCHAR2(N) |
FieldDef.Type.FIXED_BINARY | BINARY | VARCHAR2(N) |
 | TINYINT | |
 | SMALLINT | |
FieldDef.Type.INTEGER | INT | NUMBER |
FieldDef.Type.LONG | BIGINT | NUMBER |
FieldDef.Type.FLOAT | FLOAT | NUMBER |
 | DECIMAL | |
FieldDef.Type.DOUBLE | DOUBLE | NUMBER |
FieldDef.Type.ENUM | STRING | VARCHAR2(N) |
 | TIMESTAMP | |
 | DATE | |
FieldDef.Type.ARRAY | ARRAY | VARCHAR2(N) |
FieldDef.Type.MAP | MAP<STRING, data_type> | VARCHAR2(N) |
FieldDef.Type.RECORD | STRUCT<col_name : data_type, ...> | VARCHAR2(N) |
 | UNIONTYPE<data_type, data_type, ...> | |

(Hive types shown without a Table API or Oracle RDBMS counterpart have no end-to-end mapping in this integration.)
It is important to understand that when using Big Data SQL to query data in an Oracle NoSQL Database table, the schema of the Oracle external table you create is dependent on the schema of the corresponding Hive external table; which, in turn, is dependent on the schema of the Oracle NoSQL Database table you wish to query. Thus, if either type of external table is created using a schema that includes a data type that does not belong to one of the end-to-end mappings presented above, then an error will occur when any attempt is made to query the table.
Note that for fields in the Oracle external table specified as VARCHAR2(N),
the value of N is the maximum number of characters of the variable length
STRING that represents the specified field in the corresponding Hive and
Oracle NoSQL Database tables. Therefore, you should use the type, structure, and expected
length or size of the corresponding Hive and Oracle NoSQL Database fields to determine the
appropriate value to specify for N when creating the Oracle external table.
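One rough, practical way to choose N is to measure the longest serialized value over a representative sample of rows. The sketch below assumes you have exported one serialized field value per line to a file; sample-values.txt is a hypothetical name for that export:

```shell
# Print the length of the longest line in a sample file, where each line
# holds one serialized field value. Add headroom before using the result
# as N, since the sample may not contain the longest possible value.
max_len() {
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Example invocation (sample-values.txt is a hypothetical export file):
[ -f sample-values.txt ] && max_len sample-values.txt || true
```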
Installing the Big Data Lite VM (the BDL)
Before proceeding with this example you should obtain, install, and deploy
the Big Data Lite VM as described here.
Note that the Big Data Lite VM must be deployed
to a virtualization product such as Oracle VirtualBox.
After installing the necessary environment and deploying the Big Data Lite VM,
you should verify that the necessary services are running in that VM and start those
that are not running. You can do this by executing the services utility
from the command line, which displays an interactive user interface with information
like the following:
> services
[*] | ORCL | Oracle Database 12c (on) |
[*] | Zookeeper | zookeeper (on) |
[*] | HDFS | namenode (on) datanode (on) ... |
[ ] | HBase | master (off) regionserver (off) thrift (off) |
[*] | Hive | metastore (on) hive-server2 (on) |
[ ] | Hue | Hue (off) |
[ ] | Impala | impala-server (off) |
[ ] | NoSQL | Oracle NoSQL Database (off) |
[ ] | Oozie | oozie (off) |
[ ] | ORDS-Apex | Oracle Rest Data Service-Apex (off) |
[ ] | Solr | solr-server (off) |
[ ] | Sqoop2 | sqoop2-server (off) |
[ ] | WebLogic-MovieDemo | WebLogic-MovieDemo (off) |
[*] | YARN | resourcemanager (on) nodemanager (on) ... |
The services selected (marked with asterisks) are the services that are necessary
when walking through the examples presented in the following sections. In an effort
to improve boot performance, the latest version of the Big Data Lite VM will typically
come out of the box running only the Zookeeper, HDFS,
Hive, and YARN services. As a result, to run the
examples presented here, you will need to select the ORCL service
for auto-start. Additionally, because those examples are designed to work with an
externally deployed KVStore, there is no need to select the Oracle NoSQL Database
service for auto-start.
The examples presented here assume that you have satisfied the
prerequisites listed
previously in this document; which means that a KVStore
— non-secure, secure, or both —
has been deployed to separate, external nodes unrelated to the Big Data Lite VM,
where the nodes are network reachable from that VM. Although it is possible to select the
Oracle NoSQL Database service and deploy a KVStore
(with KVHOME=/u01/nosql/kv-ee) on the Big Data Lite VM, it is
important to understand that typical deployments of the
Oracle Big Data Appliance
do not run the Hadoop services (HDFS, YARN, and Hive)
on the same nodes used to run the Oracle NoSQL Database service. As a result,
the examples presented here are run in a manner intended to be consistent with the more
typical deployment scenarios.
After satisfying the necessary prerequisites,
creating and populating a vehicleTable and rmvTable in a KVStore,
and installing and deploying the Big Data Lite VM and verifying
that the necessary services are running on that VM, you are ready to walk through the
Big Data SQL examples presented in the following sections.
Executing a Big Data SQL Query Against a KVStore Table
Once the KVStore — non-secure or secure —
is deployed with the expected tables, you can follow the steps presented in the next section to
configure the Big Data Lite VM environment for executing either Hive or
Big Data SQL queries against the data in those tables. But before proceeding
to that section, you should first become familiar with the directory structure and base
configuration of the Big Data SQL system; specifically, consider the following
directories and files from the 4.5.0 release of the Big Data Lite VM:
/etc/hive/conf.bigdatalite
    hive-env.sh
    hive-site.xml
/opt/hive-aux-jars
    apache-xmlbeans.jar -> /u01/bigdatasql_config/hive_aux_jars/apache-xmlbeans.jar
    hive-hcatalog-core.jar -> /u01/bigdatasql_config/hive_aux_jars/hive-hcatalog-core.jar
    kvclient.jar -> /u01/nosql-ee/lib/kvclient.jar
    threetenbp.jar -> /u01/nosql-ee/lib/threetenbp.jar
    ojdbc7.jar -> /u01/orahivedp/jlib/ojdbc7.jar
    ....
    xqjapi.jar -> /u01/bigdatasql_config/hive_aux_jars/xqjapi.jar
/u01
    /bigdatasql_config
        bigdata.properties
        bigdata-log4j.properties
        /bigdatalite
            core-site.xml
            hdfs-site.xml
            hive-env.sh
            hive-site.xml
            mapred-site.xml
        /hive_aux_jars
            apache-xmlbeans.jar -> /u01/connectors/oxh/hive/lib/apache-xmlbeans.jar
            hive-hcatalog-core.jar -> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
            orai18n-collation.jar -> /u01/connectors/oxh/hive/lib/orai18n-collation.jar
            ....
            xqjapi.jar -> /u01/connectors/oxh/hive/lib/xqjapi.jar
        /log
            ....
    /kv-ee
        /doc
        /lib
        LICENSE.txt
        README.txt
    /nosql
        kv-ee -> /u01/kv-ee
        /scripts
        ....
Note that as of version 4.3.0 of Oracle NoSQL Database, the
threetenbp.jar third party library must belong to the set of
Hive auxiliary JAR files (referenced in the directory /opt/hive-aux-jars).
As a result, if your version of the Big Data SQL system does not include the link to
/u01/nosql-ee/lib/threetenbp.jar in /opt/hive-aux-jars shown
above, then you should create such a link; for example,
> cd /opt/hive-aux-jars
> sudo ln -s /u01/nosql-ee/lib/threetenbp.jar threetenbp.jar
You should also note that, as described in the
Big Data Lite VM documentation, the username oracle with password
welcome1 must be used when logging into the Big Data Lite VM.
Configuring the Big Data Lite VM for Hive and Big Data SQL Queries
If you wish to run queries — Hive queries or Big Data SQL queries — against a
secure
KVStore, then depending on the particular version of the Big Data Lite VM (or the
Big Data SQL system) you are using, you may need to add one or more security related
JAR files to both the classpath used by Hive and the classpath used by Big Data SQL.
Note that when using future versions of Big Data SQL, the installation of the Oracle Wallet JAR files described above should not be necessary; as it is expected that those JAR files will be pre-installed, and simply work out-of-the-box.
Configuring the Big Data Lite VM for Queries Against a Secure Store
Although the installation of the Oracle Wallet JAR files described in the previous
sections is a necessary condition for using an Oracle Wallet when running Hive and/or
Big Data SQL queries against a secure KVStore, it is not a sufficient condition.
In addition to those Oracle-provided JAR files, there are a number of other security
artifacts that also must be installed and added to the Hive and Big Data SQL classpaths.
Unlike the Oracle Wallet JAR files, which are installed only once and will (in future releases)
always be available, these additional artifacts are typically generated and installed
on a case-by-case basis, depending on the configuration of the given KVStore.
Appendix C
describes the additional security artifacts that must be generated and installed
to support executing queries (both Hive and Big Data SQL) against table data stored
in a secure KVStore; including the
login, trust, and password artifacts,
as well as the server side JAR file
that contains the necessary public credentials. Each of those artifacts should be placed
in a user-specific directory; for example, place them in a directory like the following:
/u01/nosql/example-user/security
    client.trust
    hive-nosql.login
    hive-nosql-server.jar
    example-user.passwd
    /example-user-wallet.dir
        cwallet.sso
In addition to installing the above artifacts under /u01/nosql/example-user/security,
you must also add the server side JAR file
to both the classpath employed by Hive and the classpath employed by Big Data SQL. This is
necessary so that the table data associated with the username and password specified
in those credentials can be successfully queried.
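The staging itself is ordinary file placement; a minimal sketch follows, assuming the artifacts produced by the Appendix C procedure are sitting in your home directory (the source paths are hypothetical):

```shell
# Stage the security artifacts under the user-specific directory described
# above. Creates the wallet subdirectory as well; callers pass the target
# directory followed by the artifact files to copy.
stage_artifacts() {
  sec="$1"; shift
  mkdir -p "$sec/example-user-wallet.dir"
  for f in "$@"; do
    cp "$f" "$sec/"
  done
}

# Example invocation (source paths hypothetical; requires write access to /u01):
# stage_artifacts /u01/nosql/example-user/security \
#     ~/client.trust ~/hive-nosql.login ~/hive-nosql-server.jar ~/example-user.passwd
```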
— Configuring the BDL's Hive Classpath for Queries Against a Secure Store —
If you are using Hive (rather than Big Data SQL) to query a secure KVStore, then the
classpath employed by Hive must include the
server side JAR file
.
This is accomplished by adding that JAR file to the /opt/hive-aux-jars
directory; that is,
> cd /opt/hive-aux-jars
> sudo ln -s /u01/nosql/example-user/security/hive-nosql-server.jar hive-nosql-server.jar
Thus, like the Oracle Wallet JAR files, as of version 4.5.0 of the
Big Data Lite VM, it is no longer necessary to add the
server side JAR file
to the HIVE_AUX_JARS_PATH in /etc/hive/conf.bigdatalite/hive-env.sh.
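A broken symlink in /opt/hive-aux-jars behaves like a missing JAR, so a quick check that the link resolves can save debugging time later. A small sketch (the path below is the example link created above):

```shell
# Report whether a symlink (or file) resolves to something that exists.
# A dangling link fails the -e test and is reported as broken.
check_link() {
  if [ -e "$1" ]; then
    echo "ok: $1 -> $(readlink -f "$1")"
  else
    echo "broken or missing: $1"
  fi
}

check_link /opt/hive-aux-jars/hive-nosql-server.jar
```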
— Configuring the BDL's Big Data SQL Classpath for Queries Against a Secure Store —
If you are using Big Data SQL rather than Hive to query a secure KVStore, then the
server side JAR file
must be added to the Big Data SQL classpath. This is accomplished by editing the file
/u01/bigdatasql_config/bigdata.properties and adding that JAR file to
the property named java.classpath.oracle; that is,
> edit /u01/bigdatasql_config/bigdata.properties
java.classpath.oracle=/u01/app/oracle/product/12.1.0.2/dbhome_1/jlib/oracle-hadoop-sql.jar:\
/u01/app/oracle/product/12.1.0.2/dbhome_1/jlib/oraloader.jar:\
....
/u01/bigdatasql_config/hive_aux_jars/kvclient.jar:\
/u01/bigdatasql_config/hive_aux_jars/threetenbp.jar:\
/u01/bigdatasql_config/hive_aux_jars/oraclepki.jar:\
/u01/bigdatasql_config/hive_aux_jars/osdt_cert.jar:\
/u01/bigdatasql_config/hive_aux_jars/osdt_core.jar:\
/u01/nosql/example-user/security/hive-nosql-server.jar
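After editing bigdata.properties, it can be worth confirming that every entry on java.classpath.oracle actually exists on disk, since a missing JAR typically surfaces later as an opaque query failure. A sketch, assuming the property uses backslash-continued lines exactly as in the excerpt above:

```shell
# List every entry on java.classpath.oracle in a properties file, following
# backslash line continuations, then flag entries that do not exist on disk.
list_oracle_classpath() {
  awk '
    /^java\.classpath\.oracle=/ { cont = 1 }
    cont {
      line = $0
      if (sub(/\\$/, "", line) == 0) cont = 0   # last line has no backslash
      buf = buf line
    }
    END {
      sub(/^java\.classpath\.oracle=/, "", buf)
      n = split(buf, parts, ":")
      for (i = 1; i <= n; i++) if (parts[i] != "") print parts[i]
    }
  ' "$1"
}

list_oracle_classpath /u01/bigdatasql_config/bigdata.properties |
  while read -r jar; do
    [ -f "$jar" ] || echo "missing: $jar"
  done
```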
At this point, after configuring the Big Data Lite VM in the manner
described above, you can follow the steps presented in the next sections to execute
Hive and/or Big Data SQL queries against the table data in the desired KVStore; either
non-secure or secure.
Using Hive and Big Data SQL to Query Data in a KVStore
In order to query data in a KVStore from a Big Data SQL system (for example,
a Big Data Lite VM), you must first create and map a Hive external table to
the table defined in the store; whether the table is located in a
non-secure store or a secure store.
That is, given the configuration described in the previous section, log in to the
Big Data Lite VM and, if the Oracle NoSQL Database vehicleTable
and rmvTable are located in a non-secure store,
then enter the Hive CLI and type the following commands:
> hive
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
      vehicleTable (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                     "oracle.kv.hosts" = "kv-host-1:5000",
                     "oracle.kv.tableName" = "vehicleTable",
                     "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain");
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
      rmvTable (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, SSN BIGINT, GENDER STRING, LICENSE BINARY,
      PHONEINFO MAP<STRING, STRING>,
      ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
      VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                     "oracle.kv.hosts" = "kv-host-1:5000",
                     "oracle.kv.tableName" = "rmvTable",
                     "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain");
On the other hand, if the store in which vehicleTable and rmvTable are located is a
secure store,
then enter the Hive CLI and type commands like these:
> hive

hive> CREATE EXTERNAL TABLE IF NOT EXISTS vehicleTablePasswd
          (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING,
           COLOR STRING, PRICE DOUBLE, COUNT INT)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                     "oracle.kv.hosts" = "kv-host-1:5000",
                     "oracle.kv.tableName" = "vehicleTable",
                     "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                     "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
                     "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
                     "oracle.kv.auth.username" = "example-user",
                     "oracle.kv.auth.pwdfile.file" = "/u01/nosql/example-user/security/example-user.passwd");

hive> CREATE EXTERNAL TABLE IF NOT EXISTS vehicleTableWallet
          (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING,
           COLOR STRING, PRICE DOUBLE, COUNT INT)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                     "oracle.kv.hosts" = "kv-host-1:5000",
                     "oracle.kv.tableName" = "vehicleTable",
                     "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                     "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
                     "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
                     "oracle.kv.auth.username" = "example-user",
                     "oracle.kv.auth.wallet.dir" = "/u01/nosql/example-user/security/example-user-wallet.dir");

hive> CREATE EXTERNAL TABLE IF NOT EXISTS rmvTablePasswd
          (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, SSN BIGINT,
           GENDER STRING, LICENSE BINARY, PHONEINFO MAP<STRING, STRING>,
           ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT,
                          CITY:STRING, STATE:STRING, ZIP:INT>,
           VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING,
                                    CLASS:STRING, COLOR:STRING, VALUE:FLOAT,
                                    TAX:DOUBLE, PAID:BOOLEAN>>)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                     "oracle.kv.hosts" = "kv-host-1:5000",
                     "oracle.kv.tableName" = "rmvTable",
                     "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                     "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
                     "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
                     "oracle.kv.auth.username" = "example-user",
                     "oracle.kv.auth.pwdfile.file" = "/u01/nosql/example-user/security/example-user.passwd");

hive> CREATE EXTERNAL TABLE IF NOT EXISTS rmvTableWallet
          (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, SSN BIGINT,
           GENDER STRING, LICENSE BINARY, PHONEINFO MAP<STRING, STRING>,
           ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT,
                          CITY:STRING, STATE:STRING, ZIP:INT>,
           VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING,
                                    CLASS:STRING, COLOR:STRING, VALUE:FLOAT,
                                    TAX:DOUBLE, PAID:BOOLEAN>>)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                     "oracle.kv.hosts" = "kv-host-1:5000",
                     "oracle.kv.tableName" = "rmvTable",
                     "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                     "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
                     "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
                     "oracle.kv.auth.username" = "example-user",
                     "oracle.kv.auth.wallet.dir" = "/u01/nosql/example-user/security/example-user-wallet.dir");

At this point, you can execute Hive queries against the data in both the
vehicleTable
and the rmvTable
.
Note that although the tables in the KVStore are named vehicleTable and rmvTable,
the Hive tables that are created above are named, respectively, vehicleTable and rmvTable
if the KVStore is not configured for security, vehicleTablePasswd and rmvTablePasswd if the
KVStore is secure and you store the password in a password file, and finally, vehicleTableWallet
and rmvTableWallet if the KVStore is secure and you store the password in an Oracle Wallet.
Also note that the value of the oracle.kv.hadoop.hosts
property is
bigdatalite.localdomain
. This is the hostname of the
Big Data Lite VM; thus, it is the name of the host on which the Hadoop
DataNode(s) execute.
After you create and map the external Hive tables to their corresponding KVStore
tables, you can use Big Data SQL to query the data in those tables. To do so,
you must apply the data model mapping
presented previously
,
along with the schemas for the
vehicleTable
and rmvTable
, to
create and map corresponding Oracle RDBMS external tables to each of the Hive tables you created.
Before doing this, however, you must first create a user in the RDBMS with the appropriate
privileges, as explained in the next section.
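The exact Oracle external table definitions are created in the RDBMS via the ORACLE_HIVE access driver. As a rough, hypothetical sketch of what such a mapping might look like for vehicleTable (the column types and the DEFAULT_DIR directory object here are illustrative assumptions; consult the Big Data SQL documentation for the exact form):

```sql
-- Hypothetical sketch: map an Oracle RDBMS external table to the
-- Hive vehicleTable via the ORACLE_HIVE access driver.
CREATE TABLE vehicleTable (
  type  VARCHAR2(10),
  make  VARCHAR2(12),
  model VARCHAR2(20),
  class VARCHAR2(40),
  color VARCHAR2(20),
  price NUMBER(8,2),
  count NUMBER)
  ORGANIZATION EXTERNAL
    (TYPE ORACLE_HIVE
     DEFAULT DIRECTORY DEFAULT_DIR
     ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.vehicletable))
  REJECT LIMIT UNLIMITED;
```

Analogous definitions would be created for the password-file and wallet variants of each Hive table.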
After creating the Oracle external tables above, you can execute SQL queries against the corresponding KVStore table data. For example, to query the data contained in the KVStore's vehicleTable, you can execute queries such as:
SQL> set linesize 200;
SQL> SELECT * FROM vehicleTable;
SQL> SELECT count(*) FROM vehicleTable;
SQL> SELECT min(price) FROM vehicleTable;
SQL> SELECT * FROM vehicleTable WHERE make='GM';
SQL> SELECT * FROM vehicleTable WHERE model='Equinox';
SQL> SELECT * FROM vehicleTable WHERE model='Camaro';
SQL> SELECT * FROM vehicleTable WHERE model='Silverado1500';
SQL> SELECT * FROM vehicleTablePasswd;
SQL> SELECT count(*) FROM vehicleTablePasswd;
SQL> SELECT min(price) FROM vehicleTablePasswd;
SQL> SELECT * FROM vehicleTablePasswd WHERE make='GM';
SQL> SELECT * FROM vehicleTablePasswd WHERE model='Equinox';
SQL> SELECT * FROM vehicleTablePasswd WHERE model='Camaro';
SQL> SELECT * FROM vehicleTablePasswd WHERE model='Silverado1500';
SQL> SELECT * FROM vehicleTableWallet;
SQL> SELECT min(price) FROM vehicleTableWallet;
SQL> SELECT * FROM vehicleTableWallet WHERE make='GM';
SQL> SELECT * FROM vehicleTableWallet WHERE model='Equinox';
SQL> SELECT * FROM vehicleTableWallet WHERE model='Camaro';
SQL> SELECT * FROM vehicleTableWallet WHERE model='Silverado1500';

Similarly, to query the data contained in the KVStore's rmvTable, you can execute simple queries such as these:
SQL> set linesize 200;
SQL> SELECT * FROM rmvTable;
SQL> SELECT * FROM rmvTablePasswd;
SQL> SELECT * FROM rmvTableWallet;
SQL> SELECT lastname,firstname,gender,address FROM rmvTable;
SQL> SELECT lastname,firstname,gender,address FROM rmvTablePasswd;
SQL> SELECT lastname,firstname,gender,address FROM rmvTableWallet;
SQL> SELECT min(ssn) FROM rmvTable;
SQL> SELECT min(ssn) FROM rmvTablePasswd;
SQL> SELECT min(ssn) FROM rmvTableWallet;
SQL> SELECT count(*) FROM rmvTable;
SQL> SELECT count(*) FROM rmvTablePasswd;
SQL> SELECT count(*) FROM rmvTableWallet;
SQL> SELECT firstname,lastname,phoneinfo FROM rmvTable;
SQL> SELECT firstname,lastname,phoneinfo FROM rmvTablePasswd;
SQL> SELECT firstname,lastname,phoneinfo FROM rmvTableWallet;
SQL> SELECT vehicleinfo FROM rmvTable;
SQL> SELECT vehicleinfo FROM rmvTablePasswd;
SQL> SELECT vehicleinfo FROM rmvTableWallet;

To achieve the more complicated query functionality demonstrated by the corresponding Hive queries that dereferenced individual field values of the non-primitive data types, you can employ either
Oracle Regular Expression Functions
such as REGEXP_LIKE
and REGEXP_SUBSTR
, or
Oracle SQL JSON Operators
such as JSON_QUERY
and JSON_EXISTS
(or a combination).
For example, consider the following queries that use
Oracle Regular Expressions
:
SQL> SELECT firstname,lastname,address,vehicleinfo FROM rmvTable WHERE REGEXP_LIKE (lastname, '^[H].*');
SQL> SELECT firstname,lastname,address,vehicleinfo FROM rmvTablePasswd WHERE REGEXP_LIKE (lastname, '^[H].*');
SQL> SELECT firstname,lastname,address,vehicleinfo FROM rmvTableWallet WHERE REGEXP_LIKE (lastname, '^[H].*');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM" FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM" FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM" FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox" FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox" FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox" FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro" FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro" FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro" FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado" FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado" FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
SQL> SELECT firstname,lastname,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado" FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid" FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid" FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid" FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');

Finally, consider the following example queries that employ
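To make the bracket-expression pattern above more concrete, here is a small, self-contained Java sketch of what REGEXP_SUBSTR extracts: the first {...} record in the serialized vehicleinfo array. The sample data is hypothetical, and Java's \p{Alnum} stands in for the POSIX [[:alnum:]] class used by Oracle:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstVehicleRecord {
    public static void main(String[] args) {
        // Hypothetical serialized form of the vehicleinfo ARRAY column.
        String vehicleinfo =
            "[{\"type\":\"auto\",\"make\":\"GM\",\"model\":\"Equinox\",\"paid\":true},"
          + "{\"type\":\"suv\",\"make\":\"Ford\",\"model\":\"Explorer\",\"paid\":false}]";
        // Java analogue of the Oracle pattern '\{([[:alnum:]":,\.]+){1,3}\}':
        // a '{', then runs of alphanumerics, quotes, colons, commas, and dots,
        // then the closing '}' -- i.e., the first record in the array.
        Pattern p = Pattern.compile("\\{([\\p{Alnum}\":,.]+){1,3}\\}");
        Matcher m = p.matcher(vehicleinfo);
        if (m.find()) {
            System.out.println(m.group());
        }
    }
}
```

Running this prints the first vehicle record, which is the substring the queries above alias as "Primary make" or "Primary model".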
Oracle SQL JSON Operators
:
SQL> SELECT FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model FROM rmvTable j;
SQL> SELECT FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model FROM rmvTablePasswd j;
SQL> SELECT FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model FROM rmvTableWallet j;
SQL> SELECT JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) FROM rmvTable;
SQL> SELECT JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) FROM rmvTablePasswd;
SQL> SELECT JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) FROM rmvTableWallet;
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM" FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM" FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM" FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox" FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox" FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox" FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro" FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro" FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro" FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado" FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado" FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado" FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model", JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value", JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed", JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid" FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model", JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value", JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed", JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid" FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model", JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value", JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed", JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid" FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street" FROM rmvTable WHERE JSON_EXISTS(ADDRESS, '$.street');
SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street" FROM rmvTablePasswd WHERE JSON_EXISTS(ADDRESS, '$.street');
SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street" FROM rmvTableWallet WHERE JSON_EXISTS(ADDRESS, '$.street');
SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street" FROM rmvTable WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street" FROM rmvTablePasswd WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street" FROM rmvTableWallet WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
Appendix A
of the Hadoop/Table API Example, or start a KVStore configured for security
using the steps presented in
Appendix B
of that document. Once a KVStore has been deployed, the standalone Java program
LoadRmvTable
can then be run against either type of store to create
a table with the name and schema required by the various example queries, and
then populate the table with rows of data consistent with that schema. Once
the table is created and populated with example data, queries such as those
presented in Example 1
,
Example 2
, or the
section on Big Data SQL
can be executed against the data in that table.
Schema for the Example 'rmvTable'
To execute any of the example queries presented in
Example 2
of this document, a table named rmvTable — having the schema
shown in the table below — must be created in the KVStore that was
deployed for that example; where the data types specified in the schema are
defined by the Oracle NoSQL Database Table API (see
oracle.kv.table.FieldDef.Type
).
Field Name | Field Type
---|---
zipcode | STRING
lastname | STRING
firstname | STRING
ssn | LONG
gender | ENUM
license | FIXED_BINARY(9)
phoneinfo | MAP(STRING)
address | RECORD(number : INTEGER, street : STRING, unit : INTEGER, city : STRING, state : STRING, zip : INTEGER)
vehicleinfo | ARRAY(RECORD(type : STRING, make : STRING, model : STRING, class : STRING, color : STRING, value : FLOAT, tax : DOUBLE, paid : BOOLEAN))
Primary Key Field Names | zipcode, lastname, firstname, ssn
Shard Key Field Names | zipcode
Thus, rmvTable will consist of rows of data that the Registry of Motor Vehicles might maintain about vehicle owners who have registered a primary vehicle and (optionally) a second and maybe a third vehicle. In addition to personal information about each owner — such as name, address, gender, and phone number(s) — each row also contains an array in which each element is a record holding information about one of the vehicles the owner has registered. For example, in addition to vehicle attributes such as the make, model, and color, each record also contains the vehicle's assessed value, registration fee (the tax), and whether or not the owner has paid the fee. Although the table schema presented above may seem a bit contrived, it is intended to demonstrate a broad spectrum of data types from the Oracle NoSQL Database Table API.
Although you can enter individual commands in the admin CLI to create a table with the above
schema, an alternative is to follow the instructions presented in the next sections
to compile and execute the LoadRmvTable
program, which uses the
Oracle NoSQL Database Data Definition Language (DDL)
to create the desired table and then
populates it with sample data.
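For reference, a DDL statement matching the schema table above might look like the following sketch. This is derived directly from that schema; the ENUM values for gender are assumptions for illustration, since the document does not list them:

```sql
CREATE TABLE IF NOT EXISTS rmvTable (
  zipcode STRING,
  lastname STRING,
  firstname STRING,
  ssn LONG,
  gender ENUM (male, female),   -- enum values assumed for illustration
  license BINARY(9),
  phoneinfo MAP (STRING),
  address RECORD (number INTEGER, street STRING, unit INTEGER,
                  city STRING, state STRING, zip INTEGER),
  vehicleinfo ARRAY (RECORD (type STRING, make STRING, model STRING,
                             class STRING, color STRING, value FLOAT,
                             tax DOUBLE, paid BOOLEAN)),
  PRIMARY KEY (SHARD(zipcode), lastname, firstname, ssn)
)
```

Note how the SHARD(zipcode) clause corresponds to the Shard Key Field Names row of the schema table.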
— Compiling the LoadRmvTable Program —
After the KVStore
— either
non-secure
or
secure
—
has been deployed, the LoadRmvTable
program must be compiled before it can be executed.
Assuming you installed the separate example distribution under
the directory /opt/apps
, type the following command at the
OS command line:
> cd /opt/apps/kv
> javac -classpath /opt/ondb/kv/lib/kvstore.jar:examples examples/hadoop/hive/table/LoadRmvTable.java
which should produce the file:
/opt/apps/kv/examples/hadoop/hive/table/LoadRmvTable.class
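To then run the program against a non-secure store, an invocation like the secure one shown later in this section can presumably be used, minus the -security parameter; for example:

```shell
> cd /opt/apps/kv
> java -classpath /opt/ondb/kv/lib/kvstore.jar:examples hadoop.table.LoadRmvTable \
      -store example-store -host kv-host-1 -port 5000 -nops 79 \
      [-delete]
```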
In the example command line above, the argument -nops 79
specifies
that 79 rows be written to the rmvTable. If more or fewer rows
are desired, change the value of the -nops
parameter accordingly.
If LoadRmvTable
is executed a second time and the
optional -delete
parameter is specified, then all rows added by
any previous executions of LoadRmvTable
are deleted from the
table prior to adding the new rows. Otherwise, all pre-existing rows are left
in place, and the number of rows in the table will be increased by the specified
-nops
number of new rows.
— Creating and Populating 'rmvTable' with Example Data in a Secure KVStore —
To execute LoadRmvTable
against a secure KVStore deployed
and provisioned with a non-administrative user employing the steps presented in
Appendix B
of the Hadoop/Table API Example, an additional parameter must be added to
the command line above. That is, type the following:
> cd /opt/apps/kv
> javac -classpath /opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/lib/threetenbp.jar:examples examples/hadoop/hive/table/LoadRmvTable.java
> cp /opt/ondb/example-store/security/client.trust /tmp
> java -classpath /opt/ondb/kv/lib/kvstore.jar:examples hadoop.table.LoadRmvTable \
-store example-store -host kv-host-1 -port 5000 -nops 79 \
-security /tmp/example-user-client-pwdfile.login
[-delete]
where the role of the single additional -security
parameter is identical
to the role of that parameter when executing the
LoadVehicleTable
program; as explained in
Appendix B
of the Hadoop/Table API Example.
At this point, the rmvTable created in the specified KVStore (non-secure or secure)
should be populated with the desired example data, which can then be queried via Hive HQL or
Big Data SQL.
Appendix B: Setting the HIVE_AUX_JARS_PATH Environment Variable
In order to make the
Table API Hive Integration classes
available to the Hive client's Java VM, as well as the Java VMs of the
DataNodes of the Hadoop cluster, kvclient.jar
and threetenbp.jar
must be added
to the classpaths of those respective VMs. To do this, add both JAR files to the
Hive client's HIVE_AUX_JARS_PATH
environment variable. One way to achieve this is to export a new value for
HIVE_AUX_JARS_PATH
from the Hive client's command line; that is,
> export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar

The other, more permanent (and preferable) way to achieve this is to modify the
hive-env.sh
script located in the HIVE_CONF_DIR
of the Hive client. For example,
> cd $HIVE_CONF_DIR
> edit hive-env.sh

if [ -z "$HIVE_AUX_JARS_PATH" ]; then
  export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar
else
  export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar
fi

It is important to note that in both cases above, the separator that is used is a comma, not a colon.
Appendix C
of the Hadoop/Table API Example; which describes the necessary artifacts in
detail — their purpose, as well as how to generate and install them.
— Generating the Login, Trust, and Password Artifacts —
To execute a Hive query against a secure KVStore, the necessary public and private
credentials must be incorporated in the definition of the Hive table that will be
queried. To do this, in a fashion similar to that presented in
Appendix C
of the Hadoop/Table API Example, you must create artifacts like those shown
below, and store them on the Hive client's local file system; for example,
/tmp
client.trust
hive-nosql.login
example-user.passwd
Or, if wallet storage will be employed:
/tmp
client.trust
hive-nosql.login
/example-user-wallet.dir
cwallet.sso
where hive-nosql.login
is analogous (actually, identical)
to example-user-server.login
in
Appendix C
.
— Setting the HIVE_AUX_JARS_PATH Environment Variable —
Recall that
Appendix B
of this document described how to add kvclient.jar
and threetenbp.jar
to the Hive client's
HIVE_AUX_JARS_PATH
environment variable in order to run a Hive query
against a non-secure KVStore. In order to run a Hive query against a
secure KVStore, a number of other JAR files must also be added to
HIVE_AUX_JARS_PATH
, depending on how the password is stored.
If the password is stored in a password file, then in addition to
kvclient.jar
and threetenbp.jar
,
the hive-nosql-server.jar
file must also be added to HIVE_AUX_JARS_PATH
;
that is, do one of the following:
> export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/hive-nosql-server.jar
or
> cd $HIVE_CONF_DIR
> edit hive-env.sh
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/hive-nosql-server.jar
else
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/hive-nosql-server.jar
fi
On the other hand, if the password is stored in an Oracle Wallet, then you would
also add three wallet related JAR files that are only provided with the Enterprise
Edition of Oracle NoSQL Database. That is, you would do one of the following:
> export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,\
/opt/ondb/kv/lib/kvclient.jar,\
/opt/ondb/kv/lib/threetenbp.jar,\
/opt/ondb/kv/lib/oraclepki.jar,\
/opt/ondb/kv/lib/osdt_cert.jar,\
/opt/ondb/kv/lib/osdt_core.jar,\
/opt/apps/kv/examples/hive-nosql-server.jar
or
> cd $HIVE_CONF_DIR
> edit hive-env.sh
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
  export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,\
/opt/ondb/kv/lib/threetenbp.jar,\
/opt/ondb/kv/lib/oraclepki.jar,\
/opt/ondb/kv/lib/osdt_cert.jar,\
/opt/ondb/kv/lib/osdt_core.jar,\
/opt/apps/kv/examples/hive-nosql-server.jar
else
  export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,\
/opt/ondb/kv/lib/kvclient.jar,\
/opt/ondb/kv/lib/threetenbp.jar,\
/opt/ondb/kv/lib/oraclepki.jar,\
/opt/ondb/kv/lib/osdt_cert.jar,\
/opt/ondb/kv/lib/osdt_core.jar,\
/opt/apps/kv/examples/hive-nosql-server.jar
fi
Once Hive has been configured for Oracle NoSQL Database security in the manner
just presented, you can then incorporate the necessary artifacts in your Hive
external table definition in the fashion presented previously in this document;
for vehicleTable in
Example 1
,
and for rmvTable in
Example 2
.
Note that specifying the security related artifacts in the manner presented
in those examples is akin to specifying that information on the command line
(rather than via classpath resources) when initiating a MapReduce job.
Appendix D: Support for Predicate Pushdown
To improve query performance,
Apache Hive
and Oracle Big Data SQL
both support a form of what is referred to as predicate pushdown, in which the
client-side frontend processing of a given query decomposes the WHERE
clause (the predicate) of the query into column information and corresponding
comparison operators, passing (pushing) the resulting components to the database,
where search processing (filtering) can be performed on the database's server-side
backend. To achieve analogous performance improvements, the
Oracle NoSQL Database Table API Hive Integration classes
support similar predicate pushdown functionality when executing Hive or Big Data SQL queries
against data in an Oracle NoSQL Database table. For example, consider the following
query executed against the example vehicleTable
whose schema was described
previously
in this document:
SELECT * FROM vehicleTable
    WHERE type = 'auto' AND make = 'Chrysler'
      AND model >= 'Imperial' AND model < 'Sebring';

This query will return all rows corresponding to automobiles (rather than trucks or SUVs) made by Chrysler whose model is 'Imperial', 'Lebaron', or 'PTCruiser', but not 'Sebring'. If predicate pushdown is not employed when executing this query, then all rows from the vehicleTable will be retrieved from the KVStore's backend database and returned to the frontend client, where the predicate information will be applied to search the returned rows for the desired matches. On the other hand, if predicate pushdown is employed, then the information in the
WHERE
clause is sent to the KVStore
and all filtering is performed in the database itself, so that only the rows of the
vehicleTable that match the predicate are returned to the client. It should be
clear then that predicate pushdown, when it can be employed, can result in significant
performance improvements.
As Examples 1
,
2
, and
3
of this document demonstrate, the variety of predicates that can be employed
when querying a table can be virtually unlimited. So it is
important to understand that the predicates that can actually be pushed to the backend
KVStore database are restricted to a finite subset of all possible predicates. This is
because the predicates that can be supported by Oracle NoSQL Database are not only
dependent on what the Hive and Big Data SQL predicate pushdown mechanisms support, but
the semantics of the
Oracle NoSQL Database Table API
as well. As a result, the operators
that are supported by the predicate pushdown mechanism of the
Table API Hive Integration classes
are currently limited to:
=   <   <=   >   >=   AND   OR   IN

In addition to the above set of operators, the semantics of the
Table API
can also affect how the table's fields (columns) will be handled during predicate pushdown.
Specifically, for a given query's predicate,
if a valid PrimaryKey
, IndexKey
and/or FieldRange
(as defined by the
Table API
)
cannot be formed from all or a subset of that predicate's fields, and no part of the predicate can be pushed to the server using the
SQL for Oracle NoSQL Database
mechanism,
then the query's predicate will not be decomposed and sent to the database for
backend filtering. Instead, the system falls back to the default mechanism and performs
all filtering on the client side, applying the predicate to all the rows in the given table.
For example, consider the example query presented above. For that query, each component of
the predicate satisfies the necessary criteria for pushdown, and so the whole predicate
will be pushed to the database for search processing. To understand this, first observe that
the operators referenced in the query's predicate belong to the set described above; that is,
'=
', 'AND
', '>=
', '<
'.
Next, based on the schema of vehicleTable, the fields named type and
make form a valid PrimaryKey
for performing a table scan; and the
predicate components referencing the field named 'model' form a valid FieldRange
.
Compare this with a query such as,
SELECT * FROM vehicleTable
    WHERE make = 'Chrysler' AND model >= 'Imperial' AND model < 'Sebring';

Assuming there is no index of the form
(make,model)
, for this query, although
the absence of the key's first field prevents the construction of a valid PrimaryKey
as required by the semantics of the
Table API
,
the predicate can still be pushed to the backend store because it is considered valid for filtering by
SQL For Oracle NoSQL Database
. Finally, consider a query such as,
SELECT * FROM vehicleTable WHERE model LIKE "%Silverado%";

For this query, predicate pushdown will be bypassed and all filtering will be performed on the client side. This is because the predicate employs the
LIKE
operator, which is not eligible for predicate pushdown.
Note that the initial two example queries that were presented each result in the whole predicate being pushed and all filtering being performed on the backend; whereas the third example query results in no predicate pushdown and all filtering being performed on the client side. But this does not mean that predicate pushdown will always be handled in such an all-or-nothing manner. On the contrary, for many queries, only part of the predicate will be pushed to the database to produce initial filtered results, which are then further filtered on the client side using the remaining — residual — part of the predicate. For example, consider a query that wishes to find each '4WheelDrive' vehicle in the database that is either a 'Ford' or a 'GM', is 'blue', 'red', or 'yellow', and has a model name that begins with the letter 'E' (GM Equinox, Ford Expedition, or Ford Explorer). Such a query would look like the following:
SELECT * FROM vehicleTable WHERE class = '4WheelDrive' AND (make = 'Ford' OR make = 'GM') AND color IN ('blue', 'red', 'yellow') AND model LIKE 'E%';

Based on the criteria presented in the next section, the only component of the query's
predicate that cannot be pushed to the backend is the one that employs the LIKE
operator (model LIKE 'E%'); all the other components in the query can be pushed.
Thus, when executing the given query, the part of the predicate consisting of the components
class = '4WheelDrive' AND (make = 'Ford' OR make = 'GM') AND color IN ('blue', 'red', 'yellow')
will be pushed to the backend, producing rows referencing all blue, red, or yellow
four wheel drive Ford and GM vehicles;
after which the client will apply the residual predicate model LIKE 'E%'
to the results from the backend, to select and return only those rows with a model
name beginning with the letter 'E'.
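To make the split concrete, the following is a small, hypothetical Python sketch (the row layout and function name are invented for illustration and are not part of the Table API Hive integration) that mimics how a client might apply the residual predicate to rows already filtered by the pushed components on the backend:

```python
# Hypothetical sketch: rows as they might be returned by the backend after the
# pushable part of the predicate (the class/make/color tests) has been applied.
backend_rows = [
    {"make": "Ford", "model": "Expedition", "class": "4WheelDrive", "color": "blue"},
    {"make": "GM",   "model": "Equinox",    "class": "4WheelDrive", "color": "red"},
    {"make": "Ford", "model": "Edge",       "class": "4WheelDrive", "color": "yellow"},
    {"make": "GM",   "model": "Sierra",     "class": "4WheelDrive", "color": "blue"},
]

def residual_like_e(row):
    # Client-side residual predicate: model LIKE 'E%'
    return row["model"].startswith("E")

# Apply the residual predicate on the client side.
results = [r for r in backend_rows if residual_like_e(r)]
print([r["model"] for r in results])  # ['Expedition', 'Equinox', 'Edge']
```

The key point of the sketch is that only the rows surviving the backend filter travel to the client, which then discards the ones failing the residual LIKE test.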
— Predicate Pushdown Criteria —
When processing a given query that includes a predicate, the mechanism provided by the
Table API Hive Integration classes
will analyze the query's predicate and apply the following criteria to determine
whether all, part, or none of the predicate can be pushed to the database for
filtering on the backend.
First, the predicate is eligible for predicate pushdown when each operator it
references belongs to the set {=, <, <=, >, >=}, combined with zero or more
combinations of the AND conjunction, the OR conjunction, and/or one or more
IN lists. Second, a predicate from which a valid (as defined by the
Table API) PrimaryKey, IndexKey, or FieldRange can be formed is eligible
for predicate pushdown; using mechanisms that optimize for scale.
It is important to understand the criteria listed above in conjunction
with the data model and search patterns you expect to employ when you
define the PrimaryKey, along with any Index,
for a given KVStore table that will be queried. That is, although the
predicate pushdown mechanism will be employed automatically
(without user intervention or special configuration),
how you define your table and indexes can impact how well common queries
will perform and scale.
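As a rough illustration of the operator criterion, the hypothetical Python sketch below (the function and variable names are invented for this example; the real analysis is performed internally by the Table API Hive Integration classes) classifies a predicate's operators as pushable or residual:

```python
# Operators eligible for pushdown, per the criteria above; operators such as
# LIKE fall outside this set and must be handled as residual predicates.
PUSHABLE_OPS = {"=", "<", "<=", ">", ">=", "AND", "OR", "IN"}

def classify(ops):
    """Split a predicate's operators into pushable and residual lists."""
    pushable = [op for op in ops if op in PUSHABLE_OPS]
    residual = [op for op in ops if op not in PUSHABLE_OPS]
    return pushable, residual

# Operators drawn from the example query:
# class = ... AND (make = ... OR make = ...) AND color IN (...) AND model LIKE 'E%'
pushable, residual = classify(["=", "AND", "=", "OR", "=", "AND", "IN", "AND", "LIKE"])
print(residual)  # ['LIKE']
```

Under this classification, only the LIKE component would remain for client-side filtering, matching the behavior described for the fourth example query above.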
Note that predicate pushdown is employed automatically with no obvious
indication (other than improved performance) that it is "on and working".
As a result, if you wish to verify that the mechanism is indeed operating as described
above, you can set the Level of the Loggers
named oracle.kv.hadoop.hive.table.TableStorageHandlerBase
and oracle.kv.hadoop.hive.table.TableHiveInputFormat
to the DEBUG value (see Appendix E for more detail). After setting those loggers
to DEBUG, you can run a query and then observe how the predicate pushdown mechanism
processes the query's predicate by analyzing the contents of the logger
output.
Appendix E: Debug Logging
The Table API Hive Integration classes
employ a number of loggers that can be used for debugging. Two of the loggers are associated with the
Table API Hadoop Integration classes
and descend from the parent logger named oracle.kv.hadoop.table.
The remaining loggers are specified by the
Table API Hive Integration classes
and descend from the oracle.kv.hadoop.hive.table parent logger.

In order to enable debug logging, set the logger corresponding to the desired
component to DEBUG; or to TRACE for more verbose output.
To do this, add or change the appropriate items in the Hive logger configuration;
for example,
log4j.logger.oracle.kv.hadoop.table.TableInputFormatBase=DEBUG
log4j.logger.oracle.kv.hadoop.table.TableRecordReaderBase=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveRecordReader=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableSerDeBase=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableSerDe=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableFieldTypeEnum=INFO
log4j.logger.oracle.kv.hadoop.hive.table.V1V2TableUtil=INFO
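As a quick sanity check of a configuration like the one above, a small Python sketch (a hypothetical helper, not part of the product) can parse log4j-style logger lines and report which loggers are set to DEBUG:

```python
# Hypothetical helper: scan "log4j.logger.<name>=<LEVEL>" lines and collect
# the logger names configured at DEBUG. Purely illustrative.
config = """\
log4j.logger.oracle.kv.hadoop.table.TableInputFormatBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableSerDe=INFO
"""

def debug_loggers(text):
    result = []
    for line in text.splitlines():
        if not line.startswith("log4j.logger."):
            continue
        name, _, level = line[len("log4j.logger."):].partition("=")
        if level.strip() == "DEBUG":
            result.append(name)
    return result

print(debug_loggers(config))
```

Running this against your actual hive-log4j.properties contents would confirm that the two predicate-pushdown loggers named earlier are indeed at DEBUG before you run a test query.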
— Hive Debug Logging —
If you are using Hive (rather than Big Data SQL)
to query data stored in an Oracle NoSQL Database table and you wish to enable debug
logging for the query processing that is performed, then the loggers presented in the
previous section
must be specified in the Hive logging configuration; and that configuration
must be included in the Hive classpath. Since the Hive logging configuration is
generally loaded as a Java resource rather than specified as a system property
on the command line, there are a couple of ways to add/modify the desired loggers
and levels in the Hive logging configuration and then add that configuration
to the Hive classpath.
For example, consider version 4.5.0
of the Big Data Lite VM, which uses the 1.1.0-cdh5.7.0
version of the Cloudera distribution of Hive.
To enable debug logging for the predicate pushdown processing that is performed when executing
a Hive query against a KVStore, one way to add the new logger elements to the Hive
configuration is to do something like the following:
> cd /usr/lib/hive/lib
> sudo jar xvf hive-common-1.1.0-cdh5.7.0.jar hive-log4j.properties
> edit hive-log4j.properties
log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG
> sudo jar uvf hive-common-1.1.0-cdh5.7.0.jar hive-log4j.properties
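The extract/edit/update sequence above can also be scripted. The sketch below reproduces it with Python's zipfile module against a toy jar created on the fly; the file names here are stand-ins for the real hive-common jar, and the script is illustrative only:

```python
import zipfile

# Build a toy jar containing a minimal hive-log4j.properties
# (a stand-in for hive-common-1.1.0-cdh5.7.0.jar).
jar_path = "toy-hive-common.jar"
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("hive-log4j.properties", "log4j.rootLogger=INFO,console\n")

# Read the properties entry, append the DEBUG loggers, and rewrite the jar.
with zipfile.ZipFile(jar_path) as jar:
    props = jar.read("hive-log4j.properties").decode()
props += (
    "log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG\n"
    "log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG\n"
)
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("hive-log4j.properties", props)

# Verify the updated entry.
with zipfile.ZipFile(jar_path) as jar:
    updated = jar.read("hive-log4j.properties").decode()
print("TableHiveInputFormat=DEBUG" in updated)  # True
```

Note that a real jar contains many entries; unlike the `jar uvf` update shown above, rewriting a jar this way requires copying the other entries as well, so treat this only as a sketch of the read/append/verify steps.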
An alternative way to add new logger elements to the Hive configuration is to
create your own hive-log4j.properties file with the desired logging
configuration, add it to your own JAR file — for example, hive-conf.jar —
and then include that JAR file in the specification of the
HIVE_AUX_JARS_PATH
environment variable set in the hive-env.sh
script located in
the HIVE_CONF_DIR
of the Hive client.
— Big Data SQL Debug Logging —
If you are using Big Data SQL to query data stored in an Oracle NoSQL Database
table and you wish to enable debug logging for the query processing that is
performed by the Table API Hive Integration classes,
then the loggers employed by those classes must be specified in the file named bigdata-log4j.properties
that is provided with each Big Data SQL system. For example, in a manner similar
to what was done above for Hive debug logging,
if you wish to enable debug logging for predicate pushdown processing when a Big Data SQL
query is executed against data in a KVStore table, you would do the following
(on the Big Data Lite VM):
> cd /u01/bigdatasql_config
> edit bigdata.properties
bigdatasql.enabled=true
> edit bigdata-log4j.properties
log4j.logger.oracle.kv.hadoop.hive.table=ALL, file
log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG
Note that for versions of the Big Data Lite VM beyond 4.5.0,
it should no longer be necessary to enable the Big Data SQL predicate pushdown mechanism
by setting the bigdatasql.enabled
property to true,
as that property should be set by default in those later versions.
Copyright (c) 2011, 2017 Oracle and/or its affiliates. All rights reserved.