This example demonstrates how to run Apache Hive or Oracle Big Data SQL queries against data written via the Oracle NoSQL Database Table API.
| Class | Description |
|---|---|
| LoadRmvTable | Class that creates an example table in a given NoSQL Database store and then uses the Table API to populate the table with sample records. |
Because Oracle NoSQL Database provides the Oracle NoSQL Database Hadoop integration classes,
which support running Hadoop MapReduce jobs against data stored in an
Oracle NoSQL Database table, it was natural to also provide new interfaces
and classes which support running Hive queries against such table data
(since a typical Hive query generally results in the execution of a
MapReduce job). In addition to describing the core interfaces and classes
involved in running a Hive query against data from a table located in a
given Oracle NoSQL Database store (a KVStore), the information
presented below also walks through the steps to take to execute a
given set of basic Hive queries against example table data contained in
such a store; either secure
or non-secure.
Before attempting the example presented here, you should satisfy the following prerequisites:
- Become familiar with Apache Hive and its programming model; that is, become familiar with how to write and execute a Hive query.
- Become familiar with Apache Hadoop; specifically, become familiar with how Hive and Hadoop interact.
- Install Oracle NoSQL Database — for example, under /opt/ondb/kv — on a system that is network reachable from the nodes of the Hadoop cluster.
- Deploy a KVStore named example-store — for example, under the KVROOT directory /opt/ondb/example-store — to 3 machines (real or virtual) with host names kv-host-1, kv-host-2, and kv-host-3; where an admin service, listening on port 5000, is deployed on each host.
- If the store is configured for security, follow the steps presented in Appendix B of the Hadoop/Table API Example to securely connect to the store's admin service and create a KVStore user named example-user; along with the appropriate security artifacts (login file, trust file, and either password file or Oracle Wallet [Enterprise Edition only]).
- Follow the steps presented in the Hadoop/Table API Example to create and populate a table named vehicleTable with example data consisting of only primitive data types.
- Using the LoadRmvTable program provided with this example, follow the steps presented in Appendix A of this document to create and populate a table named rmvTable with example data consisting of both primitive and non-primitive data types.
Using values such as the <KVHOME> and <KVROOT> locations, the store name,
host names, and admin port described above should allow you to more easily
follow the example that is presented. Combined with the information contained
in the Oracle NoSQL Database Getting Started Guide, as well as the
Oracle NoSQL Database Admin Guide, the Oracle NoSQL Database Security Guide
and the
Hadoop/Table API Example,
you should then be able to generalize and extend the examples presented here to your own
particular development scenario; substituting the values specific to the
given environment where necessary.
Note that detailed instructions for deploying a non-secure KVStore are provided in
Appendix A
of the Hadoop/Table API Example. Similarly,
Appendix B
of the Hadoop/Table API Example provides instructions for deploying a
KVStore configured for security.
Apache Hive
is a data warehouse infrastructure built on top of
Apache Hadoop
that facilitates querying datasets residing in distributed file systems or data stores such as
Hadoop HDFS
or
Amazon S3.
Additionally, Hive also provides a pluggable programming model that allows
you to specify custom interfaces and classes that support querying
data residing in data sources other than HDFS and S3; in particular,
data written to an Oracle NoSQL Database table.
To access and analyze data stored in these data sources, Hive provides a
mechanism to project structure onto the data and query the data using
a SQL-like language called the Hive Query Language, or
HQL. Depending on the complexity of a given Hive query,
the Hive infrastructure may construct and deploy a set of MapReduce
jobs to retrieve and process the data or, when possible, it may simply
satisfy the query via the metadata stored in the Hive metastore
(Derby or MySQL).
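For example — a minimal sketch, where the mapped table name exampleTable is hypothetical — the first query below can often be satisfied from the metastore alone, while the second is typically compiled into a MapReduce job:
hive> select * from exampleTable;
hive> select count(*) from exampleTable;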
In addition to the Hive infrastructure itself, Hive provides a convenient client-side command line interface (the Hive CLI), which allows you to interact with the Hive infrastructure; for example, to create a Hive external table and then map it to the data located in a source like those described above.
As indicated above, a set of interfaces and classes that satisfy the
Hive programming model are provided by Oracle NoSQL Database which support
running Hive queries against table data contained in a KVStore
— either secure or non-secure.
These classes are located in the
oracle.kv.hadoop.hive.table
package, and consist of the following Hive and Hadoop types:
org.apache.hadoop.hive.ql.metadata.HiveStorageHandler,
which is the mechanism (pluggable interface) to use to specify the
location of the data the Hive infrastructure should process, as well
as how to process that data; and which consists of the following components:
org.apache.hadoop.mapred.InputFormat,
which specifies how the associated MapReduce job reads its input data
(from an Oracle NoSQL Database table).
org.apache.hadoop.mapred.OutputFormat,
which specifies how the associated MapReduce job writes its output.
org.apache.hadoop.hive.serde2.SerDe,
which is used to deserialize the table data that is retrieved and
sent to the Hive infrastructure and/or Hadoop MapReduce job for processing,
and to serialize data input to Hive that will be written to the table (currently not supported).
org.apache.hadoop.mapred.RecordReader,
which specifies how the mapped keys and values are located and retrieved
during any MapReduce processing performed as part of executing a Hive query.
org.apache.hadoop.mapred.InputSplit,
which represents the data to be processed by an individual Mapper
that operates during the MapReduce processing performed as part of
executing a Hive query.
It is through the implementation of the HiveStorageHandler
provided in the Oracle NoSQL Database distribution that the Hive infrastructure
obtains access to a given KVStore and the table data on which to run
the desired Hive query. That implementation consists of the following core classes:
- oracle.kv.hadoop.hive.table.TableStorageHandler
- oracle.kv.hadoop.hive.table.TableHiveInputFormat
- oracle.kv.hadoop.hive.table.TableHiveInputSplit
- oracle.kv.hadoop.hive.table.TableHiveRecordReader
- oracle.kv.hadoop.hive.table.TableSerDe
Also provided are implementations of the
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
interface that support deserialization of the primitive and non-primitive data types defined by the Oracle NoSQL Database Table API (see below).
The following implementations of the ObjectInspector
interface are used in the deserialization process to convert the associated data type
defined by the Oracle NoSQL Database Table API to its corresponding type in the
Hive data model:
- oracle.kv.hadoop.hive.table.TableBinaryObjectInspector
- oracle.kv.hadoop.hive.table.TableBooleanObjectInspector
- oracle.kv.hadoop.hive.table.TableDoubleObjectInspector
- oracle.kv.hadoop.hive.table.TableFloatObjectInspector
- oracle.kv.hadoop.hive.table.TableIntObjectInspector
- oracle.kv.hadoop.hive.table.TableLongObjectInspector
- oracle.kv.hadoop.hive.table.TableEnumObjectInspector
- oracle.kv.hadoop.hive.table.TableArrayObjectInspector
- oracle.kv.hadoop.hive.table.TableMapObjectInspector
- oracle.kv.hadoop.hive.table.TableRecordObjectInspector
Each data type defined by the Oracle NoSQL Database Table API (see
oracle.kv.table.FieldDef.Type)
is mapped to only a subset of the types defined by Hive's
data model. Specifically, when creating a Hive external table so that you
can query the data in a given Oracle NoSQL Database table, the Hive table
must be created with a schema consistent with the mappings shown in the following
table:
| Oracle NoSQL Database Table API | Hive |
|---|---|
| FieldDef.Type.STRING | STRING |
| | VARCHAR |
| | CHAR |
| FieldDef.Type.BOOLEAN | BOOLEAN |
| FieldDef.Type.BINARY | BINARY |
| FieldDef.Type.FIXED_BINARY | BINARY |
| | TINYINT |
| | SMALLINT |
| FieldDef.Type.INTEGER | INT |
| FieldDef.Type.LONG | BIGINT |
| FieldDef.Type.FLOAT | FLOAT |
| | DECIMAL |
| FieldDef.Type.DOUBLE | DOUBLE |
| FieldDef.Type.ENUM | STRING |
| | TIMESTAMP |
| | DATE |
| FieldDef.Type.ARRAY | ARRAY |
| FieldDef.Type.MAP | MAP<STRING, data_type> |
| FieldDef.Type.RECORD | STRUCT<col_name : data_type, ...> |
| | UNIONTYPE<data_type, data_type, ...> |
It is important to understand that when using Hive to query data in an
Oracle NoSQL Database table, the schema of the Hive external table you
create is dependent on the schema of the corresponding Oracle NoSQL Database
table you wish to query. Thus, if you create a Hive external table with
a schema that includes a Hive data type that is not mapped
from an Oracle NoSQL Database
FieldDef.Type,
then an error will occur when any attempt is made to query the table.
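For instance — a minimal sketch, where the table name exampleMapped and its fields are hypothetical — a NoSQL table whose fields are defined as FieldDef.Type.STRING, FieldDef.Type.LONG, and FieldDef.Type.MAP would require a Hive external table declared with the STRING, BIGINT, and MAP<STRING, data_type> mappings from the table above:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
      exampleMapped (NAME STRING, COUNTER BIGINT, ATTRS MAP<STRING, STRING>)
      STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
      TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
      "oracle.kv.hosts" = "kv-host-1:5000",
      "oracle.kv.tableName" = "exampleMapped");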
— YARN versus MapReduce version 1 —
Currently, Hadoop deployments include two versions of MapReduce.
The first version (referred to as MRv1) is the
original version of MapReduce; and consists of interfaces and classes
from the Java package org.apache.hadoop.mapred.
The newer version of MapReduce is referred to as YARN
(Yet Another Resource Negotiator) or, more generally,
MRv2; and resides in the package
org.apache.hadoop.mapreduce. Unfortunately, the
Table API Hive integration
classes must address the existence of both versions of MapReduce, for the
following reasons:
- The Oracle NoSQL Database Table API Hadoop integration classes employ MRv2.
- The Hive infrastructure currently requires that the core classes listed above be subclasses of the MRv1 classes from the org.apache.hadoop.mapred package.
For example, because of the incompatibility between MRv1 and MRv2, the
Table API Hive integration class described above
(TableHiveInputFormat)
subclasses the Hadoop MRv1 class
org.apache.hadoop.mapred.InputFormat, rather than the MRv2 based
Table API Hadoop integration class
TableInputFormat
(which would otherwise be preferred). Thus, to exploit and reuse the mechanisms
provided by the
provided by the
Table API Hadoop integration
classes, the
Table API Hive integration
classes presented here internally create, manage, and delegate to subclasses of the
appropriate MRv2 based classes.
It is also important to note that because the
Table API Hadoop integration
classes do not currently support writing data from a MapReduce job into
a KVStore, the
Table API Hive integration
classes do not support queries that modify the contents of a table in a KVStore.
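A brief sketch of the practical consequence (someOtherHiveTable is a hypothetical Hive table; the second statement is shown only to illustrate what is disallowed):
hive> select count(*) from vehicleTable;   -- supported: read-only query
hive> insert into table vehicleTable select * from someOtherHiveTable;
      -- not supported: attempts to write to the KVStore-backed table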
After satisfying the prerequisites
listed above, if you have not already done so, you should deploy either a
non-secure
or a
secure
KVStore (or both), and then
create and populate
a table named vehicleTable in each such store. If any of the stores you
deploy are configured for secure access, then you must also follow the steps provided in
Appendix C
of this document to configure Hive with the artifacts and environment necessary to interact
with a secure KVStore. Once these initial steps are performed, you can then create a
Hive external table mapped to the KVStore vehicleTable, and execute the
set of example Hive queries (described below) against the data stored in that table.
Assuming you have satisfied the prerequisites
related to the Hive programming model and its CLI, you can create a Hive external table
mapped to the Oracle NoSQL Database 'vehicleTable' located in the desired KVStore,
and then query the data in that table. To do this, first execute the Hive interactive CLI
by typing the following at the command line of the Hive client node:
> hive
which should then present the following Hive CLI command prompt:
hive>
At this point, the desired Hive commands (table creation and/or queries) can be executed.
— Creating a Hive External Table Mapped to 'vehicleTable' in a Non-Secure KVStore —
If you will be running your Hive query against a non-secure KVStore
deployed in the manner described in
Appendix A
of the Hadoop/Table API Example, after adding kvclient.jar to
the HIVE_AUX_JARS_PATH environment variable of the Hive client
(as described in
Appendix B
of this document), you can create the external Hive table required by this example by typing the following
at the Hive CLI command prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTable (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3");
The command above applies the required data model mapping
to create a Hive table named vehicleTable with columns whose types are consistent
with the corresponding fields of the Oracle NoSQL Database table created and populated in the
Hadoop/Table API Example;
which was also named vehicleTable. Additionally, you should note the
following points:
"oracle.kv.hosts" property; as long as at least one valid administrative host (and port) is included.
"oracle.kv.hadoop.hosts" property in the example command above:
Oracle Big Data SQL system.
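For example, a sketch of the multi-host form of the "oracle.kv.hosts" value, using the admin hosts from the deployment described earlier:
"oracle.kv.hosts" = "kv-host-1:5000,kv-host-2:5000,kv-host-3:5000"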
— Creating a Hive External Table Mapped to 'vehicleTable' in a Secure KVStore —
If you will be running your Hive query against a secure KVStore
deployed in the manner described in
Appendix B
of the Hadoop/Table API Example, and if your password storage employs a
password file instead of an Oracle Wallet, then you can create the external Hive
table required by this example by typing the following at the Hive CLI command
prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTablePasswd (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.pwdfile.file" = "/tmp/example-user.passwd");
On the other hand, if your password storage employs an Oracle Wallet instead of a
password file, then type the following at the command prompt to create the required
Hive table:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTableWallet (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.wallet.dir" = "/tmp/example-user-wallet.dir");
The commands above each apply the required
data model mapping
to create Hive tables named vehicleTablePasswd and vehicleTableWallet
respectively; each with the same structure, schema, and attributes as the
vehicleTable created in the non-secure case. The same
additional points
noted for the non-secure case also apply in this case.
Note the additional properties specified in the TBLPROPERTIES directive;
that is, oracle.kv.security, oracle.kv.ssl.trustStore,
oracle.kv.auth.username, and oracle.kv.auth.pwdfile.file
(or oracle.kv.auth.wallet.dir). Each of these additional properties
corresponds to one of the artifacts required to access a secure KVStore; where
the nature of each property and its corresponding value is described in detail in
Appendix C
of the Hadoop/Table API Example.
Thus, from the Hive command prompt, type the following queries for each Hive table and observe that the output of each query you execute, if successful, looks something like the corresponding sample output shown.
— List the Contents of Each Row in 'vehicleTable' —
hive> select * from vehicleTable;
hive> select * from vehicleTablePasswd;
hive> select * from vehicleTableWallet;
OK
auto Chrysler Imperial FrontWheelDrive white 20743.943359375 5
auto GM Impala 4WheelDrive black 20743.91015625 46
auto GM Impala FrontWheelDrive yellow 20743.5390625 28
truck Ford F250 AllWheelDrive blue 31115.759765625 47
..........
Note that unlike the more complicated example queries below, the query above does not result in the execution of a MapReduce job. This is because there is enough metadata in the Hive metastore to satisfy the query.
— Count the Total Number of Rows in 'vehicleTable' —
hive> select count(type) from vehicleTable;
hive> select count(type) from vehicleTablePasswd;
hive> select count(type) from vehicleTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2014-12-12 12:04:18,403 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:05:12,431 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 2.26 sec
2014-12-12 12:05:13,816 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.7 sec
2014-12-12 12:05:15,201 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 6.87 sec
2014-12-12 12:05:16,594 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:05:17,980 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:05:19,364 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:05:20,754 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.24 sec
2014-12-12 12:05:22,140 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.24 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 15.24 sec HDFS Read: 4532 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 240 msec
OK
79
Time taken: 89.359 seconds, Fetched: 1 row(s)

— For All Vehicles in 'vehicleTable' Find the Lowest Price —
hive> select min(price) from vehicleTable;
hive> select min(price) from vehicleTablePasswd;
hive> select min(price) from vehicleTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2014-12-12 12:11:10,924 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:12:06,213 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:07,606 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:09,076 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:12:10,464 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:11,849 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:13,238 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:14,629 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
2014-12-12 12:12:16,031 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 15.38 sec HDFS Read: 4532 HDFS Write: 16 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 380 msec
OK
20743.244140625
Time taken: 89.615 seconds, Fetched: 1 row(s)

— List All GM Vehicles in 'vehicleTable' —
hive> select * from vehicleTable where make LIKE "%GM%";
hive> select * from vehicleTablePasswd where make LIKE "%GM%";
hive> select * from vehicleTableWallet where make LIKE "%GM%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2014-12-12 12:19:24,269 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:20:18,239 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 2.43 sec
2014-12-12 12:20:19,622 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 4.81 sec
2014-12-12 12:20:21,006 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 7.14 sec
2014-12-12 12:20:22,395 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 13.09 sec
2014-12-12 12:20:23,777 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.06 sec
2014-12-12 12:20:25,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 16.06 sec
..........
Job 0: Map: 6 Cumulative CPU: 16.06 sec HDFS Read: 4532 HDFS Write: 1491 SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 60 msec
OK
suv GM Equinox 4WheelDrive yellow 41486.78125 37
truck GM Sierra 4WheelDrive black 31115.224609375 87
auto GM Corvette FrontWheelDrive yellow 20743.84375 7
auto GM Impala 4WheelDrive black 20743.91015625 46
..........

— List All Silverado Trucks in 'vehicleTable' —
hive> select * from vehicleTable where model LIKE "%Silverado%";
hive> select * from vehicleTablePasswd where model LIKE "%Silverado%";
hive> select * from vehicleTableWallet where model LIKE "%Silverado%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2014-12-12 13:36:24,284 Stage-1 map = 0%, reduce = 0%
2014-12-12 13:37:19,528 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 9.35 sec
2014-12-12 13:37:20,910 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 11.98 sec
2014-12-12 13:37:22,296 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 15.31 sec
2014-12-12 13:37:23,681 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 15.58 sec
2014-12-12 13:37:25,069 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 15.58 sec
..........
Job 0: Map: 6 Cumulative CPU: 15.58 sec HDFS Read: 4532 HDFS Write: 496 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 580 msec
OK
truck GM Silverado2500 4WheelDrive blue 31115.548828125 68
truck GM Silverado2500 4WheelDrive-4cylinder blue-on-green 31114.91015625 17
truck GM Silverado2500 AllWheelDrive white 31115.275390625 36
truck GM Silverado2500 AllWheelDrive yellow 31114.796875 73
truck GM Silverado1500 4WheelDrive black 31114.98046875 64
truck GM Silverado1500 4WheelDrive green 31115.078125 38
truck GM Silverado1500 RearWheelDrive blue 31115.640625 55
truck GM Silverado1500 RearWheelDrive white 31115.517578125 37
..........
Time taken: 83.589 seconds, Fetched: 8 row(s)
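The queries above can be generalized in the obvious ways; for example (a sketch, sample output not shown), a grouped aggregation such as the following is also compiled into a MapReduce job:
hive> select make, count(*) from vehicleTable group by make;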
In this example, a schema is employed that consists of a variety of data types;
both primitive and non-primitive. In the same manner as Example 1,
you must take the following initial steps to run the queries presented in this example:
- Satisfy the prerequisites described above.
- Deploy either a non-secure or a secure KVStore (or both).
- Follow the steps presented in Appendix A of this document to create and populate a table named rmvTable in each store that you deploy.
- If any of the stores you deploy are configured for secure access, follow the steps presented in Appendix C of this document to configure Hive with the artifacts and environment necessary to interact with a secure KVStore.
Once these initial steps are performed, execute the Hive interactive CLI by typing the following at the command line of the Hive client node:
> hive
which should present the following Hive CLI command prompt:
hive>
At this point, you can execute the Hive commands presented in the next sub-sections for the non-secure KVStore or the secure KVStore (or both).
— Creating a Hive External Table Mapped to 'rmvTable' in a Non-Secure KVStore —
If you will be running your Hive query against a non-secure KVStore
deployed in the manner described in
Appendix A
of the Hadoop/Table API Example, after adding kvclient.jar to
the HIVE_AUX_JARS_PATH environment variable of the Hive client
(as described in
Appendix B
of this document), you can create the external Hive table required by this example by typing the following
at the Hive CLI command prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTable (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
COMMENT 'Hive table rmvTable <---> KVStore table rmvTable'
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3");
Keeping in mind the same additional points that were noted for each of the cases presented in Example 1,
the command above applies the required
Oracle NoSQL Database-to-Hive Data Model Mappings
to create a Hive table named rmvTable with columns whose types are consistent with the corresponding
fields of the Oracle NoSQL Database table — also named rmvTable — that you created
and populated when you followed the steps outlined in
Appendix A of this document.
Note that it is important to know the specific schema of the Oracle NoSQL Database table
you wish to query before specifying the schema of that table's corresponding Hive
external table. For example, observe that the field named gender in the
schema
of the Oracle NoSQL Database rmvTable presented in Appendix A is defined as type
FieldDef.Type.ENUM,
whereas the GENDER field specified in the Hive external table created
above is defined as type STRING; as specified by the required
data model mapping.
Thus, the combination of the
Oracle NoSQL Database-to-Hive Data Model Mapping
with the given Oracle NoSQL Database table's specific schema
allows you to determine whether a STRING field in a Hive table is mapped from an Oracle NoSQL Database
FieldDef.Type.STRING
or
FieldDef.Type.ENUM.
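For example (a sketch; sample output not shown), because gender is surfaced in Hive as a STRING, the underlying ENUM values can be filtered with ordinary string predicates:
hive> select firstname,lastname,gender from rmvTable where gender = "female";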
— Creating a Hive External Table Mapped to 'rmvTable' in a Secure KVStore —
If you will be running your Hive query against a secure KVStore
deployed in the manner described in
Appendix B
of the Hadoop/Table API Example, and if your password storage employs a
password file instead of an Oracle Wallet, then you can create the external Hive
table required by this example by typing the following at the Hive CLI command
prompt:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTablePasswd (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
COMMENT 'Hive table rmvTablePasswd <---> KVStore table rmvTable'
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.pwdfile.file" = "/tmp/example-user.passwd");
On the other hand, if your password storage employs an Oracle Wallet instead of a
password file, then type the following at the command prompt to create the required
Hive table:
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTableWallet (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
COMMENT 'Hive table rmvTableWallet <---> KVStore table rmvTable'
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
"oracle.kv.security" = "/tmp/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/tmp/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.wallet.dir" = "/tmp/example-user-wallet.dir");
The commands above each combine the required
data model mapping
with the schema of the
rmvTable that you created
to produce Hive external tables named rmvTablePasswd and rmvTableWallet
respectively; each with the same structure, schema, and attributes as the Hive rmvTable
produced in the non-secure case. The same
additional points
noted for the non-secure case also apply in this case.
Note the additional properties specified in the TBLPROPERTIES directive;
that is, oracle.kv.security, oracle.kv.ssl.trustStore,
oracle.kv.auth.username, and oracle.kv.auth.pwdfile.file
(or oracle.kv.auth.wallet.dir). Each of these additional properties
corresponds to one of the artifacts required to access a secure KVStore; where
the nature of each property and its corresponding value is described in detail in
Appendix C
of the Hadoop/Table API Example.
Thus, from the Hive command prompt, type the following queries for each Hive table and observe that the output of each query you execute, if successful, looks something like the corresponding sample output shown.
— List the Contents of Each Row in the 'rmvTable' —
hive> select * from rmvTable;
hive> select * from rmvTablePasswd;
hive> select * from rmvTableWallet;
OK
49027 GOMEZ CHRISTOPHER 509367447 male S57428836
{"cell":"616-351-0185","home":"213-630-2419","work":"617-227-9840"}
{"number":88072,"street":"Fifth Avenue","unit":6,"city":"Cambridge","state":"OK","zip":49027}
[{"type":"auto","make":"Ford","model":"Taurus","class":"AllWheelDrive","color":"blue","value":20743.234,"tax":566.290283203125,"paid":false},
{"type":"auto","make":"Ford","model":"Taurus","class":"FrontWheelDrive","color":"blue","value":20743.559,"tax":566.2991333007812,"paid":true}]
40719 ROSARIO ANNE 448406765 female S04809975
{"cell":"303-804-1660","home":"408-630-2412","work":"415-804-9515"}
{"number":96581,"street":"Third Avenue","unit":7,"city":"Springfield","state":"RI","zip":40719}
[{"type":"truck","make":"Chrysler","model":"Ram3500","class":"RearWheelDrive","color":"blue","value":31115.26,"tax":849.4465942382812,"paid":true},
{"type":"truck","make":"Chrysler","model":"Ram1500","class":"AllWheelDrive","color":"blue","value":31114.873,"tax":849.43603515625,"paid":false},
{"type":"auto","make":"Ford","model":"Edge","class":"RearWheelDrive","color":"yellow","value":20743.889,"tax":566.3081665039062,"paid":true}]
..........
Note that unlike the more complicated example queries below, the query above
does not result in the execution of a MapReduce job.
This is because there is enough metadata in the Hive metastore to satisfy
the query.
— List the Name, Gender, and Address of Each Vehicle Owner in 'rmvTable' —
hive> select lastname,firstname,gender,address from rmvTable;
hive> select lastname,firstname,gender,address from rmvTablePasswd;
hive> select lastname,firstname,gender,address from rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 05:39:04,940 Stage-1 map = 0%, reduce = 0%
2015-06-25 05:39:43,752 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 8.38 sec
2015-06-25 05:39:45,242 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 21.73 sec
2015-06-25 05:39:46,727 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 26.72 sec
..........
Job 0: Map: 6 Cumulative CPU: 26.72 sec HDFS Read: 4760 HDFS Write: 4702 SUCCESS
Total MapReduce CPU Time Spent: 26 seconds 720 msec
OK
SNIDER FRANK male {"number":33512,"street":"Summer Street","unit":1,"city":"Arlington","state":"TN","zip":89150}
MILLER ROCH male {"number":25698,"street":"Mullberry Street","unit":6,"city":"Madison","state":"VA","zip":5740}
TATE BENJAMIN male {"number":2894,"street":"View Street","unit":-1,"city":"Clinton","state":"KY","zip":57466}
..........
Time taken: 87.327 seconds, Fetched: 79 row(s)
— List the Name and Phone Number (Home, Cell, or Work) of Each Vehicle Owner in 'rmvTable' —
hive> select firstname,lastname,phoneinfo["home"] from rmvTable;
hive> select firstname,lastname,phoneinfo["cell"] from rmvTablePasswd;
hive> select firstname,lastname,phoneinfo["work"] from rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:08:09,885 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:08:41,094 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.11 sec
2015-06-25 06:08:42,582 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.11 sec
..........
Job 0: Map: 6 Cumulative CPU: 18.11 sec HDFS Read: 4724 HDFS Write: 2141 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 110 msec
OK
CHRISTOPHER GOMEZ 213-630-2419
ANNE ROSARIO 408-630-2412
MEGAN PHELPS 978-541-5710
MICHAEL BRADLEY 313-351-4580
..........
Time taken: 69.173 seconds, Fetched: 79 row(s)
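Because phoneinfo is surfaced as a Hive MAP, the standard Hive map functions also apply; for example (a sketch; sample output not shown):
hive> select firstname,lastname,map_keys(phoneinfo) from rmvTable;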
— Count the Total Number of Rows in 'rmvTable' —
hive> select count(vehicleinfo[0].type) from rmvTable;
hive> select count(vehicleinfo[0].type) from rmvTablePasswd;
hive> select count(vehicleinfo[0].type) from rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2015-06-25 05:53:31,200 Stage-1 map = 0%, reduce = 0%
2015-06-25 05:54:09,856 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 12.12 sec
2015-06-25 05:54:11,346 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 24.45 sec
2015-06-25 05:54:12,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 24.45 sec
2015-06-25 05:54:14,336 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 25.51 sec
2015-06-25 05:54:15,825 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 25.51 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 25.51 sec HDFS Read: 4760 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 25 seconds 510 msec
OK
79
Time taken: 82.824 seconds, Fetched: 1 row(s)
— For Each Owner's Primary Vehicle in 'rmvTable', Find the Minimum Assessed Value —
hive> select min(vehicleinfo[0].value) from rmvTable;
hive> select min(vehicleinfo[0].value) from rmvTablePasswd;
hive> select min(vehicleinfo[0].value) from rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
2014-12-12 12:11:10,924 Stage-1 map = 0%, reduce = 0%
2014-12-12 12:12:06,213 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:07,606 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 6.77 sec
2014-12-12 12:12:09,076 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.16 sec
2014-12-12 12:12:10,464 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:11,849 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:13,238 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 14.3 sec
2014-12-12 12:12:14,629 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
2014-12-12 12:12:16,031 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 15.38 sec
..........
Job 0: Map: 6 Reduce: 1 Cumulative CPU: 15.38 sec HDFS Read: 4532 HDFS Write: 16 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 380 msec
OK
20743.244140625
Time taken: 89.615 seconds, Fetched: 1 row(s)
— List all Vehicle Information for the Primary (or Second or Third) Vehicle of Each Owner in 'rmvTable' —
hive> select vehicleinfo[0] from rmvTable;
hive> select vehicleinfo[1] from rmvTablePasswd;
hive> select vehicleinfo[2] from rmvTableWallet;
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:31:54,779 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:32:33,430 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 4.59 sec
2015-06-25 06:32:34,919 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 27.33 sec
2015-06-25 06:32:36,416 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.89 sec
2015-06-25 06:32:37,903 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.89 sec
..........
Job 0: Map: 6 Cumulative CPU: 27.89 sec HDFS Read: 4760 HDFS Write: 5681 SUCCESS
Total MapReduce CPU Time Spent: 27 seconds 890 msec
OK
{"type":"suv","make":"GM","model":"Tahoe","class":"4WheelDrive","color":"black","value":41487.242,"tax":1132.6016845703125,"paid":true}
{"type":"auto","make":"Chrysler","model":"Imperial","class":"AllWheelDrive","color":"red","value":20743.926,"tax":566.3092041015625,"paid":true}
{"type":"auto","make":"Ford","model":"Taurus","class":"RearWheelDrive","color":"blue","value":20744.076,"tax":566.3132934570312,"paid":true}
{"type":"truck","make":"Ford","model":"F150","class":"AllWheelDrive","color":"green","value":31115.299,"tax":849.4476928710938,"paid":false}
..........
Time taken: 82.056 seconds, Fetched: 79 row(s)
— List the Name, Address, and Primary Vehicle Information of All Owners in 'rmvTable' Whose Last Name Starts with 'H' —
hive> select firstname,lastname,address,vehicleinfo[0] from rmvTable where lastname RLIKE "^[H].*";
hive> select firstname,lastname,address,vehicleinfo[0] from rmvTablePasswd where lastname RLIKE "^[H].*";
hive> select firstname,lastname,address,vehicleinfo[0] from rmvTableWallet where lastname RLIKE "^[H].*";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:41:35,578 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:42:14,310 Stage-1 map = 15%, reduce = 0%
2015-06-25 06:42:15,803 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 9.46 sec
2015-06-25 06:42:17,290 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 23.29 sec
2015-06-25 06:42:18,778 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.78 sec
2015-06-25 06:42:20,271 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.78 sec
..........
Job 0: Map: 6 Cumulative CPU: 27.78 sec HDFS Read: 4760 HDFS Write: 1143 SUCCESS
Total MapReduce CPU Time Spent: 27 seconds 780 msec
OK
CINDY HODGES {"number":56758,"street":"Vaughan Avenue","unit":-1,"city":"Madison","state":"NH","zip":79623}
{"type":"truck","make":"Chrysler","model":"Ram1500","class":"RearWheelDrive","color":"black","value":31115.129,"tax":849.4430541992188,"paid":true}
JULIA HOLDEN {"number":56209,"street":"Main Street","unit":1,"city":"Georgetown","state":"CA","zip":62154}
{"type":"auto","make":"Ford","model":"Taurus","class":"FrontWheelDrive","color":"blue","value":20743.8,"tax":566.3057861328125,"paid":true}
PHYLLIS HOGAN {"number":47358,"street":"Park Street","unit":6,"city":"Arlington","state":"AL","zip":77333}
{"type":"suv","make":"Chrysler","model":"Journey","class":"RearWheelDrive","color":"yellow","value":41486.79,"tax":1132.58935546875,"paid":false}
..........
Time taken: 81.171 seconds, Fetched: 9 row(s)
— List the Name, Address, and Vehicle Information of Each Owner in 'rmvTable' Whose Second Vehicle is a GM Vehicle —
hive> select firstname,lastname,address,vehicleinfo[1] from rmvTable where vehicleinfo[1].make LIKE "%GM%";
hive> select firstname,lastname,address,vehicleinfo[1] from rmvTablePasswd where vehicleinfo[1].make LIKE "%GM%";
hive> select firstname,lastname,address,vehicleinfo[1] from rmvTableWallet where vehicleinfo[1].make LIKE "%GM%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 06:57:01,117 Stage-1 map = 0%, reduce = 0%
2015-06-25 06:57:32,506 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 9.29 sec
2015-06-25 06:57:34,000 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.8 sec
2015-06-25 06:57:35,484 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.8 sec
..........
Job 0: Map: 6 Cumulative CPU: 18.8 sec HDFS Read: 4724 HDFS Write: 2087 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 800 msec
OK
NANCY STOUT {"number":31126,"street":"Cedar Street","unit":8,"city":"Arlington","state":"MO","zip":731}
{"type":"suv","make":"GM","model":"Equinox","class":"AllWheelDrive","color":"red","value":41486.43,"tax":1132.57958984375,"paid":true}
DONALD COOK {"number":34287,"street":"Third Avenue","unit":1,"city":"Cambridge","state":"NV","zip":67995}
{"type":"suv","make":"GM","model":"Tahoe","class":"AllWheelDrive","color":"black","value":41487.168,"tax":1132.5997314453125,"paid":false}
RANDY MCDOWELL {"number":18391,"street":"Lane Avenue","unit":8,"city":"Concord","state":"NH","zip":42540}
{"type":"auto","make":"GM","model":"Corvette","class":"FrontWheelDrive","color":"black","value":20744.035,"tax":566.3121337890625,"paid":false}
..........
Time taken: 71.794 seconds, Fetched: 17 row(s)
— List the Name, Address, and Vehicle Information of Each Owner in 'rmvTable' Whose Primary Vehicle is a Silverado Truck —
hive> select firstname,lastname,address,vehicleinfo[0] from rmvTable where vehicleinfo[0].model LIKE "%Silverado%";
hive> select firstname,lastname,address,vehicleinfo[0] from rmvTablePasswd where vehicleinfo[0].model LIKE "%Silverado%";
hive> select firstname,lastname,address,vehicleinfo[0] from rmvTableWallet where vehicleinfo[0].model LIKE "%Silverado%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 07:05:40,301 Stage-1 map = 0%, reduce = 0%
2015-06-25 07:06:18,907 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 9.6 sec
2015-06-25 07:06:20,406 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.65 sec
2015-06-25 07:06:21,894 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 27.65 sec
..........
Job 0: Map: 6 Cumulative CPU: 27.65 sec HDFS Read: 4760 HDFS Write: 1185 SUCCESS
Total MapReduce CPU Time Spent: 27 seconds 650 msec
OK
PHYLLIS MARTINEZ {"number":29978,"street":"Park Street","unit":7,"city":"Burlington","state":"NM","zip":28564}
{"type":"truck","make":"GM","model":"Silverado2500","class":"4WheelDrive","color":"black","value":31115.295,"tax":849.4475708007812,"paid":false}
SHANNON KELLEY {"number":98370,"street":"Maple Street","unit":9,"city":"Boston","state":"TX","zip":16166}
{"type":"truck","make":"GM","model":"Silverado2500","class":"FrontWheelDrive","color":"yellow","value":31115.209,"tax":849.4451904296875,"paid":true}
FRANCES HOLLOWAY {"number":2650,"street":"Second Avenue","unit":6,"city":"Arlington","state":"KY","zip":7182}
{"type":"truck","make":"GM","model":"Silverado2500","class":"FrontWheelDrive","color":"yellow","value":31115.31,"tax":849.447998046875,"paid":true}
..........
Time taken: 79.539 seconds, Fetched: 9 row(s)
— List the Name, Address, Model, Assessed Value and Registration Fee (Paid or Not) for Each Chrysler Primary Vehicle in 'rmvTable' —
hive> select firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid from rmvTable where vehicleinfo[0].make LIKE "%Chrysler%";
hive> select firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid from rmvTablePasswd where vehicleinfo[0].make LIKE "%Chrysler%";
hive> select firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid from rmvTableWallet where vehicleinfo[0].make LIKE "%Chrysler%";
Launching Job 1 out of 1
..........
Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
2015-06-25 07:19:55,601 Stage-1 map = 0%, reduce = 0%
2015-06-25 07:20:26,973 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.15 sec
2015-06-25 07:20:28,454 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 18.15 sec
..........
Job 0: Map: 6 Cumulative CPU: 18.15 sec HDFS Read: 4724 HDFS Write: 2164 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 150 msec
OK
ANNE ROSARIO {"number":96581,"street":"Third Avenue","unit":7,"city":"Springfield","state":"RI","zip":40719}
Ram3500 31115.26 849.4465942382812 true
MEGAN PHELPS {"number":12713,"street":"MAC Avenue","unit":4,"city":"Salem","state":"MS","zip":76554}
Ram1500 31115.309 849.4479370117188 true
BRIAN ROWLAND {"number":37868,"street":"First Street","unit":3,"city":"Salem","state":"GA","zip":98106}
Imperial 20744.156 566.3154907226562 true
..........
Time taken: 72.163 seconds, Fetched: 23 row(s)
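As with Example 1, these queries can be generalized; for example (a sketch; sample output not shown), to total the tax owed on primary vehicles whose registration fee has not been paid:
hive> select sum(vehicleinfo[0].tax) from rmvTable where vehicleinfo[0].paid = false;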
Oracle Big Data SQL
(for the purposes of this document, referred to as Big Data SQL)
is a system consisting of proprietary hardware and software; including an
Oracle Big Data Appliance,
Oracle Exadata Database,
custom software extended to support Big Data SQL functionality, and of course,
Oracle NoSQL Database
itself. The goal of the Big Data SQL product is to allow users to employ
SQL to manage and manipulate data stored in a number of different locations.
Specifically, Big Data SQL is designed to provide SQL access to
data stored in Hadoop HDFS, various NoSQL databases
— including Oracle NoSQL Database — as well as various
relational databases. Big Data SQL achieves this by presenting Hadoop HDFS,
Oracle NoSQL Database, and other NoSQL data sources as enhanced
Oracle External Tables
(external tables) of the Oracle Relational Database Management System
(RDBMS); mapping the external semantics of accessing data from those
sources — horizontal parallelism, location, and schema — to
Oracle RDBMS internals.
Because you may not have access to an Oracle Big Data SQL system, the
Oracle Big Data Lite Virtual Machine
(referred to in this document as the Big Data Lite VM), is provided to
allow you to evaluate and experiment with Big Data SQL without requiring
you to purchase, install, and manage the real system. Although detailed documentation
is provided when you purchase a Big Data SQL system, as a convenience, this
section walks you through an example that employs the Big Data Lite VM to
demonstrate how to use Big Data SQL to query data stored in a KVStore
— both secure and non-secure. After working with this example on the
Big Data Lite VM, you should be able to extend the concepts that are
demonstrated, and apply that knowledge should you ever have access to a real
Big Data SQL system.
As indicated above, Big Data SQL presents each external data source to the Oracle RDBMS as an
Oracle External Table.
To achieve this, a mechanism referred to as an access driver is employed
to access data as if it were a table in the Oracle relational database provided in
the system. Big Data SQL extends the access driver mechanism of external
tables by specifying new access driver types; one for each type of external data
source that will be accessed. Prior to the introduction of Big Data SQL,
the Oracle RDBMS external tables mechanism defined only two access driver types:
ORACLE_LOADER and ORACLE_DATAPUMP. Big Data SQL adds the new
ORACLE_HDFS and ORACLE_HIVE access driver types, both of which
require the specification of a number of classes that satisfy the Hadoop MapReduce
programming model. Some of those classes are required by both access driver types,
whereas some are required by only the ORACLE_HIVE access driver. The
class types required by both ORACLE_HDFS and ORACLE_HIVE
are the same as those described in the
Hadoop/Table API Example;
whereas the class types required by ORACLE_HIVE but not by
ORACLE_HDFS are:
org.apache.hadoop.hive.ql.metadata.HiveStorageHandler; for example, the Oracle NoSQL Database
TableStorageHandler.
org.apache.hadoop.hive.serde2.SerDe; for example, the Oracle NoSQL Database
TableSerDe.
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
for example, instances of the Oracle NoSQL Database classes
described previously in this document.
Note that the ORACLE_HDFS access driver can only read data stored in
HDFS files, whereas the ORACLE_HIVE access driver can read data stored
not only in HDFS files, but in other locations as well; for example, a KVStore.
As you will see below, the integration of Oracle NoSQL Database with Hive
— as described above —
plays a prominent role in the integration of Oracle NoSQL Database with
Big Data SQL.
| Oracle NoSQL Database Table API | Hive | Oracle RDBMS |
|---|---|---|
| FieldDef.Type.STRING | STRING | VARCHAR2(N) |
| | VARCHAR | |
| | CHAR | |
| FieldDef.Type.BOOLEAN | BOOLEAN | VARCHAR2(5) |
| FieldDef.Type.BINARY | BINARY | VARCHAR2(N) |
| FieldDef.Type.FIXED_BINARY | BINARY | VARCHAR2(N) |
| | TINYINT | |
| | SMALLINT | |
| FieldDef.Type.INTEGER | INT | NUMBER |
| FieldDef.Type.LONG | BIGINT | NUMBER |
| FieldDef.Type.FLOAT | FLOAT | NUMBER |
| | DECIMAL | |
| FieldDef.Type.DOUBLE | DOUBLE | NUMBER |
| FieldDef.Type.ENUM | STRING | VARCHAR2(N) |
| | TIMESTAMP | |
| | DATE | |
| FieldDef.Type.ARRAY | ARRAY | VARCHAR2(N) |
| FieldDef.Type.MAP | MAP<STRING, data_type> | VARCHAR2(N) |
| FieldDef.Type.RECORD | STRUCT<col_name : data_type, ...> | VARCHAR2(N) |
| | UNIONTYPE<data_type, data_type, ...> | |
It is important to understand that when using Big Data SQL to query data in an Oracle NoSQL Database table, the schema of the Oracle external table you create is dependent on the schema of the corresponding Hive external table; which, in turn, is dependent on the schema of the Oracle NoSQL Database table you wish to query. Thus, if either type of external table is created using a schema that includes a data type that does not belong to one of the end-to-end mappings presented above, then an error will occur when any attempt is made to query the table.
Note that for fields in the Oracle external table specified as VARCHAR2(N),
the value of N is the maximum number of characters of the variable length
STRING that represents the specified field in the corresponding Hive and
Oracle NoSQL Database tables. Therefore, you should use the type, structure, and expected
length or size of the corresponding Hive and Oracle NoSQL Database fields to determine the
appropriate value to specify for N when creating the Oracle external table.
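As a sketch of what such an Oracle external table might look like for the vehicleTable (the VARCHAR2 lengths, default directory, and access parameters shown here are illustrative assumptions, not the exact DDL for a given Big Data SQL installation):
SQL> CREATE TABLE vehicleTable (TYPE VARCHAR2(10), MAKE VARCHAR2(12),
       MODEL VARCHAR2(20), CLASS VARCHAR2(40), COLOR VARCHAR2(20),
       PRICE NUMBER, COUNT NUMBER)
     ORGANIZATION EXTERNAL
       (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
        ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.vehicletable))
     REJECT LIMIT UNLIMITED;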
The Big Data Lite VM can be obtained from Oracle's download site. Note that the Big Data Lite VM must be deployed
to a virtualization product such as
Oracle VirtualBox.
After installing the necessary environment and deploying the Big Data Lite VM,
you should verify that the necessary services are running in that VM and start those
that are not running. You can do this by executing the services utility
from the command line; which will display an interactive user interface with information
like that shown below. That is,
> services
| [*] | ORCL | Oracle Database 12c (on) |
| [*] | Zookeeper | zookeeper (on) |
| [*] | HDFS | namenode (on) datanode (on) ... |
| [*] | Hive | metastore (on) hive-server2 (on) |
| [] | Hue | Hue (off) |
| [] | Impala | impala-server (off) |
| [] | NoSQL | Oracle NoSQL Database (off) |
| [] | Oozie | oozie (off) |
| [] | Solr | solr-server (off) |
| [] | SpatialAndGraph | web-server (off) |
| [] | Sqoop2 | sqoop2-server (off) |
| [] | WebLogic-MovieDemo | WebLogic-MovieDemo (off) |
| [*] | YARN | resourcemanager (on) nodemanager (on) ... |
The services selected (marked with asterisks) are the services that are necessary
when walking through the examples presented in the following sections. In an effort
to improve boot performance, the Big Data Lite VM will typically come out of the
box running only the Zookeeper, HDFS, and YARN
services. As a result, you will need to select the ORCL and Hive
services for auto-start. Additionally, because the examples presented here employ an
externally deployed KVStore, there is no need to select the Oracle NoSQL Database
services for auto-start.
The examples presented here assume that you have satisfied the
prerequisites listed
previously in this document; which means that a KVStore
— non-secure,
secure, or both —
has been deployed to separate, external nodes unrelated to the Big Data Lite VM;
where the nodes are network reachable from that VM. Although it is possible to select the
Oracle NoSQL Database service and deploy a KVStore
(with KVHOME=/u01/nosql/kv-ee) on the Big Data Lite VM, it is
important to understand that typical deployments of the
Oracle Big Data Appliance
do not run the Hadoop services (HDFS, YARN, and Hive)
on the same nodes used to run the Oracle NoSQL Database services. As a result,
the examples presented here are run in a manner intended to be consistent with the more
typical deployment scenarios.
After satisfying the necessary prerequisites,
creating and populating a
vehicleTable
and rmvTable in a KVStore,
installing and deploying the Big Data Lite VM and verifying
that the necessary services are running on that VM, you are then ready to walk through the
Big Data SQL examples presented in the following sections.
non-secure
or
secure
—
is deployed with the expected tables, you can follow the steps presented in the next section to
configure the Big Data Lite VM environment for executing either Hive or
Big Data SQL queries against the data in those tables. But before proceeding
to that section, you should first become familiar with the directory structure and base
configuration of the Big Data SQL system; specifically, the following
directories and files:
/etc/hive/conf.bigdatalite
hive-env.sh
hive-site.xml
/u01
/bigdatasql_config
bigdata.properties
bigdata-log4j.properties
/bigdatalite
core-site.xml
hdfs-site.xml
hive-env.sh
hive-site.xml
mapred-site.xml
/hive_aux_jars
apache-xmlbeans.jar -> /u01/connectors/oxh/hive/lib/apache-xmlbeans.jar
hive-hcatalog-core.jar -> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
orai18n-collation.jar -> /u01/connectors/oxh/hive/lib/orai18n-collation.jar
orai18n.jar -> /u01/connectors/oxh/hive/lib/orai18n.jar
orai18n-mapping.jar -> /u01/connectors/oxh/hive/lib/orai18n-mapping.jar
orai18n-utility.jar -> /u01/connectors/oxh/hive/lib/orai18n-utility.jar
oxh-hive.jar -> /u01/connectors/oxh/hive/lib/oxh-hive.jar
oxh-mapreduce.jar -> /u01/connectors/oxh/hive/lib/oxh-mapreduce.jar
oxquery.jar -> /u01/connectors/oxh/hive/lib/oxquery.jar
stax2-api-3.1.1.jar -> /u01/connectors/oxh/hive/lib/stax2-api-3.1.1.jar
woodstox-core-asl-4.2.0.jar -> /u01/connectors/oxh/hive/lib/woodstox-core-asl-4.2.0.jar
xmlparserv2_sans_jaxp_services.jar -> /u01/connectors/oxh/hive/lib/xmlparserv2_sans_jaxp_services.jar
xqjapi.jar -> /u01/connectors/oxh/hive/lib/xqjapi.jar
/log
....
/nosql
kv-3.4.3
kv-ee -> kv-3.4.3
/scripts
....
You should also note that, as described in the
Big Data Lite VM documentation, the username oracle with password
welcome1 must be used when logging into the Big Data Lite VM.
— Configuring the Big Data Lite VM for Hive and Big Data SQL Queries —
Depending on the particular version of the Big Data Lite VM (or the
Big Data SQL system) you are using, you may need to upgrade the version
of Oracle NoSQL Database that is installed on the system; as well as add
one or more JAR files to the /u01/bigdatasql_config/hive_aux_jars
directory. That is, /u01/nosql/kv-ee should be linked to a version
of Oracle NoSQL Database that is 3.4.3 or greater; if it is not, then you
should install a version that satisfies that criterion.
With respect to the contents of /u01/bigdatasql_config/hive_aux_jars,
if any of the JAR files referenced below are not contained in that directory, then
execute the following at the command line as appropriate:
> cd /u01/bigdatasql_config/hive_aux_jars
> ln -s /u01/nosql/kv-ee/lib/kvclient.jar kvclient.jar
> ln -s /u01/nosql/kv-ee/lib/oraclepki.jar oraclepki.jar
> ln -s /u01/nosql/kv-ee/lib/osdt_cert.jar osdt_cert.jar
> ln -s /u01/nosql/kv-ee/lib/osdt_core.jar osdt_core.jar
Next, to support running Hive (and ultimately Big Data SQL) queries against table data stored in a secure KVStore, the
appropriate security artifacts,
along with a
server side JAR file
containing the necessary public credentials must be installed on the Big Data Lite VM.
This can be done by either copying the artifacts already
generated
for the Hive example described above,
or by regenerating the necessary artifacts on the Big Data Lite VM itself.
Either way, create the directory /u01/nosql/example-user/security
and place the artifacts in that location; that is,
/u01/nosql/example-user/security
client.trust
hive-nosql.login
hive-nosql-server.jar
example-user.passwd
/example-user-wallet.dir
cwallet.sso
In addition to installing the above artifacts under /u01/nosql/example-user/security,
you must also add the necessary JAR files to the HIVE_AUX_JARS_PATH environment
variable. To do this, first install the hive-nosql-server.jar file in
/u01/bigdatasql_config/hive_aux_jars; that is,
> cd /u01/bigdatasql_config/hive_aux_jars
> ln -s /u01/nosql/example-user/security/hive-nosql-server.jar hive-nosql-server.jar
Then add the necessary JAR files to the value of the
HIVE_AUX_JARS_PATH
variable set in hive-env.sh. The particular instance of the
hive-env.sh script that you should modify is dependent on the version
of the Big Data Lite VM that you have installed. Modify /etc/hive/conf.bigdatalite/hive-env.sh
if the version is 4.2 or greater; otherwise, modify
/u01/bigdatasql_config/bigdatalite/hive-env.sh. Thus,
after changing to the appropriate directory, modify hive-env.sh in the
following way,
> edit hive-env.sh
HIVE_AUX_JARS_PATH=/u01/bigdatasql_config/hive_aux_jars/apache-xmlbeans.jar,\
/u01/bigdatasql_config/hive_aux_jars/hive-hcatalog-core.jar,\
....
/u01/bigdatasql_config/hive_aux_jars/kvclient.jar,\
/u01/bigdatasql_config/hive_aux_jars/oraclepki.jar,\
/u01/bigdatasql_config/hive_aux_jars/osdt_cert.jar,\
/u01/bigdatasql_config/hive_aux_jars/osdt_core.jar,\
/u01/bigdatasql_config/hive_aux_jars/hive-nosql-server.jar,\
....
/u01/bigdatasql_config/hive_aux_jars/xqjapi.jar
Finally, add the necessary JAR files to the value of the java.classpath.hive
variable set in /u01/bigdatasql_config/bigdata.properties; that is,
> cd /u01/bigdatasql_config
> edit bigdata.properties
java.classpath.hive=/usr/lib/hive/lib/*:\
/u01/bigdatasql_config/hive_aux_jars/hive-hcatalog-core.jar:\
....
/u01/bigdatasql_config/hive_aux_jars/kvclient.jar:\
/u01/bigdatasql_config/hive_aux_jars/oraclepki.jar:\
/u01/bigdatasql_config/hive_aux_jars/osdt_cert.jar:\
/u01/bigdatasql_config/hive_aux_jars/osdt_core.jar:\
/u01/bigdatasql_config/hive_aux_jars/hive-nosql-server.jar
Future versions of the Big Data Lite VM
may provide versions of hive-env.sh and bigdata.properties
that already contain the necessary references to kvclient.jar,
oraclepki.jar, osdt_cert.jar, and osdt_core.jar
described above. If so, then all you need to add to hive-env.sh and bigdata.properties is the reference to
hive-nosql-server.jar; which is only necessary if you wish to run the examples
presented below against a secure KVStore.
At this point, you can follow the steps presented in the next section to query the data in the store's vehicleTable or rmvTable, using either Hive HQL or Big Data SQL.
— Using Hive and Big Data SQL to Query Data in a KVStore —
In order to query data in a KVStore from a Big Data SQL system (for example,
a Big Data Lite VM), you must first create and map a Hive external table to
the table defined in the store; whether the table is located in a
non-secure store
or a secure store.
That is, given the configuration described in the previous section, log in to the
Big Data Lite VM. If the Oracle NoSQL Database vehicleTable
and rmvTable are located in a
non-secure store,
then enter the Hive CLI and type the following commands:
> hive
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTable (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "bigdatalite.localdomain");
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTable (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "bigdatalite.localdomain");
On the other hand, if the store in which vehicleTable and rmvTable are located is a
secure store,
then enter the Hive CLI and type commands like these:
> hive
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTablePasswd (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
"oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.pwdfile.file" = "/u01/nosql/example-user/security/example-user.passwd");
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
vehicleTableWallet (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "vehicleTable",
"oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
"oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.wallet.dir" = "/u01/nosql/example-user/security/example-user-wallet.dir");
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTablePasswd (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
"oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.pwdfile.file" = "/u01/nosql/example-user/security/example-user.passwd");
hive> CREATE EXTERNAL TABLE IF NOT EXISTS
rmvTableWallet (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING,
SSN BIGINT,
GENDER STRING,
LICENSE BINARY,
PHONEINFO MAP<STRING, STRING>,
ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>,
VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler'
TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
"oracle.kv.hosts" = "kv-host-1:5000",
"oracle.kv.tableName" = "rmvTable",
"oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
"oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login",
"oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust",
"oracle.kv.auth.username" = "example-user",
"oracle.kv.auth.wallet.dir" = "/u01/nosql/example-user/security/example-user-wallet.dir");
At this point, you can execute Hive queries against the data in both the
vehicleTable
and the rmvTable.
Note that although the tables in the KVStore are named vehicleTable and rmvTable,
the Hive tables that are created above are named, respectively, vehicleTable and rmvTable
if the KVStore is not configured for security, vehicleTablePasswd and rmvTablePasswd if the
KVStore is secure and you store the password in a password file, and finally, vehicleTableWallet
and rmvTableWallet if the KVStore is secure and you store the password in an Oracle Wallet.
Also note that the value of the oracle.kv.hadoop.hosts property is
bigdatalite.localdomain. This is the hostname of the
Big Data Lite VM; thus, it is the name of the host on which the Hadoop
DataNode(s) execute.
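If the DataNodes of your Hadoop cluster ran on multiple hosts rather than on the single Big Data Lite VM, you would instead supply a comma-separated list of those host names; for example, with hypothetical hosts dn-host-1 and dn-host-2:
"oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2"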
After you create and map the external Hive tables to their corresponding KVStore
tables, you can use Big Data SQL to query the data in those KVStore tables.
To do so, you must apply the data model mapping
presented above,
along with the schemas for the
vehicleTable
and rmvTable, to
create and map corresponding Oracle RDBMS external tables to each of the Hive tables that you created.
But before doing this, you must first create a user in the RDBMS with the appropriate
privileges, as explained in the next section.
— Creating an Example User in the RDBMS —
As of version 4.2, the Big Data Lite VM employs the
multitenant feature of the Oracle RDBMS. As a result, Big Data SQL
commands and queries must be executed by a user with the appropriate privileges. Thus,
before creating any tables or executing any Big Data SQL queries,
you should first create an example user by executing commands like the following:
> sqlplus sys/welcome1@orcl as sysdba
SQL> CREATE TABLESPACE NOSQL_EXAMPLE_TBLSPACE
       DATAFILE '/u01/app/oracle/oradata/cdb/orcl/nosql_example_tblspace.dbf'
       SIZE 500M AUTOEXTEND ON MAXSIZE UNLIMITED
       NOLOGGING
       DEFAULT COMPRESS
       EXTENT MANAGEMENT LOCAL AUTOALLOCATE
       SEGMENT SPACE MANAGEMENT AUTO;
SQL> CREATE USER NOSQL_EXAMPLE_USER
IDENTIFIED BY welcome1
DEFAULT TABLESPACE NOSQL_EXAMPLE_TBLSPACE
QUOTA UNLIMITED ON NOSQL_EXAMPLE_TBLSPACE;
SQL> GRANT CREATE ANY DIRECTORY TO NOSQL_EXAMPLE_USER;
SQL> GRANT DROP ANY DIRECTORY TO NOSQL_EXAMPLE_USER;
SQL> GRANT CREATE TABLE TO NOSQL_EXAMPLE_USER;
SQL> GRANT CREATE VIEW TO NOSQL_EXAMPLE_USER;
SQL> GRANT CREATE MINING MODEL TO NOSQL_EXAMPLE_USER;
SQL> GRANT CREATE SESSION TO NOSQL_EXAMPLE_USER;
SQL> GRANT CREATE PROCEDURE TO NOSQL_EXAMPLE_USER;
SQL> GRANT ADVISOR TO NOSQL_EXAMPLE_USER;
SQL> GRANT EXECUTE,READ,WRITE ON DIRECTORY
DEFAULT_DIR TO NOSQL_EXAMPLE_USER
WITH GRANT OPTION;
SQL> GRANT EXECUTE,READ,WRITE ON DIRECTORY
ORACLE_BIGDATA_CONFIG TO NOSQL_EXAMPLE_USER
WITH GRANT OPTION;
SQL> exit
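Before proceeding, you can optionally verify that the grants took effect by reconnecting as the new user and listing the privileges active in the session:
> sqlplus NOSQL_EXAMPLE_USER/welcome1@orcl
SQL> SELECT * FROM SESSION_PRIVS ORDER BY PRIVILEGE;
SQL> exit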
At this point, you can log in as the new user and execute the Big Data SQL
commands and queries presented in the next section to query the data in the
associated KVStore tables.
— Using Big Data SQL to Query Data in a KVStore —
As described in the previous sections, to use Big Data SQL to query the
data in the KVStore, you must create and map Oracle RDBMS external tables to each Hive
table you created for the corresponding KVStore table. To do this, login to the
Oracle RDBMS as the new user and execute the commands and queries shown below at the
sqlplus command prompt:
> sqlplus NOSQL_EXAMPLE_USER/welcome1@orcl
SQL> CREATE TABLE vehicleTable (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20),
class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2),
count NUMBER)
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
SQL> CREATE TABLE vehicleTablePasswd (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20),
class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2),
count NUMBER)
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
SQL> CREATE TABLE vehicleTableWallet (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20),
class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2),
count NUMBER)
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
SQL> CREATE TABLE rmvTable (ZIPCODE VARCHAR2(7), LASTNAME VARCHAR2(20), FIRSTNAME VARCHAR2(20),
SSN NUMBER, GENDER VARCHAR2(6), LICENSE VARCHAR2(9), PHONEINFO VARCHAR2(67), ADDRESS VARCHAR2(100),
VEHICLEINFO VARCHAR2(1000))
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
SQL> CREATE TABLE rmvTablePasswd (ZIPCODE VARCHAR2(7), LASTNAME VARCHAR2(20), FIRSTNAME VARCHAR2(20),
SSN NUMBER, GENDER VARCHAR2(6), LICENSE VARCHAR2(9), PHONEINFO VARCHAR2(67), ADDRESS VARCHAR2(100),
VEHICLEINFO VARCHAR2(1000))
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
SQL> CREATE TABLE rmvTableWallet (ZIPCODE VARCHAR2(7), LASTNAME VARCHAR2(20), FIRSTNAME VARCHAR2(20),
SSN NUMBER, GENDER VARCHAR2(6), LICENSE VARCHAR2(9), PHONEINFO VARCHAR2(67), ADDRESS VARCHAR2(100),
VEHICLEINFO VARCHAR2(1000))
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
Note that, except for the name specified for the table, the commands related to
vehicleTable are identical. Similarly, the commands related to
rmvTable are also identical except for the table name. This
is how an Oracle external table is mapped to the desired Hive table. If you want
to create an Oracle external table with a name that is different than the Hive table
to which it is mapped, then you can specify the com.oracle.bigdata.tablename
property in the ACCESS PARAMETERS; for example,
SQL> CREATE TABLE bigdataSQL_vehicleTableWallet (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20),
class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2),
count NUMBER)
ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS (com.oracle.bigdata.tablename=vehicleTableWallet
com.oracle.bigdata.debug=true
com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;
Note that if you do not employ the com.oracle.bigdata.tablename
property, then the name you specify for the Oracle external table
must be identical to the name of the corresponding
Hive external table; if the two names differ and that property is
not specified, an error will occur.
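As an optional check before querying, you can confirm that all of the external tables were created by querying the USER_TABLES data dictionary view as the example user:
SQL> SELECT table_name FROM user_tables ORDER BY table_name;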
After creating the Oracle external tables above, you can execute SQL queries against the corresponding KVStore table data. For example, to query the data contained in the KVStore's vehicleTable, you can execute queries such as:
SQL> set linesize 200;
SQL> select * from vehicleTable;
SQL> select count(*) from vehicleTable;
SQL> select min(price) from vehicleTable;
SQL> select * from vehicleTable where make='GM';
SQL> select * from vehicleTable where model='Equinox';
SQL> select * from vehicleTable where model='Camaro';
SQL> select * from vehicleTable where model='Silverado1500';
SQL> select * from vehicleTablePasswd;
SQL> select count(*) from vehicleTablePasswd;
SQL> select min(price) from vehicleTablePasswd;
SQL> select * from vehicleTablePasswd where make='GM';
SQL> select * from vehicleTablePasswd where model='Equinox';
SQL> select * from vehicleTablePasswd where model='Camaro';
SQL> select * from vehicleTablePasswd where model='Silverado1500';
SQL> select * from vehicleTableWallet;
SQL> select min(price) from vehicleTableWallet;
SQL> select * from vehicleTableWallet where make='GM';
SQL> select * from vehicleTableWallet where model='Equinox';
SQL> select * from vehicleTableWallet where model='Camaro';
SQL> select * from vehicleTableWallet where model='Silverado1500';
Similarly, to query the data contained in the KVStore's rmvTable, you can execute simple queries such as these:
SQL> set linesize 200;
SQL> select * from rmvTable;
SQL> select * from rmvTablePasswd;
SQL> select * from rmvTableWallet;
SQL> select lastname,firstname,gender,address from rmvTable;
SQL> select lastname,firstname,gender,address from rmvTablePasswd;
SQL> select lastname,firstname,gender,address from rmvTableWallet;
SQL> select min(ssn) from rmvTable;
SQL> select min(ssn) from rmvTablePasswd;
SQL> select min(ssn) from rmvTableWallet;
SQL> select count(*) from rmvTable;
SQL> select count(*) from rmvTablePasswd;
SQL> select count(*) from rmvTableWallet;
SQL> select firstname,lastname,phoneinfo from rmvTable;
SQL> select firstname,lastname,phoneinfo from rmvTablePasswd;
SQL> select firstname,lastname,phoneinfo from rmvTableWallet;
SQL> select vehicleinfo from rmvTable;
SQL> select vehicleinfo from rmvTablePasswd;
SQL> select vehicleinfo from rmvTableWallet;
To achieve the more complicated query functionality demonstrated by the corresponding Hive queries that dereferenced individual field values of the non-primitive data types, you can employ either
Oracle Regular Expression Functions
such as REGEXP_LIKE and REGEXP_SUBSTR, or
Oracle SQL JSON Operators
such as JSON_QUERY and JSON_EXISTS (or a combination).
For example, consider the following queries that use
Oracle Regular Expressions:
SQL> select firstname,lastname,address,vehicleinfo from rmvTable where REGEXP_LIKE (lastname, '^[H].*');
SQL> select firstname,lastname,address,vehicleinfo from rmvTablePasswd where REGEXP_LIKE (lastname, '^[H].*');
SQL> select firstname,lastname,address,vehicleinfo from rmvTableWallet where REGEXP_LIKE (lastname, '^[H].*');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM"
from rmvTable where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM"
from rmvTablePasswd where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM"
from rmvTableWallet where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox"
from rmvTable where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox"
from rmvTablePasswd where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox"
from rmvTableWallet where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro"
from rmvTable where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro"
from rmvTablePasswd where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro"
from rmvTableWallet where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado"
from rmvTable where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado"
from rmvTablePasswd where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado"
from rmvTableWallet where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid"
from rmvTable where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid"
from rmvTablePasswd where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
SQL> select firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid"
from rmvTableWallet where REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
Finally, consider the following example queries that employ
Oracle SQL JSON Operators:
SQL> select FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model from rmvTable j;
SQL> select FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model from rmvTablePasswd j;
SQL> select FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model from rmvTableWallet j;
SQL> select JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) from rmvTable;
SQL> select JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) from rmvTablePasswd;
SQL> select JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) from rmvTableWallet;
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM"
from rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM"
from rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM"
from rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox"
from rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox"
from rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox"
from rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro"
from rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro"
from rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro"
from rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado"
from rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado"
from rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado"
from rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(
VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model",
JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value",
JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed",
JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid"
from rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(
VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model",
JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value",
JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed",
JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid"
from rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> select FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(
VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model",
JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value",
JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed",
JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid"
from rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
SQL> select JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
from rmvTable WHERE JSON_EXISTS(ADDRESS, '$.street');
SQL> select JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
from rmvTablePasswd WHERE JSON_EXISTS(ADDRESS, '$.street');
SQL> select JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
from rmvTableWallet WHERE JSON_EXISTS(ADDRESS, '$.street');
SQL> select JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
from rmvTable WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
SQL> select JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
from rmvTablePasswd WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
SQL> select JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
from rmvTableWallet WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
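Note that, depending on the version of the Oracle RDBMS, the JSON_VALUE operator — which returns a scalar value rather than a JSON fragment — may simplify some of the queries above; for example (illustrative):
SQL> select JSON_VALUE(VEHICLEINFO, '$[0].model') AS "Primary Model" from rmvTable;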
— Appendix A: Creating and Populating 'rmvTable' with Example Data —
To create and populate rmvTable, first deploy a non-secure KVStore using the steps presented in
Appendix A
of the Hadoop/Table API Example, or start a KVStore configured for security
using the steps presented in
Appendix B
of that document. Once a KVStore has been deployed, the standalone Java program
LoadRmvTable can then be run against either type of store to create
a table with the name and schema required by the various example queries, and
then populate the table with rows of data consistent with that schema. Once
the table is created and populated with example data, queries such as those
presented in Example 1,
Example 2, or the
section on Big Data SQL
can be executed against the data in that table.
To run the example queries presented in
Example 2
of this document, a table named rmvTable — having the schema
shown in the table below — must be created in the KVStore that was
deployed for that example; where the data types specified in the schema are
defined by the Oracle NoSQL Database Table API (see
oracle.kv.table.FieldDef.Type).
| Field Name | Field Type |
|---|---|
| zipcode | STRING |
| lastname | STRING |
| firstname | STRING |
| ssn | LONG |
| gender | ENUM |
| license | FIXED_BINARY(9) |
| phoneinfo | MAP(STRING) |
| address | RECORD(number : INTEGER, street : STRING, unit : INTEGER, city : STRING, state : STRING, zip : INTEGER) |
| vehicleinfo | ARRAY(RECORD(type : STRING, make : STRING, model : STRING, class : STRING, color : STRING, value : FLOAT, tax : DOUBLE, paid : BOOLEAN)) |
| Primary Key Field Names | zipcode, lastname, firstname, ssn |
| Shard Key Field Names | zipcode |
Thus, rmvTable will consist of rows of data the Registry of Motor Vehicles might maintain about vehicle owners who have registered a primary vehicle and (optionally) a second and maybe a third vehicle. In addition to personal information about each owner — such as name, address, gender, and phone number(s) — each row of data also contains an array in which each element is a record holding information about each vehicle the owner registers. For example, in addition to vehicle attributes such as the make, model, and color, the record also contains the vehicle's assessed value, registration fee (the tax), and whether or not the owner has paid the fee.
Although the table schema presented above may seem a bit contrived, it is intended to demonstrate a broad spectrum of data types from the Oracle NoSQL Database Table API.
Although you can enter individual commands in the admin CLI to create a table with the above
schema, an alternative mechanism is to follow the instructions presented in the next sections
to compile and execute the LoadRmvTable program, which uses the
Oracle NoSQL Database Data Definition Language (DDL) to create the desired table
and the Table API to populate it.
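For reference, the table-creation statement executed by LoadRmvTable would look something like the following DDL sketch. The field names and types follow the schema shown above; details such as the ENUM value list are assumptions, and the authoritative statement is in the program source:
CREATE TABLE IF NOT EXISTS rmvTable (
    zipcode STRING,
    lastname STRING,
    firstname STRING,
    ssn LONG,
    gender ENUM(male, female),
    license BINARY(9),
    phoneinfo MAP(STRING),
    address RECORD(number INTEGER, street STRING, unit INTEGER,
                   city STRING, state STRING, zip INTEGER),
    vehicleinfo ARRAY(RECORD(type STRING, make STRING, model STRING,
                             class STRING, color STRING, value FLOAT,
                             tax DOUBLE, paid BOOLEAN)),
    PRIMARY KEY (SHARD(zipcode), lastname, firstname, ssn)
)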
— Compiling the LoadRmvTable Program —
After the KVStore
— either
non-secure
or
secure
—
has been deployed, you must compile the LoadRmvTable program before
executing it. To do this, type the following command from the OS command line:
> cd /opt/ondb/kv
> javac -classpath lib/kvstore.jar:examples examples/hadoop/hive/table/LoadRmvTable.java
which should produce the file:
/opt/ondb/kv/examples/hadoop/hive/table/LoadRmvTable.class
— Creating and Populating 'rmvTable' with Example Data in a Non-Secure KVStore —
To execute LoadRmvTable to create and then populate the
table named rmvTable with example data in a KVStore configured
for non-secure access, type the following at the command line of a
node that has network connectivity with a node running the admin service
(for example, kv-host-1 itself):
> cd /opt/ondb/kv
> java -classpath lib/kvstore.jar:examples hadoop.hive.table.LoadRmvTable \
-store example-store -host kv-host-1 -port 5000 -nops 79 [-delete]
where the parameters -store, -host, -port,
and -nops are required.
In the example command line above, the argument -nops 79 specifies
that 79 rows be written to the rmvTable. If you want more or fewer
rows, change the value of the -nops parameter accordingly.
If LoadRmvTable is executed a second time and the
optional -delete parameter is specified, then all rows added by
any previous executions of LoadRmvTable are deleted from the
table prior to adding the new rows. Otherwise, all pre-existing rows are left
in place, and the number of rows in the table will be increased by the specified
-nops number of new rows.
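Once the load completes, a simple way to confirm the expected row count — assuming the Hive external table from Example 2 has already been created — is to run:
hive> SELECT count(*) FROM rmvTable;
which, for the command line shown above, should report 79 rows.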
— Creating and Populating 'rmvTable' with Example Data in a Secure KVStore —
To execute LoadRmvTable against a secure KVStore deployed
and provisioned with a non-administrative user employing the steps presented in
Appendix B
of the Hadoop/Table API Example, an additional parameter must be added to
the command line above. That is, type the following:
> cd /opt/ondb/kv
> javac -classpath lib/kvstore.jar:examples examples/hadoop/hive/table/LoadRmvTable.java
> cp /opt/ondb/example-store/security/client.trust /tmp
> java -classpath lib/kvstore.jar:examples hadoop.hive.table.LoadRmvTable \
-store example-store -host kv-host-1 -port 5000 -nops 79 \
-security /tmp/example-user-client-pwdfile.login
[-delete]
where the role of the single additional -security parameter is identical
to the role of that parameter when executing the LoadVehicleTable program;
as explained in
Appendix B
of the Hadoop/Table API Example.
At this point, the rmvTable created in the specified KVStore (non-secure or secure) should be populated with the desired example data; which can then be queried via Hive HQL or Big Data SQL.
— Appendix B: Configuring Hive to Query Data in a Non-Secure KVStore —
To make the Oracle NoSQL Database
Table API Hive Integration classes
available to the Hive client's Java VM, as well as the Java VMs of the
DataNodes of the Hadoop Cluster, kvclient.jar must be added
to the classpaths of those respective VMs. To do this, kvclient.jar
must be added to the Hive client's HIVE_AUX_JARS_PATH environment
variable. One way to achieve this is to simply export a new value for
HIVE_AUX_JARS_PATH from the Hive client's command line; that is,
> export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar
The other, more permanent (and preferable), way to achieve this is to modify the
hive-env.sh script located in the HIVE_CONF_DIR
of the Hive client. For example,
> cd $HIVE_CONF_DIR
> edit hive-env.sh
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar
else
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar
fi
It is important to note that in both cases above, the separator that is used
is a comma, not a colon.
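A quick way to confirm both the separator and the final value — assuming the export above has been applied in the current shell — is:
> echo $HIVE_AUX_JARS_PATH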
— Appendix C: Configuring Hive to Query Data in a Secure KVStore —
To run Hive queries against a secure KVStore, you must generate and install
security artifacts like those described in
Appendix C
of the Hadoop/Table API Example; which describes the necessary artifacts in
detail — their purpose, as well as how to generate and install them.
— Generating the Login, Trust, and Password Artifacts —
To execute a Hive query against a secure KVStore, the necessary public and private
credentials must be incorporated in the definition of the Hive table that will be
queried. To do this, in a fashion similar to that presented in
Appendix C
of the Hadoop/Table API Example, you must create artifacts like those shown
below, and store them on the Hive client's local file system; for example,
/tmp
    client.trust
    hive-nosql.login
    example-user.passwd
Or, if wallet storage will be employed:
/tmp
    client.trust
    hive-nosql.login
    example-user-wallet.dir/
        cwallet.sso
where hive-nosql.login is analogous (actually, identical)
to example-user-server.login in
Appendix C.
— Generating the Server Side JAR File —
After creating the files above, the following server side JAR file
should be generated so that it can be added to the HIVE_AUX_JARS_PATH
environment variable (see below). That is, type the following at the command line:
> cd /tmp
> jar cvf /opt/ondb/kv/examples/hive-nosql-server.jar client.trust
> jar uvf /opt/ondb/kv/examples/hive-nosql-server.jar hive-nosql.login
which produces the JAR file named
hive-nosql-server.jar, with contents
that include only public credentials and which look something like:
0 Mon May 04 13:01:04 PDT 2015 META-INF/
68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
508 Wed Apr 22 12:23:32 PDT 2015 client.trust
255 Mon May 04 11:30:54 PDT 2015 hive-nosql.login
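You can reproduce this listing at any time with the standard jar tool:
> jar tvf /opt/ondb/kv/examples/hive-nosql-server.jar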
— Setting the HIVE_AUX_JARS_PATH Environment Variable —
Recall that Appendix B
of this document described how to add kvclient.jar to the Hive client's
HIVE_AUX_JARS_PATH environment variable in order to run a Hive query
against a non-secure KVStore. In order to run a Hive query against a
secure KVStore, in addition to adding kvclient.jar, a number
of other JAR files must also be added to HIVE_AUX_JARS_PATH; depending
on how the password is stored.
If the password is stored in a password file, then in addition to kvclient.jar,
the hive-nosql-server.jar file must also be added to HIVE_AUX_JARS_PATH;
that is, do one of the following:
> export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/hive-nosql-server.jar
or
> cd $HIVE_CONF_DIR
> edit hive-env.sh
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/hive-nosql-server.jar
else
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/examples/hive-nosql-server.jar
fi
On the other hand, if the password is stored in an Oracle Wallet, then you must
also add three wallet-related JAR files that are provided only with the Enterprise
Edition of Oracle NoSQL Database. That is, do one of the following:
> export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH, \
/opt/ondb/kv/lib/kvclient.jar, \
/opt/ondb/kv/lib/oraclepki.jar, \
/opt/ondb/kv/lib/osdt_cert.jar, \
/opt/ondb/kv/lib/osdt_core.jar, \
/opt/ondb/kv/examples/hive-nosql-server.jar
or
> cd $HIVE_CONF_DIR
> edit hive-env.sh
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar, \
/opt/ondb/kv/lib/oraclepki.jar, \
/opt/ondb/kv/lib/osdt_cert.jar, \
/opt/ondb/kv/lib/osdt_core.jar, \
/opt/ondb/kv/examples/hive-nosql-server.jar
else
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH, \
/opt/ondb/kv/lib/kvclient.jar, \
/opt/ondb/kv/lib/oraclepki.jar, \
/opt/ondb/kv/lib/osdt_cert.jar, \
/opt/ondb/kv/lib/osdt_core.jar, \
/opt/ondb/kv/examples/hive-nosql-server.jar
fi
Once Hive has been configured for Oracle NoSQL Database security in the manner
just presented, you can then incorporate the necessary artifacts in your Hive
external table definition in the fashion presented previously in this document;
for vehicleTable in
Example 1,
and for rmvTable in
Example 2.
Note that specifying the security related artifacts in the manner presented
in those examples is akin to specifying that information on the command line
(rather than via classpath resources) when initiating a MapReduce job.
Copyright (c) 2011, 2015 Oracle and/or its affiliates. All rights reserved.