hadoop.hive.table (Oracle NoSQL Database Examples)

Class Summary
Class Description

LoadRmvTable
Class that creates an example table in a given NoSQL Database store and then uses the Table API to populate the table with sample records.

Class Summary
Class	Description
LoadRmvTable	Class that creates an example table in a given NoSQL Database store and then uses the Table API to populate the table with sample records.

Package hadoop.hive.table Description

The Table API Hive Cookbook: documentation that describes how to run example Apache Hive or Oracle Big Data SQL queries against data written via the Oracle NoSQL Database Table API.

Introduction

With the introduction of the Oracle NoSQL Database Hadoop integration classes, which support running Hadoop MapReduce jobs against data stored in an Oracle NoSQL Database table, it was natural to also provide new interfaces and classes which support running Hive queries against such table data (since a typical Hive query generally results in the execution of a MapReduce job). In addition to describing the core interfaces and classes involved in running a Hive query against data from a table located in a given Oracle NoSQL Database store (a KVStore), the information presented below also walks through the steps to take to execute a given set of basic Hive queries against example table data contained in such a store; either secure or non-secure.

Prerequisites

Before attempting to execute the example that demonstrates the concepts presented in this document, you should first satisfy the following prerequisites:

Become familiar with both Apache Hive and its programming model; that is, become familiar with how to write and execute a Hive query.
Become familiar with Apache Hadoop; specifically, become familiar with how Hive and Hadoop interact.
Deploy a Hadoop cluster with 3 DataNodes running on machines with host names, dn-host-1, dn-host-2, and dn-host-3.
Install and configure Hive, with network access to the Hadoop cluster that was deployed; specifying either Derby (the default) or MySQL to store the Hive metadata.
Become familiar with the Hive Command Line Interface (the Hive CLI), and the Hive Query Language.
Become familiar with Oracle NoSQL Database and then install, start, and configure an Oracle NoSQL Database — with <KVHOME> equal to /opt/ondb/kv — that is network reachable from the nodes of the Hadoop cluster.
Deploy a KVStore — named example-store, with <KVROOT> equal to /opt/ondb/example-store — to 3 machines (real or virtual) with host names, kv-host-1, kv-host-2, and kv-host-3; where an admin service, listening on port 5000, is deployed on each host.
Become familiar with the Oracle NoSQL Database Security model and be able to configure the deployed store for secure access (optional).
If the deployed store is configured for secure access, become familiar with the Oracle NoSQL Database Administrative CLI, start it, and then follow the steps presented in Appendix B of the Hadoop/Table API Example to securely connect to the store's admin service and create a KVStore user named example-user; along with the appropriate security artifacts (login file, trust file, and either password file or Oracle Wallet [Enterprise Edition only]).
Obtain and install the separate distribution containing the Oracle NoSQL Database example code. Although you are free to install that package in any location on your system, to make the command lines presented here more readable, this document assumes the example code is installed under the directory /opt/apps.
Become familiar with the supporting Java classes from the Hadoop/Table API Example, and then follow the steps presented in that example to create and populate a table named vehicleTable with example data consisting of only primitive data types.
Become familiar with the LoadRmvTable program provided with this example, and then follow the steps presented in the sections of Appendix A of this document to create and populate a table named rmvTable with example data consisting of both primitive and non-primitive data types.

Using specific values for items such as the <KVHOME> and <KVROOT> environment variables, as well as the store name, host names, admin port, and example code location described above should allow you to more easily understand and use the example commands presented below. Combined with the information contained in the Oracle NoSQL Database Getting Started Guide, along with the Oracle NoSQL Database Admin Guide, the Oracle NoSQL Database Security Guide and the Hadoop/Table API Example, you should then be able to generalize and extend these examples to your own particular development scenario; substituting the values specific to the given environment where necessary.

Note that detailed instructions for deploying a non-secure KVStore are provided in Appendix A of the Hadoop/Table API Example. Similarly, Appendix B of the Hadoop/Table API Example provides instructions for deploying a KVStore configured for security.

A Brief Hive Primer

Paraphrasing wikipedia, Apache Hive is a data warehouse infrastructure built on top of Apache Hadoop that facilitates querying datasets residing in distributed file systems or data stores such as Hadoop HDFS or Amazon S3. Additionally, Hive also provides a pluggable programming model that allows you to specify custom interfaces and classes that support querying data residing in data sources other than HDFS and S3; in particular, data written to an Oracle NoSQL Database table. To access and analyze data stored in these data sources, Hive provides a mechanism to project structure onto the data and query the data using a SQL-like language called the Hive Query Language, or HQL. Depending on the complexity of a given Hive query, the Hive infrastructure may construct and deploy a set of MapReduce jobs to retrieve and process the data or, when possible, it may simply satisfy the query via the metadata stored in the Hive metastore (Derby or MySQL).

In addition to the Hive infrastructure itself, Hive also provides a convenient client-side command line interface (the Hive CLI); which allows you to interact with the Hive infrastructure to create a Hive external table and then map it to the data located in a source like those described above.

As indicated above, a set of interfaces and classes that satisfy the Hive programming model are provided by Oracle NoSQL Database which support running Hive queries against table data contained in a KVStore — either secure or non-secure. These classes are located in the oracle.kv.hadoop.hive.table package, and consist of the following Hive and Hadoop types:

A subclass of the Hive class, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler, which is the mechanism (pluggable interface) to use to specify the location of the data the Hive infrastructure should process, as well as how to process that data; and which consists of the following components:
- A subclass of the Hadoop MapReduce version 1 class, org.apache.hadoop.mapred.InputFormat, which specifies how the associated MapReduce job reads its input data (from an Oracle NoSQL Database table).
- A subclass of the Hadoop MapReduce version 1 class, org.apache.hadoop.mapred.OutputFormat, which specifies how the associated MapReduce job writes its output.
- A subclass of the Hive class, org.apache.hadoop.hive.serde2.SerDe, which is used to deserialize the table data that is retrieved and sent to the Hive infrastructure and/or Hadoop MapReduce job for processing, and to serialize data input to Hive that will be written to the table (currently not supported).
- Metadata hooks for keeping an external catalog in sync with the Hive metastore.
- Rules for setting up the configuration properties on MapReduce jobs run against the data being processed.
- An implementation of the org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler interface; which, as described in Appendix D, supports the decomposition of a query's WHERE clause (the predicates) into information that can be passed to the database so that some/all search processing can be performed in the database itself rather than on the client side.
A subclass of the Hadoop version 1 class, org.apache.hadoop.mapred.RecordReader, which specifies how the mapped keys and values are located and retrieved during any MapReduce processing performed as part of executing a Hive query.
A subclass of the Hadoop version 1 class, org.apache.hadoop.mapred.InputSplit, which represents the data to be processed by an individual Mapper that operates during the MapReduce processing performed as part of executing a Hive query.

As described in the sections below, it is through the specific implementation of the HiveStorageHandler provided in the Oracle NoSQL Database distribution that the Hive infrastructure obtains access to a given KVStore and the table data on which to run the desired Hive query.

The Oracle NoSQL Database Table API Hive Integration Classes

To support running Hive queries against data stored in a table of an Oracle NoSQL Database KVStore, the following core classes are employed:

oracle.kv.hadoop.hive.table.TableStorageHandler
oracle.kv.hadoop.hive.table.TableHiveInputFormat
oracle.kv.hadoop.hive.table.TableHiveInputSplit
oracle.kv.hadoop.hive.table.TableHiveRecordReader
oracle.kv.hadoop.hive.table.TableSerDe
Implementations — specific to Oracle NoSQL Database — of the Hive org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector interface that support deserialization of the primitive and non-primitive data types defined by the Oracle NoSQL Database Table API (see below).

For more detail about the semantics of the classes listed above, refer to the javadoc of each respective class.

Mapping the Hive Data Model to the Oracle NoSQL Database Table API Data Model

As the examples below demonstrate, in order to execute a Hive query against data stored in an Oracle NoSQL Database table, a Hive external table must be created with a schema mapped from the schema of the desired Oracle NoSQL Database table. This is accomplished by applying the mapping described here, in which the following implementations of the Hive ObjectInspector interface are used in the deserialization process to convert the associated data type defined by the Oracle NoSQL Database Table API to its corresponding type in the Hive data model:

Note that Hive's data model specifies many more types than the types defined by the Oracle NoSQL Database Table API. As a result, the data model defined by the Oracle NoSQL Database Table API (see oracle.kv.table.FieldDef.Type) is mapped to only a subset of the types defined by Hive's data model. Specifically, when creating a Hive external table so that you can query the data in a given Oracle NoSQL Database table, the Hive table must be created with a schema consistent with the mappings shown in the following table:

Oracle NoSQL Database-to-Hive Data Type Mappings
Oracle NoSQL Database Table API	Hive
FieldDef.Type.STRING	STRING
	VARCHAR
	CHAR
FieldDef.Type.BOOLEAN	BOOLEAN
FieldDef.Type.BINARY	BINARY
FieldDef.Type.FIXED_BINARY	BINARY
	TINYINT
	SMALLINT
FieldDef.Type.INTEGER	INT
FieldDef.Type.LONG	BIGINT
FieldDef.Type.FLOAT	FLOAT
	DECIMAL
FieldDef.Type.DOUBLE	DOUBLE
FieldDef.Type.ENUM	STRING
	TIMESTAMP
	DATE
FieldDef.Type.ARRAY	ARRAY
FieldDef.Type.MAP	MAP<STRING, data_type>
FieldDef.Type.RECORD	STRUCT<col_name : data_type, ...>
	UNIONTYPE<data_type, data_type, ...>

It is important to understand that when using Hive to query data in an Oracle NoSQL Database table, the schema of the Hive external table you create is dependent on the schema of the corresponding Oracle NoSQL Database table you wish to query. Thus, if you create a Hive external table with a schema that includes a Hive data type that is not mapped from an Oracle NoSQL Database FieldDef.Type, then an error will occur when any attempt is made to query the table.

— YARN versus MapReduce version 1 —

Currently, Hadoop deployments include two versions of MapReduce. The first version (referred to as MRv1) is the original version of MapReduce; and consists of interfaces and classes from the Java package org.apache.hadoop.mapred. The newer version of MapReduce is referred to as YARN (Yet Another Resource Negotiator) or, more generally, MRv2; and resides in the package org.apache.hadoop.mapreduce. Unfortunately, the Table API Hive integration classes must address the existence of both versions of MapReduce, for the following reasons:

Hive currently employs MRv1, whereas the Oracle NoSQL Database Table API Hadoop integration classes employ MRv2.
MRv1 and MRv2 are source incompatible.

As a result, the core classes listed above must be subclasses of the previously described MRv1 classes. For example, rather than subclassing the MRv2 based Table API Hadoop integration class TableInputFormat (which would be preferred), because of the incompatability between MRv1 and MRv2, the Table API Hive integration class described above (TableHiveInputFormat) actually subclasses the Hadoop MRv1 class org.apache.hadoop.mapred.InputFormat. Thus, to exploit and reuse the mechanisms provided by the Table API Hadoop integration classes, the Table API Hive integration classes presented here internally create, manage, and delegate to subclasses of the appropriate MRv2 based classes.

It is also important to note that because the Table API Hadoop integration classes do not currently support writing data from a MapReduce job into a KVStore, the Table API Hive integration classes do not support queries that modify the contents of a table in a KVStore.

Example 1: Executing Hive Queries Against Primitive Data Types

This first example demonstrates how to execute various Hive queries against a simple Oracle NoSQL Database table; defined with a schema consisting of primitive data types only. To run the queries described in this example, after satisfying the prerequisites listed above, if not done so already, you should deploy either a non-secure or a secure KVStore (or both), and then create and populate a table named vehicleTable in each such store. If any of the stores you deploy are configured for secure access, then you must also follow the steps provided in Appendix C of this document to configure Hive with the artifacts and environment necessary to interact with a secure KVStore. Once these initial steps are performed, you can then create a Hive external table mapped to the KVStore vehicleTable, and execute the set of example Hive queries (described below) against the data stored in that table.

Create a Hive External Table Mapped to the Oracle NoSQL Database 'vehicleTable'

After satisfying the above prerequisites related to the Hive programming model and its CLI, you can create a Hive external table mapped to the Oracle NoSQL Database 'vehicleTable' located in the desired KVStore, and then query the data in that table. To do this, first execute the Hive interactive CLI by typing the following at the command line of the Hive client node:

  > hive

which should then present the following Hive CLI command prompt:

  hive>

At this point, the desired Hive commands (table creation and/or queries) can be executed.

— Creating a Hive External Table Mapped to 'vehicleTable' in a Non-Secure KVStore —

If you will be running your Hive query against a non-secure KVStore deployed in the manner described in Appendix A of the Hadoop/Table API Example, after adding kvclient.jar and threetenbp.jar to the HIVE_AUX_JARS_PATH environment variable of the Hive client (as descibed in Appendix B of this document), you can create the external Hive table required by this example by typing the following at the Hive CLI command prompt:

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          vehicleTable (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT) 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "vehicleTable",
                         "oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3");

The command above applies the required data model mapping to create a Hive table named vehicleTable with columns whose types are consistent with the corresponding fields of the Oracle NoSQL Database table created and populated in the Hadoop/Table API Example; which was also named vehicleTable. Additionally, you should note the following points:

The contents of the example command above are displayed on separate lines in this document for readability, but in practice, because the Hive command interpreter may have trouble handling such multi-line commands, it is generally best to enter a single, continuous command with no line breaks.

The Hive table name is not required to be the same as the name of the table created in Oracle NoSQL Database; they are the same in the above example only for convenience.

If the KVStore is configured with multiple administrative hosts, then any subset of the names of those hosts can be included in the value of the "oracle.kv.hosts" property; as long as at least one valid administrative host (and port) is included.

With respect to the "oracle.kv.hadoop.hosts" property in the example command above:

This property is currently optional for all systems except the Oracle Big Data SQL system.
If this property is specified on a system that does not require it, it will have no effect.
If it is specified on a system that does require it, its value must contain the names of all of the DataNodes making up the Hadoop cluster.

— Creating a Hive External Table Mapped to 'vehicleTable' in a Secure KVStore —

If you will be running your Hive query against a secure KVStore deployed in the manner described in Appendix B of the Hadoop/Table API Example, and if your password storage employs a password file instead of an Oracle Wallet, then you can create the external Hive table required by this example by typing the following at the Hive CLI command prompt:

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          vehicleTablePasswd (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT) 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "vehicleTable",
                         "oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
                         "oracle.kv.security" = "/tmp/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/tmp/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.pwdfile.file" = "/tmp/example-user.passwd");

On the other hand, if your password storage employs an Oracle Wallet instead of a password file, then type the following at the command prompt to create the required Hive table,

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          vehicleTableWallet (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT) 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "vehicleTable",
                         "oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
                         "oracle.kv.security" = "/tmp/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/tmp/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.wallet.dir" = "/tmp/example-user-wallet.dir");

where the commands above each apply the required data model mapping to create a Hive table named vehicleTablePasswd and vehicleTableWallet respectively; each with the same structure, schema, and attributes as the vehicleTable created in the non-secure case. And where the same additional points noted for the non-secure case also apply in this case.

Note the additional properties specified in the TBLPROPERTIES directive; that is, oracle.kv.security, oracle.kv.ssl.trustStore, oracle.kv.auth.username, and oracle.kv.auth.pwdfile.file (or oracle.kv.auth.wallet.dir). Each of these additional properties corresponds to one of the artifacts required to access a secure KVStore; where the nature of each property and its corresponding value is described in detail in Appendix C of the Hadoop/Table API Example.

Execute Example Hive Queries Against the Oracle NoSQL Database 'vehicleTable'

After creating and mapping the Hive tables described above to the Oracle NoSQL Database 'vehicleTable', the data in that table can be queried in the manner shown below by querying each Hive table. Note that because three Hive tables were created (vehicleTable, vehicleTablePasswd, and vehicleTableWallet), where the only difference is whether the KVStore is non-secure or secure and, if secure, whether the password is stored in a password file or a wallet, for each type of query that is demonstrated, three instances of the given example query are shown; one for each of the three Hive tables. Additionally, note that for any given query, the output of the query will be similar whether it is executed against the Hive vehicleTable, vehicleTablePasswd, or vehicleTableWallet. Therefore, to save space, for each example query that is presented for the three Hive tables, sample output is shown only once.

Thus, from the Hive command prompt, type the following queries for each Hive table and observe that the output of each query you execute, if successful, looks something like the corresponding sample output shown.

— List the Contents of Each Row in 'vehicleTable' —

  hive> SELECT * FROM vehicleTable;
  hive> SELECT * FROM vehicleTablePasswd;
  hive> SELECT * FROM vehicleTableWallet;

  OK
  auto  Chrysler Imperial FrontWheelDrive white  20743.943359375 5
  auto  GM       Impala   4WheelDrive     black  20743.91015625  46
  auto  GM       Impala   FrontWheelDrive yellow 20743.5390625   28
  truck Ford     F250     AllWheelDrive   blue   31115.759765625 47
  ..........

Note that unlike the more complicated example queries below, the query above does not result in the execution of a MapReduce job. This is because there is enough metadata in the Hive metastore to satisfy the query.

— Count the Total Number of Rows in 'vehicleTable' —

  hive> SELECT count(type) FROM vehicleTable;
  hive> SELECT count(type) FROM vehicleTablePasswd;
  hive> SELECT count(type) FROM vehicleTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
  2014-12-12 12:04:18,403 Stage-1 map = 0%,  reduce = 0%
  2014-12-12 12:05:12,431 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 2.26 sec
  2014-12-12 12:05:13,816 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 6.7 sec
  2014-12-12 12:05:15,201 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 6.87 sec
  2014-12-12 12:05:16,594 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.16 sec
  2014-12-12 12:05:17,980 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.16 sec
  2014-12-12 12:05:19,364 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.16 sec
  2014-12-12 12:05:20,754 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 15.24 sec
  2014-12-12 12:05:22,140 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 15.24 sec
  ..........
  Job 0: Map: 6  Reduce: 1   Cumulative CPU: 15.24 sec   HDFS Read: 4532 HDFS Write: 3 SUCCESS
  Total MapReduce CPU Time Spent: 15 seconds 240 msec
  OK
  79
  Time taken: 89.359 seconds, Fetched: 1 row(s)

— For All Vehicles in 'vehicleTable' Find the Lowest Price —

  hive> SELECT min(price) FROM vehicleTable;
  hive> SELECT min(price) FROM vehicleTablePasswd;
  hive> SELECT min(price) FROM vehicleTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
  2014-12-12 12:11:10,924 Stage-1 map = 0%,  reduce = 0%
  2014-12-12 12:12:06,213 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 6.77 sec
  2014-12-12 12:12:07,606 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 6.77 sec
  2014-12-12 12:12:09,076 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.16 sec
  2014-12-12 12:12:10,464 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.3 sec
  2014-12-12 12:12:11,849 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.3 sec
  2014-12-12 12:12:13,238 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.3 sec
  2014-12-12 12:12:14,629 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 15.38 sec
  2014-12-12 12:12:16,031 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 15.38 sec
  ..........
  Job 0: Map: 6  Reduce: 1   Cumulative CPU: 15.38 sec   HDFS Read: 4532 HDFS Write: 16 SUCCESS
  Total MapReduce CPU Time Spent: 15 seconds 380 msec
  OK
  20743.244140625
  Time taken: 89.615 seconds, Fetched: 1 row(s)

— List All GM Vehicles in 'vehicleTable' —

  hive> SELECT * FROM vehicleTable WHERE make LIKE "%GM%"; 
  hive> SELECT * FROM vehicleTablePasswd WHERE make LIKE "%GM%"; 
  hive> SELECT * FROM vehicleTableWallet WHERE make LIKE "%GM%"; 

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2014-12-12 12:19:24,269 Stage-1 map = 0%,  reduce = 0%
  2014-12-12 12:20:18,239 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 2.43 sec
  2014-12-12 12:20:19,622 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 4.81 sec
  2014-12-12 12:20:21,006 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 7.14 sec
  2014-12-12 12:20:22,395 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 13.09 sec
  2014-12-12 12:20:23,777 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 16.06 sec
  2014-12-12 12:20:25,162 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 16.06 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 16.06 sec   HDFS Read: 4532 HDFS Write: 1491 SUCCESS
  Total MapReduce CPU Time Spent: 16 seconds 60 msec
  OK
  suv   GM Equinox  4WheelDrive      yellow 41486.78125     37
  truck GM Sierra   4WheelDrive      black  31115.224609375 87
  auto  GM Corvette FrontWheelDrive  yellow 20743.84375     7
  auto  GM Impala   4WheelDrive      black  20743.91015625  46
  ..........

— List All Silverado (1500 & 2500) Trucks in 'vehicleTable' —

  hive> SELECT * FROM vehicleTable WHERE model LIKE "%Silverado%"; 
  hive> SELECT * FROM vehicleTablePasswd WHERE model LIKE "%Silverado%"; 
  hive> SELECT * FROM vehicleTableWallet WHERE model LIKE "%Silverado%"; 

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2014-12-12 13:36:24,284 Stage-1 map = 0%,  reduce = 0%
  2014-12-12 13:37:19,528 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 9.35 sec
  2014-12-12 13:37:20,910 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 11.98 sec
  2014-12-12 13:37:22,296 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 15.31 sec
  2014-12-12 13:37:23,681 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.58 sec
  2014-12-12 13:37:25,069 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.58 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 15.58 sec   HDFS Read: 4532 HDFS Write: 496 SUCCESS
  Total MapReduce CPU Time Spent: 15 seconds 580 msec
  OK
  truck GM Silverado2500 4WheelDrive           blue          31115.548828125 68
  truck GM Silverado2500 4WheelDrive-4cylinder blue-on-green 31114.91015625  17
  truck GM Silverado2500 AllWheelDrive         white         31115.275390625 36
  truck GM Silverado2500 AllWheelDrive         yellow        31114.796875    73
  truck GM Silverado1500 4WheelDrive           black         31114.98046875  64
  truck GM Silverado1500 4WheelDrive           green         31115.078125    38
  truck GM Silverado1500 RearWheelDrive        blue          31115.640625    55
  truck GM Silverado1500 RearWheelDrive        white         31115.517578125 37
  ..........
  Time taken: 83.589 seconds, Fetched: 8 row(s)

Example 2: Executing Hive Queries Against Non-Primitive Data Types

This second example demonstrates how to execute various Hive queries against an Oracle NoSQL Database table defined with a more complex schema than that employed in Example 1. In this example, a schema is employed that consists of a variety of data types; both primitive and non-primitive. And in the same manner as Example 1, you must take the following initial steps to run the queries presented in this example:

Satisfy the necessary prerequisites.
Deploy either a non-secure or a secure KVStore (or both).
Follow the instructions presented in Appendix A of this document to create and populate a table named rmvTable in each store that you deploy.
If any of the stores you deploy are configured for secure access, then follow the steps provided in Appendix C of this document to configure Hive with the artifacts and environment necessary to interact with a secure KVStore.

After performing each of these initial steps, you can then create a Hive external table mapped to the KVStore rmvTable, and execute the set of example Hive queries (described below) against the data stored in that table.

Create a Hive External Table Mapped to the Oracle NoSQL Database 'rmvTable'

After performing the initial steps listed in the previous section, use the Hive CLI to create a Hive external table mapped to the Oracle NoSQL Database rmvTable located in each KVStore that was deployed. That is, type the following at the command line of the Hive client node:

  > hive

which should present the following Hive CLI command prompt:

  hive>

At this point, you can execute the Hive commands presented in the next sub-sections for the non-secure KVStore or the secure KVStore (or both).

— Creating a Hive External Table Mapped to 'rmvTable' in a Non-Secure KVStore —

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          rmvTable (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, 
                    SSN BIGINT, 
                    GENDER STRING, 
                    LICENSE BINARY, 
                    PHONEINFO MAP<STRING, STRING>, 
                    ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>, 
                    VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
          COMMENT 'Hive table rmvTable <---> KVStore table rmvTable' 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "rmvTable",
                         "oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3");

Noting the same additional points that were noted for each of the cases presented in Example 1, the command above applies the required Oracle NoSQL Database-to-Hive Data Model Mappings to create a Hive table named rmvTable with columns whose types are consistent with the corresponding fields of the Oracle NoSQL Database table — also named rmvTable — that you created and populated when you followed the steps outlined in Appendix A of this document.

Note that it is important to know the specific schema of the Oracle NoSQL Database table you wish to query before specifying the schema of that table's corresponding Hive external table. For example, observe that the field named gender in the schema of the Oracle NoSQL Database rmvTable presented in Appendix C is defined as type FieldDef.Type.ENUM, whereas the GENDER field specified in the Hive external table created above is defined as type STRING; as specified by the required data model mapping. Thus, the combination of the Oracle NoSQL Database-to-Hive Data Model Mapping with the given Oracle NoSQL Database table's specific schema allows you to determine whether a STRING field in a Hive table is mapped from an Oracle NoSQL Database FieldDef.Type.STRING or FieldDef.Type.ENUM.

— Creating a Hive External Table Mapped to 'rmvTable' in a Secure KVStore —

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          rmvTablePasswd (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, 
                    SSN BIGINT, 
                    GENDER STRING, 
                    LICENSE BINARY, 
                    PHONEINFO MAP<STRING, STRING>, 
                    ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>, 
                    VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
          COMMENT 'Hive table rmvTablePasswd <---> KVStore table rmvTable' 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "rmvTable",
                         "oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
                         "oracle.kv.security" = "/tmp/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/tmp/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.pwdfile.file" = "/tmp/example-user.passwd");

On the other hand, if your password storage employs an Oracle Wallet instead of a password file, then type the following at the command prompt to create the required Hive table,

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          rmvTableWallet (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, 
                    SSN BIGINT, 
                    GENDER STRING, 
                    LICENSE BINARY, 
                    PHONEINFO MAP<STRING, STRING>, 
                    ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>, 
                    VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
          COMMENT 'Hive table rmvTableWallet <---> KVStore table rmvTable' 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "rmvTable",
                         "oracle.kv.hadoop.hosts" = "dn-host-1,dn-host-2,dn-host-3",
                         "oracle.kv.security" = "/tmp/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/tmp/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.wallet.dir" = "/tmp/example-user-wallet.dir");

where the commands above each combine the required data model mapping with the schema of the rmvTable that you created to produce a Hive external table named rmvTablePasswd and rmvTableWallet respectively; each with the same structure, schema, and attributes as the Hive rmvTable produced in the non-secure case. And where the same additional points noted for the non-secure case also apply in this case.

Execute Example Hive Queries Against the Oracle NoSQL Database 'rmvTable'

After creating and mapping the Hive tables described above to the Oracle NoSQL Database rmvTable, the data in that table can be queried in the manner shown below by querying each Hive table. Note that because three Hive tables were created (rmvTable, rmvTablePasswd, and rmvTableWallet), where the only difference is whether the KVStore is non-secure or secure and, if secure, whether the password is stored in a password file or a wallet, for each type of query that is demonstrated, three instances of the given example query are shown; one for each of the three Hive tables. Additionally, note that for any given query, the output of the query will be similar whether it is executed against the Hive rmvTable, rmvTablePasswd, or rmvTableWallet. Therefore, to save space, for each example query that is presented for the three Hive tables, sample output is shown only once.

— List the Contents of Each Row in the 'rmvTable' —

  hive> SELECT * FROM rmvTable;
  hive> SELECT * FROM rmvTablePasswd;
  hive> SELECT * FROM rmvTableWallet;

  OK
  49027  GOMEZ CHRISTOPHER  509367447  male  S57428836  
  {"cell":"616-351-0185","home":"213-630-2419","work":"617-227-9840"}
  {"number":88072,"street":"Fifth Avenue","unit":6,"city":"Cambridge","state":"OK","zip":49027}
  [{"type":"auto","make":"Ford","model":"Taurus","class":"AllWheelDrive","color":"blue","value":20743.234,"tax":566.290283203125,"paid":false},
   {"type":"auto","make":"Ford","model":"Taurus","class":"FrontWheelDrive","color":"blue","value":20743.559,"tax":566.2991333007812,"paid":true}]
  40719  ROSARIO  ANNE  448406765  female  S04809975
  {"cell":"303-804-1660","home":"408-630-2412","work":"415-804-9515"}
  {"number":96581,"street":"Third Avenue","unit":7,"city":"Springfield","state":"RI","zip":40719}
  [{"type":"truck","make":"Chrysler","model":"Ram3500","class":"RearWheelDrive","color":"blue","value":31115.26,"tax":849.4465942382812,"paid":true},
   {"type":"truck","make":"Chrysler","model":"Ram1500","class":"AllWheelDrive","color":"blue","value":31114.873,"tax":849.43603515625,"paid":false},
   {"type":"auto","make":"Ford","model":"Edge","class":"RearWheelDrive","color":"yellow","value":20743.889,"tax":566.3081665039062,"paid":true}]
  ..........

— List the Name, Gender, and Address of Each Vehicle Owner in 'rmvTable' —

  hive> SELECT lastname,firstname,gender,address FROM rmvTable;
  hive> SELECT lastname,firstname,gender,address FROM rmvTablePasswd;
  hive> SELECT lastname,firstname,gender,address FROM rmvTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 05:39:04,940 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 05:39:43,752 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 8.38 sec
  2015-06-25 05:39:45,242 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 21.73 sec
  2015-06-25 05:39:46,727 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 26.72 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 26.72 sec   HDFS Read: 4760 HDFS Write: 4702 SUCCESS
  Total MapReduce CPU Time Spent: 26 seconds 720 msec
  OK
  SNIDER FRANK  male {"number":33512,"street":"Summer Street","unit":1,"city":"Arlington","state":"TN","zip":89150}
  MILLER ROCH  male {"number":25698,"street":"Mullberry Street","unit":6,"city":"Madison","state":"VA","zip":5740}
  TATE BENJAMIN  male {"number":2894,"street":"View Street","unit":-1,"city":"Clinton","state":"KY","zip":57466}
  ..........
  Time taken: 87.327 seconds, Fetched: 79 row(s)

— List the Name and Phone Number (Home, Cell, or Work) of Each Vehicle Owner in 'rmvTable' —

  hive> SELECT firstname,lastname,phoneinfo["home"] FROM rmvTable;
  hive> SELECT firstname,lastname,phoneinfo["cell"] FROM rmvTablePasswd;
  hive> SELECT firstname,lastname,phoneinfo["work"] FROM rmvTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 06:08:09,885 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 06:08:41,094 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.11 sec
  2015-06-25 06:08:42,582 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.11 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 18.11 sec   HDFS Read: 4724 HDFS Write: 2141 SUCCESS
  Total MapReduce CPU Time Spent: 18 seconds 110 msec
  OK
  CHRISTOPHER GOMEZ  213-630-2419
  ANNE ROSARIO  408-630-2412
  MEGAN PHELPS  978-541-5710
  MICHAEL BRADLEY  313-351-4580
  ..........
  Time taken: 69.173 seconds, Fetched: 79 row(s)

— Count the Total Number of Rows in 'rmvTable' —

  hive> SELECT count(vehicleinfo[0].type) FROM rmvTable;
  hive> SELECT count(vehicleinfo[0].type) FROM rmvTablePasswd;
  hive> SELECT count(vehicleinfo[0].type) FROM rmvTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
  2015-06-25 05:53:31,200 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 05:54:09,856 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 12.12 sec
  2015-06-25 05:54:11,346 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 24.45 sec
  2015-06-25 05:54:12,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 24.45 sec
  2015-06-25 05:54:14,336 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 25.51 sec
  2015-06-25 05:54:15,825 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 25.51 sec
  ..........
  Job 0: Map: 6  Reduce: 1   Cumulative CPU: 25.51 sec   HDFS Read: 4760 HDFS Write: 3 SUCCESS
  Total MapReduce CPU Time Spent: 25 seconds 510 msec
  OK
  79
  Time taken: 82.824 seconds, Fetched: 1 row(s)

— For Each Owner's Primary Vehicle in 'rmvTable', Find the Minimum Assessed Value —

  hive> SELECT min(vehicleinfo[0].value) FROM rmvTable;
  hive> SELECT min(vehicleinfo[0].value) FROM rmvTablePasswd;
  hive> SELECT min(vehicleinfo[0].value) FROM rmvTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 1
  2014-12-12 12:11:10,924 Stage-1 map = 0%,  reduce = 0%
  2014-12-12 12:12:06,213 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 6.77 sec
  2014-12-12 12:12:07,606 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 6.77 sec
  2014-12-12 12:12:09,076 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.16 sec
  2014-12-12 12:12:10,464 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.3 sec
  2014-12-12 12:12:11,849 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.3 sec
  2014-12-12 12:12:13,238 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.3 sec
  2014-12-12 12:12:14,629 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 15.38 sec
  2014-12-12 12:12:16,031 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 15.38 sec
  ..........
  Job 0: Map: 6  Reduce: 1   Cumulative CPU: 15.38 sec   HDFS Read: 4532 HDFS Write: 16 SUCCESS
  Total MapReduce CPU Time Spent: 15 seconds 380 msec
  OK
  20743.244140625
  Time taken: 89.615 seconds, Fetched: 1 row(s)

— List all Vehicle Information for the Primary (or Second or Third) Vehicle of Each Owner in 'rmvTable' —

  hive> SELECT vehicleinfo[0] FROM rmvTable;
  hive> SELECT vehicleinfo[1] FROM rmvTablePasswd;
  hive> SELECT vehicleinfo[2] FROM rmvTableWallet;

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 06:31:54,779 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 06:32:33,430 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 4.59 sec
  2015-06-25 06:32:34,919 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 27.33 sec
  2015-06-25 06:32:36,416 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.89 sec
  2015-06-25 06:32:37,903 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.89 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 27.89 sec   HDFS Read: 4760 HDFS Write: 5681 SUCCESS
  Total MapReduce CPU Time Spent: 27 seconds 890 msec
  OK
  {"type":"suv","make":"GM","model":"Tahoe","class":"4WheelDrive","color":"black","value":41487.242,"tax":1132.6016845703125,"paid":true}
  {"type":"auto","make":"Chrysler","model":"Imperial","class":"AllWheelDrive","color":"red","value":20743.926,"tax":566.3092041015625,"paid":true}
  {"type":"auto","make":"Ford","model":"Taurus","class":"RearWheelDrive","color":"blue","value":20744.076,"tax":566.3132934570312,"paid":true}
  {"type":"truck","make":"Ford","model":"F150","class":"AllWheelDrive","color":"green","value":31115.299,"tax":849.4476928710938,"paid":false}
  ..........
  Time taken: 82.056 seconds, Fetched: 79 row(s)

— List the Name, Address, and Primary Vehicle Information of All Owners in 'rmvTable' Whose Last Name Starts with 'H' —

  hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTable WHERE lastname RLIKE "^[H].*";
  hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTablePasswd WHERE lastname RLIKE "^[H].*";
  hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTableWallet WHERE lastname RLIKE "^[H].*";

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 06:41:35,578 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 06:42:14,310 Stage-1 map = 15%,  reduce = 0%
  2015-06-25 06:42:15,803 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 9.46 sec
  2015-06-25 06:42:17,290 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 23.29 sec
  2015-06-25 06:42:18,778 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.78 sec
  2015-06-25 06:42:20,271 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.78 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 27.78 sec   HDFS Read: 4760 HDFS Write: 1143 SUCCESS
  Total MapReduce CPU Time Spent: 27 seconds 780 msec
  OK
  CINDY HODGES {"number":56758,"street":"Vaughan Avenue","unit":-1,"city":"Madison","state":"NH","zip":79623}
  {"type":"truck","make":"Chrysler","model":"Ram1500","class":"RearWheelDrive","color":"black","value":31115.129,"tax":849.4430541992188,"paid":true}
  JULIA HOLDEN {"number":56209,"street":"Main Street","unit":1,"city":"Georgetown","state":"CA","zip":62154}
  {"type":"auto","make":"Ford","model":"Taurus","class":"FrontWheelDrive","color":"blue","value":20743.8,"tax":566.3057861328125,"paid":true}
  PHYLLIS HOGAN {"number":47358,"street":"Park Street","unit":6,"city":"Arlington","state":"AL","zip":77333}
  {"type":"suv","make":"Chrysler","model":"Journey","class":"RearWheelDrive","color":"yellow","value":41486.79,"tax":1132.58935546875,"paid":false}
  ..........
  Time taken: 81.171 seconds, Fetched: 9 row(s)

— List the Name, Address, and Vehicle Information of Each Owner in 'rmvTable' Whose Second Vehicle is a GM Vehicle —

  hive> SELECT firstname,lastname,address,vehicleinfo[1] FROM rmvTable WHERE vehicleinfo[1].make LIKE "%GM%";
  hive> SELECT firstname,lastname,address,vehicleinfo[1] FROM rmvTablePasswd WHERE vehicleinfo[1].make LIKE "%GM%";
  hive> SELECT firstname,lastname,address,vehicleinfo[1] FROM rmvTableWallet WHERE vehicleinfo[1].make LIKE "%GM%";

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 06:57:01,117 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 06:57:32,506 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 9.29 sec
  2015-06-25 06:57:34,000 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.8 sec
  2015-06-25 06:57:35,484 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.8 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 18.8 sec   HDFS Read: 4724 HDFS Write: 2087 SUCCESS
  Total MapReduce CPU Time Spent: 18 seconds 800 msec
  OK
  NANCY STOUT {"number":31126,"street":"Cedar Street","unit":8,"city":"Arlington","state":"MO","zip":731}
  {"type":"suv","make":"GM","model":"Equinox","class":"AllWheelDrive","color":"red","value":41486.43,"tax":1132.57958984375,"paid":true}
  DONALD COOK {"number":34287,"street":"Third Avenue","unit":1,"city":"Cambridge","state":"NV","zip":67995}
  {"type":"suv","make":"GM","model":"Tahoe","class":"AllWheelDrive","color":"black","value":41487.168,"tax":1132.5997314453125,"paid":false}
  RANDY MCDOWELL {"number":18391,"street":"Lane Avenue","unit":8,"city":"Concord","state":"NH","zip":42540}
  {"type":"auto","make":"GM","model":"Corvette","class":"FrontWheelDrive","color":"black","value":20744.035,"tax":566.3121337890625,"paid":false}
  ..........
  Time taken: 71.794 seconds, Fetched: 17 row(s)

— List the Name, Address, and Vehicle Information of Each Owner in 'rmvTable' Whose Primary Vehicle is a Silverado Truck —

  hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTable WHERE vehicleinfo[0].model LIKE "%Silverado%";
  hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTablePasswd WHERE vehicleinfo[0].model LIKE "%Silverado%";
  hive> SELECT firstname,lastname,address,vehicleinfo[0] FROM rmvTableWallet WHERE vehicleinfo[0].model LIKE "%Silverado%";

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 07:05:40,301 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 07:06:18,907 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 9.6 sec
  2015-06-25 07:06:20,406 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.65 sec
  2015-06-25 07:06:21,894 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 27.65 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 27.65 sec   HDFS Read: 4760 HDFS Write: 1185 SUCCESS
  Total MapReduce CPU Time Spent: 27 seconds 650 msec
  OK
  PHYLLIS MARTINEZ {"number":29978,"street":"Park Street","unit":7,"city":"Burlington","state":"NM","zip":28564}
  {"type":"truck","make":"GM","model":"Silverado2500","class":"4WheelDrive","color":"black","value":31115.295,"tax":849.4475708007812,"paid":false}
  SHANNON KELLEY {"number":98370,"street":"Maple Street","unit":9,"city":"Boston","state":"TX","zip":16166}
  {"type":"truck","make":"GM","model":"Silverado2500","class":"FrontWheelDrive","color":"yellow","value":31115.209,"tax":849.4451904296875,"paid":true}
  FRANCES HOLLOWAY {"number":2650,"street":"Second Avenue","unit":6,"city":"Arlington","state":"KY","zip":7182}
  {"type":"truck","make":"GM","model":"Silverado2500","class":"FrontWheelDrive","color":"yellow","value":31115.31,"tax":849.447998046875,"paid":true}
  ..........
  Time taken: 79.539 seconds, Fetched: 9 row(s)

— List the Name, Address, Model, Assessed Value and Registration Fee (Paid or Not) for Each Chrysler Primary Vehicle in 'rmvTable' —

  hive> SELECT firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid FROM rmvTable WHERE vehicleinfo[0].make LIKE "%Chrysler%";
  hive> SELECT firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid FROM rmvTablePasswd WHERE vehicleinfo[0].make LIKE "%Chrysler%";
  hive> SELECT firstname,lastname,address,vehicleinfo[0].model,vehicleinfo[0].value,vehicleinfo[0].tax,vehicleinfo[0].paid FROM rmvTableWallet WHERE vehicleinfo[0].make LIKE "%Chrysler%";

  Launching Job 1 out of 1
  ..........
  Hadoop job information for Stage-1: number of mappers: 6; number of reducers: 0
  2015-06-25 07:19:55,601 Stage-1 map = 0%,  reduce = 0%
  2015-06-25 07:20:26,973 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.15 sec
  2015-06-25 07:20:28,454 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 18.15 sec
  ..........
  Job 0: Map: 6   Cumulative CPU: 18.15 sec   HDFS Read: 4724 HDFS Write: 2164 SUCCESS
  Total MapReduce CPU Time Spent: 18 seconds 150 msec
  OK
  ANNE ROSARIO {"number":96581,"street":"Third Avenue","unit":7,"city":"Springfield","state":"RI","zip":40719}
  Ram3500 31115.26 849.4465942382812 true
  MEGAN PHELPS {"number":12713,"street":"MAC Avenue","unit":4,"city":"Salem","state":"MS","zip":76554}
  Ram1500 31115.309 849.4479370117188 true
  BRIAN ROWLAND {"number":37868,"street":"First Street","unit":3,"city":"Salem","state":"GA","zip":98106}
  Imperial 20744.156 566.3154907226562 true
  ..........
  Time taken: 72.163 seconds, Fetched: 23 row(s)

Example 3: Executing Big Data SQL Queries

Oracle Big Data SQL (for the purposes of this document, referred to as Big Data SQL) began as a system consisting of proprietary hardware and software; including an Oracle Big Data Appliance, Oracle Exadata Database Machine, custom software extended to support Big Data SQL functionality, and of course, Oracle NoSQL Database itself. In addition to providing Big Data SQL in the form of one of these so-called "fully engineered hardware systems", Oracle now also provides Big Data SQL functionality in the form of a software solution that can be installed on other Hadoop systems; with no requirement for an Oracle Exadata Database Machine.

The goal of the Big Data SQL product is to allow users to employ SQL to manage and manipulate data stored in a number of different locations. Specifically, Big Data SQL is designed to provide SQL access to data stored in Hadoop HDFS, various NoSQL databases — including Oracle NoSQL Database — as well as various relational databases. Big Data SQL achieves this by presenting Hadoop HDFS, Oracle NoSQL Database, and other NoSQL data sources as enhanced Oracle External Tables (external tables) of the Oracle Relational Database Management System (RDBMS); mapping the external semantics of accessing data from those sources — horizontal parallelism, location, and schema — to Oracle RDBMS internals.

Because you may not have access to either form of an Oracle Big Data SQL system, the Oracle Big Data Lite Virtual Machine (referred to in this document as the Big Data Lite VM), is provided to allow you to evaluate and experiment with Big Data SQL without requiring you to purchase, install, and manage the real system. Although detailed documentation is provided when you purchase a Big Data SQL system, as a convenience, this section walks you through an example that employs the Big Data Lite VM to demonstrate how to use Big Data SQL to query data stored in a KVStore — both secure and non-secure. After working with this example on the Big Data Lite VM, you should be able to extend the concepts that are demonstrated, and apply that knowledge should you ever have access to a real Big Data SQL system.

A Brief Big Data SQL Primer

As stated above, Big Data SQL allows SQL access to various external data sources — specifically, a KVStore — by presenting such data sources as Oracle External Tables. To achieve this, a mechanism referred to as an access driver is employed to access data as if it were a table in the Oracle relational database provided in the system. Big Data SQL extends the access driver mechanism of external tables by specifying new access driver types; one for each type of external data source that will be accessed. Prior to the introduction of Big Data SQL, the Oracle RDBMS external tables mechanism defined only two access driver types:

ORACLE_LOADER: for reading data from flat files.
ORACLE_DATAPUMP: for migrating data between Oracle databases in a proprietary format.

With the introduction of Big Data SQL, the following new access driver types are defined:

ORACLE_HDFS: for accessing data stored in the Hadoop Distributed File System (HDFS).
ORACLE_HIVE: for accessing data stored in Hive tables or Oracle NoSQL Database tables.
ORACLE_BIGDATA: for providing extensibility to access other data stores.

Both the ORACLE_HDFS and ORACLE_HIVE access drivers require the specification of a number of classes that satisfy the Hadoop MapReduce programming model. Some of those classes are required by both access driver types, whereas some are required by only the ORACLE_HIVE access driver. The class types required by both ORACLE_HDFS and ORACLE_HIVE are the same as those described in the Hadoop/Table API Example; whereas the class types required by ORACLE_HIVE but not by ORACLE_HDFS are:

An instance of org.apache.hadoop.hive.ql.metadata.HiveStorageHandler; for example, the Oracle NoSQL Database TableStorageHandler.
An instance of org.apache.hadoop.hive.serde2.SerDe; for example, the Oracle NoSQL Database TableSerDe.
Various instances of org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; for example, instances of the Oracle NoSQL Database classes described previously in this document.

Note that the ORACLE_HDFS access driver can only read data stored in HDFS files, whereas the ORACLE_HIVE access driver can read data stored not only in HDFS files, but in other locations as well; for example, a KVStore. As you will see below, the integration of Oracle NoSQL Database with Hive — as described above — plays a prominent role in the integration of Oracle NoSQL Database with Big Data SQL.

Mapping the Oracle RDBMS Data Model to the Oracle NoSQL Database Table API Data Model

As the examples in this section demonstrate, in order to execute a Big Data SQL query against data stored in an Oracle NoSQL Database table, a Hive external table must first be created with a schema mapped from the schema of the desired Oracle NoSQL Database table, and then a corresponding Oracle RDBMS external table must be created with a schema mapped from the schema of the Hive table. This is accomplished by applying the mappings shown in the following table:

Oracle NoSQL Database-to-Hive-to-RDBMS Data Type Mappings
Oracle NoSQL Database Table API	Hive	Oracle RDBMS
FieldDef.Type.STRING	STRING	VARCHAR2(N)
	VARCHAR
	CHAR
FieldDef.Type.BOOLEAN	BOOLEAN	VARCHAR2(5)
FieldDef.Type.BINARY	BINARY	VARCHAR2(N)
FieldDef.Type.FIXED_BINARY	BINARY	VARCHAR2(N)
	TINYINT
	SMALLINT
FieldDef.Type.INTEGER	INT	NUMBER
FieldDef.Type.LONG	BIGINT	NUMBER
FieldDef.Type.FLOAT	FLOAT	NUMBER
	DECIMAL
FieldDef.Type.DOUBLE	DOUBLE	NUMBER
FieldDef.Type.ENUM	STRING	VARCHAR2(N)
	TIMESTAMP
	DATE
FieldDef.Type.ARRAY	ARRAY	VARCHAR2(N)
FieldDef.Type.MAP	MAP<STRING, data_type>	VARCHAR2(N)
FieldDef.Type.RECORD	STRUCT<col_name : data_type, ...>	VARCHAR2(N)
	UNIONTYPE<data_type, data_type, ...>

It is important to understand that when using Big Data SQL to query data in an Oracle NoSQL Database table, the schema of the Oracle external table you create is dependent on the schema of the corresponding Hive external table; which, in turn, is dependent on the schema of the Oracle NoSQL Database table you wish to query. Thus, if either type of external table is created using a schema that includes a data type that does not belong to one of the end-to-end mappings presented above, then an error will occur when any attempt is made to query the table.

Note that for fields in the Oracle external table specified as VARCHAR2(N), the value of N is the maximum number of characters of the variable length STRING that represents the specified field in the corresponding Hive and Oracle NoSQL Database tables. Therefore, you should use the type, structure, and expected length or size of the corresponding Hive and Oracle NoSQL Database fields to determine the appropriate value to specify for N when creating the Oracle external table.

Installing the Big Data Lite VM (the BDL)

Before proceeding with this example you should obtain, install, and deploy the Big Data Lite VM as described here. Note that the Big Data Lite VM must be deployed to a virtualization product such as Oracle VirtualBox. After installing the necessary environment and deploying the Big Data Lite VM, you should verify that the necessary services are running in that VM and start those that are not running. You can do this by executing the services utility from the command line; which will display an interactive user interface with information like that shown below. That is,

  > services

Use arrow keys to navigate, space bar to select/deselect services to start/stop, enter key to save changes
[*]	ORCL	Oracle Database 12c (on)
[*]	Zookeeper	zookeeper (on)
[*]	HDFS	namenode (on) datanode (on) ...
[ ]	HBase	master (off) regionserver (off) thrift (off)
[*]	Hive	metastore (on) hive-server2 (on)
[]	Hue	Hue (off)
[]	Impala	impala-server (off)
[]	NoSQL	Oracle NoSQL Database (off)
[]	Oozie	oozie (of)
[]	ORDS-Apex	Oracle Rest Data Service-Apex (off)
[]	Solar	solar-server (off)
[]	Sqoop2	sqoop2-server (off)
[]	WebLogic-MovieDemo	WebLogic-MovieDemo (off)
[*]	YARN	resourcemanager (on) nodemanager (on) ...

The services selected (marked with asterisks) are the services that are necessary when walking through the examples presented in the following sections. In an effort to improve boot performance, the latest version of the Big Data Lite VM will typically come out of the box running only the Zookeeper, HDFS, Hive, and YARN services. As a result, to run the examples presented here, you will need to select the ORCL service for auto-start. Additionally, because those examples are designed to work with an externally deployed KVStore, there is no need to select the Oracle NoSQL Dabase service for auto-start.

The examples presented here assume that you have satisfied the prerequisites listed previously in this document; which means that a KVStore — non-secure, secure, or both — has been deployed to separate, external nodes unrelated to the Big Data Lite VM; where the nodes are network reachable from that VM. Although it is possible to select the Oracle NoSQL Database service and deploy a KVStore (with KVHOME=/u01/nosql/kv-ee) on the Big Data Lite VM, it is important to understand that typical deployments of the Oracle Big Data Appliance do not run the Hadoop services (HDFS, YARN, and Hive) on the same nodes used to run the Oracle NoSQL Database service. As a result, the examples presented here are run in a manner intended to be consistent with the more typical deployment scenarios.

After satisfying the necessary prerequisites, creating and populating a vehicleTable and rmvTable in a KVStore, installing and deploying the Big Data Lite VM and verifying that the necessary services are running on that VM, you are then ready to walk through the Big Data SQL examples presented in the following sections.

Executing a Big Data SQL Query Against a KVStore Table

Once the KVStore — non-secure or secure — is deployed with the expected tables, you can follow the steps presented in the next section to configure the Big Data Lite VM environment for executing either Hive or Big Data SQL queries against the data in those tables. But before proceeding to that section, you should first become familiar with the directory structure and base configuration of the Big Data SQL system; specifically, consider the following directories and files from the 4.5.0 release of the Big Data Lite VM:

/etc/hive/conf.bigdatalite
  hive-env.sh
  hive-site.xml
/opt/hive-aux-jars
  apache-xmlbeans.jar -> /u01/bigdatasql_config/hive_aux_jars/apache-xmlbeans.jar
  hive-hcatalog-core.jar -> /u01/bigdatasql_config/hive_aux_jars/hive-hcatalog-core.jar
  kvclient.jar -> /u01/nosql-ee/lib/kvclient.jar
  threetenbp.jar -> /u01/nosql-ee/lib/threetenbp.jar
  ojdbc7.jar -> /u01/orahivedp/jlib/ojdbc7.jar
  ....
  xqjapi.jar -> /u01/bigdatasql_config/hive_aux_jars/xqjapi.jar
/u01
  /bigdatasql_config
    bigdata.properties
    bigdata-log4j.properties
    /bigdatalite
      core-site.xml
      hdfs-site.xml
      hive-env.sh
      hive-site.xml
      mapred-site.xml
    /hive_aux_jars
      apache-xmlbeans.jar -> /u01/connectors/oxh/hive/lib/apache-xmlbeans.jar
      hive-hcatalog-core.jar -> /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar
      orai18n-collation.jar -> /u01/connectors/oxh/hive/lib/orai18n-collation.jar
      ....
      xqjapi.jar -> /u01/connectors/oxh/hive/lib/xqjapi.jar
      /log
        ....
  /kv-ee
    /doc
    /lib
    LICENSE.txt
    README.txt
  /nosql
    kv-ee -> /u01/kv-ee
    /scripts
      ....

Note that as of version 4.3.0 of Oracle NoSQL Database, the threetenbp.jar third party library must belong to the set of Hive auxilary jar files (referenced in the directory /opt/hive-aux-jars). As a result, if your version of the Big Data SQL system does not include the link to /u01/nosql-ee/lib/threetenbp.jar in /opt/hive-aux-jars shown above, then you should create such a link; for example,

  > cd /opt/hive-aux-jars
  > sudo ln -s /u01/nosql-ee/lib/threetenbp.jar threetenbp.jar

You should also note that, as described in the Big Data Lite VM documentation, the username oracle with password welcome1 must be used when logging into the Big Data Lite VM.

Configuring the Big Data Lite VM for Hive and Big Data SQL Queries

If you wish to run queries — Hive queries or Big Data SQL queries — against a secure KVStore, then depending on the particular version of the Big Data Lite VM (or the Big Data SQL system) you are using, you may need to add one or more security related JAR files to both the classpath used by Hive and the classpath used by Big Data SQL.

— Configuring the BDL's Hive Classpath for Oracle Wallet Usage —

If you are using Hive (rather than Big Data SQL) to query a secure KVStore, and you wish to use an Oracle Wallet to store your password, then the classpath employed by Hive must include the JAR files that provide access to the Oracle Wallet mechanism. To achieve this, add the necessary JAR files to the /opt/hive-aux-jars directory by typing the following commands:

  > cd /opt/hive-aux-jars

  > sudo ln -s /u01/nosql/kv-ee/lib/oraclepki.jar.jar oraclepki.jar.jar
  > sudo ln -s /u01/nosql/kv-ee/lib/osdt_cert.jar osdt_cert.jar
  > sudo ln -s /u01/nosql/kv-ee/lib/osdt_core.jar osdt_core.jar

Note that in prior versions of the Big Data Lite VM, the JAR files above were added to the Hive classpath by adding those files to the HIVE_AUX_JARS_PATH variable specified in the file, /etc/hive/conf.bigdatalite/hive-env.sh. As of version 4.5.0, HIVE_AUX_JARS_PATH now includes the contents of the /opt/hive-aux-jars directory. Thus, rather than modifying /etc/hive/conf.bigdatalite/hive-env.sh, you now simply need to add the necessary JAR files to the /opt/hive-aux-jars directory, as described above.

— Configuring the BDL's Big Data SQL Classpath for Oracle Wallet Usage —

As previously explained, the installation of the Oracle Wallet JAR files into /opt/hive-aux-jars will add those JAR files to the classpath used by Hive, but not Big Data SQL. Thus, if you are using Big Data SQL to query a secure KVStore, and you wish to use an Oracle Wallet to store the password needed to access that store, then the Oracle Wallet JAR files must be added to the classpath employed by Big Data SQL. To do this, edit the file /u01/bigdatasql_config/bigdata.properties and add the necessary JAR files to the property named, java.classpath.oracle; that is,

  > edit /u01/bigdatasql_config/bigdata.properties

    java.classpath.oracle=/u01/app/oracle/product/12.1.0.2/dbhome_1/jlib/oracle-hadoop-sql.jar:\
                       /u01/app/oracle/product/12.1.0.2/dbhome_1/jlib/oraloader.jar:\
                       ....
                       /u01/bigdatasql_config/hive_aux_jars/kvclient.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/threetenbp.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/oraclepki.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/osdt_cert.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/osdt_core.jar

With these changes, along with the changes described in the next section, Big Data SQL queries can be run against a secure store when an Oracle Wallet is used to store the user's password.

Note that when using future versions of Big Data SQL, the installation of the Oracle Wallet JAR files described above should not be necessary; as it is expected that those JAR files will be pre-installed, and simply work out-of-the-box.

Configuring the Big Data Lite VM for Queries Against a Secure Store

Although the installation of the Oracle Wallet JAR files described in the previous sections is a necessary condition for using an Oracle Wallet when running Hive and/or Big Data SQL queries against a secure KVStore, it is not a sufficient condition. In addition to those Oracle-provided JAR files, there are a number of other security artifacts that also must be installed and added to the Hive and Big Data SQL classpaths. Unlike the Oracle Wallet JAR files, which are installed only once and will (in future releases) always be available, these additional artifacts are typically generated and installed on a case-by-case basis, depending on the configuration of the given KVStore.

Appendix C describes the additional security artifacts that must be generated and installed to support executing queries (both Hive and Big Data SQL) against table data stored in a secure KVStore; including the login, trust, and password artifacts as well as the server side JAR file that contains the necessary public credentials. Each of those artifacts should be placed in a user-specific directory; for example, place them in a directory like the following:

  /u01/nosql/example-user/security
    client.trust
    hive-nosql.login
    hive-nosql-server.jar
    example-user.passwd
    /example-user-wallet.dir
      cwallet.sso

In addition to installing the above artifacts under /u01/nosql/example-user/securtiy, you must also add the server side JAR file to both the classpath employed by Hive, and the classpath employed by Big Data SQL. This is necessary so that the table data associated with the username and password specified in those credentials can be successfully queried.

— Configuring the BDL's Hive Classpath for Queries Against a Secure Store —

If you are using Hive (rather than Big Data SQL) to query a secure KVStore, then the classpath employed by Hive must include the server side JAR file. This is accomplished by adding that JAR file to the /opt/hive-aux-jars directory; that is,

  > cd /opt/hive-aux-jars
  > sudo ln -s /u01/nosql/example-user/security/hive-nosql-server.jar hive-nosql-server.jar

Thus, like the Oracle Wallet JAR files, as of version 4.5.0 of the Big Data Lite VM, it is no longer necessary to add the server side JAR file to the HIVE_AUX_JARS_PATH in /etc/hive/conf.bigdatalite/hive-env.sh.

— Configuring the BDL's Big Data SQL Classpath for Queries Against a Secure Store —

If you are using Big Data SQL rather than Hive to query a secure KVStore, then the server side JAR file must be added to the Big Data SQL classpath. This is accomplished by editing the file /u01/bigdatasql_config/bigdata.properties, and adding that JAR file to the property named, java.classpath.oracle; that is,

  > edit /u01/bigdatasql_config/bigdata.properties

    java.classpath.oracle=/u01/app/oracle/product/12.1.0.2/dbhome_1/jlib/oracle-hadoop-sql.jar:\
                       /u01/app/oracle/product/12.1.0.2/dbhome_1/jlib/oraloader.jar:\
                       ....
                       /u01/bigdatasql_config/hive_aux_jars/kvclient.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/threetenbp.jar:
                       /u01/bigdatasql_config/hive_aux_jars/oraclepki.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/osdt_cert.jar:\
                       /u01/bigdatasql_config/hive_aux_jars/osdt_core.jar:\
                       /u01/nosql/example-user/security/hive-nosql-server.jar

At this point, after configuring the Big Data Lite VM in the manner described above, you can follow the steps presented in the next sections to execute Hive and/or Big Data SQL queries against the table data in the desired KVStore; either non-secure or secure.

Using Hive and Big Data SQL to Query Data in a KVStore

In order to query data in a KVStore from a Big Data SQL system (for example, a Big Data Lite VM), you must first create and map a Hive external table to the table defined in the store; whether the table is located in a non-secure store or a secure store. That is, given the configuration described in the previous section, login to the Big Data Lite VM and if the Oracle NoSQL Database vehicleTable and rmvTable are located in a non-secure store, then enter the Hive CLI and type the following commands:

  > hive

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          vehicleTable (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT) 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "vehicleTable",
                          "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain");

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          rmvTable (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, 
                    SSN BIGINT, 
                    GENDER STRING, 
                    LICENSE BINARY, 
                    PHONEINFO MAP<STRING, STRING>, 
                    ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>, 
                    VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "rmvTable",
                          "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain");

On the other hand, if the store in which vehicleTable and rmvTable are located is a secure store, then enter the Hive CLI and type commands like these:

  > hive

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          vehicleTablePasswd (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT) 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "vehicleTable",
                         "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                         "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.pwdfile.file" = "/u01/nosql/example-user/security/example-user.passwd");

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          vehicleTableWallet (TYPE STRING, MAKE STRING, MODEL STRING, CLASS STRING, COLOR STRING, PRICE DOUBLE, COUNT INT) 
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "vehicleTable",
                         "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                         "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.wallet.dir" = "/u01/nosql/example-user/security/example-user-wallet.dir");

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          rmvTablePasswd (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, 
                    SSN BIGINT, 
                    GENDER STRING, 
                    LICENSE BINARY, 
                    PHONEINFO MAP<STRING, STRING>, 
                    ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>, 
                    VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "rmvTable",
                         "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                         "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.pwdfile.file" = "/u01/nosql/example-user/security/example-user.passwd");

  hive> CREATE EXTERNAL TABLE IF NOT EXISTS 
          rmvTableWallet (ZIPCODE STRING, LASTNAME STRING, FIRSTNAME STRING, 
                    SSN BIGINT, 
                    GENDER STRING, 
                    LICENSE BINARY, 
                    PHONEINFO MAP<STRING, STRING>, 
                    ADDRESS STRUCT<NUMBER:INT, STREET:STRING, UNIT:INT, CITY:STRING, STATE:STRING, ZIP:INT>, 
                    VEHICLEINFO ARRAY<STRUCT<TYPE:STRING, MAKE:STRING, MODEL:STRING, CLASS:STRING, COLOR:STRING, VALUE:FLOAT, TAX:DOUBLE, PAID:BOOLEAN>>)
          STORED BY 'oracle.kv.hadoop.hive.table.TableStorageHandler' 
          TBLPROPERTIES ("oracle.kv.kvstore" = "example-store",
                         "oracle.kv.hosts" = "kv-host-1:5000",
                         "oracle.kv.tableName" = "rmvTable",
                         "oracle.kv.hadoop.hosts" = "bigdatalite.localdomain",
                         "oracle.kv.security" = "/u01/nosql/example-user/security/hive-nosql.login", 
                         "oracle.kv.ssl.trustStore" = "/u01/nosql/example-user/security/client.trust", 
                         "oracle.kv.auth.username" = "example-user", 
                         "oracle.kv.auth.wallet.dir" = "/u01/nosql/example-user/security/example-user-wallet.dir");

At this point, you can execute Hive queries against the data in both the vehicleTable and the rmvTable.

Note that although the tables in the KVStore are named vehicleTable and rmvTable, the Hive tables that are created above are named, respectively, vehicleTable and rmvTable if the KVStore is not configured for security, vehicleTablePasswd and rmvTablePasswd if the KVStore is secure and you store the password in a password file, and finally, vehicleTableWallet and rmvTableWallet if the KVStore is secure and you store the password in an Oracle Wallet. Also note that the value of the oracle.kv.hadoop.hosts property is bigdatalite.localdomain. This is the hostname of the Big Data Lite VM; thus, it is the name of the host on which the Hadoop DataNode(s) execute.

After you create and map the external Hive tables to their corresponding KVStore tables, to then use Big Data SQL to query the data in those KVStore tables, you must apply the data model mapping presented previously, along with the schemas for the vehicleTable and rmvTable, to create and map the corresponding Oracle RDBMS external tables to each of the Hive tables that you created. But before doing this, you must first create a user in the RDBMS, with the appropriate privileges; as explained in the next section.

— Creating an Example User in the RDBMS —

As of version 4.2, the Big Data Lite VM now employs the multitenant feature of the Oracle RDBMS. As a result, Big Data SQL commands and queries must be executed by a user with the appropriate privileges. Thus, before creating any tables or executing any Big Data SQL queries, you should first create an example user by executing commands like the following:

  > sqlplus sys/welcome1@orcl as sysdba

  SQL> CREATE TABLESPACE NOSQL_EXAMPLE_TBLSPACE
         DATAFILE '/u01/app/oracle/oradata/cdb/orcl/nosql_example_tblspace.dbf'
           SIZE 500M AUTOEXTEND ON MAXSIZE UNLIMITED
         NOLOGGING
         DEFAULT COMPRESS
         EXTENT MANAGEMENT
           LOCAL AUTOALLOCATE
         SEGMENT SPACE MANAGEMENT
           AUTO;

  SQL> CREATE USER NOSQL_EXAMPLE_USER
         IDENTIFIED BY welcome1
         DEFAULT TABLESPACE NOSQL_EXAMPLE_TBLSPACE
         QUOTA UNLIMITED ON NOSQL_EXAMPLE_TBLSPACE;

  SQL> GRANT CREATE ANY DIRECTORY TO NOSQL_EXAMPLE_USER;
  SQL> GRANT DROP ANY DIRECTORY TO NOSQL_EXAMPLE_USER;
  SQL> GRANT CREATE TABLE TO NOSQL_EXAMPLE_USER;
  SQL> GRANT CREATE VIEW TO NOSQL_EXAMPLE_USER;
  SQL> GRANT CREATE MINING MODEL TO NOSQL_EXAMPLE_USER;
  SQL> GRANT CREATE SESSION TO NOSQL_EXAMPLE_USER;
  SQL> GRANT CREATE PROCEDURE TO NOSQL_EXAMPLE_USER;
  SQL> GRANT ADVISOR TO NOSQL_EXAMPLE_USER;
  SQL> GRANT EXECUTE,READ,WRITE ON DIRECTORY
         DEFAULT_DIR TO NOSQL_EXAMPLE_USER
         WITH GRANT OPTION;
  SQL> GRANT EXECUTE,READ,WRITE ON DIRECTORY
         ORACLE_BIGDATA_CONFIG TO NOSQL_EXAMPLE_USER
         WITH GRANT OPTION;
  SQL> exit;

At this point, you can login as the new user and execute the Big Data SQL commands and queries presented in the next section to query the data in the associated KVStore tables.

— Using Big Data SQL to Query Data in a KVStore —

As described in the previous sections, to use Big Data SQL to query the data in the KVStore, you must create and map Oracle RDBMS external tables to each Hive table you created for the corresponding KVStore table. To do this, login to the Oracle RDBMS as the new user and execute the commands and queries shown below at the sqlplus command prompt:

  > sqlplus NOSQL_EXAMPLE_USER/welcome1@orcl

  SQL> CREATE TABLE vehicleTable (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20),
                                      class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2),
                                      count NUMBER)
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

  SQL> CREATE TABLE vehicleTablePasswd (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20),
                                            class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2),
                                            count NUMBER) 
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

  SQL> CREATE TABLE vehicleTableWallet (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20), 
                                            class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2), 
                                            count NUMBER) 
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

  SQL> CREATE TABLE rmvTable (ZIPCODE VARCHAR2(7), LASTNAME VARCHAR2(20), FIRSTNAME VARCHAR2(20),
                                            SSN NUMBER, GENDER VARCHAR2(6), LICENSE VARCHAR2(9), PHONEINFO VARCHAR2(67), ADDRESS VARCHAR2(100),
                                            VEHICLEINFO VARCHAR2(1000))
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

  SQL> CREATE TABLE rmvTablePasswd (ZIPCODE VARCHAR2(7), LASTNAME VARCHAR2(20), FIRSTNAME VARCHAR2(20),
                                            SSN NUMBER, GENDER VARCHAR2(6), LICENSE VARCHAR2(9), PHONEINFO VARCHAR2(67), ADDRESS VARCHAR2(100),
                                            VEHICLEINFO VARCHAR2(1000))
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

  SQL> CREATE TABLE rmvTableWallet (ZIPCODE VARCHAR2(7), LASTNAME VARCHAR2(20), FIRSTNAME VARCHAR2(20),
                                            SSN NUMBER, GENDER VARCHAR2(6), LICENSE VARCHAR2(9), PHONEINFO VARCHAR2(67), ADDRESS VARCHAR2(100),
                                            VEHICLEINFO VARCHAR2(1000))
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

Note that, except for the name specified for the table, the commands related to vehicleTable are identical. Similarly, the commands related to rmvTable are also identical except for the table name. This is how an Oracle external table is mapped to the desired Hive table. If you want to create an Oracle external table with a name that is different than the Hive table to which it is mapped, then you can specify the com.oracle.bigdata.tablename property in the ACCESS PARAMETERS; for example,

  SQL> CREATE TABLE bigdataSQL_vehicleTableWallet (type VARCHAR2(10), make VARCHAR2(12), model VARCHAR2(20), 
                                            class VARCHAR2(40), color VARCHAR2(20), price NUMBER(8,2), 
                                            count NUMBER) 
                        ORGANIZATION EXTERNAL (TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
                                               ACCESS PARAMETERS (com.oracle.bigdata.tablename=vehicleTableWallet 
                                               com.oracle.bigdata.debug=true
                                               com.oracle.bigdata.log.opt=normal)) REJECT LIMIT UNLIMITED;

Note that if you do not employ the com.oracle.bigdata.tablename property, then the name you specify for the Oracle external table must be indentical to the name of the corresponding Hive external table. Specifically, if you specify a name for the Oracle external table different than the name of the corresponding Hive table, and you do not employ the com.oracle.bigdata.tablename property to specify the name of the desired Hive table, then an error will occur.

After creating the Oracle external tables above, you can execute SQL queries against the corresponding KVStore table data. For example, to query the data contained in the KVStore's vehicleTable, you can execute queries such as:

  SQL> set linesize 200;

  SQL> SELECT * FROM vehicleTable;
  SQL> SELECT count(*) FROM vehicleTable;
  SQL> SELECT min(price) FROM vehicleTable;
  SQL> SELECT * FROM vehicleTable WHERE make='GM';
  SQL> SELECT * FROM vehicleTable WHERE model='Equinox';
  SQL> SELECT * FROM vehicleTable WHERE model='Camaro';
  SQL> SELECT * FROM vehicleTable WHERE model='Silverado1500';

  SQL> SELECT * FROM vehicleTablePasswd;
  SQL> SELECT count(*) FROM vehicleTablePasswd;
  SQL> SELECT min(price) FROM vehicleTablePasswd;
  SQL> SELECT * FROM vehicleTablePasswd WHERE make='GM';
  SQL> SELECT * FROM vehicleTablePasswd WHERE model='Equinox';
  SQL> SELECT * FROM vehicleTablePasswd WHERE model='Camaro';
  SQL> SELECT * FROM vehicleTablePasswd WHERE model='Silverado1500';

  SQL> SELECT * FROM vehicleTableWallet;
  SQL> SELECT min(price) FROM vehicleTableWallet;
  SQL> SELECT * FROM vehicleTableWallet WHERE make='GM';
  SQL> SELECT * FROM vehicleTableWallet WHERE model='Equinox';
  SQL> SELECT * FROM vehicleTableWallet WHERE model='Camaro';
  SQL> SELECT * FROM vehicleTableWallet WHERE model='Silverado1500';

  SQL> set linesize 200;

  SQL> SELECT * FROM rmvTable;
  SQL> SELECT * FROM rmvTablePasswd;
  SQL> SELECT * FROM rmvTableWallet;

  SQL> SELECT lastname,firstname,gender,address FROM rmvTable; 
  SQL> SELECT lastname,firstname,gender,address FROM rmvTablePasswd; 
  SQL> SELECT lastname,firstname,gender,address FROM rmvTableWallet; 

  SQL> SELECT min(ssn) FROM rmvTable;
  SQL> SELECT min(ssn) FROM rmvTablePasswd;
  SQL> SELECT min(ssn) FROM rmvTableWallet;

  SQL> SELECT count(*) FROM rmvTable;
  SQL> SELECT count(*) FROM rmvTablePasswd;
  SQL> SELECT count(*) FROM rmvTableWallet;

  SQL> SELECT firstname,lastname,phoneinfo FROM rmvTable; 
  SQL> SELECT firstname,lastname,phoneinfo FROM rmvTablePasswd; 
  SQL> SELECT firstname,lastname,phoneinfo FROM rmvTableWallet; 

  SQL> SELECT vehicleinfo FROM rmvTable;
  SQL> SELECT vehicleinfo FROM rmvTablePasswd;
  SQL> SELECT vehicleinfo FROM rmvTableWallet;

To achieve the more complicated query functionality demonstrated by the correspondng Hive queries that dereferenced individual field values of the non-primitive data types, you can employ either Oracle Regular Expression Functions such as REGEXP_LIKE and REGEXP_SUBSTR, or Oracle SQL JSON Operators such as JSON_QUERY and JSON_EXISTS (or a combination). For example, consider the following queries that use Oracle Regular Expressions:

  SQL> SELECT firstname,lastname,address,vehicleinfo FROM rmvTable WHERE REGEXP_LIKE (lastname, '^[H].*');
  SQL> SELECT firstname,lastname,address,vehicleinfo FROM rmvTablePasswd WHERE REGEXP_LIKE (lastname, '^[H].*');
  SQL> SELECT firstname,lastname,address,vehicleinfo FROM rmvTableWallet WHERE REGEXP_LIKE (lastname, '^[H].*');

  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM" 
       FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM" 
       FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary make: GM" 
       FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"make":"GM"');

  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox" 
       FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox" 
       FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Equinox" 
       FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Equinox"');

  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro" 
       FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro" 
       FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Camaro" 
       FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Camaro"');

  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado" 
       FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado" 
       FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');
  SQL> SELECT firstname,lastname,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary model: Silverado" 
       FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"model":"Silverado');

  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid" 
       FROM rmvTable WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid" 
       FROM rmvTablePasswd WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');
  SQL> SELECT firstname,lastname,address,REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1) "Primary fee NOT paid" 
       FROM rmvTableWallet WHERE REGEXP_LIKE (REGEXP_SUBSTR(VEHICLEINFO, '\{([[:alnum:]":,\.]+){1,3}\}',1,1), '"paid":false');

Finally, consider the following example queries that employ Oracle SQL JSON Operators:

  SQL> SELECT FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model FROM rmvTable j;
  SQL> SELECT FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model FROM rmvTablePasswd j;
  SQL> SELECT FIRSTNAME, LASTNAME, j.ADDRESS.street, j.ADDRESS.city, j.ADDRESS.state, j.VEHICLEINFO.model FROM rmvTableWallet j;

  SQL> SELECT JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) FROM rmvTable;
  SQL> SELECT JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) FROM rmvTablePasswd;
  SQL> SELECT JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) FROM rmvTableWallet;

  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM" 
       FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM" 
       FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: GM" 
       FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';

  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox"
       FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox"
       FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Equinox"
       FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Equinox%';

  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro"
       FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro"
       FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Camaro"
       FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Camaro%';

  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado"
       FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado"
       FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(VEHICLEINFO, '$[0]' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle: Silverado"
       FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) LIKE '%Silverado%';

  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(
         VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model",
           JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value",
           JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed",
           JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid" 
       FROM rmvTable WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(
         VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model",
           JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value",
           JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed",
           JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid" 
       FROM rmvTablePasswd WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';
  SQL> SELECT FIRSTNAME,LASTNAME,ADDRESS,JSON_QUERY(
         VEHICLEINFO, '$[0].model' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Model",
           JSON_QUERY(VEHICLEINFO, '$[0].value' WITH CONDITIONAL WRAPPER) AS "Primary Vehicle Value",
           JSON_QUERY(VEHICLEINFO, '$[0].tax' WITH CONDITIONAL WRAPPER) AS "Tax Owed",
           JSON_QUERY(VEHICLEINFO, '$[0].paid' WITH CONDITIONAL WRAPPER) AS "Tax Paid" 
       FROM rmvTableWallet WHERE JSON_QUERY(VEHICLEINFO, '$[0].make' WITH CONDITIONAL WRAPPER) LIKE '%GM%';

  SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
       FROM rmvTable WHERE JSON_EXISTS(ADDRESS, '$.street');
  SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
       FROM rmvTablePasswd WHERE JSON_EXISTS(ADDRESS, '$.street');
  SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
       FROM rmvTableWallet WHERE JSON_EXISTS(ADDRESS, '$.street');

  SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
       FROM rmvTable WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
  SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
       FROM rmvTablePasswd WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';
  SQL> SELECT JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) AS "Street"
       FROM rmvTableWallet WHERE JSON_QUERY(ADDRESS, '$.street' WITH CONDITIONAL WRAPPER) LIKE '%High Street%';

Appendix A: Create and Populate 'rmvTable' with Example Data In order to run any of the example Hive or Big Data SQL queries presented in this document, a KVStore — either secure or non-secure — must first be deployed, and a table must be created and populated with data. Thus, before attempting to execute any queries, either deploy a non-secure KVStore using the steps outlined in `Appendix A` of the Hadoop/Table API Example, or start a KVStore configured for security using the steps presented in `Appendix B` of that document. Once a KVStore has been deployed, the standalone Java program `LoadRmvTable` can then be run against either type of store to create a table with the name and schema required by the various example queries, and then populate the table with rows of data consistent with that schema. Once the table is created and populated with example data, queries such as those presented in `Example 1`, `Example 2`, or the section on `Big Data SQL` can be executed against the data in that table.

Schema for the Example 'rmvTable'

To execute any of the example queries presented in Example 2, of this document a table named rmvTable — having the schema shown in the table below — must be created in the KVStore that was deployed for that example; where the data types specified in the schema are defined by the Oracle NoSQL Database Table API (see oracle.kv.table.FieldDef.Type).

'rmvTable' Schema
Field Name	Field Type
zipcode	STRING
lastname	STRING
firstname	STRING
ssn	LONG
gender	ENUM
license	FIXED_BINARY(9)
phoneinfo	MAP(STRING)
address	RECORD(field_name : field_type)
address Record Schema	number : INTEGER
	street : STRING
	unit : INTEGER
	city : STRING
	state : STRING
	zip : INTEGER
vehicleinfo	ARRAY(RECORD(field_name : field_type))
vehicleinfo Element Record Schema	type : STRING
	make : STRING
	model : STRING
	class : STRING
	color : STRING
	value : FLOAT
	tax : DOUBLE
	paid : BOOLEAN
Primary Key Field Names
zipcode	lastname	firstname	ssn
Shard Key Field Names
zipcode

Thus, rmvTable will consist of rows of data the Registry of Motor Vehicles might maintain about vehicle owners who have registered a primary vehicle and (optionally) a second and maybe a third vehicle. In addition to personal information about each owner — such as name, address, gender, phone number(s), etc. — each row of data also contains an array in which each element of the array is a record whose contents consists of information about each vehicle the owner registers. For example, in addition to vehicle attributes such as the make, model, color, etc., the record will also contain the vehicle's assessed value, registration fee (the tax), and whether or not the owner has paid the fee. Although the table schema presented above may seem a bit contrived, it is intended to demonstrate a broad spectrum of data types from the Oracle NoSQL Database Table API.

Although you can enter individual commands in the admin CLI to create a table with the above schema, an alternative mechanism is to follow the instructions presented in the next sections to compile and execute the LoadRmvTable program; which will use the Oracle NoSQL Database Data Definition Language (DDL) to create as well as populate the desired table.

— Compiling the LoadRmvTable Program —

After the KVStore — either non-secure or secure — has been deployed, before executing LoadRmvTable, that program must first be compiled. Thus, assuming you installed the separate example distribution under the directory /opt/apps, you would type the following command at the OS command line:

  > cd /opt/apps/kv
  > javac -classpath /opt/ondb/kv/lib/kvstore.jar:examples examples/hadoop/hive/table/LoadRmvTable.java

which should produce the file:

  /opt/apps/kv/examples/hadoop/hive/table/LoadRmvTable.class

— Creating and Populating 'rmvTable' with Example Data in a Non-Secure KVStore —

To execute LoadRmvTable to create and then populate the table named rmvTable with example data in a KVStore configured for non-secure access, type the following at the command line of a node that has network connectivity with a node running the admin service (for example, kv-host-1 itself):

  > cd /opt/apps/kv
  > java -classpath /opt/ondb/kv/lib/kvstore.jar:examples hadoop.table.LoadRmvTable \
             -store example-store -host kv-host-1 -port 5000 -nops 79 [-delete]

where the parameters -store, -host, -port, and -nops are required.

In the example command line above, the argument -nops 79 specifies that 79 rows be written to the rmvTable. If more or less than that number of rows is desired, then the value of the -nops parameter should be changed.

If LoadRmvTable is executed a second time and the optional -delete parameter is specified, then all rows added by any previous executions of LoadRmvTable are deleted from the table prior to adding the new rows. Otherwise, all pre-existing rows are left in place, and the number of rows in the table will be increased by the specified -nops number of new rows.

— Creating and Populating 'rmvTable' with Example Data in a Secure KVStore —

To execute LoadRmvTable against a secure KVStore deployed and provisioned with a non-administrative user employing the steps presented in Appendix B of the Hadoop/Table API Example, an additional parameter must be added to the command line above. That is, type the following:

  > cd /opt/apps/kv
  > javac -classpath /opt/ondb/kv/lib/kvclient.jar:/opt/ondb/kv/lib/threetenbp.jar:examples examples/hadoop/hive/table/LoadRmvTable.java
  > cp /opt/ondb/example-store/security/client.trust /tmp
  > java -classpath /opt/ondb/kv/lib/kvstore.jar:examples hadoop.table.LoadRmvTable \
             -store example-store -host kv-host-1 -port 5000 -nops 79 \
             -security /tmp/example-user-client-pwdfile.login
             [-delete]

where the role of the single additonal -security parameter is identical to the role of that parameter when executing the /LoadVehicleTable program; as explained in Appendix B of the Hadoop/Table API Example.

At this point, the rmvTable created in the specified KVStore (non-secure or secure) should be populated with the desired example data; which can then be queried via Hive HQL or Big Data SQL.

Appendix B: Setting the HIVE_AUX_JARS_PATH Environment Variable

In order to make the Table API Hive Integration classes available to the Hive client's Java VM, as well as the Java VMs of the DataNodes of the Hadoop Cluster, kvclient.jar and threetenbp.jar must be added to the classpaths of those respective VMs. To do this, both kvclient.jar and threetenbp.jar must be added to the Hive client's HIVE_AUX_JARS_PATH environment variable. One way to achieve this is to simply export a new value for HIVE_AUX_JARS_PATH from the Hive client's command line; that is,

  > export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar

The other, more permanent (and preferable), way to achieve this is to modify the hive-env.sh script located in the HIVE_CONF_DIR of the Hive client. For example,

  > cd $HIVE_CONF_DIR
  > edit hive-env.sh

    if [ -z "$HIVE_AUX_JARS_PATH" ]; then
        export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar
    else
        export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar
    fi

It is important to note that in both cases above, the separator that is used is a comma, not a colon.

Appendix C: Hive and Oracle NoSQL Security

With respect to running Hive queries against table data contained in a secure KVStore, a particularly important issue to address involves the configuration of the Hive client environment, as well as the creation and installation of a set of artifacts; which together support the communication of user credentials to the various components that participate in the execution of a query. Because the execution of a Hive query can result in the initiation of a MapReduce job, before proceeding you should familiarize yourself with the build model presented in Appendix C of the Hadoop/Table API Example; which describes the necessary artifacts in detail — their purpose, as well as how to generate and install them.

— Generating the Login, Trust, and Password Artifacts —

To execute a Hive query against a secure KVStore, the necessary public and private credentials must be incorporated in the definition of the Hive table that will be queried. To do this, in a fashion similar to that presented in Appendix C of the Hadoop/Table API Example, you must create artifacts like those shown below, and store them on the Hive client's local file system; for example,

  /tmp
    client.trust
    hive-nosql.login
    example-user.passwd

Or, if wallet storage will be employed:

  /tmp
    client.trust
    hive-nosql.login
    /example-user-wallet.dir
      cwallet.sso

where hive-nosql.login is analogous (actually, identical) to example-user-server.login in Appendix C.

— Generating the Server Side JAR File —

After creating the files above, the following server side JAR file should be generated so that it can be added to the HIVE_AUX_JARS_PATH environment variable (see below). That is, type the following at the command line:

  > cd /tmp
  > jar cvf /opt/apps/kv/examples/hive-nosql-server.jar client.trust
  > jar uvf /opt/apps/kv/examples/hive-nosql-server.jar hive-nosql.login

which produces the JAR file named hive-nosql-server.jar, with contents that include only public credentials and which look something like:

     0 Mon May 04 13:01:04 PDT 2015 META-INF/
    68 Mon May 04 13:01:04 PDT 2015 META-INF/MANIFEST.MF
   508 Wed Apr 22 12:23:32 PDT 2015 client.trust
   255 Mon May 04 11:30:54 PDT 2015 hive-nosql.login

— Setting the HIVE_AUX_JARS_PATH Environment Variable —

Recall that Appendix B of this document described how to add kvclient.jar and threetenbp.jar to the Hive client's HIVE_AUX_JARS_PATH environment variable in order to run a Hive query against a non-secure KVStore. In order to run a Hive query against a secure KVStore, in addition to adding kvclient.jar and threetenbp.jar, a number of other JAR files must also be added to HIVE_AUX_JARS_PATH; depending on how the password is stored.

If the password is stored in a password file, then in addition to kvclient.jar and threetenbp.jar, the hive-nosql-server.jar file must also be added to HIVE_AUX_JARS_PATH; that is, do one of the following:

  > export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/hive-nosql-server.jar

  > cd HIVE_CONF_DIR
  > edit hive-env.sh

    if [ -z "$HIVE_AUX_JARS_PATH" ]; then
        export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/hive-nosql-server.jar
    else
        export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,/opt/ondb/kv/lib/kvclient.jar,/opt/ondb/kv/lib/threetenbp.jar,/opt/apps/kv/examples/hive-nosql-server.jar
    fi

On the other hand, if the password is stored in an Oracle Wallet, then you would also add three wallet related JAR files that are only provided with the Enterprise Edition of Oracle NoSQL Database. That is, you would do one of the following:

  > export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH, \
                                  /opt/ondb/kv/lib/kvclient.jar,  \
                                  /opt/ondb/kv/lib/threetenbp.jar,  \
                                  /opt/ondb/kv/lib/oraclepki.jar, \
                                  /opt/ondb/kv/lib/osdt_cert.jar, \
                                  /opt/ondb/kv/lib/osdt_core.jar, \
                                  /opt/apps/kv/examples/hive-nosql-server.jar

  > cd HIVE_CONF_DIR
  > edit hive-env.sh

    if [ -z "$HIVE_AUX_JARS_PATH" ]; then
        export HIVE_AUX_JARS_PATH=/opt/ondb/kv/lib/kvclient.jar,  \
                                  /opt/ondb/kv/lib/threetenbp.jar,  \
                                  /opt/ondb/kv/lib/oraclepki.jar, \
                                  /opt/ondb/kv/lib/osdt_cert.jar, \
                                  /opt/ondb/kv/lib/osdt_core.jar, \
                                  /opt/apps/kv/examples/hive-nosql-server.jar
    else
        export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH, \
                                  /opt/ondb/kv/lib/kvclient.jar,  \
                                  /opt/ondb/kv/lib/threetenbp.jar,  \
                                  /opt/ondb/kv/lib/oraclepki.jar, \
                                  /opt/ondb/kv/lib/osdt_cert.jar, \
                                  /opt/ondb/kv/lib/osdt_core.jar, \
                                  /opt/apps/kv/examples/hive-nosql-server.jar
    fi

Once Hive has been configured for Oracle NoSQL Database security in the manner just presented, you can then incorporate the necessary artifacts in your Hive external table definition in the fashion presented previously in this document; for vehicleTable in Example 1, and for rmvTable in Example 2. Note that specifying the security related artifacts in the manner presented in those examples is akin to specifying that information on the command line (rather than via classpath resources) when initiating a MapReduce job.

Appendix D: Support for Predicate Pushdown

To improve query performance, Apache Hive and Oracle Big Data SQL both support a form of what is referred to as predicate pushdown; in which the client side frontend processing of a given query decomposes the WHERE clause (the predicate) of the query into column information and corresponding comparison operators, passing (pushing) the resulting components to the database where search processing (filtering) can be performed on the database's server side backend. To achieve analogous performance improvements, the Oracle NoSQL Database Table API Hive Integration classes support similar predicate pushdown functionality when executing Hive or Big Data SQL queries against data in an Oracle NoSQL Database table. For example, consider the following query executed against the example vehicleTable whose schema was described previously in this document:

  SELECT * FROM vehicleTable WHERE type = 'auto' AND make = 'Chrysler' AND model >= 'Imperial' AND model < 'Sebring';

This query will return all rows corresponding to automobiles (rather than trucks or SUVs) made by Chrysler; whose model is 'Imperial', 'Lebaron', or 'PTCruiser', but not 'Sebring'. If predicate pushdown is not employed when executing this query, then all rows from the vehicleTable will be retrieved from the KVStore's backend database and returned to the frontend client, where the predicate information will be applied to search the returned rows for the desired matches. On the other hand, if predicate pushdown is employed, then the information in the WHERE clause is sent to the KVStore and all filtering is performed in the database itself, so that only the rows of the vehicleTable that match the predicate are returned to the client. It should be clear then that predicate pushdown, when it can be employed, can result in significant performance improvements.

As Examples 1, 2, and 3 of this document demonstrate, the variety of predicates that can be employed when querying a table can be virtually unlimited. So it is important to understand that the predicates that can actually be pushed to the backend KVStore database are restricted to a finite subset of all possible predicates. This is because the predicates that can be supported by Oracle NoSQL Database are not only dependent on what the Hive and Big Data SQL predicate pushdown mechanisms support, but the semantics of the Oracle NoSQL Database Table API as well. As a result, the operators that are supported by the predicate pushdown mechanism of the Table API Hive Integration classes are currently limited to:

  =   <   <=>   >=   AND   OR   IN

In addition to the above set of operators, the semantics of the Table API can also affect how the table's fields (columns) will be handled during predicate pushdown. Specifically, for a given query's predicate, if a valid PrimaryKey, IndexKey and/or FieldRange (as defined by the Table API) cannot be formed from all or a subset of that predicate's fields, and no part of the predicate can be pushed to the server using the SQL for Oracle NoSQL Database mechanism, then the query's predicate will not be decomposed and sent to the database for backend filtering. Instead, the system will fallback to the default mechanism, and perform all filtering on the client side; applying the predicate to all the rows in the given table.

For example, consider the example query presented above. For that query, each component of the predicate satisfies the necessary criteria for pushdown, and so the whole predicate will be pushed to the database for search processing. To understand this, first observe that the operators referenced in the query's predicate belong to the set described above; that is, '=', 'AND', '>=', '<'. Next, based on the schema of vehicleTable, the fields named type and make form a valid PrimaryKey for performing a table scan; and the predicate components referencing the field named 'model' form a valid FieldRange. Compare this with a query such as,

  SELECT * FROM vehicleTable WHERE make = 'Chrysler' AND model >= 'Imperial' AND model < 'Sebring';

Assuming there is no index of the form (make,model), for this query, although the absence of the key's first field prevents the construction of a valid PrimaryKey as required by the semantics of the Table API, the predicate can still be pushed to the backend store because it is considered valid for filtering by SQL For Oracle NoSQL Database. Finally, consider a query such as,

  SELECT * FROM vehicleTable WHERE model LIKE "%Silverado%";

For this query, predicate pushdown will be bypassed and all filtering will be performed on the client side. This is because the predicate employs the LIKE operator, which is not eligible for predicate pushdown.

Note that the initial two example queries that were presented each result in the whole predicate being pushed and all filtering being performed on the backend; whereas the third example query results in no predicate pushdown and all filtering being performed on the client side. But this does not mean that predicate pushdown will always be handled in such an all-or-nothing manner. On the contrary, for many queries, only part of the predicate will be pushed to the database to produce initial filtered results, which are then further filtered on the client side using the remaining — residual — part of the predicate. For example, consider a query that wishes to find each '4WheelDrive' vehicle in the database that is either a 'Ford' or a 'GM', is 'blue', 'red', or 'yellow', and has a model name that begins with the letter 'E' (GM Equinox, Ford Expedition, or Ford Explorer). Such a query would look like the following:

  SELECT * FROM vehicleTable WHERE class = '4WheelDrive' AND (make = 'Ford' OR make = 'GM') AND color IN ('blue', 'red', 'yellow') AND model LIKE 'E%';

Based on the criteria presented in the next section, the only component of the query's predicate that cannot be pushed to the backend is the component that employs the LIKE operator (model LIKE 'E%'); whereas all other components in the query can be pushed. Thus, when executing the given query, the part of the predicate consisting of the components "class = '4WheelDrive' AND (make = 'Ford' OR make = 'GM') AND color IN ('blue', 'red', 'yellow')" will be pushed to the backend, producing rows referencing all Ford and GM vehicles that are blue, red, or yellow and four wheel drive; after which the client will apply the model LIKE 'E%' residual predicate to the results from the backend, to select and return only those rows with model name beginning with the letter 'E'.

— Predicate Pushdown Criteria —

When processing a given query that includes a predicate, the mechanism provided by the Table API Hive Integration classes will analyze the query's predicate and apply the following criteria to determine whether all, part, or none of the predicate can be pushed to the database for filtering on the backend.

If the query's predicate includes components (column, operator, value) with comparison operators from the set {=, >, >=, <, <="}, as well as zero or more combinations of the AND conjunction, the OR conjunction, and/or one or more IN lists, then the predicate is eligible for predicate pushdown.
Each combination of predicate components that form a valid (as defined by the Table API) PrimaryKey, IndexKey, or FieldRange is eligible for predicate pushdown; using mechanisms that optimize for scale.
If the query's predicate is found to contain multiple combinations that are eligible for predicate pushdown, then the combination resulting in optimal performance and scale will be pushed to the server; where if more than one of those combinations is found to be optimal, then the first such combination will be pushed.
Each component of the query's original predicate that is not eligible/included in the predicate to push will be added to the residual predicate for client side filtering.
If all of the predicate components are found to be ineligible for predicate pushdown, then predicate pushdown will not be performed, and the system will fallback to the original mechanism, using the query's original predicate to perform only client side filtering.

It is important to understand the criteria listed above in conjunction with the data model and search patterns you expect to employ when you define the PrimaryKey, along with any Index, for a given KVStore table that will be queried. That is, although the predicate pushdown mechanism will be employed automatically — without user intervention or special configuration — how you define your table and indexes can impact how well common queries will perform and scale.

Note that predicate pushdown is employed automatically with no obvious indication (other than improved performance) that it is "on and working". As a result, if you wish to verify that the mechanism is indeed operating as described above, you can set the Level of the Loggers named oracle.kv.hadoop.hive.table.TableStorageHandlerBase and oracle.kv.hadoop.hive.table.TableHiveInputFormat to the DEBUG value (see Appendix E for more detail). After setting those loggers to DEBUG, you can run a query and then observe how the predicate pushdown mechanism processes the query's predicate by analyzing the contents of the logger output.

Appendix E: Debug Logging

The Table API Hive Integration classes employ a number of loggers that can be used for debugging. Two of the loggers are associated with the Table API Hadoop Integration classes and descend from the parent logger named oracle.kv.hadoop.table. The remaining loggers are specified by the Table API Hive Integration classes and descend from the oracle.kv.hadoop.hive.table parent logger. In order to enable debug logging, set the logger corresponding to the desired component to DEBUG; or TRACE for more verbose output. To do this, add or change the appropriate items in the Hive logger configuration; for example,

log4j.logger.oracle.kv.hadoop.table.TableInputFormatBase=DEBUG
log4j.logger.oracle.kv.hadoop.table.TableRecordReaderBase=INFO

log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveRecordReader=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableSerDeBase=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableSerDe=INFO
log4j.logger.oracle.kv.hadoop.hive.table.TableFieldTypeEnum=INFO
log4j.logger.oracle.kv.hadoop.hive.table.V1V2TableUtil=INFO

— Hive Debug Logging —

If you are using Hive (rather than Big Data SQL) to query data stored in an Oracle NoSQL Database table and you wish to enable debug logging for the query processing that is performed, then the loggers presented in the previous section must be specified in the Hive logging configuration; and that configuration must be included in the Hive classpath. Since the Hive logging configuration is generally loaded as a Java resource rather than specified as a system property on the command line, there are a couple of ways to add/modify the desired loggers and levels to the Hive logging configuration and then add that configuration to the Hive classpath.

One way to add new logger elements to the Hive configuration is to:

Extract the file named hive-log4j.properties from the JAR file, hive-common-<version>.jar.
Edit the extracted logging configuration file and add the desired loggers and/or change the desired logger levels.
Replace the old version of hive-log4j.properties in hive-common-<version>.jar with the new, modified version of that file.

For example, consider version 4.5.0 of the Big Data Lite VM; which uses the 1.1.0-cdh5.7.0 version of the Cloudera distribution of Hive. To enable debug logging for predicate pushdown processing that is performed when executing a Hive query against a KVStore, you would do something like the following:

> cd /usr/lib/hive/lib
> sudo jar xvf hive-common-1.1.0-cdh5.7.0.jar hive-log4j.properties

> edit hive-log4j.properties

log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG

> sudo jar uvf hive-common-1.1.0-cdh5.7.0.jar hive-log4j.properties

An alternative way to add new logger elements to the Hive configuration is to create your own hive-log4j.properties file with the desired logging configuration, add it to your own JAR file — for example, hive-conf.jar — and then include that JAR file in the specification of the HIVE_AUX_JARS_PATH environment variable set in the hive-env.sh script located in the HIVE_CONF_DIR of the Hive client.

— Big Data SQL Debug Logging —

If you are using Big Data SQL to query data stored in an Oracle NoSQL Database table and you wish to enable debug logging for the query processing that is performed by the Table API Hive Integration classes, then the loggers employed by those classes must be specified in the file named bigdata-log4j.properties that is provided with each Big Data SQL system. For example, in a manner similar to what was done above for Hive debug logging, if you wish to enable debug logging for predicate pushdown processing when a Big Data SQL query is executed against data in a KVStore table, you would do the following (on the Big Data Lite VM):

> cd /u01/bigdatasql_config

> edit bigdata.properties

bigdatasql.enabled=true

> edit bigdata-log4j.properties

log4j.logger.oracle.kv.hadoop.hive.table=ALL, file
log4j.logger.oracle.kv.hadoop.hive.table.TableStorageHandlerBase=DEBUG
log4j.logger.oracle.kv.hadoop.hive.table.TableHiveInputFormat=DEBUG

Note that for versions of the Big Data Lite VM beyond 4.5.0, it should no longer be necessary to enable the Big Data SQL predicate pushdown mechanism by setting the bigdatasql.enabled property to true; as that property should be set by default in the later versions.