A Brief Primer on Oracle Big Data SQL
As stated in the Introduction to Integration with Oracle Big Data SQL, Oracle Big Data SQL allows SQL access to various external data sources such as an Oracle NoSQL Database table by presenting the data source as an Oracle external table. To achieve this, a mechanism referred to as an access driver is employed to access data as if it were a table in the Oracle relational database running in the Oracle Big Data SQL system. Oracle Big Data SQL extends the access driver mechanism of external tables by specifying a new access driver type for each data source that will be accessed. Prior to the introduction of Oracle Big Data SQL, the Oracle Database external tables mechanism defined only two access driver types:
- The
ORACLE_LOADER
access driver, for reading from flat files. - The
ORACLE_DATAPUMP
access driver, for migrating data between Oracle databases in a proprietary format.
With the introduction of Big Data SQL, the following new access driver types are defined:
- The
ORACLE_HDFS
access driver, for accessing data stored in the Apache Hadoop Distributed File System. - The
ORACLE_HIVE
access driver, for accessing data stored in Apache Hive tables or Oracle NoSQL Database tables. - The
ORACLE_BIGDATA
access driver, for accessing files stored in an object store.
Both the ORACLE_HDFS
and ORACLE_HIVE
access drivers require the specification of a number of classes that satisfy the Apache Hadoop MapReduce programming model. Some of those classes are required by both access driver types, whereas some are required by only the ORACLE_HIVE
access driver. The class types required by both ORACLE_HDFS
and ORACLE_HIVE
are:
- An instance of
org.apache.hadoop.mapreduce.InputFormat
- An instance of
org.apache.hadoop.mapreduce.OutputFormat
- An instance of
org.apache.hadoop.mapreduce.RecordReader
- An instance of
org.apache.hadoop.mapreduce.InputSplit
See package org.apache.hadoop.mapreduce.
The class types required by ORACLE_HIVE
but not ORACLE_HDFS
are:
- An instance of
org.apache.hadoop.hive.ql.metadata.HiveStoarageHandler
; for example, the Oracle NoSQL DatabaseTableStorageHandler
. - An instance of
org.apache.hadoop.hive.serde2.AbstractSerDe
; for example, the Oracle NoSQL DatabaseTableSerDe
. - An instance of
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
; for example, the variousObjectInspector
implementations defined by Oracle NoSQL Database and described in the Integration with Apache Hive section. - An instance of
org.apache.hadoop.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
.
See Hive API and package oracle.kv.hadoop.hive.table.
The ORACLE_HDFS
access driver can only read data stored in HDFS files, whereas the ORACLE_HIVE
access driver can read data stored not only in HDFS files, but data stored in other locations as well; for example an Oracle NoSQL Database table. As explained in the following sections, the integration of Oracle NoSQL Database with Apache Hive plays a prominent role in the integration of Oracle NoSQL Database with Oracle Big Data SQL.