A Brief Primer on Oracle Big Data SQL

As stated in the Introduction to Integration with Oracle Big Data SQL, Oracle Big Data SQL allows SQL access to various external data sources such as an Oracle NoSQL Database table by presenting the data source as an Oracle external table. To achieve this, a mechanism referred to as an access driver is employed to access data as if it were a table in the Oracle relational database running in the Oracle Big Data SQL system. Oracle Big Data SQL extends the access driver mechanism of external tables by specifying a new access driver type for each data source that will be accessed. Prior to the introduction of Oracle Big Data SQL, the Oracle Database external tables mechanism defined only two access driver types:

  • The ORACLE_LOADER access driver, for reading from flat files.
  • The ORACLE_DATAPUMP access driver, for migrating data between Oracle databases in a proprietary format.

With the introduction of Big Data SQL, the following new access driver types are defined:

  • The ORACLE_HDFS access driver, for accessing data stored in the Apache Hadoop Distributed File System.
  • The ORACLE_HIVE access driver, for accessing data stored in Apache Hive tables or Oracle NoSQL Database tables.
  • The ORACLE_BIGDATA access driver, for accessing files stored in an object store.

Both the ORACLE_HDFS and ORACLE_HIVE access drivers require the specification of a number of classes that satisfy the Apache Hadoop MapReduce programming model. Some of those classes are required by both access driver types, whereas some are required by only the ORACLE_HIVE access driver. The class types required by both ORACLE_HDFS and ORACLE_HIVE are:

  • An instance of org.apache.hadoop.mapreduce.InputFormat
  • An instance of org.apache.hadoop.mapreduce.OutputFormat
  • An instance of org.apache.hadoop.mapreduce.RecordReader
  • An instance of org.apache.hadoop.mapreduce.InputSplit

See package org.apache.hadoop.mapreduce.

The class types required by ORACLE_HIVE but not ORACLE_HDFS are:

  • An instance of org.apache.hadoop.hive.ql.metadata.HiveStoarageHandler ; for example, the Oracle NoSQL Database TableStorageHandler.
  • An instance of org.apache.hadoop.hive.serde2.AbstractSerDe ; for example, the Oracle NoSQL Database TableSerDe.
  • An instance of org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector ; for example, the various ObjectInspector implementations defined by Oracle NoSQL Database and described in the Integration with Apache Hive section.
  • An instance of org.apache.hadoop.hadoop.hive.ql.metadata.HiveStoragePredicateHandler .

See Hive API and package oracle.kv.hadoop.hive.table.

The ORACLE_HDFS access driver can only read data stored in HDFS files, whereas the ORACLE_HIVE access driver can read data stored not only in HDFS files, but data stored in other locations as well; for example an Oracle NoSQL Database table. As explained in the following sections, the integration of Oracle NoSQL Database with Apache Hive plays a prominent role in the integration of Oracle NoSQL Database with Oracle Big Data SQL.