Changes in Oracle Big Data SQL 4.0

The following are new features and updates in Oracle Big Data SQL Release 4.0.

Support for Oracle Database 18c as well as Backward Compatibility for Oracle Database 12.2 and 12.1

To take Oracle advantage of the new capabilities in Oracle Big Data SQL 4.0, you need use Oracle Database 18c or later. However, use of Oracle Database 12.1 and 12.2 is fully supported in this release (even though you can't leverage the new 4.0 capabilities with these database versions). This backward compatibility enables you to install and administer release 4.0 in a mixed environment that includes both Oracle Database 18c and 12c.

Support for Cloudera Enterprise 6

In addition to Cloudera Enterprise Release 5, this release of Oracle Big Data SQL also fully supports Cloudera Enterprise Release 6. If Oracle Big Data SQL is installed on Cloudera Enterprise Release 6, the Jaguar installer detects this release level and configures the installation accordingly. No additional steps are required.

New ORACLE_BIGDATA Driver for Accessing Object Stores

In addition to ORACLE_HIVE and ORACLE_HDFS, release 4.0 also includes the new ORACLE_BIGDATA driver. This driver enables you to create external tables over data within object stores in the cloud. Currently Oracle Object Store and Amazon S3 are supported. You can create external tables over Parquet, Avro, and text files in these stores. For development and testing, you can also use ORACLE_BIGDATA to access local data files though Oracle Database directory objects. The driver is written in C. It does not execute any Java code and can therefore work with non-Java environments

In release 4.0, ORACLE_BIGDATA supports the return of scalar fields from Parquet files. More complex data types as well as multi-part Parquet files are not supported at this time. Because the reader does not support complex data types in the Parquet file, the column list generated omits any complex columns from the external table definition. Most types stored in Parquet files are not directly supported as types for columns in Oracle tables.

Oracle Big Data SQL's Smart Scan performance optimizations, including the new aggregation offload capability (also introduced in this section), work with object stores by offloading data from the object store to processing cells on the Hadoop cluster where Oracle Big Data SQL is installed.

Authentication against object stores is accomplished through a credential object that you create using the DBMS_CREDENTIAL package. You include the name of the credential object as well as a location URI as parameters of the external table create statement.

See Also:

Creating an Oracle External Table for Object Store Access in the Oracle Big Data SQL User's Guide provides create statement examples as well as conversion tables for Parquet and Avro data types to Oracle data types.

Oracle Big Data SQL Query Server

Query Server is a lightweight, zero-maintenance 18c Oracle Database that you install directly on a Hadoop cluster edge node. It gives you an easy way query data in Hadoop without the need for a full-fledged Oracle Database system. The service consists of the Oracle SQL query engine only. It provides no persistent storage except for certain categories of metadata that are useful to retain across sessions.

Installs Easily and Requires no Maintenance

Big Data SQL Query Server is included as part of the standard Oracle Big Data SQL installation. The only additional things you need to do in order to use this optional component are download and unpack the "extras" package from the same site where you download Oracle Big Data SQL. There are also some Query Server-related parameters that must be set in the Jaguar installer's configuration file, including the address of a dedicated edge node where you would like to run this service. Then, when you install Oracle Big Data SQL, the Query Server installation is fully automated and requires no post-installation configuration.
Provides Single and Multi-User Modes

The service provides two modes – single-user and multi-user. Single-user mode utilizes a single user for accessing the Query Server. All users connect to the Query Server as the BDSQL user with the password specified during the installation. In multi-user mode Hadoop cluster users log into the Query Server using their Kerberos principal.
Works with Kerberos, Automatically Imports Kerberos Principals

A Kerberos-secured cluster can support both single user and multi-user mode.

During installation on a secured cluster, the installer automatically queries the KDC to identify the Kerberos principals and then sets up externally identified users based on the principals. After installation, the administrator can manually add or remove principals.
Resets to Initial State After Each Query Server Restart

Each time Big Data SQL Query Server is restarted, the database instance is reset to the original state. This also happens if a fatal error occurs. The reset enables you to start again from a “clean slate.” However, a restart does preserve external tables. ORACLE_HIVE, ORACLE_HDFS, and ORACLE_BIGDATA external tables, associated statistics, and user-defined views are preserved. For ORACLE _BIGDATA external tables, it also preserves object store access credentials created with DBMS_CREDENTIAL. All of this external data persists after a restart. Regular tables containing user data are not preserved in a restart.
Can be Managed Through Hortonworks Ambari or Cloudera Manager

Big Data SQL Query Service is automatically set up as a service in Ambari or Cloudera Manager. You can use these administrative tools to monitor and stop/start the process, view warning, error, and informational messages, and perform some Big Data SQL Query Service operations such as statistics gathering and Hive metadata import.

Query Server is provided under a limited use license described in Oracle Big Data SQL Licensing.

See Also:

Working With Query Server in the Oracle Big Data SQL User's Guide walks you through the process of setting up and using Query Server.

Works with Kerberos, Automatically Imports Kerberos Principals
A Kerberos-secured cluster can support both single user and multi-user mode.

During installation on a secured cluster, the installer automatically queries the KDC to identify the Kerberos principals and then sets up externally identified users based on the principals. After installation, the administrator can manually add or remove principals.
Resets to Initial State After Each Query Server Restart
Each time Big Data SQL Query Server is restarted, the database instance is reset to the original state. This also happens if a fatal error occurs. The reset enables you to start again from a “clean slate.” However, a restart does preserve external tables. ORACLE_HIVE, ORACLE_HDFS, and ORACLE_BIGDATA external tables, associated statistics, and user-defined views are preserved. For ORACLE _BIGDATA external tables, it also preserves object store access credentials created with DBMS_CREDENTIAL. All of this external data persists after a restart. Regular tables containing user data are not preserved in a restart.
Can be Managed Through Hortonworks Ambari or Cloudera Manager
Big Data SQL Query Service is automatically set up as a service in Ambari or Cloudera Manager. You can use these administrative tools to monitor and stop/start the process, view warning, error, and informational messages, and perform some Big Data SQL Query Service operations such as statistics gathering and Hive metadata import.

Query Server is provided under a limited use license described in Oracle Big Data SQL Licensing.

See Also:

Working With Query Server Working With Query Server in the Oracle Big Data SQL User's Guide walks you through the process of setting up and using Query Server.

Aggregation Offload

Oracle Big Data SQL can now utilize Oracle In-Memory technology to offload aggregations to the Oracle Big Data SQL cells. Oracle Big Data SQL leverages the processing power of the Hadoop cluster to distribute aggregations across the cluster nodes.

The performance gains achieved through aggregation offload can be very substantial, especially when there are a moderate number of summary groupings.

Oracle Big Data SQL cells support single table and multi-table aggregations (for example, dimension tables joining to a fact table). For multi-table aggregations, the Oracle Database uses the key vector transform optimization in which the key vectors are pushed to the cells for the aggregation process. This transformation type is useful for star join sql queries that use typical aggregation operators (for example, SUM, MIN, MAX, and COUNT) which are common in business queries.

See Also:

About Aggregation Offload in the Oracle Big Data SQL User's Guide

Sentry Authorization in Oracle Big Data SQL

In addition to supporting authorization for HDFS file access, Oracle Big Data SQL supports Sentry policies,which authorize access to Hive metadata. Sentry enables fine-grained control over user access, down to the column level.

See Also:

Sentry Authorization in Oracle Big Data SQL.

Installer Improvements

The Jaguar installer provides easy installation of the optional Query Server database. Several new parameters are added to the Jaguar configuration file for the installation of this component.
Oracle Big Data SQL now includes its own JDK. You no longer need to download it from the Oracle Technology Network. Other versions of the JDK may be present, but do not change the JDK path that Oracle Big Data SQL uses.
The installer now validates principals entered in the Kerberos section of the configuration file against the corresponding keytab file and flags an error if these do no match.
Cluster edge nodes are automatically excluded from the requirements pre-check.
In the installation pre-check, hardware factors (cores and memory) are validated only on nodes where Oracle Big Data SQL processing cells will be installed.
On the database side, the install now validates the subnet (for InfiniBand connections), the LD_LIBRARY_PATH, and the hostnames of Hadoop systems on the other side of the connection.
In an uninstall on the database side, the operation now removes all Oracle Big Data SQL artifacts from the database server and reverts all changes to cellinit.*ora and database parameters.
The Jaguar updatenodes operation is deprecated in this release. Use reconfigure instead to change cluster settings, create database-side install bundles, and expand or shrink the configuration.
Two new scripts to help predetermine readiness for installation.
Prior to installing the Hadoop side of Oracle Big Data SQL, you can run bds_node_check.sh on each DataNode of the cluster to check if the node meets the installation prerequisites.

Prior to installing on the Oracle Database system, you can run bds-validate-grid-patches.sh to ensure that Oracle Grid includes the patches required by the Oracle Big Data SQL release.
The script bds_cluster_node_helper.sh, which you can run on each Hadoop node, provides status on the Oracle Big Data SQL installation on the node and also collects log data and other information useful for maintenance. There are three options for the scope of the log data collection.