Changes in Oracle Big Data SQL 4.0

The following are new features and updates in Oracle Big Data SQL Release 4.0.

Support for Oracle Database 18c as well as Backward Compatibility for Oracle Database 12.2 and 12.1

To take advantage of the new capabilities in Oracle Big Data SQL 4.0, you need to use Oracle Database 18c or later. However, Oracle Database 12.1 and 12.2 remain fully supported (even though the new 4.0 capabilities are not available with these database versions). This backward compatibility enables you to install and administer release 4.0 in a mixed environment that includes both Oracle Database 18c and 12c.

Big Data SQL Query Server

Big Data SQL Query Server is a lightweight, zero-maintenance Oracle Database. It gives you an easy way to query data in Hadoop without the need for a full-fledged Oracle Database service. The service consists of the Oracle SQL query engine only. It provides no persistent storage except for certain categories of metadata that are useful to retain across sessions.

  • Installs Automatically and Requires no Maintenance

    Big Data SQL Query Server is included as part of the standard Oracle Big Data SQL installation. The only thing you need to provide is the address of an edge node where you would like the service installed. The installation itself is fully automated and requires no post-installation configuration.

  • Provides Single and Multi-User Modes

    The service provides two modes: single-user and multi-user. In single-user mode, all users connect to the Query Server as the BDSQL user, with the password specified during the installation. In multi-user mode, Hadoop cluster users log in to the Query Server using their Kerberos principals. (A sketch of both connection styles appears after this list.)

  • Works with Kerberos, Automatically Imports Kerberos Principals

    A Kerberos-secured cluster can support both single-user and multi-user modes.

    During installation on a secured cluster, the installer automatically queries the KDC to identify the Kerberos principals and then sets up externally identified users based on those principals. After installation, the administrator can manually add or remove principals. (An example of an externally identified user definition appears after this list.)

  • Resets to Initial State After Each Query Server Restart

    Each time Big Data SQL Query Server is restarted, the database instance is reset to its original state. This also happens if a fatal error occurs. The reset enables you to start again from a “clean slate.” A restart preserves external tables (both ORACLE_HIVE and ORACLE_HDFS types), associated statistics, and user-defined views. A restart deletes regular tables containing user data.

  • Can be Managed Through Hortonworks Ambari or Cloudera Manager

    Big Data SQL Query Server is automatically set up as a service in Ambari or Cloudera Manager. You can use these administrative tools to monitor and stop or start the process; view warning, error, and informational messages; and perform some Big Data SQL Query Server operations, such as statistics gathering and Hive metadata import.
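
As an illustration of the two connection styles, here is a minimal SQL*Plus sketch. The host name, port, and service name are hypothetical placeholders, not values from this document:

    -- Single-user mode: every user connects as the BDSQL user,
    -- supplying the password chosen during installation.
    CONNECT bdsql@//edgenode.example.com:1521/bdsql_svc

    -- Multi-user mode: a Hadoop user with a valid Kerberos ticket
    -- connects as an externally identified user, with no password.
    CONNECT /@//edgenode.example.com:1521/bdsql_svc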
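
The externally identified users that the installer derives from Kerberos principals follow standard Oracle Database syntax. A minimal sketch, using a hypothetical principal name (the installer performs this setup automatically):

    -- Map a database user to a Kerberos principal (hypothetical names).
    CREATE USER analyst1 IDENTIFIED EXTERNALLY AS 'analyst1@EXAMPLE.COM';
    GRANT CREATE SESSION TO analyst1;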

Query Server is provided under a limited-use license, described in Oracle Big Data SQL Licensing in the Oracle Big Data SQL Installation Guide.

New ORACLE_BIGDATA Driver for Accessing Object Stores

In addition to ORACLE_HIVE and ORACLE_HDFS, release 4.0 also includes the new ORACLE_BIGDATA driver. This driver enables you to create external tables over data within object stores in the cloud. Currently, Oracle Object Store and Amazon S3 are supported. ORACLE_BIGDATA enables you to create external tables over Parquet, Avro, and text files in these environments. For development and testing, you can also use it to access local data files through Oracle Database directory objects. The driver is written in C and does not execute any Java code.
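
As a simple illustration of the driver, here is a minimal sketch of an external table over a local text file. The directory object, file name, and columns are hypothetical placeholders:

    -- Assumes a directory object DATA_DIR that points to the location
    -- of a CSV file events.csv (hypothetical names).
    CREATE TABLE events_ext (
      event_id   NUMBER,
      event_name VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_BIGDATA
      DEFAULT DIRECTORY DATA_DIR
      ACCESS PARAMETERS (
        com.oracle.bigdata.fileformat=csv
      )
      LOCATION ('events.csv')
    )
    REJECT LIMIT UNLIMITED;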

In release 4.0, ORACLE_BIGDATA supports the return of scalar fields from Parquet files. More complex data types, as well as multi-part Parquet files, are not supported at this time. Because the reader does not support complex data types in Parquet files, the generated column list omits any complex columns from the external table definition. Most types stored in Parquet files are not directly supported as Oracle column types; they are converted to supported Oracle data types (see the conversion tables referenced in the See Also link below).

Oracle Big Data SQL's Smart Scan, including the new aggregation offload capability, works with object stores by offloading data from the object store to processing cells on the Hadoop cluster where Oracle Big Data SQL is installed.

Authentication against object stores is accomplished through a credential object that you create using the DBMS_CREDENTIAL package. You include the name of the credential object, as well as a location URI, as parameters in the external table CREATE statement.
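
A sketch of that flow under assumed names; the credential name, credentials, table, columns, and location URI are hypothetical placeholders:

    -- 1. Create a credential object for the object store.
    BEGIN
      DBMS_CREDENTIAL.CREATE_CREDENTIAL(
        credential_name => 'OBJ_STORE_CRED',   -- hypothetical name
        username        => 'storage_user',     -- placeholder account
        password        => 'auth_token_value'  -- placeholder secret
      );
    END;
    /

    -- 2. Reference the credential object and a location URI in the
    --    external table CREATE statement.
    CREATE TABLE sales_ext (
      sale_id NUMBER,
      amount  NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_BIGDATA
      ACCESS PARAMETERS (
        com.oracle.bigdata.credential.name=OBJ_STORE_CRED
        com.oracle.bigdata.fileformat=parquet
      )
      LOCATION ('https://objectstorage.example.com/n/ns1/b/bucket1/o/sales.parquet')
    )
    REJECT LIMIT UNLIMITED;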

See Also:

Create an Oracle External Table for Object Store Access, which provides CREATE statement examples as well as tables for converting Parquet and Avro data types to Oracle data types.

Aggregation Offload

Oracle Big Data SQL can now utilize Oracle In-Memory technology to offload aggregations to the Oracle Big Data SQL cells. Oracle Big Data SQL leverages the processing power of the Hadoop cluster to distribute aggregations across the cluster nodes. The performance gains can be significantly greater than for aggregations that are not offloaded, especially when there is a moderate number of summary groupings.

Oracle Big Data SQL cells support single-table and multi-table aggregations (for example, dimension tables joining to a fact table). For multi-table aggregations, Oracle Database uses the key vector transform optimization, in which the key vectors are pushed to the cells for the aggregation process. This transformation type is useful for star-join SQL queries that use typical aggregation operators (for example, SUM, MIN, MAX, and COUNT), which are common in business queries.
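
For instance, a typical star-join aggregation of the kind that can benefit from this offload; the table and column names are hypothetical:

    -- A fact table joined to a dimension table and aggregated with SUM.
    -- The key vectors for the join, and the aggregation work itself,
    -- can be pushed down to the Oracle Big Data SQL cells.
    SELECT d.region,
           SUM(f.amount_sold) AS total_sales
    FROM   sales_fact f
           JOIN region_dim d ON f.region_id = d.region_id
    GROUP  BY d.region;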

Sentry Authorization in Oracle Big Data SQL

In addition to supporting authorization for HDFS file access, Oracle Big Data SQL supports Sentry policies, which authorize access to Hive metadata. Sentry enables fine-grained control over user access, down to the column level.
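
For illustration, a column-level policy of the kind Sentry enforces can be defined through the Hive SQL interface (for example, in Beeline); the role, group, table, and column names here are hypothetical:

    -- Allow a group to read only one column of a Hive table.
    CREATE ROLE analyst_role;
    GRANT ROLE analyst_role TO GROUP analysts;
    GRANT SELECT(customer_name) ON TABLE customers TO ROLE analyst_role;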

See Also:

Sentry Authorization in Big Data SQL in the Oracle Big Data SQL Installation Guide.

Installer Improvements

  • The Jaguar installer provides easy installation of the optional Query Server database. Several new parameters are added to the Jaguar configuration file for the installation of this component (see the configuration sketch after this list).

  • Oracle Big Data SQL now includes its own JDK. You no longer need to download it from the Oracle Technology Network. Other versions of the JDK may be present on the system; however, do not change the JDK path that Oracle Big Data SQL uses.

  • The installer now validates principals entered in the Kerberos section of the configuration file against the corresponding keytab file and flags an error if they do not match.

  • Cluster edge nodes are automatically excluded from the requirements pre-check.

  • In the installation pre-check, hardware factors (cores and memory) are validated only on nodes where Oracle Big Data SQL processing cells will be installed.

  • On the database side, the installer now validates the subnet (for InfiniBand connections), the LD_LIBRARY_PATH, and the hostnames of Hadoop systems on the other side of the connection.

  • On the database side, the uninstall operation now removes all Oracle Big Data SQL artifacts from the database server and reverts all changes to cellinit.*ora files and database parameters.

  • The Jaguar updatenodes operation is deprecated in this release. Use reconfigure instead to change cluster settings, create database-side install bundles, and expand or shrink the configuration.

  • Two new scripts help you determine readiness for installation.

    Prior to installing the Hadoop side of Oracle Big Data SQL, you can run bds_node_check.sh on each DataNode of the cluster to check if the node meets the installation prerequisites.

    Prior to installing on the Oracle Database system, you can run bds-validate-grid-patches.sh to ensure that Oracle Grid includes the patches required by the Oracle Big Data SQL release. (A usage sketch for both scripts appears after this list.)

  • The script bds_cluster_node_helper.sh, which you can run on each Hadoop node, provides status on the Oracle Big Data SQL installation on the node and also collects log data and other information useful for maintenance. There are three options for the scope of the log data collection.
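
As mentioned in the first item above, the Jaguar configuration file gains new parameters for the Query Server. The sketch below shows the general shape of such an entry; the section and parameter names are assumptions for illustration only, so consult the Oracle Big Data SQL Installation Guide for the exact syntax:

    {
      "cluster": {
        "name": "mycluster"
      },
      "edgedb": {
        "node": "edgenode.example.com",
        "enabled": "true"
      }
    }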
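
A usage sketch for the two readiness scripts described above; the working directory is an assumption, since the bundle layout is not described here:

    # On each Hadoop DataNode, before installing the Hadoop side:
    ./bds_node_check.sh

    # On the Oracle Database system, before installing the database side:
    ./bds-validate-grid-patches.sh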