Preface

The Oracle Big Data SQL User's Guide describes how to use and manage the Oracle Big Data SQL product.

Audience

This guide is intended for administrators and users of Oracle Big Data SQL, including:

  • Application developers

  • Data analysts

  • Data scientists

  • Database administrators

  • System administrators

The guide assumes that the reader has basic knowledge of Oracle Database single-node and multinode systems, the Hadoop framework, the Linux operating system, and networking concepts.

Documentation Accessibility

For information about Oracle's commitment to accessibility, visit the Oracle Accessibility Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support

Oracle customers that have purchased support have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Related Documents

See the Oracle Big Data SQL Installation Guide for instructions on installing the product.

See the Oracle Big Data Appliance Owner's Guide for information about using Oracle Big Data SQL with Oracle Big Data Appliance.

Conventions

The following text conventions are used in this document:

  • boldface: Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.

  • italic: Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.

  • monospace: Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.

  • # prompt: The pound (#) prompt indicates a command that is run as the Linux root user.

Backus-Naur Form Syntax

The syntax in this reference is presented in a simple variation of Backus-Naur Form (BNF) that uses the following symbols and conventions:

  • [ ]: Brackets enclose optional items.

  • { }: Braces enclose a choice of items, only one of which is required.

  • |: A vertical bar separates alternatives within brackets or braces.

  • ...: Ellipses indicate that the preceding syntactic element can be repeated.

  • delimiters: Delimiters other than brackets, braces, and vertical bars must be entered as shown.

  • boldface: Words appearing in boldface are keywords. They must be typed as shown. (Keywords are case-sensitive in some, but not all, operating systems.) Words that are not in boldface are placeholders for which you must substitute a name or value.

Changes in Oracle Big Data SQL 4.0

The following are new features and updates in Oracle Big Data SQL Release 4.0.

Support for Oracle Database 18c as well as Backward Compatibility for Oracle Database 12.2 and 12.1

To take advantage of the new capabilities in Oracle Big Data SQL 4.0, you must use Oracle Database 18c or later. However, use of Oracle Database 12.1 and 12.2 is fully supported (even though you can't leverage the new 4.0 capabilities with these database versions). This backward compatibility enables you to install and administer release 4.0 in a mixed environment that includes both Oracle Database 18c and 12c.

Big Data SQL Query Server

Big Data SQL Query Server is a lightweight, zero-maintenance Oracle Database. It gives you an easy way to query data in Hadoop without the need for a full-fledged Oracle Database service. The service consists of the Oracle SQL query engine only. It provides no persistent storage except for certain categories of metadata that are useful to retain across sessions.

  • Installs Automatically and Requires no Maintenance

    Big Data SQL Query Server is included as part of the standard Oracle Big Data SQL installation. The only thing you need to provide is the address of an edge node where you would like the service installed. The installation itself is fully automated and requires no post-installation configuration.

  • Provides Single and Multi-User Modes

    The service provides two modes: single-user and multi-user. In single-user mode, all users connect to the Query Server as the BDSQL user with the password specified during installation. In multi-user mode, Hadoop cluster users log in to the Query Server using their Kerberos principals (see the connection sketch following this list).

  • Works with Kerberos, Automatically Imports Kerberos Principals

    A Kerberos-secured cluster can support both single user and multi-user mode.

    During installation on a secured cluster, the installer automatically queries the KDC to identify the Kerberos principals and then sets up externally identified users based on the principals. After installation, the administrator can manually add or remove principals.

  • Resets to Initial State After Each Query Server Restart

    Each time Big Data SQL Query Server is restarted, the database instance is reset to its original state. This also happens if a fatal error occurs. This reset enables you to start again from a “clean slate.” A restart preserves external tables (both ORACLE_HIVE and ORACLE_HDFS types), associated statistics, and user-defined views. A restart deletes regular tables containing user data.

  • Can be Managed Through Hortonworks Ambari or Cloudera Manager

    Big Data SQL Query Server is automatically set up as a service in Ambari or Cloudera Manager. You can use these administrative tools to monitor the service, stop and start it, view warning, error, and informational messages, and perform some Query Server operations such as statistics gathering and Hive metadata import.
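
The following is a minimal sketch of connecting to the Query Server from SQL*Plus in each mode. The host name, port, and service name are placeholders rather than documented defaults, and the multi-user example assumes the client is configured for Kerberos external authentication.

    -- Single-user mode: every user connects as the BDSQL account created at
    -- installation time (SQL*Plus prompts for the password set during install).
    CONNECT bdsql@//edge-node.example.com:1521/bdsqlsvc

    -- Multi-user mode on a Kerberos-secured cluster: obtain a ticket with kinit,
    -- then connect externally; the session maps to the externally identified
    -- user that the installer created from your Kerberos principal.
    CONNECT /@//edge-node.example.com:1521/bdsqlsvc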

Query Server is provided under a limited use license described in Oracle Big Data SQL Licensing in the Oracle Big Data SQL Installation Guide.

New ORACLE_BIGDATA Driver for Accessing Object Stores

In addition to ORACLE_HIVE and ORACLE_HDFS, release 4.0 also includes the new ORACLE_BIGDATA driver. This driver enables you to create external tables over data within object stores in the cloud. Currently Oracle Object Store and Amazon S3 are supported. ORACLE_BIGDATA enables you to create external tables over Parquet, Avro, and text files in these environments. For development and testing, you can also use it to access local data files through Oracle Database directory objects. The driver is written in C and does not execute any Java code.

In release 4.0, ORACLE_BIGDATA supports the return of scalar fields from Parquet files. More complex data types, as well as multi-part Parquet files, are not supported at this time. Because the reader does not support complex data types in the Parquet file, the generated column list omits any complex columns from the external table definition. Most types stored in Parquet files are not directly supported as Oracle column types; they are converted according to the mapping tables referenced below.

Oracle Big Data SQL's Smart Scan, including the new aggregation offload capability, works with object stores by offloading data from the object store to processing cells on the Hadoop cluster where Oracle Big Data SQL is installed.

Authentication against object stores is accomplished through a credential object that you create using the DBMS_CREDENTIAL package. You include the name of the credential object as well as a location URI as parameters of the external table create statement.
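
As a minimal sketch of this flow, the statements below create a credential and then an external table over a Parquet file in an object store. The credential name, user name, password, table definition, and location URI are illustrative placeholders.

    -- Create a credential object holding the object store credentials.
    BEGIN
      DBMS_CREDENTIAL.CREATE_CREDENTIAL(
        credential_name => 'OBJ_STORE_CRED',
        username        => 'object_store_user',
        password        => 'auth_token_value');
    END;
    /

    -- Create an external table over a Parquet file, naming the credential
    -- object and the location URI in the access parameters.
    CREATE TABLE sales_obj_ext (
      order_id    NUMBER,
      order_total NUMBER,
      order_date  DATE)
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_BIGDATA
      ACCESS PARAMETERS (
        com.oracle.bigdata.credential.name=OBJ_STORE_CRED
        com.oracle.bigdata.fileformat=parquet)
      LOCATION ('https://objectstorage.example.com/n/mytenancy/b/mybucket/o/sales.parquet'))
    REJECT LIMIT UNLIMITED;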

See Also:

Creating an Oracle External Table for Object Store Access, which provides CREATE statement examples as well as conversion tables mapping Parquet and Avro data types to Oracle data types.

Aggregation Offload

Oracle Big Data SQL can now utilize Oracle In-Memory technology to offload aggregations to the Oracle Big Data SQL cells. Oracle Big Data SQL leverages the processing power of the Hadoop cluster to distribute aggregations across the cluster nodes. The performance gains can be significantly greater than for aggregations that do not offload, especially when there are a moderate number of summary groupings.

Oracle Big Data SQL cells support single-table and multi-table aggregations (for example, dimension tables joining to a fact table). For multi-table aggregations, Oracle Database uses the key vector transform optimization, in which the key vectors are pushed to the cells for the aggregation process. This transformation is useful for star-join SQL queries that use typical aggregation operators (for example, SUM, MIN, MAX, and COUNT), which are common in business queries.
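
As an illustration, a star-join aggregation of the following shape is a candidate for this offload. The table and column names are hypothetical and assume an external fact table over Hadoop data joined to a local dimension table.

    SELECT d.region,
           SUM(f.amount_sold) AS total_sales,
           COUNT(*)           AS order_count
    FROM   sales_ext f
    JOIN   regions_dim d ON d.region_id = f.region_id
    GROUP BY d.region;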

Sentry Authorization in Oracle Big Data SQL

In addition to supporting authorization for HDFS file access, Oracle Big Data SQL supports Sentry policies, which authorize access to Hive metadata. Sentry enables fine-grained control over user access, down to the column level.

See Also:

Sentry Authorization in Big Data SQL in the Oracle Big Data SQL Installation Guide.

Installer Improvements

  • The Jaguar installer provides easy installation of the optional Query Server database. Several new parameters are added to the Jaguar configuration file for the installation of this component.

  • Oracle Big Data SQL now includes its own JDK. You no longer need to download it from the Oracle Technology Network. Other versions of the JDK may be present, but do not change the JDK path that Oracle Big Data SQL uses.

  • The installer now validates the principals entered in the Kerberos section of the configuration file against the corresponding keytab file and flags an error if they do not match.

  • Cluster edge nodes are automatically excluded from the requirements pre-check.

  • In the installation pre-check, hardware factors (cores and memory) are validated only on nodes where Oracle Big Data SQL processing cells will be installed.

  • On the database side, the install now validates the subnet (for InfiniBand connections), the LD_LIBRARY_PATH, and the hostnames of Hadoop systems on the other side of the connection.

  • In an uninstall on the database side, the operation now removes all Oracle Big Data SQL artifacts from the database server and reverts all changes to cellinit.*ora and database parameters.

  • The Jaguar updatenodes operation is deprecated in this release. Use reconfigure instead to change cluster settings, create database-side install bundles, and expand or shrink the configuration.

  • Two new scripts help you determine readiness before installation:

    Prior to installing the Hadoop side of Oracle Big Data SQL, you can run bds_node_check.sh on each DataNode of the cluster to check if the node meets the installation prerequisites.

    Prior to installing on the Oracle Database system, you can run bds-validate-grid-patches.sh to ensure that Oracle Grid includes the patches required by the Oracle Big Data SQL release.

  • The script bds_cluster_node_helper.sh, which you can run on each Hadoop node, provides status on the Oracle Big Data SQL installation on the node and also collects log data and other information useful for maintenance. There are three options for the scope of the log data collection.