Changes in Oracle Big Data SQL 3.1

Expanded Deployments

Release 3.1 broadens support for Oracle Big Data SQL connectivity between Oracle Engineered Systems and commodity servers.

In earlier Oracle Big Data SQL releases, the following Oracle Database/Hadoop connections were possible:

Oracle Exadata Database Machine to Oracle Big Data Appliance.
Oracle Database on commodity servers to commodity Hadoop systems.

As of Release 3.1, Oracle Big Data SQL supports all of the following Oracle Database/Hadoop system connections:

Oracle Database on commodity servers to Oracle Big Data Appliance.
Oracle Database on commodity servers to commodity Hadoop systems.
Oracle Exadata Database Machine to Oracle Big Data Appliance.
Oracle Exadata Database Machine to commodity Hadoop systems.

The phrase “Oracle Database on commodity servers” refers to non-Exadata Linux systems that are officially supported as Oracle Database platforms. “Commodity Hadoop systems” refers to Hortonworks HDP systems or to Cloudera CDH-based systems other than Oracle Big Data Appliance. In all cases, Oracle Database servers and Hadoop systems must meet the prerequisites identified in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support).

Oracle SPARC SuperCluster Support

Release 3.1 provides support for Oracle SPARC SuperCluster, with certain limitations:

Ethernet connections between Oracle Big Data Appliance (or commodity Hadoop systems) and SPARC SuperCluster are not supported.
Oracle Database Tablespaces in HDFS (with Smart Scan technology) is not supported on this platform.

Ethernet Option for Connections to the Exadata Database Machine

The preferred method of connecting Oracle Big Data Appliance and Oracle Exadata Database Machine for any purpose is through InfiniBand. Previous releases of Oracle Big Data SQL required InfiniBand for these connections. In Release 3.1, Ethernet networking between the Exadata Database Machine and Oracle Big Data Appliance is also supported. This enables you to use Oracle Big Data SQL with these two Engineered Systems in environments where InfiniBand is not feasible, such as when the two systems are geographically distant from each other.

Release 3.1 also enables Ethernet connections between commodity Hadoop systems and the Oracle Exadata Database Machine.

Oracle Big Data SQL connectivity between commodity Hadoop systems and commodity Oracle Database servers has been Ethernet-based throughout previous releases.

Note that Ethernet connections between Oracle Big Data Appliance (or commodity Hadoop systems) and Oracle SPARC SuperCluster are not supported at this time.

Simplified Deployment on Oracle Database Servers – Oracle Grid Infrastructure is now Optional

In previous releases of Oracle Big Data SQL, Oracle Grid Infrastructure was a prerequisite of the installation for all Oracle Database servers, including standalone servers that are not part of an Oracle RAC system. In Release 3.1, you have the option to install Oracle Big Data SQL on servers where Oracle Grid Infrastructure is not present. Note that in these cases, the installer makes some configuration file changes that require a restart of Oracle Database.

Unified Platform Support in the Oracle Big Data SQL Installer

Previous Oracle Big Data SQL releases included two separate installation procedures – one for Oracle Engineered Systems and another for commodity servers. In Release 3.1, you use the same installation process for both Oracle and non-Oracle platforms. This is also true for maintenance. For all of the supported Hadoop/Oracle Database combinations, there is a uniform set of steps to update the Oracle Big Data SQL configuration when the Hadoop cluster or the Oracle Database server changes.

New Features to Simplify ILM – Oracle Database Tablespaces in HDFS (With Smart Scan Technology)

Oracle Database ILM (Information Lifecycle Management) can now be extended to use Hadoop to store read-only Oracle Database tablespaces. When you move tablespaces from Oracle Database to HDFS, the tables, partitions, and data retain their original Oracle Database internal format, remain accessible to queries, and support the full range of Oracle Database performance optimizations and security features, including the following:

Smart Scan for HDFS, which enables off-load of query processing to Oracle Big Data SQL on the Hadoop cluster. Smart Scan also provides filtering of query results in Hadoop prior to the return of the data to Oracle Database. In most circumstances, this can be a significant performance optimization. Indexing, Hybrid Columnar Compression, Partition Pruning, and Oracle Database In-Memory are also supported.
Oracle Advanced Security Option (ASO) Transparent Encryption and Data Redaction.

Tablespaces stored in HDFS are read-only; therefore, this storage is best suited to data archiving.

See Section 3.2 in the Oracle Big Data SQL User’s Guide for details.

Enhancements in Oracle Shell for Hadoop Loaders 1.2

Oracle Shell for Hadoop Loaders (OHSH) is an intuitive command line tool for data migration. You can set up resources to connect to Hive, HDFS, or Oracle Database, and then access each of these data sources through the uniform OHSH interface. OHSH is one of the ways to use Copy to Hadoop. Copy to Hadoop users can download OHSH from OTN.

OHSH 1.2 includes the following changes:

On-disk logging of load operations in the $HOME/.ohsh shadow directory.
The ability to minimize output when doing load commands. (See the help command for set outputlevel.)
Loading Hive tables from Oracle tables that are not in the oracle user's schema.
Wallet and TNS usage by OHSH relies on the setting of the environment variables WALLET_LOCATION and TNS_ADMIN. The set tnsadmin and set walletlocation commands are no longer supported.

In addition, you no longer set HIVE0_URL to the fully qualified URL of the remote HiveServer2 in order to create a %hive0 resource. In OHSH 1.2, set the environment variable HS2_HOST_PORT in bin/ohsh to the <hostname>:<port> pair of HiveServer2.
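As a rough sketch of the environment setup described above, the following shell fragment exports the two variables that OHSH 1.2 reads for wallet and TNS configuration and shows the <hostname>:<port> form used for HS2_HOST_PORT. The directory paths and the host name are placeholders invented for illustration, and port 10000 is only the common HiveServer2 default, assumed here. As noted above, HS2_HOST_PORT itself is set inside bin/ohsh rather than in a login shell.

```shell
#!/bin/sh
# Hypothetical locations; substitute the directories used at your site.
export WALLET_LOCATION="/home/oracle/ohsh_wallet"   # assumed wallet directory
export TNS_ADMIN="/home/oracle/ohsh_tns"            # assumed directory holding tnsnames.ora

# In bin/ohsh, HS2_HOST_PORT takes the form <hostname>:<port>.
# The host name is a placeholder; 10000 is the common HiveServer2 default port.
HS2_HOST_PORT="hiveserver2.example.com:10000"
export HS2_HOST_PORT

# Split the pair to confirm that it has the expected shape.
hs2_host="${HS2_HOST_PORT%:*}"    # everything before the last colon
hs2_port="${HS2_HOST_PORT##*:}"   # everything after the last colon
echo "HiveServer2 host: $hs2_host, port: $hs2_port"
```

Splitting the value is only a sanity check for the sketch; OHSH does not require it.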

Enhancements to Copy To Hadoop

A new method, directcopy, has been added to Copy to Hadoop.

This is a direct, single-step method of copying data from Oracle Database to HDFS. See Using Copy to Hadoop to do Direct Copies in the Oracle Big Data SQL User’s Guide for more information.

Granting Access – Users Now Require the BDSQL_USER Role

In prior Oracle Big Data SQL releases, all users were granted Big Data SQL access implicitly. Release 3.1 introduces the BDSQL_USER role. Users requiring Oracle Big Data SQL access must be granted this role explicitly.

You must also now grant read privileges on the BigDataSQL configuration directory object.

For example, to grant access to user1:

SQL> grant BDSQL_USER to user1;
SQL> grant read on directory ORACLE_BIGDATA_CONFIG to user1;
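When several users need access, one possible approach is to generate the same pair of grants for each of them. The shell sketch below writes the statements to a script file; the user names and the output file name are hypothetical, and the two statements are the ones shown in the example above.

```shell
#!/bin/sh
# Hypothetical user names; replace with the database users at your site.
USERS="user1 user2 user3"

# Assumed output file name for the generated SQL script.
OUT="${TMPDIR:-/tmp}/grant_bdsql_access.sql"
: > "$OUT"   # create or truncate the file

# Emit both required grants for each user: the role and the directory read privilege.
for u in $USERS; do
  printf 'grant BDSQL_USER to %s;\n' "$u" >> "$OUT"
  printf 'grant read on directory ORACLE_BIGDATA_CONFIG to %s;\n' "$u" >> "$OUT"
done

cat "$OUT"
```

A suitably privileged user could then review the generated file and run it from SQL*Plus with the @ command.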

Installation Instructions Moved to Oracle Big Data SQL Installation Guide

The Oracle Big Data SQL Installation Guide provides instructions on how to install and uninstall the software. In releases prior to Oracle Big Data SQL 3.1, installation instructions were in the user’s guide.