Preface

The Oracle Big Data SQL User's Guide describes how to use and manage the Oracle Big Data SQL product.

Audience

This guide is intended for administrators and users of Oracle Big Data SQL, including:

Application developers
Data analysts
Data scientists
Database administrators
System administrators

The guide assumes that the reader has sufficient background knowledge about the database server and the particular Hadoop platform which will host the software in order to follow the instructions successfully.

Conventions

The following text conventions are used in this document:

Convention	Meaning
boldface	Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic	Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
`monospace`	Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.
`#` prompt	The pound (`#`) prompt indicates a command that is run as the Linux `root` user.

Backus-Naur Form Syntax

The syntax in this reference is presented in a simple variation of Backus-Naur Form (BNF) that uses the following symbols and conventions:

Symbol or Convention	Description
[ ]	Brackets enclose optional items.
{ }	Braces enclose a choice of items, only one of which is required.
\|	A vertical bar separates alternatives within brackets or braces.
...	Ellipses indicate that the preceding syntactic element can be repeated.
delimiters	Delimiters other than brackets, braces, and vertical bars must be entered as shown.
boldface	Words appearing in boldface are keywords. They must be typed as shown. (Keywords are case-sensitive in some, but not all, operating systems.) Words that are not in boldface are placeholders for which you must substitute a name or value.

Using This Guide With Oracle Big Data SQL Cloud Service

This guide describes how to work with both on-premises installations of Oracle Big Data SQL and the Oracle Big Data SQL Cloud Service.

Some of the functionality described in the guide is not currently available in Oracle Big Data Cloud Service. Topics that are not applicable to the cloud service are marked with this badge:

This topic does not apply to Oracle Big Data SQL Cloud Service.

Assume that a badge applies to the current topic as well as all of its subtopics.

Where only one paragraph or a single statement within a topic does not apply to the cloud service, you will see this note in parentheses: (Not available in Oracle Big Data SQL Cloud Service.)

Changes in This Release

This topic does not apply to Oracle Big Data SQL Cloud Service.

Ethernet Option for Connections to the Exadata Database Machine

The following are changes in Oracle Big Data SQL Release 3.1.

The preferred method of connecting Oracle Big Data Appliance and Oracle Exadata Database for any purpose is through InfiniBand. Previous releases of Oracle Big Data SQL have required InfiniBand for these connections. In Release 3.1, Ethernet networking between the Exadata Database Machine and Oracle Big Data Appliance is now supported. This enables you to use Oracle Big Data SQL with these two Engineered Systems in environments where InfiniBand is not feasible, such as when the two systems are geographically distant from each other.

Release 3.1 also enables Ethernet connections between commodity Hadoop systems and the Oracle Exadata Database Machine.

Oracle Big Data SQL connectivity between commodity Hadoop systems and commodity Oracle Database servers has been Ethernet-based throughout previous releases.

Note that Ethernet connections between Oracle Big Data Appliance (or commodity Hadoop systems) and Oracle SPARC SuperCluster are not supported at this time.

New Features to Simplify ILM – Oracle Database Tablespaces in HDFS (With Smart Scan Technology)

Oracle Database ILM (Information Lifecycle Management) can now be extended to use Hadoop to store read-only Oracle Database tablespaces. When you move tablespaces from Oracle Database to HDFS, the tables, partitions, and data retain their original Oracle Database internal format, remain accessible to queries, and support the full range of Oracle Database performance optimizations and security features, including the following:

Smart Scan for HDFS, which enables off-load of query processing to Oracle Big Data SQL on the Hadoop cluster. Smart Scan also provides filtering of query results in Hadoop prior to the return of the data to Oracle Database. In most circumstances, this can be a significant performance optimization. Indexing, Hybrid Columnar Compression, Partition Pruning, and Oracle Database In-Memory are also supported.
Oracle Advanced Security Option (ASO) Transparent Encryption and Data Redaction.

Tablespaces stored in HDFS are read-only, therefore this storage is best suited to data archiving.

See Section 3.2 in the Oracle Big Data SQL User’s Guide for details.

Enhancements in Oracle Shell for Hadoop Loaders 1.2

Oracle Shell for Hadoop Loaders (OHSH) is an intuitive command line tool for data migration. You can set up resources to connect to Hive, HDFS, or Oracle Database, and then access each of these data sources through the uniform OHSH interface. OHSH is one of the ways to use Copy to Hadoop. Copy to Hadoop users can download OHSH from OTN.

OHSH 1.2 includes the following changes:

On-disk logging of load operations in the $HOME/.ohsh shadow directory.
The ability to minimize output when doing load commands. (See the help command for set outputlevel.)
Loading Hive tables from Oracle tables not living in the oracle user's schema.
Wallet and TNS usage by OHSH relies on the setting of environmental variables WALLET_LOCATION and TNS_ADMIN. The set tnsadmin and set walletlocation commands are no longer supported.

In addition, you no longer set HIVE0_URL to the fully-qualified URL of remote HiveServer2 in order to create a %hive0 resource. In OHSH 1.2, set the environmental variable HS2_HOST_PORT in bin/ohsh, which is the <hostname>:<port> pair of HiveServer2.

Enhancements to Copy To Hadoop

The new method directcopy is added to Copy to Hadoop.

This is a direct, single-step method of copying data from Oracle Database to HDFS. See Using Copy to Hadoop to do Direct Copies in the Oracle Big Data SQL User’s Guide for more information.

Granting Access – Users Now Require the BDSQL_USER Role

In prior Oracle Big Data SQL releases, all users are granted Big Data SQL access implicitly. Release 3.1 introduces the BDSQL_USER role. Users requiring Oracle Big Data SQL access must be granted this role explicitly.

You must also now grant read privileges on the BigDataSQL configuration directory object.

For example, to grant access to user1:

SQL> grant BDSQL_USER to user1; 
SQL> grant read on directory ORACLE_BIGDATA_CONFIG to user1;

Installation and Configuration Changes

This topic does not apply to Oracle Big Data SQL Cloud Service.

Expanded Deployments

Release 3.1 broadens support for Oracle Big Data SQL connectivity between Oracle Engineered Systems and commodity servers.

In earlier Oracle Big Data SQL releases, the following Oracle Database/Hadoop connections are possible:

Oracle Exadata Database Machine to Oracle Big Data Appliance.
Oracle Database on commodity servers to commodity Hadoop systems.

As of Release 3.1, Oracle Big Data SQL supports all of the following Oracle Database/Hadoop system connections:

Oracle Database on commodity servers to Oracle Big Data Appliance.
Oracle Database on commodity servers to commodity Hadoop systems.
Oracle Exadata Database Machine to Oracle Big Data Appliance.
Oracle Exadata Database Machine to commodity Hadoop systems.

The phrase “Oracle Database on commodity servers” refers to non-Exadata Linux systems that are officially-supported as Oracle Database platforms. “Commodity Hadoop systems” refers to Hortonworks HDP systems or to Cloudera CDH-based systems other than Oracle Big Data Appliance. In all cases, Oracle Database servers and Hadoop systems must meet the prerequisites identified in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support).

Oracle SPARC SuperCluster Support

Release 3.1 provides support for Oracle SPARC SuperCluster, with certain limitations:

Ethernet connections between BDA (and commodity Hadoop systems) and SPARC SuperCluster are not supported.
Oracle Database Tablepaces in HDFS (with Smart Scan technology) is not supported on this platform.

Simplified Deployment on Oracle Database Servers – Oracle Grid Infrastructure is now Optional

In previous releases of Oracle Big Data SQL, Oracle Grid Infrastructure is a prerequisite of the installation for all Oracle Database servers, including standalone servers that are not part of an Oracle RAC system. In Release 3.1, you have the option to install Oracle Big Data SQL on servers where Oracle Grid Infrastructure is not present. Note that in these cases, the installer makes some configuration file changes that require a restart of Oracle Database.

Unified Platform Support in the Oracle Big Data SQL Installer

Previous Oracle Big Data SQL releases included two separate installation procedures – one for Oracle Engineered Systems and another for commodity servers. In Release 3.1, you use the same installation process for both Oracle and non-Oracle platforms. This is also true for maintenance. For all of the supported Hadoop/Oracle Database combinations there is a uniform set of steps to update the Oracle Big Data SQL configuration when there are changes to the Hadoop cluster or Oracle Database server.

Installation Instructions Moved to Oracle Big Data SQL Installation Guide

The Oracle Big Data SQL Installation Guide provides instructions how to install and uninstall the software. In releases prior to Oracle Big Data Appliance 3.1, installation instructions are in the user’s guide.