Preface

The Oracle Big Data SQL User's Guide describes how to install and manage the Oracle Big Data SQL product.

Audience

This guide is intended for installers and users of Oracle Big Data SQL. including:

  • Application developers

  • Data analysts

  • Data scientists

  • Database administrators

  • System administrators

The guide assumes that the reader has sufficient background knowledge about the database server and the particular Hadoop platform which will host the software in order to follow the instructions successfully.

Related Documents

See the Oracle Big Data SQL User’s Guide for instructions on using the product.

The following publications provide information about the use of Oracle Big Data SQL with the Oracle Big Data Appliance and Oracle Big Data Connectors:

You can find more information about Oracle’s Big Data solutions and Oracle Database at the Oracle Help Center

For more information on Hortonworks HDP and Ambari, refer to the Hortonworks documentation site at http://docs.hortonworks.com/index.html.

For more information on Cloudera CDH and Configuration Manager, see http://www.cloudera.com/documentation.html

Conventions

The following text conventions are used in this document:

Convention Meaning

boldface

Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.

italic

Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.

monospace

Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.

# prompt

The pound (#) prompt indicates a command that is run as the Linux root user.

Backus-Naur Form Syntax

The syntax in this reference is presented in a simple variation of Backus-Naur Form (BNF) that uses the following symbols and conventions:

Symbol or Convention Description

[ ]

Brackets enclose optional items.

{ }

Braces enclose a choice of items, only one of which is required.

|

A vertical bar separates alternatives within brackets or braces.

...

Ellipses indicate that the preceding syntactic element can be repeated.

delimiters

Delimiters other than brackets, braces, and vertical bars must be entered as shown.

boldface

Words appearing in boldface are keywords. They must be typed as shown. (Keywords are case-sensitive in some, but not all, operating systems.) Words that are not in boldface are placeholders for which you must substitute a name or value.

Changes in Oracle Big Data SQL 3.1

The following are changes in Oracle Big Data SQL Release 3.1.

Expanded Deployments

Release 3.1 broadens support for Oracle Big Data SQL connectivity between Oracle Engineered Systems and commodity servers.

In earlier Oracle Big Data SQL releases, the following Oracle Database/Hadoop connections are possible:

  • Oracle Exadata Database Machine to Oracle Big Data Appliance.

  • Oracle Database on commodity servers to commodity Hadoop systems.

As of Release 3.1, Oracle Big Data SQL supports all of the following Oracle Database/Hadoop system connections:

  • Oracle Database on commodity servers to Oracle Big Data Appliance.

  • Oracle Database on commodity servers to commodity Hadoop systems.

  • Oracle Exadata Database Machine to Oracle Big Data Appliance.

  • Oracle Exadata Database Machine to commodity Hadoop systems.

The phrase “Oracle Database on commodity servers” refers to non-Exadata Linux systems that are officially-supported as Oracle Database platforms. “Commodity Hadoop systems” refers to Hortonworks HDP systems or to Cloudera CDH-based systems other than Oracle Big Data Appliance. In all cases, Oracle Database servers and Hadoop systems must meet the prerequisites identified in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support).

Oracle SPARC SuperCluster Support

Release 3.1 provides support for Oracle SPARC SuperCluster, with certain limitations:

  • Ethernet connections between BDA (and commodity Hadoop systems) and SPARC SuperCluster are not supported.

  • Oracle Database Tablepaces in HDFS (with Smart Scan technology) is not supported on this platform.

Ethernet Option for Connections to the Exadata Database Machine

The preferred method of connecting Oracle Big Data Appliance and Oracle Exadata Database for any purpose is through InfiniBand. Previous releases of Oracle Big Data SQL have required InfiniBand for these connections. In Release 3.1, Ethernet networking between the Exadata Database Machine and Oracle Big Data Appliance is now supported. This enables you to use Oracle Big Data SQL with these two Engineered Systems in environments where InfiniBand is not feasible, such as when the two systems are geographically distant from each other.

Release 3.1 also enables Ethernet connections between commodity Hadoop systems and the Oracle Exadata Database Machine.

Oracle Big Data SQL connectivity between commodity Hadoop systems and commodity Oracle Database servers has been Ethernet-based throughout previous releases.

Note that Ethernet connections between Oracle Big Data Appliance (or commodity Hadoop systems) and Oracle SPARC SuperCluster are not supported at this time.

Simplified Deployment on Oracle Database Servers – Oracle Grid Infrastructure is now Optional

In previous releases of Oracle Big Data SQL, Oracle Grid Infrastructure is a prerequisite of the installation for all Oracle Database servers, including standalone servers that are not part of an Oracle RAC system. In Release 3.1, you have the option to install Oracle Big Data SQL on servers where Oracle Grid Infrastructure is not present. Note that in these cases, the installer makes some configuration file changes that require a restart of Oracle Database.

Unified Platform Support in the Oracle Big Data SQL Installer

Previous Oracle Big Data SQL releases included two separate installation procedures – one for Oracle Engineered Systems and another for commodity servers. In Release 3.1, you use the same installation process for both Oracle and non-Oracle platforms. This is also true for maintenance. For all of the supported Hadoop/Oracle Database combinations there is a uniform set of steps to update the Oracle Big Data SQL configuration when there are changes to the Hadoop cluster or Oracle Database server.

New Features to Simplify ILM – Oracle Database Tablespaces in HDFS (With Smart Scan Technology)

Oracle Database ILM (Information Lifecycle Management) can now be extended to use Hadoop to store read-only Oracle Database tablespaces. When you move tablespaces from Oracle Database to HDFS, the tables, partitions, and data retain their original Oracle Database internal format, remain accessible to queries, and support the full range of Oracle Database performance optimizations and security features, including the following:

  • Smart Scan for HDFS, which enables off-load of query processing to Oracle Big Data SQL on the Hadoop cluster. Smart Scan also provides filtering of query results in Hadoop prior to the return of the data to Oracle Database. In most circumstances, this can be a significant performance optimization. Indexing, Hybrid Columnar Compression, Partition Pruning, and Oracle Database In-Memory are also supported.

  • Oracle Advanced Security Option (ASO) Transparent Encryption and Data Redaction.

Tablespaces stored in HDFS are read-only, therefore this storage is best suited to data archiving.

See Section 3.2 in the Oracle Big Data SQL User’s Guide for details.

Enhancements in Oracle Shell for Hadoop Loaders 1.2

Oracle Shell for Hadoop Loaders (OHSH) is an intuitive command line tool for data migration. You can set up resources to connect to Hive, HDFS, or Oracle Database, and then access each of these data sources through the uniform OHSH interface. OHSH is one of the ways to use Copy to Hadoop. Copy to Hadoop users can download OHSH from OTN.

OHSH 1.2 includes the following changes:

  • On-disk logging of load operations in the $HOME/.ohsh shadow directory.

  • The ability to minimize output when doing load commands. (See the help command for set outputlevel.)

  • Loading Hive tables from Oracle tables not living in the oracle user's schema.

  • Wallet and TNS usage by OHSH relies on the setting of environmental variables WALLET_LOCATION and TNS_ADMIN. The set tnsadmin and set walletlocation commands are no longer supported.

In addition, you no longer set HIVE0_URL to the fully-qualified URL of remote HiveServer2 in order to create a %hive0 resource. In OHSH 1.2, set the environmental variable HS2_HOST_PORT in bin/ohsh, which is the <hostname>:<port> pair of HiveServer2.

Enhancements to Copy To Hadoop

The new method directcopy is added to Copy to Hadoop.

This is a direct, single-step method of copying data from Oracle Database to HDFS. See Using Copy to Hadoop to do Direct Copies in the Oracle Big Data SQL User’s Guide for more information.

Granting Access – Users Now Require the BDSQL_USER Role

In prior Oracle Big Data SQL releases, all users are granted Big Data SQL access implicitly. Release 3.1 introduces the BDSQL_USER role. Users requiring Oracle Big Data SQL access must be granted this role explicitly.

You must also now grant read privileges on the BigDataSQL configuration directory object.

For example, to grant access to user1:

SQL> grant BDSQL_USER to user1; 
SQL> grant read on directory ORACLE_BIGDATA_CONFIG to user1;

Installation Instructions Moved to Oracle Big Data SQL Installation Guide

The Oracle Big Data SQL Installation Guide provides instructions how to install and uninstall the software. In releases prior to Oracle Big Data Appliance 3.1, installation instructions are in the user’s guide.