1 Introduction

Oracle Big Data SQL 3.1 can connect Oracle Database to the Hadoop environment on Oracle Big Data Appliance, or on other Hadoop systems based on CDH (Cloudera's Distribution including Apache Hadoop) or HDP (Hortonworks Data Platform).

In previous releases of Big Data SQL, the installations on different combinations of Hadoop server and Oracle Database server included some differences. In the current release, there is a common installation process for all supported Hadoop systems and Oracle Database systems.

Note:

Oracle Big Data SQL 3.0.1 is the prior release and is included in the Oracle Big Data Appliance 4.7 installation bundle. Oracle Big Data SQL 3.0.1 can be enabled either during the Oracle Big Data Appliance 4.3 (or higher) Mammoth installation or after installation. If you choose to install Oracle Big Data SQL 3.0.1, do not use the instructions in this version of the guide. Instead, refer to Installing On Oracle Big Data Appliance and the Oracle Exadata Database Machine in the Oracle Big Data SQL User’s Guide for Release 3.0.1.

1.1 Oracle Big Data SQL Master Compatibility Matrix

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for up-to-date information on Big Data SQL compatibility with the following:

  • Oracle Engineered Systems.

  • Other systems.

  • Linux OS distributions and versions.

  • Hadoop distributions.

  • Oracle Database releases, including required patches.

1.2 Installation Overview

The Oracle Big Data SQL software must be installed on all Hadoop cluster nodes and all Oracle Database compute nodes.

Important: About Service Restarts

On the Hadoop-side installation, the following restarts may occur.

  • Cloudera Configuration Manager (or Ambari) may be restarted. This in itself does not interrupt any services.

  • Hive, YARN, and any other services that have a dependency on Hive or YARN (such as Impala) are restarted.

    The Hive libraries parameter is updated to include the Oracle Big Data SQL JARs. On Cloudera installations, if the YARN Resource Manager is enabled, then it is restarted to set the cgroup memory limit for Oracle Big Data SQL and the other Hadoop services. On Oracle Big Data Appliance, the YARN Resource Manager is always enabled and therefore always restarted.

    Note:

    The Oracle Big Data SQL installation script includes this message in its output:
    BigDataSQL: Restarting cluster...
    

    The restart is limited to “stale” components of the cluster, which can include Hive, YARN, and their dependent services.

On the Oracle Database server(s), the installation may require a database and/or Oracle Grid infrastructure restart in environments where updates are required to cell settings on the Grid nodes. See Potential Requirement to Restart Grid Infrastructure for details.

How Long Does It Take?

The installation on the Hadoop cluster may take approximately 30 minutes. Generating the installation bundle for the database side requires a download of the Hadoop and Hive clients and other software. This may take 20 additional minutes, depending on Internet download speed. The average installation time for the database side can be estimated as follows:

  • 15 minutes for a single node database if a restart is not required. If a restart is required, the time will vary, depending on the size of the database.

  • On a RAC database, multiply the factors above by the number of nodes.

  • If an Oracle Grid restart is required, factor that in as well.
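As a rough planning aid, the figures above can be combined into a single estimate. The sketch below is only an illustration: the function name and parameters are hypothetical, it assumes the three phases run one after another, and it leaves out database and Oracle Grid restart time because this guide gives no figure for either.

```python
def estimate_install_minutes(db_nodes=1, hadoop_minutes=30,
                             bundle_minutes=20, per_node_minutes=15):
    """Rough total in minutes, excluding any database or Grid restart time.

    The defaults are the approximate figures quoted in this section;
    adding the phases together is an assumption of this sketch.
    """
    if db_nodes < 1:
        raise ValueError("db_nodes must be at least 1")
    # On a RAC database, the per-node figure is multiplied by the node count.
    database_minutes = per_node_minutes * db_nodes
    return hadoop_minutes + bundle_minutes + database_minutes


print(estimate_install_minutes(db_nodes=1))  # 65
print(estimate_install_minutes(db_nodes=4))  # 110
```

Treat the result as a lower bound whenever a database or Grid restart is expected.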

Outline of the Installation Steps

This is the sequence of tasks for installing Oracle Big Data SQL.

  1. Before you start:

    • Check the Oracle Big Data SQL Master Compatibility Matrix (Document 2119369.1 in My Oracle Support) for general platform compatibility.

    • Check Installation Prerequisites in this guide for required software.

    • Login credentials that you will need:

      • root credentials for both the Hadoop cluster and all Oracle Grid nodes.

        On the grid nodes you have the option of using passwordless SSH with the root user instead.

      • oracle Linux user (or other, if the database owner is not oracle)

      • The Oracle Grid user (if this is not the same as the database owner).

      • The Hadoop configuration management service (CM or Ambari) admin password.

  2. On the cluster management server (where CM or Ambari is running), download the software from Oracle. (See Downloading Oracle Big Data SQL.)

  3. On the cluster management server, check to see if Python 2.7 is installed, as described in Installation Prerequisites.

  4. Perform the Hadoop-side installation described in Installing the Hadoop Side of Oracle Big Data SQL.

    The Hadoop-side phase of the installation does the following:

    • Deploys Oracle Big Data SQL binaries across the cluster.

    • Configures Linux and network settings for the service on each cluster node.

    • Configures the BDS service on the cluster management server.

    • Acquires information needed to configure Oracle Database connections to the cluster.

  5. On the Hadoop cluster management server, run the scripts to create the bundle that installs Oracle Big Data SQL on the Oracle Database side. (Described in Creating the Database-Side Installation Bundle.)

  6. Perform the Oracle Database server-side installation. (See Installing the Oracle Database Side of Oracle Big Data SQL.)

    In this phase of the installation, you deploy the database-side installation bundle that you generated to the database nodes. Extract it from the zip file and run it.

    The database-side installer does the following:

    • Copies the Oracle Big Data SQL binaries to the database node.

    • Configures network settings for the service.

    • Inserts cluster metadata into Oracle Database.

    Important:

    Be sure to install the bundle on each compute node. The Hadoop-side installation automatically propagates the Oracle Big Data SQL software to each DataNode of the Hadoop cluster. However, the database-side installation does not work this way. You must copy the software to each database compute node and install it directly.

    In Oracle Grid environments, if cell settings need to be updated, the installer alerts you that a grid restart is needed. (Be sure to have the grid password, because if a grid restart is required, the installation cannot complete without these credentials.)

This completes the basic installation.
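Step 3 of the outline calls for Python 2.7 on the cluster management server. Installation Prerequisites describes the supported way to check this. The sketch below is only one possible illustration: the function names are hypothetical, and it assumes the interpreter is reachable on the PATH as python, which may not hold on your system. It merges standard error into standard output because Python 2.7 prints its version string to standard error.

```python
import re
import subprocess


def parse_python_version(text):
    """Extract (major, minor) from text such as 'Python 2.7.5'; None if absent."""
    match = re.search(r"Python (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def interpreter_version(command="python"):
    """Run '<command> --version' and parse the result; None if the command is missing.

    The command name 'python' is an assumption of this sketch.
    """
    try:
        proc = subprocess.Popen([command, "--version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError:
        return None
    output = proc.communicate()[0].decode("ascii", "replace")
    return parse_python_version(output)
```

For example, comparing interpreter_version() with (2, 7) indicates whether the default interpreter is a 2.7 release. The sketch itself runs under either Python 2.7 or Python 3.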

Tip:

See How to do a Quick Test in the Oracle Big Data SQL User’s Guide for some simple functionality tests.

Installation Log Files

On the Hadoop cluster side:

/var/log/bigdatasql 
/var/log/oracle

On the Oracle Database side:

$ORACLE_HOME/install/bds* (This is a set of files, not a directory) 
$ORACLE_HOME/bigdatasql/logs 
/var/log/bigdatasql

Tip:

If you make a support request, create a zip archive that includes all of these logs and include it in your email to Oracle Support.
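Gathering the logs can be scripted. The following is a minimal sketch, not an Oracle-provided tool: the function name and archive paths are hypothetical, and it assumes that ORACLE_HOME is set in the environment when run on the database side. Locations that do not exist on a given host are skipped.

```python
import glob
import os
import zipfile

# Log locations listed in this section.
HADOOP_SIDE = ["/var/log/bigdatasql", "/var/log/oracle"]
DATABASE_SIDE = [
    "$ORACLE_HOME/install/bds*",     # a set of files, not a directory
    "$ORACLE_HOME/bigdatasql/logs",
    "/var/log/bigdatasql",
]


def collect_logs(patterns, archive_path):
    """Zip every file matched by the patterns; return the paths that were added."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for pattern in patterns:
            # Expand $ORACLE_HOME, then resolve wildcards such as bds*.
            for match in sorted(glob.glob(os.path.expandvars(pattern))):
                if os.path.isdir(match):
                    # Include everything below a log directory.
                    for root, _dirs, files in os.walk(match):
                        for name in sorted(files):
                            path = os.path.join(root, name)
                            archive.write(path)
                            added.append(path)
                elif os.path.isfile(match):
                    archive.write(match)
                    added.append(match)
    return added
```

On a database node this could be called as collect_logs(DATABASE_SIDE, "/tmp/bds-db-logs.zip"), and on the cluster management server with HADOOP_SIDE. Write the archive to a directory outside the log locations so that it does not try to include itself.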

1.3 Downloading Oracle Big Data SQL

You can download Oracle Big Data SQL from the Oracle Software Delivery Cloud.

  1. On the cluster management server, create a new directory or choose an existing one to be the installation source directory.
  2. Log in to the Oracle Software Delivery Cloud.
  3. Search for Oracle Big Data SQL.
  4. Select Oracle Big Data SQL 3.1.0.0.0 for Linux x86-64.
  5. Read and agree to the Oracle Standard Terms and Restrictions.
  6. From the list, select the zip file that is appropriate for your Hadoop system:
    • Oracle Big Data SQL 3.1.0 installer for Hortonworks Data Platform

    • Oracle Big Data SQL 3.1.0 installer for Cloudera Enterprise

    Each zip file contains the complete installation package.

  7. Download the appropriate package for your Hadoop system. Currently packages are provided for Hortonworks HDP and for Cloudera Enterprise systems (including CDH-based Oracle Big Data Appliance and other CDH systems identified in the Oracle Big Data SQL Master Compatibility Matrix).
    If your Hadoop system is a supported version of Oracle Big Data Appliance, download the installer for Cloudera Enterprise.
Your product bundle should include the significant files listed in the table below, as well as other supporting files.

Table 1-1 Oracle Big Data SQL Product Bundle Inventory

File or Directory                           Description
BDSSetup/setup-bds                          Cluster-side installation script
BDSSetup/bds-config.json                    Configuration template file
BDSSetup/deployment_manager/*               Deployment manager package
BDSSetup/db/hadoop-*-nativelib-*.tar.gz     Hadoop native libraries
BDSSetup/db/bds-database-create-bundle.sh   Script to create the database-side installation bundle
BDSSetup/db/database-install.zip            Database-side pre-bundle
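
After extracting the zip file, you may want to confirm that the entries in Table 1-1 are present before running the installer. The sketch below is one way to do that. The function name is hypothetical, and the extraction directory is whatever installation source directory you chose. The wildcards in the table are treated as glob patterns.

```python
import glob
import os

# Entries from Table 1-1, relative to the directory where the zip was extracted.
EXPECTED = [
    "BDSSetup/setup-bds",
    "BDSSetup/bds-config.json",
    "BDSSetup/deployment_manager/*",
    "BDSSetup/db/hadoop-*-nativelib-*.tar.gz",
    "BDSSetup/db/bds-database-create-bundle.sh",
    "BDSSetup/db/database-install.zip",
]


def missing_entries(extract_dir, expected=EXPECTED):
    """Return the expected patterns that match nothing under extract_dir."""
    return [pattern for pattern in expected
            if not glob.glob(os.path.join(extract_dir, pattern))]
```

An empty list means every listed file or directory was found. Any pattern that is returned points to an incomplete download or extraction.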

1.4 Upgrading From a Prior Release of Oracle Big Data SQL

As a prerequisite to upgrading to Release 3.1, you must remove earlier installations.

  1. First, completely uninstall the earlier version of Oracle Big Data SQL from the Hadoop system as well as the Oracle Database system.

    See Chapter 4 in this guide for instructions.

    The methods for uninstalling will differ, depending on whether the installation is on Oracle Engineered Systems or commodity servers.

  2. Then, proceed with the Release 3.1 installation on the Hadoop system and the Oracle Database system as described in Chapter 2 and Chapter 3 of this guide.