1 Introduction

This guide describes how to install Oracle Big Data SQL, how to reconfigure or extend the installation to accommodate changes in the environment, and, if necessary, how to uninstall the software.

This installation is done in phases. The first two phases are:

  • Installation on the node of the Hadoop cluster where the cluster management server is running.

  • Installation on each node of the Oracle Database system.

If you choose to enable the optional security features, then there is an additional third phase in which you activate them.

The two systems must be networked together via Ethernet or InfiniBand. (Connectivity to Oracle SuperCluster is InfiniBand only).

Note:

For Ethernet connections between Oracle Database and the Hadoop cluster, Oracle recommends 10 Gb/s Ethernet.

The installation process starts on the Hadoop system, where you install the software manually on one node only (the node running the cluster management software). Oracle Big Data SQL leverages the administration facilities of the cluster management software to automatically propagate the installation to all DataNodes in the cluster.

The package that you install on the Hadoop side also generates an Oracle Big Data SQL installation package for your Oracle Database system. After the Hadoop-side installation is complete, copy this package to all nodes of the Oracle Database system, unpack it, and install it using the instructions in this guide. If you have enabled Database Authentication or Hadoop Secure Impersonation, you then perform the third installation step.

1.1 Supported System Combinations

Oracle Big Data SQL supports connectivity between a number of Oracle Engineered Systems and commodity servers.

The current release supports Oracle Big Data SQL connectivity for the following Oracle Database platforms/Hadoop system combinations:

  • Oracle Database on commodity servers with Oracle Big Data Appliance.

  • Oracle Database on commodity servers with commodity Hadoop systems.

  • Oracle Exadata Database Machine with Oracle Big Data Appliance.

  • Oracle Exadata Database Machine with commodity Hadoop systems.

Note:

The phrase “Oracle Database on commodity systems” refers to Oracle Database hosts that are not the Oracle Exadata Database Machine. Commodity database systems may be either Oracle Linux or RHEL-based. “Commodity Hadoop systems” refers to Hortonworks HDP systems and to Cloudera CDH-based systems other than Oracle Big Data Appliance.

1.2 Oracle Big Data SQL Master Compatibility Matrix

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) for up-to-date information on Big Data SQL compatibility with the following:

  • Oracle Engineered Systems.

  • Other systems.

  • Linux OS distributions and versions.

  • Hadoop distributions.

  • Oracle Database releases, including required patches.

1.3 Installing on Oracle Big Data Appliance

Each Oracle Big Data Appliance software release already includes a version of Oracle Big Data SQL that is ready to install, using the utilities available on the appliance.

You can download and install the standalone Big Data SQL bundle as described in this guide on all supported Hadoop platforms, including Oracle Big Data Appliance. For Big Data Appliance, however, the recommended method is to install the Big Data SQL package included with your Big Data Appliance software. The instructions are in the Oracle Big Data Appliance Owner's Guide and appear in the same location in most versions of that guide. For example, Big Data Appliance 4.14 includes Big Data SQL 3.2.1.2, and the instructions are in section 10.9.5, Installing Oracle Big Data SQL.

The advantages of installing the version of Big Data SQL included with the appliance are:

  • The prerequisites to the installation are already met.
  • You can add Big Data SQL to the Big Data Appliance release installation by checking a checkbox in the Big Data Appliance Configuration Generation Utility. The Mammoth utility then automatically includes Big Data SQL in the installation.
  • You can also install Big Data SQL later, using the bdacli utility. This is also a simple procedure. The command is bdacli enable big_data_sql.
  • When Big Data SQL is installed by the Mammoth utility, then during an upgrade to a newer Big Data Appliance software release, Mammoth will automatically upgrade the Hadoop side of the Big Data SQL installation to the version included in the release bundle.

The limitations of installing the version of Big Data SQL included with Big Data Appliance are:

  • The installation is performed for the Hadoop side only. You still need to install the database side of the product using the instructions in this guide. You also must refer to this guide if you want to modify the default installation.
  • The Big Data Appliance release may not include the latest available version of Big Data SQL.

Note:

If you choose to download and install a release of Big Data SQL from the Oracle Software Delivery Cloud instead of installing the version included with Big Data Appliance, then first check the Oracle Big Data SQL Master Compatibility Matrix to confirm that your current Big Data Appliance release level supports the version that you want to install.

1.4 Prerequisites for Networking

The Oracle Big Data SQL installation has the following network dependencies.

1.4.1 Port Access Requirements

Oracle Big Data SQL requires that the following ports are open through firewalls protecting the Hadoop cluster and Oracle Database.

Table 1-1 Ports That Must be Open on Both the Hadoop Cluster and Oracle Database Servers

Port                             Use
Ephemeral range (9000-65500)     UDP communication from the celliniteth.ora IP address
5042                             Diskmon

Table 1-2 Additional Ports That Must Be Open on the Hadoop Cluster

Port     Where                                        Use
50010    All nodes on unsecured clusters              dfs.datanode.address
1004     All nodes on secured clusters                dfs.datanode.address
50020    All nodes                                    dfs.datanode.ipc.address
8020     NameNodes                                    fs.defaultFS
8022     NameNodes                                    dfs.namenode.servicerpc-address
9083     Hive Metastore and HiveServer2 node          hive.metastore
10000    Hive Metastore and HiveServer2 node          hive.server2.thrift.port
88       Kerberos KDC                                 TCP and UDP
16000    Nodes where HDFS Encryption is enabled       KMS HTTP port
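
As a quick pre-installation check, you can probe these ports from the relevant hosts. The following is a minimal sketch that assumes the nc (netcat) utility is installed; the hostnames are placeholders for your own DataNode, NameNode, Hive, and database hosts.

# Hostnames below are placeholders; substitute nodes from your own environment.
nc -zv datanode01.example.com 50010     # dfs.datanode.address (unsecured cluster)
nc -zv namenode01.example.com 8020      # fs.defaultFS
nc -zv hive01.example.com 9083          # Hive Metastore
nc -zv hive01.example.com 10000         # HiveServer2 Thrift port
nc -zv dbnode01.example.com 5042        # Diskmon, checked from the Hadoop side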

1.5 Prerequisites for Installation on the Hadoop Cluster

The following installed software packages, active services, tools, and environment settings are prerequisites for the Oracle Big Data SQL installation.

Platform requirements, such as supported Linux distributions and versions, as well as supported Oracle Database releases and required patches are not listed here. See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) for this information.

The Oracle Big Data SQL installer checks all prerequisites before beginning the installation and reports any missing requirements on each node.

Tip:

Use bds_node_check.sh to pre-check whether or not the DataNodes of the cluster are ready for the installation.

You can check these prerequisites manually, but the easiest way is to run bds_node_check.sh on each node. The script returns a complete readiness report. After you download the installation bundle, unzip it, and execute the run file, bds_node_check.sh is available along with the tools that perform the installation. See Check for Hadoop-Side Prerequisites With bds_node_check.sh for details.
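
For example, after executing the run file, a minimal pre-check run on a node might look like the following sketch. The directory is the Jaguar working directory described later in this chapter, and the argument-free invocation is an assumption; adjust both to your environment.

# Run as root from the Jaguar directory created by the run file;
# the argument-free invocation is an assumption.
cd <Big Data SQL install directory>/BDSJaguar
./bds_node_check.sh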

Note:

  • Oracle Big Data SQL 4.0 does not support single user mode for Cloudera clusters.
  • The JDK is no longer a prerequisite. JDK 8u171 is included with this release of Oracle Big Data SQL.

1.5.1 Software Package Requirements for all DataNodes

The following packages must be pre-installed on all Hadoop cluster nodes before installing Oracle Big Data SQL. These are already installed on releases of Oracle Big Data Appliance that support Oracle Big Data SQL 4.0. Several additional packages are required if Query Server will be installed.

libaio
dmidecode
net-snmp
net-snmp-utils
glibc
libgcc
libstdc++
libuuid
ntp
perl
perl-libwww-perl
perl-libxml-perl
perl-XML-LibXML
perl-Time-HiRes
perl-XML-SAX
perl-Env (only for Oracle Linux 7 and RHEL 7)
rpm
curl
unzip
zip
tar
wget
uname

The following packages are required only if you install Query Server:


expect 
procmail

The yum utility is the recommended method for installing these packages. All of them can be installed with a single yum command. For example (not including expect and procmail):

# yum -y install dmidecode net-snmp net-snmp-utils perl perl-libs perl-Time-HiRes perl-libwww-perl perl-libxml-perl perl-XML-LibXML perl-XML-SAX perl-Env fuse fuse-libs rpm curl unzip zip tar wget uname libaio gcc
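
To confirm that the required packages are already present on a node, you can also query the RPM database directly. This is a read-only sketch using the package names listed above.

# Read-only check; a line such as "package ntp is not installed" identifies a gap.
rpm -q libaio dmidecode net-snmp net-snmp-utils glibc libgcc libstdc++ libuuid ntp perl rpm curl unzip zip tar wget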

Special Prerequisites for the Configuration Management Server

On the node where CM or Ambari runs (usually Node 3 on Oracle Big Data Appliance), you may also need to install a compatible version of Python as well as the Python Cryptography package. See the next section to determine whether or not this is necessary. If you do need to manually install a version of Python, then add openssl-devel to the yum parameter string:

# yum -y install dmidecode net-snmp net-snmp-utils perl perl-libs perl-Time-HiRes perl-libwww-perl perl-libxml-perl perl-XML-LibXML perl-XML-SAX perl-Env fuse fuse-libs rpm curl unzip zip tar wget uname openssl-devel libaio gcc

Other Prerequisites

  • HDFS, YARN, and Hive must be running on the cluster at Oracle Big Data SQL installation time and runtime. They can be installed as parcels or packages on Cloudera CDH and as stacks on Hortonworks HDP.
  • On CDH, if you install the Hadoop services required by Oracle Big Data SQL as packages, be sure that they are installed from within CM. Otherwise, CM will not be able to manage them. This is not an issue with parcel-based installation.

1.5.2 Python Requirements for the Cluster Management Node

On the node where the CM or Ambari cluster management service is running, the Oracle Big Data SQL installer requires Python 2.7.5 or greater, but less than 3.0. You must also add the Python Cryptography package to this Python installation if it is not present.

Jaguar, the Oracle Big Data SQL installer, requires Python (>= 2.7.5 <3.0) locally on the node where you run the installer. This is the node where CM or Ambari cluster management service is running. If any installation of Python in this supported version range is already present, you can use it to run Jaguar.

  • On Oracle Big Data Appliance clusters running Oracle Linux 6 or 7:

    Do not manually install Python to support the Jaguar installer. There is a compatible Python package already available on the appliance and the Jaguar installer will automatically find and use this package without prompting you.

  • On commodity Hadoop clusters running Oracle Linux 6:

    Install a compatible version of Python if not present.

  • On Oracle Big Data Appliance or commodity Hadoop clusters running Oracle Linux 5:

    Install a compatible version of Python if not present. On Oracle Big Data Appliance, install it as a secondary installation only.

Important:

On Oracle Big Data Appliance, do not overwrite the default Python installation with a newer version or switch the default to a newer version. This restriction may also apply to other supported Hadoop platforms. Consult the documentation for the CDH or HDP platform you are using.

On Oracle Linux 5 or 6 on a commodity Hadoop platform, the Jaguar installer will prompt you for the path of the compatible Python installation.

Installing the Required Python Cryptography Module

You can use Python's pip utility to install the Python Cryptography module. Use scl if a Python release in the supported range (2.7.5 or greater, but less than 3.0) is not the default. This example upgrades pip, installs the cryptography module, and then verifies that the module can be imported.

# scl enable python27 "pip install -U pip"  
# scl enable python27 "pip install cryptography"  
# scl enable python27 "python -c 'import cryptography; print \"ok\";'" 

You can then run the Jaguar installer.

1.5.2.1 Adding Python 2.7.5 or Greater as a Secondary Installation

Below is a procedure for adding Python 2.7.5 or greater (but less than 3.0) as a secondary installation.

Note:

If you manually install Python, first ensure that the openssl-devel package is installed:

# yum install -y openssl-devel
# pyversion=2.7.5
# cd /tmp/
# mkdir py_install
# cd py_install
# wget https://www.python.org/static/files/pubkeys.txt
# gpg --import pubkeys.txt
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz.asc
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz
# gpg --verify Python-$pyversion.tgz.asc Python-$pyversion.tgz
# tar xfzv Python-$pyversion.tgz
# cd Python-$pyversion
# ./configure --prefix=/usr/local/python/2.7.5
# make
# mkdir -p /usr/local/python/2.7.5
# make install
# export PATH=/usr/local/python/2.7.5/bin:$PATH
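
After the build completes, you can verify the secondary installation without disturbing the system default. This is a quick check assuming the /usr/local/python/2.7.5 prefix used in the procedure above.

# Verify the secondary installation and confirm the system default is untouched.
/usr/local/python/2.7.5/bin/python -V    # should report Python 2.7.5
python -V                                # should still report the original default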

If you create a secondary installation of Python, it is strongly recommended that you apply Python updates regularly to incorporate new security fixes.

Important: On Oracle Big Data Appliance, do not update the Mammoth-installed Python unless directed to do so by Oracle.

1.5.2.2 When You May Need to Use scl to Invoke the Correct Python Version

If there is more than one Python release on the cluster management server, then be sure that Python 2.7.5 or greater (but less than 3.0) is invoked for any operations associated with this release of Oracle Big Data SQL.

If the scl utility is available, you can use it to invoke Python 2.7.5 or greater explicitly. This is necessary if a different Python installation is the default. In that case, use scl or another method to invoke the correct Python version for scripts as well as for Python-based utilities such as Jaguar, the Oracle Big Data SQL installer:

[root@myclusteradminserver:BDSjaguar] # scl enable python27 "./jaguar install bds-config.json"

There is one exception to this requirement. On Oracle Big Data Appliance clusters running Oracle Linux 6 or Oracle Linux 7, it is not necessary to use scl explicitly in order to run the Jaguar installer. In this case, you can invoke Jaguar directly, as in:

[root@myclusteradminserver:BDSjaguar] # ./jaguar install bds-config.json

Jaguar itself will silently invoke scl if it is available and if scl is required to invoke a compatible Python release in this environment.

Note that this only applies to Jaguar on Big Data Appliance. To run any other Python scripts required by Oracle Big Data SQL (even on Oracle Big Data Appliance), use scl if Python 2.7.5 is not the default.

For example, to install the required Python Cryptography package, you may need to invoke scl to ensure that you are using the correct version of Python:
# scl enable python27 "pip install cryptography"

1.5.3 Environment Settings

The following environment settings are required prior to the installation.

  • ntp enabled
  • Minimum ratio of shmmax to shmall:

    shmmax = shmall * PAGE_SIZE

  • shmmax must be greater than physical memory.
  • swappiness set between 5 and 25.
  • All *.rp_filter instances disabled
  • Socket buffer size equal to or greater than 4194304
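
You can verify most of these settings before running the installer with read-only commands such as the following sketch. The socket buffer parameters shown (net.core.rmem_max and net.core.wmem_max) are an assumption about which kernel settings apply; confirm the relevant parameters for your environment.

# Read-only checks of the kernel settings listed above.
sysctl kernel.shmmax kernel.shmall vm.swappiness
getconf PAGE_SIZE                              # page size used in the shmmax/shmall relationship
sysctl -a 2>/dev/null | grep '\.rp_filter'     # all rp_filter instances should be 0
sysctl net.core.rmem_max net.core.wmem_max     # socket buffer sizes (assumed parameters; >= 4194304)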

1.5.4 Proxy-Related Settings

The installation process requires Internet access in order to download some packages from Cloudera or Hortonworks sites.

If a proxy is required for Internet access, then either ensure that the following are set as Linux environment variables, or enable the equivalent parameters in the Jaguar configuration file (bds-config.json):

  • http_proxy and https_proxy

  • no_proxy

    Set no_proxy to include the following: "localhost,127.0.0.1,<comma-separated list of the hostnames in the cluster, in FQDN format>", as in the sketch after this list.
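
The following is a minimal shell sketch of these settings; the proxy address and hostnames are placeholders for your own environment.

# Proxy address and hostnames are placeholders; replace them with your own values.
export http_proxy="http://proxy.example.com:80"
export https_proxy="http://proxy.example.com:80"
export no_proxy="localhost,127.0.0.1,node01.example.com,node02.example.com,node03.example.com"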

On Cloudera CDH, clear any proxy settings in Cloudera Manager administration before running the installation.

See Also:

Table 2-1 describes the use of http_proxy, https_proxy, and other parameters in the installer configuration file.

1.5.5 CPU, Memory, and Networking Requirements

Oracle Big Data SQL requires the following.

Minimum CPU and Memory for Each Node

  • 8 CPU cores
  • 16 GB RAM
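
A quick way to confirm that a node meets these minimums is with read-only commands such as the sketch below.

nproc       # CPU cores; expect 8 or more
free -g     # total memory in GB; expect 16 or more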

Networking

If Hadoop traffic is over VLANs, all DataNodes must be on the same VLAN.

1.6 Prerequisites for Installation on Oracle Database Nodes

Installation prerequisites vary, depending on the type of Hadoop system and Oracle Database system where Oracle Big Data SQL will be installed.

Patch Level

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for supported Linux distributions, Oracle Database release levels, and required patches.

Note:

Be sure that the correct Bundle Patch and any one-off patches identified in the Compatibility Matrix have been pre-applied before starting this installation.

Before you begin the installation, review the additional environmental and user access requirements described below.

Packages Required for Kerberos

If you are installing on a Kerberos-enabled Oracle Database system, these packages must be pre-installed:

  • krb5-workstation

  • krb5-libs
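
If these packages are not present, they can be added with yum, for example:

# yum -y install krb5-workstation krb5-libs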

Packages for the “Oracle Tablespaces in HDFS” Feature

Oracle Big Data SQL provides a method to store Oracle Database tablespaces in the Hadoop HDFS file system. The following RPMs must be installed:

  • fuse

  • fuse-libs

# yum -y install fuse fuse-libs

Required Environment Variables

The following are always required. Be sure that these environment variables are set correctly.

  • ORACLE_SID

  • ORACLE_HOME
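
As a quick sanity check, confirm that both variables are set in the database owner's environment before starting the installation. The values shown in the comments are only illustrative.

echo $ORACLE_SID     # for example: orcl
echo $ORACLE_HOME    # for example: /u01/app/oracle/product/18.0.0/dbhome_1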

Note:

GI_HOME (which was required in Oracle Big Data SQL 3.1 and earlier) is no longer required.

Required Credentials

  • Oracle Database owner credentials (The owner is usually the oracle Linux account.)

    Big Data SQL is installed as an add-on to Oracle Database. Tasks related directly to the database instance are performed through the database owner account (oracle or another user).

  • Grid user credentials

    In some cases where Grid infrastructure is present, it must be restarted. If the system uses Grid, keep the Grid user credentials on hand in case a restart is required.

The Linux users grid and oracle (or other database owner) must both be in the same group (usually oinstall). The database owner requires permission to read all files owned by the grid user, and vice versa.

All Oracle Big Data SQL files and directories are owned by the oracle:oinstall user and group.

Required Grid Infrastructure Patches

You can run the script bds-validate-grid-patches.sh to check that the Grid Infrastructure includes all of the patches required by the Oracle Big Data SQL installation. See Check for Required Grid Patches With bds-validate-grid-patches.sh.

1.7 Downloading Oracle Big Data SQL and Query Server

You can download Oracle Big Data SQL from the Oracle Software Delivery Cloud (also known as “eDelivery”).

There are three files to download:

  • The primary BDSJaguar bundle, which contains the Jaguar installer for Oracle Big Data SQL

    V982738-01.zip

  • The two parts of the optional Query Server bundle
    V982741-01_1of2.zip
    V982741-01_2of2.zip

If you want to use Query Server, then download the two parts of the Query Server bundle in addition to the primary bundle.

Note:

You cannot use Query Server apart from Oracle Big Data SQL. Query Server is also not installed separately. It can be included in the Jaguar-driven installation as described below.
  1. Log on to the Oracle Software Delivery Cloud.
  2. Search for “Oracle Big Data SQL”.
  3. Select Oracle Big Data SQL 4.0.0.0 for Linux x86-64.
    The actual version available may be greater than 4.0.0.0. The same bundle is compatible with all supported Hadoop clusters.
  4. Agree to the Oracle Standard Terms and Restrictions. Then you can download the bundle.
  5. Copy the bundle to the Hadoop node that hosts the cluster management server (CM or Ambari). On Oracle Big Data Appliance this is usually Node 3. Choose any location. Copy the Query Server bundle to the same location if you intend to use Query Server.
  6. Log on as root and unzip the BDSJaguar bundle.
    You will see that the Release 4.0 bundle contains the run file and a readme.
    # unzip V982738-01.zip
    Archive:  V982738-01.zip
      inflating: BDSJaguar-4.0.0.run
      inflating: readme
    
  7. Before executing the run file, decide if you want to keep the default extraction target for the installation and configuration files. The default is /opt/oracle. If not, then you can change it by setting the JAGUAR_ROOT environment variable.
    # export JAGUAR_ROOT=<my_directory>
    Throughout this guide, the placeholder Big Data SQL Install Directory refers to the JAGUAR_ROOT where you extracted the files.

    Important:

    This is the permanent working directory from which you configure and install Oracle Big Data SQL. You will also need the tools in this directory post installation. It is strongly recommended that you secure this directory against accidental or unauthorized modification or deletion. The primary file to protect is your installation configuration file (by default, bds-config.json). As you customize the configuration to your needs, this file becomes the record of the state of the installation. It is useful for recovery purposes and as a basis for further changes.
  8. Execute the run file.
    # ./BDSJaguar-4.0.0.run
    BDSJaguar-4.0.0.run: platform is: Linux
    BDSJaguar-4.0.0.run: Jaguar directory created successfull
    BDSJaguar-4.0.0.run: Based on features selected in config.json file, extra bundles could be required
    BDSJaguar-4.0.0.run: Please go to /opt/oracle/BDSJaguar
  9. Optional Step: Include Query Server.
    If you want to include Query Server in the installation, then also unzip V982741-01_1of2.zip and V982741-01_2of2.zip to extract the two parts of the bundle. Both files contain the script join.sh. Run this script to assemble the bundle; you can use the copy of join.sh from either zip file.
    $ unzip -j -o V982741-01_1of2.zip
    Archive: V982741-01_1of2.zip 
      inflating: BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-00 
      inflating: join.sh 
      inflating: readme.1st 
    
    $ unzip -j -o V982741-01_2of2.zip
    Archive: V982741-01_2of2.zip 
      inflating: BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-01 
      inflating: join.sh 
      inflating: readme.1st
      
    $ ./join.sh
    Re-assembling Big Data SQL Query Server bundle
    Detected files:
    BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-00
    BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-01
    Joining 2 files
    BigDataSQL-4.0.0-QueryServer.zip successfully created !!!

    Then, unzip the newly created bundle to extract the QueryServer run file.

    # unzip BDSExtra-4.0.0-QueryServer.zip
    ...
    # ./BDSExtra-4.0.0-QueryServer.run

    To include Query Server in the Big Data SQL installation, be sure to execute this extra run file before running the Jaguar installer.

1.8 Upgrading From a Prior Release of Oracle Big Data SQL

On the Oracle Database side, Oracle Big Data SQL can now be installed over a previous release with no need to remove the older software. The install script automatically detects and upgrades an older version of the software.

Upgrading the Oracle Database Side of the Installation

On the database side, you need to perform the installation only once to upgrade the database side for all clusters connected to that particular database. This is because the installations on the database side are not entirely separate. They share the same set of Oracle Big Data SQL binaries. As a result, if you upgrade one installation on a database instance, you have effectively upgraded the database side of all installations on that instance.

Upgrading the Hadoop Cluster Side of the Installation

If existing Oracle Big Data SQL installations on the Hadoop side are not upgraded, these installations will continue to work with the new Oracle Big Data SQL binaries on the database side, but will not have access to the new features in this release.

1.9 Important Terms and Concepts

These are special terms and concepts in the Oracle Big Data SQL installation.

Oracle Big Data SQL Installation Directory

On both the Hadoop side and database side of the installation, the directory where you unpack the installation bundle is not a temporary directory which you can delete after running the installer. These directories are staging areas for any future changes to the configuration. You should not delete them and may want to secure them against accidental deletion.

Database Authentication Keys

Database Authentication uses a key that must be identical on both sides of the installation (the Hadoop cluster and Oracle Database). The first part of the key is created on the cluster side and stored in the .reqkey file. This file is consumed only once on the database side, to connect the first Hadoop cluster to the database. Subsequent cluster installations use the configured key and the .reqkey file is no longer required. The full key (which is completed on the database side) is stored in an .ackkey file. This key is included in the ZIP file created by the database-side installation and must be copied back to the Hadoop cluster by the user.

Request Key

By default, the Database Authentication feature is enabled in the configuration. (You can disable it by setting the parameter database_auth_enabled to “false” in the configuration file.) When this setting is true, the Jaguar install and reconfigure operations can generate a request key (stored in a file with the extension .reqkey). This key is part of a unique GUID-key pair used for Database Authentication. The GUID-key pair is generated during the database side of the installation. The Jaguar operation creates a request key if the command line includes the --requestdb parameter along with a single database name (or a comma-separated list of names). In this example, the install operation creates three keys, one for each of three different databases:

# ./jaguar --requestdb orcl,testdb,proddb install
The operation creates the request key files in the directory <Oracle Big Data SQL install directory>/BDSJaguar/dbkeys. In this example, Jaguar install would generate these request key files:
orcl.reqkey
testdb.reqkey
proddb.reqkey

Prior to the database side of the installation, you copy the request key file to the database node and into the path of the database-side installer, which generates the GUID-key pair at runtime, as in the sketch below.
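
For example, a copy step might look like the following sketch. The hostname and target directory are placeholders; the actual target is the database-side installation directory described later in this guide.

# Hostname and destination path are placeholders.
scp orcl.reqkey oracle@dbnode01.example.com:/tmp/bds-db-install/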

Acknowledge Key

After you copy a request key into the database-side installation directory, running the database-side Oracle Big Data SQL installer generates a corresponding acknowledge key. The acknowledge key is the original request key paired with a GUID. This key is stored in a file that is included in a ZIP archive, along with other information that must be returned to the Hadoop cluster by the user.

Database Request Operation (databasereq)

The Jaguar databasereq operation is a “standalone” way to generate a request key. It lets you create one or more request keys without performing an install or reconfigure operation:

# ./jaguar --requestdb <database name list> databasereq {configuration file | null}

Database Acknowledge ZIP File

If Database Authentication or Hadoop Secure Impersonation is enabled for the configuration, then the database-side installer creates a ZIP bundle containing configuration information. If Database Authentication is enabled, this bundle includes the acknowledge key file. Information required for Hadoop Secure Impersonation is also included if that option is enabled. Copy this ZIP file back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server for processing.

Database Acknowledge is a third phase of the installation and is performed only when one or more of the security features cited above is enabled.

Database Acknowledge Operation (databaseack)

If you have opted to enable either or both of these security features (Database Authentication and Hadoop Secure Impersonation), then after copying the Database Acknowledge ZIP file back to the Hadoop cluster, run the Jaguar Database Acknowledge operation.

The setup process for these features is a “round trip” that starts on the Hadoop cluster management server, where you set the security directives in the configuration file and run Jaguar, to the Oracle Database system where you run the database-side installation, and back to the Hadoop cluster management server where you return a copy of the ZIP file generated by the database-side installation. The last step is when you run databaseack, the Database Acknowledge operation described in the outline below. Database Acknowledge completes the setup of these security features.
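
Under the default paths used in this guide, the round trip ends with steps like the following on the Hadoop cluster management server. The ZIP file name, source path, and database hostname are placeholders.

# Copy the Database Acknowledge ZIP from the database node (placeholder names and paths),
# then run databaseack as root from the Jaguar directory.
scp oracle@dbnode01.example.com:/path/to/<cluster>-<nodes>-<cm-fqdn>-<db-fqdn>.zip /opt/oracle/DM/databases/conf/
cd <Big Data SQL install directory>/BDSJaguar
./jaguar databaseack bds-config.json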

Default Cluster

The default cluster is the first Oracle Big Data SQL connection installed on an Oracle Database. In this context, the term default cluster refers to the installation directory on the database node where the connection to the Hadoop cluster is established. It does not literally refer to the Hadoop cluster itself. Each connection between a Hadoop cluster and a database has its own installation directory on the database node.

An important aspect of the default cluster is that the setting for Hadoop Secure Impersonation in the default cluster determines the setting for all other cluster connections to a given database. If you run a Jaguar reconfigure operation some time after installation and use it to turn Hadoop Secure Impersonation on or off in the default cluster, this change is effective for all clusters associated with the database.

If you perform installations to add additional clusters, the first cluster remains the default. If the default cluster is uninstalled, then the next one (in chronological order of installation) becomes the default.

1.10 Installation Overview

The Oracle Big Data SQL software must be installed on all Hadoop cluster DataNodes and all Oracle Database compute nodes.

Important: About Service Restarts

On the Hadoop-side installation, the following restarts may occur.

  • Cloudera Manager (or Ambari) may be restarted. This in itself does not interrupt any services.

  • Hive, YARN, and any other services that have a dependency on Hive or YARN (such as Impala) are restarted.

    The Hive libraries parameter is updated to include the Oracle Big Data SQL JARs. On Cloudera installations, if the YARN Resource Manager is enabled, it is restarted in order to set the cgroup memory limit for Oracle Big Data SQL and the other Hadoop services. On Oracle Big Data Appliance, the YARN Resource Manager is always enabled and therefore always restarted.

On the Oracle Database server(s), the installation may require a database and/or Oracle Grid infrastructure restart in environments where updates are required to Oracle Big Data SQL cell settings on the Grid nodes. See Potential Requirement to Restart Grid Infrastructure for details.

If a Previous Version of Oracle Big Data SQL is Already Installed

On commodity Hadoop systems (those other than Oracle Big Data Appliance) the installer automatically uninstalls any previous release from the Hadoop cluster.

You can install Oracle Big Data SQL on all supported Oracle Database systems without uninstalling a previous version.

Before installing this Oracle Big Data SQL release on Oracle Big Data Appliance, you must use bdacli to manually uninstall the older version if it was enabled via bdacli or Mammoth. If you are not sure, try bdacli disable big_data_sql. If the disable command fails, then the installation was likely done with the setup-bds installer. In that case, you can install the new version of Oracle Big Data SQL without disabling the old version.

How Long Does It Take?

The table below estimates the time required for each phase of the installation. Actual times will vary.

Table 1-3 Installation Time Estimates

Installation on the Hadoop Cluster

Eight to 28 minutes. The Hadoop-side installation may take about eight minutes if all resources are locally available. An additional 20 minutes or more may be required if resources must be downloaded from the Internet.

Installation on Oracle Database Nodes

The average installation time for the database side can be estimated as follows:

  • 15 minutes for a single-node database if a restart is not required. If a restart is required, the time will vary, depending on the size of the database.

  • On an Oracle RAC database, multiply the factors above by the number of nodes.

  • If an Oracle Grid restart is required, factor that in as well.

The installation process on the Hadoop side includes installation on the Hadoop cluster as well as generation of the bundle for the second phase of the installation on the Oracle Database side. The database bundle includes the Hadoop and Hive clients and other software. The Hadoop and Hive client software enables Oracle Database to communicate with HDFS and the Hive Metastore. The client software is specific to the version of the Hadoop distribution (Cloudera or Hortonworks). As explained later in this guide, you can download these packages prior to the installation, set up a URL or repository within your network, and make that target available to the installation script. If instead you let the installer download them from the Internet, the extra time for the installation depends on your Internet download speed.

Pre-installation Steps

  • Check to be sure that the Hadoop cluster and the Oracle Database system both meet all of the prerequisites for installation. On the database side, this includes confirming that all of the required patches are installed. Check against these sources:
    • Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support)
    • Section 2.1 in this guide, which identifies the prerequisites for installing on the Hadoop cluster. Also see Section 3.1, which describes the prerequisites for installing the Oracle Database system component of Oracle Big Data SQL.

    Oracle Big Data Appliance already meets all prerequisites.
  • Have these login credentials available:

    • root credentials for both the Hadoop cluster and all Oracle Grid nodes.

      On the grid nodes you have the option of using passwordless SSH with the root user instead.

    • oracle Linux user (or other, if the database owner is not oracle)

    • The Oracle Grid user (if this is not the same as the database owner).

    • The Hadoop configuration management service (CM or Ambari) admin password.

  • On the cluster management server (where CM or Ambari is running), download the Oracle Big Data SQL installation bundle and unzip it into a permanent location of your choice. (See Downloading Oracle Big Data SQL and Query Server.)

Outline of the Installation Steps

This is an overview to familiarize you with the process. Complete installation instructions are provided in Chapters 2 and 3.

The installation always has two phases: the installation on the Hadoop cluster and the subsequent installation on the Oracle Database system. It may also include a third, “Database Acknowledge” phase, depending on your configuration choices.

  1. Start the Hadoop-Side Installation

    Review the installation parameter options described in Chapter 2. The installation on the Hadoop side is where you make all of the decisions about how to configure Oracle Big Data SQL, including those that affect the Oracle Database side of the installation.

  2. Edit the bds-config.json file provided with the bundle in order to configure the Jaguar installer as appropriate for your environment. You could also create your own configuration file using the same parameters.

  3. Run the installer to perform the Hadoop-side installation as described in Installing or Upgrading the Hadoop Side of Oracle Big Data SQL.

    If the Database Authentication feature is enabled, then Jaguar must also output a “request key” (.reqkey) file for each database that will connect to the Hadoop cluster. You generate this file by including the --requestdb parameter in the Jaguar install command (the recommended way). You can also generate the file later with other Jaguar operations that support --requestdb.

    This file contains one half of a GUID-key pair that is used in Database Authentication. The steps to create and install the key are explained in more detail in the installation steps.

  4. Copy the database-side installation bundle to any temporary directory on each Oracle Database compute node.

  5. If a request key file was generated, copy over that file to the same directory.

  6. Start the Database-Side Installation

    Log on to the database server as the database owner. Unzip the bundle and execute the run file it contains. The run file does not install the software. It sets up an installation directory under $ORACLE_HOME.

  7. As the database owner, perform the Oracle Database server-side installation. (See Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL.)

    In this phase of the installation, you copy the database-side installation bundle to a temporary location on each compute node. If a .reqkey file was generated for the database, then copy the file into the installation directory before proceeding. Then run the bds-database-install.sh installation program.

    The database-side installer does the following:

    • Copies the Oracle Big Data SQL binaries to the database node.

    • Creates all database metadata and MTA extprocs (external processes) required to access the Hadoop cluster and configures the communication settings.

    Important:

    Be sure to install the bundle on each database compute node. The Hadoop-side installation automatically propagates the software to each node of the Hadoop cluster. However, the database-side installation does not work this way. You must copy the software to each database compute node and install it directly.

    In Oracle Grid environments, if cell settings need to be updated, then a Grid restart may be needed. Be sure that you have the Grid credentials on hand; if a restart is required, you will need them to complete the installation.

  8. If Applicable, Perform the “Database Acknowledge” Step

    If Database Authentication or Hadoop Secure Impersonation were enabled, the database-side installation generates a ZIP file that you must copy back to Hadoop cluster management server. The file is generated in the installation directory under $ORACLE_HOME and has the following filename format.

    <Hadoop cluster name>-<Number of nodes in the cluster>-<FQDN of the cluster management server node>-<FQDN of this database node>.zip
    Copy this file back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server and then as root run the Database Acknowledge command from the BDSJaguar directory:
    # cd <Big Data SQL install directory>/BDSJaguar
    # ./jaguar databaseack <bds-config.json>

Workflow Diagrams

Complete Installation Workflow

The figure below illustrates the complete set of installation steps as described in this overview.

Note:

Before you start the steps shown in the workflow, be sure that both systems meet the installation prerequisites.

Figure 1-1 Installation Workflow

Note:

The --reqkey parameter in this diagram actually requires the full path to the file, as in ./bds-database-install.sh --reqkey=/opt/tmp/orcl.reqkey.

Key Generation and Installation

The figure below focuses on the three steps required to create and install the GUID-key pair used in Database Authentication. The braces around parameters of the Jaguar command indicate that one of the operations in the list is required. Each of these operations supports use of the --requestdb parameter. Note that although updatenodes is included in this list, updatenodes is deprecated in this release. You should use reconfigure instead.

Figure 1-2 Generating and Installing the GUID-Key Pair for Database Authentication


1.11 Post-Installation Checks

Validating the Installation With bdschecksw and Other Tests

  • The script bdschecksw now runs automatically as part of the installation. This script gathers and analyzes diagnostic information about the Oracle Big Data SQL installation from both the Oracle Database and the Hadoop cluster sides of the installation. You can also run this script as a troubleshooting check at any time after the installation. The script is in $ORACLE_HOME/bin on the Oracle Database server.
    $ bdschecksw --help
    See Running Diagnostics With bdschecksw in the Oracle Big Data SQL User’s Guide for a complete description.
  • Also see How to do a Quick Test in the user’s guide for some additional functionality tests.

Checking the Installation Log Files

You can examine these log files after the installation.

On the Hadoop cluster side:

/var/log/bigdatasql 
/var/log/oracle

On the Oracle Database side:

$ORACLE_HOME/install/bds* (This is a set of files, not a directory) 
$ORACLE_HOME/bigdatasql/logs 
/var/log/bigdatasql

Tip:

If you make a support request, create a zip archive that includes all of these logs and include it in your email to Oracle Support.
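
For example, on a database node you might collect the database-side logs into a single archive with a command like the following sketch; the archive name is arbitrary.

# Archive name is arbitrary; run as a user with read access to these locations.
zip -r bds_support_logs.zip $ORACLE_HOME/install/bds* $ORACLE_HOME/bigdatasql/logs /var/log/bigdatasql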

Other Post-Installation Steps to Consider

  • Read about measures you can take to secure the installation. (See Securing Oracle Big Data SQL.)

  • Learn how to modify the Oracle Big Data SQL configuration when changes occur on the Hadoop cluster and in the Oracle Database installation. (See Expanding or Shrinking an Installation.)

  • If you have used Copy to Hadoop in earlier Oracle Big Data SQL releases, learn how Oracle Shell for Hadoop Loaders can simplify Copy to Hadoop tasks. (See Additional Tools Installed.)

1.11.1 Run bds_cluster_node_helper.sh to Get Information About the Oracle Big Data SQL Installation on a Node

The script bds_cluster_node_helper.sh aggregates information about a Hadoop cluster node that is useful for Oracle Big Data SQL maintenance purposes.

This script provides options to do the following:
  • Show Oracle Big Data SQL status information via bdscli, the Oracle Big Data SQL command line interface.
  • Collect and archive log data that is pertinent to Oracle Big Data SQL operations. There are three levels to the scope of the data collection.
  • Set some parameters that control the level of debug information in logs that are collected.

You can find this script at <Oracle Big Data SQL installation directory>/BDSJaguar. It must be run as root.

Usage

# bds_cluster_node_helper.sh [OPTIONS]

Table 1-4 Parameters for bds_cluster_node_helper.sh

Parameter Description
-h, --help Show usage information.
-v, --version Show the Oracle Big Data SQL release version.
--skip-bdscli-info Skip bdscli information gathering.

Default: false.

Runs the following bdscli commands and returns the output:

bdscli -e "list bdsql"
bdscli -e "list bdsql detail"
bdscli -e "list offloadgroup"
bdscli -e "list offloadgroup detail"
bdscli -e "list quarantine"
--get-logs [--log-level=<1|2|3>] [--bundle-name=<name>] [--wrap, --envelop] Generates a gzipped tar file of logs.

Default: false.

Options:

  • --log-level=<supported value>

    Specifies the log level.

  • --bundle-name=<name>

    Names the .tar.gz created.

  • --wrap, --envelop

    Prepares the bundle to be sent over email.

Note:

See the table below for more detail on each --get-logs sub-option.
--set-debug=<on|off> Sets or removes the _cell_server_event parameter in the cellinit.ora file.

  • --set-debug=on

    • In the files /opt/oracle/bigdatasql/bdcell-12.1/bigdata-log4j.properties and /opt/oracle/bigdatasql/bdcell-12.2/bigdata-log4j.properties, sets log4j.logger.oracle.hadoop.sql=ALL.
    • In cellinit.ora, sets _cell_server_event as follows: _cell_server_event="trace[CELLSRV_Disk_Layer] disk=highest, memory=highest"

  • --set-debug=off

    • In the file /opt/oracle/cell/cellsrv/deploy/config/cellinit.ora, removes the parameter entry that begins with _cell_server_event="trace[CELLSRV_Disk_Layer]

    • In bigdata-log4j.properties, sets log4j.logger.oracle.hadoop.sql=OFF.
The table below provides full details on bds_cluster_node_helper.sh --get-logs sub-options.

Table 1-5 Sub-Parameters for --get-logs Option of bds_cluster_node_helper.sh

bds_cluster_node_helper.sh --get-logs sub-options Description
--get-logs --log-level=<1|2|3>

Specifies the log level.

Default: 1.

The scope of the collection for each log level is as follows:

  • --get-logs, or --get-logs --log-level=1:
    /var/log/bigdatasql
    /opt/oracle/cell/.install_log.txt
  • --get-logs --log-level=2:
    Includes level 1 logging, plus:
    
    /var/log/oracle
    /opt/oracle/cell/cellsrv/deploy/wls/logs
    /opt/oracle/cell/cellsrv/deploy/msdomain/servers/msServer/logs
  • --get-logs --log-level=3

    Includes level 1 and 2 logging, plus:

    
    /var/run/cloudera-scm-agent/process (on Cloudera clusters)
    /var/lib/ambari-agent/data (on Hortonworks HDP clusters)

Example: # bds_cluster_node_helper.sh --get-logs --log-level=2

--get-logs --bundle-name=<name> Gives a name to the created tar.gz bundle.

Default: bds-<Oracle Big Data SQL version>-<YYYY-mm-dd-HH-MM-SS>. For example, bds_4.0.0_2019-01-20_23-55-03.tar.gz

You can use this option to specify a different name. For example:
# bds_cluster_node_helper.sh --get-logs --bundle-name=logs_from_node1.tar.gz
--get-logs [--wrap | --envelop] Prepares the bundle for email transmission.

Default: false.

Examples:
# bds_cluster_node_helper.sh --get-logs --wrap
# bds_cluster_node_helper.sh --get-logs --envelop

These sub-options are equivalent.

1.12 Using the Installation Quick Reference

Once you are familiar with the functionality of the Jaguar utility on the Hadoop side and bds-database-install.sh on the Oracle Database side, you may find it useful to work from the Installation Quick Reference for subsequent installations. This reference provides an abbreviated description of the installation steps. It does not fully explain each step, so users should already have a working knowledge of the process. Links to relevant details in this and other documents are included.