1 Introduction

This guide describes how to install Oracle Big Data SQL, how to reconfigure or extend the installation to accommodate changes in the environment, and, if necessary, how to uninstall the software.

1.1 About Installation

The Oracle Big Data SQL installation is done in phases.

The first two phases are required and the third is optional:

  • Installation on the node of the Hadoop cluster where the cluster management server is running.

  • Installation on each node of the Oracle Database system.

  • (Optional) Activation of security features, if you have chosen to enable them.

The Hadoop cluster and Oracle Database system must be networked together via Ethernet or InfiniBand. (Connectivity to Oracle SuperCluster is InfiniBand only.)

Note:

For Ethernet connections between Oracle Database and the Hadoop cluster, Oracle recommends 10 Gb/s Ethernet.

The installation process starts on the Hadoop system, where you install the software manually on one node only (the node running the cluster management software). Oracle Big Data SQL leverages the administration facilities of the cluster management software to automatically propagate the installation to all DataNodes in the cluster.

The package that you install on the Hadoop side also generates an Oracle Big Data SQL installation package for your Oracle Database system. After the Hadoop-side installation is complete, copy this package to all nodes of the Oracle Database system, unpack it, and install it using the instructions in this guide. If you have enabled Database Authentication or Hadoop Secure Impersonation, you then perform the third installation step.

1.2 Supported System Combinations

Oracle Big Data SQL supports connectivity between a number of Oracle Engineered Systems and commodity servers.

The current release supports Oracle Big Data SQL connectivity for the following Oracle Database platforms/Hadoop system combinations:

  • Oracle Database on commodity servers with Oracle Big Data Appliance.

  • Oracle Database on commodity servers with commodity Hadoop systems.

  • Oracle Exadata Database Machine with Oracle Big Data Appliance.

  • Oracle Exadata Database Machine with commodity Hadoop systems.

Note:

The phrase “Oracle Database on commodity systems” refers to Oracle Database hosts that are not the Oracle Exadata Database Machine. Commodity database systems may be either Oracle Linux or RHEL-based. “Commodity Hadoop systems” refers to Hortonworks HDP systems and to Cloudera CDH-based systems other than Oracle Big Data Appliance.

1.3 Oracle Big Data SQL Master Compatibility Matrix

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) for up-to-date information on Big Data SQL compatibility with the following:

  • Oracle Engineered Systems.

  • Other systems.

  • Linux OS distributions and versions.

  • Hadoop distributions.

  • Oracle Database releases, including required patches.

1.4 Installing on Oracle Big Data Appliance

Each Oracle Big Data Appliance software release already includes a version of Oracle Big Data SQL that is ready to install, using the utilities available on the appliance.

You can download and install the standalone Big Data SQL bundle as described in this guide on all supported Hadoop platforms, including Big Data Appliance. For Big Data Appliance, however, the recommended method is to install the Big Data SQL package included with your Big Data Appliance software. The instructions for doing this are in the Oracle Big Data Appliance Owner's Guide. You can find them in the same location in most versions of the Owner's Guide. For example, Big Data Appliance 5.1 and 5.2 include Big Data SQL 4.0 (not 4.1) and the instructions are here: 10.9.5 Installing Oracle Big Data SQL.

The advantages of installing the version of Big Data SQL included with the appliance are:

  • The prerequisites to the installation are already met.
  • You can add Big Data SQL to the Big Data Appliance release installation by checking a checkbox in the Big Data Appliance Configuration Generation Utility. The Mammoth utility will then automatically include Big Data SQL in the installation.
  • You can also install Big Data SQL later, using the bdacli utility. This is also a simple procedure. The command is bdacli enable big_data_sql.
  • When Big Data SQL is installed by the Mammoth utility, then during an upgrade to a newer Big Data Appliance software release, Mammoth will automatically upgrade the Hadoop side of the Big Data SQL installation to the version included in the release bundle.

The limitations of installing the version of Big Data SQL included with Big Data Appliance are:

  • The installation is performed for the Hadoop side only. You still need to install the database side of the product using the instructions in this guide. You also must refer to this guide if you want to modify the default installation.
  • The Big Data Appliance release may not include the latest available version of Big Data SQL.

Note:

If you choose to download and install a release of Big Data SQL from the Oracle Software Delivery Cloud instead of installing the version included with Big Data Appliance, then first check the Oracle Big Data SQL Master Compatibility Matrix to confirm that your current Big Data Appliance release level supports the version that you want to install.

1.5 Prerequisites for Networking

The Oracle Big Data SQL installation has the following network dependencies.

1.5.1 Port Access Requirements

Oracle Big Data SQL requires that the following ports are open through firewalls protecting the Hadoop cluster and Oracle Database.

Table 1-1 Ports That Must Be Open on Both the Hadoop Cluster and Oracle Database Servers

Port                              Use
--------------------------------  -----------------------------------------------------
Ephemeral_range, i.e. 9000-65500  UDP communication from the celliniteth.ora IP address
5042                              Diskmon

Table 1-2 Additional Ports That Must Be Open on the Hadoop Cluster

Hadoop Cluster Ports  Where                              Use
--------------------  ---------------------------------  -------------------------------
50010                 All nodes on unsecured clusters    dfs.datanode.address
1004                  All nodes on secured clusters      dfs.datanode.address
50020                 All nodes                          dfs.datanode.ipc.address
8020                  NameNodes                          fs.defaultFS
8022                  NameNodes                          dfs.namenode.servicerpc-address
9083                  Hive Metastore & HiveServer2 node  hive.metastore
10000                 Hive Metastore & HiveServer2 node  hive.server2.thrift.port
88                    Kerberos KDC                       TCP & UDP
16000                 Where HDFS Encryption is enabled   KMS HTTP Port
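
As a rough way to confirm the firewall rules before installing, the TCP ports in Table 1-2 can be probed from the Oracle Database host. The sketch below is not part of Oracle Big Data SQL. The function names and the sample host name are hypothetical, and it assumes that bash and the coreutils timeout command are available. It cannot test UDP traffic, so the ephemeral range in Table 1-1 and UDP on port 88 need a separate check.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not an Oracle-supplied tool.

# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
# Uses bash's /dev/tcp redirection, so the probe runs in an explicit bash.
check_tcp_port() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints one status line per port for the given host.
report_ports() {
  local host=$1 port
  shift
  for port in "$@"; do
    if check_tcp_port "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} unreachable"
    fi
  done
}

# Example (hypothetical DataNode name, unsecured cluster):
#   report_ports datanode01.example.com 50010 50020
```

A port reported as unreachable may be blocked by a firewall or may simply have no service listening yet, so treat the result as a hint rather than proof.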

1.6 Prerequisites for Installation on the Hadoop Cluster

The following installed software packages, active services, tools, and environment settings are prerequisites to the Oracle Big Data SQL installation.

Platform requirements, such as supported Linux distributions and versions, as well as supported Oracle Database releases and required patches are not listed here. See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) for this information.

The Oracle Big Data SQL installer checks all prerequisites before beginning the installation and reports any missing requirements on each node.

Tip:

Use bds_node_check.sh to pre-check whether or not the DataNodes of the cluster are ready for the installation.

You can manually check for them, but the easiest way is to run bds_node_check.sh on each node. This script returns a complete readiness report. After you download the installation bundle, unzip it, and execute the run file, bds_node_check.sh will be available, along with the tools to perform the installation. See Check Hadoop-Side Prerequisites for details.

Note:

  • Oracle Big Data SQL 4.1 does not support single user mode for Cloudera clusters.
  • The JDK is no longer a prerequisite. JDK 8u171 is included with this release of Oracle Big Data SQL.

1.6.1 Software Package Requirements for all DataNodes

The following packages must be pre-installed on all Hadoop cluster nodes before installing Oracle Big Data SQL. These are already installed on releases of Oracle Big Data Appliance that support Oracle Big Data SQL 4.1.1. Several additional packages are required if Query Server will be installed.

libaio
dmidecode
net-snmp
net-snmp-utils
glibc
libgcc
libcgroup-tools (Oracle Linux 7 only)
libstdc++
libuuid
ntp
perl
perl-libwww-perl
perl-libxml-perl
perl-XML-LibXML
perl-Time-HiRes
perl-XML-SAX
perl-Env (Oracle Linux 7 only)
rpm
curl
unzip
zip
tar
uname

The following packages are required only if you install Query Server:


expect 
procmail

The yum utility is the recommended method for installing these packages. All of them can be installed with a single yum command. For example (not including expect and procmail):

# yum -y install dmidecode net-snmp net-snmp-utils glibc libgcc libcgroup-tools libstdc++ libuuid ntp perl perl-libs perl-Time-HiRes perl-libwww-perl perl-libxml-perl perl-XML-LibXML perl-XML-SAX perl-Env fuse fuse-libs rpm curl unzip zip tar uname libaio gcc
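
Before running yum, it can be useful to see which of the required packages are absent on a node. The following is a minimal sketch, not an Oracle tool: the function name is hypothetical, and bds_node_check.sh remains the supported readiness check. It compares a list of installed package names against the names you pass in.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; bds_node_check.sh is the supported readiness check.

# Reads installed package names on stdin (one per line) and prints each
# required package, given as an argument, that is not in that list.
find_missing_packages() {
  local installed pkg
  installed=$(cat)
  for pkg in "$@"; do
    # -x matches the whole line, -F treats names such as libstdc++ literally.
    if ! printf '%s\n' "$installed" | grep -qxF -- "$pkg"; then
      echo "$pkg"
    fi
  done
}

# Example, using rpm to list what is installed on the node:
#   rpm -qa --qf '%{NAME}\n' | find_missing_packages libaio dmidecode net-snmp ntp perl-XML-SAX
```

An empty result means every package named on the command line is already installed.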

Special Prerequisites for the Configuration Management Server

On the node where CM or Ambari runs (usually Node 3 on Oracle Big Data Appliance), you may also need to install a compatible version of Python as well as the Python Cryptography package. See the next section to determine whether or not this is necessary. If you do need to manually install a version of Python, then add openssl-devel to the yum parameter string.

Other Prerequisites

  • HDFS, YARN, and Hive must be running on the cluster at Oracle Big Data SQL installation time and runtime. They can be installed as parcels or packages on Cloudera CDH and as stacks on Hortonworks HDP.
  • On CDH, if you install the Hadoop services required by Oracle Big Data SQL as packages, be sure that they are installed from within CM. Otherwise, CM will not be able to manage them. This is not an issue with parcel-based installation.

1.6.2 Python Requirements for the Cluster Management Node

On the node where the CM or Ambari cluster management service is running, the Oracle Big Data SQL installer requires Python 2.7.5 or greater, but less than 3.0. You must also add the Python Cryptography package to this Python installation if it is not present.

Jaguar, the Oracle Big Data SQL installer, requires Python (>= 2.7.5 <3.0) locally on the node where you run the installer. This is the node where CM or Ambari cluster management service is running. If any installation of Python in this supported version range is already present, you can use it to run Jaguar.

  • On Oracle Big Data Appliance or commodity Hadoop clusters running Oracle Linux 6 or 7:

    Do not manually install Python to support the Jaguar installer. There is a compatible Python package already available on the appliance and the Jaguar installer will automatically find and use this package without prompting you.

  • On commodity Hadoop clusters running Oracle Linux 6:

    Install a compatible version of Python if not present.

  • On Oracle Big Data Appliance or commodity Hadoop clusters running Oracle Linux 5:

    Install a compatible version of Python if not present. On Oracle Big Data Appliance, install it as a secondary installation only.

Important:

On Oracle Big Data Appliance, do not overwrite the default Python installation with a newer version or switch the default to a newer version. This restriction may also apply to other supported Hadoop platforms. Consult the documentation for the CDH or HDP platform you are using.

On Oracle Linux 6 on a commodity Hadoop platform, the Jaguar installer will prompt you for the path of the compatible Python installation.
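
If you are unsure whether a given interpreter falls in the supported range, a quick version comparison helps before you supply its path to Jaguar. The sketch below is an illustration only. The function name is hypothetical, and the interpreter path in the usage comment assumes the secondary installation prefix used later in this section.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# Succeeds if the version string is >= 2.7.5 and < 3.0.
python_version_ok() {
  local major minor patch
  # Split a string such as "2.7.5" on dots.
  IFS=. read -r major minor patch <<EOF
$1
EOF
  patch=${patch%%[!0-9]*}   # drop suffixes such as "rc1" or "+"
  patch=${patch:-0}
  minor=${minor:-0}
  [ "$major" = "2" ] || return 1
  [ "$minor" -gt 7 ] && return 0
  [ "$minor" -eq 7 ] && [ "$patch" -ge 5 ]
}

# Example: Python 2 prints its version on stderr, hence the 2>&1.
#   ver=$(/usr/local/python/2.7.5/bin/python --version 2>&1 | awk '{print $2}')
#   python_version_ok "$ver" && echo "compatible"
```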

Installing the Required Python Cryptography Module

You can use Python's pip utility to install the Python Cryptography module. Use scl if Python (>= 2.7.5 <3.0) is not the default. This example upgrades pip, installs the module, and then imports it to confirm that the installation succeeded.

# scl enable python27 "pip install -U pip"  
# scl enable python27 "pip install cryptography"  
# scl enable python27 "python -c 'import cryptography; print \"ok\";'" 

You can then run the Jaguar installer.

1.6.2.1 Adding Python 2.7.5 or Greater as a Secondary Installation

Below is a procedure for adding Python 2.7.5 or greater (but less than 3.0) as a secondary installation.

Note:

If you manually install Python, first ensure that the openssl-devel package is installed:

# yum install -y openssl-devel
# pyversion=2.7.5
# cd /tmp/
# mkdir py_install
# cd py_install
# wget https://www.python.org/static/files/pubkeys.txt
# gpg --import pubkeys.txt
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz.asc
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz
# gpg --verify Python-$pyversion.tgz.asc Python-$pyversion.tgz
# tar xfzv Python-$pyversion.tgz
# cd Python-$pyversion
# ./configure --prefix=/usr/local/python/2.7.5
# make
# mkdir -p /usr/local/python/2.7.5
# make install
# export PATH=/usr/local/python/2.7.5/bin:$PATH

If you create a secondary installation of Python, it is strongly recommended that you apply Python updates regularly to include new security fixes.

Important: On Oracle Big Data Appliance, do not update the Mammoth-installed Python unless directed to do so by Oracle.

1.6.2.2 When You May Need to Use scl to Invoke the Correct Python Version

If there is more than one Python release on the cluster management server, then be sure that Python 2.7.5 or greater (but less than 3.0) is invoked for any operations associated with this release of Oracle Big Data SQL.

If the scl utility is available, you can use it to invoke Python 2.7.5 or greater explicitly. This is necessary if a different Python installation is the default. In that case, use scl or another method to invoke the correct Python version for scripts as well as for Python-based utilities such as Jaguar, the Oracle Big Data SQL installer:

[root@myclusteradminserver:BDSjaguar] # scl enable python27 "./jaguar install bds-config.json"

There is one exception to this requirement. On Oracle Big Data Appliance clusters running Oracle Linux 6 or Oracle Linux 7, it is not necessary to use scl explicitly in order to run the Jaguar installer. In this case, you can invoke Jaguar directly, as in:

[root@myclusteradminserver:BDSjaguar] # ./jaguar install bds-config.json

Jaguar itself will silently invoke scl if it is available and if scl is required to invoke a compatible Python release in this environment.

Note that this only applies to Jaguar on Big Data Appliance. To run any other Python scripts required by Oracle Big Data SQL (even on Oracle Big Data Appliance), use scl if Python 2.7.5 is not the default.

For example, to install the required Python Cryptography package, you may need to invoke scl to ensure that you are using the correct version of Python. Quote the command so that scl receives it as a single argument:

# scl enable python27 "pip install cryptography"

1.6.3 Environment Settings

The following environment settings are required prior to the installation.

  • ntp enabled
  • Minimum ratio of shmmax to shmall:

    shmmax = shmall * PAGE_SIZE

  • shmmax must be greater than physical memory.
  • swappiness set between 5 and 25.
  • All *.rp_filter instances disabled
  • Socket buffer size equal to or greater than 4194304
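
Two of these settings are easy to spot-check with sysctl. The sketch below covers only swappiness and the rp_filter instances. The function names are hypothetical, the 5 to 25 range is treated as inclusive (an assumption), and the shmmax, shmall, and socket buffer checks are left out because this guide does not name the exact kernel parameters to read.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical.

# Succeeds if the swappiness value is between 5 and 25 (treated as inclusive).
swappiness_ok() {
  [ "$1" -ge 5 ] && [ "$1" -le 25 ]
}

# Reads "name = value" lines on stdin (the format printed by sysctl) and
# succeeds only if every value is 0. Prints each setting that is not 0.
rp_filter_all_disabled() {
  local name eq value rc=0
  while read -r name eq value; do
    [ -n "$name" ] || continue
    if [ "$value" != "0" ]; then
      echo "enabled: ${name} = ${value}"
      rc=1
    fi
  done
  return "$rc"
}

# Example on a live node:
#   swappiness_ok "$(sysctl -n vm.swappiness)" || echo "adjust vm.swappiness"
#   sysctl -a 2>/dev/null | grep '\.rp_filter' | rp_filter_all_disabled
```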

1.6.4 Proxy-Related Settings

The installation process requires Internet access in order to download some packages from Cloudera or Hortonworks sites.

If a proxy is required for Internet access, then either ensure that the following are set as Linux environment variables, or enable the equivalent parameters in the Jaguar configuration file (bds-config.json):

  • http_proxy and https_proxy

  • no_proxy

    Set no_proxy to include the following: "localhost,127.0.0.1,<Comma-separated list of the hostnames in the cluster (in FQDN format).>".

On Cloudera CDH, clear any proxy settings in Cloudera Manager administration before running the installation.
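
As an illustration of the environment-variable route, the following sketch assembles a no_proxy value in the form described above and exports the three proxy variables. The proxy URL and the host names are placeholders, and the helper function name is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the proxy URL and host names are placeholders.

# Builds a no_proxy value from the cluster host names given as arguments.
build_no_proxy() {
  local value="localhost,127.0.0.1" host
  for host in "$@"; do
    value="${value},${host}"
  done
  echo "$value"
}

http_proxy="http://proxy.example.com:80"
https_proxy="http://proxy.example.com:80"
no_proxy=$(build_no_proxy node1.example.com node2.example.com)
export http_proxy https_proxy no_proxy
```

Set these in the shell session from which you run the installer so that the values are inherited by the installation process.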

See Also:

Table 2-1 describes the use of http_proxy, https_proxy, and other parameters in the installer configuration file.

1.6.5 CPU, Memory, and Networking Requirements

Oracle Big Data SQL requires the following.

Minimum CPU and Memory for Each Node

  • 8 CPU cores
  • 32 GB RAM

Note:

The RAM requirement is 64 GB per node if you intend to support connections to all versions of Oracle Database compatible with this release – 12.1, 12.2, 18c, and 19c. See the database_compatibility parameter in the Jaguar Configuration Parameter and Command Reference. If you set this parameter to "full", then 64 GB per node is the minimum requirement.
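
The thresholds above can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, and the usage comment rounds MemTotal up because the kernel reports slightly less than the installed RAM.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# Usage: node_meets_minimum <cpu_cores> <ram_gb> [full]
# Pass "full" as the third argument when database_compatibility is "full".
node_meets_minimum() {
  local cores=$1 ram_gb=$2 mode=${3:-}
  local need_ram=32
  [ "$mode" = "full" ] && need_ram=64
  [ "$cores" -ge 8 ] && [ "$ram_gb" -ge "$need_ram" ]
}

# Example on a live node. MemTotal in /proc/meminfo is reported in kB and is
# slightly below the installed RAM, so the value is rounded up (a rough guide).
#   cores=$(nproc)
#   ram_gb=$(awk '/^MemTotal:/ {printf "%d", ($2 / 1048576) + 0.999}' /proc/meminfo)
#   node_meets_minimum "$cores" "$ram_gb" && echo "meets minimum"
```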

Networking

If Hadoop traffic is over VLANs, all DataNodes must be on the same VLAN.

1.7 Prerequisites for Installation on Oracle Database Nodes

Installation prerequisites vary, depending on the type of Hadoop system and Oracle Database system where Oracle Big Data SQL will be installed.

Patch Level

See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for supported Linux distributions, Oracle Database release levels, and required patches.

Note:

Be sure that the correct Bundle Patch and any one-off patches identified in the Compatibility Matrix have been pre-applied before starting this installation.

Before you begin the installation, review the additional environmental and user access requirements described below.

Packages Required for Kerberos

If you are installing on a Kerberos-enabled Oracle Database system, these packages must be pre-installed:

  • krb5-workstation

  • krb5-libs

Packages for the “Oracle Tablespaces in HDFS” Feature

Oracle Big Data SQL provides a method to store Oracle Database tablespaces in the Hadoop HDFS file system. The following RPMs must be installed:

  • fuse

  • fuse-libs

# yum -y install fuse fuse-libs

rdma-core and ibverbs Packages

The rdma-core and ibverbs packages are required only for Exadata. If you have a problem bringing up a non-Exadata database due to a diskmon failure with error messages related to either package, you should remove the packages.

Required Environment Variables

The following are always required. Be sure that these environment variables are set correctly.

  • ORACLE_SID

  • ORACLE_HOME

Note:

GI_HOME (which was required in Oracle Big Data SQL 3.1 and earlier) is no longer required.
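
A simple guard can confirm that both variables are exported before you start the database-side installer. The sketch below is illustrative only, and the function name is hypothetical. It uses printenv, so it reports a variable that is set in the shell but not exported as missing.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# Prints the name of each listed variable that is not present (or is empty)
# in the environment; returns non-zero if any is missing.
require_env() {
  local name rc=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "not set: ${name}"
      rc=1
    fi
  done
  return "$rc"
}

# Example, run as the database owner:
#   require_env ORACLE_SID ORACLE_HOME && [ -d "$ORACLE_HOME" ]
```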

Required Credentials

  • Oracle Database owner credentials (The owner is usually the oracle Linux account.)

    Big Data SQL is installed as an add-on to Oracle Database. Tasks related directly to the database instance are performed through the database owner account (oracle or other).

  • Grid user credentials

    In some cases where Grid infrastructure is present, it must be restarted. If the system uses Grid then you should have the Grid user credentials on hand in case a restart is required.

The Linux users grid and oracle (or other database owner) must both be in the same group (usually oinstall). The database owner requires permission to read all files owned by the grid user and vice versa.

All Oracle Big Data SQL files and directories are owned by the oracle:oinstall user and group.
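
To verify the shared-group requirement, you can compare the group lists of the two accounts. This is a minimal sketch with a hypothetical function name. The account names grid and oracle in the usage comment are the usual defaults and may differ on your system.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# Takes two space-separated group lists. Prints the first group found in
# both and succeeds, or fails if the lists have no group in common.
share_group() {
  local a b
  for a in $1; do
    for b in $2; do
      if [ "$a" = "$b" ]; then
        echo "$a"
        return 0
      fi
    done
  done
  return 1
}

# Example (id -nG prints a user's group names separated by spaces):
#   share_group "$(id -nG grid)" "$(id -nG oracle)" || echo "no common group"
```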

Required Grid Infrastructure Patches

Check the Grid Infrastructure to make sure all patches required by the Oracle Big Data SQL installation have been installed. Go to the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) to find up-to-date information on patch requirements.

1.8 Downloading Oracle Big Data SQL and Query Server

You can download Oracle Big Data SQL from the Oracle Software Delivery Cloud (also known as eDelivery).

Follow these steps to download Oracle Big Data SQL and prepare for installation:

  1. Download Oracle Big Data SQL

    Always download and install the latest version of Big Data SQL. This will provide the best installation experience and offer compatibility with all supported Oracle Databases.

    There are three files to download for Oracle Big Data SQL 4.1.1:

    • The primary BDSJaguar bundle, which contains the Jaguar installer for Oracle Big Data SQL:
      • Vnnnnnn-01.zip
    • The two parts of the optional Query Server bundle. If you want to use Query Server, then download the following two parts of the Query Server bundle in addition to the primary BDSJaguar bundle:
      • Vnnnnnn-01_1of2.zip
      • Vnnnnnn-01_2of2.zip
    1. Sign in to Oracle Software Delivery Cloud.
    2. Search for Oracle Big Data SQL.

      A list of Oracle Big Data SQL versions to download appears.

    3. Click Select for DLP: Oracle Big Data SQL 4.1.1 (Oracle Big Data SQL).

      The Download Queue now shows an entry to download. The View Items icon displays the number of items.

    4. Click Continue.

      The actual version available may be greater than 4.1.1. The same bundle is compatible with all supported Hadoop clusters.

    5. Click Continue.

      The Download Queue displays showing your selected downloads.

    6. Click Continue.

      Oracle Standard Terms and Restrictions displays.

    7. Accept the license agreement.

      The list of downloadable bundles is displayed.

    8. Select all three bundles, and click Download.
  2. Prepare for installation
    1. Copy the Installer Bundle to the Hadoop node that hosts the cluster management server (CM or Ambari). On Oracle Big Data Appliance this is usually Node 3. Choose any location. If you intend to use Query Server, copy the Query Server zips to the same location.
    2. Log on as root and unzip the Installer Bundle.
      You will see that the Release 4.1.1 Installer Bundle contains the run file and a readme.
      # unzip Vnnnnnn-01.zip
      Archive:  Vnnnnnn-01.zip
        inflating: BDSJaguar-4.1.1.run
        inflating: readme
      
    3. Before executing the run file, decide if you want to keep the default extraction target for the installation and configuration files. The default is /opt/oracle. If not, then you can change it by setting the JAGUAR_ROOT environment variable.
      # export JAGUAR_ROOT=<my_directory>

      Throughout this guide, the placeholder Big Data SQL Install Directory refers to the JAGUAR_ROOT where you extracted the files.

      Important:

      This is the permanent working directory from which you configure and install Oracle Big Data SQL. You will also need the tools in this directory post installation. It is strongly recommended that you secure this directory against accidental or unauthorized modification or deletion. The primary file to protect is your installation configuration file (by default, bds-config.json). As you customize the configuration to your needs, this file becomes the record of the state of the installation. It is useful for recovery purposes and as a basis for further changes.
    4. Execute the run file.
      # ./BDSJaguar-4.1.1.run
      BDSJaguar-4.1.1.run: platform is: Linux
      BDSJaguar-4.1.1.run: Jaguar directory created successfull
      BDSJaguar-4.1.1.run: Based on features selected in config.json file, extra bundles could be required
      BDSJaguar-4.1.1.run: Please go to /opt/oracle/BDSJaguar
  3. (Optional) Include Query Server

    If you want to include Query Server in the installation, you must unzip both Query Server downloads to extract both parts of the bundle, then run join.sh to assemble into one Query Server bundle.

    Note:

    You cannot use or install Query Server separately from Oracle Big Data SQL. It can be included in the Jaguar-driven installation as described below.
    1. Unzip Vnnnnnn-01_1of2.zip to extract the first part of the bundle.
      $ unzip -j -o Vnnnnnn-01_1of2.zip
       Archive: Vnnnnnn-01_1of2.zip
       inflating: BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-00
       inflating: join.sh
       inflating: readme.1st 
    2. Unzip Vnnnnnn-01_2of2.zip to extract the second part of the bundle.
      $ unzip -j -o Vnnnnnn-01_2of2.zip 
       Archive: Vnnnnnn-01_2of2.zip
       inflating: BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-01
       inflating: join.sh
       inflating: readme.1st
    3. Run the join.sh script from either zip file to assemble the bundle.
      $ ./join.sh
       Re-assembling Big Data SQL Query Server bundle
       Detected files:
       BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-00
       BDSQLQS82d323d472f5c4666e1a7e48cd2d75b9-01
       Joining 2 files
       BigDataSQL-4.1.1-QueryServer.zip successfully created !!!
      
    4. Unzip the newly created bundle to extract the QueryServer run file.
      # unzip BDSExtra-4.1.1-QueryServer.zip
      ...
    5. Execute the BDSExtra-4.1.1-QueryServer.run script to include Query Server in the Big Data SQL installation.
      # ./BDSExtra-4.1.1-QueryServer.run

      Note:

      To include Query Server in the Big Data SQL installation, be sure to execute this extra run file before running the Jaguar installer.

1.9 Upgrading From a Prior Release of Oracle Big Data SQL

On the Oracle Database side, Oracle Big Data SQL can now be installed over a previous release with no need to remove the older software. The install script automatically detects and upgrades an older version of the software.

Upgrading the Oracle Database Side of the Installation

On the database side, you need to perform the installation only once to upgrade the database side for any clusters connected to that particular database. This is because the installations on the database side are not entirely separate. They share the same set of Oracle Big Data SQL binaries. This results in a convenience for you – if you upgrade one installation on a database instance then you have effectively upgraded the database side of all installations on that database instance.

Upgrading the Hadoop Cluster Side of the Installation

If existing Oracle Big Data SQL installations on the Hadoop side are not upgraded, these installations will continue to work with the new Oracle Big Data SQL binaries on the database side, but will not have access to the new features in this release.

1.10 Important Terms and Concepts

These are special terms and concepts in the Oracle Big Data SQL installation.

Oracle Big Data SQL Installation Directory

On both the Hadoop side and database side of the installation, the directory where you unpack the installation bundle is not a temporary directory which you can delete after running the installer. These directories are staging areas for any future changes to the configuration. You should not delete them and may want to secure them against accidental deletion.

Database Authentication Keys

Database Authentication uses a key that must be identical on both sides of the installation (the Hadoop cluster and Oracle Database). The first part of the key is created on the cluster side and stored in the .reqkey file. This file is consumed only once on the database side, to connect the first Hadoop cluster to the database. Subsequent cluster installations use the configured key and the .reqkey file is no longer required. The full key (which is completed on the database side) is stored in an .ackkey file. This key is included in the part of the ZIP file created by the database-side installation and must be copied back to the Hadoop cluster by the user.

Request Key

By default, the Database Authentication feature is enabled in the configuration. (You can disable it by setting the parameter database_auth_enabled to “false” in the configuration file.) When this setting is true, the Jaguar install and reconfigure operations can generate a request key (stored in a file with the extension .reqkey). This key is part of a unique GUID-key pair used for Database Authentication. The GUID-key pair itself is generated during the database side of the installation. The Jaguar operation creates a request key if the command line includes the --requestdb parameter along with a single database name (or a comma-separated list of names). In this example, the install operation creates three keys, one for each of three different databases:

# ./jaguar --requestdb orcl,testdb,proddb install

The operation creates the request key files in the directory <Oracle Big Data SQL install directory>/BDSJaguar/dbkeys. In this example, Jaguar install would generate these request key files:

orcl.reqkey
testdb.reqkey
proddb.reqkey
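
The naming pattern makes it straightforward to confirm that every expected key file was produced. The sketch below simply derives the file names from the same comma-separated list passed to --requestdb. The function name is hypothetical, and the directory in the usage comment assumes the default /opt/oracle extraction target.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# Prints the request key file name expected for each database in a
# comma-separated list, following the <database>.reqkey pattern.
expected_reqkeys() {
  local IFS=,
  local db
  for db in $1; do
    echo "${db}.reqkey"
  done
}

# Example (assumes the default install directory, /opt/oracle):
#   cd /opt/oracle/BDSJaguar/dbkeys &&
#     for f in $(expected_reqkeys orcl,testdb,proddb); do
#       [ -f "$f" ] || echo "missing: $f"
#     done
```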

Prior to the database side of the installation, copy the request key to the database node and into the path of the database-side installer, which generates the GUID-key pair at runtime.

Acknowledge Key

After you copy a request key into the database-side installation directory and run the database-side Oracle Big Data SQL installer, the installer generates a corresponding acknowledge key. The acknowledge key is the original request key, paired with a GUID. This key is stored in a file that is included in a ZIP archive along with other information that must be returned to the Hadoop cluster by the user.

Database Request Operation (databasereq)

The Jaguar databasereq operation is a “standalone” way to generate a request key. It lets you create one or more request keys without performing an install or reconfigure operation:

# ./jaguar --requestdb <database name list> databasereq {configuration file | null}

Database Acknowledge ZIP File

If Database Authentication or Hadoop Secure Impersonation is enabled for the configuration, then the database-side installer creates a ZIP bundle of configuration information. If Database Authentication is enabled, this bundle includes the acknowledge key file. Information required for Hadoop Secure Impersonation is also included if that option was enabled. Copy this ZIP file back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server for processing.

Database Acknowledge is a third phase of the installation and is performed only when either of the security features cited above is enabled.

Database Acknowledge Operation (databaseack)

If you have opted to enable either or both of the security features (Database Authentication or Hadoop Secure Impersonation), then after copying the Database Acknowledge ZIP file back to the Hadoop cluster, run the Jaguar Database Acknowledge operation.

The setup process for these features is a “round trip” that starts on the Hadoop cluster management server, where you set the security directives in the configuration file and run Jaguar, to the Oracle Database system where you run the database-side installation, and back to the Hadoop cluster management server where you return a copy of the ZIP file generated by the database-side installation. The last step is when you run databaseack, the Database Acknowledge operation described in the outline below. Database Acknowledge completes the setup of these security features.

Default Cluster

The default cluster is the first Oracle Big Data SQL connection installed on an Oracle Database. In this context, the term default cluster refers to the installation directory on the database node where the connection to the Hadoop cluster is established. It does not literally refer to the Hadoop cluster itself. Each connection between a Hadoop cluster and a database has its own installation directory on the database node.

An important aspect of the default cluster is that the setting for Hadoop Secure Impersonation in the default cluster determines that setting for all other cluster connections to a given database. If you run a Jaguar reconfigure operation some time after installation and use it to turn Hadoop Secure Impersonation in the default cluster on or off, this change is effective for all clusters associated with the database.

If you perform installations to add additional clusters, the first cluster remains the default. If the default cluster is uninstalled, then the next one (in chronological order of installation) becomes the default.

1.11 Installation Overview

The Oracle Big Data SQL software must be installed on all Hadoop cluster DataNodes and all Oracle Database compute nodes.

Important: About Service Restarts

On the Hadoop-side installation, the following restarts may occur.

  • Cloudera Configuration Manager (or Ambari) may be restarted. This in itself does not interrupt any services.

  • Hive, YARN, and any other services that have a dependency on Hive or YARN (such as Impala) are restarted.

    The Hive libraries parameter is updated in order to include Oracle Big Data SQL JARs. On Cloudera installations, if the YARN Resource Manager is enabled, then it is restarted in order to set the cgroup memory limit for Oracle Big Data SQL and the other Hadoop services. On Oracle Big Data Appliance, the YARN Resource Manager is always enabled and therefore always restarted.

On the Oracle Database server(s), the installation may require a database and/or Oracle Grid infrastructure restart in environments where updates are required to Oracle Big Data SQL cell settings on the Grid nodes. See Potential Requirement to Restart Grid Infrastructure for details.

If a Previous Version of Oracle Big Data SQL is Already Installed

On commodity Hadoop systems (those other than Oracle Big Data Appliance) the installer automatically uninstalls any previous release from the Hadoop cluster.

You can install Oracle Big Data SQL on all supported Oracle Database systems without uninstalling a previous version.

Before installing this Oracle Big Data SQL release on Oracle Big Data Appliance, you must use bdacli to manually uninstall the older version if it had been enabled via bdacli or Mammoth. If you are not sure, try bdacli disable big_data_sql. If the disable command fails, then the installation was likely done with the setup-bds installer. In that case, you can install the new version of Oracle Big Data SQL without disabling the old version.
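The upgrade rules for the Hadoop side can be summarized as a small decision function. This is a sketch for planning purposes only. The function name, the platform labels ("commodity", "bda"), and the returned action labels are hypothetical and are not part of any Oracle tool.

```python
from typing import Optional


def pre_install_action(hadoop_platform: str, enabled_via: Optional[str] = None) -> str:
    """Return the step needed before installing over a previous release.

    hadoop_platform: "commodity" or "bda" (Oracle Big Data Appliance).
    enabled_via: how the previous release was enabled on BDA:
                 "bdacli", "mammoth", "setup-bds", or None if unknown/absent.
    """
    if hadoop_platform == "commodity":
        # The installer removes any previous release from the cluster itself.
        return "installer_auto_uninstalls"
    if hadoop_platform == "bda":
        if enabled_via in ("bdacli", "mammoth"):
            # Corresponds to running: bdacli disable big_data_sql
            return "bdacli_disable_first"
        # Installed with setup-bds (or not installed): no manual disable needed.
        return "install_directly"
    raise ValueError(f"unknown platform: {hadoop_platform!r}")
```

The database side is not covered here because, as stated above, it never requires uninstalling a previous version.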

How Long Does It Take?

The table below estimates the time required for each phase of the installation. Actual times will vary.

Table 1-3 Installation Time Estimates

Installation on the Hadoop Cluster

Eight to 28 minutes.

The Hadoop-side installation may take eight minutes if all resources are locally available. An additional 20 minutes or more may be required if resources must be downloaded from the Internet.

Installation on Oracle Database Nodes

The average installation time for the database side can be estimated as follows:

  • 15 minutes for a single node database if a restart is not required. If a restart is required, the time will vary, depending on the size of the database.

  • On a RAC database, multiply the factors above by the number of nodes.

  • If an Oracle Grid restart is required, factor that in as well.

The installation process on the Hadoop side includes installation on the Hadoop cluster as well as generation of the bundle for the second phase of the installation on the Oracle Database side. The database bundle includes Hadoop and Hive clients and other software. The Hadoop and Hive client software enables Oracle Database to communicate with HDFS and the Hive Metastore. The client software is specific to the version of the Hadoop distribution (that is, Cloudera or Hortonworks). As explained later in this guide, you can download these packages prior to the installation, set up a URL or repository within your network, and make that target available to the installation script. If instead you let the installer download them from the Internet, the extra time for the installation depends upon your Internet download speed.
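As a rough planning aid, the estimates in Table 1-3 can be turned into simple arithmetic. The sketch below uses only the figures quoted above (8 minutes, an additional 20 minutes, and 15 minutes per database node). The function names are hypothetical, and restart time is deliberately excluded because the guide says it varies.

```python
def hadoop_side_minutes(resources_local: bool) -> int:
    """Nominal Hadoop-side estimate in minutes.

    The download case is a lower bound: the guide says "20 minutes or more".
    """
    base = 8
    download_overhead = 0 if resources_local else 20
    return base + download_overhead


def database_side_minutes(rac_nodes: int = 1) -> int:
    """Nominal database-side estimate in minutes, with no database or Grid restart."""
    if rac_nodes < 1:
        raise ValueError("rac_nodes must be at least 1")
    per_node = 15
    return per_node * rac_nodes


# Example: packages fetched from the Internet, four-node RAC database.
total = hadoop_side_minutes(resources_local=False) + database_side_minutes(rac_nodes=4)
```

For this example the nominal total is 28 + 60 = 88 minutes, before any restart time is added.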

Pre-installation Steps

  • Check to be sure that the Hadoop cluster and the Oracle Database system both meet all of the prerequisites for installation. On the database side, this includes confirming that all of the required patches are installed. Check against these sources:
    • Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support)
    • Section 2.1 in this guide, which identifies the prerequisites for installing on the Hadoop cluster. Also see Section 3.1, which describes the prerequisites for installing the Oracle Database system component of Oracle Big Data SQL.

    Oracle Big Data Appliance already meets all prerequisites.
  • Have these login credentials available:

    • root credentials for both the Hadoop cluster and all Oracle Grid nodes.

      On the grid nodes you have the option of using passwordless SSH with the root user instead.

    • oracle Linux user (or other, if the database owner is not oracle)

    • The Oracle Grid user (if this is not the same as the database owner).

    • The Hadoop configuration management service (CM or Ambari) admin password.

  • On the cluster management server (where CM or Ambari is running), download the Oracle Big Data SQL installation bundle and unzip it into a permanent location of your choice. (See Downloading Oracle Big Data SQL and Query Server.)

Outline of the Installation Steps

This is an overview to familiarize you with the process.
  • Phase 1: (Required) Perform the Hadoop cluster-side installation.
  • Phase 2: (Required) Perform the database-side installation.
  • Phase 3: (Optional) Database Acknowledge phase - only needed if Database Authentication or Hadoop Secure Impersonation is enabled.
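The rule for which phases apply can be written as a one-line condition. The following sketch is illustrative only. The function name and phase labels are hypothetical, and the two flags stand for whether Database Authentication and Hadoop Secure Impersonation are enabled in your configuration.

```python
from typing import List


def required_phases(database_auth: bool, secure_impersonation: bool) -> List[str]:
    """List the installation phases that apply for the given security settings."""
    phases = ["hadoop-side install", "database-side install"]
    # Phase 3 is needed if either security feature is enabled.
    if database_auth or secure_impersonation:
        phases.append("database acknowledge")
    return phases
```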

For complete instructions, see Installing or Upgrading the Hadoop Side of Oracle Big Data SQL and Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL.

  1. Start the Hadoop-Side Installation

    Review the installation parameter options described in Chapter 2. The installation on the Hadoop side is where you make all of the decisions about how to configure Oracle Big Data SQL, including those that affect the Oracle Database side of the installation.

  2. Edit the bds-config.json file provided with the bundle in order to configure the Jaguar installer as appropriate for your environment. You could also create your own configuration file using the same parameters.

  3. Run the installer to perform the Hadoop-side installation as described in Installing or Upgrading the Hadoop Side of Oracle Big Data SQL.

    If the Database Authentication feature is enabled, then Jaguar must also output a “request key” (.reqkey) file for each database that will connect to the Hadoop cluster. You generate this file by including the --requestdb parameter in the Jaguar install command (the recommended way). You can also generate the file later with other Jaguar operations that support the --requestdb parameter.

    This file contains one half of a GUID-key pair that is used in Database Authentication. The steps to create and install the key are explained in more detail in the installation steps.

  4. Copy the database-side installation bundle to any temporary directory on each Oracle Database compute node.

  5. If a request key file was generated, copy over that file to the same directory.

  6. Start the Database-Side Installation

    Log on to the database server as the database owner. Unzip the bundle and execute the run file it contains. The run file does not install the software. It sets up an installation directory under $ORACLE_HOME.

  7. As the database owner, perform the Oracle Database server-side installation. (See Installing or Upgrading the Oracle Database Side of Oracle Big Data SQL.)

    In this phase of the installation, you copy the database-side installation bundle to a temporary location on each compute node. If a .reqkey file was generated for the database, then copy the file into the installation directory before proceeding. Then run the bds-database-install.sh installation program.

    The database-side installer does the following:

    • Copies the Oracle Big Data SQL binaries to the database node.

    • Creates all database metadata and MTA extprocs (external processes) required to access the Hadoop cluster and configures the communication settings.

    Important:

    Be sure to install the bundle on each database compute node. The Hadoop-side installation automatically propagates the software to each node of the Hadoop cluster. However, the database-side installation does not work this way. You must copy the software to each database compute node and install it directly.

    In Oracle Grid environments, if cell settings need to be updated, then a Grid restart may be needed. Be sure that you know the Grid password. If a Grid restart is required, then you will need the Grid credentials to complete the installation.

  8. If Applicable, Perform the “Database Acknowledge” Step

    If Database Authentication or Hadoop Secure Impersonation was enabled, the database-side installation generates a ZIP file that you must copy back to the Hadoop cluster management server. The file is generated in the installation directory under $ORACLE_HOME and has the following filename format.

    <Hadoop cluster name>-<Number nodes in the cluster>-<FQDN of the cluster management server node>-<FQDN of this database node>.zip
    Copy this file back to /opt/oracle/DM/databases/conf on the Hadoop cluster management server and then as root run the Database Acknowledge command from the BDSJaguar directory:
    # cd <Big Data SQL install directory>/BDSJaguar
    # ./jaguar databaseack <bds-config.json>
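To help locate the right file before copying it back, the filename format above can be assembled from its four parts. This is a minimal sketch. The function names and the example cluster name, node count, and host names are hypothetical. Only the filename pattern and the destination directory come from this guide.

```python
import posixpath

# Destination directory on the Hadoop cluster management server (from this guide).
ACK_CONF_DIR = "/opt/oracle/DM/databases/conf"


def ack_zip_name(cluster_name: str, node_count: int, mgmt_fqdn: str, db_fqdn: str) -> str:
    """Build the expected ZIP filename generated by the database-side installation."""
    return f"{cluster_name}-{node_count}-{mgmt_fqdn}-{db_fqdn}.zip"


def ack_zip_destination(zip_name: str) -> str:
    """Return the path where the ZIP file should be placed before running databaseack."""
    return posixpath.join(ACK_CONF_DIR, zip_name)


# Hypothetical values for illustration only.
name = ack_zip_name("mycluster", 6, "mgmt1.example.com", "db1.example.com")
dest = ack_zip_destination(name)
```

With these example values, name is mycluster-6-mgmt1.example.com-db1.example.com.zip. On a RAC database, each compute node produces its own file because the last component is the FQDN of that database node.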

Workflow Diagrams

Complete Installation Workflow

The figure below illustrates the complete set of installation steps as described in this overview.

Note:

Before you start the steps shown in the workflow, be sure that both systems meet the installation prerequisites.

Figure 1-1 Installation Workflow

Note:

The --reqkey parameter in this diagram actually requires the full path to the file, as in ./bds-database-install.sh --reqkey=/opt/tmp/orcl.reqkey.

Key Generation and Installation

The figure below focuses on the three steps required to create and install the GUID-key pair used in Database Authentication. The braces around parameters of the Jaguar command indicate that one of the operations in the list is required. Each of these operations supports use of the --requestdb parameter. Note that although updatenodes is included in this list, updatenodes is deprecated in this release. You should use reconfigure instead.

Figure 1-2 Generating and Installing the GUID-Key Pair for Database Authentication


1.12 Using the Installation Quick Reference

Once you are familiar with the functionality of the Jaguar utility on the Hadoop side and bds-database-install.sh on the Oracle Database side, you may find it useful to work from the Installation Quick Reference for subsequent installations. This reference provides an abbreviated description of the installation steps. It does not fully explain each step, so users should already have a working knowledge of the process. Links to relevant details in this and other documents are included.