2 Installing the Hadoop Side of Oracle Big Data SQL

Oracle Big Data SQL is deployed using the services provided by the cluster management server. The installer program uses the management server API to register the service and start the deployment task. From there, the management server controls the process, deploying the software to the nodes of the cluster and installing it.

After installing Big Data SQL on the cluster management server, use the tools provided in the bundle to generate an installation package for the database server side.

Important Notes for Oracle Big Data Appliance Installations Prior to Release 4.8:

  • Oracle Big Data SQL 3.1 is decoupled from the bdacli utility on Oracle Big Data Appliance systems prior to Release 4.8. On Oracle Big Data Appliance 4.7 and earlier, you cannot use the following bdacli commands with Release 3.1:

    bdacli {enable|disable} big_data_sql
    bdacli getinfo cluster_big_data_sql_enabled
    bdacli {start | stop | restart | status}  {big_data_sql_cluster | big_data_sql_server node_name}
    
  • The bdacli commands listed above do work for previous versions of Oracle Big Data SQL installed on the Oracle Big Data Appliance releases that they support.

On all supported Hadoop systems (listed in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support), you can use the setup-bds script as described in this guide to install, extend, reconfigure, and uninstall Oracle Big Data SQL 3.1 on a Hadoop cluster. You can also use the cluster management server interface to start or stop Oracle Big Data SQL processes.

On Oracle Big Data Appliance 4.8 and later, Oracle Big Data SQL integration with bdacli has been restored, so you have the additional option of using bdacli to install and uninstall (enable and disable) Oracle Big Data SQL. You also have access to all other bdacli commands for administering Oracle Big Data SQL.

See Also:

The Oracle Big Data Appliance Owner's Guide.

2.1 Installation Prerequisites

The following active services, installed packages, and available system tools are prerequisites to the Oracle Big Data SQL installation.

Platform requirements, such as supported Linux distributions and versions, as well as supported Oracle Database releases and required patches, are not listed here. See the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1) in My Oracle Support for this information.

Services Running

These Apache Hadoop services must be running on the cluster.

  • HDFS

  • YARN

  • Hive

You do not need to take any extra steps to ensure that the correct HDFS and Hive client URLs are specified in the database-side installation bundle.

Important:

The Apache Hadoop services listed above must be installed as parcels on Cloudera CDH and as stacks on Hortonworks HDP. Installation of these services via RPM is not supported in either case.

About HBase:

HBase is not a prerequisite for the installation of Oracle Big Data SQL on the Hadoop cluster. However, the installation on the Oracle Database nodes does require an HBase client. If HBase is present on the cluster during the Hadoop-side installation, then no action on your part is required. In this case, when you create the package that is installed on the Oracle Database nodes, the installation package builder automatically includes the URL to the download site of the compatible HBase client JAR.

If HBase is not installed on the Hadoop cluster, then when you create the database-side installation package, use the --hbase-client-ws parameter to add the URL to the installation package. The URL should point to the download site for an HBase client that is compatible with the CDH or HDP installation you are using. When you install the database side of Oracle Big Data SQL, the installer will download the HBase client from the URL provided.
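If you are not sure whether an HBase service is present on a CDH cluster, one way to check is to query the Cloudera Manager REST API, as in this sketch (the host, port, cluster name, and API version are placeholders; curl prompts for the administrator password rather than taking it on the command line):

# curl -u admin 'http://cmhost.example.com:7180/api/v15/clusters/cluster1/services' | grep -i hbase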

Packages

The following packages must be pre-installed on all Hadoop cluster nodes before installing Oracle Big Data SQL.

  • Oracle JDK version 1.7 or later

  • dmidecode

  • net-snmp, net-snmp-utils

  • perl

    perl-XML-LibXML 1.70 or higher, for example, perl-XML-LibXML-1.70-5.el6.x86_64.rpm

    perl-libwww-perl, perl-libxml-perl, perl-Time-HiRes, perl-libs, perl-XML-SAX, perl-Env

The Java JDK is available for download on the Oracle Technology Network.

The yum utility is the recommended method for installing these packages:

yum -y install dmidecode
yum -y install net-snmp net-snmp-utils
yum -y install perl perl-libs
yum -y install perl-Time-HiRes perl-libwww-perl
yum -y install perl-libxml-perl perl-XML-LibXML perl-XML-SAX
yum -y install perl-Env
yum -y localinstall <JDK RPM that you downloaded>
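To confirm that the packages are present on a node, you can query the RPM database, as in this quick check (exact package versions vary by distribution):

# rpm -q dmidecode net-snmp net-snmp-utils perl perl-libs perl-Time-HiRes perl-libwww-perl perl-libxml-perl perl-XML-LibXML perl-XML-SAX perl-Env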

System Tools

  • curl

  • gcc

  • libaio

  • rpm

  • scp

  • tar

  • unzip

  • wget

  • yum

  • zip

The libaio libraries (and gcc, if not already present) must be installed on each Hadoop cluster node:
yum install -y libaio gcc
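A minimal sketch for verifying that the required system tools are available on a node's PATH (run it on each Hadoop cluster node):

for tool in curl gcc rpm scp tar unzip wget yum zip; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done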

Environment Settings

The following environment settings are required prior to the installation.

  • NTP enabled

  • Ensure that /usr/java/default exists and is linked to the appropriate Java version (if $JAVA_HOME does not exist).

  • The path to the Java binaries must exist in /usr/java/latest.

  • The installation process requires Internet access in order to download some packages from Cloudera or Hortonworks sites. If a proxy is needed for this access, ensure that the following Linux environment variables are properly set:

    • http_proxy and https_proxy

    • no_proxy

      Set no_proxy to include the following: "localhost,127.0.0.1,<comma-separated list of the cluster hostnames in FQDN format>". An example is shown after this list.

  • On Cloudera CDH, clear any proxy settings in Cloudera Manager administration before running the installation. You can restore them after running the script that creates the database-side installation bundle (bds-database-create-bundle.sh).
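For example, a minimal sketch of the proxy settings (the proxy address and hostnames are placeholders; include every cluster node, in FQDN format, in no_proxy):

# export http_proxy="http://proxy.example.com:80"
# export https_proxy="http://proxy.example.com:80"
# export no_proxy="localhost,127.0.0.1,node01.example.com,node02.example.com,node03.example.com"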

Python 2.7 (for the Oracle Big Data SQL Installer)

The Oracle Big Data SQL installer requires Python 2.7 locally on the node where you run the installer. This should be the same node where the cluster management service (CM or Ambari) is running.

If any version of Python 2.7.x is already installed, you can use it to run the Oracle Big Data SQL installer.

If an earlier version than Python 2.7.0 is already installed on the cluster management server and you need to avoid overwriting this existing installation, you can add Python 2.7.x as a secondary installation.

Restriction:

On Oracle Big Data Appliance, do not overwrite or update the pre-installed Python release. This restriction may also apply to other supported Hadoop platforms. Consult the documentation for the CDH or HDP platform you are using.

On Oracle Linux 5, add Python 2.7 as a secondary installation. On Oracle Linux 6, both Python 2.6 and 2.7 are pre-installed, and you should use the provided version of Python 2.7 for the installer. Check whether the default interpreter is Python 2.6 or 2.7. To run the Oracle Big Data SQL installer, you may need to invoke Python 2.7 explicitly. On Oracle Big Data Appliance, SCL is installed, so you can use it to enable version 2.7 for the shell, as in this example:

# scl enable python27 "./setup-bds install bds-config.json"

Below is a procedure for adding Python 2.7.5 as a secondary installation.

Tip:

If you manually install Python, first ensure that the openssl-devel package is installed:

# yum install -y openssl-devel

If you create a secondary installation of Python, it is strongly recommended that you apply Python updates regularly to include new security fixes. Do not update the Mammoth-installed Python unless directed to do so by Oracle.

# pyversion=2.7.5
# cd /tmp/
# mkdir py_install
# cd py_install
# wget https://www.python.org/static/files/pubkeys.txt
# gpg --import pubkeys.txt
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz.asc
# wget https://www.python.org/ftp/python/$pyversion/Python-$pyversion.tgz
# gpg --verify Python-$pyversion.tgz.asc Python-$pyversion.tgz
# tar xfzv Python-$pyversion.tgz
# cd Python-$pyversion
# ./configure --prefix=/usr/local/python/2.7
# make
# mkdir -p /usr/local/python/2.7
# make install
# export PATH=/usr/local/python/2.7/bin:$PATH
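To verify the secondary installation without disturbing the system default interpreter, check the version directly from the new path (the system python should still report its original version):

# /usr/local/python/2.7/bin/python -V
Python 2.7.5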

If Oracle Big Data SQL is Already Installed

If Oracle Big Data SQL is already installed, please read Upgrading From a Prior Release of Oracle Big Data SQL before proceeding.

Important:

If Oracle Big Data SQL 3.0.1 or earlier is already enabled on the Hadoop cluster, it must be disabled or removed before installing Release 3.1.

For Release 3.0.1 or earlier on Oracle Big Data Appliance, run bdacli disable big_data_sql. This will disable Oracle Big Data SQL on all nodes of the cluster.

The uninstall on other Hadoop platforms (HDP or non-Oracle CDH-based systems) differs for Oracle Big Data SQL 3.0.1 and 3.0:

  • For Release 3.0.1, run setup-bds from the original installation with the --uninstall parameter, as in ./setup-bds --uninstall bds-config.json.

  • For Release 3.0 there is no programmatic uninstall. You can find the manual procedure for uninstalling Release 3.0 in the Oracle Big Data SQL 3.0 User's Guide.

    Pre-3.0 releases of Oracle Big Data SQL were not supported on systems other than Oracle Big Data Appliance.

2.1.1 Checking for Prerequisites on the Hadoop DataNodes

You can check the DataNodes of the cluster for Oracle Big Data SQL installation prerequisites as follows.

As root, run the following commands on each DataNode. The yum commands install any prerequisite packages that are missing, and the final command verifies that the Java installation directory is present:
# yum -y install dmidecode
# yum -y install net-snmp net-snmp-utils
# yum -y install perl perl-libs
# yum -y install perl-Time-HiRes perl-libwww-perl
# yum -y install perl-libxml-perl perl-XML-LibXML perl-XML-SAX
# ls -l /usr/java
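If you prefer to drive these checks from a single machine, the following sketch runs them over ssh (the hostnames are placeholders; it assumes root ssh access to each DataNode):

for node in node01.example.com node02.example.com node03.example.com; do
  ssh root@"$node" 'yum -y install dmidecode net-snmp net-snmp-utils \
    perl perl-libs perl-Time-HiRes perl-libwww-perl \
    perl-libxml-perl perl-XML-LibXML perl-XML-SAX; ls -l /usr/java'
done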

2.2 Installing on the Hadoop Cluster Management Server

The first step of the Oracle Big Data SQL installation is to run the installer on the Hadoop cluster management server (where Cloudera Manager runs on a CDH system or Ambari runs on an HDP system). As a post-installation task on the management server, you then run the script that prepares the installation bundle for the database server.

On the Hadoop side, you manually install the software on the cluster management server only. The installer uses Cloudera Manager or Ambari to analyze the cluster configuration and automatically deploys and installs Oracle Big Data SQL on all nodes where it is required.

Note:

About Installer Security
  • Passwords for Ambari and Cloudera Manager are not passed in on the command line and are not saved in any persistent files (including log or trace files) during the installation or after the installation is complete.

  • No temporary or persistent world-writable files are created.

  • No setuid or setgid files are used.

  • The installer works with hardened Oracle Database environments as well as hardened CDH and HDP clusters as described in the Cloudera CDH and Hortonworks HDP security documentation.

This is a summary of the three tasks to perform on the cluster management server. Details follow this summary.

Temporary Workaround may be Required:

The setup-bds installation script sets some Hive auxiliary parameters, which may overwrite existing custom settings. For example, to enable Hive operation logging for the installation, the script sets hive.server2.logging.operation.enabled=true. After setup-bds has finished (and before you run the next script in the process, bds-database-create-bundle.sh), check your Hive auxiliary parameter settings and, if needed, restore any that may have been overwritten. This workaround will be unnecessary in a subsequent release of Oracle Big Data SQL.

In an Oracle Big Data SQL uninstall (which also uses setup-bds), the uninstall completely removes the hive.server2.logging.operation.enabled parameter, which effectively sets it to true, the default value.

  • Extract the files from the BigDataSQL-<Hadoop_distribution>-<version>.zip archive. Then, configure bds-config.json and run setup-bds with the install argument. This installs Oracle Big Data SQL on the cluster management server as well as on all Hadoop DataNodes in the cluster.

  • Run the database bundle creation script, bds-database-create-bundle.sh. This generates the database bundle file that you will use to install Oracle Big Data SQL on the Oracle Database server.

  • Check the parameters in the database bundle file and adjust as needed.

After you have checked and (if necessary) edited the database bundle file, copy it over to the Oracle Database server and run it as described in Installing on the Oracle Database Server.

Install Oracle Big Data SQL on the Cluster Management Server

Important: Patch 25796576 Required For Some Systems:

Install this patch if your system meets the following criteria:

  • For systems other than Oracle Big Data Appliance: using CDH, and running on Oracle Linux 6, Oracle Linux 7, Red Hat 6 or Red Hat 7.

  • For Oracle Big Data Appliance: when Oracle Big Data SQL is installed with the installer described in this document (in other words, when it is not installed by Mammoth or the bdacli utility).

If your system meets these criteria, download Patch 25796576 to the same node and same directory where you download the zip file in the steps below. Install the patch after unpacking the zip file.

For instructions on installing the patch and more detail on when the patch is required, see the following document in My Oracle Support: One-Off Patch 25796576 On An Oracle Big Data Appliance CDH Cluster OL6 with Big Data SQL 3.1 (Doc ID 2269180.1). This does not apply to Oracle Linux 5.

To install Big Data SQL on the cluster management server:

  1. Copy the appropriate zip file (BigDataSQL-<Hadoop_distribution>-<version>.zip) to a temporary location on the cluster management server (the node where Cloudera Manager or Ambari is running).

  2. Unzip the file.

    1. For CDH systems, apply Patch 25796576 at this point and then proceed with the Oracle Big Data SQL installation.

  3. Change directories to BDSSetup.

  4. Edit the configuration file.

    Table 2-1 below describes the use of each configuration parameter.

    • For CDH, edit bds-config.json, as shown in this example. Any unused port will work as the web server port.

      {
      "cluster": {
         "name": "cluster1",
         "display_name": "Cluster 1"
      },
      "database":{
         "ip": "10.11.12.13/14"
      },
      "memory": {
         "min_hard_limit": 8192
      },
      "webserver": {
         "port": 80
      }
      }
      
    • For HDP, edit bds-config.json as in this example. Notice that the HDP configuration file does not include the display_name or min_hard_limit parameters.

      {
      "cluster": {
         "name": "cluster1"
      },
      "database":{
         "ip": "10.11.12.13/14"
      },
      "webserver": {
         "port": 80
      }
      }
      

    The database:{ip} value must be the correct network interface address for the database node where you will perform the installation. You can confirm this by running /sbin/ip -o -f inet addr show on the database node.

    Note:

    The next step requires the cluster administrator user ID and password.
  5. In the BDSSetup directory, become root and run setup-bds. Pass it the install parameter and the configuration file name (bds-config.json) as arguments. Note that the cluster management service is restarted in this process.

    [root@myclusteradminserver:BDSSetup] # ./setup-bds install bds-config.json
    
    The script prompts for the cluster management service administrator credentials and then installs Oracle Big Data SQL on the management server and the cluster nodes. The script output terminates with the following message if the installation completed without error.
    BigDataSQL: INSTALL workflow completed.
    

    See Also:

    This is a condensed example. An example of the complete standard output from a successful installation is provided in Oracle Big Data SQL Install/Uninstall/Reconfigure Examples.

Parameters in bds-config.json

The table below explains edits you need to make to bds-config.json for CDH and HDP platforms.

Table 2-1 Configuration Parameters in bds-config.json

cluster:{name}
   Description: The name of the cluster. See the description of display_name below for behavior that applies to both of these parameters.
   Applies to: CDH, HDP
   Required/Optional: Required for CDH. Optional for HDP.

cluster:{display_name}
   Description: An optional identifier for locating the target cluster in Cloudera Manager. You can use it as an alternative to the required name parameter. Oracle Big Data SQL attempts to use it as a fallback if cluster:{name} cannot be validated.
   Applies to: CDH only
   Required/Optional: Optional

database:{ip}
   Description: The IP address of the Oracle Database server that will make connection requests. This must be configured on one interface on the database node. The address must include the prefix length (as in 100.112.10.36/24). Although only one IP address is specified in the configuration file, it is possible to install the database-side software on multiple database servers (as in a RAC environment) by using a command line parameter to override ip at installation time. (See the description of --ip-cell in Table 3-1.)
   Applies to: CDH, HDP
   Required/Optional: Required

memory:{min_hard_limit}
   Description: The minimum amount of memory required for Oracle Big Data SQL.
   Applies to: CDH only
   Required/Optional: Optional

webserver:{port}
   Description: Port for the temporary repository used for deploying tasks and gathering responses from the DataNodes during installation. This can be any port that does not conflict with current cluster operation.
   Applies to: CDH, HDP
   Required/Optional: Required

api:{port}
   Description: Port for the Cloudera Manager or Ambari REST API in cases where this is different from the default port.
   Applies to: CDH, HDP
   Required/Optional: Optional

Operations Performed by setup-bds

The table below lists the full set of operations performed by setup-bds.

The syntax for all options is:

# ./setup-bds <option> bds-config.json

For example:

# ./setup-bds install bds-config.json

Table 2-2 Command Line Options for setup-bds

install
   Install the Oracle Big Data SQL software on the cluster management server.

extend
   Extend Oracle Big Data SQL to any new DataNodes and update the cells inventory if the cluster has grown since the last Oracle Big Data SQL installation.

remove
   Remove Oracle Big Data SQL components from any nodes where the DataNode service no longer exists. This must be done if a DataNode service is moved or removed. A possible scenario where this would be necessary is if a DataNode service has been moved to another node for better cluster load balancing.

reconfigure
   Modify the current installation by applying changes you have made to the configuration in bds-config.json.

   Note that if you run setup-bds reconfigure bds-config.json to reconfigure the Hadoop side of Oracle Big Data SQL, a corresponding reconfiguration is required on the Oracle Database side. The two sides cannot communicate if the configurations do not match. In this case you must also regenerate the database-side bundle files to incorporate the changes, and then redeploy the bundle on all database servers where it was previously installed.

   For reconfigurations, a lightweight database bundle is provided so that the changes can be deployed relatively quickly. This bundle does not need, and does not include, the tarballs that are required by the initial installation.

uninstall
   Uninstall Oracle Big Data SQL from the Hadoop cluster management server.
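All of these options use the same invocation pattern shown above. For example, if new DataNodes have been added to the cluster since the initial installation, you run the same script and configuration file with the extend option:

# ./setup-bds extend bds-config.json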

2.3 Creating the Database-Side Installation Bundle

After installing Oracle Big Data SQL on the cluster management server, run the script BDSSetup/db/bds-database-create-bundle.sh. This script creates the corresponding Oracle Big Data SQL installation bundle for any Oracle Database servers that will query data on the Hadoop system.

In addition to running bds-database-create-bundle at installation time, you also need to run it when you have made configuration changes to the cluster management server that must be communicated to the database server, such as:

  • A change in the cluster configuration, such as a switch from unsecured to secured HTTP.

  • Migration of the Hive Metastore from one node to another.

  • Change of the CM or Ambari port on the cluster management server.

  • Change of IP address used for installation on the database side.

External Resources Required for the Database-Side Bundle

bds-database-create-bundle attempts to download the following external resources.

  • Hadoop and Hive client tarballs from the Cloudera or Hortonworks repository website.

  • Configuration files for Yarn and Hive from the cluster management server, via Cloudera Manager (for the CDH versions) or Ambari (for the HDP versions).

  • For HDP only, HDFS and MapReduce configuration files from Ambari.

If some of these resources are not accessible from the management server, you can add them manually. You can also use the command line switches described in Table 2-3 to manually turn off selected resource downloads so that these resources are not added to the bundle.

Running bds-database-create-bundle

Run this script as root. Note that you are prompted for the cluster management service administrator credentials.

  1. Change directories to BDSSetup/db.

  2. Run the BDS database bundle creation script. See Table 2-3 below for optional parameters that you can pass to the script in order to override any of the default settings.

    [root@myclusteradminserver: db] # ./bds-database-create-bundle.sh <optional parameters> 
    

    This message is returned if the operation is successful:

          bds-database-create-bundle: database bundle creation script completed all steps
    

The bds-database-create-bundle script generates two different database bundles in the BDSSetup directory (not in the BDSSetup/db directory):

  • bds-database-install.zip

    Deploy this bundle to the database servers for the initial installation of Oracle Big Data SQL (or if a full re-installation is required for other reasons). It contains all of the files needed to install the software, including the resources that were downloaded from Cloudera or Hortonworks (tarballs and configuration files).

  • bds-database-install-config.zip

    Deploy this bundle to the database servers instead of bds-database-install.zip when you change the existing Oracle Big Data SQL configuration on the cluster management server. This package makes the corresponding changes to the database-side configuration so that the two sides (Hadoop and Oracle Database) are aligned. A reconfiguration of Oracle Big Data SQL does not require the external resources needed in the full installation (such as the client tarballs). Therefore this bundle is smaller and can be deployed faster than the full installation bundle.
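After bds-database-create-bundle.sh completes, a quick way to confirm that both bundles were generated is to list them from the BDSSetup/db directory (they are written one level up, in BDSSetup):

# ls ../bds-database-install*.zip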

The database bundle file includes a number of parameters. When you run bds-database-create-bundle.sh, you can use the switches in the table below to override any of these parameters as necessary. Any URLs specified must be accessible from the cluster management server at the time you run bds-database-create-bundle.sh.

Table 2-3 Command Line Switches for bds-database-create-bundle.sh

Parameter Value
--hadoop-client-ws Specifies a URL for the Hadoop client tarball download.
--no-hadoop-client-ws Exclude this download.
--hive-client-ws Specifies a URL for the Hive client tarball download.
--no-hive-client-ws Exclude this download.
--yarn-conf-ws Specifies a URL for the YARN configuration zip file download.
--no-yarn-conf-ws Exclude this download.
--hive-conf-ws Specifies a URL for the Hive configuration zip file download.
--no-hive-conf-ws Exclude this download.
--ignore-missing-files Create the bundle file even if some files are missing.
--jdk-tar-path Override the default JDK path. Do not specify a relative path; use --jdk-tar-path=<JDK tarfile absolute path>.
--clean-previous Deletes previous bundle files and directories from bds-database-install/. If the cluster settings on the cluster management server have changed (for example, because of an extension, service node migration, or adding/removing security), then it is necessary to redo the installation on the database server. As part of this re-installation, you must include --clean-previous to purge the cluster information left over from the previous installation on the database server side.
--script-only This is useful for re-installations on the database side when there are no cluster configuration changes to communicate to the database server and where there is no need to refresh files (such as client tarballs) on the database side. With this switch, bds-database-create-bundle.sh generates a zip file that contains only the database installation script and does not bundle in other components, such as the tarballs. If these already exist on the database server, you can use --script-only to bypass the downloading and packaging of these large files. Do not include --clean-previous in this case.
--hbase-client-ws This parameter is required only if HBase is not installed on the Hadoop cluster. It specifies the URL where the HBase client tarball can be downloaded from the Cloudera or Hortonworks website. The URL should point to the specific version of HBase that is supported by the CDH or HDP installation you are using.

If HBase is not installed, include this parameter when you run bds-database-create-bundle.sh:

# ./bds-database-create-bundle.sh --hbase-client-ws <URL>

If HBase is not installed and you do not supply this parameter, then you are prompted for the URL.

--hdfs-conf-ws Specifies a URL for the HDFS configuration zip file download (HDP only).
--no-hdfs-conf-ws Exclude this download (HDP only).
--mapreduce-conf-ws Specifies a URL for the MapReduce configuration zip file download (HDP only).
--no-mapreduce-conf-ws Exclude this download (HDP only).
--reconfigure Creates the bundles with the existing files, but forces a download of the new configuration files.
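These switches follow the usual invocation pattern. For example, two hypothetical invocations: the first regenerates only the installation script, and the second supplies a locally staged JDK tarball (the path is a placeholder):

# ./bds-database-create-bundle.sh --script-only
# ./bds-database-create-bundle.sh --jdk-tar-path=/opt/downloads/jdk.tar.gz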

Manually Adding Resources if Download Sites are not Accessible to the BDS Database Bundle Creation Script

If one or more of the default download sites is inaccessible from the cluster management server, there are two ways around this problem:

  • Download the files from another server first and then provide bds-database-create-bundle.sh with the alternate path as an argument. For example:

    $ ./bds-database-create-bundle.sh --yarn-conf-ws='http://nodexample:1234/config/yarn'
    
  • Because the script first searches locally in bds-database-install/ for resources, you can download the files to another server, move the files into bds-database-install/ on the cluster management server, and then run the bundle creation script with no additional arguments. For example:

    $ cp hadoop-xxxx.tar.gz bds-database-install/
    $ cp hive-xxxx.tar.gz bds-database-install/
    $ cp yarn-conf.zip bds-database-install/
    $ cp hive-conf.zip bds-database-install/
    $ cd db
    $ ./bds-database-create-bundle.sh
    

Copying the Database Bundle to the Oracle Database Server

Use scp to copy the database bundle you created to the Oracle Database server. In the example below, dbnode is the database server. The Linux account and target directory here are arbitrary. Use any account authorized to scp to the specified path.

For a first-time installation of the current release of Oracle Big Data SQL, copy bds-database-install.zip (the full installation bundle) to each database node.

$ scp bds-database-install.zip oracle@dbnode:/home/oracle

If you are updating the configuration of an existing Oracle Big Data SQL installation on the database servers, copy the smaller configuration update bundle, bds-database-install-config.zip, to the database nodes instead of bds-database-install.zip.

$ scp bds-database-install-config.zip oracle@dbnode:/home/oracle

The next step is to log on to the Oracle Database server and install the bundle.