
11 Installing the Oracle Big Data Appliance Software

This chapter explains how to use the Mammoth Utility to install the software on Oracle Big Data Appliance. It contains these sections:

  • Using the Mammoth Utility

  • Installing the Software on a Single or Primary Rack

  • Upgrading the Software on Oracle Big Data Appliance

  • Mammoth Utility Syntax

  • What If an Error Occurs During the Installation?

  • Adding a Rack to an Existing Cluster

  • Mammoth Utility Steps

  • Using the Mammoth Reconfiguration Utility

  • Mammoth Reconfiguration Utility Syntax

11.1 Using the Mammoth Utility

The Mammoth Utility installs and configures the software on Oracle Big Data Appliance using the files generated by the Oracle Big Data Appliance Configuration Utility. At a minimum, Mammoth installs and configures Cloudera's Distribution including Apache Hadoop (CDH). This includes all the Hadoop software and Cloudera Manager, which is the tool for administering your Hadoop cluster. Mammoth optionally installs and configures Oracle NoSQL Database and, if you have a license, all components of Oracle Big Data Connectors.

In addition to installing the software across all servers in the rack, the Mammoth Utility creates the required user accounts, starts the correct services, and sets the appropriate configuration parameters. When it is done, you have a fully functional, highly tuned Hadoop cluster that is up and running.

You must run the Mammoth Utility once for each rack.

11.2 Installing the Software on a Single or Primary Rack

Follow this procedure to install and configure the software on a single Oracle Big Data Appliance rack or on the primary rack of a multiple-rack cluster.

To install the software: 

  1. Verify that the Oracle Big Data Appliance rack is configured according to the custom network settings described in /opt/oracle/bda/BdaDeploy.json. If the rack is still configured to the factory default IP addresses, first perform the network configuration steps described in "Configuring the Network."
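
    For example, you can display the configured settings:

    cat /opt/oracle/bda/BdaDeploy.json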

  2. Verify that the software is not installed on the rack already. If the software is installed and you want to reinstall it, then use the mammoth -p option in Step 9.
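
    Files in the installation state directory indicate an existing installation; for example:

    ls /opt/oracle/bda/install/state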

  3. Download the BDAMammoth zip file to any directory on node01 (such as /tmp). See My Oracle Support Information Center: Oracle Big Data Appliance (ID 1445762.2) for the download location.

  4. Log in to node01 as root and extract all files from the downloaded zip file:

    $ unzip p14479858_110_Linux-x86-64.zip
    Archive:  p14479858_110_Linux-x86-64.zip
      inflating: README.txt
       creating: BDAMammoth-1.1.0/
      inflating: BDAMammoth-1.1.0/bda-configurator-1.1.0.ods
      inflating: BDAMammoth-1.1.0/BDAMammoth-1.1.0.run
    
  5. Change to the BDAMammoth-version directory:

    $ cd BDAMammoth-1.1.0
    
  6. Extract all files from BDAMammoth-version.run:

    $ ./BDAMammoth-1.1.0.run
    
  7. Change to the BDAMammoth directory.

    $ cd /opt/oracle/BDAMammoth
    
  8. Copy mammoth-rack_name.params to the current directory. See "About the Configuration Files."

  9. Run the mammoth command with the appropriate options. See Table 11-1. This sample command runs steps 1 and 2 on rack bda2:

    ./mammoth -r 1-2 bda2
    

The Mammoth Utility stores the current configuration in the /opt/oracle/bda/install/state directory. Do not delete the files in this directory. If you need to run the Mammoth Utility again, such as when adding a rack to the cluster, it fails without this information.
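
As a precaution, you can archive this directory before running the utility again; for example (the archive path is illustrative):

tar czf /tmp/bda-install-state.tar.gz /opt/oracle/bda/install/state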

11.3 Upgrading the Software on Oracle Big Data Appliance

The procedure for upgrading the software is the same whether you are upgrading from one major release to another or just applying a patch set. The procedure is also the same whether your Hadoop cluster consists of one Oracle Big Data Appliance rack or multiple racks.

The process upgrades all components of the software stack including the firmware, operating system, CDH, JDK, and Oracle Big Data Connectors (if previously installed).

Software downgrades are not supported.

Note:

Because the upgrade process automatically stops and starts services as needed, the cluster is unavailable while the mammoth command is executing.

To upgrade the software: 

  1. Download the BDAMammoth zip file to any directory on node01 (such as /tmp). See My Oracle Support Information Center: Oracle Big Data Appliance (ID 1445762.2) for the download location.

  2. Log in to node01 as root and extract all files from the downloaded zip file:

    $ unzip p14479858_110_Linux-x86-64.zip
    Archive:  p14479858_110_Linux-x86-64.zip
      inflating: README.txt
       creating: BDAMammoth-1.1.0/
      inflating: BDAMammoth-1.1.0/bda-configurator-1.1.0.ods
      inflating: BDAMammoth-1.1.0/BDAMammoth-1.1.0.run
    
  3. Change to the BDAMammoth-version directory:

    $ cd BDAMammoth-1.1.0
    
  4. Extract all files from BDAMammoth-version.run:

    $ ./BDAMammoth-1.1.0.run
    

    The new version of the Mammoth software is installed in /opt/oracle/BDAMammoth, and the previous version is saved in /opt/oracle/BDAMammoth/previous-BDAMammoth.

  5. Change to the BDAMammoth directory.

    $ cd /opt/oracle/BDAMammoth
    
  6. Run the mammoth command with the -p option:

    ./mammoth -p rack_name
    

11.4 Mammoth Utility Syntax

You must change to the /opt/oracle/BDAMammoth directory to use the Mammoth Utility. It has this syntax:

./mammoth option [rack_name]

In this command, rack_name is the name of an Oracle Big Data Appliance rack. You must enter the rack name in the first command exactly as it appears in the configuration file name (mammoth-rack_name.params). Afterward, rack_name defaults to the rack specified in a previous mammoth command.

You must finish installing one rack before starting the installation of another rack.

Table 11-1 lists the Mammoth Utility options.

Table 11-1 Mammoth Utility Options

Option    Description

-h        Displays command Help, including command usage and a list of steps.

-i        Runs all mandatory steps; equivalent to -r 1-18.

-l        Lists the steps of the Mammoth Utility.

-p        Upgrades the software on the cluster to the current version.

-r n-N    Runs steps n through N of the Mammoth Utility, stopping if an error occurs.

-s n      Runs step n.

-v        Displays the version number of the Mammoth Utility.


Example 11-1 Mammoth Utility Syntax Examples

This command displays Help for the Mammoth Utility:

./mammoth -h

This command does a complete install on rack bda3:

./mammoth -i bda3

The next command runs steps 2 through 6 on the rack being set up:

./mammoth -r 2-6

11.5 What If an Error Occurs During the Installation?

Each step generates a detailed log file listing the actions performed on each server and whether the step completed successfully. If an error occurs, the script stops. You can then check the log files in /opt/oracle/BDAMammoth/bdaconfig/tmp. The log files are named in this format:

STEP-i-yyyymmddhhmmss.log

In this format, i is the step number and yyyymmddhhmmss identifies the year, month, day, hour, minute, and second that the file was created.
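
For example, a log file for step 4 created on July 20, 2012, at 2:25:37 p.m. would be named as follows (timestamp illustrative):

STEP-4-20120720142537.log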

After fixing the problem, you can rerun all steps or a range of steps. You cannot skip steps or run them out of order.
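
For example, if step 6 failed and you have fixed the problem, a command like this resumes the installation from step 6 through the last mandatory step (step numbers are illustrative; use -l to list the steps):

./mammoth -r 6-18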

11.6 Adding a Rack to an Existing Cluster

Each multirack cluster has one rack designated as the primary rack. Whether a rack is the primary one is indicated in the Oracle Big Data Appliance Configuration Worksheets and specified in the mammoth-rack_name.params file. Each rack of a multirack Hadoop cluster has a separate mammoth-rack_name.params file.

To install the software on additional racks in the same cluster: 

  1. Install the software on the primary rack of the Hadoop cluster. See "Installing the Software on a Single or Primary Rack".

  2. Ensure that all racks are running the same software version. See "About Software Version Differences".

  3. Ensure that all racks that form a single Hadoop cluster are cabled together. See Chapter 9.

  4. Copy the mammoth-rack_name.params files of the non-primary racks to node01 (the bottom server) of the primary rack. Do not copy them to the non-primary racks.

  5. Connect as root to node01 of the primary rack and change to the BDAMammoth directory:

    cd /opt/oracle/BDAMammoth
    

    Note: Always start Mammoth from the primary rack.

  6. For each non-primary rack, enter the mammoth command with the appropriate option. See "Mammoth Utility Syntax". For example, this command starts the installation on rack bda4:

    ./mammoth -i bda4
    

The primary rack of a multirack Hadoop cluster is configured the same as a single-rack Hadoop cluster. It runs the NameNode, Secondary NameNode, Hue, Hive, and other key services. The other racks of a multirack Hadoop cluster are configured differently: they run only the DataNodes and TaskTrackers.

If you have a license for Oracle Big Data Connectors, they are installed on all nodes of the non-primary racks, although no services run on those nodes. The Oracle Data Integrator agent runs only on node03 of the primary rack. You cannot add nodes to an Oracle NoSQL Database cluster after it is set up. However, a logical volume is created on the additional rack for future use, for when adding nodes to an Oracle NoSQL Database cluster becomes possible.

The Mammoth Utility obtains the current configuration from the files stored in /opt/oracle/bda/install/state. If those files are missing or if any of the services have been moved manually to run on other nodes, then the Mammoth Utility fails.

A new Oracle Big Data Appliance rack may be factory-installed with a newer image than the previously installed racks. All racks configured as one Hadoop cluster must have the same image. When all racks have the same image, you can install the software on the new rack.

About Software Version Differences

A new Oracle Big Data Appliance rack may be factory-installed with a newer base image than the previously installed racks. Use the imageinfo utility on any server to get the image version. Only when all racks of a single Hadoop cluster have the same image version can you proceed to install the software on the new rack.
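
For example, run this command on a server in each rack and compare the reported versions:

imageinfo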

To synchronize the new rack with the rest of the Hadoop cluster, either upgrade the existing cluster to the latest image version or downgrade the image version of the new rack.

To upgrade the image version: 

To downgrade the image version: 

11.7 Mammoth Utility Steps

Following are descriptions of the steps that the Mammoth Utility performs when installing the software.

Step 1   SetupInstall

Validates the configuration files.

Step 2   SetupSSHroot

Sets up passwordless Secure Shell (SSH) access for the root user, so that you can connect to all addresses on the administrative network without entering a password.
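
For example, you can verify the setup by connecting to another server without being prompted for a password (host name illustrative):

ssh root@bda1node02 hostname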

Step 3   UpdateEtcHosts

This step performs several tasks:

Generates /etc/hosts from the configuration file and copies it to all servers so they use the InfiniBand connections to communicate internally. The file maps private IP addresses to public host names.

Sets up passwordless SSH for the root user on the InfiniBand network.

Sets up an alias to identify the node where the Mammoth Utility is run as the puppet master node. For example, if you run the Mammoth Utility from bda1node01 with the IP address 192.168.41.1, then the list of aliases for that IP address includes bda1node01-master; an illustrative /etc/hosts entry appears after this list. The Mammoth Utility uses Puppet for the software installation; Step 5 describes Puppet in more detail.

Checks the network timing on all nodes. If the timing checks fail, then there are unresolved names and IP addresses that will prevent the installation from running correctly. Fix these issues before continuing with the installation.
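
As an illustration of the alias task described above, the generated /etc/hosts entry for the example node might look like this (format illustrative):

192.168.41.1   bda1node01.example.com   bda1node01   bda1node01-master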

Step 4   PreInstallChecks

This step performs a variety of hardware and software checks. A failure in any of these checks causes the Mammoth Utility to fail:

  • The ARP cache querying time is 2 seconds or less.

  • All server clocks are synchronized within 10 seconds of the current server.

  • All servers succeeded on the last restart and generated a /root/BDA_REBOOT_SUCCEEDED file.

  • The bdacheckhw utility succeeds.

  • The bdachecksw utility succeeds.

Step 5   SetupPuppet

This step configures puppet agents on all nodes and starts them, configures a puppet master on the node where the Mammoth Utility is being run, waits for the agents to submit their certificates, and automates signing them. After this step is complete, Puppet can deploy the software.

Puppet is a distributed configuration management tool that is commonly used for managing Hadoop clusters. The puppet master is a parent service and maintains a Puppet repository. A puppet agent operates on each Hadoop node.

A file named /etc/puppet/puppet.conf resides on every server and identifies the location of the puppet master.
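
The relevant setting might look like this (illustrative; the actual file contains additional settings):

[main]
    server = bda1node01-master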

Puppet operates in two modes:

  • Periodic pull mode, in which the puppet agents periodically contact the puppet master and ask for an update, or

  • Kick mode, in which the puppet master alerts the puppet agents that a configuration update is available, and the agents then ask for the update. Puppet operates in kick mode during the Mammoth Utility installation.

In both modes, the puppet master must trust the agent. To establish this trust, the agent sends a certificate to the puppet master node, where it is signed; during installation, the Mammoth Utility automates this signing. When this transaction is complete, the puppet master sends the new configuration to the agent.

For subsequent steps, you can check the Puppet log files on each server, as described in "What If an Error Occurs During the Installation?".

Step 6   PatchFactoryImage

Installs the most recent Oracle Big Data Appliance image and system parameter settings.

Step 7   CopyLicenseFiles

Copies third-party licenses to /opt/oss/src/OSSLicenses.pdf on every server, as required by the licensing agreements.

Step 8   CopySoftwareSource

Copies third-party software source code to /opt/oss/src/ on every server, as required by the licensing agreements.

Step 9   CreateLogicalVolumes

Creates a logical volume if physical disks are allocated to Oracle NoSQL Database. This step varies depending on the amount of disk space allocated to Oracle NoSQL Database during configuration:

  • 0 terabytes: This step does nothing.

  • 54 terabytes: The disk space is allocated across the cluster using one disk on each node. The disk mounted at /u12 is used for the logical volume.

  • 108 terabytes: The disk space is allocated across the cluster using two disks on each node. The disks mounted at /u11 and /u12 are used for the logical volume.

The logical volume is mounted at /lv1 and corresponds to device /dev/lvg1/lv1.

After this step finishes, the Linux file systems table in /etc/fstab shows the logical disk instead of /u12, or /u11 and /u12.
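
For example, you can confirm the logical volume and its mount point:

df -h /lv1
grep lv1 /etc/fstab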

Step 10   CreateUsers

Creates the hdfs and mapred users, and the hadoop group. It also creates the oracle user and the dba and oinstall groups.

The various packages installed in later steps also create users and groups during their installation.

See Also:

Oracle Big Data Appliance Software User's Guide for more information about users and groups.
Step 11   SetupMountPoints

The NameNode and Secondary NameNode data is copied to multiple places to prevent a loss of this critical information should a failure occur in either the disk or the entire node where they are set up. The data is replicated during normal operation as follows:

  • The NameNode and Secondary NameNode data is written to a partition that is mirrored, so the loss of a single disk can be tolerated. This mirroring is done at the factory as part of the operating system installation.

  • This step creates a directory named /opt/exportdir on node04 and mounts it on the NameNode and Secondary NameNode. It also exports /opt/exportdir from node04 and mounts it at /opt/shareddir on all nodes of the cluster. During operation of Oracle Big Data Appliance, the NameNode and Secondary NameNode data is also written to /opt/exportdir.

  • Optionally, this step mounts on the NameNode and Secondary NameNode a directory on an external server so that the data is also written there. The external server and directory must be identified for this purpose in the Oracle Big Data Appliance Configuration Worksheets. You can examine this configuration setting by looking at the value of $external_dir_path in /opt/oracle/bda/puppet/manifests/environment.pp.

    Mammoth checks for these requirements:

    • Under the specified directory path, a subdirectory must exist with the same name as the cluster. This subdirectory must be owned by root.

    • Under this subdirectory, two subdirectories named nn and snn must exist and be owned by user hdfs and group hadoop. The hdfs UID must be the same as the hdfs UID on Oracle Big Data Appliance, and the hadoop GID must be the same as the hadoop GID on Oracle Big Data Appliance.

    For example, if the NFS directory is specified in environment.pp as

    NFS_DIRECTORY=extfiler:/scratch/bda
    

    and the cluster name is specified as

    CLUSTER_NAME=bda1
    

    then:

    • The /scratch/bda/bda1 directory must exist on EXTFILER and be owned by root.

    • The /scratch/bda/bda1/nn and /scratch/bda/bda1/snn directories must exist on EXTFILER and be owned by hdfs in group hadoop.
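
    In this example, commands like the following, run as root on the external server, would satisfy these requirements (the hdfs user and hadoop group must have the same UID and GID as on Oracle Big Data Appliance):

    mkdir -p /scratch/bda/bda1/nn /scratch/bda/bda1/snn
    chown root:root /scratch/bda/bda1
    chown hdfs:hadoop /scratch/bda/bda1/nn /scratch/bda/bda1/snn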

    See Also:

    Oracle Big Data Appliance Configuration Worksheets for detailed information about external NameNode backups.
Step 12   SetupMySQL

Installs and configures MySQL Database. This step creates the primary database and several databases on node02 for use by Cloudera Manager. It also sets up replication of the primary database to a backup database on node03.

Step 13   InstallHadoop

Installs all packages in Cloudera's Distribution including Apache Hadoop (CDH) and Cloudera Manager. It then starts the Cloudera Manager server on node02 and configures the cluster.

Step 14   StartHadoopServices

Starts the agents on all nodes and starts all CDH services. After this step, you have a fully functional Hadoop installation.

Cloudera Manager runs on port 7180 of node02. You can open it in a browser, for example:

http://bda1node02.example.com:7180

In this example, bda1node02 is the name of node02 and example.com is the domain. The default user name and password are both admin; the password is changed in Step 18.

Step 15   StartHiveService

Starts the Hive service on node03 and copies the Hadoop client configuration to /etc/hadoop/conf on all nodes.

Step 16   InstallBDASoftware

Installs Oracle NoSQL Database Community Edition and the server-side components of Oracle Big Data Connectors, if these options were selected in the Oracle Big Data Appliance Configuration Worksheets. Oracle NoSQL Database must be allocated disk space (54 or 108 TB), and Oracle Big Data Connectors must be licensed separately.

Step 17   SetupASR

Installs and configures Auto Service Request (ASR).

Note:

For this step to run successfully, the ASR host system must be up with ASR Manager running and configured properly. See Chapter 10.

This step does the following:

  • Installs the required software packages

  • Configures the trap destinations

  • Starts the monitoring daemon

To activate the assets from ASR Manager, see "Activating ASR Assets".

Step 18   CleanupInstall

Performs the following:

  • Changes the root password on all nodes (optional).

  • Changes the Cloudera Manager password if specified in the Installation Template.

  • Deletes temporary files created during the installation.

  • Copies log files from all nodes to subdirectories in /opt/oracle/bda/install/log.

  • Runs cluster verification checks, including TeraSort, to ensure that everything is working properly. It also generates an install summary. All logs are stored in a subdirectory under /opt/oracle/bda/install/log on node01.

Step 19   CleanupSSHroot (Optional)

Removes passwordless SSH for root that was set up in Step 2.

11.8 Using the Mammoth Reconfiguration Utility

The following is the general procedure for running the Mammoth Reconfiguration Utility.

To run the Mammoth Reconfiguration Utility: 

  1. Log in to the HDFS node (node01) of the primary rack and change to the BDAMammoth directory:

    cd /opt/oracle/BDAMammoth
    

    Note: If the HDFS node has failed, then log in to the noncritical node that you want to reconfigure as the new HDFS node.

  2. Enter the mammoth-reconfig command with the appropriate subcommand option. See "Mammoth Reconfiguration Utility Syntax".

Specific procedures for common software configuration changes are provided in this topic:

11.8.1 Changing the Software Configuration

During the initial configuration of Oracle Big Data Appliance, the optional software components may or may not have been installed. Using the Mammoth Reconfiguration Utility, you can reverse those decisions. In this release, you can turn on Auto Service Request.

To turn on Auto Service Request: 

  1. Set up your My Oracle Support account and install ASR Manager. See Chapter 10.

  2. Log in to the HDFS node (node01) of the primary rack and change to the BDAMammoth directory:

    cd /opt/oracle/BDAMammoth
    
  3. Turn on Auto Service Request monitoring and activate the assets:

    mammoth-reconfig add asr
    

11.9 Mammoth Reconfiguration Utility Syntax

The mammoth-reconfig command has this basic syntax:

mammoth-reconfig option parameter

This utility uses the configuration settings stored in /opt/oracle/bda/install/state/mammoth-saved.params. It prompts for any missing information, such as passwords. When the utility makes a change, it modifies this file to reflect the new configuration.

Options 

add

Adds a service to the cluster. The parameter is a keyword that identifies the service:

  • asr: Turns on Auto Service Request monitoring on Oracle Big Data Appliance and activates assets on ASR Manager. The installation process prompts you for the ASR Manager host name, port number, and root password. See Chapter 10 for more information about Auto Service Request.

This example adds Auto Service Request support to all servers in the cluster:

mammoth-reconfig add asr