13 Installing the Oracle Big Data Appliance Software

This chapter explains how to use the Mammoth Utility to install the software on Oracle Big Data Appliance. It contains these sections:

Using the Mammoth Utility
Installing the Software on a Single or Primary Rack
Mammoth Utility Syntax
What If an Error Occurs During the Installation?
Adding a Rack to an Existing Cluster
Mammoth Utility Steps

13.1 Using the Mammoth Utility

The Mammoth Utility installs and configures the software on Oracle Big Data Appliance using the files generated by the Oracle Big Data Appliance Configuration Utility. At a minimum, Mammoth installs and configures Cloudera's Distribution including Apache Hadoop. This includes all the Hadoop software and Cloudera Manager, which is the tool for administering your Hadoop cluster. Mammoth will optionally install and configure Oracle NoSQL Database and, if you have a license, all components of Oracle Big Data Connectors.

In addition to installing the software across all servers in the rack, the Mammoth Utility creates the required user accounts, starts the correct services, and sets the appropriate configuration parameters. When it finishes, you have a fully functional, highly tuned Hadoop cluster that is up and running.

You must run the Mammoth Utility once for each rack.

For one Oracle Big Data Appliance rack that forms one Hadoop cluster, follow the procedure in "Installing the Software on a Single or Primary Rack".
For multiple racks where each rack forms an independent Hadoop cluster, follow the procedure in "Installing the Software on a Single or Primary Rack" for each rack.
For multiple racks that form a single, multirack Hadoop cluster:
- Identify the primary rack of the cluster, then follow the procedure in "Installing the Software on a Single or Primary Rack".
- For the other racks of the cluster, follow the procedure in "Adding a Rack to an Existing Cluster".

13.2 Installing the Software on a Single or Primary Rack

Follow this procedure to install and configure the software on a single Oracle Big Data Appliance rack or on the primary rack of a multiple-rack cluster.

To install the software:

Verify that the Oracle Big Data Appliance rack is configured according to the custom network settings described in /opt/oracle/bda/BdaDeploy.json. If the rack is still configured to the factory default IP addresses, first perform the network configuration steps described in "Configuring the Network".
Verify that the software is not installed on the rack already. If the software is installed and you want to reinstall it:

Caution:
These steps result in the loss of all data stored in HDFS. Consider your options carefully before proceeding.
- Use the mammoth -u option, described in "Mammoth Utility Syntax".
- If mammoth -u fails, then reimage the entire rack using the reimagerack utility. See "Checking the Health of the Network".
Copy BDAMammoth-version.run to any directory on node01 (such as /tmp). You can download this file from the same location as the base image file. See the procedures under "Reinstalling the Base Image".
Log in to node01 as root and run the BDAMammoth-version.run self-extracting file to decompress it. This example extracts Mammoth version 1.0.3 from the /tmp directory:
```
/tmp/BDAMammoth-1.0.3.run
```
Change directories:
```
cd /opt/oracle/BDAMammoth
```
Copy mammoth-rack_name.params to the current directory. See "About the Configuration Files".
Run the mammoth command with the appropriate option. See Table 13-1. This sample command runs steps 1 and 2 on rack bda2:
```
./mammoth -r 1-2 bda2
```

The Mammoth Utility stores the current configuration in the /opt/oracle/bda/install/state directory. Do not delete the files in this directory. If you need to run the Mammoth Utility again, for example to add a rack to the cluster, it fails without this information.

13.3 Mammoth Utility Syntax

You must change to the /opt/oracle/BDAMammoth directory to use the Mammoth Utility. It has this syntax:

```
./mammoth option [rack_name]
```

In this command, rack_name is the name of an Oracle Big Data Appliance rack. You must enter the rack name in the first command exactly as it appears in the configuration file name (mammoth-rack_name.params). Afterward, rack_name defaults to the rack specified in a previous mammoth command.
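As an illustration of the naming rule above, the following POSIX shell sketch derives rack_name from a configuration file name, so that the first mammoth command uses the name exactly as it is spelled in the file. The function rack_name_from_params is hypothetical and is not part of the Mammoth Utility.

```shell
#!/bin/sh
# Hypothetical helper: derive rack_name from a file named
# mammoth-rack_name.params (any leading directory is ignored).
rack_name_from_params() {
    base=$(basename "$1")
    case "$base" in
        mammoth-*.params)
            name=${base#mammoth-}            # strip leading "mammoth-"
            printf '%s\n' "${name%.params}"  # strip trailing ".params"
            ;;
        *)
            echo "not a mammoth-rack_name.params file: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `./mammoth -i "$(rack_name_from_params mammoth-bda2.params)"` passes bda2 as the rack name.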

You must finish installing one rack before starting the installation of another rack.

Table 13-1 lists the Mammoth Utility options.

Table 13-1 Mammoth Utility Options

Option	Description
`-h`	Displays command Help, including command usage and a list of steps.
`-i`	Runs all mandatory steps, equivalent to `-r 1-18`.
`-l`	Lists the steps of the Mammoth Utility.
`-r n-N`	Runs steps n through N of the Mammoth Utility, stopping if an error occurs.
`-s n`	Runs step n.
`-u`	Uninstalls all software from all racks in an existing Hadoop cluster. This option results in a loss of all data.
`-v`	Displays the version number of the Mammoth Utility.
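To make the step ranges in Table 13-1 concrete, this sketch expands an option into the step numbers it selects. It assumes that steps are numbered 1 through 19 (Step 19 is optional). The helper mammoth_steps is hypothetical; it only prints numbers and never calls ./mammoth.

```shell
#!/bin/sh
# Hypothetical helper: expand -i, -r n-N, or -s n into step numbers,
# following Table 13-1. Nothing is executed.
is_step() {
    # A step is an integer from 1 to 19
    case "$1" in ''|*[!0-9]*) return 1 ;; esac
    [ "$1" -ge 1 ] && [ "$1" -le 19 ]
}

mammoth_steps() {
    case "$1" in
        -i) first=1; last=18 ;;            # all mandatory steps
        -r)
            case "$2" in
                *-*) ;;
                *) echo "usage: mammoth_steps -r n-N" >&2; return 1 ;;
            esac
            first=${2%%-*}
            last=${2#*-}
            ;;
        -s) first=$2; last=$2 ;;
        *)  echo "unsupported option: $1" >&2; return 1 ;;
    esac
    # Reject non-numeric, out-of-range, or reversed ranges
    if ! is_step "$first" || ! is_step "$last" || [ "$first" -gt "$last" ]; then
        echo "invalid step range: $first-$last" >&2
        return 1
    fi
    out=""
    i=$first
    while [ "$i" -le "$last" ]; do
        out="${out:+$out }$i"              # space-separated, no trailing blank
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}
```

For example, `mammoth_steps -r 2-6` prints the same steps that `./mammoth -r 2-6` runs in Example 13-1.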

Example 13-1 Mammoth Utility Syntax Examples

This command displays Help for the Mammoth Utility:

```
./mammoth -h
```

This command does a complete install on rack bda3:

```
./mammoth -i bda3
```

The next command runs steps 2 through 6 on the rack being set up:

```
./mammoth -r 2-6
```

13.4 What If an Error Occurs During the Installation?

Each step generates a detailed log file listing the actions performed on each server and whether the step completed successfully. If an error occurs, the script stops. You can then check the log files in /opt/oracle/BDAMammoth/bdaconfig/tmp. The log files are named in this format:

```
STEP-i-yyyymmddhhmmss.log
```

In this format, i is the step number and yyyymmddhhmmss identifies the year, month, day, hour, minute, and second that the file was created.
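A sketch of reading this naming format, assuming only the file name is available: the hypothetical parse_step_log function splits a name into its step number and a readable timestamp. The sample file name in the usage line is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split STEP-i-yyyymmddhhmmss.log into the step
# number and a readable timestamp. Only the name is parsed.
parse_step_log() {
    base=$(basename "$1" .log)
    step=$(printf '%s\n' "$base" | sed -n 's/^STEP-\([0-9][0-9]*\)-[0-9]\{14\}$/\1/p')
    ts=$(printf '%s\n' "$base" | sed -n 's/^STEP-[0-9][0-9]*-\([0-9]\{14\}\)$/\1/p')
    if [ -z "$step" ] || [ -z "$ts" ]; then
        echo "unexpected log name: $1" >&2
        return 1
    fi
    # yyyymmddhhmmss -> yyyy-mm-dd hh:mm:ss
    stamp=$(printf '%s\n' "$ts" | sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')
    printf 'step=%s time=%s\n' "$step" "$stamp"
}

# Usage: parse_step_log /opt/oracle/BDAMammoth/bdaconfig/tmp/STEP-13-20120415093012.log
```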

After fixing the problem, you can rerun all steps or a range of steps. You cannot skip steps or run them out of order.
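One way to choose the range to rerun is sketched below, under the assumption that the most recent STEP-i log belongs to the step that failed. The hypothetical resume_command function finds that log and prints a command that reruns from that step through Step 18 (or only the optional Step 19), so that no step is skipped. Review the log and fix the problem before running the printed command.

```shell
#!/bin/sh
# Hypothetical helper, assuming the newest STEP-i-yyyymmddhhmmss.log
# belongs to the failed step. It prints a command; it runs nothing.
resume_command() {
    log_dir="${1:-/opt/oracle/BDAMammoth/bdaconfig/tmp}"
    # Sort on the timestamp (third "-"-separated field); keep the newest
    latest=$(ls "$log_dir" 2>/dev/null \
        | grep '^STEP-[0-9][0-9]*-[0-9]\{14\}\.log$' \
        | sort -t- -k3 | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no step logs found in $log_dir" >&2
        return 1
    fi
    step=$(printf '%s\n' "$latest" | cut -d- -f2)
    if [ "$step" -gt 18 ]; then
        printf './mammoth -s %s\n' "$step"     # optional Step 19 only
    else
        printf './mammoth -r %s-18\n' "$step"  # failed step through Step 18
    fi
}
```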

13.5 Adding a Rack to an Existing Cluster

Each multirack cluster has one rack designated as the primary rack. Whether a rack is the primary one is indicated in the Oracle Big Data Appliance Configuration Worksheets and specified in the mammoth-rack_name.params file. Each rack of a multirack Hadoop cluster has a separate mammoth-rack_name.params file.

To install the software on additional racks in the same cluster:

Install the software on the primary rack of the Hadoop cluster. See "Installing the Software on a Single or Primary Rack".
Ensure that all racks are running the same software version. See "About Software Version Differences".
Ensure that all racks that form a single Hadoop cluster are cabled together. See Chapter 9.
Copy the mammoth-rack_name.params files of the non-primary racks to node01 (the bottom server) of the primary rack. Do not copy them to the non-primary racks.
Connect as root to node01 of the primary rack and change to the BDAMammoth directory:
```
cd /opt/oracle/BDAMammoth
```
Note: Always start Mammoth from the primary rack.
For each non-primary rack, issue the mammoth command with the appropriate option. See "Mammoth Utility Syntax". For example, this command starts the installation on rack bda4:
```
./mammoth -i bda4
```
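When several non-primary racks must be added, the following sketch runs them one at a time and stops at the first failure, because you must finish installing one rack before starting another. It assumes that the mammoth-rack_name.params files are in the current directory (/opt/oracle/BDAMammoth on node01 of the primary rack). The function install_racks and the MAMMOTH override variable are hypothetical; setting MAMMOTH=echo turns the loop into a dry run.

```shell
#!/bin/sh
# Hypothetical helper: add non-primary racks one at a time from
# /opt/oracle/BDAMammoth on node01 of the primary rack.
# MAMMOTH defaults to ./mammoth; set MAMMOTH=echo for a dry run.
install_racks() {
    for rack in "$@"; do
        # Each rack needs its own configuration file in this directory
        if [ ! -f "mammoth-$rack.params" ]; then
            echo "missing mammoth-$rack.params in $(pwd)" >&2
            return 1
        fi
        echo "Installing rack $rack" >&2
        # Stop at the first failure so the next rack never starts
        if ! ${MAMMOTH:-./mammoth} -i "$rack"; then
            echo "mammoth failed on rack $rack; check the step logs" >&2
            return 1
        fi
    done
}
```

For example, after `cd /opt/oracle/BDAMammoth`, running `install_racks bda4 bda5` installs bda4 and then bda5, where bda5 is a hypothetical second non-primary rack.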

The primary rack of a multirack Hadoop cluster is configured the same as a single Hadoop cluster. It runs the NameNode, Secondary NameNode, Hue, Hive, and other key services. The other racks of a multirack Hadoop cluster are configured differently: they run only the DataNode and TaskTracker services.

Oracle Big Data Connectors are installed on all nodes of the non-primary racks, although no services run on them. The Oracle Data Integrator agent still runs on node03 of the primary rack. You cannot add nodes to an Oracle NoSQL Database cluster after it is set up. However, a logical volume is created on the additional rack for future use, when nodes can be added to an Oracle NoSQL Database cluster.

The Mammoth Utility obtains the current configuration from the files stored in /opt/oracle/bda/install/state. If those files are missing or if any of the services have been moved manually to run on other nodes, then the Mammoth Utility fails.

All racks configured as one Hadoop cluster must have the same image version before you can install the software on a new rack. See "About Software Version Differences".

About Software Version Differences

A new Oracle Big Data Appliance rack may be factory-installed with a newer base image than the previously installed racks. Use the imageinfo utility on any server to get the image version. Only when all racks of a single Hadoop cluster have the same image version can you proceed to install the software on the new rack.
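A minimal sketch of the same-version rule, assuming you have already collected one image version string per rack (for example, by running imageinfo on a server in each rack and noting the version it reports). The helper same_image_version is hypothetical and does not parse imageinfo output itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every version string passed as
# an argument (one per rack) is identical.
same_image_version() {
    if [ "$#" -eq 0 ]; then
        echo "no versions given" >&2
        return 1
    fi
    first=$1
    for v in "$@"; do
        if [ "$v" != "$first" ]; then
            echo "image version mismatch: $first vs $v" >&2
            return 1
        fi
    done
    return 0
}
```

With placeholder versions, `same_image_version 1.0.3 1.0.3` succeeds, while `same_image_version 1.0.3 1.1.0` fails and reports the mismatch.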

To synchronize the new rack with the rest of the Hadoop cluster, either upgrade the existing cluster to the latest image version or downgrade the image version of the new rack.

To downgrade the image version:

Reimage the new rack to the older version installed on the cluster. See My Oracle Support Master Note 1434477.1 and its related notes.
Use the older version of the Oracle Big Data Appliance Configuration Utility to generate the configuration files.
Use the older version of the Mammoth Utility to install the software.

13.6 Mammoth Utility Steps

Following are descriptions of the steps that the Mammoth Utility performs when installing the software.

Step 1 SetupInstall

Validates the configuration files.

Step 2 WriteNodelists

Displays a road map of the planned system, including this information:

Location of important nodes in the system
List of partitions for HDFS directories
List of disks reserved for Oracle NoSQL Database
Ports used by various components
Location of services
User names and initial passwords

This step, and every subsequent step, stores information in a file named /opt/oracle/bda/environment.pp. Check the contents of this file now to ensure that the environment generated by the Mammoth Utility appears correct.

This step also generates a file named passwords.pp. It contains the passwords for various software components that run under an operating system user identity. Operating system root passwords are not written to disk. The last step of the installation removes passwords.pp.

Step 3 SetupSSHroot

Sets up passwordless Secure Shell (SSH) for the root user so you can connect to all addresses on the administrative network without entering a password.

Step 4 UpdateEtcHosts

This step performs several tasks:

Generates /etc/hosts from the configuration file and copies it to all servers so they use the InfiniBand connections to communicate internally. The file maps private IP addresses to public host names.

Sets up passwordless SSH for the root user on the InfiniBand network.

Sets up an alias to identify the node where the Mammoth Utility is run as the puppet master node. For example, if you run the Mammoth Utility from bda1node01 with an IP address 192.168.41.1, then a list of aliases for that IP address includes bda1node01-master. The Mammoth Utility uses Puppet for the software installation; the next step describes Puppet in more detail.

Checks the network timing on all nodes. If the timing checks fail, then there are unresolved names and IP addresses that will prevent the installation from running correctly. Fix these issues before continuing with the installation.
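To picture the result of the first and third tasks, this sketch prints one /etc/hosts-style line that maps an InfiniBand address to a host name and, for the node that runs the Mammoth Utility, appends the -master alias. This chapter does not show the exact layout of the file that Mammoth generates, so treat hosts_line as a hypothetical illustration only.

```shell
#!/bin/sh
# Hypothetical helper: one hosts-file line; IS_MASTER=yes adds the
# <host>-master alias used to identify the puppet master node.
hosts_line() {
    ip=$1 host=$2 is_master=$3
    if [ "$is_master" = yes ]; then
        printf '%s\t%s %s-master\n' "$ip" "$host" "$host"
    else
        printf '%s\t%s\n' "$ip" "$host"
    fi
}
```

For example, `hosts_line 192.168.41.1 bda1node01 yes` reproduces the bda1node01-master alias from the example above.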

Step 5 SetupPuppet

This step configures puppet agents on all nodes and starts them, configures a puppet master on the node where the Mammoth Utility is being run, waits for the agents to submit their certificates, and automates their signing. After this step is completed, Puppet can deploy the software.

Puppet is a distributed configuration management tool that is commonly used for managing Hadoop clusters. The puppet master is a parent service and maintains a Puppet repository. A puppet agent operates on each Hadoop node.

A file named /etc/puppet/puppet.conf resides on every server and identifies the location of the puppet master.
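For example, to see which puppet master a server points to, you could read the server setting from puppet.conf. This sketch assumes the INI-style layout that Puppet uses, with a server = hostname line in some section. The helper puppet_master is hypothetical and simply returns the first such line, ignoring which section it is in.

```shell
#!/bin/sh
# Hypothetical helper: print the puppet master named by the first
# "server = ..." line in puppet.conf. Commented lines are ignored.
puppet_master() {
    conf="${1:-/etc/puppet/puppet.conf}"
    master=$(awk -F= '
        /^[ \t]*server[ \t]*=/ {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding blanks
            print v
            exit
        }' "$conf") || return 1
    if [ -z "$master" ]; then
        echo "no server setting found in $conf" >&2
        return 1
    fi
    printf '%s\n' "$master"
}
```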

Puppet operates in two modes:

Periodic pull mode, in which the puppet agents periodically contact the puppet master and ask for an update, or
Kick mode in which the puppet master alerts the puppet agents that a configuration update is available, and the agents then ask for the update. Puppet operates in kick mode during the Mammoth Utility installation.

In both modes, the puppet master must trust the agent. To establish this trust, the agent sends a certificate to the puppet master node, where it is signed; during the Mammoth Utility installation, this signing is automated. When this transaction is complete, the puppet master sends the new configuration to the agent.

For subsequent steps, you can check the Puppet log files on each server, as described in "What If an Error Occurs During the Installation?".

Step 6 PatchFactoryImage

Installs the most recent Oracle Big Data Appliance image and system parameter settings.

Step 7 CopyLicenseFiles

Copies third-party licenses to /opt/oss/src/OSSLicenses.pdf on every server, as required by the licensing agreements.

Step 8 CopySoftwareSource

Copies third-party software source code to /opt/oss/src/ on every server, as required by the licensing agreements.

Step 9 CreateLogicalVolumes

Creates a logical volume if physical disks are allocated to Oracle NoSQL Database. This step varies depending on the amount of disk space allocated to Oracle NoSQL Database during configuration:

0 terabytes: This step does nothing.
54 terabytes: The disk space is allocated across the cluster using one disk on each node. The disk mounted at /u12 is used for the logical volume.
108 terabytes: The disk space is allocated across the cluster using two disks on each node. The disks mounted at /u11 and /u12 are used for the logical volume.

The logical volume is mounted at /lv1.
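The allocation rules above can be summarized as a lookup from the configured allocation, in terabytes, to the per-node mount points that Step 9 combines into the logical volume at /lv1. The function nosql_lv_disks is a hypothetical illustration, not part of the Mammoth Utility.

```shell
#!/bin/sh
# Hypothetical helper: per-node disks used for the Oracle NoSQL Database
# logical volume, by allocation in TB (0, 54, or 108).
nosql_lv_disks() {
    case "$1" in
        0)   echo "none" ;;           # no logical volume is created
        54)  echo "/u12" ;;           # one disk per node
        108) echo "/u11 /u12" ;;      # two disks per node
        *)   echo "unsupported allocation: $1 TB" >&2; return 1 ;;
    esac
}
```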

After this step finishes, the Linux file systems table in /etc/fstab shows the logical disks instead of the physical disks they represent.

Step 10 CreateUsers

Creates the hdfs and mapred users, and the hadoop group. It also creates the oracle user and the dba and oinstall groups.

The various packages installed in later steps also create users and groups during their installation.
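After Step 10, a quick check such as the following sketch can confirm that the accounts exist on a server. It assumes local accounts, so groups are looked up in /etc/group. The helper check_accounts is hypothetical; its default lists are the users and groups named above, and arguments can override them.

```shell
#!/bin/sh
# Hypothetical helper: report Step 10 users and groups that are missing.
# Usage: check_accounts ["user1 user2 ..."] ["group1 group2 ..."]
check_accounts() {
    users="${1:-hdfs mapred oracle}"
    groups="${2:-hadoop dba oinstall}"
    missing=0
    for u in $users; do                 # word-split the space-separated list
        id -u "$u" >/dev/null 2>&1 || { echo "missing user: $u"; missing=1; }
    done
    for g in $groups; do
        grep -q "^$g:" /etc/group || { echo "missing group: $g"; missing=1; }
    done
    return "$missing"
}
```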

See Also:

Oracle Big Data Appliance Software User's Guide for more information about users and groups.

Step 11 SetupMountPoints

The NameNode and Secondary NameNode data is copied to multiple places to prevent a loss of this critical information should a failure occur in either the disk or the entire node where they are set up. The data is replicated during normal operation as follows:

The NameNode and Secondary NameNode data is written to a partition that is mirrored so the loss of a single disk can be tolerated. This mirroring is done at the factory as part of the operating system installation.
This step creates a directory named /opt/exportdir on node04 and mounts it on the NameNode and Secondary NameNode. It also exports /opt/exportdir from node04 and mounts it at /opt/shareddir on all nodes of the cluster. During operation of Oracle Big Data Appliance, the NameNode and Secondary NameNode data is also written to /opt/exportdir.
Optionally, this step mounts a directory on an external server on the NameNode and Secondary NameNode so that the data is also written there. The external server and directory must be identified for this purpose in the Oracle Big Data Appliance Configuration Worksheets. You can examine this configuration setting by looking at the value of $external_dir_path in /opt/oracle/bda/puppet/manifests/environment.pp.

Mammoth checks for these requirements:
- Under the specified directory path, a subdirectory must exist with the same name as the cluster. This subdirectory must be owned by root.
- Under this subdirectory, two subdirectories named nn and snn must exist and be owned by user hdfs and group hadoop. The hdfs UID must be the same as the hdfs UID on Oracle Big Data Appliance, and the hadoop GID must be the same as the hadoop GID on Oracle Big Data Appliance.
For example, if the NFS directory is specified in environment.pp as
```
NFS_DIRECTORY=extfiler:/scratch/bda
```
and the cluster name is specified as
```
CLUSTER_NAME=bda1
```
then:
- The /scratch/bda/bda1 directory must exist on EXTFILER and be owned by root.
- The /scratch/bda/bda1/nn and /scratch/bda/bda1/snn directories must exist on EXTFILER and be owned by hdfs in group hadoop.
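The following sketch checks the same requirements from a host where the external directory is visible, such as the filer itself. Its arguments are the base directory, the cluster name, and the numeric hdfs UID and hadoop GID from Oracle Big Data Appliance, which you could obtain on a BDA server with `id -u hdfs` and `getent group hadoop`. It compares numeric IDs because the requirements above are stated in terms of UID and GID. It relies on GNU stat (`stat -c`); check_external_dir and the ROOT_UID override (default 0) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for the Step 11 external-directory requirements.
# Usage: check_external_dir BASE_DIR CLUSTER_NAME HDFS_UID HADOOP_GID
# Requires GNU stat. ROOT_UID (default 0) exists only to ease testing.
check_external_dir() {
    base=$1 cluster=$2 hdfs_uid=$3 hadoop_gid=$4
    top="$base/$cluster"

    # BASE_DIR/CLUSTER_NAME must exist and be owned by root
    [ -d "$top" ] || { echo "missing directory: $top"; return 1; }
    if [ "$(stat -c %u "$top")" != "${ROOT_UID:-0}" ]; then
        echo "$top is not owned by root"
        return 1
    fi

    # nn and snn must exist and be owned by hdfs:hadoop (numeric IDs)
    for sub in nn snn; do
        dir="$top/$sub"
        [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
        if [ "$(stat -c %u:%g "$dir")" != "$hdfs_uid:$hadoop_gid" ]; then
            echo "$dir is not owned by $hdfs_uid:$hadoop_gid"
            return 1
        fi
    done
    echo "external directory layout looks correct: $top"
}
```

On extfiler, `check_external_dir /scratch/bda bda1 "$hdfs_uid" "$hadoop_gid"` would verify the example layout, with the two IDs taken from Oracle Big Data Appliance.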

Step 12 SetupMySQL

Installs and configures MySQL Database. This step creates the primary database and several databases for use by Cloudera Manager on node03. It also sets up replication of the primary database to a backup database on node02.

When this step is complete, you can open MySQL Database and enter the root password when prompted:

```
# mysql -u root -p
mysql> show databases;
```

Step 13 InstallHadoop

Installs all packages in Cloudera's Distribution including Apache Hadoop (CDH) and Cloudera Manager. It then starts the Cloudera Manager server on node02 and configures the cluster.

Step 14 StartHadoopServices

Starts the agents on all nodes and starts all CDH services. After this step, you have a fully functional Hadoop installation.

Cloudera Manager runs on port 7180 of node02. You can open it in a browser, for example:

http://bda1node02.example.com:7180

In this example, bda1node02 is the name of node02 and example.com is the domain. The default user name and password are both admin. Step 18 changes the password if a new one is specified in the Installation Template.
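A small sketch that builds this URL from the host name prefix and domain, following the node02 and port 7180 convention described above. The helper cm_url and the prefix naming are assumptions for illustration. The commented curl line, which uses standard curl options, shows one way to test whether the page responds.

```shell
#!/bin/sh
# Hypothetical helper: Cloudera Manager URL for a rack whose servers are
# named <prefix>node01, <prefix>node02, ... in the given domain.
cm_url() {
    prefix=$1 domain=$2
    printf 'http://%snode02.%s:7180\n' "$prefix" "$domain"
}

# Example probe: -s is silent, -f fails on HTTP errors, -o discards the body.
# curl -sf -o /dev/null "$(cm_url bda1 example.com)" && echo "Cloudera Manager is up"
```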

Step 15 StartHiveService

Starts the Hive service on node03 and copies the Hadoop client configuration to /etc/hadoop/conf on all nodes.

Step 16 InstallBDASoftware

Installs Oracle NoSQL Database Community Edition and the server-side components of Oracle Big Data Connectors, if these options were selected in the Oracle Big Data Appliance Configuration Worksheets. Oracle NoSQL Database must be allocated disk space (54 or 108 TB) and Oracle Big Data Connectors must be licensed separately.

Step 17 SetupASR

Installs and configures Auto Service Request (ASR).

Note:

For this step to run successfully, the ASR host system must be up with ASR Manager running and configured properly. See Chapter 12.

This step does the following:

Installs the required software packages
Configures the trap destinations
Starts the monitoring daemon

To activate the assets from ASR Manager, see "Activating ASR Assets".

Step 18 CleanupInstall

Performs the following:

Changes the root password on all nodes (optional).
Changes the Cloudera Manager password if specified in the Installation Template.
Deletes temporary files created during the installation.
Copies log files from all nodes to subdirectories in /opt/oracle/bda/install/log.
Runs cluster verification checks, including TeraSort, to ensure that everything is working properly. It also generates an install summary. All logs are stored in a subdirectory under /opt/oracle/bda/install/log on node01.

Step 19 CleanupSSHroot (Optional)

Removes passwordless SSH for root that was set up in Step 3.