Oracle® Big Data Appliance Owner's Guide
Release 1 (1.0.3)
Part Number E25960-05
This chapter explains how to use the Mammoth Utility to install the software on Oracle Big Data Appliance.
The Mammoth Utility installs and configures the software on Oracle Big Data Appliance using the files generated by the Oracle Big Data Appliance Configuration Utility. At a minimum, Mammoth installs and configures Cloudera's Distribution including Apache Hadoop. This includes all the Hadoop software and Cloudera Manager, which is the tool for administering your Hadoop cluster. Mammoth will optionally install and configure Oracle NoSQL Database and, if you have a license, all components of Oracle Big Data Connectors.
In addition to installing the software across all servers in the rack, the Mammoth Utility creates the required user accounts, starts the correct services, and sets the appropriate configuration parameters. When it is done, you have a fully functional, highly tuned Hadoop cluster up and running.
You must run the Mammoth Utility once for each rack.
For one Oracle Big Data Appliance rack that forms one Hadoop cluster, follow the procedure in "Installing the Software on a Single or Primary Rack".
For multiple racks where each rack forms an independent Hadoop cluster, follow the procedure in "Installing the Software on a Single or Primary Rack" for each rack.
For multiple racks that form a single, multirack Hadoop cluster:
Identify the primary rack of the cluster, then follow the procedure in "Installing the Software on a Single or Primary Rack".
For the other racks of the cluster, follow the procedure in "Adding a Rack to an Existing Cluster".
Follow this procedure to install and configure the software on a single Oracle Big Data Appliance rack or on the primary rack of a multiple-rack cluster.
Verify that the Oracle Big Data Appliance rack is configured according to the custom network settings described in /opt/oracle/bda/BdaDeploy.json. If the rack is still configured to the factory default IP addresses, first perform the network configuration steps described in "Configuring the Network".
Verify that the software is not installed on the rack already. If the software is installed and you want to reinstall it:
Caution: These steps result in the loss of all data stored in HDFS. Consider your options carefully before proceeding.
Copy BDAMammoth-version.run to any directory on node01 (such as /tmp). You can download this file from the same location as the base image file. See the procedures under "Reinstalling the Base Image".
Log in to node01 as root and decompress the BDAMammoth-version.run self-extracting file. This example extracts Mammoth version 1.0.3 in the /tmp directory:
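The extraction commands are not shown above; a session along these lines would decompress the bundle (the exact file name for your version may differ, and chmod is unnecessary if the file is already executable):

```
# cd /tmp
# chmod +x BDAMammoth-1.0.3.run
# ./BDAMammoth-1.0.3.run
```

After extraction, the mammoth command is run from the /opt/oracle/BDAMammoth directory, as described in "Mammoth Utility Syntax".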
Copy mammoth-rack_name.params to the current directory. See "About the Configuration Files".
Run the mammoth command with the appropriate option. See Table 13-1. This sample command runs steps 1 and 2 on rack bda2:
./mammoth -r 1-2 bda2
The Mammoth Utility stores the current configuration in the /opt/oracle/bda/install/state directory. Do not delete the files in this directory. If you need to use the Mammoth Utility again, such as when adding a rack to the cluster, it fails without this information.
You must change to the /opt/oracle/BDAMammoth directory to use the Mammoth Utility. It has this syntax:
./mammoth option [rack_name]
In this command, rack_name is the name of an Oracle Big Data Appliance rack. You must enter the rack name in the first command exactly as it appears in the configuration file name (mammoth-rack_name.params). Afterward, rack_name defaults to the rack specified in a previous command.
You must finish installing one rack before starting the installation of another rack.
Table 13-1 lists the Mammoth Utility options.
-h: Displays command Help, including command usage and a list of steps.
-i: Runs all mandatory steps, equivalent to running all steps in order on a new rack.
-l: Lists the steps of the Mammoth Utility.
-r n-N: Runs steps n through N of the Mammoth Utility, stopping if an error occurs.
-s n: Runs step n.
-u: Uninstalls all software from all racks in an existing Hadoop cluster. This option results in the loss of all data.
-v: Displays the version number of the Mammoth Utility.
Each step generates a detailed log file listing the actions performed on each server and whether the step completed successfully. If an error occurs, the script stops. You can then check the log files in /opt/oracle/BDAMammoth/bdaconfig/tmp. The log files are named in this format:
In this format, i is the step number and yyyymmddhhmmss identifies the year, month, day, hour, minute, and second that the file was created.
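As an illustration of the naming scheme, the sketch below builds and parses such a name. Only the i and yyyymmddhhmmss fields are documented here; the STEP- prefix is a hypothetical stand-in for the actual file-name format:

```shell
# Build a log file name for step 4 with the current timestamp.
# The "STEP-" prefix is an assumption, not the documented format.
ts=$(date +%Y%m%d%H%M%S)
name="STEP-4-${ts}.log"

# Recover the step number and timestamp from the name.
step=$(echo "$name" | sed -E 's/^STEP-([0-9]+)-[0-9]{14}\.log$/\1/')
stamp=$(echo "$name" | sed -E 's/^STEP-[0-9]+-([0-9]{14})\.log$/\1/')
echo "step=$step"
echo "stamp_length=${#stamp}"
```

The 14-digit timestamp encodes year, month, day, hour, minute, and second, so names sort chronologically within a step.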
After fixing the problem, you can rerun all steps or a range of steps. You cannot skip steps or run them out of order.
Each multirack cluster has one rack designated as the primary rack. Whether a rack is the primary one is indicated in the Oracle Big Data Appliance Configuration Worksheets and specified in the mammoth-rack_name.params file. Each rack of a multirack Hadoop cluster has a separate mammoth-rack_name.params file.
To install the software on additional racks in the same cluster:
Install the software on the primary rack of the Hadoop cluster. See "Installing the Software on a Single or Primary Rack".
Ensure that all racks are running the same software version. See "About Software Version Differences".
Ensure that all racks that form a single Hadoop cluster are cabled together. See Chapter 9.
Copy the mammoth-rack_name.params files of the non-primary racks to node01 (the bottom server) of the primary rack. Do not copy them to the non-primary racks.
Log in as root to node01 of the primary rack and change to the BDAMammoth directory:
# cd /opt/oracle/BDAMammoth
Note: Always start Mammoth from the primary rack.
For each non-primary rack, issue the mammoth command with the appropriate option. See "Mammoth Utility Syntax". For example, this command starts the installation on rack bda4:
./mammoth -i bda4
The primary rack of a multirack Hadoop cluster is configured the same as a single Hadoop cluster. It runs the NameNode, Secondary Name Node, Hue, Hive, and other key services. The other racks of a multirack Hadoop cluster are configured differently. They only run the DataNodes and TaskTrackers.
Oracle Big Data Connectors are installed on all nodes of the non-primary racks although no services run on them. Oracle Data Integrator agent still runs on node03 of the primary rack. You cannot add nodes to an Oracle NoSQL Database cluster after it is set up. However, a logical volume is created on the additional rack for future use when nodes can be added to an Oracle NoSQL Database cluster.
The Mammoth Utility obtains the current configuration from the files stored in /opt/oracle/bda/install/state. If those files are missing or if any of the services have been moved manually to run on other nodes, then the Mammoth Utility fails.
A new Oracle Big Data Appliance rack may be factory-installed with a newer base image than the previously installed racks. All racks configured as one Hadoop cluster must have the same image. Use the imageinfo utility on any server to get the image version. Only when all racks of a single Hadoop cluster have the same image version can you proceed to install the software on the new rack.
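The comparison itself can be scripted. This is a hedged sketch that assumes you have already collected the version string reported by imageinfo on each server; the values below are example data:

```shell
# Image versions reported by imageinfo on each server (example data).
versions="1.0.3
1.0.3
1.0.3"

# The rack images match only if exactly one distinct version appears.
distinct=$(printf '%s\n' "$versions" | sort -u | wc -l)
if [ "$distinct" -eq 1 ]; then
  echo "images match"
else
  echo "image mismatch: upgrade or reimage before installing"
fi
```

If a mismatch is reported, follow the synchronization steps below before running Mammoth.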
To synchronize the new rack with the rest of the Hadoop cluster, either upgrade the existing cluster to the latest image version or downgrade the image version of the new rack.
To downgrade the image version:
Reimage the new rack to the older version installed on the cluster. See My Oracle Support Master Note 1434477.1 and its related notes.
Use the older version of the Oracle Big Data Appliance Configuration Utility to generate the configuration files.
Use the older version of the Mammoth Utility to install the software.
Following are descriptions of the steps that the Mammoth Utility performs when installing the software.
Validates the configuration files.
Displays a road map of the planned system, including this information:
Location of important nodes in the system
List of partitions for HDFS directories
List of disks reserved for Oracle NoSQL Database
Ports used by various components
Location of services
User names and initial passwords
This step, and every subsequent step, stores information in a file named /opt/oracle/bda/environment.pp. Check the contents of this file now to ensure that the environment generated by the Mammoth Utility appears correct.
This step also generates a file named passwords.pp. It contains the passwords for various software components that run under an operating system user identity. Operating system root passwords are not written to disk. The last step of the installation removes passwords.pp.
Sets up a Secure Shell (SSH) for the root user so you can connect to all addresses on the administrative network without entering a password.
This step performs several tasks:
Generates /etc/hosts from the configuration file and copies it to all servers so they use the InfiniBand connections to communicate internally. The file maps private IP addresses to public host names.
Sets up passwordless SSH for the root user on the InfiniBand network.
Sets up an alias to identify the node where the Mammoth Utility is run as the puppet master node. For example, if you run the Mammoth Utility from bda1node01 with an IP address 192.168.41.1, then a list of aliases for that IP address includes bda1node01-master. The Mammoth Utility uses Puppet for the software installation; the next step describes Puppet in more detail.
Checks the network timing on all nodes. If the timing checks fail, then there are unresolved names and IP addresses that will prevent the installation from running correctly. Fix these issues before continuing with the installation.
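The /etc/hosts entries and the -master alias described above might look like the following excerpt. The addresses and host names are illustrative only, reusing the 192.168.41.1 and bda1node01 examples from this section:

```
# /etc/hosts excerpt (illustrative)
192.168.41.1   bda1node01.example.com   bda1node01   bda1node01-master
```

Each server receives a copy of this file so that internal traffic uses the InfiniBand addresses.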
This step configures puppet agents on all nodes and starts them, configures a puppet master on the node where the Mammoth Utility is being run, waits for the agents to submit their certificates, and automates their signing. After this step is completed, Puppet can deploy the software.
Puppet is a distributed configuration management tool that is commonly used for managing Hadoop clusters. The puppet master is a parent service and maintains a Puppet repository. A puppet agent operates on each Hadoop node.
A file named /etc/puppet/puppet.conf resides on every server and identifies the location of the puppet master.
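The file uses Puppet's standard INI-style layout. A minimal sketch of what it might contain follows; the exact entries are an assumption, and the -master alias comes from the previous step:

```ini
# /etc/puppet/puppet.conf (illustrative)
[main]
    # Points each agent at the node where Mammoth runs the puppet master
    server = bda1node01-master
```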
Puppet operates in two modes:
Periodic pull mode, in which the puppet agents periodically contact the puppet master and ask for an update, or
Kick mode in which the puppet master alerts the puppet agents that a configuration update is available, and the agents then ask for the update. Puppet operates in kick mode during the Mammoth Utility installation.
In both modes, the puppet master must trust the agent. To establish this trust, the agent sends a certificate to the puppet master node where the sys admin process signs it. When this transaction is complete, the puppet master sends the new configuration to the agent.
For subsequent steps, you can check the Puppet log files on each server, as described in "What If an Error Occurs During the Installation?".
Installs the most recent Oracle Big Data Appliance image and system parameter settings.
The disk space allocated to Oracle NoSQL Database in the configuration files determines what this step does:
0 terabytes: This step does nothing.
54 terabytes: The disk space is allocated across the cluster using one disk on each node. The disk mounted at /u12 is used for the logical volume.
108 terabytes: The disk space is allocated across the cluster using two disks on each node. The disks mounted at /u11 and /u12 are used for the logical volume.
After this step finishes, the Linux file systems table in /etc/fstab shows the logical disks instead of the physical disks they represent.
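An /etc/fstab entry for such a logical disk might look as follows. The device and mount-point names here are hypothetical; the actual names depend on the volume group this step creates:

```
# Logical volume for Oracle NoSQL Database replacing a physical disk entry
/dev/mapper/lvg1-lv1   /lv1   ext4   defaults,nofail   0 0
```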
The various packages installed in later steps also create users and groups during their installation.
See Also: Oracle Big Data Appliance Software User's Guide for more information about users and groups.
The NameNode and Secondary NameNode data is copied to multiple places to prevent the loss of this critical information if a disk, or the entire node where they are set up, fails. The data is replicated during normal operation as follows:
The NameNode and Secondary NameNode data is written to a partition that is mirrored, so the loss of a single disk can be tolerated. This mirroring is done at the factory as part of the operating system installation.
This step creates a directory named /opt/exportdir on node04 and mounts it on the NameNode and Secondary NameNode. It also exports /opt/exportdir from node04 and mounts it at /opt/shareddir on all nodes of the cluster. During operation of Oracle Big Data Appliance, the NameNode and Secondary NameNode data is also written to /opt/exportdir.
Optionally, this step mounts on the NameNode and Secondary NameNode a directory on an external server so that the data is also written there. The external server and directory must be identified for this purpose in the Oracle Big Data Appliance Configuration Worksheets. You can examine this configuration setting by looking at the value of $external_dir_path in /opt/oracle/bda/puppet/manifests/environment.pp.
Mammoth checks for these requirements:
Under the specified directory path, a subdirectory must exist with the same name as the cluster. This subdirectory must be owned by user hdfs and group hadoop.
Under this subdirectory, two subdirectories named nn and snn must exist and be owned by user hdfs and group hadoop.
On the external server, the hdfs UID must be the same as the hdfs UID on Oracle Big Data Appliance, and the hadoop GID must be the same as the hadoop GID on Oracle Big Data Appliance.
For example, if the NFS directory is specified in environment.pp as /scratch/bda on a server named EXTFILER, and the cluster name is specified as bda1, then:
The /scratch/bda/bda1 directory must exist on EXTFILER and be owned by user hdfs in group hadoop.
The /scratch/bda/bda1/nn and /scratch/bda/bda1/snn directories must exist on EXTFILER and be owned by user hdfs in group hadoop.
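The layout from this example can be sketched as follows. A temporary directory stands in for /scratch/bda on EXTFILER, and the chown, which requires root and matching hdfs/hadoop IDs on the filer, is left as a comment:

```shell
# Stand-in for the exported directory on the external server.
base=$(mktemp -d)
cluster=bda1

# Create the cluster subdirectory and its nn/snn subdirectories.
mkdir -p "$base/$cluster/nn" "$base/$cluster/snn"

# On the real filer, ownership must also be set to match the appliance:
#   chown -R hdfs:hadoop "$base/$cluster"

ls "$base/$cluster"
```

Verifying UID and GID equality between the filer and the appliance (for example, by comparing the output of id hdfs on both) is what allows files written over NFS to remain readable by the Hadoop services.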
Installs and configures MySQL Database. This step creates the primary database and several databases for use by Cloudera Manager on node03. It also sets up replication of the primary database to a backup database on node02.
When this step is complete, you can open MySQL Database:
# mysql -uroot -p password
mysql> show databases;
Installs all packages in Cloudera's Distribution including Apache Hadoop (CDH) and Cloudera Manager. It then starts the Cloudera Manager server on node02 and configures the cluster.
Starts the agents on all nodes and starts all CDH services. After this step, you have a fully functional Hadoop installation.
Cloudera Manager runs on port 7180 of node02. You can open it in a browser, for example:
http://bda1node02.example.com:7180
In this example, bda1node02 is the name of node02 and example.com is the domain. The default user name and password is admin, which is changed in Step 18.
Starts the Hive service on node03 and copies the Hadoop client configuration to /etc/hadoop/conf on all nodes.
Installs Oracle NoSQL Database Community Edition and the server-side components of Oracle Big Data Connectors, if these options were selected in the Oracle Big Data Appliance Configuration Worksheets. Oracle NoSQL Database must be allocated disk space (54 or 108 TB) and Oracle Big Data Connectors must be licensed separately.
Installs and configures Auto Service Request (ASR).
Note:For this step to run successfully, the ASR host system must be up with ASR Manager running and configured properly. See Chapter 12.
This step does the following:
Installs the required software packages
Configures the trap destinations
Starts the monitoring daemon
To activate the assets from ASR Manager, see "Activating ASR Assets".
Performs the following:
Changes the root password on all nodes (optional).
Changes the Cloudera Manager password if specified in the Installation Template.
Deletes temporary files created during the installation.
Copies log files from all nodes to subdirectories in /opt/oracle/bda/install/log.
Runs cluster verification checks, including TeraSort, to ensure that everything is working properly. It also generates an install summary. All logs are stored in a subdirectory under /opt/oracle/bda/install/log on node01.
Removes passwordless SSH for root that was set up in Step 3.