CHAPTER 2
This chapter explains how to install Sun HPC ClusterTools software on systems running the Solaris OS using the installation utilities supplied with the HPC ClusterTools software. For information about how to install Sun HPC ClusterTools software on a Linux-based system, see Chapter 3.
The following are tips for installing Sun HPC ClusterTools 8.2.1c software on clusters containing hundreds of nodes using the centralized method:
Minimize other system activity during installation – Invoking installation of Sun HPC ClusterTools 8.2.1c software on hundreds of nodes from a central host imposes high demands on system resources. Avoid system resource exhaustion by keeping the cluster nodes as quiescent as possible during the installation.
Use a node list file – For various centralized installation tasks, you specify the nodes on which the task is to be invoked. You can specify the nodes either on the command line, using the –n option, or by referencing a node list file, using the –N option. If you reference a node list file, you enter the node names only once, when you create the file (see the sketch after this list).
Reduce system resource consumption on the central host – You can avoid overtaxing system resources on a single central host by using more than one central host. Simply divide the total list of target nodes into separate node lists, and initiate the installation commands on the various central hosts, with each host using a different node list.
Use the –g option with CLI-initiated tasks – Use the –g option with CLI commands to obtain a list of nodes that successfully executed the command and a separate list of nodes that failed. You can then reference the list of failed nodes with the –N option in a later retry of the command.
Use the –k option with CLI-initiated tasks – Use the –k option with CLI commands to have all logs saved on the central node where the command was initiated. This option makes it unnecessary to go to each node to examine local logs.
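As a simple illustration of these tips, the following sketch creates a node list file and then uses it together with the –g and –k options (the node names and paths are placeholders):
# printf "node1\nnode2\n" > /tmp/nodelist
# ./ctinstall –N /tmp/nodelist –r ssh –g –k /tmp/cluster-logs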
Before you can install and configure the software, you must download the correct software archive for your hardware platform and then extract it to the correct directory. If a previous version of the software is installed, you must perform additional steps to prepare for installation. The following procedure explains these steps.
The following procedure downloads Sun HPC ClusterTools 8.2.1c software to a standard location and prepares it for installation by the ctinstall utility.
Download and extract the archive file containing the Sun HPC ClusterTools software to a location that is visible to all the nodes in the cluster.
If you download the file to a shared file system, ensure that the following conditions are met:
All compute and administrative nodes have access to the shared file system.
The file system is readable by superuser and accessible through a common path from all nodes.
For centralized installations, these conditions must be met on the central host as well.
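As an illustration of the extraction step, assuming the archive has been downloaded to a shared directory (the directory and archive names below are placeholders for your own location and platform archive), you might extract it as follows:
# cd /net/share/downloads
# gunzip -c sun-hpc-ct-8.2.1c-SunOS-sparc.tar.gz | tar xf -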
You can download the correct HPC ClusterTools archive file for your platform from the following location:
Log in as superuser on the system from which you will execute the ClusterTools installation utilities.
Use the ctinstall command to install Sun HPC ClusterTools software. See TABLE 2-1 for a summary of the ctinstall options.
Note - The options –g, –k, –n, –N, –r, and –S are incompatible with local (non-centralized) installations. If the –l option is used with any of these options, an error message is displayed.
By default, ctinstall installs the software into /opt/SUNWhpc/HPC8.2.1c/compiler. You can use the –t option on the ctinstall command line to install the software into another location. The path you specify with the –t option replaces the /opt portion of the default path.
For example, the following command line causes the software to be installed on the local node in a location whose pathname begins with /usr/mpi:
# ./ctinstall –l –t /usr/mpi
The full pathname of the non-standard installation location is /usr/mpi/SUNWhpc/HPC8.2.1c/compiler.
To use this path, you must set both the PATH and OPAL_PREFIX environment variables, specifying the appropriate compiler name in place of compiler. In the following example, a Sun Studio-compiled version of the software has been installed.
# setenv OPAL_PREFIX /usr/mpi/SUNWhpc/HPC8.2.1c/sun
# setenv PATH ${OPAL_PREFIX}/bin:${PATH}
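If you use a Bourne-type shell (sh, ksh, or bash) instead of the C shell shown above, the equivalent settings are:
# OPAL_PREFIX=/usr/mpi/SUNWhpc/HPC8.2.1c/sun
# PATH=$OPAL_PREFIX/bin:$PATH
# export OPAL_PREFIX PATH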
You can choose between two methods of initiating operations on the cluster nodes:
Centralized – Initiate commands from a central host, specifying the nodes on which the command is to take effect. The initiating host establishes remote connections to the target nodes and broadcasts the commands to them over an rsh, ssh, or telnet connection. The central (initiating) host can be part of the cluster or it can be an administrative system external to the cluster.
Local – Initiate commands directly on the node you are logged into. The effects of the command are restricted to the local node.
Support for centralized command initiation is built into the Sun HPC ClusterTools software installation utilities. Issuing these commands from a central host has the same effect as invoking the commands locally on each node using one of the Cluster Console tools: cconsole, ctelnet, or crlogin.
The Sun HPC ClusterTools software CLI utilities provide several options that are specific to the centralized command initiation mode and are intended to simplify management of parallel installation of the software from a central host. These options support:
Creating corresponding versions of local log files on the central host for easier access
Generating a list of nodes that had successful operations and another list of nodes that were unsuccessful. These pass/fail node lists can then be used in subsequent operations, such as software removal.
The initiating system can be one of the cluster nodes or it can be external to the cluster. It must be a Sun system running the Solaris 9 or Solaris 10 Operating System (Solaris OS). Compute nodes must run the Solaris 10 OS.
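As a quick check of the OS level on a node, you can use the uname command; Solaris 9 reports 5.9 and Solaris 10 reports 5.10:
# uname -r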
This section shows examples of HPC ClusterTools software being installed from a central host.
# ./ctinstall –n node1,node2 –r rsh
This command installs the full Sun HPC ClusterTools software suite on node1 and node2 from a central host. The node list is specified on the command line. The remote connection method is rsh, which requires a trusted-hosts setup.
The software will be ready for use when the installation process completes.
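A minimal trusted-hosts setup for the rsh method is an entry for the central host in each target node's /.rhosts file; for example (central-host is a placeholder for the initiating host's name):
central-host root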
# ./ctinstall –n node1,node2 –r ssh
This example is the same as that in the previous section, except that the remote connection method is ssh. This method requires that the initiating node be able to log in as superuser to the target nodes without being prompted for any interaction, such as a password.
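One common way to satisfy this requirement is key-based authentication. The following is a minimal sketch, run as superuser on the central host (node1 is a placeholder; repeat the second step for each target node):
# ssh-keygen -t rsa
# cat ~/.ssh/id_rsa.pub | ssh node1 'cat >> ~/.ssh/authorized_keys'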
# ./ctinstall –N /tmp/nodelist –r telnet
This command installs the full Sun HPC ClusterTools software suite on the set of nodes listed in the file /tmp/nodelist from a central host. A node list file is particularly useful when you have a large set of nodes or you want to run operations on the same set of nodes repeatedly.
The node list file has the following contents:
# Node list for the above example
node1
node2
The remote connection method is telnet. All cluster nodes must share the same password. If some nodes do not use the same password as others, install the software in groups, each group consisting of nodes that use a common password.
The software will be ready for use when the installation process completes.
# ./ctinstall –N /tmp/nodelist –r telnet –k /tmp/cluster-logs –g
The command in this section is the same as that shown in the previous section, except that it includes the –k and –g options.
In this example, the –k option causes the local log files of all specified nodes to be saved in /tmp/cluster-logs on the central host.
The –g option causes a pair of node list files to be created on the central host in /var/sadm/system/logs/hpc/nodelists. One file, ctinstall.pass$$, contains a list of the nodes on which the installation was successful. The other file, ctinstall.fail$$, lists the nodes on which the installation was unsuccessful. The $$ symbol is replaced by the process number associated with the installation.
These generated node list files can then be used for command retries or in subsequent operations using the –N switch.
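For example, if the installation ran as process 12345 (a hypothetical process number), a retry limited to the failed nodes might look like this:
# ./ctinstall –N /var/sadm/system/logs/hpc/nodelists/ctinstall.fail12345 –r telnet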
This section shows examples of HPC ClusterTools software being installed on the local node.
Note - The options –g, –k, –n, –N, –r, and –S are incompatible with local (non-centralized) installations. If the –l option is used with any of these options, an error message is displayed.
# ./ctinstall –l
This command installs the full Sun HPC ClusterTools software suite on the local node only.
# ./ctinstall –l –p SUNWompi,SUNWompimn
The command in this section installs the packages SUNWompi and SUNWompimn on the local node.
Solaris OS Packages lists the packages in the Sun HPC ClusterTools 8.2.1c installation.
The following command installs only the specified software packages.
# ./ctinstall –N /tmp/nodelist –r telnet –p SUNWompi,SUNWompimn
This command installs the packages SUNWompi and SUNWompimn on the set of nodes listed in the file /tmp/nodelist. No other packages are installed. The remote connection method is telnet.
Solaris OS Packages lists the packages in the Sun HPC ClusterTools 8.2.1c installation.
The –p option can be useful if individual packages were not installed on the nodes by ctinstall.
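To check whether individual packages are already present on a node, you can query the Solaris package database with pkginfo; for example:
# pkginfo SUNWompi SUNWompimn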
# ./ctinstall –N /tmp/nodelist –r rsh
This command installs and activates the full Sun HPC ClusterTools software suite on the nodes listed in the file /tmp/nodelist. The remote connection method is rsh.
The following is the Solaris OS package breakdown for the Sun HPC ClusterTools 8.2.1c (Open MPI) release.
You can verify that the software is installed properly by launching a simple non-MPI parallel job using mpirun. In the following example, hostname is the standard Solaris command, which prints the name of the node on which it runs:
% /opt/SUNWhpc/HPC8.2.1c/bin/mpirun hostname
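To exercise several nodes at once, you can point mpirun at the node list file from the earlier examples. The following sketch assumes the standard Open MPI -np and -hostfile options:
% /opt/SUNWhpc/HPC8.2.1c/bin/mpirun -np 4 -hostfile /tmp/nodelist hostname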
The Sun HPC ClusterTools 8.2.1c installation tools log information about installation-related tasks locally on the nodes where installation tasks are performed. The default location for the log files is /var/sadm/system/logs/hpc. If installation tasks are initiated from a central host, a summary log file is also created on the central host.
Two types of log files are created locally on each cluster node where installation operations take place.
Task-specific logs – Separate log files are created for each installation-related task. They are:
These log files contain detailed logging information for the most recent associated task. Each time a task is repeated, its log file is overwritten.
History log – A ct_history.log file is created to store all installation-related tasks performed on the local node. This provides a convenient record of the Sun HPC ClusterTools 8.2.1c software installation history on the local node. Each time a new installation task is performed on the node, a new log entry is appended to the history log.
These node-specific installation log files are created regardless of the installation method used, local or centralized.
When installation tasks are initiated from a central host, a summary log file named ct_summary.log is created on the central host. This log file records the final summary report that is generated by the CLI. The ct_summary.log is not overwritten when a new task is performed. As with the ct_history.log file, new entries are appended to the summary log file.
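For example, assuming the default log location, you might review the history log on a node or the summary log on the central host as follows:
# tail /var/sadm/system/logs/hpc/ct_history.log
# tail /var/sadm/system/logs/hpc/ct_summary.log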
This section describes how to remove Sun HPC ClusterTools software using the ctremove utility. See Table 1 for a summary of the ctremove options.
This section shows the basic steps involved in removing Sun HPC ClusterTools software from one or more platforms.
# cd $INSTALL_LOC/SUNWhpc/HPC8.2.1c/bin/Install_Utilities/bin
# ctremove options
$INSTALL_LOC is the location of the software that will be removed.
Note - If any nodes are active at the time ctremove is initiated, they will be deactivated automatically before the removal process begins.
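For example, a minimal local removal from the default installation location might look like the following sketch, which assumes that ctremove accepts the same –l (local) option as ctinstall:
# cd /opt/SUNWhpc/HPC8.2.1c/bin/Install_Utilities/bin
# ./ctremove –l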
This section shows examples of software removal in which the ctremove command is initiated from a central host.
Remove the Software From Specified Nodes and Generate Log Files
Use the -k option to direct log files to a central location and the -g option to generate lists of successful and unsuccessful node removals.
# ./ctremove –N /tmp/nodelist –r rsh –k /tmp/cluster-logs –g
This command example is the same as the one in the previous section, except that it specifies the –k and –g options in addition to –N and –r.