Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE

Appendix A Installing and Removing the Software

This appendix includes instructions for installing and removing Sun HPC ClusterTools 3.0 software at the command line.

Installing at the Command Line

The easiest way to configure and install Sun HPC ClusterTools 3.0 software is to use the configuration tool, install_gui, as described in the Sun HPC ClusterTools 3.0 Installation Guide. If you prefer, however, you may install the software from the command line as described in this appendix, with a few references to the installation guide.

Figure A-1 summarizes the steps involved. The solid lines identify tasks that are always performed. The dashed lines indicate special-case tasks.

Figure A-1 Installing Sun HPC ClusterTools 3.0 Software at the Command Line

Before Installation

Before installing Sun HPC ClusterTools 3.0 software, you need to ensure that the hardware and software that make up your cluster meet certain requirements. You must have already installed LSF 3.2.3. Further requirements are outlined in the Sun HPC ClusterTools 3.0 Installation Guide. Review them before proceeding with the instructions in this appendix.

If you are installing the software on a cluster of more than 16 nodes, you will probably want to use the CCM tools to make installation easier. These tools can install on up to 16 nodes at a time, so for a larger cluster you must install the software first on one group of up to 16 nodes and then repeat the installation on additional groups of up to 16 nodes until the software is installed on every node in the cluster. Create a separate configuration file for each group of nodes, giving each file a unique name, such as hpc_config1, hpc_config2, and so on.

The hpc_config File

Many aspects of the Sun HPC ClusterTools 3.0 installation process are controlled by a configuration file called hpc_config, which is similar to the lsf_config file used to install LSF 3.2.3.

Instructions for accessing and editing hpc_config are provided in "Accessing hpc_config" and "Editing hpc_config".

Accessing hpc_config

Use a text editor to edit the hpc_config file directly. This file must be located in a directory within a file system that is mounted with read/write/execute access on all the other nodes in the cluster. A template for hpc_config is provided on the Sun HPC ClusterTools 3.0 distribution CD-ROM to simplify creation of this file.

Before starting the installation process, choose a node to function as the installation platform and a directory on that node to hold hpc_config. Copy the template to that directory and edit it so that it satisfies your site-specific installation requirements.


Note -

The directory containing hpc_config must be read/write/execute accessible (777 permissions) by all the nodes in the cluster.


The hpc_config template is located in

/cdrom/hpc_3_0_ct/Product/Install_Utilities/config_dir/hpc_config

To access hpc_config on the distribution CD-ROM, perform the following steps on the node chosen to be the installation platform:

  1. Mount the CD-ROM path on all the nodes in the cluster.

  2. Load the Sun HPC ClusterTools distribution CD-ROM in the CD-ROM drawer.

  3. Copy the configuration template onto the node.


    # cd config_dir_install
    # cp /cdrom/hpc_3_0_ct/Product/Install_Utilities/config_dir/hpc_config .

    config_dir_install is a variable representing the directory where the configuration files will reside; all cluster nodes must be able to read from and write to this directory.

  4. Edit the hpc_config file according to the instructions provided in the next section.
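Putting these steps together, the sequence on the installation node might look like the following sketch. The directory name /export/hpc_config is only an example; substitute the config_dir_install directory you have chosen, and remember that it must be read/write/execute accessible (777 permissions) by all the nodes in the cluster.

# mkdir /export/hpc_config
# chmod 777 /export/hpc_config
# cd /export/hpc_config
# cp /cdrom/hpc_3_0_ct/Product/Install_Utilities/config_dir/hpc_config .
# vi hpc_config

Any text editor can be used in place of vi for the final step.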

If You Have Already Installed the Software

If you have already installed the software, you can find a copy of the hpc_config template in the directory /opt/SUNWhpc/bin/Install_Utilities/config_dir.

If you are editing an existing hpc_config file after installing the software using the graphical installation tool, the hpc_config file created by the tool will not contain the comment lines included in the template.

Editing hpc_config

Example A-1 shows the basic hpc_config template, but without most of the comment lines provided in the online template. The template is simplified here to make it easier to read and because each section is discussed in detail following Example A-1. Two examples of edited hpc_config files follow the general description of the template.

The template comprises five sections:

For the purposes of initial installation, ignore the fifth section.

Supported Software Installation

LSF Support

You will be using the software with LSF, so enter yes here.


LSF_SUPPORT="yes"

Since you will be using LSF, complete only Part A of this section.

LSF Parameter Modification

Allowing the Sun HPC installation script to modify LSF parameters optimizes HPC job launches. Your choice for this variable must be yes or no.


MODIFY_LSF_PARAM="choice"

Name of the LSF Cluster

Before installing Sun HPC ClusterTools software, you must have installed LSF 3.2.3. When you installed the LSF software, you selected a name for the LSF cluster. Enter this name in the LSF_CLUSTER_NAME field.


LSF_CLUSTER_NAME="clustername"

General Installation Information

All installations must complete this section. If you are installing the software locally on a single-node cluster, you can stop after completing this section.

Type of Installation

Three types of installation are possible for Sun HPC ClusterTools 3.0 software:

Specify one of the installation types: nfs, smp-local, or cluster-local. There is no default type of installation.


INSTALL_CONFIG="config_choice"

Installation Location

The way the INSTALL_LOC path is used varies, depending on which type of installation you have chosen.

You must enter a full path name. The default location is /opt. The location must have read/write (755) permissions set (or, for an NFS installation, mounted) on all the nodes in the cluster.


INSTALL_LOC="/opt"

If you choose an installation directory other than the default /opt, a symbolic link is created from /opt/SUNWhpc to the chosen installation point.
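For example, if you set the installation location to a path such as /export/hpc (a hypothetical directory used here only for illustration):

INSTALL_LOC="/export/hpc"

the packages are installed under /export/hpc, and /opt/SUNWhpc is created as a symbolic link to the chosen installation point, so references to /opt/SUNWhpc continue to work.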

CD-ROM Mount Point

Specify a mount point for the CD-ROM. This mount point must be mounted on (that is, NFS-accessible to) all the nodes in the cluster. The default mount point is /cdrom/hpc_3_0_ct. For example:


CD_MOUNT_PT="/cdrom/hpc_3_0_ct"

Information for NFS and Cluster-Local Installations

If you are installing the software either on an NFS server for remote mounting or locally on each node of a multinode cluster, you need to complete this section.

Installation Method Options

Specify either rsh or cluster-tool as the method for propagating the installation to all the nodes in the cluster.

Also note that the rsh method requires that all nodes are trusted hosts--at least during the installation process.


INSTALL_METHOD="method"

Hardware Information

There are two ways to enter information in this section:

In each triplet, specify the host name of a node, followed by the host name of the terminal concentrator and the port ID on the terminal concentrator to which that node is connected. Separate the triplet fields with virgules (/). Use spaces between node triplets.
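For example, an entry for two nodes attached to a terminal concentrator might look like the following sketch (the node names, concentrator name, and port IDs are hypothetical):

NODES="node0/tc0/5002 node1/tc0/5003"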

Every node in your Sun HPC cluster must also be in the corresponding LSF cluster. See the discussion of the lsf.cluster.clustername configuration file in the LSF Batch Administrator's Guide for information on LSF clusters.


Note -

If you will not be using the CCM tools, you can allow the installation script to derive the node list from the LSF configuration file lsf.cluster.clustername. To do this, either set the NODES variable to NULL or leave the line commented out. You must be installing from one of the nodes in the LSF cluster.


SCI Support

This section tells the script whether to install the SCI-related packages. If your cluster includes SCI, replace choice with yes; otherwise, replace it with no.


INSTALL_SCI="yes"

A yes entry causes the three SCI packages and two RSM packages to be installed in the /opt directory. A no causes the installation script to skip the SCI and RSM packages.


Note -

The SCI and RSM packages are installed locally on every node, not on an NFS server.


Information for NFS Installations Only

You need to complete this section only if you are installing the software on an NFS server.

NFS Server Host Name

The format for setting the NFS server host name is the same as for setting the host names for the nodes in the cluster. There are two ways to define the host name of the NFS server:

The NFS server can be one of the cluster nodes or it can be external (but connected) to the cluster. If the server will be part of the cluster--that is, will also be an execution host for the Sun HPC ClusterTools software--it must be included in the NODES field described in "Hardware Information". If the NFS server will not be part of the cluster, it must be available from all the hosts listed in NODES, but it should not be included in the NODES field.

Location of the Software on the Server

If you want to install the software on the NFS server in the same directory as the one specified in INSTALL_LOC, leave INSTALL_LOC_SERVER empty (""). If you prefer, you can override INSTALL_LOC by specifying an alternative directory in INSTALL_LOC_SERVER.


INSTALL_LOC_SERVER="directory"

Recall that the directory specified in INSTALL_LOC defines the mount point for INSTALL_LOC_SERVER on each NFS client.
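As a sketch of how the two variables work together, suppose the software is placed in /export/hpc on the NFS server (a hypothetical path) while the clients use the default mount point:

INSTALL_LOC="/opt"
INSTALL_LOC_SERVER="/export/hpc"

With these settings, the packages are installed in /export/hpc on the server, and each NFS client mounts that directory at /opt.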

Sample hpc_config Files

Example A-2 and Example A-3 illustrate the general descriptions in the preceding sections with edited hpc_config files representing two different types of installations.

Local Install - Example A-2 shows how the file would be edited for a local installation on every node in a cluster. The main characteristics of the installation illustrated by Example A-2 are summarized below:

For the purposes of initial installation, ignore the fifth section.
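A minimal sketch of how such a cluster-local hpc_config might be edited is shown below, assuming a two-node cluster; the LSF cluster name carrera, the node names node0 and node1, and the terminal concentrator tc0 are hypothetical, and only the fields discussed in the preceding sections are shown:

LSF_SUPPORT="yes"
MODIFY_LSF_PARAM="yes"
LSF_CLUSTER_NAME="carrera"

INSTALL_CONFIG="cluster-local"
INSTALL_LOC="/opt"
CD_MOUNT_PT="/cdrom/hpc_3_0_ct"

INSTALL_METHOD="rsh"
NODES="node0/tc0/5002 node1/tc0/5003"
INSTALL_SCI="no"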

NFS Install - Example A-3 shows an hpc_config file for an NFS installation. The main features of this installation example are summarized below:

This example shows the nodes venice, napoli, and pisa all connected to the terminal concentrator rome via ports 5002, 5003, and 5004.

In this case, the NFS server is not one of the nodes in the Sun HPC cluster. All the nodes in the cluster must be able to communicate with it over a network.

For the purposes of initial installation, ignore the fifth section.

Run cluster_tool_setup


Note -

You can use the CCM tools to install on up to 16 nodes at a time. For clusters with more than 16 nodes, you will have to repeat the installation process on groups of up to 16 nodes at a time until you have installed the software on the entire cluster.


This step is optional. If you have chosen the cluster-tool method of installation and plan to use the CCM tools, you need to run the cluster_tool_setup script first. This loads the CCM administration tools onto a machine and creates a cluster configuration file that is used by CCM applications. See Appendix B for a description of the three CCM applications, cconsole, ctelnet, and crlogin.


Note -

cconsole requires the nodes to be connected to a terminal concentrator. The other two, ctelnet and crlogin, do not.


If you want to use cconsole to monitor messages generated while rebooting the cluster nodes, you will need to launch it from a machine outside the cluster. If you launch it from a cluster node, it will be disabled when the node from which it is launched reboots.

Perform the following steps, as root, to run cluster_tool_setup.

  1. Go to the Product directory on the Sun HPC ClusterTools 3.0 distribution CD-ROM.

    Note that this directory must be mounted on (accessible by) all nodes in the cluster.


    # cd /cdrom/hpc_3_0_ct/Product/Install_Utilities
    

  2. If you are running on a node within the cluster, perform Step a. If you are running on a machine outside the cluster, perform Step b.

    a. Within the cluster, run cluster_tool_setup -c.

      Run cluster_tool_setup; use the -c tag to specify the directory containing the hpc_config file.


      # ./cluster_tool_setup -c /config_dir_install
      

    b. Outside the cluster, run cluster_tool_setup -c -f.

      Run cluster_tool_setup; use the -c tag to specify the directory containing the hpc_config file, plus a trailing -f tag.


      # ./cluster_tool_setup -c /config_dir_install -f
      

  3. Set the DISPLAY environment variable to the machine on which you will be running the CCM tools.


    # setenv DISPLAY hostname:0
    

    (This example uses C-shell syntax.)
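    If you are working in the Bourne shell rather than the C shell, the equivalent (with hostname again standing for the machine whose display you are using) would be:

    # DISPLAY=hostname:0
    # export DISPLAY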

  4. Invoke the CCM tool of your choice: cconsole (if the nodes are connected to a terminal concentrator), ctelnet, or crlogin.

    All three tools reside in /opt/SUNWcluster/bin. For example,


    # /opt/SUNWcluster/bin/ctelnet clustername
    

    where clustername is the name of the LSF cluster. All three CCM tools require the name of the cluster as an argument.

    The CCM tool then creates a Common Window and separate Term Windows for all the nodes in the cluster.

  5. Position the cursor in the Common Window and press Return.

    This activates a prompt in each Term Window. Note that the Common Window does not echo keyboard entries. These appear only in the Term Windows.

You can now use CCM to remove previous release packages, as described in "Removing and Reinstalling Individual Packages", or to install the software packages, as described in "Installing Software Packages".

Installing Software Packages

This section describes the procedure for installing the Sun HPC ClusterTools packages. Note that the exact procedure for each step will depend on which installation mode you are in, cluster-tool or rsh.

See Appendix B for more information about the CCM tools that are available to you in cluster-tool mode.


Note -

The hpc_install command writes various SYNC files in the directory containing its configuration file as part of the package installation. If the installation process stops prematurely--if, for example, you press Ctrl-c--some SYNC files may be left. You must remove these files before executing hpc_install again so they don't interfere with the next software installation session.
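For example, assuming the leftover files have SYNC in their names and the configuration directory is /config_dir_install, a command along these lines would remove them:

# rm /config_dir_install/*SYNC*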


  1. Log in to each node as root.

  2. Go to the Product directory on the Sun HPC ClusterTools 3.0 distribution CD-ROM.

    Note that this directory must be mounted on (accessible by) all nodes in the cluster.


    # cd /cdrom/hpc_3_0_ct/Product/Install_Utilities
    

  3. Run hpc_install.


    # ./hpc_install -c /config_dir_install
    

    where config_dir_install represents the directory containing the hpc_config file.

    The -c tag causes hpc_install to look for a file named hpc_config in the specified directory. If you want to install the software using a configuration file with a different name, you must specify a full path including the new file name after the -c tag.
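    For example, to install using a configuration file named hpc_config2 (one of the names suggested earlier for clusters installed in groups of nodes), you might enter:

    # ./hpc_install -c /config_dir_install/hpc_config2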


    Note -

    If the hpc_config file contains an INSTALL_SCI="yes" entry, hpc_install will install the three SCI software packages along with the other Sun HPC ClusterTools packages. When the SCI packages are installed, the installation script will display a message telling you to reboot the nodes. Ignore this message. You must reboot the nodes only after any SCI interface cards are configured. If the system does not include SCI hardware, the nodes do not need to be rebooted.


Removing the Software

To remove LSF, see the documentation that came with the software.

The easiest way to remove Sun HPC ClusterTools 3.0 software is by using the configuration tool, install_gui. See the next section for details. If you prefer to remove the software at the command line, you may do so using the provided removal scripts. See "Removing the Software: Command Line" for instructions.

Removing the Software: Configuration Tool

  1. Locate a configuration file or files for the cluster.

    To remove the software from your cluster, you will need a configuration file that describes the cluster. Ideally you should use the configuration file you created when installing the software. If you cannot locate that file, you will have to create one. You can use the configuration tool to create the file. (See Chapter 3 of the Sun HPC ClusterTools 3.0 Installation Guide.)


    Note -

    The configuration tool will remove the software from up to 16 nodes at once. If you need to remove software from a cluster of more than 16 nodes, you must remove it first from one group of up to 16 nodes and then repeat the removal process on additional groups of nodes until you have removed the software from all the nodes in the cluster. The procedure is similar to installing the software on a cluster of more than 16 nodes. See Section 3.1.2 of the installation guide for more information.


  2. Load the Sun HPC ClusterTools 3.0 CD-ROM in the CD-ROM drawer.

    The CD-ROM mount point must be mounted on all the nodes in the cluster.

  3. Enable root login access.

    By default, most systems allow logins by root only on their console devices. To enable root login access during software removal, you must edit the /etc/default/login file on each node in the cluster. In this file on each node, find this line:


    CONSOLE=/dev/console

    and make it into a comment by adding a # before it:


    #CONSOLE=/dev/console

    After removing the software, you should disable root login access again if your site's security guidelines require it.

  4. As root, launch the install_gui tool with the configuration file.

    You can load the configuration file either from the command line or from within the tool after it has been launched.

    • At the command line, launch the configuration tool using the name of the configuration file as an argument:


      # /cdrom/hpc_3_0_ct/Product/Install_Utilities/install_gui hpc_config
      

    • Alternatively, you can load the configuration file after launching the tool by choosing Load from the File menu.

  5. Select the Remove task and click on the Go button.

For help using the configuration tool, choose Help with Configuration Tool from the Help menu.

Removing the Software: Command Line

  1. Locate a configuration file or files for the cluster.

    To remove the software from your cluster, you will need a configuration file that describes the cluster. Ideally you should use the configuration file you created when installing the software. If you cannot locate that file, you will have to create one.


    Note -

    You can use the CCM tools to remove the software from up to 16 nodes at a time. For clusters with more than 16 nodes, you will have to repeat the removal process on groups of up to 16 nodes at a time until you have removed the software from the entire cluster.


  2. Place the Sun HPC ClusterTools 3.0 distribution CD-ROM in the CD-ROM drive.

  3. Go to the directory on the CD-ROM containing the release packages.

    This directory must be mounted with read/execute permissions (755) on all the nodes in the cluster:


    # cd /cdrom/hpc_3_0_ct/Product/Install_Utilities/
    

  4. Run hpc_remove; use the -c option to specify the directory containing the hpc_config file.


    # ./hpc_remove -c /config_dir_install 
    

    The -c tag causes hpc_remove to look for a file named hpc_config in the specified directory. If you want to remove the software using a configuration file with a different name, you must specify a full path including the new file name after the -c tag.

Removing and Reinstalling Individual Packages

To remove a single package and install (or reinstall) another package in its place, perform the following steps:

# ./hpc_remove -c config_dir_install -d PACKAGE_NAME
# ./hpc_install -c config_dir_install -d location_of_package/PACKAGE_NAME

For example:

# cd /cdrom/hpc_3_0_ct/Product
# ./hpc_remove -c /home/hpc_admin -d SUNWhpmsc
# ./hpc_install -c /home/hpc_admin -d /cdrom/hpc_3_0_ct/Product/SUNWhpmsc