This chapter describes the basic steps involved in getting a Sun HPC cluster ready for use. It also describes the procedure for shutting down the CRE daemons. The steps covered in this chapter include:
Starting the CRE daemons - "Start the CRE Daemons"
Verifying system readiness - "Verify Basic Functionality"
Testing MPI communications - "Running Basic MPI Tests"
Shutting down the CRE - "Stopping and Restarting the CRE"
The chapter ends with a brief description of the CRE daemons.
If they are not already running, start the CRE daemons, first on the master node and then on all the other nodes in the cluster.
If you do not know which node in your cluster is the master node, look for the line MASTER_NODE="hostname" in the hpc_config file, which was used in the ClusterTools installation process.
On the master node, start the CRE master daemons.
# /etc/init.d/sunhpc.cre_master start
On all the nodes in the cluster, start the CRE nodal daemons.
Note: If available, you may want to use one of the Cluster Console Manager (CCM) tools for this step; this allows you to broadcast the command to all the nodes from a single entry. See Appendix A for instructions on using the CCM tools.
# /etc/init.d/sunhpc.cre_node start
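If the CCM tools are not installed, a short shell loop run from the master node can achieve a similar effect. The following is an illustrative sketch only: it assumes root rsh access to each node, and node1 and node2 are placeholders for your cluster's actual hosts.

    for node in node1 node2
    do
        rsh $node /etc/init.d/sunhpc.cre_node start
    done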
Use the following procedure to test the cluster's ability to perform basic operations.
Run mpinfo -N to display information about the cluster nodes. This step requires /opt/SUNWhpc/bin to be in your path.
# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
node1  y   -          SunOS  5.6    1     7.17   74.76  0.03   0.04   0.05
node2  y   -          SunOS  5.6    1     34.70  38.09  0.06   0.02   0.02
If any nodes are missing from the list or do not have a y entry in the UP column:
Verify that the license daemons are running.
Restart their nodal daemons as described on "Start the CRE Daemons".
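Before restarting, you can check whether the nodal daemons are actually present on a suspect node. The following command is a simple sketch; the daemon names tm.omd and tm.spmd are described at the end of this chapter.

# ps -ef | egrep 'tm\.omd|tm\.spmd'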
You can create a cluster-wide default partition by running an initialization script named part_initialize on any node in the cluster. This will create a single partition named all, which will include all the nodes in the cluster as members.
Then, run mpinfo -N again to verify the successful creation of all. See Example 2-2 for an example of mpinfo -N output when the all partition is present.
# /opt/SUNWhpc/bin/part_initialize
# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
node1  y   all        SunOS  5.6    1     8.26   74.68  0.00   0.01   0.03
node2  y   all        SunOS  5.6    1     34.69  38.08  0.00   0.00   0.01
Verify that the CRE can launch jobs on the cluster. For example, use the mprun command to execute hostname on all the nodes in the cluster, as shown below:
# mprun -Ns -np 0 hostname
node1
node2
mprun is the CRE command that launches message-passing jobs. The combination of -Ns and -np 0 ensures that the CRE will start one hostname process on each node. See the mprun man page for descriptions of -Ns, -np, and the other mprun options. In this example, the cluster contains two nodes, node1 and node2, each of which returns its host name.
Note that the CRE does not sort or rank the output of mprun by default, so host name ordering may vary from one run to another.
You can verify MPI communications by running a simple MPI program. To do so, you must have one of the supported compilers installed on your system; see the Sun HPC ClusterTools documentation for more information about supported compilers.
Two simple Sun MPI programs are available in /opt/SUNWhpc/examples/mpi:
connectivity.c - A C program that checks the connectivity among all processes and prints a message when it finishes
monte.f - A Fortran program in which each process participates in calculating an estimate of π using a Monte Carlo method
See the Readme file in the same directory; it provides instructions for using the examples. The directory also contains a makefile, Makefile. The full text of both code examples is also included in Chapter 3 of the Sun MPI 4.0 User's Guide.
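If you prefer a quick smoke test before building the shipped examples, the short C program below is an illustrative sketch (not the connectivity.c shipped with ClusterTools) in which each MPI process reports its rank and host. Consult the Readme and Makefile for the exact build procedure; the link line shown in the comment is an assumption. Launch the resulting program with mprun.

    /*
     * hello_mpi.c - minimal MPI check (illustrative; not a shipped example).
     * Build per the examples Readme; a typical link line might be:
     *     cc -o hello_mpi hello_mpi.c -I/opt/SUNWhpc/include -L/opt/SUNWhpc/lib -lmpi
     * (the exact flags are an assumption - consult the Readme/Makefile).
     * Run with:  mprun -np 2 hello_mpi
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
        MPI_Get_processor_name(name, &len);     /* host this rank runs on */
        printf("process %d of %d running on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }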
If you want to shut down the entire cluster with the least risk to your file systems, use the Solaris shutdown command.
However, if you prefer to stop and restart the CRE without shutting down the entire cluster, the CRE supports a pair of scripts that simplify this process:
sunhpc.cre_master - Use this command to stop or start the CRE master daemons as described on "Start the CRE Daemons" and "Stopping and Restarting the CRE".
sunhpc.cre_node - Use this command to stop or start the CRE nodal daemons as described on "Start the CRE Daemons" and "Stopping and Restarting the CRE".
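In summary, both scripts accept start and stop arguments. This synopsis reflects only the usages shown in this chapter:

# /etc/init.d/sunhpc.cre_master { start | stop }
# /etc/init.d/sunhpc.cre_node { start | stop }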
To shut down only the CRE daemons, execute the following commands as root:
Stop all the nodal daemons by executing the following on all the nodes.
Note that you can simplify this step by using one of the CCM tools (cconsole, ctelnet, or crlogin) to broadcast the following command entered on the master node to all the other nodes.
# /etc/init.d/sunhpc.cre_node stop
Stop the master daemons by executing the following on the master node.
# /etc/init.d/sunhpc.cre_master stop
Be sure to stop the nodal daemons before stopping the master daemons. Otherwise, if you shut the master node down before shutting down the rest of the nodes, the CCM tools will not be available and you will have to shut down each node individually.
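Put together, an orderly CRE shutdown of a two-node cluster might look like the following sketch, run as root from the master node. This assumes rsh access to each node; node1 and node2 stand in for your cluster's actual hosts.

    for node in node1 node2
    do
        rsh $node /etc/init.d/sunhpc.cre_node stop
    done
    /etc/init.d/sunhpc.cre_master stop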
sunhpc.cre_node and sunhpc.cre_master behave differently when executing a stop command:
sunhpc.cre_master stops all processes that have been spawned by the CRE before killing the daemons.
sunhpc.cre_node does not kill user processes. Consequently, client daemons can be restarted without affecting running jobs.
To restart the CRE, execute the following commands on the master node:
# /etc/init.d/sunhpc.cre_master start
# /etc/init.d/sunhpc.cre_node start
Always bring up the master daemons first (before the nodal daemons). Otherwise, the nodal daemons will not be initialized properly and the CRE will not work.
When the sunhpc.cre_master and sunhpc.cre_node programs are executed with start commands, they both initiate a stop command on all currently running daemons before restarting the CRE daemons.
The sections "The Role of tm.rdb" through "The Role of tm.watchd" provide brief descriptions of the CRE master daemons: tm.rdb, tm.mpmd, and tm.watchd. The sections "The Role of tm.omd" and "The Role of tm.spmd" describe the nodal daemons, tm.omd and tm.spmd.
tm.rdb is the resource database daemon. It runs on the master node and implements the resource database used by the other parts of the CRE. This database represents the state of the cluster and the jobs running on it.
When changes are made to the cluster configuration, the tm.rdb daemon must be restarted to update the database to reflect the new conditions. For example, if a node is added to a partition, tm.rdb must be restarted to implement these changes.
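Because a start command first stops any daemons that are already running (see "Stopping and Restarting the CRE"), a single command on the master node suffices to restart tm.rdb along with the other master daemons. Keep in mind that stopping the master daemons also stops processes spawned by the CRE:

# /etc/init.d/sunhpc.cre_master start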
tm.mpmd is the master process-management daemon. It runs on the master node and services user (client) requests made via the mprun command. It also interacts with the resource database via calls to tm.rdb and coordinates the operations of the nodal client daemons.
tm.watchd is the cluster watcher daemon. It runs on the master node and monitors the states of cluster resources and jobs and, as necessary:
Marks individual nodes as online or offline by periodically executing remote procedure calls (RPCs) to all of the nodes.
Clears stale resource database (rdb) locks.
If the -Yk option has been enabled, aborts jobs that have processes on nodes determined to be down. This option is disabled by default.
tm.omd is the object-monitoring daemon. It runs on all the nodes in the cluster, including the master node, and continually updates the database with dynamic information concerning the nodes, most notably their load. It also initializes the database with static information about the nodes, such as their host names and network interfaces, when the CRE starts up.
tm.spmd is the slave process-management daemon. It runs on all the compute nodes of the cluster and, as necessary:
Handles spawning and termination of nodal processes per requests from the tm.mpmd.
In conjunction with mprun, handles multiplexing of stdio streams for nodal processes.
Interacts with the resource database via calls to tm.rdb.
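A quick way to confirm which CRE daemons are present on a given node is to search the process list for the names described above (an illustrative command; on the master node you should see all five daemons, and on the other nodes only tm.omd and tm.spmd):

# ps -ef | egrep 'tm\.(rdb|mpmd|watchd|omd|spmd)'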