This chapter describes the basic steps involved in getting a Sun HPC cluster ready for use. It also describes the procedure for shutting down the CRE daemons. The steps covered in this chapter include:
Starting the CRE daemons - "Start the CRE Daemons"
Verifying system readiness - "Verify Basic Functionality"
Testing MPI communications - "Running Basic MPI Tests"
Shutting down the CRE - "Stopping and Restarting the CRE"
The chapter ends with a brief description of the CRE daemons.
If they are not already running, start the CRE daemons, first on the master node and then on all the other nodes in the cluster.
If you do not know which node in your cluster is the master node, look for the line MASTER_NODE="hostname" in the hpc_config file, which was used in the ClusterTools installation process.
On the master node, start the CRE master daemons.
# /etc/init.d/sunhpc.cre_master start
On all the nodes in the cluster, start the CRE nodal daemons.
Note: If available, you may want to use one of the Cluster Console Manager (CCM) tools for this step; this allows you to broadcast the command to all the nodes from a single entry. See Appendix A for instructions on using the CCM tools.
# /etc/init.d/sunhpc.cre_node start
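If the CCM tools are not installed, a short shell loop run from the master node can achieve a similar effect. The following is an illustrative sketch only: it assumes root rsh access to each node, and node1 and node2 are placeholders for your cluster's actual hosts.

    for node in node1 node2
    do
        rsh $node /etc/init.d/sunhpc.cre_node start
    done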
Use the following procedure to test the cluster's ability to perform basic operations.
Run mpinfo -N to display information about the cluster nodes. This step requires /opt/SUNWhpc/bin to be in your path.
# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
node1  y   -          SunOS  5.6    1     7.17   74.76  0.03   0.04   0.05
node2  y   -          SunOS  5.6    1     34.70  38.09  0.06   0.02   0.02
If any nodes are missing from the list or do not have a y entry in the UP column:
Verify that the license daemons are running.
Restart their nodal daemons as described on "Start the CRE Daemons".
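Before restarting, you can check whether the nodal daemons are actually present on a suspect node. The following command is a simple sketch; the daemon names tm.omd and tm.spmd are described at the end of this chapter.

# ps -ef | egrep 'tm\.omd|tm\.spmd'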
You can create a cluster-wide default partition by running an initialization script named part_initialize on any node in the cluster. This will create a single partition named all, which will include all the nodes in the cluster as members.
Then, run mpinfo -N again to verify the successful creation of all. See Example 2-2 for an example of mpinfo -N output when the all partition is present.
# /opt/SUNWhpc/bin/part_initialize
# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
node1  y   all        SunOS  5.6    1     8.26   74.68  0.00   0.01   0.03
node2  y   all        SunOS  5.6    1     34.69  38.08  0.00   0.00   0.01
Verify that the CRE can launch jobs on the cluster. For example, use the mprun command to execute hostname on all the nodes in the cluster, as shown below:
# mprun -Ns -np 0 hostname
node1
node2
mprun is the CRE command that launches message-passing jobs. The combination of -Ns and -np 0 ensures that the CRE will start one hostname process on each node. See the mprun man page for descriptions of -Ns, -np, and the other mprun options. In this example, the cluster contains two nodes, node1 and node2, each of which returns its host name.
Note that the CRE does not sort or rank the output of mprun by default, so host name ordering may vary from one run to another.
You can verify MPI communications by running a simple MPI program. To do so, you must have one of the supported compilers installed on your system; see the Sun HPC ClusterTools documentation for more information about supported compilers.
Two simple Sun MPI programs are available in /opt/SUNWhpc/examples/mpi:
connectivity.c - A C program that checks the connectivity among all processes and prints a message when it finishes
monte.f - A Fortran program in which each process participates in calculating an estimate of π using a Monte Carlo method
See the Readme file in the same directory; it provides instructions for using the examples. The directory also contains a makefile, Makefile. The full text of both code examples is also included in Chapter 3 of the Sun MPI 4.0 User's Guide.
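If you prefer a quick smoke test before building the shipped examples, the short C program below is an illustrative sketch (not the connectivity.c shipped with ClusterTools) in which each MPI process reports its rank and host. Consult the Readme and Makefile for the exact build procedure; the link line shown in the comment is an assumption. Launch the resulting program with mprun.

    /*
     * hello_mpi.c - minimal MPI check (illustrative; not a shipped example).
     * Build per the examples Readme; a typical link line might be:
     *     cc -o hello_mpi hello_mpi.c -I/opt/SUNWhpc/include -L/opt/SUNWhpc/lib -lmpi
     * (the exact flags are an assumption - consult the Readme/Makefile).
     * Run with:  mprun -np 2 hello_mpi
     */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
        MPI_Get_processor_name(name, &len);     /* host this rank runs on */
        printf("process %d of %d running on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }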
If you want to shut down the entire cluster with the least risk to your file systems, use the Solaris shutdown command.
However, if you prefer to stop and restart the CRE without shutting down the entire cluster, the CRE supports a pair of scripts that simplify this process:
sunhpc.cre_master - Use this command to stop or start the CRE master daemons as described on "Start the CRE Daemons" and "Stopping and Restarting the CRE".
sunhpc.cre_node - Use this command to stop or start the CRE nodal daemons as described on "Start the CRE Daemons" and "Stopping and Restarting the CRE".
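In summary, both scripts accept start and stop arguments. This synopsis reflects only the usages shown in this chapter:

# /etc/init.d/sunhpc.cre_master { start | stop }
# /etc/init.d/sunhpc.cre_node { start | stop }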
To shut down only the CRE daemons, execute the following commands as root:
Stop all the nodal daemons by executing the following on all the nodes.
Note that you can simplify this step by using one of the CCM tools (cconsole, ctelnet, or crlogin) to broadcast the following command entered on the master node to all the other nodes.
# /etc/init.d/sunhpc.cre_node stop
Stop the master daemons by executing the following on the master node.
# /etc/init.d/sunhpc.cre_master stop
Be sure to stop the nodal daemons before stopping the master daemons. Otherwise, if you shut the master node down before shutting down the rest of the nodes, the CCM tools will not be available and you will have to shut down each node individually.
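Put together, an orderly CRE shutdown of a two-node cluster might look like the following sketch, run as root from the master node. This assumes rsh access to each node; node1 and node2 stand in for your cluster's actual hosts.

    for node in node1 node2
    do
        rsh $node /etc/init.d/sunhpc.cre_node stop
    done
    /etc/init.d/sunhpc.cre_master stop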
sunhpc.cre_node and sunhpc.cre_master behave differently when executing a stop command:
sunhpc.cre_master stops all processes that have been spawned by the CRE before killing the daemons.
sunhpc.cre_node does not kill user processes. Consequently, client daemons can be restarted without affecting running jobs.
To restart the CRE, execute the following commands on the master node:
# /etc/init.d/sunhpc.cre_master start
# /etc/init.d/sunhpc.cre_node start
Always bring up the master daemons first (before the nodal daemons). Otherwise, the nodal daemons will not be initialized properly and the CRE will not work.
When the sunhpc.cre_master and sunhpc.cre_node programs are executed with start commands, they both initiate a stop command on all currently running daemons before restarting the CRE daemons.
The sections "The Role of tm.rdb" through "The Role of tm.watchd" provide brief descriptions of the CRE master daemons: tm.rdb, tm.mpmd, and tm.watchd. The sections "The Role of tm.omd" and "The Role of tm.spmd" describe the nodal daemons, tm.omd and tm.spmd.
tm.rdb is the resource database daemon. It runs on the master node and implements the resource database used by the other parts of the CRE. This database represents the state of the cluster and the jobs running on it.
When changes are made to the cluster configuration, the tm.rdb daemon must be restarted to update the database to reflect the new conditions. For example, if a node is added to a partition, tm.rdb must be restarted to implement these changes.
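Because a start command first stops any daemons that are already running (see "Stopping and Restarting the CRE"), a single command on the master node suffices to restart tm.rdb along with the other master daemons. Keep in mind that stopping the master daemons also stops processes spawned by the CRE:

# /etc/init.d/sunhpc.cre_master start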
tm.mpmd is the master process-management daemon. It runs on the master node and services user (client) requests made via the mprun command. It also interacts with the resource database via calls to tm.rdb and coordinates the operations of the nodal client daemons.
tm.watchd is the cluster watcher daemon. It runs on the master node and monitors the states of cluster resources and jobs and, as necessary:
Marks individual nodes as online or offline by periodically executing remote procedure calls (RPCs) to all of the nodes.
Clears stale resource database (rdb) locks.
If the -Yk option has been enabled, aborts jobs that have processes on nodes determined to be down. This option is disabled by default.
tm.omd is the object-monitoring daemon. It runs on all the nodes in the cluster, including the master node, and continually updates the database with dynamic information concerning the nodes, most notably their load. It also initializes the database with static information about the nodes, such as their host names and network interfaces, when the CRE starts up.
tm.spmd is the slave process-management daemon. It runs on all the compute nodes of the cluster and, as necessary:
Handles spawning and termination of nodal processes per requests from the tm.mpmd.
In conjunction with mprun, handles multiplexing of stdio streams for nodal processes.
Interacts with the resource database via calls to tm.rdb.
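A quick way to confirm which CRE daemons are present on a given node is to search the process list for the names described above (an illustrative command; on the master node you should see all five daemons, and on the other nodes only tm.omd and tm.spmd):

# ps -ef | egrep 'tm\.(rdb|mpmd|watchd|omd|spmd)'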