This chapter explains the basic steps for starting up message-passing programs on a Sun HPC cluster using LSF Batch services. It covers the following topics:
Using parallel job queues.
Using bsub (overview).
Submitting an MPI job in batch mode.
Submitting an MPI job in interactive batch mode.
Using the Sun HPC option -sunhpc.
For information about developing, compiling, and linking Sun MPI programs, see the Sun MPI 4.0 Programming and Reference Manual.
Running parallel jobs with LSF Suite 3.2.3 is supported on up to 1024 processors and up to 64 nodes.
Distributed MPI jobs must be submitted via batch queues that have been configured to handle parallel jobs. This parallel capability is just one of the many characteristics that a system administrator can assign when setting up a batch queue.
You can use the command bqueues -l to find out which job queues support parallel jobs, as shown in Figure 2-1.
The bqueues -l output contains status information about all the queues currently defined. Look for a queue that includes the line:
JOB_STARTER: pam
which means it is able to handle parallel (distributed MPI) jobs. In the example shown in Figure 2-1, the queue hpc is defined in this way.
The pam entry may be followed by a -t or a -v option. The -t option suppresses printing of process status upon completion; the -v option specifies that the job is to run in verbose mode.
If no queues are currently configured for parallel job support, ask your system administrator to configure one or more queues in this way.
Once you know the name of a queue that supports parallel jobs, submit your Sun MPI jobs explicitly to that queue. For example, the following command submits the job hpc-job to the queue named hpc for execution on four processors.
hpc-demo% bsub -q hpc -n 4 hpc-job
Additional examples are provided in "Submitting Jobs in Batch Mode" and "Submitting Interactive Batch Jobs".
To use LSF Batch commands, your PATH variable must include the directory where the LSF Base, Batch, and Parallel components were installed. The default installation directory is /opt/SUNWlsf/bin. Likewise, your PATH variable must include the ClusterTools software installation directory; the default location for ClusterTools components is /opt/SUNWhpc/bin.
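The PATH setup described above can be sketched as the following Bourne-shell fragment, assuming the default installation directories named in the text (adjust both paths if your site installed the software elsewhere):

```shell
# Prepend the default LSF and ClusterTools bin directories to PATH.
# /opt/SUNWlsf/bin and /opt/SUNWhpc/bin are the defaults cited above;
# substitute your site's actual installation directories as needed.
LSF_BIN=/opt/SUNWlsf/bin
HPC_BIN=/opt/SUNWhpc/bin
PATH=$LSF_BIN:$HPC_BIN:$PATH
export PATH
echo "$PATH"
```

You would typically place these lines in your shell startup file (for example, .profile) so that bsub, bqueues, and the ClusterTools commands are found in every session.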
The command for submitting Sun MPI jobs to the LSF Batch system is bsub, just as it is for submitting nonparallel batch jobs. The command syntax is essentially the same as well, except for an additional option, -sunhpc, which applies specifically to Sun MPI jobs. The bsub syntax for parallel jobs is
bsub [basic_options] [-sunhpc sunhpc_args] job
The basic_options entry refers to the set of standard bsub options that are described in the LSF Batch User's Guide. The -sunhpc option allows Sun HPC-specific arguments to be passed to the MPI job job.
"Submitting Jobs in Batch Mode" and "Submitting Interactive Batch Jobs" describe how to use bsub to submit jobs in batch and interactive batch modes, respectively. The -sunhpc option is discussed in "Using the -sunhpc Option".
Refer to the LSF Batch User's Guide for a full discussion of bsub and associated job-submission topics.
The simplest way to submit a Sun MPI job to the LSF Batch system is in batch mode. For example, the following command submits hpc-job to the queue named hpc in batch mode and requests that the job be distributed across four processors.
hpc-demo% bsub -q hpc -n 4 hpc-job
Batch mode is enabled by default, but it can be disabled by the system administrator via the INTERACTIVE parameter.
You can check to see if a queue is able to handle batch-mode jobs by running bqueues -l queue_name. Then look in the SCHEDULING POLICIES: section of the bqueues output for the following entries.
ONLY_INTERACTIVE - This entry means that batch mode is disabled; interactive and interactive batch modes are enabled.
NO_INTERACTIVE - This entry means batch mode is enabled; interactive and interactive batch modes are disabled.
No reference to INTERACTIVE - If there is no entry containing the term INTERACTIVE, all modes are enabled; this is the default condition.
The example queue shown in Figure 2-1 has a SCHEDULING POLICIES: setting of NO_INTERACTIVE, which allows batch-mode jobs but not interactive batch jobs.
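The three cases above can be checked mechanically. The following sketch classifies a queue from saved bqueues -l output; the policies variable here holds illustrative sample text, but on a live cluster you would capture the SCHEDULING POLICIES: line of `bqueues -l queue_name` instead:

```shell
# Classify a queue's mode support from its SCHEDULING POLICIES: line.
# Sample text stands in for real `bqueues -l` output (illustrative only).
policies="SCHEDULING POLICIES:  NO_INTERACTIVE"

case "$policies" in
  *ONLY_INTERACTIVE*) mode="interactive and interactive batch only" ;;
  *NO_INTERACTIVE*)   mode="batch only" ;;
  *)                  mode="all modes enabled" ;;
esac
echo "$mode"
```

Note that the ONLY_INTERACTIVE test must come first, since that string also contains the substring "INTERACTIVE".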
As soon as hpc-job is submitted in batch mode, LSF Batch detaches it from the terminal session that submitted it.
If you request more processors than are available, you must use process wrapping to allow multiple processes to be mapped to each processor. Otherwise, LSF Batch will wait indefinitely for the requested number of processors to become available, and the job will never be launched. Process wrapping is discussed in "Specify the Number of Processes".
The interactive batch mode makes full use of the LSF Batch system's job scheduling policies and host selection facilities, but keeps the job attached to the terminal session that submitted it. This mode is well suited to Sun MPI jobs and other resource-intensive applications.
The following example submits hpc-job to the queue named hpc in interactive batch mode. As before, this example is based on the assumption that hpc is configured to support parallel jobs.
hpc-demo% bsub -I -q hpc -n 4 hpc-job
The -I option specifies interactive batch mode.
The queue must not have interactive mode disabled. To check this, run
hpc-demo% bqueues -l hpc
and check the SCHEDULING POLICIES: section of the resulting output. If it contains either
SCHEDULING POLICIES: ONLY_INTERACTIVE
or
SCHEDULING POLICIES:
(that is, no entry), interactive batch mode is enabled.
When the queue accepts the job, it returns a job ID. You can use the job ID later as an argument to various commands that inquire about job status or that control certain aspects of job state. For example, you can suspend a job or remove it from a queue with the bstop jobid and bkill jobid commands. These commands are described in Chapter 7 of the LSF Batch User's Guide.
LSF Suite version 3.2.3 supports the bsub command-line option -sunhpc, which gives users special control over Sun MPI jobs. The -sunhpc option and its arguments must be the last option on the bsub command line, immediately preceding the job name:
bsub [basic_options] [-sunhpc sunhpc_args] job
"Redirect stderr" through "Spawn a Job in the Stopped State" describe the -sunhpc arguments.
Use the -e argument to redirect stderr to a file named file.Rn, where file is the user-supplied name of the output file. The Rn extension is supplied automatically and indicates the rank of the process producing the stderr output.
For example, to redirect stderr to files named boston.R0, boston.R1, and so forth, enter
hpc-demo% bsub -I -n 4 -q hpc -sunhpc -e boston hpc-job
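As a sketch of what the redirected output looks like on disk, the following fragment creates dummy per-rank files and prints each one in turn. The boston name matches the example above, but the files and their contents are fabricated here for illustration, not produced by a real run:

```shell
# Simulate the per-rank stderr files a 4-process job would leave behind
# (boston.R0 through boston.R3), then display each file with its name.
for r in 0 1 2 3; do
  echo "rank $r stderr" > boston.R$r
done

for f in boston.R*; do
  printf '== %s ==\n' "$f"
  cat "$f"
done
```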
Use the -o argument to redirect stdout to a file named file.Rn, where file is the user-supplied name of the output file. The Rn extension is supplied automatically and indicates the rank of the process producing the stdout output.
For example, to redirect stdout to files named boston.R0, boston.R1, and so forth, enter
hpc-demo% bsub -I -n 4 -q hpc -sunhpc -o boston hpc-job
Use the -j argument to specify the job ID of another job with which the new job should collocate.
For example, to cause job hpc-job to be collocated with a job whose job ID is 4622, enter
hpc-demo% bsub -I -n 4 -q hpc -sunhpc -j 4622 hpc-job
Use bjobs to find out the job ID of a job. See the LSF Batch User's Guide for details.
Use the -J argument to specify the name of another job with which the new job should collocate.
For example, to cause job hpc-job1 to be collocated with a job named hpc-job2, enter
hpc-demo% bsub -I -n 4 -q hpc -sunhpc -J hpc-job2 hpc-job1
Use the -n argument to specify the number of processes to run. This argument can be used in concert with the bsub -n argument to cause process wrapping to occur. Process wrapping is a technique for distributing multiple processes to fewer processors than there are processes. As a result, each processor has multiple processes, which are spawned in a cyclical, wrap-around fashion.
For example, the following will distribute 48 processes across 16 processors, resulting in a 3-process wrap per processor.
hpc-demo% bsub -I -n 16 -q hpc -sunhpc -n 48 hpc-job
If you specify a range of processors rather than a single quantity, and more processes than processors, the process-wrapping ratio (number of processes per processor) will depend on the number of processors that are actually allocated.
For example, the following will distribute 48 processes across at least 8 processors and possibly as many as 16.
hpc-demo% bsub -I -n 8,16 -q hpc -sunhpc -n 48 hpc-job
Consequently, the process-to-processor wrapping ratio may be as high as 6:1 (48 processes across 8 processors) or as low as 3:1 (48 processes across 16 processors).
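The wrapping ratios quoted above follow from ceiling division of processes by processors. A minimal shell sketch, using the numbers from the example (48 processes, 8 to 16 processors):

```shell
# Worst and best case processes-per-processor ratios for a range request.
# (a + b - 1) / b is integer ceiling division in shell arithmetic.
nprocs=48
min_cpus=8
max_cpus=16

worst=$(( (nprocs + min_cpus - 1) / min_cpus ))   # 6 processes per processor
best=$(( (nprocs + max_cpus - 1) / max_cpus ))    # 3 processes per processor
echo "$worst:1 worst case, $best:1 best case"
```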
Use the -s argument to cause a job to be spawned in the STOPPED state. It does this by setting the stop-on-exec flag for the spawned process. This feature can be of value in a program monitoring or debugging tool as a way of gaining control over a parallel program. See the proc(4) man page for details.
Do not use the -s argument with the Prism debugger. It would add nothing to Prism's capabilities and would be likely to interfere with Prism's control over the debugging session.
The following example shows the -s argument being used to spawn an interactive batch job in the STOPPED state.
hpc-demo% bsub -I -n 1 -q hpc -sunhpc -s hpc-job
To identify processes in the STOPPED state, issue the ps command with the -el argument:
hpc-demo% ps -el
 F S   UID   PID  PPID  C PRI NI     ADDR  SZ    WCHAN TTY  TIME CMD
19 T     0     0     0  0   0 SY f0274e38   0          ?    0:00 sched
Here, the sched command is in the STOPPED state, as indicated by the T entry in the S (State) column.
Note that, when a process is spawned in the STOPPED state, the program's name does not appear in the ps output. Instead, the stopped process is identified as a RES daemon.
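Scanning the S column for T entries can be automated. In this sketch, a here-string stands in for live ps -el output (the inetd line is fabricated for contrast); on a real system you would pipe ps -el into the same awk filter:

```shell
# Extract the command names of processes in the STOPPED (T) state.
# The sample text below stands in for real `ps -el` output.
ps_out=' F S   UID   PID  PPID  C PRI NI     ADDR  SZ    WCHAN TTY  TIME CMD
19 T     0     0     0  0   0 SY f0274e38   0          ?    0:00 sched
 8 S   100   123     1  0  40 20 f0275000 100 f027e000 ?    0:01 inetd'

# Field 2 is the state; the last field is the command name.
stopped=$(printf '%s\n' "$ps_out" | awk '$2 == "T" { print $NF }')
echo "$stopped"
```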
Use the -t argument to cause all output to be tagged with its MPI rank.
The -t argument cannot be used when output is redirected by the -e or -o options to -sunhpc.
For example, the following adds a rank-indicator prefix to each line of output.