CHAPTER 5

Running Programs With mpirun in Distributed Resource Management Systems

This chapter describes the options to the mpirun command that are used for distributed resource management and provides instructions for each resource manager. It contains the following sections:

- mpirun Options for Third-Party Resource Manager Integration
- Running Parallel Jobs in the PBS Environment
- Running Parallel Jobs in the Sun Grid Engine Environment
- For More Information


mpirun Options for Third-Party Resource Manager Integration

ORTE is compatible with a number of third-party launchers, including rsh/ssh, Sun Grid Engine, and PBS.



Note - Open MPI itself supports additional third-party launchers, such as SLURM and Torque. However, these launchers are not currently supported in Sun HPC ClusterTools software. To use them, you must download the Open MPI source and compile and link it with the libraries for those launchers.
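
For example, a source build that enables these launchers might look like the following sketch. The installation prefix and the Torque installation directory are assumptions; substitute the paths used at your site.

% ./configure --prefix=/opt/openmpi --with-slurm --with-tm=/opt/torque
% make all install

The --with-slurm and --with-tm options tell the Open MPI configure script to build support for the SLURM and Torque launchers, respectively.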


Checking Your Open MPI Configuration

To see whether your Open MPI installation has been configured for use with the third-party resource manager you want to use, issue the ompi_info command and pipe the output to grep. The following examples show how to use ompi_info to check for the desired third-party resource manager.


To Check for rsh/ssh

To see whether your Open MPI installation has been configured to use the rsh/ssh launcher:


% ompi_info | grep rsh
MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2)


To Check for PBS/Torque

To see whether your Open MPI installation has been configured to use the PBS/Torque launcher:


% ompi_info | grep tm
MCA ras: tm (MCA v1.0, API v1.3, Component v1.2)
MCA pls: tm (MCA v1.0, API v1.3, Component v1.2)


To Check for Sun Grid Engine

To see whether your Open MPI installation has been configured to use Sun Grid Engine:


% ompi_info | grep gridengine
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2)


Running Parallel Jobs in the PBS Environment

If your Open MPI environment is set up to include PBS, Open MPI automatically detects when mpirun is running within PBS and executes the job properly.

First, reserve resources by invoking the qsub command with the -l option, which specifies the number of nodes and the number of processes per node. For example, this command reserves four nodes with four processes per node for the job myjob.sh:


% qsub -l nodes=4:ppn=4 myjob.sh

When you enter the PBS environment, you can launch an individual job or a series of jobs with mpirun. The mpirun command launches the job using the node and process information from PBS. The resource information is accessed using the tm calls provided by PBS; hence, tm is the name used to identify the module in ORTE. The job ranks are children of PBS, not ORTE.
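
If you want to confirm what PBS has allocated before launching, you can inspect the standard PBS environment variables from within the job. This optional check is a sketch; it is not required for mpirun to work.

pbs% echo $PBS_JOBID
pbs% cat $PBS_NODEFILE

The $PBS_NODEFILE file lists one line per reserved process slot; the tm interface exposes the same allocation to mpirun.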

You can run an ORTE job within the PBS environment in two different ways: interactive and scripted.


To Run an Interactive Job in PBS

1. Enter the PBS environment interactively with the -I option to qsub, and use the -l option to reserve resources for the job.

Here is an example.


% qsub -l nodes=2:ppn=2 -I 

The command sequence shown above enters the PBS environment and reserves two nodes with two processes per node for the job. Here is the output:


qsub: waiting for job 20.mynode to start
qsub: job 20.mynode ready
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
pbs%

2. Launch the mpirun command.

Here is an example that launches the hostname command with verbose output:


pbs% /opt/SUNWhpc/HPC7.1/bin/mpirun -np 4 -mca pls_tm_verbose 1 hostname

The output shows the hostname program being run on each of the four ranks across the two nodes:


% /opt/SUNWhpc/HPC7.1/bin/mpirun -np 4 -mca pls_tm_verbose 1 hostname
[hostname1:09064] pls:tm: launching on node mynode1
[hostname2:09064] pls:tm: launching on node mynode2
hostname2
hostname1
hostname2
hostname1 

The following example shows the debugging output specific to the tm module using the MCA parameter pls_tm_debug.


% cd /opt/SUNWhpc/HPC7.1/bin
% ./mpirun -np 4 -mca pls_tm_debug 1 hostname
[mynode:09074] pls:tm: final top-level argv:
[mynode:09074] pls:tm:     orted --no-daemonize --bootproxy 1 --name  --num_procs 3 --vpid_start 0 --nodename  --universe joeuser@mynode:default-universe-9074 --nsreplica "0.0.0;tcp://10.8.30.127:48225" --gprreplica "0.0.0;tcp://10.8.30.127:48225"
[mynode:09074] pls:tm: resetting PATH: /opt/SUNWhpc/HPC7.1/bin:/bin:/usr/etc:/usr/bin:/usr/sbin:/bin:/home/joeuser/bin:/workspace/joeuser/vnc-4_1_2sparc_solaris/:/ws/ompitools/SUNWspro/SOS10/bin:/ws/ompitools/bin:/hpc/tools/DET10/sparc/SUNWspro/bin:/usr/dist/share/sunstudio_sparc/SUNWspro/bin:/ws/on10tools-toolserver/SUNWspro/SOS8/bin:/ws/on10-tools/toolserver/teamware/bin:
/hpc/tools/localsparc/bin:/opt/SUNWut/bin:/usr/local/bin:/usr/ccs/bin:/usr/openwin/bin:/usr/dt/bin:/usr/dist/exe:/usr/dist/sun4/bin:/usr/dist/local/bos/exe:/usr/dist/local/exe:/usr/openwin/bin:/usr/sfw/bin:/hpc/sqa/bin:/pkg/isv/bin:/pkg/local/bin:/pkg/gnu/bin:/pkg/mail/bin:/tools/sparc/bin:/dhpg/bin:/ws/sms1tools/SUNWspro/SC5.0/bin:/usr/ucb:.
[mynode:09074] pls:tm: found /opt/SUNWhpc/HPC7.1/bin/orted
[mynode:09074] pls:tm: launching on node mynode
[mynode:09074] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename mynode --universe joeuser@mynode:default-universe-9074 --nsreplica "0.0.0;tcp://210.8.30.127:48225" --gprreplica "0.0.0;tcp://210.8.30.127:48225"
[mynode:09074] pls:tm:launch: resolved host mynode to node ID 0
[mynode:09074] pls:tm: launching on node mynode2
[mynode:09074] pls:tm: executing: orted --no-daemonize --bootproxy 1 --name 0.0.2 --num_procs 3 --vpid_start 0 --nodename mynode2 --universe joeuser@mynode:default-universe-9074 --nsreplica "0.0.0;tcp://210.8.30.127:48225" --gprreplica "0.0.0;tcp://210.8.30.127:48225"
[mynode:09074] pls:tm:launch: resolved host mynode2 to node ID 2
[mynode:09074] pls:tm:launch: finished spawning orteds
[mynode:09074] pls:tm:launch: finished
mynode2
mynode
mynode2
mynode 


To Run a Batch Job in PBS

1. Write a script that calls mpirun.

In the following examples, the script is called myjob.csh. The system is called mynode. Here is an example of the script.


#!/bin/csh
 
/opt/SUNWhpc/HPC7.1/bin/mpirun -np 2 -mca pls_tm_verbose 1 hostname 

2. Submit the job with qsub, using the -l option to reserve resources for it.

Here is an example of how to use the -l option with the qsub command.


% qsub -l nodes=2:ppn=2 myjob.csh 

This command submits the job to PBS, reserving two nodes with two processes per node for the job launched by the script named myjob.csh.

Here is the output of the script myjob.csh.


% more myjob.csh.*
::::::::::::::
myjob.csh.e2365
::::::::::::::
::::::::::::::
myjob.csh.o2365
::::::::::::::
Warning: no access to tty (Bad file number).
Thus no job control in this shell.
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
hostname5
hostname4
hostname5
hostname4

After the job finishes, it generates two output files:

- myjob.csh.o2365, which contains the standard output
- myjob.csh.e2365, which contains the error output

As you can see, the myjob.csh script calls mpirun, which launches the hostname program on the reserved nodes.


Running Parallel Jobs in the Sun Grid Engine Environment

Sun Grid Engine 6 is the supported version of Sun Grid Engine in Sun HPC ClusterTools 7.1.

Before you run parallel jobs, make sure that you have defined the parallel environment and queue.

Defining Parallel Environment (PE) and Queue

A PE must be defined for all the queues in the Sun Grid Engine cluster that are to be used as ORTE nodes. Each ORTE node should be installed as a Sun Grid Engine execution host. To allow ORTE to submit a job from any ORTE node, configure each ORTE node as a submit host in Sun Grid Engine.

Each execution host must be configured with a default queue. In addition, the default queue must have the same number of slots as the number of processors on the host.
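
The following sketch shows one way to perform this configuration; the host name node1 is a placeholder, and both commands require Sun Grid Engine manager privileges.

% qconf -as node1
% qconf -mq all.q

The first command adds node1 as a submit host. The second opens the configuration of the all.q queue in an editor, where you can set the slots attribute to match the processor count (for example, slots 4 on a four-processor host).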


To Use PE Commands

- To display a list of available PEs (parallel environments), type the following:


% qconf -spl
make

- To define a new PE, you must have Sun Grid Engine manager or operator privileges. Use a text editor to modify a template for the PE. The following example creates a PE named orte:


% qconf -ap orte

- To modify an existing PE, use this command to invoke the default editor:


% qconf -mp orte

- To show a particular PE that has been defined, type this command:


% qconf -sp orte
pe_name           orte
slots             8
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

The value NONE in user_lists and xuser_lists means that everybody is allowed and nobody is excluded.

The value of control_slaves must be TRUE; otherwise, qrsh exits with an error message.

The value of job_is_first_task must be FALSE or the job launcher consumes a slot. In other words, mpirun itself will count as one of the slots and the job will fail, because only n-1 processes will start.


To Use Queue Commands

- To show all the defined queues, type the following command:


% qconf -sql
all.q

The queue all.q is set up by default in Sun Grid Engine.

- To associate the orte PE from the example in the previous section with the existing queue, type the following:


% qconf -mattr queue pe_list "orte" all.q

You must have Sun Grid Engine manager or operator privileges to use this command.
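
To verify that the PE was attached, display the queue configuration and check its pe_list attribute. The output below is what you would expect after the command above; the exact list depends on your configuration.

% qconf -sq all.q | grep pe_list
pe_list               make orte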

Submitting Jobs Under Sun Grid Engine Integration

There are two ways to submit jobs under Sun Grid Engine integration: interactive mode and batch mode. The instructions in this section describe how to submit jobs in batch mode. For information about how to use interactive mode, see Chapter 4.


To Set the Interactive Display

Before you submit a job, set your DISPLAY environment variable, if you have not already done so, so that the interactive window will appear on your desktop.

For example, if you are working in the C shell, type the following command:


% setenv DISPLAY desktop:0.0
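
If you use a Bourne-compatible shell instead, the equivalent command is:

$ DISPLAY=desktop:0.0; export DISPLAY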


To Submit Jobs in Batch Mode

1. Create the script. In this example, mpirun is embedded within a script that is submitted to qsub.


mynode4% cat sge.csh
#!/usr/bin/csh
 
# set PATH: including location of MPI program to be run
setenv PATH /opt/SUNWhpc/examples/connectivity:${PATH}
 
mpirun -np 4 -mca pls_gridengine_debug 100 connectivity.sparc -v 



Note - The -mca pls_gridengine_debug 100 setting is used in this example only to show that Sun Grid Engine is being used. It is not needed for normal operation.


2. Source the Sun Grid Engine environment variables from the settings.csh file. In this example, $SGE_ROOT is set to /opt/sge:


% source $SGE_ROOT/default/common/settings.csh

3. To start the batch (or scripted) job, specify the parallel environment, the number of slots, and the script to run:


% qsub -pe orte 2 sge.csh
your job 305 ("sge.csh") has been submitted

Since this is submitted as a batch job, you would not expect to see output at the terminal. If you do not specify where the output should go, Sun Grid Engine redirects it to your home directory and creates <job_name>.o<job_number>.



Note - Make sure that the parallel environment is set up before you run the job. See Defining Parallel Environment (PE) and Queue for more information.


The job creates the output files. The file name with the format name_of_job.ojob_id contains the standard output. The file name with the format name_of_job.ejob_id contains the error output. If the job executes normally, the error output file is empty.

The following example lists the files produced by a job called sge.csh with the job ID number 866:


% ls -rlt ~ | tail
-rw-r--r--   1 joeuser   mygroup       0 Jan 16 16:42 sge.csh.po866
-rw-r--r--   1 joeuser   mygroup       0 Jan 16 16:42 sge.csh.pe866
-rw-r--r--   1 joeuser   mygroup       0 Jan 16 16:42 sge.csh.e866
-rw-r--r--   1 joeuser   mygroup     194 Jan 16 16:42 sge.csh.o866

By default, the output files are located in your home directory, but you can use Sun Grid Engine software to change the location of the files, if desired.
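
For example, the -o and -e options to qsub redirect the standard output and error files. The directory /tmp/joeuser in this sketch is an assumption; use any directory you can write to.

% qsub -pe orte 2 -o /tmp/joeuser/sge.out -e /tmp/joeuser/sge.err sge.csh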


To See a Running Job

- Type the following command:


% qstat -f
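
To limit the listing to jobs owned by a particular user, add the -u option. The user name joeuser matches the earlier examples.

% qstat -f -u joeuser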


To Delete a Running Job

- Type the following command:


% qdel job-number

where job-number is the number of the job you want to delete.
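
For example, to delete the batch job submitted earlier in this chapter (job 305), type:

% qdel 305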

For more information about Sun Grid Engine commands, refer to the Sun Grid Engine documentation.


For More Information

For more information about using the mpirun command to perform batch processing, see the following: