CHAPTER 5

Running Programs With the mpirun Command

This chapter describes the general syntax of the mpirun command and lists the command’s options. This chapter also shows some of the tasks you can perform with the mpirun command. It contains the following sections:

About the mpirun Command

Syntax for the mpirun Command

mpirun Command Examples

Mapping MPI Processes to Nodes

Controlling Input/Output

Controlling Other Job Attributes

Submitting Jobs Under Sun Grid Engine Integration

Using MPI Client/Server Applications

mpirun Command Reference

For More Information

Note - The mpirun, mpiexec, and orterun commands all perform the same function, and they can be used interchangeably. The examples in this manual all use the mpirun command.

About the `mpirun` Command

The mpirun command controls several aspects of program execution in Open MPI. mpirun uses the Open Run-Time Environment (ORTE) to launch jobs. If you are running under distributed resource manager software, such as Sun Grid Engine or PBS, ORTE works with the resource manager to launch the job for you.

If you are using rsh/ssh instead of a resource manager, you must use a hostfile or host list to identify the hosts on which the program will be run. When you issue the mpirun command, you specify the name of the hostfile or host list on the command line; otherwise, mpirun executes all the copies of the program on the local host, in round-robin sequence by CPU slot. For more information about hostfiles and their syntax, see Specifying Hosts By Using a Hostfile.

Both MPI programs and non-MPI programs can use mpirun to launch the user processes.

Some example programs are provided in the /opt/SUNWhpc/HPC8.1/examples directory. You can compile and run them as sanity tests.

Syntax for the `mpirun` Command

The following example shows the general single-process syntax for mpirun:

% mpirun [options] [program-name]

For a simple SPMD (Single Program, Multiple Data) job, the typical syntax is:

% mpirun -np x program-name

For jobs involving multiple instructions, the command syntax appears similar to the following:

% mpirun [options] [program-name] : [options2] [program-name2] ...

For an MPMD (Multiple Program, Multiple Data) parallel application, the syntax follows this form:

% mpirun -np x program1 : -np y program2

This command starts x copies of program1 and y copies of program2 as part of the same job.
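As an illustration only, the colon-separated MPMD form can be assembled programmatically. The following Python sketch builds the argument list without launching anything. The helper name build_mpmd_command is hypothetical and is not part of Open MPI.

```python
def build_mpmd_command(contexts):
    """Build an mpirun argument list from (process_count, program) pairs.

    build_mpmd_command is a hypothetical helper used only for illustration.
    """
    if not contexts:
        raise ValueError("at least one (count, program) pair is required")
    argv = ["mpirun"]
    for index, (count, program) in enumerate(contexts):
        if index > 0:
            argv.append(":")  # a colon separates the application contexts
        argv += ["-np", str(count), program]
    return argv


command = build_mpmd_command([(2, "program1"), (3, "program2")])
print(" ".join(command))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems if the list is later handed to a process launcher.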

`mpirun` Options

The options control the behavior of the mpirun command. They might or might not be followed by arguments.

Caution - If you do not specify an argument for an option that expects to be followed by an argument (for example, the --app <filename> option), that option will read the next option on the command line as an argument. This might result in inconsistent behavior.

The mpirun -h output in the section To Display Command Help (-h) lists the options in alphabetical order, with a brief description of each.

Using Environment Variables With the `mpirun` Command

Use the -x args option (where args is the environment variable or variables you want to use) to specify any environment variable you want to pass to the processes at run time. If args is only a variable name, the -x option exports that variable and takes its value from the current environment. If args has the form name=value, the -x option exports the variable with the specified value. For example:

% mpirun -x LD_LIBRARY_PATH=/opt/SUNWhpc/HPC8.1/lib -np 4 a.out
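The two forms of the -x argument can be modeled with a short Python sketch. It only mirrors the behavior described in the mpirun help text (a bare name takes its value from the current environment, while name=value sets the value explicitly). The function resolve_export is a hypothetical name, not an Open MPI interface.

```python
import os


def resolve_export(arg, environ=None):
    """Return the (name, value) pair that "-x arg" would export.

    A model of the documented behavior only; resolve_export is a
    hypothetical helper, not an Open MPI function.
    """
    environ = os.environ if environ is None else environ
    name, separator, value = arg.partition("=")
    if separator:
        # "-x NAME=value": the value is given explicitly.
        return name, value
    # "-x NAME": the value comes from the environment (None if unset).
    return name, environ.get(name)


sample_env = {"LD_LIBRARY_PATH": "/opt/SUNWhpc/HPC8.1/lib"}
print(resolve_export("LD_LIBRARY_PATH", sample_env))
print(resolve_export("foo=bar", sample_env))
```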

Using MCA Parameters With the `mpirun` Command

The mpirun command accepts MCA (Modular Component Architecture) parameters, which control the run-time behavior of Open MPI. To specify an MCA parameter, use the --mca (or -mca) option with the mpirun command, followed by the name of the parameter and the value you want to set. For example:

% mpirun --mca mpi_show_handle_leaks 1 -np 4 a.out

This sets the MCA parameter mpi_show_handle_leaks to the value of 1 before running the program named a.out with four processes. In general, the format used on the command line is --mca parameter_name value.
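To make the --mca parameter_name value format concrete, the following Python sketch collects the MCA pairs from an argument list. It is a simplified illustration, not how mpirun parses its command line. The function name extract_mca_params is hypothetical. The sketch also rejects an --mca option that is missing its arguments, which is the situation the earlier caution warns about.

```python
def extract_mca_params(argv):
    """Collect "--mca name value" (or "-mca name value") pairs from argv.

    extract_mca_params is a hypothetical helper for illustration.
    """
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] in ("--mca", "-mca"):
            if i + 2 >= len(argv):
                # The option must be followed by a name and a value.
                raise ValueError("--mca requires a parameter name and a value")
            params[argv[i + 1]] = argv[i + 2]
            i += 3
        else:
            i += 1
    return params


line = "mpirun --mca mpi_show_handle_leaks 1 -np 4 a.out"
print(extract_mca_params(line.split()))
```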

Note - There are multiple ways to specify the values of MCA parameters. This chapter discusses how to use them from the command line with the mpirun command. MCA parameters are discussed in more detail in Chapter 7.

Canceling Send and Receive Operations

Open MPI supports the canceling of receive operations. However, the canceling of sends is not supported; therefore, a send will never be successfully canceled.

For more information about canceling send and receive operations, see the MPI_Cancel(3) man page.

`mpirun` Command Examples

The examples in this section show how to use the mpirun command options to specify how and where the processes and programs run.

The following table shows the process control options for the mpirun command. The procedures that follow the table explain how these options are used and show the syntax for each.

TABLE 5-1 Program/Process Control Options
Task	mpirun Option
To run a program with default settings	(no need to specify an option)
To run multiple parallel processes	`-c` or `-np` <number of processes>
To display command help	`-h` or `--help`
To change the working directory	`-wdir` or `--wdir` <directory>
To specify the list of hosts on which to invoke processes (also known as the rankmap string)	`-host` or `--host` or `-H`
To specify the list of hosts on which to execute the program (also known as the rankmap file)	`-hostfile` <filename> or `--hostfile` <filename> or `-machinefile` <filename> or `--machinefile` <filename>
To start up in debugging mode	`-d` or `--debug` or `-debugger` or `--debugger` <sequence>
To specify verbose output	`-v`
To specify multiple executables	`-np 2 exe1 : -np 6 exe2`

To Run a Program With Default Settings

To run the program with default settings, enter the command and program name, followed by any required arguments to the program:

% mpirun program-name

To Run Multiple Processes

By default, an MPI program started with mpirun runs as one process.

To run the program as multiple processes, use the -np option:

% mpirun -np process-count program-name

When you request multiple processes, ORTE attempts to start the number of processes you request, regardless of the number of CPUs available to run those processes. For more information, see Oversubscribing Nodes.

To Direct `mpirun` By Using an Appfile

You can use a type of text file (called an appfile) to direct mpirun. The appfile specifies the nodes on which to run, the number of processes to launch on each node, and the programs to execute in a parallel application. When you use the --app option, mpirun takes all its direction from the contents of the appfile and ignores any other nodes or processes specified on the command line.

For example, the following shows an appfile called my_appfile:

# Comments are supported; comments begin with #
# Application context files specify each sub-application in the
# parallel job, one per line. The first sub-application is the 2
# a.out processes:
-np 2 a.out
# The second sub-application is the 2 b.out processes:
-np 2 b.out

To use the --app option with the mpirun command, specify the name and path of the appfile on the command line. For example:

% mpirun --app my_appfile

This command produces the same results as specifying both programs on the mpirun command line, that is, mpirun -np 2 a.out : -np 2 b.out.
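The relationship between an appfile and the command line can be sketched in Python. The sketch reads appfile text, skips blank lines and lines that begin with #, and joins the remaining lines with colons. It is a simplified model for illustration only: the function names are hypothetical, and it handles whole-line comments only.

```python
import shlex

APPFILE_TEXT = """\
# Comments are supported; comments begin with #
-np 2 a.out
# The second sub-application is the 2 b.out processes:
-np 2 b.out
"""


def appfile_to_contexts(text):
    """Return one argument list per non-comment, non-blank appfile line."""
    contexts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        contexts.append(shlex.split(line))
    return contexts


def equivalent_command(contexts):
    """Join the contexts with ':' to form a single mpirun command line."""
    return "mpirun " + " : ".join(" ".join(context) for context in contexts)


print(equivalent_command(appfile_to_contexts(APPFILE_TEXT)))
```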

Mapping MPI Processes to Nodes

When you issue the mpirun command from the command line, ORTE reads the number of processes to be launched from the -np option, and then determines where the processes will run.

To determine where the processes will run, ORTE uses the following criteria:

Available hosts (also referred to as nodes), specified by a hostfile or by the --host option

Scheduling policy (by slot, or round-robin by node)

Default and maximum numbers of slots available on each host

ORTE also checks to see whether the current environment/shell runs with any third-party launcher (such as Sun Grid Engine or PBS) to determine where the processes will launch.

Specifying Available Hosts

You specify the available hosts to Open MPI in three ways:

Through the batch scheduler in your resource management software. This option is described in detail in Chapter 6.

By using a hostfile with the --hostfile option. The hostfile is a text file that contains the names of hosts, the number of available slots on each host, and the maximum slots on each host.

By using the --host option. Use this option to specify which hosts to include or exclude.

Specifying Hosts By Using a Hostfile

The hostfile lists each node, the available number of slots, and the maximum number of slots on that node. For example, the following listing shows a simple hostfile:

node0 
node1 slots=2 
node2 slots=4 max_slots=4
node3 slots=4 max_slots=20

In this example file, node0 has no slot count specified, so it is treated as a single-processor machine with one slot. node1 has two slots. node2 and node3 both have four slots. On node2, the values of slots and max_slots are the same (4), which prevents the processors on node2 from being oversubscribed. The four slots on node3 can be oversubscribed, up to a maximum of 20 processes.

When you use this hostfile with the --nooversubscribe option (see Oversubscribing Nodes), mpirun assumes that the value of max_slots for each node in the hostfile is the same as the value of slots for each node. It overrides the values for max_slots set in the hostfile.

Unless you explicitly specify max_slots, Open MPI assumes that there is no upper limit on the number of slots on a host. Resource managers also do not specify the maximum number of available slots.
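The hostfile rules described above can be summarized in a small Python sketch. It is not the parser that Open MPI uses. The function name parse_hostfile is hypothetical, and the sketch assumes that a host without a slots value gets one slot (as node0 does in the example) and that # starts a comment. A max_slots value of None stands for "no upper limit".

```python
HOSTFILE_TEXT = """\
node0
node1 slots=2
node2 slots=4 max_slots=4
node3 slots=4 max_slots=20
"""


def parse_hostfile(text, nooversubscribe=False):
    """Parse hostfile text into {host: (slots, max_slots)}.

    max_slots is None when no upper limit applies. Hypothetical helper.
    """
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, *fields = line.split()
        options = dict(field.split("=", 1) for field in fields)
        slots = int(options.get("slots", 1))  # assumed default: one slot
        max_slots = options.get("max_slots")
        max_slots = int(max_slots) if max_slots is not None else None
        if nooversubscribe:
            max_slots = slots  # --nooversubscribe overrides max_slots
        hosts[name] = (slots, max_slots)
    return hosts


for host, limits in parse_hostfile(HOSTFILE_TEXT).items():
    print(host, limits)
```

Passing nooversubscribe=True models the --nooversubscribe option: every host's max_slots becomes equal to its slots value, whatever the hostfile says.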

Note - Open MPI includes a commented default hostfile at /opt/SUNWhpc/HPC8.1/etc/openmpi-default-hostfile. Unless you specify a different hostfile at a different location, this is the hostfile that Open MPI uses. It is empty by default, but you may edit this file to add your list of nodes. See the comments in the hostfile for more information.

Specifying Hosts By Using the `--host` Option

You can use the --host option to mpirun to specify the hosts you want to use on the command line in a comma-delimited list. For example, the following command directs mpirun to run a program called a.out on hosts a, b, and c:

% mpirun -np 3 --host a,b,c a.out

Open MPI assumes that the default number of slots on each host is one, unless you explicitly specify otherwise.

To Specify Multiple Slots Using the `--host` Option

To specify multiple slots with the --host option, repeat the host name on the command line once for each slot you want to use on that host. For example:

% mpirun -host node1,node1,node2,node2 ...

If you are using a resource manager such as Sun Grid Engine or PBS, the resource manager maintains an accurate count of available slots.
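The effect of repeating host names can be illustrated by counting the names in the comma-delimited list. This Python sketch is only a model of the rule stated above. The function name slots_from_host_list is hypothetical.

```python
from collections import Counter


def slots_from_host_list(host_arg):
    """Count how many slots each host receives from a --host value."""
    names = [name for name in host_arg.split(",") if name]
    return dict(Counter(names))


print(slots_from_host_list("node1,node1,node2,node2"))
print(slots_from_host_list("a,b,c"))
```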

Excluding Hosts From Scheduling By Using the `--host` Option

You can also use the --host option in conjunction with a hostfile to exclude any nodes not explicitly specified on the command line. For example, assume that you have the following hostfile called my_hosts:

a slots=2 max_slots=20
b slots=2 max_slots=20
c slots=2 max_slots=20
d slots=2 max_slots=20

Suppose you issue the following command to run program a.out:

% mpirun -np 1 --hostfile my_hosts --host c a.out

This command launches one instance of a.out on host c, but excludes the other hosts in the hostfile (a, b, and d).

Note - If you use these two options (--hostfile and --host) together, make sure that the host(s) you specify using the --host option also exist in the hostfile. Otherwise, mpirun exits with an error.
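The combined use of --hostfile and --host behaves like a filter on the hostfile entries. The following Python sketch models that rule, including the error when a requested host is missing from the hostfile. The function name restrict_hosts is hypothetical, and the dictionary stands in for a hostfile that has already been read.

```python
MY_HOSTS = {
    "a": (2, 20),  # host: (slots, max_slots), as in my_hosts
    "b": (2, 20),
    "c": (2, 20),
    "d": (2, 20),
}


def restrict_hosts(hostfile_hosts, host_option):
    """Keep only the hostfile entries named in the --host list."""
    requested = [name for name in host_option.split(",") if name]
    unknown = [name for name in requested if name not in hostfile_hosts]
    if unknown:
        # Mirrors the note above: such hosts cause an error.
        raise ValueError("not in hostfile: " + ", ".join(unknown))
    return {name: limits for name, limits in hostfile_hosts.items()
            if name in requested}


print(restrict_hosts(MY_HOSTS, "c"))
```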

Oversubscribing Nodes

Scheduling more processes to run than there are available slots is referred to as oversubscribing. Oversubscribing a host is not recommended, as it might result in performance degradation.

The mpirun command has a --nooversubscribe option. This option implicitly sets the max_slots value (maximum number of available slots) to the same value as the slots value for each node, as specified in your hostfile, overriding any value set for max_slots in the hostfile. If the number of processes requested is greater than the total number of slots, mpirun returns an error and does not execute the command.

For more information about oversubscribing, see the following URL:

http://www.open-mpi.org/faq/?category=running#oversubscribing
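The check that --nooversubscribe implies can be written as a single comparison between the requested process count and the total number of slots. The Python sketch below is a model of that rule only. The function name fits_without_oversubscribing is hypothetical.

```python
def fits_without_oversubscribing(slots_per_host, process_count):
    """Return True if process_count fits in the total number of slots."""
    total_slots = sum(slots_per_host.values())
    return process_count <= total_slots


slots = {"node0": 2, "node1": 2}
print(fits_without_oversubscribing(slots, 4))  # four slots, four processes
print(fits_without_oversubscribing(slots, 8))  # would oversubscribe
```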

Scheduling Policies

ORTE uses two types of scheduling policies when it determines where processes will run:

By slot (default). This scheme schedules processes to run on each successive slot on one host. When all those slots are filled, scheduling begins on the next host in the hostfile.

By node. In this scheme, Open MPI schedules the processes by finding the first available slot on a host, then the first available slot on the next host in the hostfile, and so on, in a round-robin fashion.

Scheduling By Slot

This is the default scheduling policy for Open MPI. If you do not specify a scheduling policy, this is the policy that is used.

In by-slot scheduling, Open MPI schedules processes on a node until all of its available slots are exhausted (that is, all slots are running processes) before proceeding to the next node. In MPI terms, this means that Open MPI tries to maximize the number of adjacent ranks in MPI_COMM_WORLD on the same host without oversubscribing that host.

To Specify By-Slot Scheduling

If you want to explicitly specify by-slot scheduling for some reason, there are two ways to do it:

1. Specify the --byslot option to mpirun. For example, the following command specifies the --byslot and --hostfile options:

% mpirun -np 4 --byslot --hostfile myfile a.out

The following example uses the -host option:

% mpirun -np 4 --byslot -host node0,node0,node1,node1 a.out

2. Set the MCA parameter rmaps_base_schedule_policy to the value slot. For example:

% mpirun --mca rmaps_base_schedule_policy slot -np 4 a.out

Note - The examples in this chapter set MCA parameters on the command line. For more information about the ways in which you can set MCA parameters, see Chapter 7. In addition, the Open MPI FAQ contains information about MCA parameters at the following URL:

http://www.open-mpi.org/faq/?category=tuning#setting-mca-params

The following output example shows the contents of a simple hostfile called my-hosts and the results of the mpirun command using by-slot scheduling.

% cat my-hosts
node0 slots=2 max_slots=20
node1 slots=2 max_slots=20
% mpirun --hostfile my-hosts -np 8 --byslot hello | sort
Hello World I am rank 0 of 8 running on node0
Hello World I am rank 1 of 8 running on node0
Hello World I am rank 2 of 8 running on node1
Hello World I am rank 3 of 8 running on node1
Hello World I am rank 4 of 8 running on node0
Hello World I am rank 5 of 8 running on node0
Hello World I am rank 6 of 8 running on node1
Hello World I am rank 7 of 8 running on node1
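The rank placement in this listing can be reproduced with a simplified Python model of by-slot scheduling. The model is an assumption made for illustration and is not the ORTE mapper: it fills each host up to its slots value, then makes further passes over the host list (oversubscribing) until all processes are placed or every host reaches max_slots. The function name map_by_slot is hypothetical.

```python
def map_by_slot(hosts, process_count):
    """Return the host name for each rank under a simple by-slot model.

    hosts is a list of (name, slots, max_slots) tuples.
    """
    placed = {name: 0 for name, _, _ in hosts}
    mapping = []
    while len(mapping) < process_count:
        progress = False
        for name, slots, max_slots in hosts:
            # Fill up to `slots` processes here before moving on.
            for _ in range(slots):
                if len(mapping) == process_count:
                    break
                if placed[name] >= max_slots:
                    break
                placed[name] += 1
                mapping.append(name)
                progress = True
        if not progress:
            raise RuntimeError("not enough slots for all processes")
    return mapping


my_hosts = [("node0", 2, 20), ("node1", 2, 20)]
for rank, host in enumerate(map_by_slot(my_hosts, 8)):
    print(f"Hello World I am rank {rank} of 8 running on {host}")
```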

Scheduling By Node

In by-node scheduling, Open MPI schedules a single process on each node in a round-robin fashion (looping back to the beginning of the node list as necessary) until all processes have been scheduled. Nodes are skipped once their default slot counts are exhausted.

To Specify By-Node Scheduling

There are two ways to specify by-node scheduling:

1. Specify the --bynode option to mpirun. For example:

% mpirun -np 4 --bynode --hostfile my-hosts a.out

2. Set the MCA parameter rmaps_base_schedule_policy to the value node. For example:

% mpirun --mca rmaps_base_schedule_policy node -np 4 a.out

The following output example shows the contents of the same hostfile used in the previous example and the results of the mpirun command using by-node scheduling.

% cat my-hosts
node0 slots=2 max_slots=20
node1 slots=2 max_slots=20
% mpirun --hostfile my-hosts -np 8 --bynode hello | sort
Hello World I am rank 0 of 8 running on node0
Hello World I am rank 1 of 8 running on node1
Hello World I am rank 2 of 8 running on node0
Hello World I am rank 3 of 8 running on node1
Hello World I am rank 4 of 8 running on node0
Hello World I am rank 5 of 8 running on node1
Hello World I am rank 6 of 8 running on node0
Hello World I am rank 7 of 8 running on node1
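A simplified Python model of by-node scheduling reproduces this listing. As an assumption for illustration (this is not the ORTE mapper), the model places one process per host on each pass, skips hosts whose slots are used up while other hosts still have free slots, and falls back to oversubscribing up to max_slots once every host is full. The function name map_by_node is hypothetical.

```python
def map_by_node(hosts, process_count):
    """Return the host name for each rank under a simple by-node model.

    hosts is a list of (name, slots, max_slots) tuples.
    """
    placed = {name: 0 for name, _, _ in hosts}
    mapping = []
    while len(mapping) < process_count:
        # Prefer hosts that still have a free slot; once every host is
        # full, use any host that is still below its max_slots limit.
        free = [h for h in hosts if placed[h[0]] < h[1]]
        candidates = free or [h for h in hosts if placed[h[0]] < h[2]]
        if not candidates:
            raise RuntimeError("not enough slots for all processes")
        for name, _, _ in candidates:
            if len(mapping) == process_count:
                break
            placed[name] += 1  # one process per host on each pass
            mapping.append(name)
    return mapping


my_hosts = [("node0", 2, 20), ("node1", 2, 20)]
for rank, host in enumerate(map_by_node(my_hosts, 8)):
    print(f"Hello World I am rank {rank} of 8 running on {host}")
```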

Comparing By-Slot to By-Node Scheduling

In the examples in this section, node0 and node1 each have two slots. The following listings, taken from the output of the eight-process examples above, show the differences in scheduling between the two methods.

By-slot scheduling for the two nodes can be represented as follows:

node0	ranks 0, 1, 4, 5
node1	ranks 2, 3, 6, 7

By-node scheduling for the same two nodes can be represented this way:

node0	ranks 0, 2, 4, 6
node1	ranks 1, 3, 5, 7

Controlling Input/Output

Open MPI directs UNIX standard input to /dev/null on all processes except the rank 0 process of MPI_COMM_WORLD. The MPI_COMM_WORLD rank 0 process inherits standard input from mpirun. The node from which you invoke mpirun need not be the same as the node where the MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection of the mpirun standard input to the rank 0 process.

Open MPI directs UNIX standard output and standard error from remote nodes to the node that invoked mpirun, and then prints the information from the remote nodes on the standard output/error of mpirun. Local processes inherit the standard output/error of mpirun and transfer to it directly.

To Redirect Standard I/O

To redirect standard I/O for Open MPI applications, use the typical shell redirection procedure on mpirun. For example:

% mpirun -np 2 my_app < my_input > my_output

In this example, only the MPI_COMM_WORLD rank 0 process receives the stream from my_input on stdin. The stdin of all the other processes is tied to /dev/null. However, the stdout from all processes is collected into the my_output file.

Controlling Other Job Attributes

To Perform This Task	Use This Option
To change the working directory	-wdir or --wdir
To display debugging output	-d
To display command help	-h

To Change the Working Directory

Use the -wdir or --wdir option to specify the path of an alternative working directory to be used by the processes spawned when you run your program:

% mpirun --wdir working-directory program-name

Setting a path with --wdir does not affect where the runtime environment looks for executables. If you do not specify --wdir, the default is the current working directory. For example:

% mpirun --wdir /home/mystuff/bin a.out

This command changes the working directory for a.out to /home/mystuff/bin.

To Specify Debugging Output

Use the -d option to display debugging output. For example:

% mpirun -d a.out

The -d option shows the user-level debugging output for all of the ORTE modules used with mpirun. To see more information from a particular module, you can set additional MCA debugging parameters. The availability of the additional debugging information depends on how the module of interest is implemented.

For more information on MCA parameters, see Chapter 7. For more information about whether a module provides additional verbose or debug mode, run the ompi_info command on that module.

To Display Command Help (`-h`)

To display a list of mpirun options, use the -h option (alone). The following example shows the output from mpirun -h:

% ./mpirun -h
mpirun (Open MPI) 1.3r19845-ct8.1-b06a-r21
 
Usage: mpirun [OPTION]...  [PROGRAM]...
Start the given program using Open RTE
 
   -am <arg0>            Aggregate MCA parameter set file list
   --app <arg0>          Provide an appfile; ignore all other command line
                         options
   -bynode|--bynode      Whether to allocate/map processes round-robin by
                         node
   -byslot|--byslot      Whether to allocate/map processes round-robin by
                         slot (the default)
-c|-np|--np <arg0>       Number of processes to run
   -cf|--cartofile <arg0>
                         Provide a cartography file
-d|-debug-devel|--debug-devel
                         Enable debugging of OpenRTE
   -debug|--debug        Invoke the user-level debugger indicated by the
                         orte_base_user_debugger MCA parameter
   -debug-daemons|--debug-daemons
                         Enable debugging of any OpenRTE daemons used by
                         this application
   -debug-daemons-file|--debug-daemons-file
                         Enable debugging of any OpenRTE daemons used by
                         this application, storing output in files
   -debugger|--debugger <arg0>
                         Sequence of debuggers to search for when "--debug"
                         is used
   -default-hostfile|--default-hostfile <arg0>
                         Provide a default hostfile
   -display-allocation|--display-allocation
                         Display the allocation being used by this job
   -display-devel-allocation|--display-devel-allocation
                         Display a detailed list (mostly intended for
                         developers) of the allocation being used by this
                         job
   -display-devel-map|--display-devel-map
                         Display a detailed process map (mostly intended for
                         developers) just before launch
   -display-map|--display-map
                         Display the process map just before launch
   -do-not-launch|--do-not-launch
                         Perform all necessary operations to prepare to
                         launch the application, but do not actually launch
                         it
   -do-not-resolve|--do-not-resolve
                         Do not attempt to resolve interfaces
   -gmca|--gmca <arg0> <arg1>
                         Pass global MCA parameters that are applicable to
                         all contexts (arg0 is the parameter name; arg1 is
                         the parameter value)
-h|--help                This help message
-H|-host|--host <arg0>   List of hosts to invoke processes on
   --hetero              Indicates that multiple app_contexts are being
                         provided that are a mix of 32/64 bit binaries
   -hostfile|--hostfile <arg0>
                         Provide a hostfile
   -launch-agent|--launch-agent <arg0>
                         Command used to start processes on remote nodes
                         (default: orted)
   -leave-session-attached|--leave-session-attached
                         Enable debugging of OpenRTE
   -loadbalance|--loadbalance
                         Balance total number of procs across all allocated
                         nodes
   -machinefile|--machinefile <arg0>
                         Provide a hostfile
   -mca|--mca <arg0> <arg1>
                         Pass context-specific MCA parameters; they are
                         considered global if --gmca is not used and only
                         one context is specified (arg0 is the parameter
                         name; arg1 is the parameter value)
   -n|--n <arg0>         Number of processes to run
   -nolocal|--nolocal    Do not run any MPI applications on the local node
   -nooversubscribe|--nooversubscribe
                         Nodes are not to be oversubscribed, even if the
                         system supports such operation
   --noprefix            Disable automatic --prefix behavior
   -npernode|--npernode <arg0>
                         Launch n processes per node on all allocated nodes
   -ompi-server|--ompi-server <arg0>
                         Specify the URI of the Open MPI server, or the name
                         of the file (specified as file:filename) that
                         contains that info
   -path|--path <arg0>   PATH to be used to look for executables to start
                         processes
   -pernode|--pernode    Launch one process per available node on the
                         specified number of nodes [no -np => use all
                         allocated nodes]
   --prefix <arg0>       Prefix where Open MPI is installed on remote nodes
   --preload-files <arg0>
                         Preload the comma separated list of files to the
                         remote machines current working directory before
                         starting the remote process.
   --preload-files-dest-dir <arg0>
                         The destination directory to use in conjunction
                         with --preload-files. By default the absolute and
                         relative paths provided by --preload-files are
                         used.
-q|--quiet               Suppress helpful messages
   -rf|--rankfile <arg0>
                         Provide a rankfile file
-s|--preload-binary      Preload the binary on the remote machine before
                         starting the remote process.
   -server-wait-time|--server-wait-time <arg0>
                         Time in seconds to wait for ompi-server (default:
                         10 sec)
   -slot-list|--slot-list <arg0>
                         List of processor IDs to bind MPI processes to
                         (e.g., used in conjunction with rank files)
   -tmpdir|--tmpdir <arg0>
                         Set the root for the session directory tree for
                         orterun ONLY
   -tv|--tv              Deprecated backwards compatibility flag; synonym
                         for "--debug"
-v|--verbose             Be verbose
-V|--version             Print version and exit
   -wait-for-server|--wait-for-server
                         If ompi-server is not already running, wait until
                         it is detected (default: false)
   -wd|--wd <arg0>       Synonym for --wdir
   -wdir|--wdir <arg0>   Set the working directory of the started processes
-x <arg0>                Export an environment variable, optionally
                         specifying a value (e.g., "-x foo" exports the
                         environment variable foo and takes its value from
                         the current environment; "-x foo=bar" exports the
                         environment variable name foo and sets its value to
                         "bar" in the started processes)
   -xml|--xml            Provide all output in XML format
 
Report bugs to http://www.open-mpi.org/community/help/

Submitting Jobs Under Sun Grid Engine Integration

There are two ways to submit jobs under Sun Grid Engine integration: interactive mode and batch mode. The instructions in this chapter describe how to submit jobs interactively. For information about how to submit jobs in batch mode, see Chapter 6.

Defining Parallel Environment (PE) and Queue

A PE must be defined for all the queues in the Sun Grid Engine cluster that are to be used as ORTE nodes. Each ORTE node should be installed as a Sun Grid Engine execution host. To allow ORTE to submit a job from any ORTE node, configure each ORTE node as a submit host in Sun Grid Engine.

Each execution host must be configured with a default queue. In addition, the default queue set must have the same number of slots as the number of processors on the hosts.

To Use PE Commands

To display a list of available PEs (parallel environments), type the following:

% qconf -spl
make

To define a new PE, you must have Sun Grid Engine manager or operator privileges. Use a text editor to modify a template for the PE. The following example creates a PE named orte.

% qconf -ap orte

To modify an existing PE, use this command to invoke the default editor:

% qconf -mp orte

To show a particular PE that has been defined, type this command:

% qconf -sp orte
pe_name           orte
slots             8
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

The value NONE in user_lists and xuser_lists means that everybody is enabled and nobody is excluded.

The value of control_slaves must be TRUE; otherwise, qrsh exits with an error message.

The value of job_is_first_task must be FALSE or the job launcher consumes a slot. In other words, mpirun itself will count as one of the slots and the job will fail, because only n-1 processes will start.
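The two PE requirements above can be checked mechanically. The following Python sketch parses text in the format that qconf -sp prints and reports any setting that conflicts with those requirements. It works on a pasted copy of the output and does not call qconf. The function names parse_pe and check_pe_for_open_mpi are hypothetical.

```python
PE_TEXT = """\
pe_name           orte
slots             8
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
"""


def parse_pe(text):
    """Turn "name value" lines into a dictionary of PE settings."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of spaces
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def check_pe_for_open_mpi(settings):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if settings.get("control_slaves") != "TRUE":
        problems.append("control_slaves must be TRUE")
    if settings.get("job_is_first_task") != "FALSE":
        problems.append("job_is_first_task must be FALSE")
    return problems


print(check_pe_for_open_mpi(parse_pe(PE_TEXT)))
```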

To Use Queue Commands

To show all the defined queues, type the following command:

% qconf -sql
all.q

The queue all.q is set up by default in Sun Grid Engine.

To add the orte PE from the example in the previous section to the existing queue, type the following:

% qconf -mattr queue pe_list "orte" all.q

You must have Sun Grid Engine manager or operator privileges to use this command.

Submitting Jobs in Interactive Mode

To Set the Interactive Display

Before you submit a job, set your DISPLAY environment variable (if you have not already done so) so that the interactive window appears on your desktop.

For example, if you are working in the C shell, type the following command:

% setenv DISPLAY desktop:0.0

To Submit Jobs Interactively

1. Use the source command to set the Sun Grid Engine environment variables from a file:

mynode4% source /opt/sge/default/common/settings.csh

2. Use the qsh command to start the interactive X Windows session, and specify the parallel environment (in this example, orte) and the number of slots to use:

mynode4% qsh -pe orte 2 
waiting for interactive job to be scheduled... 
Your interactive job 324 has been successfully scheduled.

3. On a different node in the cluster, use the cd command to switch to the directory where your executable is located.

mynode5% cd /workspace/joeuser/ompi/trunk/builds/sparc32-g/bin

4. Issue the mpirun command.

mynode5% /opt/SUNWhpc/HPC8.1/sun/bin/mpirun -np 4 hostname

In the above example, Sun Grid Engine starts the user executable hostname with 4 processes on the two Sun Grid Engine assigned slots. The following example shows the output from the mpirun command with the specified options.

mynode5% /opt/SUNWhpc/HPC8.1/sun/bin/mpirun -np 4 hostname
mynode5
mynode5

To Verify That Sun Grid Engine Is Running

The following is not required for normal operation, but if you want to verify that Sun Grid Engine is being used, add the --mca ras_gridengine_verbose option with a verbosity level (100 in this example) to the mpirun command line. For example:

% ./mpirun -np 4 -mca ras_gridengine_verbose 100 hostname
[mynode6:04234] ras:gridengine: JOB_ID: 28
[mynode6:04234] ras:gridengine: mynode6: PE_HOSTFILE shows slots=2
[mynode6:04234] ras:gridengine: mynode7: PE_HOSTFILE shows slots=2
mynode6
mynode6
mynode7
mynode7
%
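If you capture this verbose output, the slot allocation that Sun Grid Engine reported can be extracted with a regular expression. The Python sketch below assumes the message format shown in the listing above, which may differ in other releases. The function name slots_from_verbose_output is hypothetical.

```python
import re

VERBOSE_OUTPUT = """\
[mynode6:04234] ras:gridengine: JOB_ID: 28
[mynode6:04234] ras:gridengine: mynode6: PE_HOSTFILE shows slots=2
[mynode6:04234] ras:gridengine: mynode7: PE_HOSTFILE shows slots=2
mynode6
mynode6
mynode7
mynode7
"""

# Matches only the lines that report a host and its slot count.
SLOT_LINE = re.compile(r"ras:gridengine: (\S+): PE_HOSTFILE shows slots=(\d+)")


def slots_from_verbose_output(text):
    """Return {host: slots} for every PE_HOSTFILE line in the output."""
    return {host: int(count) for host, count in SLOT_LINE.findall(text)}


print(slots_from_verbose_output(VERBOSE_OUTPUT))
```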

To Start an Interactive Session Using qrsh

An alternate way to start an interactive session is by using qrsh instead of qsh. For example:

% qrsh -V -pe orte 8 mpirun -np 4 -byslot hostname

Using MPI Client/Server Applications

The instructions in this section explain how to get best results when starting Open MPI client/server applications.

To Launch the Client/Server Job

1. Type the following command to launch the server application. Substitute the name of your MPI job’s universe for univ1:

% ./mpirun -np 1 --universe univ1 t_accept

2. Type the following command to launch the client application, substituting the name of your MPI job’s universe for univ1:

% ./mpirun -np 4 --universe univ1 t_connect

If the client and server jobs span more than 1 node, the first job (that is, the server job) must specify on the mpirun command line all the nodes that will be used. Specifying the node names allocates the specified hosts from the entire universe of server and client jobs.

For example, if the server runs on node0 and the client job runs on node1 only, the command to launch the server must specify both nodes (using the -host node0,node1 flag) even if it uses only one process on node0.

Assuming that the persistent daemon is started on node0, the command to launch the server would look like this:

node0% ./mpirun -np 1 --universe univ1 -host node0,node1 t_accept

The command to launch the client is:

node0% ./mpirun -np 4 --universe univ1 -host node1 t_connect

Using Name Publishing

If you plan to use name publishing, you must perform some additional tasks. You need to start an ompi-server process on your server so that both the clients and servers can exchange information using that server.

For information about how to start the ompi-server process, type the following command on your server:

% man ompi-server

Troubleshooting Client/Server Jobs

If the MPI client/server job fails to start, you might see error messages similar to this:

node0% ./orted --persistent --seed --scope public --universe univ4 --debug
[node0:21760] procdir: (null)
[node0:21760] jobdir: (null)
[node0:21760] unidir:
/tmp/openmpi-sessions-joeuser@node0_0/univ4
[node0:21760] top: openmpi-sessions-joeuser@node0_0
[node0:21760] tmp: /tmp
[node0:21760] orte_init: could not contact the specified
universe name univ4
[node0:21760] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file
/opt/SUNWhpc/HPC8.1/sun/bin/orted/runtime/orte_init_stage1.c
at line 221

These messages indicate that there is residual data left in the /tmp directory. This can happen if a previous client/server job has already run from the same node.

To remove the residual data from the /tmp directory, use the orte-clean utility. For more information about orte-clean, see the orte-clean man page.
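Before running orte-clean, it can help to see whether any session directories are actually present. The following is a minimal sketch that assumes session directories are named openmpi-sessions-* as in the error output above. The list_session_dirs function is a hypothetical helper, not an Open MPI utility, and it only lists directories; it removes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not an Open MPI tool): print any Open MPI session
# directories found directly under the given directory (default: /tmp).
list_session_dirs() {
    base="${1:-/tmp}"
    for dir in "$base"/openmpi-sessions-*; do
        # If nothing matches, the pattern stays literal and -d fails.
        [ -d "$dir" ] && printf '%s\n' "$dir"
    done
    return 0
}

leftovers=$(list_session_dirs /tmp)
if [ -n "$leftovers" ]; then
    echo "Possible residual session data:"
    echo "$leftovers"
    echo "Consider running orte-clean before relaunching the job."
else
    echo "No openmpi-sessions-* directories found in /tmp."
fi
```

A directory in the list might belong to a job that is still running or to another user, so treat the output as a hint rather than as proof that the data is stale.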

You might also need to run orte-clean if you see error messages similar to the following:

node0% ./orted --persistent --seed --scope public --universe univ4 --debug
[node0:21760] procdir: (null)
[node0:21760] jobdir: (null)
[node0:21760] unidir:
/tmp/openmpi-sessions-joeuser@node0_0/univ4
[node0:21760] top: openmpi-sessions-joeuser@node0_0
[node0:21760] tmp: /tmp
[node0:21760] orte_init: could not contact the specified
universe name univ4
[node0:21760] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file
/opt/SUNWhpc/HPC8.1/sun/bin/orted/runtime/orte_init_stage1.c
at line 221
----------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort.  There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems.  This failure appears to be an internal failure; here’s some additional information (which may only be relevant to an Open MPI developer):
   orte_sds_base_contact_universe failed
   --> Returned value -12 instead of ORTE_SUCCESS
----------------------------------------------------------------
[node0:21760] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file
/opt/SUNWhpc/HPC8.1/sun/bin/orted/runtime/orte_system_init.c
at line 42
[node0:21760] [NO-NAME] ORTE_ERROR_LOG: Unreachable in file
/opt/SUNWhpc/HPC8.1/sun/bin/orte/runtime/orte_init.c
at line 52
Open RTE was unable to initialize properly.  The error occured while attempting to orte_init().  Returned value -12 instead of ORTE_SUCCESS.

`mpirun` Command Reference

This section provides a quick reference for the mpirun command options.

TABLE 5-2 `mpirun` Command Options
Option	Description
`-am` list-name	Use the MCA parameter set file list called list-name.
`--app` appfile	Directs `mpirun` to use the appfile specified by appfile and to ignore other programs specified on the command line.
`-bynode` `--bynode`	Allocates (maps) the processes specified in a round-robin scheme by node. `-byslot` is the default (see below).
`-byslot` `--byslot`	Allocates (maps) the processes specified in a round-robin scheme by slot (processor). This is the default.
`-c` number	Same as the `-np` <number> option. Directs `mpirun` to run the number of copies (specified in number) of the specified program on the selected nodes. See the description of the `-np` option for more information.
-cf --cartofile filename	Run using the cartography file filename. Cartography files describe the layout of and connections between components in a cluster. For more information about cartography files, see the `mpirun(1)` man page.
`-debug` `--debug`	Invokes the user-level debugger specified in the MCA parameter `orte_base_user_debugger`. The default value for the MCA parameter is `totalview`. To change the specified debugger, change the value of the MCA parameter. (See Chapter 7 for more information.)
--debug-daemons	Enable debugging of any ORTE daemons used by this application.
-debug-daemons-file	Enable debugging of any OpenRTE daemons used by this application, storing output in files.
--debug-devel	Enable debugging of OpenRTE.
`-debugger` `--debugger`	Specifies the sequence of debuggers you want to use with `mpirun`. This option is a synonym for the MCA parameter `orte_base_user_debugger` and has the same default value. If you use this option, the value you specify overrides any value set in `orte_base_user_debugger`.
-default-hostfile --default-hostfile filename	Run using the provided default hostfile filename.
-display-allocation --display-allocation	Display the allocation being used by this job.
-display-devel-allocation	Intended for Open MPI/OpenRTE developers. Display a detailed list of the allocation being used by this job.
-display-map --display-map	Display the process map just before launch.
--display-devel-map	Intended for Open MPI/OpenRTE developers. Displays a detailed process map just before launch.
--do-not-launch	Perform all necessary operations to prepare to launch the application, but do not actually launch it.
-do-not-resolve --do-not-resolve	Do not attempt to resolve interfaces.
`-gmca` `--gmca` param value	Specifies global MCA parameters. param is the name of the specified MCA parameter. value is the value for that parameter.
`-h` `--help`	Displays help for the `mpirun` command. When this option is specified on the command line, it overrides any other options and displays the command help.
`-H` host1, host2, ...hostn	Specifies the list of hosts on which to invoke processes. This is a synonym for `-host`.
`--hetero`	Indicates that multiple `app_context`s are being provided that are a mix of 32- and 64-bit binaries.
`-host` `--host` <host1,host2,...hostn>	Specifies the list of hosts on which to invoke processes. This is a synonym for `-H`.
`-hostfile` `--hostfile` filename	Directs mpirun to use the specified hostfile. If `-hostfile` is specified without using filename, `mpirun` uses the default hostfile located at `/opt/SUNWhpc/HPC8.1/etc/openmpi-default-hostfile`.
--launch-agent command-name	Command used to start processes on remote nodes (default: orted).
-leave-session-attached	Enable debugging of OpenRTE.
-loadbalance --loadbalance	Balance total number of processes across all allocated nodes.
`-machinefile` `--machinefile` filename	Synonymous with `-hostfile`.
`-mca` `--mca` param value	Specifies an MCA parameter, where param is the name of the desired MCA parameter and value is the desired value for that parameter. These parameters and values are considered to be global parameters unless the `-gmca` option appears on the same command line.
`-n, --n` number	Specifies the number of processes to run. Synonymous with `-np`.
`--no-daemonize`	Keeps the ORTE daemons used by this application from being detached and used by other processes.
`-nolocal, --nolocal`	Specifies that MPI applications should not be run on the local node (the same node on which `mpirun` is running).
`-nooversubscribe` `--nooversubscribe`	Never oversubscribe the nodes, even if the system supports such operations. This option sets the effective value of `max_slots` to equal the value of `slots`, and overrides the settings for that node in the hostfile.
`--noprefix` value	Cancels any directory previously specified by the `--prefix` option.
`-npernode` `--npernode` number	Launch number processes per node on all allocated nodes.
-ompi-server --ompi-server name	Specify the URL of the Open MPI server, or the name of the file (specified as file:filename) that contains the information needed to run the job.
`-path` `--path` pathname	Specifies to `mpirun` that the executables to be used for the current job are stored in pathname.
-pernode --pernode	Launch one process per available node on the number of nodes specified in the `-np` option. If no `-np` option is used, then use all allocated nodes.
`--prefix` pathname	Specifies the path to the directory where Open MPI is located on remote node(s). This option is used to run Open MPI on remote nodes (as opposed to running on the local node).
`--preload-files` filename	Preload the comma separated list of files (specified by filename) to the remote machine’s current working directory before starting the remote process.
`--preload-files-dest-dir` directory	Specifies the destination directory (specified by directory) that contains the list of files (specified by `--preload-files` filename) to be used with the `--preload-files` option. By default, this option uses both absolute and relative paths.
`-q, --quiet`	Suppresses output messages from Open MPI.
-rf, --rankfile filename	Provide a rankfile file.
`-s, --preload-binary`	Preload the binary on the remote machine before starting the remote process.
--server-wait-time seconds	Time in seconds to wait for ompi-server (default: 10 sec).
--slot-list id-list	List of processor IDs to which you want to bind MPI processes (for example, a list of processors used in conjunction with rankfile files).
`--tmpdir` pathname	Specifies the root for the session directory tree for `mpirun` only. This applies only to the current job.
`-tv, --tv`	Synonymous with `--debug`. This option is deprecated; use `--debug` instead, if possible.
`--universe` username@hostname:universe_name	Sets the Open MPI universe for this application to username@hostname:universe_name.
`-v, --verbose`	Specifies verbose output.
`-V, --version`	Displays the `mpirun` version number. If no other options are specified on the same command line, this option also causes `mpirun` to exit.
--wait-for-server	If ompi-server is not already running, wait until it is detected (default: false).
`-wd` directory-name	Change to the specified directory before executing the application.
`-wdir, --wdir`	Synonymous with `-wd`.
`-x` variable `-x` variable=value	Exports the environment variable variable and its value in the current environment to the started processes. If value is specified, the option sets the variable’s value to value in the started processes.
-xml, --xml	Provide all output in XML format.
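The difference between the -byslot and -bynode rows in the table can be illustrated with a small simulation. This sketch does not call mpirun. The map_byslot and map_bynode functions are hypothetical, take the process count followed by host:slots pairs, and assume every host has at least one slot. The by-node version is deliberately simplified and ignores slot limits.

```shell
#!/bin/sh
# Hypothetical simulation of the two mapping schemes; not Open MPI code.
# Usage: map_byslot NP host:slots [host:slots ...]
# Fills every slot on a host before moving to the next host, wrapping
# around to the first host if NP exceeds the total slot count.
map_byslot() {
    np=$1; shift
    [ "$#" -gt 0 ] || return 1
    placed=0
    while [ "$placed" -lt "$np" ]; do
        for spec in "$@"; do
            host=${spec%%:*}     # text before the colon
            slots=${spec##*:}    # text after the colon
            i=0
            while [ "$i" -lt "$slots" ] && [ "$placed" -lt "$np" ]; do
                echo "$host"
                i=$((i + 1))
                placed=$((placed + 1))
            done
        done
    done
}

# Usage: map_bynode NP host:slots [host:slots ...]
# Places one process per host in turn. Simplification: slot counts are
# ignored, so this never skips a host that is already full.
map_bynode() {
    np=$1; shift
    [ "$#" -gt 0 ] || return 1
    placed=0
    while [ "$placed" -lt "$np" ]; do
        for spec in "$@"; do
            [ "$placed" -lt "$np" ] || break
            echo "${spec%%:*}"
            placed=$((placed + 1))
        done
    done
}

echo "byslot:"; map_byslot 4 mynode6:2 mynode7:2
echo "bynode:"; map_bynode 4 mynode6:2 mynode7:2
```

For four processes on two hosts with two slots each, the by-slot order is mynode6, mynode6, mynode7, mynode7, and the by-node order alternates between the two hosts.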

For More Information

For more information about the mpirun command and its options, see the following:

Chapter 7, Using MCA Parameters With mpirun

The mpirun(1) man page

Open MPI FAQ at http://www.open-mpi.org
