Sun MPI 4.0 User's Guide: With CRE

Chapter 3 Executing Programs

This chapter describes how to issue commands to execute programs on a Sun HPC cluster. You can execute programs on any node or nodes in any partitions to which you have access. A major difference between the Sun HPC cluster and a collection of workstations is that the Sun Cluster Runtime Environment (CRE) provides you with a simple, interactive interface for specifying where and how your program should run.

All programs written for Solaris 2.6 or Solaris 7 can run without recompilation on a Sun HPC cluster.

Note -

Running parallel jobs with the CRE is supported on up to 256 processors and up to 64 nodes.

Choosing Where to Execute

The Sun CRE provides you with considerable flexibility in choosing where you want your program to execute. For example, you can specify:

The partition in which you want to execute your program.

The number of processes you want to start, and how you want to map them to nodes.

The characteristics of the node or nodes on which you want to run (for example, the minimum amount of memory required or the maximum acceptable load).

See "Specifying Where a Program Is to Run" for more information.

You can specify default execution criteria via the MPRUN_FLAGS environment variable; see "Specifying Default Execution Options". You can also override these criteria via options to the mprun command.

Authentication Methods

Sun HPC Software includes two optional forms of user authentication that require the execution of user-level commands. The two methods are Kerberos Version 4 and DES. If one of these authentication methods is enforced on your Sun HPC cluster, use the commands listed in Table 3-1.

Table 3-1 User Commands Required by Authentication Methods


Authentication Method	Required Command
DES	While DES authentication is in use, you must issue the `keylogin` command before issuing any commands beginning with `mp`, such as `mprun` or `mpps`.
Kerberos 4	While Kerberos Version 4 authentication is in use, you must issue a `kinit` command before running any command beginning with `mp`, such as `mprun` or `mpps`.

See your system administrator for details.

Specifying Default Execution Options

You can use the environment variable MPRUN_FLAGS to specify one or more default options to the program execution command, mprun. You then need not specify those options on the command line: mprun is interpreted as if the options contained in MPRUN_FLAGS were included on the command line, preceding any options that are actually on the command line.

You can override any default option by including a new value for the option on the mprun command line.

Note -

For the -R option, the interaction between MPRUN_FLAGS and mprun is somewhat more complicated. This special relationship is described in "Expressing More Complex Resource Requirements" and "Running on the Same Node(s) as Another Specified Job".

The setting of the environment variable can be any number of valid mprun options. If you use more than one word, enclose the list in quotation marks. These options are described in more detail in the remainder of the chapter and are listed in "mprun Options".

For example, the following makes part2 the default partition to be used for mprun.

C shell

% setenv MPRUN_FLAGS "-p part2"

Bourne shell

# MPRUN_FLAGS="-p part2"; export MPRUN_FLAGS

You can check the current setting of MPRUN_FLAGS by issuing the command printenv.

C shell

% printenv MPRUN_FLAGS

Bourne shell

# printenv MPRUN_FLAGS

As noted above, an option on the mprun command line overrides the corresponding MPRUN_FLAGS setting. The exception is -R, whose settings from both sources are combined.

In addition, the default partition can be determined in two other ways. If -p is not specified on the mprun command line and MPRUN_FLAGS does not set a default partition, the default partition is:

First, the one where you are logged in.

Failing that, the one specified by the system administrator via the cluster administration command mpadmin.
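The precedence rule can be sketched with a small Bourne shell function. The function effective_partition is hypothetical and is not part of the CRE; it only models the documented behavior, placing the MPRUN_FLAGS options ahead of the command-line options so that the last -p wins. It assumes the executable is the last argument, so program arguments are not considered.

```sh
# Hypothetical helper, not part of the CRE: models how MPRUN_FLAGS
# options are placed ahead of the command-line options, so that a
# later -p on the command line wins.  Assumes no program arguments.
effective_partition() {
    # $MPRUN_FLAGS is left unquoted on purpose so that it is split
    # into separate options; the command-line arguments follow it.
    set -- $MPRUN_FLAGS "$@"
    part=""
    while [ $# -gt 0 ]; do
        if [ "$1" = "-p" ] && [ $# -ge 2 ]; then
            part=$2          # a later -p replaces an earlier one
            shift
        fi
        shift
    done
    printf '%s\n' "$part"    # empty: no -p anywhere, defaults apply
}
```

With MPRUN_FLAGS set to "-p part2", effective_partition a.out prints part2, while effective_partition -p part1 a.out prints part1. An empty result means neither source names a partition, and the default partition rules above apply.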

Executing Programs via `mprun`

This section provides general information about executing programs via mprun.

Execution via mprun is similar to standard Solaris program execution. For example,

Your environment is used as if you executed the program from a traditional shell.

Signals are treated as they are in standard Solaris; for multiprocess programs, if one process is killed via a signal, all processes are killed.

You can run a program in the background:

% mprun a.out &

CRE commands do differ slightly from standard Solaris execution. These differences are discussed in "Moving mprun Processes to the Background" through "SMP Characteristics of Sun HPC clusters".

Moving `mprun` Processes to the Background

When you move either a process started with mprun or a script that issues mprun commands to the background, you must do one of the following:

Redirect stdin away from the terminal.

Specify the -n option to mprun so that standard input is read from /dev/null. See "Specifying the Behavior of I/O Streams" for a detailed discussion of standard I/O issues.

If you do not take one of these steps, the mprun process will contend with your shell for characters typed at the shell, leading to unexpected results.

Shell-Specific Actions

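If you want to perform actions that are shell specific, such as executing compound commands, you must first invoke the appropriate shell as part of the mprun command. Enclose the command string in single quotes so that your local shell passes it to the remote shell without interpreting it. For example,

% mprun csh -c 'echo $USER'

% mprun csh -c 'cd /foo ; bar'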
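To see why the quoting matters, the following sketch uses a local sh in place of the shell that mprun starts on the remote node; it is an illustration under that assumption, not mprun behavior captured from a cluster. DEMO_VALUE is a made-up variable that is set in the invoking shell but not exported. Backquotes and double quotes both let the invoking shell act on the text before the command runs; single quotes pass it through unchanged.

```sh
# Hypothetical demonstration: a local sh stands in for the shell that
# mprun would start remotely.  DEMO_VALUE is a made-up variable that is
# set only in the current shell (it is not exported).
DEMO_VALUE=from_invoking_shell

# Single quotes: the text reaches the child shell unexpanded, and the
# child, which has no DEMO_VALUE, expands it to an empty string.
single=$(sh -c 'echo "[$DEMO_VALUE]"')

# Double quotes: the invoking shell expands DEMO_VALUE before the
# child shell ever runs.
double=$(sh -c "echo \"[$DEMO_VALUE]\"")

printf 'single: %s\ndouble: %s\n' "$single" "$double"
```

The first output line is "single: []" and the second is "double: [from_invoking_shell]".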

Core Files

Core files are produced as they normally are in Solaris. However, if more than one process dumps core in a multiprocess program, the resulting core file may be invalid.

Standard Output and Standard Error

By default, mprun handles standard output and standard error the way rsh does: the output and error streams are merged and displayed on your terminal screen. This is slightly different from standard Solaris behavior when you are not executing remotely; in that case, the stdout and stderr streams are separate. You can obtain separate streams with mprun via the -D option. You can also specify other ways of handling the three standard I/O streams. See "Specifying the Behavior of I/O Streams" for additional information.

File Descriptors

If your job consists of a large number of processes, you may need to consider the number of file descriptors the job is using and, if necessary, increase the default number available to you.

For merged standard I/O, each process in a job requires two descriptors. For separate stderr and stdout streams, each process requires three descriptors. You also need three file descriptors for interacting with your terminal.

You can find out the default number of file descriptors available in your shell by issuing the command

C shell

% limit descriptors

Bourne shell

# ulimit -n

The default for most shells is 64. This limits you to about 30 processes for merged standard I/O and about 20 processes for separate standard I/O. If this isn't sufficient, you can increase your limit by issuing the command

C shell

% limit descriptors 128

Bourne shell

# ulimit -n 128

Alternatively, you can set it to the maximum value:

C shell

% unlimit descriptors

Bourne shell

# ulimit -n `ulimit -Hn`

The file descriptor maximum in Solaris 2.6 and Solaris 7 is 1024.
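The arithmetic behind these figures can be checked with a small shell function. max_procs is hypothetical, not a CRE command; it assumes exactly the descriptor counts given above (three for the terminal plus two or three per process) and ignores any other files your program opens.

```sh
# Hypothetical helper, not part of the CRE: estimates how many
# processes a descriptor limit allows, using the counts given above.
#   $1 = descriptor limit (for example, the output of ulimit -n)
#   $2 = descriptors per process: 2 for merged standard I/O,
#        3 for separate stdout and stderr (-D)
max_procs() {
    limit=$1
    per_proc=$2
    # Three descriptors are reserved for interacting with the terminal.
    echo $(( (limit - 3) / per_proc ))
}
```

max_procs 64 2 prints 30 and max_procs 64 3 prints 20, the figures given above; with the Solaris maximum of 1024, max_procs 1024 2 prints 510. To use your current limit in the Bourne shell, pass "$(ulimit -n)" as the first argument.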

SMP Characteristics of Sun HPC clusters

Since your Sun HPC cluster consists of symmetric multiprocessors (SMPs), the CRE takes into consideration the number of CPUs per node by default. In general, mprun will assign more processes to larger SMPs. For information about how the CRE allocates processes to CPUs, see "When Number of Processes Exceeds Number of CPUs" and "Default Process Spawning".

Executing Programs

The basic format of the mprun command is

% mprun [options] [-] executable [args ... ]

Note -

When the name of your program conflicts with the name of an mprun option, use the - (dash) symbol to separate the program name from the option list.

`mprun` Options

The following table lists and briefly describes the mprun options. Their use is described more fully in "Specifying Where a Program Is to Run" through "Specifying the Behavior of I/O Streams".

Table 3-2 Options for mprun


Option	Meaning
-A `aout`	Execute `aout` and use a different argument as the argv[0] argument to the program. See "Specifying a Different Argument Vector".
-B	Send `stderr` and `stdout` output streams to files. See "Specifying the Behavior of I/O Streams".
-c `cluster_name`	Run on the specified cluster. See "Specifying the Cluster".
-C `path`	Use the specified directory as the current working directory for the job. See "Changing the Working Directory".
-D	Provide separate stdout and stderr streams. See "Specifying the Behavior of I/O Streams".
-G `group`	Execute with the specified group ID or group name. See "Executing with a Different User or Group Name".
-h	Display help. See "Getting Information".
-i	Standard input to mprun is sent only to rank 0, and not to all other ranks.
-I `file_descr_string`	Use the specified I/O file descriptor string to control I/O stream handling. See "Specifying the Behavior of I/O Streams".
-j `jid`	Run on the same node(s) as the job with job ID `jid`. See "Running on the Same Node(s) as Another Specified Job".
-J	Show the jid, cluster name, and number of processes after executing. See "Getting Information".
-n	Read stdin from /dev/null. See "Specifying the Behavior of I/O Streams".
-N	Do not open any standard I/O connections. See "Specifying the Behavior of I/O Streams".
-np `number`	Request the specified number of processes. See "Controlling Process Spawning".
-Ns	Disable spawning of multiple processes from a job on SMPs. See "Default Process Spawning".
-o	Prefix each output line with the rank of the process that wrote it.
-p `partition`	Run in the specified partition. See "Specifying the Partition".
-R "`resource_string`"	Specify conditions for choosing nodes. See "Expressing More Complex Resource Requirements".
-S	Settle for the available number of CPUs (used with -np). See "Controlling Process Spawning".
-U `user`	Execute with the specified user ID or user name. See "Executing with a Different User or Group Name".
-V	Display version information. See "Getting Information".
-W	Wrap the requested processes on the available CPUs (used with -np). See "Controlling Process Spawning".
-Ys	Allow spawning on SMPs. See "Default Process Spawning".
-Z `rank`	Group processes in sets of size `rank` and run each set on a single node (incompatible with -S and -W). See "Mapping MPI Ranks to Nodes".

Specifying Where a Program Is to Run

The mprun command provides you with considerable flexibility in specifying where you want your job to run.

"Specifying the Partition" describes how to choose the partition in which a program is to run.

"Specifying the Cluster" describes how to choose the cluster on which you want your program to run.

"Controlling Process Spawning" describes how to specify how many processes are to be started and how they should be mapped to nodes.

"Expressing More Complex Resource Requirements" describes a syntax for specifying complex requirements that can't be encapsulated in the basic command-line options.

In cases where your specified requirements can be met by more than one node, the CRE chooses the least-loaded node, unless you have specified other sorting criteria.

Specifying the Partition

Use mprun -p to specify the partition in which you want your program to run. The partition must be in the enabled state. For example,

% mprun -p part2 a.out

specifies that a.out is to be run in the partition part2.

The mpinfo command will tell you the names of enabled partitions in the cluster, along with other useful information about cluster resources. See "mpinfo: Configuration and Status" for a description of mpinfo.

Specifying the Cluster

By default, your job will run on the cluster where you are logged in.

If you are logged in on a machine that is connected to the Sun HPC cluster on which you want to run your job, but is not part of the cluster, use mprun -c cluster_name to specify the cluster.

Note -

Use the hostname of the cluster's master node as the cluster name. You can find the cluster's master node by running mpinfo -C on any node in the cluster. See "Specifying the Partition" for additional details.

Controlling Process Spawning

Specify the Number of Processes

Use the -np option to specify the number of processes you want to start; the default is 1. This option is typically used with a Sun MPI program.

For example,

% mprun -p part2 -np 4 a.out

specifies that you want four copies of a.out to start on the nodes of the partition named part2.

You can also specify 0 as the -np value, in which case the CRE starts one process on each available CPU. Thus, if the partition part2 has six available CPUs, the command

% mprun -p part2 -np 0 a.out

will start six copies of a.out.

Limit to One Process Per Node

Use the -Ns option to limit the number of processes to one per node. Each node then runs a single process of the job, regardless of how many CPUs it has.

When Number of Processes Exceeds Number of CPUs

When you request multiple processes (via the -np option), the CRE attempts to start one process per CPU. If you request more processes than the number of available CPUs, you must include either the -W or -S option. Otherwise, mprun will fail.

Use the -W option if you want the processes to wrap, that is, to assign more than one process to some CPUs, each of which then runs its assigned processes in turn. For example, if the partition part2 has six available CPUs and you specify

% mprun -p part2 -np 10 -W a.out

the CRE will start 10 processes on the six CPUs.

Note -

When the CRE wraps processes, it distributes them according to load-balancing rules. Therefore, you will not be able to predict where they will execute.

If you prefer to have a certain number of processes started, but are willing to settle for however many CPUs are available, use the -S option. The CRE will start one process on each available CPU. Thus, if you issue the same command as above, but substitute -S for -W:

% mprun -p part2 -np 10 -S a.out

and six CPUs are available on part2, then six copies of a.out will start, one per CPU.

Note -

If you specify -np number, but not -np 0, -S, or -W, and there are not enough nodes within the partition, the CRE will look for nodes outside the partition to make up the difference. To be eligible, an external node must be both enabled and independent. That is, the node must not be a member of another partition that is enabled. If you specify -np 0, -S, or -W, the search will be restricted to the partition you are in.
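The rules in this section can be summarized as a small decision function. procs_started is a hypothetical model, not CRE code; it covers only the CPUs inside the partition and ignores the search for external nodes described in the note above.

```sh
# Hypothetical model, not CRE code: how many processes mprun starts in a
# partition under the rules above.  It ignores the search for nodes
# outside the partition described in the last note.
#   $1 = -np value   $2 = available CPUs   $3 = "", "-W" or "-S"
procs_started() {
    np=$1
    cpus=$2
    flag=${3:-}
    if [ "$np" -eq 0 ]; then
        echo "$cpus"            # -np 0: one process per available CPU
    elif [ "$np" -le "$cpus" ]; then
        echo "$np"              # enough CPUs: one process per CPU
    elif [ "$flag" = "-W" ]; then
        echo "$np"              # wrap: some CPUs run more than one process
    elif [ "$flag" = "-S" ]; then
        echo "$cpus"            # settle for one process per available CPU
    else
        echo "procs_started: $np processes, $cpus CPUs: need -W or -S" >&2
        return 1
    fi
}
```

With the six-CPU part2 example, procs_started 10 6 -W prints 10, procs_started 10 6 -S prints 6, and procs_started 10 6 with no flag fails.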

Expressing More Complex Resource Requirements

Use the -R option to express complex node requirements that are not accessible via the basic options discussed above.

The -R option takes a resource requirement specifier (RRS) as an argument. The RRS is enclosed in quotation marks and provides the settings for any number of attributes that you want to use to control the selection of nodes. You combine multiple attribute settings using the logical & (AND) and | (OR) operators.

The CRE parses the attribute settings in the order in which they are listed in the RRS, along with other options you specify. The CRE merges these results with the results of an internally specified RRS that controls load balancing.

Note -

One option, -j, is an exception to this merging behavior; it is discussed later.

The result is an ordered list of CPUs that meet the specified criteria. If you are starting a single process, the CRE starts the process on the CPU that's first in the list. If you are starting n processes, the CRE starts them on the first n CPUs, wrapping if necessary.

Note -

Unless -Ns is specified, the RRS specifies node resources but generates a list of CPUs. If -Ns is specified, the list refers only to nodes.
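The selection order described above can be modeled as taking the first n entries of the ordered CPU list and wrapping when n exceeds it. The function assign_ranks and the CPU names are hypothetical; the real list comes from the CRE's RRS evaluation and load balancing, which this sketch does not model, and the earlier note that wrapped placement is unpredictable still applies.

```sh
# Hypothetical model, not CRE code: start n processes on an ordered
# list of CPUs, taking the first n entries and wrapping back to the
# start of the list when n is larger than the list.
#   $1 = number of processes; remaining arguments = ordered CPU list
assign_ranks() {
    n=$1
    shift
    [ $# -gt 0 ] || return 1            # no eligible CPUs
    rank=0
    while [ "$rank" -lt "$n" ]; do
        idx=$(( rank % $# + 1 ))        # position in the list, wrapping
        eval "cpu=\${$idx}"             # fetch positional parameter idx
        echo "r$rank:$cpu"
        rank=$(( rank + 1 ))
    done
}
```

For example, assign_ranks 4 node0.0 node0.1 node1.0 prints r0:node0.0, r1:node0.1, r2:node1.0, and r3:node0.0, the last rank wrapping back to the first CPU in the list.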

Specifying Resource Attributes

Table 3-3 lists predefined attributes you can include in an RRS. Your system administrator may also have defined attributes specific to your Sun HPC cluster. You can see what settings these administrator-defined attributes have with the mpinfo command.

Table 3-3 Standard RRS Attributes


Attribute	Meaning
cpu_idle	Percent of time that the CPU is idle.
cpu_iowait	Percent of time that the CPU spends waiting for I/O.
cpu_kernel	Percent of time that the CPU spends in the kernel.
cpu_scale	Performance rating of the CPU.
cpu_swap	Percent of time that the CPU spends waiting for swap.
cpu_type	CPU architecture.
cpu_user	Percent of time that the CPU spends running user programs.
load1	Node's load average for the past minute.
load5	Node's load average for the past 5 minutes.
load15	Node's load average for the past 15 minutes.
manufacturer	Hardware manufacturer.
mem_free	Node's available memory, in Mbytes.
mem_total	Node's total physical memory, in Mbytes.
name	Node's hostname.
os_max_proc	Maximum number of processes allowed on the node, including cluster daemons.
os_arch_kernel	Node's kernel architecture.
os_name	Operating system's name.
os_release	Operating system's release number.
os_release_maj	The major number of the operating system's release number.
os_release_min	The minor number of the operating system's release number.
os_version	Operating system's version.
serial_number	Node's serial number.
swap_free	Node's available swap space, in Mbytes.
swap_total	Node's total swap space, in Mbytes.

The CRE recognizes two types of attributes, value and boolean.

Value-Based Attributes

Value attributes can take a literal value or a numeric value. Or, depending on the operator used, they may take no value.

Attributes with a literal value take a name as a setting. Use an equal sign and the name after the attribute to show the setting. For example,

% mprun -R "name = hpc-demo" a.out

Attributes with a numeric value include an operator and a value. For example,

% mprun -R "load5 < 4" a.out

specifies that you only want nodes whose individual load averages over the previous 5 minutes were less than 4.

Attributes that use either << or >> take no value. For example,

% mprun -R "mem_total>>" a.out

specifies that you prefer nodes with the largest physical memory available.

Table 3-4 identifies the operators that can be used in RRS expressions.

Table 3-4 Operators Valid for Use in RRS


Operator	Meaning
<	Select all nodes where the value of the specified attribute is less than the specified value.
<=	Select all nodes where the value of the specified attribute is less than or equal to the specified value.
=	Select all nodes where the value of the specified attribute is equal to the specified value.
>=	Select all nodes where the value of the specified attribute is greater than or equal to the specified value.
>	Select all nodes where the value of the specified attribute is greater than the specified value.
!=	Attribute must not be equal to the specified value. (Precede with a backslash in the C shell.)
<<	Select the node(s) that have the lowest value for this attribute.
>>	Select the node(s) that have the highest value for this attribute.

The operators have the following precedence, from strongest to weakest:

unary -
*, /
+, binary -
=, !=, >=, <=, >, <, <<, >>
!
&, |
?

If you use the << or >> operator, the CRE does not provide load balancing. In the previous example, the CRE would choose the node with the largest total physical memory, regardless of its load. If you use << or >> more than once, only the last use has any effect; it overrides the previous uses. For example,

% mprun -R "mem_free>> swap_free>>" a.out

initially selects the nodes that have the most free memory, but then selects nodes that have the largest amount of available swap space. The second selection may yield a different set of nodes than were selected initially.

You can also use arithmetic expressions for numeric attributes anywhere. For example,

% mprun -R "load1 / load5 < 2" a.out

specifies that the ratio between the one-minute load average and the five-minute load average must be less than 2. In other words, the load average on the node must not be growing too fast.

You can use standard arithmetic operators as well as the C ?: conditional operator.

Note -

Because some shell programs interpret characters used in RRS arguments, you may need to protect your RRS entries from undesired interpretation by your shell program. For example, if you use csh, write "-R \!private" instead of "-R !private".
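The following sketch applies three of the RRS forms discussed above (a comparison, an arithmetic expression, and >>) to an invented table of node attributes using awk. It only illustrates the selection logic; the node names and values are made up, and the CRE evaluates RRS expressions itself, with its own syntax and load-balancing rules.

```sh
# Hypothetical illustration, not CRE code: apply RRS-style tests to a
# made-up table of node attributes.  Columns: name load1 load5 mem_total
nodes='node0 0.5 1.0 512
node1 3.0 1.0 1024
node2 4.5 5.0 2048
node3 0.2 0.4 1024'

# "load5 < 4": keep every node whose 5-minute load average is below 4.
rrs_load5_below_4() {
    printf '%s\n' "$nodes" | awk '$3 < 4 { print $1 }'
}

# "load1 / load5 < 2": arithmetic on attributes (skip a zero load5).
rrs_load_ratio_below_2() {
    printf '%s\n' "$nodes" | awk '$3 > 0 && $2 / $3 < 2 { print $1 }'
}

# "mem_total>>": keep only the node(s) with the highest mem_total.
rrs_most_mem() {
    printf '%s\n' "$nodes" | awk '
        { name[NR] = $1; mem[NR] = $4 + 0
          if (NR == 1 || mem[NR] > max) max = mem[NR] }
        END { for (i = 1; i <= NR; i++) if (mem[i] == max) print name[i] }'
}
```

With this table, rrs_load5_below_4 prints node0, node1, and node3; rrs_load_ratio_below_2 prints node0, node2, and node3 (node1's one-minute load is three times its five-minute load); and rrs_most_mem prints node2.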

Boolean Attributes

Boolean attributes are either true or false. If you want the attribute to be true, simply list the attribute in the RRS. For example, if your system administrator has defined an attribute called ionode, you can request a node with that attribute:

% mprun -R "ionode" a.out

If you want the attribute to be false (that is, you do not want a resource with that attribute), precede the attribute's name with !. (Precede this with a backslash in the C shell; the backslash is an escape character to prevent the shell from interpreting the exclamation point as a "history" escape.) For example,

% mprun -R "\!ionode" a.out

The following examples use value-based attributes:

% mprun -R "mem_free > 256" a.out

specifies that the node must have over 256 megabytes of available RAM.

% mprun -R "swap_free >>" a.out

specifies that the node picked must have the highest available swap space.

Examples

Here are some examples of the -R option in use.

The following example specifies that the program must run on a node in partition part2 that has 512 Mbytes of total physical memory:

% mprun -p part2 -R "mem_total=512" a.out

The following example specifies that you want to run on any of the three nodes listed:

% mprun -R "name=node1 | name=node2 | name=node3" a.out

The following example chooses nodes with over 300 Mbytes of free swap space. Of these nodes, it then chooses the one with the most total physical memory:

% mprun -R "swap_free > 300 & mem_total>>" a.out

The following example assumes that your system administrator has defined an attribute called framebuffer, which is set (TRUE) on any node that has a frame buffer attached to it. You could then request such a node via the command

% mprun -R "framebuffer" a.out

The `-R` Option and `MPRUN_FLAGS`

With the exception of the -j option, specifying -R on the command line as well as in the MPRUN_FLAGS environment variable combines the two sets of values--that is, the command line does not override the environment variable settings. For example, if you have

% setenv MPRUN_FLAGS '-R "load1 < 1"'

and issue the command

% mprun -R "load5 < 1" -R "load15 < 1" a.out

this would be the same as issuing the command

% mprun -R "(load1<1) & (load5<1) & (load15<1)" a.out

This combining behavior does not happen with the -j option. When -j is specified by MPRUN_FLAGS as well as on the mprun command line, the command line use overrides the environment variable setting.
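The combining rule can be written as a short function. combine_rrs is hypothetical and not part of the CRE; it simply parenthesizes each RRS string in the order given (MPRUN_FLAGS first) and joins them with &.

```sh
# Hypothetical model, not CRE code: combine several RRS strings the way
# repeated -R options are described above, wrapping each one in
# parentheses and joining them with the logical AND operator.
combine_rrs() {
    result=""
    for rrs in "$@"; do
        if [ -z "$result" ]; then
            result="($rrs)"
        else
            result="$result & ($rrs)"
        fi
    done
    printf '%s\n' "$result"
}
```

Called as combine_rrs 'load1 < 1' 'load5 < 1' 'load15 < 1', it prints (load1 < 1) & (load5 < 1) & (load15 < 1), which matches the combined command shown above apart from spacing.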

Running on the Same Node(s) as Another Specified Job

Use the -j option to specify that the program you want to execute should run on the same node or nodes as a particular job ID (jid). For example, to run a.out on the same node(s) as a job whose job ID is 85, issue the command

% mprun -j 85 a.out

If -j follows the -np or -R option on the command line, it overrides those options. If -np, together with -W or -S, follows -j on the command line, -j determines which nodes to run on, while the other options determine the number of processes to map onto these nodes.

You can use the mpps command to find out the job ID of any job.

Default Process Spawning

By default, mprun spawns multiple processes on SMPs. For example, if you have a two-node partition in which one node has two CPUs and the other has four CPUs, then the command

% mprun -np 6 a.out

runs six copies of a.out, two on the two-CPU node and four on the four-CPU node.

The -j and -R options override this behavior.

Alternatively, you can use the -Ns option to disable spawning of processes on individual CPUs of a node. Instead, -Ns will cause only one process to be started on each node.

Use the -Ys option to force spawning on nodes when used with -R. -Ys does not override -j.
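The difference between default spawning and -Ns can be modeled as the number of process slots each node offers. The function slots and its name:cpus argument format are hypothetical, invented for this sketch; they are not CRE syntax.

```sh
# Hypothetical model, not CRE code: list the process slots offered by a
# set of nodes.  Each node is given as name:cpus.  By default every CPU
# is a slot (spawning on SMPs); with -Ns each node offers one slot.
slots() {
    mode=$1
    shift
    for node in "$@"; do
        name=${node%%:*}
        cpus=${node#*:}
        if [ "$mode" = "-Ns" ]; then
            cpus=1
        fi
        i=0
        while [ "$i" -lt "$cpus" ]; do
            echo "$name"
            i=$(( i + 1 ))
        done
    done
}
```

With nodes nodeA:2 and nodeB:4, slots default lists six slots, two on nodeA and four on nodeB, as in the mprun -np 6 example above; slots -Ns lists one slot per node.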

Mapping MPI Ranks to Nodes

Using the `-Z` Option

The -Z option causes the CRE to organize a job's processes into subsets of a specified size and to group all processes in a subset on the same node. You specify the subset size with a numerical argument to -Z. For example,

% mprun -Z 3 -np 8 a.out

groups the job's processes by threes. These groups may be distributed onto different nodes, but there is no guarantee that they will be; two or more groups may be started on the same CPU.

Note -

The -Z option is incompatible with the -S and -W options.
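The grouping that -Z requests can be sketched as follows, assuming (the text above does not say so) that each group consists of consecutive ranks. The function z_groups is hypothetical; it only labels ranks with a group number and does not model where the CRE places each group.

```sh
# Hypothetical model, not CRE code, of -Z: ranks are cut into groups of
# the given size, assuming consecutive ranks form a group.  Every rank
# in a group shares a node; node placement itself is not modeled.
#   $1 = group size (the -Z value)   $2 = number of processes (-np)
z_groups() {
    size=$1
    np=$2
    rank=0
    while [ "$rank" -lt "$np" ]; do
        echo "r$rank:group$(( rank / size ))"
        rank=$(( rank + 1 ))
    done
}
```

For -Z 3 -np 8, z_groups 3 8 puts ranks 0 through 2 in group 0, ranks 3 through 5 in group 1, and ranks 6 and 7 in group 2, so the last group is smaller than the requested size.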

Using RRS to Map Ranks to Nodes

You can construct an RRS expression (see "Expressing More Complex Resource Requirements") that causes mprun to distribute a specified number of processes (MPI ranks) to a set of nodes in a specified order. The RRS expression assigns to each node in the set a single-character alias preceded by a number, which together make up a sequence of count/alias pairs. For example:

"[2a2b2c2d]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2 & d.name=hpc-node3"

The number that precedes a node's alias tells the CRE how many processes to start on that node. In this example, it assigns two processes to each of the nodes defined by the aliases a, b, c, and d. This number can be different for each node, but it must not exceed the number of CPUs on that node.

The CRE distributes processes to the nodes in the order in which they are listed in the RRS expression, starting the rank 0 process on the first node in the list. Once the prescribed number of processes have been started on the first node, the CRE moves to the second node and then to subsequent nodes, starting the specified number of processes on each node in turn. An alias cannot be repeated in the sequence, but one node can be defined with more than one alias.

The RRS rank-mapping expression must satisfy the following conditions:

Up to 26 node aliases can be defined; aliases are not case-sensitive. Every node alias must be preceded by a number, which may have more than one digit.

The number of processes assigned to a given node cannot be greater than the number of CPUs on that node.

The -np value cannot be greater than the total number of processes allocated by the RRS expression. You cannot use the -W option to get around this restriction by wrapping the processes.
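The count/alias rules can be checked mechanically. The following sketch, a hypothetical function map_ranks that is not part of the CRE, expands the bracketed sequence into rank-to-alias assignments in the order described above and rejects sequences that break the alias rules or the -np limit. It cannot check the CPU-count condition, because that requires each node's CPU count, so it would accept the [2a1b3c] example in this section.

```sh
# Hypothetical checker, not CRE code: expands the count/alias sequence
# of a rank-mapping RRS into a rank-to-alias list and applies the
# conditions above that can be checked without node information.
# The CPU-count condition is NOT checked.
#   $1 = -np value   $2 = RRS string, e.g. "[2a2b]:a.name=... & b.name=..."
map_ranks() {
    np=$1
    prefix=${2%%:*}                     # keep only the [count/alias] part
    printf '%s\n' "$prefix" | awk -v np="$np" '
    function fail(msg) { print "map_ranks: " msg | "cat 1>&2"; exit 1 }
    {
        s = $0
        if (substr(s, 1, 1) != "[" || substr(s, length(s), 1) != "]")
            fail("sequence must be enclosed in [ ]")
        s = substr(s, 2, length(s) - 2)
        n = 0; total = 0
        while (length(s) > 0) {
            # Every alias must be preceded by a (possibly multi-digit) count.
            if (!match(s, /^[0-9]+[A-Za-z]/))
                fail("each alias must be preceded by a number")
            a = tolower(substr(s, RLENGTH, 1))    # aliases ignore case
            if (a in seen) fail("alias " a " repeated")
            seen[a] = 1                           # hence at most 26 aliases
            n++
            cnt[n] = substr(s, 1, RLENGTH - 1) + 0
            alias[n] = a
            total += cnt[n]
            s = substr(s, RLENGTH + 1)
        }
        if (np + 0 > total)
            fail("-np " np " exceeds the " total " processes in the RRS")
        rank = 0
        for (i = 1; i <= n && rank < np + 0; i++)
            for (j = 0; j < cnt[i] && rank < np + 0; j++) {
                print "r" rank ":" alias[i]
                rank++
            }
    }'
}
```

For example, map_ranks 10 with the [1a2b3c4d] expression from this section prints r0:a, then r1:b and r2:b, and so on through r9:d, matching the rank order in the corresponding mprun output, while [a2b3c] is rejected because its first alias has no count.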

The following example shows this technique being applied on a 4x4 partition. Two processes are started on each of four, four-CPU nodes.

% mprun -o -np 8 -R "[2a2b2c2d]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2 & d.name=hpc-node3" uname -n
r0:hpc-node0
r1:hpc-node0
r2:hpc-node1
r3:hpc-node1
r4:hpc-node2
r5:hpc-node2
r6:hpc-node3
r7:hpc-node3

The -o option prepends each output line with the MPI rank of the process that writes it. Two CPUs on each node are not participants in this job.

The next example shows different numbers of processes being allocated to each node. One process is started on the first node, two on the second, and so forth.

% mprun -o -np 10 -R "[1a2b3c4d]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2 & d.name=hpc-node3" uname -n
r0:hpc-node0
r1:hpc-node1
r2:hpc-node1
r3:hpc-node2
r4:hpc-node2
r5:hpc-node2
r6:hpc-node3
r7:hpc-node3
r8:hpc-node3
r9:hpc-node3

The following example shows the error message that is returned when the number of processes assigned to a node exceeds the number of CPUs on that node.

% mprun -o -np 6 -R "[2a1b3c]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node0" uname -n
mprun: no_mp_jobs: No nodes in partition satisfy RRS

In this case, the node hpc-node0 is aliased twice (as 2a and 3c) so that it can be repeated in the sequence. This use of multiple aliases is legal, but hpc-node0 has four CPUs and the total number of processes assigned by 2a and 3c is five, which violates the second condition listed above.

The next example shows what happens when an alias does not start with a number. In this case, the alias for hpc-node0 violates the first condition listed above.

% mprun -o -np 6 -R "[a2b3c]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2" uname -n
mprun: no_mp_jobs: No nodes in partition satisfy RRS

Specifying the Behavior of I/O Streams

Introducing `mprun` I/O

By default, all standard output (stdout) and standard error (stderr) from an mprun-launched job will be merged and sent to mprun's standard output. This is ordinarily the user's terminal. Likewise, mprun's standard input (stdin) is sent to the standard input of all the processes.

You can redirect mprun's standard input, output, and error using the standard shell syntax. For example,

% mprun -np 4 echo hello > hellos

Such redirection applies to mprun itself, not to the individual processes of the job. For example,

% mprun echo hello > message

sends hello across the network from the echo process to the mprun process, which writes it to a file called message.

The mprun command's own options allow you to control I/O in other ways. Rather than making remote processes communicate with mprun when that is not necessary, you can make each process write to or read from a file on the node on which it is running, for example by sending its standard output or standard error to a local file. In the following example, each node writes hello to a local file called message:

% mprun -I "1w=message" echo hello

mprun also provides options that you can use to control standard output and standard error streams. For example, you can

Use the -D option to make the standard error from each process go to the standard error of mprun, instead of its standard output. For example,

% mprun -D a.out

sends standard output from a.out to the standard output of mprun and sends the standard error of a.out to the standard error of mprun.

Use the -B option to merge the standard output and standard error streams from each process and direct them to files named out.jid.rank, where jid is the job ID of the job and rank is the rank of this process within the job. The files are located in the job's working directory. There is no standard input stream.

Use the -N option to shut off all standard I/O to all the processes; with this option, no stdin, stdout, or stderr connections are established. Use -N when standard I/O is not needed, to avoid the overhead of establishing standard I/O connections for each remote process and then closing those connections as each process ends.

Use the -n option to cause stdin to be read from /dev/null. This can be useful when running mprun in the background, either directly or through a script. Without -n, mprun will block in this situation, even if no reads are posted by the remote job. When -n is specified, the user process encounters an EOF if it attempts to read from stdin. This is comparable to the behavior of the -n option to rsh.

Note -
The set of mprun options that control stdio handling cannot be combined. These options override one another. If more than one is given on a command line, the last one overrides all of the rest. The relevant options are: -D, -N, -B, -n, -i, -o, and -I.
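The override rule in the note can be modeled by scanning the option list and keeping only the last stdio-handling option. last_stdio_option is hypothetical and not part of mprun; it scans every argument, so it assumes that the program being run takes no arguments that look like these options.

```sh
# Hypothetical model, not CRE code: report which stdio-handling option
# takes effect when several are given (the last one wins, per the note).
last_stdio_option() {
    last=""
    while [ $# -gt 0 ]; do
        case $1 in
            -D|-N|-B|-n|-i|-o)
                last=$1 ;;
            -I)
                last="-I ${2:-}"        # -I takes a descriptor string
                if [ $# -ge 2 ]; then shift; fi ;;
        esac
        shift
    done
    printf '%s\n' "$last"
}
```

For example, last_stdio_option -D -np 4 -B a.out prints -B, since -B comes after -D; -np is not a stdio option and is skipped.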

Creating a Custom Configuration

Use the -I option to specify a custom configuration for the I/O streams associated with a job, including standard input, output, and error. The -I option takes as an argument a comma-separated series of file descriptor strings. These strings specify what is to happen with each of the job's I/O streams.

In Solaris, each process has a numbered set of file descriptors associated with it. The standard I/O streams are assigned the first three file descriptors:

0 - standard input (stdin)
1 - standard output (stdout)
2 - standard error (stderr)

The argument list to -I can include a string for each file descriptor associated with a job; if any file descriptor is omitted, its stream won't be connected to any device.

Restriction: If you include strings to redirect both standard output and standard error, you must also redirect standard input. If the job has no standard input, you can redirect file descriptor 0 to /dev/null.

The file descriptor strings in the -I argument list can be in any order. Quotation marks around the strings are optional.

File Descriptor Attributes

The file descriptor string assigns one or more of the following attributes to a file descriptor:

r - File descriptor is to be read from.
w - File descriptor is to be written to.
p - File descriptor is to be attached to a pseudo-terminal (pty).

You must specify either r or w for each file descriptor, that is, whether the file descriptor is to be read from or written to.

Thus, the string

5w

means that the stream associated with file descriptor 5 is to be written. And

0rp

means that the standard input is to be read from the pseudo-terminal.

If you use the p (pty) attribute, you must have one rp and one wp in the complete series of file descriptor strings. In other words, you must specify both reading from and writing to the pty. No other attributes can be associated with rp and wp.

The following attributes are output-related and thus can only be used in conjunction with w:

l - Line-buffered output.
t - Tag the line-buffered output with process rank information.
a - Stream is to be appended to the specified file.

Note -
NFS does not support append operations.

Use the l attribute in combination with the w attribute to line-buffer the output of multiple processes. This takes care of the situation in which output from one process arrives in the middle of output from another process. For example,

% mprun -np 2 echo "Hello"
HelHello
lo

With the l attribute, you ensure that processes don't intrude on each other's output. The following example shows how using the l attribute could prevent the problem illustrated in the previous example:

% mprun -np 2 -I "0r, 1wl" echo "Hello"
Hello
Hello

Use the t attribute in place of l to force line-buffering and, additionally, to prefix each line with the rank of the process producing the output. For example,

% mprun -np 2 -I "0r, 1wt" echo "Hello"
r0:Hello
r1:Hello
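The l and t attributes can be modeled as a per-process buffer that holds partial output and releases only complete lines; t additionally prefixes each line with `r<rank>:`. The C sketch below is a hypothetical model of that idea, not CRE code; `struct line_buf`, `line_buf_feed`, and the 256-byte line limit are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-process line buffer modeling the l/t attributes. */
struct line_buf {
    int    rank;
    char   pending[256];   /* partial line not yet ended by '\n' */
    size_t len;
};

/* Feeds a chunk of one process's output. Each complete line is appended
   to `out` (NUL-terminated, truncated to outsz) as "r<rank>:<line>\n";
   an unterminated tail stays in `pending` until its newline arrives.
   Overlong lines are truncated. Returns the number of lines emitted. */
int line_buf_feed(struct line_buf *lb, const char *chunk,
                  char *out, size_t outsz)
{
    size_t used = 0;
    int lines = 0;

    if (outsz > 0)
        out[0] = '\0';
    for (const char *p = chunk; *p != '\0'; p++) {
        if (*p != '\n') {
            if (lb->len < sizeof lb->pending - 1)
                lb->pending[lb->len++] = *p;
            continue;
        }
        lb->pending[lb->len] = '\0';
        if (used < outsz) {
            int n = snprintf(out + used, outsz - used, "r%d:%s\n",
                             lb->rank, lb->pending);
            if (n > 0)
                used += (size_t)n < outsz - used ? (size_t)n
                                                 : outsz - used - 1;
        }
        lb->len = 0;
        lines++;
    }
    return lines;
}
```

Feeding rank 0's fragment "Hel", then rank 1's "Hello\n", then rank 0's "lo\n" yields the whole lines r1:Hello and r0:Hello rather than the HelHello / lo interleaving shown earlier. Dropping the prefix from the format string gives the behavior of l alone.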

The b attribute is input-related and thus can be used only in combination with r. In multiprocess jobs, the b attribute specifies that input is to go only to the first process, rather than to all processes, which is the default behavior.

The m attribute pertains to reading from a pseudo-terminal and thus can be used only with rp. When multiple processes are running, the m attribute in combination with rp causes each keystroke to be echoed multiple times. The default is to echo each keystroke only once.

File Descriptor String Syntax

You can direct one file descriptor's output to the same location as that specified by another file descriptor by using the syntax

fdattr=@other_fd

For example,

2w=@1

means that the standard error is to be sent wherever the standard output is going. You cannot do this for a file descriptor string that uses the p attribute.

The -I argument list is parsed from left to right. Therefore, if the behavior of the referenced file descriptor (other_fd) is changed later in the list, the change does not affect the earlier reference to it.
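The left-to-right rule behaves much like POSIX dup2(2), which copies whatever a descriptor refers to at the moment of the call. The C sketch below is only an analogy, not the CRE's implementation. It uses the arbitrary descriptor numbers 60 and 61 so that it does not disturb the real standard streams: `61w=@60` is modeled as `dup2(60, 61)`, and redefining fd 60 afterward leaves fd 61 where it was. `alias_fd` and `demo_left_to_right` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Model of "dst w=@src": make dst refer to whatever src refers to now.
   Later changes to src do not follow into dst. */
int alias_fd(int dst, int src)
{
    return dup2(src, dst) < 0 ? -1 : 0;
}

/* Returns the byte that arrives on the first pipe, or -1 on error. */
int demo_left_to_right(void)
{
    int first[2], second[2];
    char c = 0, d;
    int result = -1;

    if (pipe(first) < 0)
        return -1;
    if (pipe(second) < 0) {
        close(first[0]);
        close(first[1]);
        return -1;
    }
    fcntl(first[0], F_SETFL, O_NONBLOCK);   /* never block in the demo */
    fcntl(second[0], F_SETFL, O_NONBLOCK);

    dup2(first[1], 60);      /* "60w"     : fd 60 -> first pipe          */
    alias_fd(61, 60);        /* "61w=@60" : fd 61 -> first pipe as well  */
    dup2(second[1], 60);     /* later change: fd 60 -> second pipe       */

    if (write(61, "x", 1) == 1 &&
        read(first[0], &c, 1) == 1 &&      /* byte is in the first pipe */
        read(second[0], &d, 1) == -1)      /* second pipe stays empty   */
        result = c;

    close(60);
    close(61);
    close(first[0]);
    close(first[1]);
    close(second[0]);
    close(second[1]);
    return result;
}
```

The shell's `2>&1` follows the same rule, which is why its position relative to other redirections matters.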

You can tie a file descriptor's output to a file by using the syntax

fdattr=filename

For example,

10w=output

says that the stream associated with file descriptor 10 is to be written to the file output. Once again, however, you cannot use this feature for a file descriptor defined with the p attribute.

In the following example, the standard input is read from the pty, the standard output is written to the pty, and the standard error is sent to the file named errors:

% mprun -I "0rp,1wp,2w=errors" a.out
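To make the grammar concrete, here is a hypothetical C parser for a single file descriptor string: a descriptor number, attribute letters, and an optional `=@other_fd` or `=filename` target. It enforces the per-string rules stated in this section (exactly one of r or w; l, t, and a only with w; b only with r; m only with rp; no target with p) but not the list-level rule that rp and wp must appear together. It is a sketch of the syntax only; `struct fd_spec` and `parse_fd_spec` are assumptions, and the CRE's real parser may differ.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of one -I file descriptor string. */
struct fd_spec {
    int  fd;            /* descriptor number, e.g. 2 in "2w=@1"        */
    bool r, w, p;       /* read, write, pseudo-terminal                */
    bool l, t, a;       /* line-buffer, tag with rank, append (with w) */
    bool b, m;          /* rank 0 only (with r), multi-echo (with rp)  */
    int  alias_fd;      /* target of "=@n", or -1                      */
    char file[256];     /* target of "=filename", or ""                */
};

/* Parses strings such as "0rp", "1wt", "2w=@1" or "10w=output".
   Leading spaces are skipped. Returns 0 on success, -1 on error. */
int parse_fd_spec(const char *s, struct fd_spec *out)
{
    memset(out, 0, sizeof *out);
    out->alias_fd = -1;

    while (*s == ' ')
        s++;
    if (!isdigit((unsigned char)*s))
        return -1;
    while (isdigit((unsigned char)*s))
        out->fd = out->fd * 10 + (*s++ - '0');

    for (; *s != '\0' && *s != '='; s++) {
        switch (*s) {
        case 'r': out->r = true; break;
        case 'w': out->w = true; break;
        case 'p': out->p = true; break;
        case 'l': out->l = true; break;
        case 't': out->t = true; break;
        case 'a': out->a = true; break;
        case 'b': out->b = true; break;
        case 'm': out->m = true; break;
        default:  return -1;                 /* unknown attribute */
        }
    }

    if (*s == '=') {
        s++;
        if (out->p)
            return -1;                       /* p cannot take a target */
        if (*s == '@') {
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            out->alias_fd = 0;
            while (isdigit((unsigned char)*s))
                out->alias_fd = out->alias_fd * 10 + (*s++ - '0');
            if (*s != '\0')
                return -1;
        } else {
            size_t n = strlen(s);
            if (n == 0 || n >= sizeof out->file)
                return -1;
            memcpy(out->file, s, n + 1);
        }
    }

    if (out->r == out->w)
        return -1;                           /* exactly one of r or w */
    if ((out->l || out->t || out->a) && !out->w)
        return -1;                           /* output-only attributes */
    if (out->b && !out->r)
        return -1;                           /* input-only attribute */
    if (out->m && !(out->r && out->p))
        return -1;                           /* m only with rp */
    return 0;
}
```

Splitting a full -I argument on commas and calling `parse_fd_spec` on each piece, left to right, mirrors the parsing order described above.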

If you use the w attribute without specifying a file, the file descriptor's output is written to the corresponding output stream of the parent process; the parent process is typically a shell, so the output is typically written to the user's terminal.

For multiprocess jobs, each process creates its own file; the file is opened on the node on which the process runs.

Note -

If output is redirected such that multiple processes open the same file over NFS, the processes will overwrite each other's output.

In specifying the individual file names for processes, you can use the following symbols:

&J - The job ID of the job
&R - The rank of the process within the job

The symbols will be replaced by the actual values. For example, assuming the job ID is 15, this file descriptor string

1w=myfile.&J.&R

redirects standard output from a multiprocess job to a series of files named myfile.15.0, myfile.15.1, myfile.15.2, and so on, one file for each rank of the job.
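The substitution is a plain textual replacement performed separately for each process. The C sketch below shows one way to expand the two symbols; `expand_fd_filename` is a hypothetical helper, and copying any other `&` sequence unchanged is an assumption about cases the text does not cover.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expansion of &J (job ID) and &R (rank) in a -I file name.
   Writes the result to out; returns 0, or -1 if it would not fit in
   outsz bytes. Other characters, including a lone '&', are copied. */
int expand_fd_filename(const char *tmpl, long jid, int rank,
                       char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; p++) {
        char num[32];
        const char *piece = NULL;
        size_t len;

        if (p[0] == '&' && p[1] == 'J') {
            snprintf(num, sizeof num, "%ld", jid);
            piece = num;
            p++;                        /* skip the 'J' */
        } else if (p[0] == '&' && p[1] == 'R') {
            snprintf(num, sizeof num, "%d", rank);
            piece = num;
            p++;                        /* skip the 'R' */
        }
        if (piece != NULL) {
            len = strlen(piece);
        } else {
            piece = p;                  /* ordinary character */
            len = 1;
        }
        if (used + len >= outsz)        /* keep room for the NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With job ID 15, rank 0 expands `myfile.&J.&R` to myfile.15.0 and rank 2 expands `out.&J.&R` to out.15.2, which are the names the -B option produces.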

In the following example, there is no standard input (it comes from /dev/null), and the standard output and standard error are written to the files out.jid.rank:

% mprun -I "0r=/dev/null,1w=out.&J.&R,2w=@1" a.out

This is the behavior of the -B option; see "Introducing mprun I/O". Note that this example includes a file descriptor string for standard input even though the job has none. This is required because both standard output and standard error are redirected.

`mprun` Options versus Shell Syntax

The default I/O behavior of mprun (merged standard error and standard output) is equivalent to

% mprun -I "0rp,1wp,2w=@1" a.out

The -D option provides separate standard output and standard error streams; it is equivalent to:

% mprun -I "0rp,1wp,2w" a.out

You can use the -o option to force each line of output to be prefixed with the rank of the process writing it. This is equivalent to

% mprun -I "0rp,1wt,2w=@1" a.out

If you redirect output to a shared file, you must use standard shell redirection rather than the equivalent -I formulation (-I "1wt=outfile"). The same restriction applies to the line-buffered formulation (-I "1wl=outfile").

For example, the following command line concatenates the outputs of the individual processes of a job and writes them to outfile.dat:

% mprun -np 4 myprogram > outfile.dat

The following command line concatenates the outputs of the individual processes and appends them to the previous content of the output file:

% mprun -np 4 myprogram >> outfile.dat

The following table describes three mprun command-line options that provide the same control over standard I/O as some -I constructs, but are much simpler to express. Their -I equivalents are also shown.

Table 3-5 mprun Shortcut Summary


Command	Description
mprun -i	Standard input to `mprun` is sent only to rank 0, rather than to all ranks. Equivalent to mprun -I "0rpb,1wp,2w=@1" a.out
mprun -B	Standard output and standard error are written to the files `out.jid.rank`. Equivalent to mprun -I "0r=/dev/null,1w=out.&J.&R,2w=@1" a.out
mprun -o	Use line buffering on standard output, prefixing each line with the rank of the process that wrote it. Equivalent to mprun -I "0rp,1wt,2w=@1" a.out

Note -

Specifying -o (which makes processes prefix their output lines with their rank) or the equivalent -I syntax (such as -I1wt) does not work if redirection is also specified with -I (such as -I1w=outfile). Use the standard shell redirection operator instead.

These shortcuts are not exact substitutions. Whether or not the -I option is present, the CRE decides whether to use ptys, and whether to merge standard error with standard output, according to how the streams are redirected. If either stderr or stdout is redirected (but not both), ptys are not used and stderr and stdout are separated. If both stderr and stdout are redirected, ptys are still not used, but stderr and stdout are combined.

Caution Regarding the Use of `-i` Option

Use the -i option to mprun with caution, since it provides only one stdin connection (to rank 0). If that connection is closed, keyboard signals are no longer forwarded to the job's remote processes; to signal the job, you must go to another window and issue the mpkill command. For example, if you issue the command mprun -np 2 -i cat and then type Ctrl-d (which causes cat to close its stdin and exit), rank 0 exits. However, rank 1 is still running and can no longer be signaled from the keyboard.

Changing the Working Directory

Use the -C option to specify the path of an alternative working directory to be used by the program. If you don't specify -C, the default is the current working directory. For example,

% mprun -C /home/collins/bin a.out

changes the working directory for a.out to /home/collins/bin.

Executing with a Different User or Group Name

Use the -U option to execute with the specified user ID or user name. For example,

% mprun -U traveler a.out

executes a.out as the user traveler.

Use the -G option to execute with the specified group ID or group name.

% mprun -G qa-team a.out

executes a.out as the group qa-team.

You must have the appropriate level of permissions to use these options. For example, you must belong to the group you specify, or be the superuser.

Getting Information

Use the -h option to display a list of mprun options and their meanings.

Use the -V option to display the command's version number.

If you specify either -h or -V, it must be the only option on the command line.

Use the -J option to display the program's jid, along with the name of the cluster and the number of processes, after executing mprun.

Specifying a Different Argument Vector

By default, mprun passes the vector of a program's command-line arguments to the program in the standard way. For example, if you issue the command

% mprun a.out arg1 arg2

mprun passes an array in which the name of the program, a.out, is the first element (argv[0]), and arg1 and arg2 are the second and third elements.

In cluster-level programming, it is sometimes useful to specify an argv[0] that is not the name of the program. You can use the -A option to do this. The argument to -A is the name of the program to be executed. You can then follow this with an argument of your choice in the arg0 position. For example, if you want to pass newarg as the argv[0] to the program a.out, along with arg1 and arg2, you could issue the command

% mprun -A a.out newarg arg1 arg2
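From the program's point of view, -A decouples the file that is executed from the string it receives as argv[0]. POSIX execv(2) already takes these as separate arguments, as the C sketch below shows. `run_with_argv0` is a hypothetical helper, not part of Sun HPC software.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the executable at `path` with the argument vector `argv`, whose
   first element need not be the program's name (compare mprun -A).
   Returns the child's exit status, or -1 on failure. */
int run_with_argv0(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);   /* path selects the file; argv[0] is free text */
        _exit(127);          /* reached only if execv fails */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, running /bin/sh with the vector { "newarg", "-c", "exit 3", NULL } executes the shell even though its argv[0] is newarg, and the helper returns 3.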

Exit Status

The exit status of mprun specifies the number of processes that exited with nonzero exit status.
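The counting rule can be illustrated with local child processes standing in for a job's ranks. In the hypothetical C sketch below, each child exits with a given code and the function returns how many codes were nonzero, which is the value mprun would report. `count_failed_ranks` is not a CRE interface, and counting a child killed by a signal as a failure is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts one child per entry in `codes`, each exiting with that code,
   and returns how many did not exit with status 0. Children killed by
   a signal also count as failures (an assumption). -1 on error. */
int count_failed_ranks(const int *codes, size_t n)
{
    int failed = 0;

    for (size_t i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(codes[i]);            /* this "rank" just exits */
    }
    for (size_t i = 0; i < n; i++) {
        int status;
        if (wait(&status) < 0)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return failed;
}
```

A four-process job whose ranks exit with 0, 1, 0, and 7 therefore gives mprun an exit status of 2.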

Omitting `mprun`

You can execute a serial program without using mprun. For example, you could simply type

% a.out

In that case, the program executes locally, on the node where you are logged in. By doing this, however, you give up the benefits of load-balancing provided by the CRE.

Note -

You cannot run Sun MPI programs in this way; you must use mprun.

Sending a Signal to a Process

The mpkill command is comparable to the Solaris kill command. You use it to terminate all processes of the jobs with the specified job IDs running on the Sun HPC cluster, or to send a signal to those processes.

You can send any standard Solaris signal. Use the -l option to obtain a list of the supported signals, or the -d option to list them along with brief descriptions.

Specify the signal's name or number, followed by the job ID, to send that signal to the job. For example,

% mpkill -CONT 59

sends a SIGCONT to the processes that constitute job 59.

Issuing mpkill without specifying a signal sends a SIGTERM to the job.

To find out a job's job ID, use the command mpps or the -J option to mprun.

mpkill returns the following status values:

0 - The command executed successfully.

1 - An error was encountered during execution. For example, the job was not known.

2 - The command was partially successful. This typically occurs when you send a signal to a job in which one or more of the processes has already exited and therefore could not receive the signal.

Note that this is usually not an error, since you are most likely using mpkill to eliminate a job that has hung in this intermediate state.
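The three status values follow from whether the signal reached all, some, or none of a job's processes. The C sketch below models that rule with POSIX kill(2) on local process IDs. It is a hypothetical illustration, not mpkill's implementation: `signal_job` is an invented name, and mapping "no process reached" or an empty job to status 1 is an assumption. Passing signal 0 performs only kill's existence check, which makes the model easy to exercise safely.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends `sig` to every process of a (hypothetical) job and returns a
   status in the style of mpkill: 0 if all processes were signaled,
   2 if only some were (for example, some had already exited), and
   1 if none were or the job has no processes. */
int signal_job(const pid_t *pids, size_t n, int sig)
{
    size_t delivered = 0;

    if (n == 0)
        return 1;
    for (size_t i = 0; i < n; i++)
        if (kill(pids[i], sig) == 0)
            delivered++;
    if (delivered == n)
        return 0;
    return delivered == 0 ? 1 : 2;
}
```

Calling `signal_job` with the job's process IDs and SIGCONT corresponds to mpkill -CONT; calling it with SIGTERM corresponds to mpkill with no signal specified.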
