The -Z option causes the CRE to organize a job's processes into subsets of a specified size and to group all processes in a subset on the same node. You specify the subset size with a numerical argument to -Z. For example,
% mprun -Z 3 -np 8 a.out
groups the job's processes by threes. These groups may be distributed onto different nodes, but there is no guarantee that they will be; two or more groups may be started on the same node.
The -Z option is incompatible with the -S and -W options.
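The grouping that -Z requests can be pictured with a short Python sketch. It is an illustration only, not CRE code: the function name group_ranks is hypothetical, and both the use of consecutive ranks and the smaller final group are assumptions, since this section does not say how the CRE handles a process count that is not a multiple of the subset size.

```python
def group_ranks(np_value, group_size):
    """Split ranks 0..np_value-1 into consecutive groups of at most group_size.

    Assumption: leftover ranks form one smaller final group. The CRE's real
    behavior for a remainder is not described in this section.
    """
    if np_value < 1 or group_size < 1:
        raise ValueError("np_value and group_size must be positive")
    ranks = list(range(np_value))
    return [ranks[i:i + group_size] for i in range(0, np_value, group_size)]


# Mirrors "mprun -Z 3 -np 8 a.out": eight processes grouped by threes.
for index, group in enumerate(group_ranks(8, 3)):
    print(f"group {index}: ranks {group}")
```

Each group produced this way would be kept together on one node; which node each group lands on is left to the CRE.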
You can construct an RRS expression (see "Expressing More Complex Resource Requirements") that causes mprun to distribute a specified number of processes (MPI ranks) to a set of nodes in a specified order. The RRS expression assigns to each node in the set a single-character alias preceded by a number, which together make up a sequence of count/alias pairs. For example:
"[2a2b2c2d]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2 & d.name=hpc-node3"
The number that precedes a node's alias tells the CRE how many processes to start on that node. In this example, the CRE starts two processes on each of the nodes defined by the aliases a, b, c, and d. This number can be different for each node, but it must not exceed the number of CPUs on that node.
The CRE distributes processes to the nodes in the order in which they are listed in the RRS expression, starting the rank 0 process on the first node in the list. Once the prescribed number of processes has been started on the first node, the CRE moves to the second node and then to subsequent nodes, starting the specified number of processes on each node in turn. An alias cannot be repeated in the sequence, but one node can be defined with more than one alias.
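The ordering rule described above can be sketched in Python. This is a minimal illustration, not the CRE's parser: the function names parse_sequence, parse_bindings, and map_ranks are hypothetical, and the sketch understands only the alias.name=value form used in this section's examples, not the full RRS syntax.

```python
import re


def parse_sequence(expr):
    """Return [(count, alias), ...] from the bracketed count/alias sequence."""
    match = re.match(r"\s*\[([^\]]*)\]", expr)
    if match is None:
        raise ValueError("no bracketed count/alias sequence found")
    body = match.group(1)
    pairs = re.findall(r"(\d+)([A-Za-z])", body)
    # Reject anything the pair pattern did not consume, e.g. an alias with no count.
    if "".join(count + alias for count, alias in pairs) != body:
        raise ValueError(f"malformed sequence: {body!r}")
    return [(int(count), alias.lower()) for count, alias in pairs]


def parse_bindings(expr):
    """Return {alias: node_name} for every 'x.name=node' term in the expression."""
    found = re.findall(r"\b([A-Za-z])\.name\s*=\s*([\w.-]+)", expr)
    return {alias.lower(): node for alias, node in found}


def map_ranks(expr):
    """Return a list whose index is the MPI rank and whose value is the node name."""
    nodes = parse_bindings(expr)
    placement = []
    for count, alias in parse_sequence(expr):
        # Fill the current node with its full count before moving to the next one.
        placement.extend([nodes[alias]] * count)
    return placement


expr = ("[2a2b2c2d]:a.name=hpc-node0 & b.name=hpc-node1 & "
        "c.name=hpc-node2 & d.name=hpc-node3")
for rank, node in enumerate(map_ranks(expr)):
    print(f"r{rank}:{node}")
```

For the example expression above, the sketch places ranks 0 and 1 on hpc-node0, ranks 2 and 3 on hpc-node1, and so on through hpc-node3.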
The RRS rank-mapping expression must satisfy the following conditions:
Up to 26 node aliases can be defined; aliases are not case-sensitive. Every node alias must be preceded by a number, which may have more than one digit.
The number of processes assigned to a given node cannot be greater than the number of CPUs on that node.
The -np value cannot be greater than the total number of processes allocated by the RRS expression. You cannot use the -W option to get around this restriction by wrapping the processes.
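The conditions above lend themselves to a pre-flight check before a job is submitted. The following Python sketch is one way to express them. It is not part of the CRE: check_rank_map is a hypothetical helper, and the cpus_per_node table is an assumed input that the caller would have to supply.

```python
import re
from collections import Counter


def check_rank_map(sequence, alias_to_node, cpus_per_node, np_value):
    """Return a list of violation messages; an empty list means none were found.

    sequence      -- the bracket contents, e.g. "2a2b2c2d"
    alias_to_node -- dict such as {"a": "hpc-node0"}
    cpus_per_node -- dict such as {"hpc-node0": 4} (assumed to be known)
    np_value      -- the value that would be passed to -np
    """
    problems = []
    # Tokenize as optional-count + letter so a missing count can be reported.
    tokens = re.findall(r"(\d*)([A-Za-z])", sequence)
    if "".join(count + alias for count, alias in tokens) != sequence:
        problems.append("sequence contains text other than count/alias pairs")

    # Aliases are single letters and not case-sensitive, so at most 26 exist.
    seen = Counter(alias.lower() for _, alias in tokens)
    for alias, times in seen.items():
        if times > 1:
            problems.append(f"alias {alias!r} is repeated in the sequence")

    per_node = Counter()
    for count, alias in tokens:
        if count == "":
            problems.append(f"alias {alias!r} is not preceded by a number")
        node = alias_to_node.get(alias.lower())
        if node is None:
            problems.append(f"alias {alias!r} is not bound to a node")
            continue
        # One node may carry several aliases, so totals are kept per node.
        per_node[node] += int(count or 0)

    for node, total in per_node.items():
        if total > cpus_per_node.get(node, 0):
            problems.append(f"{node}: {total} processes exceed its CPU count")

    allocated = sum(int(count or 0) for count, _ in tokens)
    if np_value > allocated:
        problems.append(f"-np {np_value} exceeds the {allocated} processes allocated")
    return problems


cpus = {"hpc-node0": 4, "hpc-node1": 4, "hpc-node2": 4, "hpc-node3": 4}
names = {"a": "hpc-node0", "b": "hpc-node1", "c": "hpc-node2", "d": "hpc-node3"}
print(check_rank_map("2a2b2c2d", names, cpus, 8))
```

The sketch reports every violation it finds. mprun itself, as the examples in this section show, returns a single general message when the expression cannot be satisfied.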
The following example shows this technique being applied on a 4x4 partition. Two processes are started on each of four four-CPU nodes.
% mprun -o -np 8 -R "[2a2b2c2d]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2 & d.name=hpc-node3" uname -n
r0:hpc-node0
r1:hpc-node0
r2:hpc-node1
r3:hpc-node1
r4:hpc-node2
r5:hpc-node2
r6:hpc-node3
r7:hpc-node3
The -o option prefixes each output line with the MPI rank of the process that writes it. Two CPUs on each node do not participate in this job.
The next example shows different numbers of processes being allocated to each node. One process is started on the first node, two on the second, and so forth.
% mprun -o -np 10 -R "[1a2b3c4d]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2 & d.name=hpc-node3" uname -n
r0:hpc-node0
r1:hpc-node1
r2:hpc-node1
r3:hpc-node2
r4:hpc-node2
r5:hpc-node2
r6:hpc-node3
r7:hpc-node3
r8:hpc-node3
r9:hpc-node3
The following example shows the error message that is returned when the number of processes assigned to a node exceeds the number of CPUs on that node.
% mprun -o -np 6 -R "[2a1b3c]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node0" uname -n
mprun: no_mp_jobs: No nodes in partition satisfy RRS
In this case, the node hpc-node0 is aliased twice (as 2a and 3c) so that it can be repeated in the sequence. This use of multiple aliases is legal, but hpc-node0 has four CPUs and the total number of processes assigned by 2a and 3c is five, which violates the second condition listed above.
The next example shows what happens when an alias is not preceded by a number. In this case, the alias for hpc-node0 violates the first condition listed above.
% mprun -o -np 6 -R "[a2b3c]:a.name=hpc-node0 & b.name=hpc-node1 & c.name=hpc-node2" uname -n
mprun: no_mp_jobs: No nodes in partition satisfy RRS