Sun Studio 12: Performance Analyzer

Collecting Data From MPI Programs

The Collector can collect performance data from multi-process programs that use the Message Passing Interface (MPI) library. The Open MPI library is included in the Sun HPC ClusterTools™ 7 software, and the Sun MPI library is included in ClusterTools 5 and ClusterTools 6 software. The ClusterTools software is available at http://www.sun.com/software/products/clustertools/.

To start the parallel jobs when using ClusterTools 7, use the Open Run-Time Environment (ORTE) command mpirun.

To start parallel jobs when using ClusterTools 5 or ClusterTools 6, use the Sun Cluster Runtime Environment (CRE) command mprun.

See the Sun HPC ClusterTools documentation for more information.

For information about MPI and the MPI standard, see the MPI web site http://www.mcs.anl.gov/mpi/. For more information about Open MPI, see the web site http://www.open-mpi.org/.

Because of the way MPI and the Collector are implemented, each MPI process records a separate experiment. Each experiment must have a unique name. Where and how the experiment is stored depends on the kinds of file systems that are available to your MPI job. Issues about storing experiments are discussed in the next section, Storing MPI Experiments.

To collect data from MPI jobs, you can either run the collect command under MPI or start dbx under MPI and use the dbx collector subcommands. These tasks are discussed in Running the collect Command Under MPI and Collecting Data by Starting dbx Under MPI.

Storing MPI Experiments

Because multiprocessing environments can be complex, you should be aware of some issues about storing MPI experiments when you collect performance data from MPI programs. These issues concern the efficiency of data collection and storage, and the naming of experiments. See Where the Data Is Stored for information on naming experiments, including MPI experiments.

Each MPI process that collects performance data creates its own experiment. When an MPI process creates an experiment, it locks the experiment directory. All other MPI processes must wait until the lock is released before they can use the directory. Thus, if you store the experiments on a file system that is accessible to all MPI processes, the experiments are created sequentially, but if you store the experiments on file systems that are local to each MPI process, the experiments are created concurrently.

You can avoid issues with experiment names and storage location by allowing the Collector to create the experiment names. See the following section, Default MPI Experiment Names.

Default MPI Experiment Names

If you do not specify an experiment name, the default experiment name is used. The Collector uses the MPI rank to construct an experiment name with the standard form experiment.m.er, where m is the MPI rank. The stem, experiment, is the stem of the experiment group name if you specify an experiment group; otherwise it is test. The experiment names are unique, regardless of whether you use a common file system or a local file system. Thus, if you use a local file system to record the experiments and later copy them to a common file system, you do not have to rename the experiments when you copy them, nor reconstruct any experiment group file. In most cases, you should allow the Collector to create the experiment names to ensure unique names across all file systems.
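
For example, with the default naming, a four-process run recorded with the collect command (the program name a.out is a placeholder) produces one experiment per MPI rank:


% mpirun -np 4 collect a.out

The resulting experiments are named test.0.er, test.1.er, test.2.er, and test.3.er. If the same run is recorded into an experiment group named, say, run1.erg, the stem run1 is used instead of test, giving run1.0.er through run1.3.er.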

Specifying Non-Default MPI Experiment Names

If you store the experiments on a common file system and specify an experiment name in the standard format, experiment.n.er, each experiment is given a unique name because the value of n is incremented for each experiment. Experiments are numbered according to the order in which the MPI processes obtain the lock on the experiment directory, and the numbering is not guaranteed to correspond to the MPI rank of the process. If you attach dbx to MPI processes in a running MPI job, experiment numbering is determined by the order of attachment.
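
For example, the following hypothetical command line, run so that the experiments are written to a directory on a common file system (a.out is a placeholder, and the collect -o option is used here to supply the experiment name), records experiments named myrun.1.er, myrun.2.er, and so on, numbered in the order in which the processes obtain the lock rather than by MPI rank:


% mpirun -np 4 collect -o myrun.1.er a.out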

If you store each experiment on its own local file system and specify an explicit experiment name, each experiment might receive that same name. For example, suppose you ran an MPI job across a cluster with four single-processor nodes labelled node0, node1, node2 and node3. Each node has a local disk called /scratch, and you store the experiments in directory username on this disk. The experiments created by the MPI job have the following full path names.


node0:/scratch/username/test.1.er
node1:/scratch/username/test.1.er
node2:/scratch/username/test.1.er
node3:/scratch/username/test.1.er

The full name including the node name is unique, but in each experiment directory there is an experiment named test.1.er. If you move the experiments to a common location after the MPI job is completed, you must make sure that the names remain unique. For example, to move these experiments to your home directory, which is assumed to be accessible from all nodes, and rename the experiments, type the following commands.


rsh node0 'er_mv /scratch/username/test.1.er test.0.er'
rsh node1 'er_mv /scratch/username/test.1.er test.1.er'
rsh node2 'er_mv /scratch/username/test.1.er test.2.er'
rsh node3 'er_mv /scratch/username/test.1.er test.3.er'

For large MPI jobs, you might want to move the experiments to a common location using a script. Do not use the UNIX® commands cp or mv; use er_cp or er_mv as shown in the example above, and described in Manipulating Experiments.
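
A minimal sketch of such a script, assuming the four-node layout above and a home directory that is accessible from the node where the script runs, might look like the following.


#!/bin/sh
# Move each node's local experiment to the home directory,
# renaming each one so that the experiment names remain unique.
n=0
for node in node0 node1 node2 node3; do
    rsh $node "er_mv /scratch/username/test.1.er $HOME/test.$n.er"
    n=`expr $n + 1`
done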

If you do not know which local file systems are available to you, use the df -lk command or ask your system administrator. Always make sure that the experiments are stored in a directory that already exists, that is uniquely defined and that is not in use for any other experiment. Also make sure that the file system has enough space for the experiments. See Estimating Storage Requirements for information on how to estimate the space needed.


Note –

If you copy or move experiments between computers or nodes, you cannot view the annotated source code or source lines in the annotated disassembly code unless you have access to the load objects and source files that were used to run the experiment, or to a copy with the same path and timestamp.


Running the collect Command Under MPI

To collect data with the collect command under the control of MPI, use the following syntax, as appropriate for your version of ClusterTools.

On Sun HPC ClusterTools 7:


% mpirun -np n collect [collect-arguments] program-name [program-arguments]

On Sun HPC ClusterTools 6 and earlier:


% mprun -np n collect [collect-arguments] program-name [program-arguments]

In each case, n is the number of processes to be created by MPI. This procedure creates n separate instances of collect, each of which records an experiment. Read the section Where the Data Is Stored for information on where and how to store the experiments.

To ensure that the sets of experiments from different MPI runs are stored separately, you can create an experiment group with the -g option for each MPI run. The experiment group should be stored on a file system that is accessible to all MPI processes. Creating an experiment group also makes it easier to load the set of experiments for a single MPI run into the Performance Analyzer. An alternative to creating a group is to specify a separate directory for each MPI run with the -d option.
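
For example, either of the following hypothetical command lines keeps the experiments from one MPI run together; the program name a.out, the group name run1.erg, and the directory path are placeholders, and the directory named with -d must already exist.


% mpirun -np 16 collect -g run1.erg a.out
% mpirun -np 16 collect -d /export/work/username/run2 a.out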

Collecting Data by Starting dbx Under MPI

To start dbx and collect data under the control of MPI, use the following syntax.

On Sun HPC ClusterTools 7:


% mpirun -np n dbx program-name < collection-script

On Sun HPC ClusterTools 6 or earlier:


% mprun -np n dbx program-name < collection-script

In each case, n is the number of processes to be created by MPI and collection-script is a dbx script that contains the commands necessary to set up and start data collection. This procedure creates n separate instances of dbx, each of which records an experiment on one of the MPI processes. If you do not define the experiment name, the experiment is labelled with the MPI rank. Read the section Storing MPI Experiments for information on where and how to store the experiments.

You can name the experiments with the MPI rank by using the collection script and a call to MPI_Comm_rank() in your program. For example, in a C program you would insert the following line.


ier = MPI_Comm_rank(MPI_COMM_WORLD,&me);

In a Fortran program you would insert the following line.


call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)

If this call was inserted at line 17, for example, you could use a script like this.


stop at 18
run program-arguments
rank=$[me]
collector enable
collector store filename experiment.$rank.er
cont
quit
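
Putting the pieces together, if the script above is saved in a file named, for example, collection.dbx (a hypothetical name) and your program is a.out, a complete ClusterTools 7 invocation might look like the following; use mprun instead of mpirun on ClusterTools 6 or earlier.


% mpirun -np 4 dbx a.out < collection.dbx

This records one experiment per MPI process, named experiment.0.er through experiment.3.er according to the MPI rank.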