Sun Studio 12 Update 1: Performance Analyzer

Collecting Data From MPI Programs

The Collector can collect performance data from multi-process programs that use the Message Passing Interface (MPI). The Collector supports Open MPI-based MPIs, including Sun HPC ClusterTools™ 7 and Sun HPC ClusterTools 8 software, as well as MPICH2-based MPIs, including MVAPICH2 and Intel MPI. The ClusterTools MPI software is available at http://www.sun.com/software/products/clustertools .

For information about MPI and the MPI standard, see the MPI web site http://www.mcs.anl.gov/mpi/ . For more information about Open MPI, see the web site http://www.open-mpi.org/ .

To collect data from MPI jobs, you must use the collect command; the dbx collector subcommands cannot be used to start MPI data collection. Details are provided in Running the collect Command for MPI.

Running the collect Command for MPI

The collect command can be used to trace and profile MPI applications.

To collect data, use the following syntax:


collect [collect-arguments] mpirun [mpirun-arguments] -- program-name [program-arguments]

For example, the following command runs MPI tracing and profiling on each of the 16 MPI processes, storing the data in a single MPI experiment:


collect -M CT8.2 mpirun -np 16 -- a.out 3 5

The initial collect process rewrites the mpirun command so that the collect command, with appropriate arguments, is run on each of the individual MPI processes.

The -- argument immediately before program-name is required for MPI profiling. If you do not include the -- argument, the collect command displays an error message and no experiment is collected.


Note –

The technique of using the mpirun command to spawn explicit collect commands on the MPI processes is no longer supported for collecting MPI trace data. You can still use this technique for collecting other types of data.


Storing MPI Experiments

Because multiprocessing environments can be complex, you should be aware of several issues that arise when storing MPI experiments as you collect performance data from MPI programs. These issues concern the efficiency of data collection and storage, and the naming of experiments. See Where the Data Is Stored for information on naming experiments, including MPI experiments.

Each MPI process that collects performance data creates its own subexperiment. While an MPI process creates an experiment, it locks the experiment directory; all other MPI processes must wait until the lock is released before they can use the directory. Store your experiments on a file system that is accessible to all MPI processes.

If you do not specify an experiment name, the default experiment name is used. Within the experiment, the Collector creates one subexperiment for each MPI process, named by MPI rank in the form M_rm.er, where m is the MPI rank. For example, the subexperiment for rank 0 is named M_r0.er.
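
For example, assuming that /shared/expts is a directory accessible from all nodes (the directory and experiment name shown here are illustrative only), the following command stores the data for a 16-process job in /shared/expts/mpitest.1.er, which contains the subexperiments M_r0.er through M_r15.er:


collect -M CT8.2 -d /shared/expts -o mpitest.1.er mpirun -np 16 -- a.out 3 5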

If you plan to move the experiment to a different location after it is complete, then specify the -A copy option with the collect command. To copy or move the experiment, do not use the UNIX® cp or mv command; instead, use the er_cp or er_mv command as described in Chapter 9, Manipulating Experiments.
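
For example, the following commands (the archive destination is illustrative only; test.1.er is the default experiment name) collect the experiment with the -A copy option and then move it with er_mv:


collect -M CT8.2 -A copy mpirun -np 16 -- a.out 3 5
er_mv test.1.er /archive/mpi-runs/test.1.er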

MPI tracing creates temporary files in /tmp/a.*.z on each node. These files are removed during the MPI_Finalize() function call. Make sure that the file systems have enough space for the experiments. Before collecting data on a long-running MPI application, do a short-duration trial run to verify file sizes. Also see Estimating Storage Requirements for information on how to estimate the space needed.
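
For example, after a short trial run you might check the free space in /tmp on each node and the size of the trial experiment with standard utilities (the experiment name is illustrative only):


df -k /tmp
du -sk test.1.er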

MPI profiling is based on the open source VampirTrace 5.5.3 release. It recognizes several of the supported VampirTrace environment variables, as well as a new one, VT_STACKS, which controls whether call stacks are recorded in the data. For further information on the meaning of these variables, see the VampirTrace 5.5.3 documentation.

The default values of the environment variables VT_BUFFER_SIZE and VT_MAX_FLUSHES limit the internal buffer of the MPI API trace collector to 64 Mbytes and the number of times that the buffer is flushed to 1, respectively. After the limit has been reached for a particular MPI process, events are no longer written into the trace file for that process. The result can be an incomplete experiment, and in some cases, the experiment might not be readable.

To remove the limit and get a complete trace of an application, set VT_MAX_FLUSHES to 0. This setting causes the MPI API trace collector to flush the buffer to disk whenever the buffer is full. To change the size of the buffer, use the environment variable VT_BUFFER_SIZE. The optimal value for this variable depends on the application being traced. Setting a small value increases the memory available to the application but triggers frequent buffer flushes by the MPI API trace collector; these flushes can significantly change the behavior of the application. Conversely, setting a large value, such as 2 Gbytes, minimizes buffer flushes but decreases the memory available to the application. If there is not enough memory to hold both the buffer and the application data, parts of the application might be swapped to disk, which also significantly changes the behavior of the application.
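
For example, the following Bourne-shell commands remove the flush limit and set the buffer to 256 Mbytes before starting a run (the buffer size shown is illustrative only; choose a value that suits your application and the available memory):


VT_BUFFER_SIZE=256M
VT_MAX_FLUSHES=0
export VT_BUFFER_SIZE VT_MAX_FLUSHES
collect -M CT8.2 mpirun -np 16 -- a.out 3 5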

Another important variable is VT_VERBOSE, which turns on various error and status messages. Set this variable to 2 or higher if problems arise.
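
For example, in the Bourne shell:


VT_VERBOSE=2; export VT_VERBOSE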


Note –

If you copy or move experiments between computers or nodes, you cannot view the annotated source code or source lines in the annotated disassembly code unless you have access to the source files or a copy with the same timestamp. You can put a symbolic link to the original source file in the current directory to see the annotated source. You can also use settings in the Set Data Presentation dialog box: the Search Path tab (see Search Path Tab) lets you manage a list of directories to be used for searching for source files, and the Pathmaps tab (see Pathmaps Tab) enables you to map the leading part of a file path from one location to another.
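
For example, if the source file for the experiment resides on another node, a symbolic link such as the following (the paths are illustrative only) makes the annotated source visible when you run the Analyzer from the current directory:


ln -s /net/othernode/work/src/a.c ./a.c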