Storing MPI Experiments (Sun Studio 12: Performance Analyzer)

Sun Studio 12: Performance Analyzer

Storing MPI Experiments

Because multiprocessing environments can be complex, you should be aware of some issues about storing MPI experiments when you collect performance data from MPI programs. These issues concern the efficiency of data collection and storage, and the naming of experiments. See Where the Data Is Stored for information on naming experiments, including MPI experiments.

Each MPI process that collects performance data creates its own experiment. When an MPI process creates an experiment, it locks the experiment directory. All other MPI processes must wait until the lock is released before they can use the directory. Thus, if you store the experiments on a file system that is accessible to all MPI processes, the experiments are created sequentially, but if you store the experiments on file systems that are local to each MPI process, the experiments are created concurrently.

Issues with experiment names and storage location can be avoided if you allow the Collector to create the experiment names. See the following section Default MPI Experiment Names.

Default MPI Experiment Names

If you do not specify an experiment name, the default experiment name is used. The Collector uses the MPI rank to construct an experiment name with the standard form experiment.m.er, where m is the MPI rank. The stem, experiment, is the stem of the experiment group name if you specify an experiment group, otherwise it is test. The experiment names are unique, regardless of whether you use a common file system or a local file system. Thus, if you use a local file system to record the experiments and copy them to a common file system, you do not have to rename the experiments when you copy them and reconstruct any experiment group file. In most cases, you should allow the Collector to create the experiment names to ensure unique names across all file systems.

Specifying Non-Default MPI Experiment Names

If you store the experiments on a common file system and specify an experiment name in the standard format, experiment. n.er, each experiment is given a unique name when the value of n is incremented for each experiment. Experiments are numbered according to the order in which MPI processes obtain the lock on the experiment directory, and cannot be guaranteed to correspond to the MPI rank of the process. If you attach dbx to MPI processes in a running MPI job, experiment numbering is determined by the order of attachment.

If you store each experiment on its own local file system and specify an explicit experiment name, each experiment might receive that same name. For example, suppose you ran an MPI job across a cluster with four single-processor nodes labelled node0, node1, node2 and node3. Each node has a local disk called /scratch, and you store the experiments in directory username on this disk. The experiments created by the MPI job have the following full path names.

node0:/scratch/username/test.1.er
node1:/scratch/username/test.1.er
node2:/scratch/username/test.1.er
node3:/scratch/username/test.1.er

The full name including the node name is unique, but in each experiment directory there is an experiment named test.1.er. If you move the experiments to a common location after the MPI job is completed, you must make sure that the names remain unique. For example, to move these experiments to your home directory, which is assumed to be accessible from all nodes, and rename the experiments, type the following commands.

rsh node0 ’er_mv /scratch/username/test.1.er test.0.er’
rsh node1 ’er_mv /scratch/username/test.1.er test.1.er’
rsh node2 ’er_mv /scratch/username/test.1.er test.2.er’
rsh node3 ’er_mv /scratch/username/test.1.er test.3.er’

For large MPI jobs, you might want to move the experiments to a common location using a script. Do not use the UNIX® commands cp or mv; use er_cp or er_mv as shown in the example above, and described in Manipulating Experiments.

If you do not know which local file systems are available to you, use the df -lk command or ask your system administrator. Always make sure that the experiments are stored in a directory that already exists, that is uniquely defined and that is not in use for any other experiment. Also make sure that the file system has enough space for the experiments. See Estimating Storage Requirements for information on how to estimate the space needed.

Note –

If you copy or move experiments between computers or nodes you cannot view the annotated source code or source lines in the annotated disassembly code unless you have access to the load objects and source files that were used to run the experiment, or a copy with the same path and timestamp.