Prism 6.0 User's Guide

Appendix C General Profiling Methodology, Timers, and Other Profiling Utilities

General Profiling Methodology and the Use of Timers

It is generally desirable to perform code profiling and tuning on "stripped-down" runs, so that many profiling experiments may be run. The utilities and techniques described in the sections below support this methodology.

Using Timers

A high-quality timer called gethrtime is available on UltraSPARC-based systems. From C, your code can call it using:

#include <sys/time.h>
hrtime_t gethrtime(void);

This timer returns the time in nanoseconds since some arbitrary point in the past (such as system power-on). This time is well defined within any one node, but its origin varies considerably from node to node. Consult the gethrtime man page for details. From Sun Fortran 77, you can write

      INTEGER*8 GETHRTIME
!$PRAGMA C(GETHRTIME)
      DOUBLE PRECISION TSECONDS
      TSECONDS = 1.D-9 * GETHRTIME()

which converts nanoseconds to seconds.

The overhead and resolution for gethrtime from user code are usually better than one microsecond.
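For example, you can bracket a region of interest with two gethrtime calls and convert the difference to seconds. The following is a minimal sketch; the loop is only a stand-in for the work you want to time:

#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    hrtime_t start, stop;
    double seconds;
    volatile double x = 0.0;
    int i;

    start = gethrtime();
    for (i = 0; i < 1000000; i++)     /* stand-in for the work being timed */
        x += 1.0;
    stop = gethrtime();

    /* hrtime_t counts nanoseconds; convert the difference to seconds */
    seconds = 1.e-9 * (double)(stop - start);
    printf("elapsed: %g seconds\n", seconds);
    return 0;
}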

MPI Profiling Interface

The MPI standard supports a profiling interface, which allows any user to profile either individual MPI calls or the entire library. This facility is provided by supporting two equivalent APIs for each MPI routine. One has the prefix MPI_, while the other has PMPI_. User codes typically call the MPI_ routines. A profiling routine or library will typically provide wrappers for the MPI_ APIs that simply call the PMPI_ ones, with timer calls around the PMPI_ call.
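For instance, a timing wrapper for MPI_Allreduce might look like the following C sketch. The accumulator variable is purely illustrative; a real profiling library would also report the accumulated time somewhere, for example from a wrapper around MPI_Finalize:

#include <mpi.h>

static double allreduce_seconds = 0.0;   /* time accumulated by this process */

/* Intercepts MPI_Allreduce and times the underlying PMPI_ call. */
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    double t0 = PMPI_Wtime();
    int rc = PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
    allreduce_seconds += PMPI_Wtime() - t0;
    return rc;
}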

More generally, you may use this interface to change the behavior of MPI routines without modifying your source code. For example, suppose you believe that most of the time spent in some collective call, such as MPI_Allreduce, is due to the synchronization of processes that is implicit in such a call. Then you might compile a wrapper such as the one shown below and link it into your code before -lmpi. The effect will be that the time attributed to MPI_Allreduce calls will be due exclusively to the all-reduce operation itself, with the synchronization costs attributed to barrier operations.

c     Synchronize first, so that MPI_Allreduce no longer absorbs
c     the cost of waiting on slower processes.
      subroutine MPI_Allreduce(x,y,n,type,op,comm,ier)
      integer x(*), y(*), n, type, op, comm, ier
      call PMPI_Barrier(comm,ier)
      call PMPI_Allreduce(x,y,n,type,op,comm,ier)
      end

Profiling wrappers or libraries may be used even with application binaries that have already been linked. See the Solaris man page ld.so.1 for more information on the environment variable LD_PRELOAD.
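For example, you might compile a wrapper such as one of those above into a shared library and preload it into an already-linked binary at run time. The file names below are hypothetical; mpcc is the Sun MPI compiler wrapper:

% mpcc -G -Kpic -o libmywrap.so mywrap.c
% env LD_PRELOAD=./libmywrap.so mprun -np 4 a.out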

Profiling libraries for use with Sun MPI are available from independent sources. Typically, their functionality is rather limited compared with that of Prism and TNF, but for certain applications they may be more convenient, or they may serve as useful springboards for customized profiling. An example of a profiling library is included in the multiprocessing environment (MPE) from Argonne National Laboratory. For more information on this library and on the MPI profiling interface, see the Sun MPI 4.0 Programming and Reference Guide.

Using gprof

The Solaris utility gprof may be used with multiprocess codes, such as those that use MPI. It can be helpful for profiling user routines, which are not automatically instrumented with TNF probes by Sun HPC ClusterTools 3.0 software.
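As a minimal single-process illustration (the file names are hypothetical; note that the Sun compilers use the -xpg option for gprof-style profiling, rather than the -pg option used by some other compilers):

% cc -xpg -o a.out myprog.c
% a.out
% gprof a.out gmon.out > profile.txt

For multiprocess runs, be aware that each process writes its profile data to gmon.out in its working directory, so processes that share a working directory can overwrite one another's output.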

For more information about gprof, see the gprof man page.

The tnfdump Utility

You can implement custom post-processing of TNF data using the tnfdump utility, which converts TNF trace files, such as the one produced by Prism, into an ASCII listing of timestamps, time differentials, events, and probe arguments.

To use this command, specify

% tnfdump filename 

where filename is the name of the TNF trace data file produced by Prism (not by prex).

The resulting ASCII listing can be several times larger than the trace file and may require a wide window for viewing. Nevertheless, it is full of valuable information.
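Because the listing is plain text, simple filters work well for custom post-processing. As a sketch (the program and file names here are hypothetical; inspect your own tnfdump output to decide what to search for), the following C program passes through only those lines that mention a given string, such as an MPI probe name:

#include <stdio.h>
#include <string.h>

/* Print only the lines of standard input that contain the given pattern. */
int main(int argc, char **argv)
{
    char line[4096];

    if (argc != 2) {
        fprintf(stderr, "usage: %s pattern < dumpfile\n", argv[0]);
        return 1;
    }
    while (fgets(line, sizeof(line), stdin) != NULL)
        if (strstr(line, argv[1]) != NULL)
            fputs(line, stdout);
    return 0;
}

You might compile and use it as follows:

% cc -o probegrep probegrep.c
% tnfdump a.tnf | probegrep MPI_Allreduce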

For more information about the tnfdump command, see the tnfdump(1) man page.

TNF Data Collection Without Prism

Prism invokes TNF utilities to perform data collection, so it is possible to profile MPI programs directly, without using Prism. Prism provides a number of ease-of-use facilities, such as labeling process timelines according to MPI rank and reconciling timestamps when a job is distributed over many nodes whose clocks are not synchronized with one another. On the other hand, Prism's own processes may affect profiling activity, so in certain cases it is desirable to bypass Prism during data collection.

The utility that performs TNF data collection directly is prex. To enable all probes, place the following commands in your .prexrc file. (Note the leading "." in the file name.)

enable $all
trace $all 
continue 

Then, remove old buffer files, run prex, and merge and view the data, as shown below.

Because prex does not correct for the effects of clock skew, it is useful only for MPI programs running on individual SMPs. Also, data collected by prex does not identify MPI ranks; if you attempt to display prex data in tnfview, the VIDs (ranks) will be displayed in random order.

Using prex With CRE

% rm /tmp/trace-*
% mprun -np 4 prex -s 128 a.out

Using prex With LSF

% rm /tmp/trace-*
% bsub -I -n 4 prex -s 128 a.out

In either case, merge and view the resulting trace data:

% /opt/SUNWhpc/bin/sparcv7/tnfmerge -o a.tnf /tmp/trace-*
% /opt/SUNWhpc/bin/sparcv7/tnfview a.tnf

For more information on prex, see its Solaris man page.