The following sections offer cautions and suggestions about using TNF probes to analyze the performance of your Sun MPI programs.
You can reuse TNF trace files. A few considerations:
TNF output files can be saved and viewed, but not updated.
You can redisplay TNF trace files. Name your trace files carefully so that you do not confuse trace data gathered in different sessions.
To display data from multiple TNF files, open multiple instances of tnfview.
Enable probes based on the characteristics of your source code. For example, if you are interested in the performance of a specific function in your code, and the routines that precede and follow that function are collective routines, enable the collective probes.
When examining a trace file from an MPI program in tnfview, look for events in the Timeline view where synchronization is poor, or where processes are idle. Look for places where sends, receives, or waits spend too much time idle. Create intervals of the start and end probes of blocking sends, receives, and waits, then generate a histogram and look for the taller columns.
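The interval-and-histogram approach described above can be sketched outside tnfview. The following is a minimal illustration, not part of Prism: the event names, the timestamps, and the (name, time) tuple layout are all hypothetical and do not represent the actual TNF file format.

```python
from collections import Counter

# Hypothetical (event name, timestamp in ms) pairs for one process.
# These values are made up for illustration; they are not real TNF output.
events = [
    ("MPI_Recv_Start", 0.0), ("MPI_Recv_End", 0.4),
    ("MPI_Recv_Start", 1.0), ("MPI_Recv_End", 3.6),
    ("MPI_Recv_Start", 4.0), ("MPI_Recv_End", 4.3),
]

def interval_durations(events, start_name, end_name):
    """Pair each start event with the next end event and return the durations."""
    durations = []
    start_time = None
    for name, timestamp in events:
        if name == start_name:
            start_time = timestamp
        elif name == end_name and start_time is not None:
            durations.append(timestamp - start_time)
            start_time = None
    return durations

def histogram(durations, bucket_width):
    """Count durations per bucket; the key is the bucket index."""
    return Counter(int(d // bucket_width) for d in durations)

durations = interval_durations(events, "MPI_Recv_Start", "MPI_Recv_End")
hist = histogram(durations, bucket_width=1.0)
```

With this sample data, two receives fall in the 0 to 1 ms bucket and one falls in the 2 to 3 ms bucket; the long one is the call worth inspecting in the Timeline view.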
In many, if not all, programs, enabling only probes on point-to-point routines and collectives will provide enough information to initiate performance analysis.
When collecting TNF data, Prism creates a trace file for every process. Using the optional size argument, you can specify the maximum size (in kilobytes) of the output trace files used by each process. The default size is 128 Kbytes. The output trace files are limited in size. Once a file has been filled, more recent trace events overwrite the oldest ones. The following tnffile command example requests a trace file of 8192 Kbytes (8 Mbytes):
(prism all) tnffile myfile.tnf 8192
Since the TNF trace data buffer is limited in size, beware of allowing the trace data from the probes you are interested in to be overwritten by trace data from subsequent probes. For example, data from interesting events may be lost if those events occurred just prior to an area of your code that generates a lot of probe data. To reduce the chance that your probe data buffers are overwhelmed by especially busy sections of your code, use the tnfcollection command as an event action specifier (as described in "Collecting Performance Data") to focus attention on the most interesting routines.
You can also set the optional tnffile size argument to as large a value as your /usr/tmp allows. By enlarging the trace data buffers with this command, you reduce the probability that interesting data will be overwritten.
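The overwrite behavior is easy to picture with a small model. The sketch below uses a bounded Python deque as a stand-in for a fixed-size trace file; the capacity of four events is an arbitrary assumption chosen to keep the example short, and real trace files are sized in kilobytes, not events.

```python
from collections import deque

# A bounded deque models a trace file that holds only 4 events (assumed size).
trace_buffer = deque(maxlen=4)

# Record six events; once the buffer is full, each new event
# silently pushes out the oldest one.
for event_id in range(1, 7):
    trace_buffer.append(event_id)

retained = list(trace_buffer)  # events 1 and 2 have been overwritten
```

If events 1 and 2 were the interesting ones, they are gone by the time the run ends, which is why limiting collection to the routines of interest or enlarging the buffer matters.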
You may change the timing characteristics of your program by adding probes (even when those probes are disabled). This can be especially significant when your code includes loops that contain MPI calls.
Changing which probes you have enabled or disabled also changes the timing of your program. Perturbations can be especially significant when probing MPI routines that have very fine-grained communications.
The operating overhead incurred when collecting, processing, and viewing performance analysis trace data has effects on both storage and time.
The volume of trace data can exceed the storage capacity of the target directory. It may be important to monitor the capacity of /usr/tmp (or an alternative directory, if you have specified one) to avoid encountering capacity limits.
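One way to check capacity before a run is to compare the worst-case trace volume (one file per process, each up to the tnffile size) against the free space in the target directory. The following is a rough sketch, not a Prism feature: the helper names are hypothetical, a Kbyte is assumed to be 1024 bytes, and the system temporary directory stands in for /usr/tmp.

```python
import shutil
import tempfile

def required_kbytes(num_processes, size_kbytes):
    """Worst-case trace volume: one file per process, each at its size limit."""
    return num_processes * size_kbytes

def trace_fits(directory, num_processes, size_kbytes):
    """Return True if the worst-case trace volume fits in the free space."""
    free_kbytes = shutil.disk_usage(directory).free // 1024
    return required_kbytes(num_processes, size_kbytes) <= free_kbytes

# Stand-in for /usr/tmp so the sketch runs on any system.
trace_dir = tempfile.gettempdir()

# 16 processes with 8192-Kbyte trace files need 131072 Kbytes (128 Mbytes).
needed = required_kbytes(16, 8192)
enough_room = trace_fits(trace_dir, 16, 8192)
```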
The activity of generating probe records slows performance by a predictable amount. Assuming that you run TNF-instrumented code, compiled by version 4.2 compilers, on a 167 MHz SPARC, the operating overhead introduced by TNF probes is shown below:

Table 6-8 Operating Overhead Introduced by TNF Probes

Probe Status | SPARC Instructions | Time (in nanoseconds) |
---|---|---|
Disabled | 5 | 12 |
Enabled | 24 | 27 |
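The per-probe figures in Table 6-8 can be turned into a rough estimate of total overhead. The sketch below simply multiplies probe hits by the table values; the hit counts are hypothetical, and the result applies only to the hardware and compiler versions stated for the table.

```python
# Per-probe costs copied from Table 6-8 (167 MHz SPARC, version 4.2 compilers).
NS_PER_DISABLED_PROBE = 12
NS_PER_ENABLED_PROBE = 27

def probe_overhead_ms(disabled_hits, enabled_hits):
    """Estimate total probe overhead in milliseconds from hit counts."""
    total_ns = (disabled_hits * NS_PER_DISABLED_PROBE
                + enabled_hits * NS_PER_ENABLED_PROBE)
    return total_ns / 1_000_000

# Hypothetical loop: 500,000 MPI calls, each hitting a start and an end probe.
overhead_ms = probe_overhead_ms(disabled_hits=0, enabled_hits=1_000_000)
```

For this assumed loop the estimate is 27 ms with the probes enabled, compared with 12 ms if the same probes were present but disabled.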
Highly cyclical code, such as a program that alternates between broadcasts and gathers, is a good candidate for TNF performance analysis. For example, look for evidence of poor load balancing, such as barrier:compute cycles in which the compute phase in one rank is far shorter than in the others, so that rank spends more time in the barrier than the other ranks do.
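The load-balancing check can be expressed as a simple comparison of per-rank barrier times. This is a minimal sketch under stated assumptions: the per-rank totals are made-up numbers, and the rule of flagging a rank that spends more than three times the median in the barrier is an arbitrary threshold, not a Prism or Sun MPI convention.

```python
import statistics

# Hypothetical total time (ms) each rank spent waiting in the barrier.
barrier_ms = {0: 2.0, 1: 2.5, 2: 41.0, 3: 3.0}

def flag_idle_ranks(barrier_ms, factor=3.0):
    """Return ranks whose barrier time exceeds factor times the median."""
    median = statistics.median(barrier_ms.values())
    return sorted(rank for rank, t in barrier_ms.items() if t > factor * median)

idle_ranks = flag_idle_ranks(barrier_ms)
```

Here rank 2 is flagged: it finishes its compute phase early and waits in the barrier for the others, which suggests it has too little work.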
You can create intervals based on library routines that enable you to measure the timing of your own code, not just the timing of the library routines themselves. Create intervals that combine an *_End event that precedes the routines you want to measure with a corresponding *_Start event following those routines (the reverse of normal order).
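The reversed interval can be illustrated as follows. The sketch measures the gap from each *_End event to the next *_Start event, which is time spent in your own code between library calls. The event names and timestamps are hypothetical, and the tuple layout is not the TNF file format.

```python
# Hypothetical (event name, timestamp in ms) pairs for one process.
events = [
    ("MPI_Bcast_Start", 0.0), ("MPI_Bcast_End", 1.0),
    ("MPI_Gather_Start", 6.0), ("MPI_Gather_End", 7.5),
]

def user_code_gaps(events):
    """Return the time between each *_End event and the next *_Start event."""
    gaps = []
    last_end = None
    for name, timestamp in events:
        if name.endswith("_End"):
            last_end = timestamp
        elif name.endswith("_Start") and last_end is not None:
            gaps.append(timestamp - last_end)
            last_end = None
    return gaps

gaps = user_code_gaps(events)
```

In this sample the single gap is 5.0 ms, the time the process spent in user code between the end of the broadcast and the start of the gather.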
You can use Prism's TNF performance analysis features with or without using the -g compiler option. For further information about the effects of using the -g option, see "Compiling and Linking Your Program". For information on combining the -g option with optimizations, see "Combining Debug and Optimization Options".
Ragged edges can appear in your data. Since message passing activity in different processes can vary, the earliest time when a trace file contains interesting data can vary from process to process.
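One possible way to handle ragged edges when comparing processes is to restrict attention to the window in which every process has data. The sketch below is an assumption about how you might do that, not a documented Prism procedure; the per-process timestamps are made up.

```python
# Hypothetical timestamp (ms) of the earliest retained event in each
# process's trace file.
first_event_ms = {0: 120.0, 1: 95.0, 2: 310.0}

def common_window_start(first_event_ms):
    """Earliest time at which every process has trace data."""
    return max(first_event_ms.values())

window_start = common_window_start(first_event_ms)
```

Events before 310.0 ms in this sample exist for only some processes, so comparisons across ranks are meaningful only from that point on.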