CHAPTER 1
Overview of Program Performance Analysis Tools
Developing high-performance applications requires a combination of compiler features, libraries of optimized functions, and tools for performance analysis. Program Performance Analysis Tools describes the tools that are available to help you assess the performance of your code, identify potential performance problems, and locate the part of the code where the problems occur.
This manual deals primarily with the Collector and Performance Analyzer, a pair of tools that you use to collect and analyze performance data for your application. Both tools can be used from the command line or from a graphical user interface.
The Collector collects performance data by a statistical method called profiling and by tracing function calls. The data can include call stacks, microstate accounting information, thread-synchronization delay data, hardware-counter overflow data, MPI function call data, memory allocation data, and summary information for the operating system and the process. The Collector can collect all kinds of data for C, C++, and Fortran programs, and it can collect profiling data for Java programs. It can also collect data for dynamically generated functions and for descendant processes. See Chapter 3 for information about the data collected and Chapter 4 for detailed information about the Collector. The Collector can be run from the IDE, from the dbx command-line tool, or with the collect command.
The Performance Analyzer displays the data recorded by the Collector so that you can examine it. It processes the data and displays various performance metrics at the level of the program, the functions, the source lines, and the instructions. These metrics are classed into five groups: timing metrics, hardware counter metrics, synchronization delay metrics, memory allocation metrics, and MPI tracing metrics. The Performance Analyzer also displays the raw data graphically as a function of time. It can also create a mapfile that you can use to improve the order in which functions are loaded into the program's address space. See Chapter 5 for detailed information about the Performance Analyzer, and Chapter 6 for information about er_print, the command-line analysis tool. Annotated source code listings and disassembly listings that include compiler commentary but no performance data can be viewed with the er_src utility (see Chapter 8 for more information).
These two tools help to answer the following kinds of questions:
The Performance Analyzer window consists of a multi-tabbed display with a menu bar and a toolbar. The tab that is displayed when the Performance Analyzer starts shows a list of the program's functions, with exclusive and inclusive metrics for each function. The list can be filtered by load object, thread, LWP, and time slice. For a selected function, another tab displays its callers and callees. You can use this tab to navigate the call tree, for example to search for high metric values. Two more tabs display annotated code. The first shows source code annotated line by line with performance metrics and interleaved with compiler commentary. The second shows disassembly code annotated with metrics for each instruction and interleaved with source code and compiler commentary where they are available. Another tab displays the performance data as a function of time. Other tabs show details of the experiments and load objects, summary information for a function, and statistics for the process. The Performance Analyzer can be navigated with the keyboard as well as with a mouse.
The er_print command presents in plain text all the displays that are presented by the Performance Analyzer, with the exception of the Timeline display.
The Collector and Performance Analyzer are designed for use by any software developer, even one whose main responsibility is not performance tuning. These tools provide more flexible, detailed, and accurate analysis than the commonly used profiling tools prof and gprof, and they are not subject to the attribution error that affects gprof.
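This manual also includes information about the following performance tools:
prof and gprof, the traditional UNIX tools for displaying flat and call-graph profiles
tcov, a tool that reports how many times each statement of a program was executed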
For more information about prof, gprof, and tcov, see Appendix A.
Note - The Performance Analyzer GUI and the IDE are part of the Forte for Java 4, Enterprise Edition for the Solaris operating environment, versions 8 and 9.
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.