CHAPTER 1

Overview of the Performance Analyzer

Developing high performance applications requires a combination of compiler features, libraries of optimized functions, and tools for performance analysis. The Performance Analyzer manual describes the tools that are available to help you assess the performance of your code, identify potential performance problems, and locate the part of the code where the problems occur.

Starting the Performance Analyzer From the Integrated Development Environment

For information on starting the Performance Analyzer from the Integrated Development Environment (IDE), see the Performance Analyzer Readme, which is available through the documentation index at /installation_directory/docs/index.html. The default installation directory on Solaris platforms is /opt/SUNWspro. The default installation directory on Linux platforms is /opt/sun/sunstudio11. If the Sun Studio 11 compilers and tools are not installed in the /opt directory, ask your system administrator for the equivalent path on your system.

The Tools of Performance Analysis

This manual deals primarily with the Collector and Performance Analyzer, a pair of tools that you use to collect and analyze performance data for your application. Both tools can be used from the command line or from a graphical user interface.

The Collector and Performance Analyzer are designed for use by any software developer, even if performance tuning is not the developer's main responsibility. These tools provide a more flexible, detailed, and accurate analysis than the commonly used profiling tools prof and gprof, and are not subject to the attribution error that affects gprof.
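The attribution error arises because gprof divides the time spent in a called function among its callers in proportion to the number of calls, which assumes that every call costs the same. The following Python sketch contrasts that estimate with attribution based on recorded call stacks. The function names and timings are hypothetical and serve only as an illustration; the sketch is not the implementation of either tool.

```python
# Hypothetical measurements for one callee with two callers.
# Each entry maps caller -> (number of calls, seconds actually spent
# in the callee on behalf of that caller).
ACTUAL = {
    "cheap_caller": (9, 1.0),      # nine short calls
    "expensive_caller": (1, 9.0),  # one long call
}


def call_count_attribution(actual):
    """Split the callee's total time by call count (gprof-style estimate)."""
    total_time = sum(seconds for _, seconds in actual.values())
    total_calls = sum(calls for calls, _ in actual.values())
    return {
        caller: total_time * calls / total_calls
        for caller, (calls, _) in actual.items()
    }


def call_stack_attribution(actual):
    """Attribute time to the caller that was actually on the call stack."""
    return {caller: seconds for caller, (_, seconds) in actual.items()}


estimated = call_count_attribution(ACTUAL)
measured = call_stack_attribution(ACTUAL)
for caller in ACTUAL:
    print(f"{caller:17s} estimated={estimated[caller]:.1f}s "
          f"measured={measured[caller]:.1f}s")
```

In this example the call-count estimate charges most of the time to the caller that made many short calls, while the call-stack data charges it to the caller that made the single long call.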

These two tools help to answer the following kinds of questions:

The Collector Tool

The Collector tool collects performance data using a statistical method called profiling and by tracing function calls. The data can include call stacks, microstate accounting information, thread synchronization delay data, hardware counter overflow data, Message Passing Interface (MPI) function call data, memory allocation data, and summary information for the operating system and the process. The Collector can collect all kinds of data for C, C++, and Fortran programs, and it can collect profiling data for applications written in the Java™ programming language. It can collect data for dynamically generated functions and for descendant processes. See Chapter 2 for information about the data collected and Chapter 3 for detailed information about the Collector. The Collector can be run from the Performance Analyzer GUI, from the IDE, from the dbx command-line tool, and with the collect command.

The Performance Analyzer Tool

The Performance Analyzer tool displays the data recorded by the Collector, so that you can examine the information. The Performance Analyzer processes the data and displays various metrics of performance at the level of the program, the functions, the source lines, and the instructions. These metrics are classed into five groups:

The Performance Analyzer also displays the raw data in a graphical format as a function of time. The Performance Analyzer can create a mapfile that you can use to change the order of function loading in the program's address space, to improve performance.

See Chapter 4 and the online help in the IDE or the Performance Analyzer GUI for detailed information about the Performance Analyzer, and Chapter 6 for information about the command-line analysis tool, er_print.

Chapter 5 describes how you can use the Sun Studio performance tools to profile the kernel while the Solaris™ Operating System (Solaris OS) is running a load.

Chapter 7 discusses topics related to understanding the Performance Analyzer and its data, including how data collection works, interpreting performance metrics, call stacks and program execution, and annotated code listings. Annotated source code listings and disassembly code listings that include compiler commentary but do not include performance data can be viewed with the er_src utility (see Chapter 9 for more information).

Chapter 8 explains the annotated source and disassembly listings, including the different types of index lines and compiler commentary that the Performance Analyzer displays.

Chapter 9 describes how to copy, move, delete, archive, and export experiments.

The er_print Utility

The er_print utility presents in plain text all the displays that are presented by the Performance Analyzer, with the exception of the Timeline display.

The Performance Analyzer Window

Note - The following is a brief overview of the Performance Analyzer window. See Chapter 4 and the online help for a complete and detailed discussion of the functionality and features of the tabs discussed below.

The Performance Analyzer window consists of a multi-tabbed display, with a menu bar and a toolbar. The tab that is displayed when the Performance Analyzer is started shows a list of functions for the program with exclusive and inclusive metrics for each function. The list can be filtered by load object, by thread, by lightweight process (LWP), by CPU, and by time slice.
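The distinction between exclusive and inclusive metrics can be illustrated with a minimal Python sketch. An exclusive metric counts only the events that occurred in the function itself, and an inclusive metric also counts events in the functions it called. The call-stack samples and function names are hypothetical, and counting a recursive function once per sample is an assumption of this sketch rather than a statement about how the Performance Analyzer is implemented.

```python
from collections import Counter

# Hypothetical call-stack samples, outermost frame first. Each sample
# stands for one profiling interval spent in the innermost (last) frame.
SAMPLES = [
    ("main", "parse"),
    ("main", "compute", "dot"),
    ("main", "compute", "dot"),
    ("main", "compute"),
    ("main",),
]


def metrics(samples):
    """Return (exclusive, inclusive) sample counts for each function."""
    exclusive = Counter()
    inclusive = Counter()
    for stack in samples:
        # Exclusive: only the function that was executing (the leaf).
        exclusive[stack[-1]] += 1
        # Inclusive: every function on the stack, counted once per
        # sample (assumption) so that recursion does not inflate it.
        for name in set(stack):
            inclusive[name] += 1
    return exclusive, inclusive


exclusive, inclusive = metrics(SAMPLES)
for name in sorted(inclusive, key=inclusive.get, reverse=True):
    print(f"{name:8s} excl={exclusive[name]} incl={inclusive[name]}")
```

The exclusive counts sum to the number of samples, while the inclusive count for the outermost function equals the total because it is on every stack.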

For a selected function, another tab displays the callers and callees of the function. This tab can be used to navigate the call tree, for example in search of high metric values.
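As a rough illustration of the data behind a callers-callees view, the following Python sketch derives caller and callee tables from recorded call stacks. The samples and function names are hypothetical, and the per-edge sample count is a simplified stand-in for the metrics that the Performance Analyzer attributes to each caller and callee.

```python
from collections import Counter, defaultdict

# Hypothetical call-stack samples, outermost frame first.
SAMPLES = [
    ("main", "load", "parse"),
    ("main", "compute", "dot"),
    ("main", "compute", "dot"),
    ("main", "compute", "norm"),
    ("main", "report", "norm"),
]


def callers_callees(samples):
    """Count, for each function, the samples seen via each caller and callee."""
    callers = defaultdict(Counter)  # callee -> Counter of its callers
    callees = defaultdict(Counter)  # caller -> Counter of its callees
    for stack in samples:
        # Adjacent frames form caller/callee pairs; use a set so that a
        # recursive edge is counted once per sample.
        for caller, callee in set(zip(stack, stack[1:])):
            callers[callee][caller] += 1
            callees[caller][callee] += 1
    return callers, callees


callers, callees = callers_callees(SAMPLES)
selected = "compute"
print("callers of", selected, "->", dict(callers[selected]))
print("callees of", selected, "->", dict(callees[selected]))
```

Selecting a caller or callee with a large count and repeating the lookup corresponds to walking up or down the call tree.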

Two other tabs display annotated code. One shows source code that is annotated line by line with performance metrics and interleaved with compiler commentary. The other shows disassembly code that is annotated with metrics for each instruction and interleaved with both source code and compiler commentary, if they are available.

The performance data is displayed as a function of time in another tab.

Other tabs show details of the experiments and load objects, summary information for a function, memory leaks, and statistics for the process.

You can navigate the Performance Analyzer from the keyboard as well as with a mouse.