Developing high-performance applications requires a combination of compiler features, libraries of optimized functions, and tools for performance analysis. The Performance Analyzer manual describes the tools that are available to help you assess the performance of your code, identify potential performance problems, and locate the part of the code where the problems occur.
For information on starting the Performance Analyzer from the Integrated Development Environment (IDE), see the Performance Analyzer Readme, which is available through the documentation index at /installation_directory/docs/index.html. The default installation directory on Solaris platforms is /opt/SUNWspro. The default installation directory on Linux platforms is /opt/sun/sunstudio12. If the Sun Studio 12 compilers and tools are not installed in the /opt directory, ask your system administrator for the equivalent path on your system.
This manual describes the Collector and Performance Analyzer, a pair of Sun Studio tools that you use to collect and analyze performance data for your application. Both tools can be used from the command line or from a graphical user interface.
The Collector and Performance Analyzer are designed for use by any software developer, even if performance tuning is not the developer's main responsibility. These tools provide more flexible, detailed, and accurate analysis than the commonly used profiling tools prof and gprof, and they are not subject to the attribution error that affects gprof.
The Collector and Performance Analyzer tools help to answer the following kinds of questions:
How much of the available resources does the program consume?
Which functions or load objects are consuming the most resources?
Which source lines and instructions are responsible for resource consumption?
How did the program arrive at this point in the execution?
Which resources are being consumed by a function or load object?
The Collector tool collects performance data by using a statistical sampling method called profiling and by tracing function calls. The data can include call stacks, microstate accounting information, thread synchronization delay data, hardware counter overflow data, Message Passing Interface (MPI) function call data, memory allocation data, and summary information for the operating system and the process. The Collector can collect all kinds of data for C, C++, and Fortran programs, and it can collect profiling data for applications written in the Java™ programming language. It can collect data for dynamically generated functions and for descendant processes. See Chapter 2, Performance Data, for information about the data collected and Chapter 3, Collecting Performance Data, for detailed information about the Collector. The Collector can be run from the Performance Analyzer GUI, from the IDE, from the dbx command-line tool, or with the collect command.
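Statistical profiling works by interrupting the program at regular intervals and recording what it is doing at each interrupt; the functions that show up in many samples are the ones consuming time. The following C sketch illustrates only that general idea on a POSIX system, using setitimer() with ITIMER_PROF so that SIGPROF is delivered after each interval of consumed CPU time. It is not how the Collector is implemented: a real profiler unwinds the full call stack inside the handler, whereas this sketch assumes a hypothetical current_fn marker that the program sets itself. The names start_profiling, stop_profiling, spin_in, and the FN_ constants are illustrative.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical markers for "which function is running". A real profiler
   would unwind the call stack inside the signal handler instead. */
enum { FN_NONE, FN_WORK_A, FN_WORK_B, FN_COUNT };
static volatile sig_atomic_t current_fn = FN_NONE;
static volatile sig_atomic_t samples[FN_COUNT];

/* Called once per profiling interval: charge the tick to current_fn. */
static void on_sigprof(int sig) {
    (void)sig;
    samples[current_fn]++;
}

/* Deliver SIGPROF after every interval_us microseconds of CPU time
   consumed by the process (interval_us must be below 1000000). */
static int start_profiling(long interval_us) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_us;
    tv.it_value = tv.it_interval;      /* first tick after one interval */
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer so that no further SIGPROF signals are generated. */
static void stop_profiling(void) {
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);
}

/* Burn CPU "inside" fn until at least min_ticks samples have landed there. */
static void spin_in(int fn, int min_ticks) {
    volatile unsigned long work = 0;
    current_fn = fn;
    while (samples[fn] < min_ticks)
        work++;
    current_fn = FN_NONE;
}
```

For example, calling start_profiling(1000), then spin_in(FN_WORK_A, 3) and spin_in(FN_WORK_B, 1), and finally stop_profiling() leaves at least three samples charged to FN_WORK_A and at least one to FN_WORK_B. Each sample stands for roughly one interval of CPU time, which is why a profile built this way is statistical rather than exact.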
The Performance Analyzer tool displays the data recorded by the Collector so that you can examine the information. The Performance Analyzer processes the data and displays performance metrics at the level of the program, the functions, the source lines, and the instructions. These metrics are classed into five groups:
Clock profiling metrics
Hardware counter metrics
Synchronization delay metrics
Memory allocation metrics
MPI tracing metrics
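Unlike clock profiling, memory allocation data comes from tracing: every allocation and deallocation is recorded rather than sampled. As a minimal sketch of that idea, the hypothetical wrappers traced_malloc() and traced_free() below count allocations and bytes requested, and keep track of blocks that have not yet been freed, which are the candidates a leak report would list. They are illustrative only and are not part of any Sun Studio interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracing counters; the names are illustrative only. */
struct alloc_stats {
    size_t allocations;   /* number of successful traced_malloc calls */
    size_t bytes;         /* total bytes requested                    */
    size_t live_blocks;   /* blocks allocated but not yet freed       */
    size_t live_bytes;    /* bytes in blocks not yet freed            */
};
static struct alloc_stats stats;

/* Each block carries a header recording its size so that traced_free can
   update the live counters; the union keeps the user pointer aligned. */
typedef union {
    size_t size;
    max_align_t align;
} header_t;

static void *traced_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(header_t))
        return NULL;                       /* header + size would overflow */
    header_t *h = malloc(sizeof(header_t) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    stats.allocations++;
    stats.bytes += size;
    stats.live_blocks++;
    stats.live_bytes += size;
    return h + 1;                          /* user data follows the header */
}

static void traced_free(void *p) {
    if (p == NULL)
        return;
    header_t *h = (header_t *)p - 1;       /* step back to the header */
    stats.live_blocks--;
    stats.live_bytes -= h->size;
    free(h);
}
```

After traced_malloc(100), traced_malloc(28), and traced_free() on the first block, stats records 2 allocations and 128 bytes requested, with one 28-byte block still live.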
The Performance Analyzer also displays the raw data graphically as a function of time. It can also create a mapfile that you can use to change the order in which functions are loaded into the program's address space, which can improve performance.
See Chapter 4, The Performance Analyzer Tool, and the online help in the IDE or the Performance Analyzer GUI for detailed information about the Performance Analyzer.
Chapter 5, Kernel Profiling, describes how you can use the Sun Studio performance tools to profile the kernel while the Solaris™ Operating System (Solaris OS) is running a load.
Chapter 6, The er_print Command Line Performance Analysis Tool, describes how to use the er_print command-line interface to analyze the data collected by the Collector.
Chapter 7, Understanding the Performance Analyzer and Its Data, discusses topics related to understanding the Performance Analyzer and its data, including how data collection works, interpreting performance metrics, call stacks and program execution, and annotated code listings. Annotated source code listings and disassembly code listings that include compiler commentary but no performance data can be viewed with the er_src utility (see Chapter 9, Manipulating Experiments, for more information).
Chapter 8, Understanding Annotated Source and Disassembly Data, explains the annotated source and disassembly displays, including the different types of index lines and compiler commentary that the Performance Analyzer displays.
Chapter 9, Manipulating Experiments, describes how to copy, move, delete, archive, and export experiments.
The er_print utility presents in plain text all the displays that the Performance Analyzer presents, except the Timeline display.
The Sun Studio software includes an additional profiling tool called tcov, which generates exact counts of the number of times each statement in a program is executed. See the tcov(1) man page for more information about this tool.
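Exact counts come from instrumentation rather than sampling: the program increments a counter every time a particular piece of code runs. tcov relies on instrumentation that the compiler inserts; the sketch below only illustrates the idea with hand-written counters for each basic block of a small function. The COUNT macro, the BB_ constants, and count_evens() are hypothetical.

```c
/* Hypothetical hand instrumentation: one counter per basic block.
   This only illustrates exact (non-statistical) execution counts. */
enum { BB_ENTRY, BB_LOOP_BODY, BB_EVEN, BB_ODD, BB_COUNT };
static unsigned long bb_count[BB_COUNT];
#define COUNT(bb) (bb_count[(bb)]++)

/* Returns how many values in v[0..n-1] are even, counting each block. */
static int count_evens(const int *v, int n) {
    COUNT(BB_ENTRY);
    int evens = 0;
    for (int i = 0; i < n; i++) {
        COUNT(BB_LOOP_BODY);
        if (v[i] % 2 == 0) {
            COUNT(BB_EVEN);
            evens++;
        } else {
            COUNT(BB_ODD);
        }
    }
    return evens;
}
```

Calling count_evens() on the five values {1, 2, 3, 4, 6} leaves bb_count at 1 for the entry block, 5 for the loop body, 3 for the even branch, and 2 for the odd branch. Unlike a sampled profile, these numbers are the same on every run with the same input.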
The following is a brief overview of the Performance Analyzer window. See Chapter 4, The Performance Analyzer Tool, and the online help for a complete and detailed discussion of the functionality and features of the tabs discussed below.
The Performance Analyzer window consists of a multi-tabbed display, with a menu bar and a toolbar. The tab that is displayed when the Performance Analyzer is started shows a list of functions for the program with exclusive and inclusive metrics for each function. The list can be filtered by load object, by thread, by lightweight process (LWP), by CPU, and by time slice.
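Exclusive metrics count only the time spent in a function itself, while inclusive metrics also count the time spent in everything the function calls. The C sketch below shows how both can be derived from clock-profiling samples that each carry a call stack, and how a thread filter and a time-slice filter simply restrict which samples are counted. The sample structure, the metrics_for() function, and the fixed MAX_DEPTH are assumptions made for illustration; the results are sample counts, which you would multiply by the sampling interval to obtain seconds.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8

/* One hypothetical clock-profiling sample: the thread it came from, when it
   was taken, and its call stack from the outermost caller (stack[0]) to the
   leaf (stack[depth - 1]). */
struct sample {
    int thread;
    double time_s;
    int depth;
    const char *stack[MAX_DEPTH];
};

/* Count exclusive and inclusive samples for fn, using only samples from the
   given thread (or all threads if thread < 0) taken within [t0, t1). */
static void metrics_for(const struct sample *samples, size_t n, const char *fn,
                        int thread, double t0, double t1,
                        int *exclusive, int *inclusive) {
    *exclusive = 0;
    *inclusive = 0;
    for (size_t k = 0; k < n; k++) {
        const struct sample *s = &samples[k];
        if (thread >= 0 && s->thread != thread)
            continue;                              /* thread filter */
        if (s->time_s < t0 || s->time_s >= t1)
            continue;                              /* time-slice filter */
        if (s->depth > 0 && strcmp(s->stack[s->depth - 1], fn) == 0)
            (*exclusive)++;                        /* fn itself was running */
        for (int i = 0; i < s->depth; i++) {
            if (strcmp(s->stack[i], fn) == 0) {
                (*inclusive)++;                    /* fn was on the stack */
                break;                             /* count once per sample */
            }
        }
    }
}
```

Because the inner loop stops at the first matching frame, a recursive function is counted at most once per sample for the inclusive metric. Filtering by LWP or by CPU would follow the same pattern as the thread filter.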
For a selected function, another tab displays the callers and callees of the function. You can use this tab to navigate the call tree, for example to search for high metric values.
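One way to think about a callers-callees view is as the neighbors of the selected function in each sampled call stack. In this hypothetical sketch, every sample that contains the selected function is credited to the frame directly above it (the caller) and, unless the function was the leaf, to the frame directly below it (the callee). Navigating the call tree then amounts to selecting one of those neighbors and repeating. The structures and function names are illustrative, and recursion is simplified by using only the outermost occurrence of the function in each stack.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8
#define MAX_NAMES 16

/* Hypothetical sampled call stack; stack[0] is the outermost caller. */
struct sample {
    int depth;
    const char *stack[MAX_DEPTH];
};

/* Small name -> count table. */
struct tally {
    const char *name[MAX_NAMES];
    int count[MAX_NAMES];
    int n;
};

static void tally_bump(struct tally *t, const char *name) {
    for (int i = 0; i < t->n; i++) {
        if (strcmp(t->name[i], name) == 0) {
            t->count[i]++;
            return;
        }
    }
    if (t->n < MAX_NAMES) {            /* names beyond the table are dropped */
        t->name[t->n] = name;
        t->count[t->n] = 1;
        t->n++;
    }
}

static int tally_get(const struct tally *t, const char *name) {
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->name[i], name) == 0)
            return t->count[i];
    return 0;
}

/* Credit each sample containing fn to fn's caller and, if fn was not the
   leaf, to the callee fn was executing at the time of the sample. */
static void callers_callees(const struct sample *s, size_t n, const char *fn,
                            struct tally *callers, struct tally *callees) {
    *callers = (struct tally){0};
    *callees = (struct tally){0};
    for (size_t k = 0; k < n; k++) {
        for (int i = 0; i < s[k].depth; i++) {
            if (strcmp(s[k].stack[i], fn) != 0)
                continue;
            if (i > 0)
                tally_bump(callers, s[k].stack[i - 1]);
            if (i + 1 < s[k].depth)
                tally_bump(callees, s[k].stack[i + 1]);
            break;                     /* outermost occurrence only */
        }
    }
}
```

For example, with the stacks main→solve→dot (twice), main→solve, and main→init→dot, selecting dot yields the callers solve (2) and init (1) and no callees; selecting solve yields the caller main (3) and the callee dot (2).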
Two other tabs display annotated code. One shows source code annotated line by line with performance metrics and interleaved with compiler commentary. The other shows disassembly code annotated with metrics for each instruction and interleaved with both source code and compiler commentary, if they are available.
The performance data is displayed as a function of time in another tab.
Other tabs show details of the experiments and load objects, summary information for a function, memory leaks, and statistics for the process.
Additional tabs show Index Objects, Memory Objects, Data Objects, Data Layout, Lines, and PCs. See the Analyzer Data Displays for more information about each tab.
For experiments that have recorded Thread Analyzer data, Races and Deadlocks tabs are also available. Tabs are shown only if the loaded experiments contain data that supports them.
See the Sun Studio 12: Thread Analyzer User’s Guide for more information about Thread Analyzer.
You can navigate the Performance Analyzer from the keyboard as well as with a mouse.
The Solaris OS has long provided two standard UNIX® profiling tools, prof and gprof. The prof utility generates a statistical profile of the CPU time used by a program and an exact count of the number of times each function is entered. The gprof utility generates the same statistical profile and function entry counts, and also counts the number of times each arc (caller-callee pair) in the program's call graph is traversed. These tools, while useful for simple programs, are not adequate for tuning complex programs. See the prof(1) and gprof(1) man pages for more information about these standard tools.
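The gprof attribution error mentioned earlier comes from how gprof assigns a callee's time to its callers. gprof knows the exact number of calls along each arc but not how long each individual call took, so it splits the callee's total time among its callers in proportion to their call counts. That split is accurate only when every call costs about the same. The following C sketch reproduces the proportional split for a hypothetical helper function that is called 900 times from fast_caller at a total cost of 1 s and 100 times from slow_caller at a total cost of 9 s; the names and numbers are invented for illustration.

```c
#include <stddef.h>

/* One caller-callee arc into a single callee (hypothetical data). */
struct arc {
    const char *caller;
    long calls;          /* exact number of calls along this arc */
    double true_time_s;  /* callee time actually spent on behalf of this caller */
};

/* gprof-style propagation: split the callee's total time among its callers
   in proportion to call counts, which assumes every call costs the same. */
static void gprof_attribution(const struct arc *arcs, size_t n, double *out) {
    long total_calls = 0;
    double total_time = 0.0;
    for (size_t i = 0; i < n; i++) {
        total_calls += arcs[i].calls;
        total_time += arcs[i].true_time_s;
    }
    for (size_t i = 0; i < n; i++)
        out[i] = total_calls > 0
                     ? total_time * (double)arcs[i].calls / (double)total_calls
                     : 0.0;
}
```

With these inputs, gprof_attribution() assigns about 9 s to fast_caller and about 1 s to slow_caller, the reverse of where the time was actually spent. A profiler that records the call stack with each sample charges the time to the caller that was actually on the stack, so it does not depend on this equal-cost assumption.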