CHAPTER 4
The Performance Analyzer Tool
The Performance Analyzer is a graphical data-analysis tool that analyzes performance data collected by the Collector using the collect command, the IDE, or the collector commands in dbx. The Collector gathers performance information to create an experiment during the execution of a process, as described in Chapter 3. The Performance Analyzer reads in such experiments, analyzes the data, and presents the results in tabular and graphical displays. A command-line version of the Analyzer is available as the er_print utility, which is described in Chapter 6.
To start the Performance Analyzer, type the following on the command line:
Alternatively, use the Explorer in the IDE to navigate to an experiment and open it. The experiment-list command argument is a blank-separated list of experiment names, experiment group names, or both.
You can specify multiple experiments or experiment groups on the command line. If you specify an experiment that has descendant experiments inside it, all descendant experiments are automatically loaded, but the display of data for the descendant experiments is disabled. To load individual descendant experiments you must specify each experiment explicitly or create an experiment group. To create an experiment group, create a plain text file whose first line is as follows:
#analyzer experiment group
Then add the names of the experiments on subsequent lines. The file extension must be erg.
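The experiment group format described above is simple enough to generate from a script. The following is a minimal sketch in Python, not part of the Analyzer itself; the function name write_experiment_group and the experiment names in the usage note are hypothetical.

```python
from pathlib import Path

# Required first line of an experiment group file, as stated in the text above.
GROUP_HEADER = "#analyzer experiment group"


def write_experiment_group(group_path, experiment_names):
    """Write an experiment group file that lists the given experiments.

    group_path must end in .erg; experiment_names is a list of
    experiment names, written one per line after the header.
    """
    path = Path(group_path)
    if path.suffix != ".erg":
        raise ValueError("experiment group files must use the .erg extension")
    lines = [GROUP_HEADER, *experiment_names]
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, write_experiment_group("mygroup.erg", ["run1.er", "run2.er"]) would produce a three-line file whose name can then be passed to the Analyzer in place of the individual experiment names.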
You can also use the File menu in the Analyzer window to add experiments or experiment groups. To open experiments recorded on descendant processes, you must type the file name in the Open Experiment dialog box (or Add Experiment dialog box) because the file chooser does not permit you to open an experiment as a directory.
When the Analyzer displays multiple experiments, however they were loaded, data from all the experiments is aggregated.
You can preview an experiment or experiment group for loading by single-clicking on its name in either the Open Experiment dialog or the Add Experiment dialog.
You can also start the Performance Analyzer from the command line to record an experiment as follows:
The Analyzer starts up with the Performance Tools Collect window showing the named target and its arguments, and settings for collecting an experiment. See Recording Experiments for details.
These options control the behavior of the Analyzer and are divided into three groups:
Specify the path to the JVM software for running the Analyzer. The default path is determined by first examining environment variables for a path to Java, in the order JDK_HOME and then JAVA_PATH. If neither environment variable is set, the default path is the location where the Java 2 Software Development Kit was installed by the Sun Studio installer, if any; otherwise, the path is taken from the user's PATH.
Specify the JVM options.
Specify the font size to be used in the Analyzer GUI.
Print version information and Java runtime arguments before starting.
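The default Java lookup order described above (JDK_HOME, then JAVA_PATH, then the installer's JDK location, then PATH) can be illustrated with a short sketch. This is only an illustration of the documented order, not the Analyzer's actual startup code; the function name resolve_java_path and the installed_jdk parameter are hypothetical, and the sketch returns whatever location it finds first without validating it.

```python
import os
import shutil


def resolve_java_path(environ=None, installed_jdk=None):
    """Return the first Java location found, following the documented order.

    environ:       mapping of environment variables (defaults to os.environ)
    installed_jdk: hypothetical path where an installer placed a JDK, or None
    """
    env = os.environ if environ is None else environ

    # 1. Environment variables, in the order JDK_HOME and then JAVA_PATH.
    for name in ("JDK_HOME", "JAVA_PATH"):
        value = env.get(name)
        if value:
            return value

    # 2. The JDK installed alongside the tools, if any.
    if installed_jdk:
        return installed_jdk

    # 3. Fall back to whatever "java" is found on the user's PATH.
    return shutil.which("java", path=env.get("PATH"))
```

Passing an explicit environ mapping makes the order easy to check without changing the real environment.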
These options do not invoke the Performance Analyzer GUI, but print information about analyzer to standard output. The individual options below are stand-alone options; they cannot be combined with other analyzer options nor combined with target or experiment-list arguments.
Print version information and exit.
Print usage information and exit.
The Analyzer window has a menu bar, a tool bar, and a split pane that contains tabs for the various data displays.
The menu bar contains a File menu, a View menu, a Timeline menu, and a Help menu.
The File menu is for opening, adding, and dropping experiments and experiment groups. The File menu allows you to collect data for an experiment using the Performance Analyzer GUI. For details on using the Performance Analyzer to collect data, refer to Recording Experiments. From the File menu, you can also create a mapfile, which is used to optimize the size of an executable or optimize its effective cache behavior. For more details on mapfiles, refer to Generating Mapfiles and Function Reordering.
The View menu allows you to configure how experiment data is displayed.
The Timeline menu, as its name suggests, helps you to navigate the timeline display, described in Analyzer Data Displays.
The Help menu provides online help for the Performance Analyzer, provides a summary of new features, has quick-reference and shortcut sections, and has a troubleshooting section.
The toolbar provides sets of icons as menu shortcuts, and includes a Find function to help you find functions when using the data displays. For more details about the Find function, refer to Finding Text and Data.
The Performance Analyzer uses a split-window to divide the data presentation into two panes. Each pane is tabbed to allow you to select different data displays for the same experiment or experiment group.
The left pane contains tabs for the principal Analyzer displays:
If you invoke the Analyzer without a target, you are prompted for an experiment to open.
If dataspace-profiling data is recorded into the experiment being read, the Data Layout and Data Objects tabs will also show in addition to the tabs listed above.
The Functions tab shows a list consisting of functions and their metrics. The metrics are derived from the data collected in the experiment. Metrics can be either exclusive or inclusive. Exclusive metrics represent usage within the function itself. Inclusive metrics represent usage within the function and all the functions it called.
The list of available metrics for each kind of data collected is given in the collect(1) man page. Only the functions that have non-zero metrics are listed.
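To make the distinction between exclusive and inclusive metrics concrete, the following sketch derives both from a list of sampled call stacks. It is an illustration under stated assumptions, not the Analyzer's implementation: the input format (a call stack listed root first, paired with a metric value) and the function name function_metrics are hypothetical.

```python
from collections import Counter


def function_metrics(samples):
    """Compute exclusive and inclusive metrics per function.

    samples: iterable of (call_stack, value) pairs, where call_stack is a
             list of function names with the root first and the leaf last.
    """
    exclusive = Counter()
    inclusive = Counter()
    for stack, value in samples:
        if not stack:
            continue
        # Exclusive: usage within the function itself (the leaf of the stack).
        exclusive[stack[-1]] += value
        # Inclusive: usage within the function and everything it called.
        # A recursive function is counted once per sample.
        for func in set(stack):
            inclusive[func] += value
    return exclusive, inclusive
```

With the samples (["main", "f", "g"], 2.0), (["main", "f"], 1.0), and (["main"], 0.5), function f has an exclusive value of 1.0 and an inclusive value of 3.0, and main has an inclusive value of 3.5.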
Time metrics are shown as seconds, presented to millisecond precision. Percentages are shown to a precision of 0.01%. If a metric value is precisely zero, its time and percentage are shown as "0.". If the value is not exactly zero, but is smaller than the precision, its value is shown as "0.000" and its percentage as "0.00". Because of rounding, percentages may not sum to exactly 100%. Count metrics are shown as an integer count.
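The display rules above can be summarized in a few lines of code. This is a sketch only: the function names are hypothetical, and the use of Python's default rounding is an assumption, since the text does not say how the Analyzer rounds.

```python
def format_time(seconds):
    """Format a time metric: "0." for exact zero, else millisecond precision."""
    if seconds == 0:
        return "0."
    return f"{seconds:.3f}"


def format_percent(percent):
    """Format a percentage: "0." for exact zero, else a precision of 0.01."""
    if percent == 0:
        return "0."
    return f"{percent:.2f}"
```

A value such as 0.0004 seconds is not exactly zero but is below the displayed precision, so format_time(0.0004) yields "0.000" rather than "0.".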
The metrics initially shown are based on the data collected and on the default settings read from various .er.rc files. When the Performance Analyzer is initially installed, the defaults are as follows:
If more than one type of data has been collected, the default metrics for each type are shown.
The metrics that are shown can be changed or reorganized; see the online help for details.
To search for a function, use the Find tool in the toolbar. For further details about the Find tool, refer to Finding Text and Data.
The Callers-Callees tab shows the selected function in a pane in the center, with callers of that function in a pane above, and callees of that function in a pane below.
In addition to showing exclusive and inclusive metric values for each function, the tab also shows attributed metrics. For the selected function, the attributed metric represents the exclusive metric for that function. For the callees, the attributed metric represents the portion of the callee's inclusive metric that is attributable to calls from the center function. The sum of attributed metrics for the callees and the selected function adds up to the inclusive metric for the selected function.
For the callers, the attributed metrics represent the portion of the selected function's inclusive metric that is attributable to calls from the callers. The sum of the attributed metrics for all callers should also add up to the inclusive metric for the selected function.
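The two sums described above can be checked with simple arithmetic. The following sketch uses made-up numbers and a hypothetical helper name, check_attribution; it illustrates the relationships only and does not read real experiment data.

```python
def check_attribution(exclusive, inclusive, callee_attributed,
                      caller_attributed, tol=1e-9):
    """Check both attribution identities for a selected function.

    Returns a pair of booleans:
      (exclusive + callee attributed metrics == inclusive,
       sum of caller attributed metrics      == inclusive)
    """
    callee_side = exclusive + sum(callee_attributed.values())
    caller_side = sum(caller_attributed.values())
    return (abs(callee_side - inclusive) <= tol,
            abs(caller_side - inclusive) <= tol)


# Hypothetical data for a selected function f: 1.0 s exclusive, 4.0 s inclusive.
callees = {"g": 2.0, "h": 1.0}      # portions of g and h due to calls from f
callers = {"main": 3.0, "init": 1.0}  # portions of f's inclusive time per caller
result = check_attribution(1.0, 4.0, callees, callers)
```

Here 1.0 + 2.0 + 1.0 = 4.0 on the callee side and 3.0 + 1.0 = 4.0 on the caller side, so both checks hold.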
The metrics shown in the Callers-Callees tab can be changed or reorganized; see the online help for details.
Clicking once on a function in the caller or callee pane selects that function, causing the window contents to be redrawn so that the selected function appears in the center pane.
If available, the Source tab shows the file containing the source code of the selected function, annotated with performance metrics for each source line. The full names of the source file, the corresponding object file and the load object are given in the column heading for the source code. In the rare case where the same source file is used to compile more than one object file, the Source tab shows the performance data for the object file containing the selected function.
The Analyzer looks for the file containing the selected function under the absolute pathname as recorded in the executable. If the file is not there, the Analyzer tries to find a file of the same basename in the current working directory. If you have moved the sources, or the experiment was recorded in a different file system, you can put a symbolic link from the current directory to the real source location in order to see the annotated source.
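The two-step lookup described above (the recorded absolute path first, then a file of the same basename in the current working directory) can be sketched as follows. The function name find_source_file is hypothetical, and the sketch covers only the two steps named in this paragraph.

```python
import os


def find_source_file(recorded_path, cwd=None):
    """Locate a source file using the two-step lookup described above.

    recorded_path: absolute pathname as recorded in the executable
    cwd:           directory to treat as the current working directory
    Returns the path found, or None if neither location has the file.
    """
    # Step 1: the absolute pathname as recorded.
    if os.path.isfile(recorded_path):
        return recorded_path

    # Step 2: a file with the same basename in the current directory.
    # os.path.isfile follows symbolic links, so a link placed here that
    # points at the real source location is found as well.
    cwd = os.getcwd() if cwd is None else cwd
    candidate = os.path.join(cwd, os.path.basename(recorded_path))
    if os.path.isfile(candidate):
        return candidate

    return None
```

Because the second step matches on the basename alone, a symbolic link created with os.symlink (or ln -s) in the current directory is enough to make moved sources visible.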
When you select a function in the Functions tab and the Source tab is opened, the source file displayed is the default source context for that function. The default source context of a function is the file containing the function's first instruction, which, for C code, is the function's opening brace. Immediately following the first instruction, the annotated source file adds an index line for the function. The source window displays index lines as text in red italics within angle brackets in the form:
A function might have an alternate source context, which is another file that contains instructions attributed to the function. Such instructions might come from include files or from other functions inlined into the selected function. If there are any alternate source contexts, the beginning of the default source context includes a list of extended index lines that indicate where the alternate source contexts are located.
<Function: f, instructions from source file src.h>
Double clicking on an index line that refers to another source context opens the file containing that source context, at the location associated with the indexed function.
To aid navigation, alternate source contexts also start with a list of index lines that refer back to functions defined in the default source context and other alternate source contexts.
The source code is interleaved with any compiler commentary that has been selected for display. The classes of commentary shown can be set in the Set Data Presentation dialog box. The default classes can be set in a defaults file.
The metrics displayed in the Source tab can be changed or reorganized; see the online help for details.
Lines with metrics that are equal to or exceed a threshold percentage of the maximum of that metric for any line in the source file are highlighted to make it easier to find the important lines. The threshold can be set in the Set Data Presentation dialog box. The default threshold can be set in a defaults file. Tick marks are shown next to the scrollbar, corresponding to the position of over-threshold lines within the source file. For example, if there were two over-threshold lines near the end of the source file, two ticks would be shown next to the scrollbar near the bottom of the source window. Positioning the scrollbar next to a tick mark will position the source lines displayed in the source window so that the corresponding over-threshold line is displayed.
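The highlighting rule above compares each line's metric with a percentage of the largest value of that metric in the file. A minimal sketch, assuming the per-line metric values are available as a dictionary keyed by line number (the function name over_threshold_lines is hypothetical):

```python
def over_threshold_lines(line_metrics, threshold_percent):
    """Return the line numbers whose metric meets or exceeds the threshold.

    line_metrics:      dict mapping source line number to a metric value
    threshold_percent: threshold as a percentage of the file's maximum value
    """
    if not line_metrics:
        return []
    peak = max(line_metrics.values())
    if peak <= 0:
        return []  # nothing to highlight if no line has a positive metric
    cutoff = peak * threshold_percent / 100.0
    return sorted(n for n, v in line_metrics.items() if v >= cutoff)
```

With metrics {10: 0.5, 20: 4.0, 30: 3.0, 40: 0.0} and a threshold of 75%, the cutoff is 3.0, so lines 20 and 30 would be highlighted.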
The Lines tab shows a list consisting of source lines and their metrics. Source lines are labeled with the function from which they came and the line number and source file name. If no line-number information is available for a function, or the source file for the function is not known, all of the function's PCs appear aggregated into a single entry for the function in the lines display. PCs from functions that are from load-objects whose functions are hidden appear aggregated as a single entry for the load-object in the lines display. Selecting a line in the Lines tab shows all the metrics for that line in the Summary tab. Selecting the Source or Disassembly tab after selecting a line from the Lines tab positions the display at the appropriate line.
The Disassembly tab shows a disassembly listing of the object file containing the selected function, annotated with performance metrics for each instruction.
Interleaved within the disassembly listing is the source code, if available, and any compiler commentary chosen for display. The algorithm for finding the source file in the Disassembly tab is the same as the algorithm used in the Source tab.
Just as with the Source tab, index lines are displayed in the Disassembly tab. But unlike with the Source tab, index lines for alternate source contexts cannot be used directly for navigation purposes. Also, index lines for alternate source contexts are displayed at the start of where the #included or inlined code is inserted, rather than just being listed at the beginning of the Disassembly view. Code that is #included or inlined from other files shows as raw disassembly instructions without interleaving the source code. However, placing the cursor on one of these instructions and selecting the Source tab opens the source file containing the #included or inlined code. Selecting the Disassembly tab with this file displayed opens the Disassembly view in the new context, thus displaying the disassembly code with interleaved source code.
The classes of commentary shown can be set in the Set Data Presentation dialog box. The default classes can be set in a defaults file.
The Analyzer highlights lines with metrics that are equal to or exceed a metric-specific threshold, to make it easier to find the important lines. You can set the threshold in the Set Data Presentation dialog box. You can set the default threshold in a defaults file. As with the Source tab, tick marks are shown next to the scrollbar, corresponding to the position of over-threshold lines within the disassembly code.
The PCs tab shows a list consisting of PCs and their metrics. PCs are labeled with the function from which they came and the offset within that function. PCs from functions that are from load-objects whose functions are hidden appear aggregated as a single entry for the load-object in the PCs display. Selecting a line in the PCs tab shows all the metrics for that PC in the Summary tab. Selecting the Source tab or Disassembly tab after selecting a line from the PCs tab positions the display at the appropriate line.
The DataObjects tab shows the list of data objects with their metrics. The tab is visible by default if dataspace data is recorded in the experiment. The tab can also be shown by turning Data Space Display on in the Formats tab of the Set Data Presentation dialog box, or by setting a datamode on command in one of the .er.rc files read when the Analyzer starts. The tab is applicable only to hardware counter overflow experiments where the aggressive backtracking option was enabled, and for source files that were compiled with the -xhwcprof option in the C compiler.
When enabled, it shows hardware counter memory operation metrics against the various data structures and variables in the program.
The DataLayout tab shows the annotated data object layouts for all program data objects with data-derived metric data. The layouts appear in the order they are defined in the experiment's load objects. The tab shows each aggregate data object with the total metrics attributed to it, followed by all of its elements in offset order. Each element, in turn, has its own metrics and an indicator of its size and location in 32-byte blocks.
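As an aid to reading element locations in terms of 32-byte blocks, the following sketch computes which blocks an element touches from its offset and size. This is one possible reading of the block indicator, offered as an assumption; the function name blocks_spanned is hypothetical and the Analyzer's own display may differ.

```python
BLOCK_SIZE = 32  # block size in bytes, as mentioned in the text above


def blocks_spanned(offset, size):
    """Return the indexes of the 32-byte blocks touched by an element.

    offset: byte offset of the element within its aggregate data object
    size:   size of the element in bytes
    """
    if size <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + size - 1) // BLOCK_SIZE
    return list(range(first, last + 1))
```

For example, a 16-byte element at offset 24 straddles a block boundary and touches blocks 0 and 1, while an 8-byte element at offset 32 lies entirely in block 1.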
As with the DataObjects tab, the DataLayout tab is visible by default if dataspace data is recorded in the experiment. The tab can also be shown by turning Data Space Display on in the Set Data Presentation dialog box, or by setting a datamode on command in one of the .er.rc files read when the Analyzer starts.
The Timeline tab shows a chart of the events and the sample points recorded by the Collector as a function of time. Data is displayed in horizontal bars. For each experiment there is a bar for sample data and a set of bars for each LWP. The set for an LWP consists of one bar for each data type recorded: clock-based profiling, hardware counter profiling, synchronization tracing, heap tracing and MPI tracing.
The bars that contain sample data show a color-coded representation of the time spent in each microstate for each sample. Samples are displayed as a period of time because the data in a sample point represents time spent between that point and the previous point. Clicking a sample displays the data for that sample in the Event tab.
The profiling data or tracing data bars show an event marker for each event recorded. The event markers consist of a color-coded representation of the call stack recorded with the event, as a stack of colored rectangles. Clicking a colored rectangle in an event marker selects the corresponding function and PC and displays the data for that event and that function in the Event tab. The selection is highlighted in both the Event tab and the Legend tab, and selecting the Source tab or Disassembly tab positions the tab display at the line corresponding to that frame in the call stack.
For some kinds of data, events may overlap and not be visible. Whenever two or more events would appear at exactly the same position, only one is drawn; if there are two or more events within one or two pixels, all are drawn, although they may not be visually distinguishable. In either case, a small gray tick mark appears below the drawn events indicating the overlap.
You can change the types of event-specific data shown in the Timeline tab, as well as the colors mapped to selected functions. For details about using the Timeline tab, refer to the online help.
The LeakList tab shows two lines, the upper one representing leaks, and the lower one representing allocations. Each contains a call stack, similar to that shown in the Timeline tab, in the center with a bar above proportional to the bytes leaked or allocated, and a bar below proportional to the number of leaks or allocations.
Selection of a leak or allocation displays the data for the selected leak or allocation in the Leak tab, and selects a frame in the call stack, just as it does in the Timeline tab.
The Statistics tab shows totals for various system statistics summed over the selected experiments and samples. The totals are followed by the statistics for the selected samples of each experiment. For information on the statistics presented, see the getrusage(3C) and proc(4) man pages.
The Experiments tab is divided into two panels. The top panel contains a tree that is divided into two areas: a Notes area and an Info area.
The Notes area displays the contents of any notesfile in the experiment. You can edit the notes by typing directly in the Notes area. The Notes area includes its own toolbar with buttons for saving or discarding the notes and for undoing or redoing any edits since the last save.
The Info area contains information about the experiments collected and the load objects accessed by the collection target, including any error messages or warning messages generated during the processing of the experiment or the load objects.
The bottom panel lists error and warning messages from the Analyzer session.
The right pane contains the Summary tab, the Event tab, and the Legend tab. By default the Summary tab is displayed. The other two tabs are dimmed unless the Timeline tab is selected.
The Summary tab shows all the recorded metrics for the selected function or load object, both as values and percentages, and information on the selected function or load object. The Summary tab is updated whenever a new function or load object is selected in any tab.
The Event tab shows detailed data for the event that is selected in the Timeline tab, including the event type, leaf function, LWP ID, thread ID, and CPU ID. Below the data panel the call stack is displayed with the color coding for each function in the stack. Clicking a function in the call stack makes it the selected function.
When a sample is selected in the Timeline tab, the Event tab shows the sample number, the start and end time of the sample, and the microstates with the amount of time spent in each microstate and the color coding.
The Legend tab shows a legend for the mapping of colors to functions and to microstates in the Timeline tab. You can change the color that is mapped to an item by selecting the item in the legend and selecting the color chooser from the Timeline menu, or by double-clicking the color box.
The Leak tab shows detailed data for the selected leak or allocation in the LeakList tab. Below the data panel, the Leak tab shows the call stack at the time when the selected leak or allocation was detected. Clicking a function in the call stack makes it the selected function.
You can control the presentation of data from the Set Data Presentation dialog box. To open this dialog box, click the Set Data Presentation button in the toolbar or choose Set Data Presentation from the View menu.
The Set Data Presentation dialog box has a tabbed pane with six tabs:
The Metrics tab shows all of the available metrics. Each metric has check boxes in one or more of the columns labeled Time, Value, and %, depending on the type of metric. Instead of setting individual metrics, you can set all metrics at once by selecting or deselecting the check boxes in the bottom row of the dialog box and then clicking the Apply to all metrics button.
The Sort tab shows the order of the metrics presented, and the choice of metric to sort by.
The Source/Disassembly tab presents a list of checkboxes that you can use to select the information presented, as follows:
The Formats tab presents a choice for the long form, short form, or mangled form of C++ function names and Java method names. In addition, a checkbox labelled Append SO name to Function name adds the name of the shared object, where the function or method is located, to the end of the function or method name. The Formats tab also presents a choice for Java Mode of On, Expert, or Off; and a choice for Data Space Display of either Enable or Disable.
The Timeline tab presents choices for the types of event-specific data that are shown; the display of event-specific data for threads, LWPs, or CPUs; the alignment of the call stack representation at the root or at the leaf; and the number of levels of the call stack that are displayed.
The Search Path tab allows you to manage a list of directories to be used for searching for source and object files. The special name $expts refers to the experiments loaded; all other names should be paths in the file system.
The Set Data Presentation dialog box has a Save button to store the current settings.
Note - Because the defaults for the Analyzer, the er_print utility, and the er_src utility are set by a common .er.rc file, saving changes in the Analyzer's Set Data Presentation dialog box also affects output from the er_print utility and the er_src utility.
The Analyzer has a Find tool in the toolbar, with two options for search targets that are given in a combo box. You can search for text in the Name column of the Functions tab or Callers-Callees tabs and in the code column of the Source tab and Disassembly tab. You can search for a high-metric item in the Source tab and Disassembly tab. The metric values on the lines containing high-metric items are highlighted in green. Use the arrow buttons next to the Find field to search up or down.
By default, all functions in each load object are shown in the Functions tab and Callers-Callees tab. You can hide all the functions in a load object using the Show/Hide Functions dialog box; see the online help for details.
When the functions in a load object are hidden, the Functions tab and Callers-Callees tab show a single entry representing the aggregate of all functions from the load object. Similarly, the Lines tab and PCs tab show a single entry aggregating all PCs from all functions from the load object.
In contrast to filtering, metrics corresponding to hidden functions are still represented in some form in all displays.
By default, data is shown in each tab for all experiments, all samples, all threads, all LWPs, and all CPUs. A subset of data can be selected using the Filter Data dialog box. For details about using the Filter Data dialog box, refer to the online help.
The Analyzer allows filtering by experiment when more than one experiment is loaded. The experiments can be loaded individually, or by naming an experiment group.
Samples are numbered from 1 to N, and you can select any set of samples. The selection consists of a comma-separated list of sample numbers or ranges such as 1-5.
Threads are numbered from 1 to N, and you can select any set of threads. The selection consists of a comma-separated list of thread numbers or ranges. Profile data for threads only covers that part of the run where the thread was actually scheduled on an LWP.
LWPs are numbered from 1 to N, and you can select any set of LWPs. The selection consists of a comma-separated list of LWP numbers or ranges. If synchronization data is recorded, the LWP reported is the LWP at entry to a synchronization event, which might be different from the LWP at exit from the synchronization event.
On Linux systems, threads and LWPs are synonymous.
Where CPU information is recorded (Solaris 9 OS), any set of CPUs can be selected. The selection consists of a comma-separated list of CPU numbers or ranges.
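The sample, thread, LWP, and CPU selections described above all share the same form: a comma-separated list of numbers or ranges. The following sketch expands such a selection into a list of numbers. It assumes that a range is written as two numbers joined by a hyphen (for example 1-5); the function name parse_selection is hypothetical and this is not the Analyzer's own parser.

```python
def parse_selection(text):
    """Expand a selection such as "1-5, 8, 10-11" into sorted numbers.

    Assumes ranges are written low-high with a hyphen and are inclusive.
    """
    selected = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and surrounding blanks
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"range start exceeds end: {part!r}")
            selected.update(range(low, high + 1))
        else:
            selected.add(int(part))
    return sorted(selected)
```

For example, parse_selection("1-5, 8, 10-11") returns [1, 2, 3, 4, 5, 8, 10, 11].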
When you invoke the Analyzer with a target name and target arguments, it starts up with the Performance Tools Collect window open, which allows you to record an experiment on the named target. If you invoke the Analyzer with no arguments, or with an experiment list, you can record a new experiment by choosing Collect Experiment from the File menu to open the Performance Tools Collect window.
The Collect Experiment tab of the Performance Tools Collect window has a panel you use to specify the target, its arguments, and the various parameters to be used to run the experiment. They correspond to the options available in the collect command, as described in Chapter 3.
Immediately below the panel is a Preview Command button, and a text field. When you click the button, the text field is filled in with the collect command that would be used when you click the Run button.
In the Data to Collect tab, you can select the types of data you want to collect.
The Input/Output tab has two panels: one that receives output from the Collector itself, and a second for output from the process.
A set of buttons allows the following operations:
If you close the window while an experiment is in progress, the experiment continues. If you reopen the window, it shows the experiment in progress, as if it had been left open during the run. If you attempt to exit the Analyzer while an experiment is in progress, a dialog box is posted asking whether you want the run terminated or allowed to continue.
In addition to analyzing the data, the Analyzer also provides a function-reordering capability. Based on the data in an experiment, the Analyzer can generate a mapfile which, when used with the static linker (ld) to relink the application, creates an executable with a smaller working set size, or better I-cache behavior, or both.
The order of the functions that is recorded in the mapfile and used to reorder the functions in the executable is determined by the metric that is used for sorting the function list. Exclusive User CPU time or Exclusive CPU Cycle time is normally used for producing a mapfile. Some metrics, such as those from synchronization delay or heap tracing, and sort keys such as name or address, do not produce a meaningful ordering for a mapfile.
The Analyzer processes directives from an .er.rc file in the current directory, if present; from a .er.rc file in your home directory, if present; and from a system-wide .er.rc file. These files can contain default settings for metrics, for sorting, and for specifying compiler commentary options and highlighting thresholds for source and disassembly output. They also specify default settings for the Timeline tab, and for name formatting, setting Java mode (javamode) and Data Space Display mode (datamode). The files can also contain directives to control the search path for source files and object files.
In the Analyzer GUI, you can save an .er.rc file by clicking the Save button in the Set Data Presentation dialog, which you can open from the View menu. Saving an .er.rc file from the Set Data Presentation dialog not only affects subsequent invocations of the Analyzer, but also the er_print utility and er_src utility.
The Analyzer puts a message into its Errors/Warning Logs areas naming the user .er.rc files it processed.
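The three .er.rc locations named above can be listed with a short script, which may help when checking which defaults files are present on a given system. This is a sketch only: the function name er_rc_candidates is hypothetical, the location of the system-wide file is not given in this chapter and must be supplied by the caller, and the sketch says nothing about which file's settings take precedence.

```python
import os


def er_rc_candidates(system_rc, cwd=None, home=None):
    """Return the .er.rc files that exist, in the order listed in the text.

    system_rc: path to the system-wide .er.rc file (supplied by the caller,
               because its location is not stated in this chapter)
    cwd, home: override the current and home directories, mainly for testing
    """
    cwd = os.getcwd() if cwd is None else cwd
    home = os.path.expanduser("~") if home is None else home
    candidates = [
        os.path.join(cwd, ".er.rc"),   # current directory
        os.path.join(home, ".er.rc"),  # home directory
        system_rc,                     # system-wide file
    ]
    return [path for path in candidates if os.path.isfile(path)]
```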