Chapter 4

The Performance Analyzer Tool

The Performance Analyzer is a graphical data-analysis tool that analyzes performance data collected by the Collector, which is started from the collect command, from the IDE, or from the collector commands in dbx. The Collector gathers performance information to create an experiment during the execution of a process, as described in Chapter 3. The Performance Analyzer reads in such experiments, analyzes the data, and presents it in tabular and graphical displays. A command-line version of the Analyzer is available as er_print, which is described in Chapter 5.


Starting the Performance Analyzer

To start the Performance Analyzer, type the following on the command line:

% analyzer [control-options] [experiment-list]

Alternatively, use the Explorer in the IDE to navigate to an experiment and open it. The experiment-list command argument is a blank-separated list of experiment names, experiment group names, or both.
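
For example, the following command opens a single experiment (the experiment name test.1.er is a placeholder):

% analyzer test.1.er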

Multiple experiments or experiment groups can be specified on the command line. If you specify an experiment that has descendant experiments inside it, all descendant experiments are automatically loaded, but the display of data for the descendant experiments is disabled. To load individual descendant experiments you must specify each experiment explicitly or create an experiment group. To create an experiment group, create a plain text file whose first line is as follows:

#analyzer experiment group

Then add the names of the experiments on subsequent lines. The file must have the extension .erg.
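
For example, a hypothetical experiment group file named mygroup.erg might contain the following lines (the experiment names are placeholders):

#analyzer experiment group
test.1.er
test.2.er

The group can then be loaded with a command such as:

% analyzer mygroup.erg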

You can also use the File menu in the Analyzer window to add experiments or experiment groups. To open experiments recorded on descendant processes, you must type the file name in the Open Experiment dialog box (or Add Experiment dialog box) because the file chooser does not permit you to open an experiment as a directory.

When the Analyzer displays multiple experiments, however they were loaded, data from all the experiments is aggregated.

You may preview an experiment or experiment group for loading by single-clicking on its name in either the Open Experiment dialog or the Add Experiment dialog.

You can also start the Performance Analyzer from the command line to record an experiment, as follows:

% analyzer [Java-options] [control-options] target [target-arguments]

The Analyzer starts up with the Collect Data panel showing the named target and its arguments, together with the settings for collecting an experiment. See Recording Experiments for details.
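
For example, the following command opens the Analyzer ready to record an experiment on a target program (the target name a.out and its argument are placeholders):

% analyzer a.out 20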

Analyzer Options

These options control the behavior of the Analyzer and are divided into three option groups:

Java Options

-j | --jdkhome jvm-path

Specify the path to the Java virtual machine (JVM) used to run the Analyzer. The default path is determined by first examining the environment variables for a path to Java, in the order JDK_1_4_HOME, JDK_HOME, and JAVA_PATH. If none of these is set, the default is the JVM installed with the release, if any; otherwise, the JVM found on the user's PATH is used. (The terms "Java virtual machine" and "JVM" mean a virtual machine for the Java platform.)

-J jvm-options

Specify the JVM options.

Control Options

-f|--fontsize size

Specify the font size to be used in the Analyzer GUI.

-v|--verbose

Print version information and Java runtime arguments before starting.
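
For example, the following command starts the Analyzer with a specific JVM, passes a maximum-heap option to that JVM, uses a larger font, and prints version information and Java runtime arguments before starting (the JVM path and experiment name are placeholders):

% analyzer -j /usr/java/j2sdk1.4.2 -J-Xmx512m -f 14 -v test.1.er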

Information Options

These options do not invoke the Performance Analyzer GUI, but print information about the Analyzer to standard output. Each of these options is stand-alone; it cannot be combined with any other analyzer option, or with a target or experiment-list argument.

-V|--version

Print version information and exit.

-?|--h|--help

Print usage information and exit.


Performance Analyzer GUI

The Analyzer window has a menu bar, a tool bar, and a split pane that contains tabs for the various data displays.

The Menu Bar

The menu bar contains a File menu, a View menu, a Timeline menu, and a Help menu.

The File menu is for opening, adding, and dropping experiments and experiment groups. The File menu also allows you to collect data for an experiment using the Performance Analyzer GUI; for details, refer to Recording Experiments. From the File menu you can also create a mapfile, which is used to optimize the size of an executable or to optimize its effective cache behavior. For more details on mapfiles, refer to Generating Mapfiles and Function Reordering.

The View menu allows you to configure how experiment data is displayed.

The Timeline menu, as its name suggests, helps you to navigate the timeline display, described in the Analyzer Data Displays section below.

The Help menu provides online help for the Performance Analyzer, a summary of new features, quick-reference and shortcut sections, and a troubleshooting section.

Toolbar

The toolbar provides sets of icons as menu shortcuts, and includes a Find function to help you locate functions in the data displays described below. For more details about the Find function, refer to Finding Text and Data.

Analyzer Data Displays

The Performance Analyzer uses a split window to divide the data presentation into two panes. Each pane is tabbed to allow you to select different data displays for the same experiment or experiment group.

Data Display, Left Pane

The left pane contains tabs for the principal Analyzer displays, each of which is described in the following subsections.

By default, the Functions tab is displayed. If the Analyzer is invoked without a target, you will be prompted for an experiment to open. Once the experiment is opened, the Experiments tab is displayed.

If dataspace profiling data is recorded in the experiment being read, the DataLayout and DataObjects tabs are also shown.

The Functions Tab

The Functions tab shows a list consisting of functions and their metrics. The metrics are derived from the data collected in the experiment. Metrics can be either exclusive or inclusive. Exclusive metrics represent usage within the function itself. Inclusive metrics represent usage within the function and all the functions it called.

The list of available metrics for each kind of data collected is given in the collect(1) man page. Only the functions that have non-zero metrics are listed.

Time metrics are shown in seconds, presented to millisecond precision. Percentages are shown to a precision of 0.1%. If a metric value is precisely zero, its time and percentage are shown as "0". If the value is not exactly zero but is smaller than the precision, its value is shown as "0.000" and its percentage as "0.0". Count metrics are shown as an integer count.

The metrics initially shown are based on the data collected and on the default settings read from various .rc files. When the Performance Analyzer is initially installed, the defaults are as follows:

If more than one type of data has been collected, the default metrics for each type are shown.

The metrics that are shown can be changed or reorganized; see the online help for details.

To search for a function, use the Find tool in the toolbar. For further details about the Find tool, refer to Finding Text and Data.

The Callers-Callees Tab

The Callers-Callees tab shows the selected function in a pane in the center, with callers of that function in a pane above, and callees of that function in a pane below.

In addition to showing exclusive and inclusive metric values for each function, the tab also shows attributed metrics. For the selected function, the attributed metric represents the exclusive metric for that function. For the callees, the attributed metric represents the portion of the callee's inclusive metric that is attributable to calls from the selected function. The sum of the attributed metrics for the callees and the selected function adds up to the inclusive metric for the selected function.

For the callers, the attributed metrics represent the portion of the selected function's inclusive metric that is attributable to calls from the callers. The sum of the attributed metrics for all callers should also add up to the inclusive metric for the selected function.
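
As an illustration with made-up numbers: if the selected function has an inclusive metric of 10 seconds and an exclusive metric of 4 seconds, and it calls two functions, the attributed metrics for those callees (for example, 3.5 seconds and 2.5 seconds) plus the 4 seconds attributed to the selected function itself add up to the 10-second inclusive metric. Similarly, if the selected function is called from two places, the attributed metrics for those callers (for example, 6 seconds and 4 seconds) also add up to 10 seconds.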

The metrics shown in the Callers-Callees tab can be changed or reorganized; see the online help for details.

Clicking once on a function in the caller or callee pane will select that function, causing the window contents to be redrawn so that the selected function appears in the center pane.

The Source Tab

If available, the Source tab shows the file containing the source code of the selected function, annotated with performance metrics for each source line. The full names of the source file, the corresponding object file and the load object are given in the column heading for the source code. In the rare case where the same source file is used to compile more than one object file, the Source tab shows the performance data for the object file containing the selected function.

The Analyzer looks for the file containing the selected function under the absolute pathname as recorded in the executable. If it is not there, it tries to find a file of the same basename in the current working directory. If you have moved the sources, or the experiment was recorded in a different file system, you can put a symbolic link from the current directory to the real source location in order to see the annotated source.
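
For example, if the sources have moved to /home/me/src (a hypothetical path), creating a link such as the following in the current working directory lets the Analyzer find the annotated source for the hypothetical file module.c:

% ln -s /home/me/src/module.c module.c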

The source code is interleaved with any compiler commentary that has been selected for display. The classes of commentary shown can be set in the Set Data Presentation dialog box. The default classes can be set in a defaults file.

The metrics displayed in the Source tab can be changed or reorganized; see the online help for details.

Lines with metrics that equal or exceed a threshold percentage of the maximum value of that metric for any line in the source file are highlighted, to make it easier to find the important lines. The threshold can be set in the Set Data Presentation dialog box, and the default threshold can be set in a defaults file. Tick marks are shown next to the scrollbar, corresponding to the positions of over-threshold lines within the source file. For example, if there were two over-threshold lines near the end of the source file, two tick marks would be shown next to the scrollbar near the bottom of the source window. Positioning the scrollbar next to a tick mark scrolls the source window so that the corresponding over-threshold line is displayed.

The Lines Tab

The Lines tab shows a list consisting of source lines and their metrics. Source lines are labeled with the function from which they came, the line number, and the source file name. If no line-number information is available for a function, or the source file for the function is not known, all of the function's PCs appear aggregated into a single entry for the function in the lines display. PCs from functions in load objects whose functions are hidden appear aggregated as a single entry for the load object in the lines display. Selecting a line in the Lines tab shows all the metrics for that line in the Summary tab. Selecting the Source or Disassembly tab after selecting a line from the Lines tab positions the display at the appropriate line.

The Disassembly Tab

The Disassembly tab shows a disassembly listing of the object file containing the selected function, annotated with performance metrics for each instruction.

Interleaved within the disassembly listing is the source code, if available, and any compiler commentary chosen for display. The algorithm for finding the source file in the Disassembly tab is the same as the algorithm used in the Source tab. The classes of commentary shown can be set in the Set Data Presentation dialog box. The default classes can be set in a defaults file.

The Analyzer highlights lines with metrics that equal or exceed a metric-specific threshold, to make it easier to find the important lines. The threshold can be set in the Set Data Presentation dialog box, and the default threshold can be set in a defaults file. As with the Source tab, tick marks are shown next to the scrollbar, corresponding to the positions of over-threshold lines within the disassembly code.

The PCs Tab

The PCs tab shows a list consisting of PCs and their metrics. PCs are labeled with the function from which they came and the offset within that function. PCs from functions in load objects whose functions are hidden appear aggregated as a single entry for the load object in the PCs display. Selecting a line in the PCs tab shows all the metrics for that PC in the Summary tab. Selecting the Source or Disassembly tab after selecting a line from the PCs tab positions the display at the appropriate line.

The DataObjects Tab

The DataObjects tab shows the list of data objects with their metrics. The tab is visible by default if dataspace data is recorded in the experiment. The tab can also be shown by setting Data Space Display to on in the Set Data Presentation dialog box, or by adding a "datamode on" directive to one of the .er.rc files read when the Analyzer starts. The tab is applicable only to hardware counter experiments for which the aggressive backtracking option was enabled, and to source files that were compiled with the -xhwcprof option of the C compiler.

When enabled, it shows hardware-counter memory operation metrics against the various data structures and variables in the program.
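
As a minimal, hedged sketch (the file names and counter name are placeholders, and the exact collect option that requests aggressive backtracking depends on your release; see Chapter 3 and the collect(1) man page for the form supported by your tools), the compile step and a hardware counter collection might look like this:

% cc -g -O -xhwcprof -c myprog.c
% cc -o myprog myprog.o
% collect -h counter-name,on ./myprog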

The DataLayout Tab

The DataLayout tab shows the annotated data object layouts for all program data objects with data-derived metric data. The layouts appear in the order in which they are defined in the experiment's load objects. The tab shows each aggregate data object with the total metrics attributed to it, followed by all of its elements in offset order. Each element, in turn, has its own metrics and an indicator of its size and location in 32-byte blocks.

As with the DataObjects tab, the DataLayout tab is visible by default if dataspace data is recorded in the experiment. The tab can also be shown by setting Data Space Display to on in the Set Data Presentation dialog box, or by adding a "datamode on" directive to one of the .er.rc files read when the Analyzer starts.

The Timeline Tab

The Timeline tab shows a chart of the events and the sample points recorded by the Collector as a function of time. Data is displayed in horizontal bars. For each experiment there is a bar for sample data and a set of bars for each LWP. The set for an LWP consists of one bar for each data type recorded: clock-based profiling, hardware counter profiling, synchronization tracing, heap tracing and MPI tracing.

The bars that contain sample data show a color-coded representation of the time spent in each microstate for each sample. Samples are displayed as a period of time because the data in a sample point represents time spent between that point and the previous point. Clicking a sample displays the data for that sample in the Event tab.

The profiling data or tracing data bars show an event marker for each event recorded. The event markers consist of a color-coded representation of the call stack recorded with the event, as a stack of colored rectangles. Clicking a colored rectangle in an event marker selects the corresponding function and PC and displays the data for that event and that function in the Event tab. The selection is highlighted in both the Event tab and the Legend tab, and selecting the Source or Disassembly tab positions the tab display at the line corresponding to that frame in the call stack.

For some kinds of data, events may overlap and not be visible. Whenever there are two or more events at exactly the same position, only one is drawn; if there are two or more events within one or two pixels, all are drawn, although they may not be visually distinguishable. In either case, a small gray tick mark below the drawn events indicates the overlap.

The types of event-specific data shown in the Timeline tab can be changed, as can the colors mapped to selected functions. For details about using the Timeline tab, refer to the online help.

The LeakList Tab

The LeakList tab shows two lines, the upper one representing leaks and the lower one representing allocations. Each leak or allocation contains a call stack, similar to those shown in the Timeline tab, in the center, with a bar above proportional to the bytes leaked or allocated and a bar below proportional to the number of leaks or allocations.

Selection of a leak or allocation displays the data for the selected leak or allocation in the Leak tab, and selects a frame in the call stack, just as it does in the Timeline tab.

The Statistics Tab

The Statistics tab shows totals for various system statistics summed over the selected experiments and samples. The totals are followed by the statistics for the selected samples of each experiment. For information on the statistics presented, see the getrusage(3C) and proc(4) man pages.

The Experiments Tab

The Experiments tab is divided into two panels. The top panel contains a tree that shows information on the experiments collected and on the load objects accessed by the collection target. The information includes any error messages or warning messages generated during the processing of the experiment or the load objects. The bottom panel lists error and warning messages from the analyzer session.

Data Display, Right Pane

The right pane contains the Summary tab, the Event tab, and the Legend tab. By default the Summary tab is displayed. The other two tabs are dimmed unless the Timeline tab is selected.

The Summary Tab

The Summary tab shows all the recorded metrics for the selected function or load object, both as values and percentages, and information on the selected function or load object. The Summary tab is updated whenever a new function or load object is selected in any tab.

The Event Tab

The Event tab shows detailed data for the event that is selected in the Timeline tab, including the event type, leaf function, LWP, thread and CPU IDs. Below the data panel the call stack is displayed with the color coding for each function in the stack. Clicking a function in the call stack makes it the selected function.

When a sample is selected in the Timeline tab, the Event tab shows the sample number, the start and end time of the sample, and the microstates with the amount of time spent in each microstate and the color coding.

This tab is only available when the Timeline tab is selected.

The Legend Tab

The Legend tab shows a legend for the mapping of colors to functions and to microstates in the Timeline tab. This tab is only available when the Timeline tab is selected in the left pane. You can change the color that is mapped to an item by selecting the item in the legend and selecting the color chooser from the Timeline menu, or by double-clicking the color box.

The Leak Tab

The Leak tab shows detailed data for the leak or allocation selected in the LeakList tab. Below the data panel, the Leak tab shows the call stack at the time when the selected leak or allocation was detected. Clicking a function in the call stack makes it the selected function.

Setting Data Presentation Options

You can control the presentation of data from the Set Data Presentation dialog box. To open this dialog box, click on the Set Data Presentation button in the toolbar or choose Set Data Presentation from the View menu.

The Set Data Presentation dialog box has a tabbed pane with six tabs:

The Metrics tab shows all of the available metrics. Each metric has check boxes in one or more of the columns labeled Time, Value, and %, depending on the type of metric.

The Sort tab shows the order of the metrics presented, and the choice of metric to sort by.

The Source/Disassembly tab presents a list of check boxes that you can use to select the information presented in the Source and Disassembly tabs.

The Formats tab presents a choice for the long form or the short form of C++ function names and Java method names. The tab also presents a choice for Java Mode, one of "on", "expert", or "off", and a choice for Data Space Display of either "Enable" or "Disable".

The Timeline tab presents choices for the types of event-specific data that are shown, the display of event-specific data for threads, LWPs or CPUs, the alignment of the call stack representation at the root or at the leaf, and the number of levels of the call stack that are displayed.

The Search Path tab allows the user to manage a list of directories to be used for searching for source and object files. The special name "$expts" refers to the experiments loaded; all other names should be paths in the file system.

The Set Data Presentation dialog box has a Save button to store the current settings.


Finding Text and Data

The analyzer has a Find tool in the toolbar, with two options for search targets that are given in a combo box. You can search for text in the Name column of the Functions or Callers-Callees tabs and in the code column of the Source and Disassembly tabs. You can search for a high-metric item in the Source and Disassembly tabs. The metric values on the lines containing high-metric items are highlighted in green. Use the arrow buttons next to the Find field to search up or down.


Showing or Hiding Functions

By default, all functions in each load object are shown in the Functions and Callers-Callees tabs. You can hide all the functions in a load object using the Show/Hide Functions dialog box; see the online help for details.

When the functions in a load object are hidden, the Functions and Callers-Callees tabs show a single entry representing the aggregate of all functions from the load object. Similarly, the Lines and PCs tabs show a single entry aggregating all PCs from all functions from the load object.

In contrast to filtering, metrics corresponding to hidden functions are still represented in some form in all displays.


Filtering Data

By default, data is shown in each tab for all experiments, all samples, all threads, all LWPs, and all CPUs. A subset of data can be selected using the Filter Data dialog box. For details about using the Filter Data dialog box, refer to the online help.

Experiment Selection

The Analyzer allows filtering by experiment when more than one experiment is loaded. The experiments can be loaded individually, or by naming an experiment group.

Sample Selection

Samples are numbered from 1 to N, and any set of samples can be selected. The selection consists of a comma-separated list of sample numbers or ranges, such as 1-5.

Thread Selection

Threads are numbered from 1 to N, and any set of threads can be selected. The selection consists of a comma-separated list of thread numbers or ranges. Profile data for threads only covers that part of the run where the thread was actually scheduled on an LWP.

LWP Selection

LWPs are numbered from 1 to N, and any set of LWPs can be selected. The selection consists of a comma-separated list of LWP numbers or ranges. If synchronization data is recorded, the LWP reported is the LWP at entry to a synchronization event, which might be different from the LWP at exit from the synchronization event.

On Linux systems, threads and LWPs are synonymous.

CPU Selection

Where CPU information is recorded (Solaris™ 9), any set of CPUs can be selected. The selection consists of a comma-separated list of CPU numbers or ranges.


Recording Experiments

When the Analyzer is invoked with a target name and target arguments, it starts with the Collect Experiment dialog box posted, which allows you to record an experiment on the named target. If the Analyzer is invoked with no arguments, or with an experiment list, you can record a new experiment by opening the Collect Experiment dialog box from the File menu.

The Collect Experiment dialog box has an upper panel used to specify the target, its arguments, and the various parameters to be used to run the experiment. These correspond to the options available in the collect command, as described in Chapter 3.

Immediately below the panel are a Preview Command button and a text field. When the button is clicked, the text field is filled in with the collect command that would be used when the Run button is clicked.

There are two output panels: one that receives output from the Collector itself, and a second that receives output from the process.

A set of buttons allows you to perform operations on the run, such as starting it or terminating it.

If the panel is closed while an experiment is in progress, the experiment continues. If the panel is reopened, it shows the experiment in progress, as if it had been left open during the run. If you attempt to exit the Analyzer while an experiment is in progress, a dialog box is posted asking whether you want the run terminated or allowed to continue.


Generating Mapfiles and Function Reordering

In addition to analyzing the data, the Analyzer also provides a function-reordering capability. Based on the data in an experiment, the Analyzer can generate a mapfile which, when used with the static linker (ld) to relink the application, creates an executable with a smaller working set size, or better I-cache behavior, or both.

The order of the functions recorded in the mapfile, and used to reorder the functions in the executable, is determined by the metric used for sorting the function list. Exclusive User CPU time or Exclusive CPU Cycle time is normally used for producing a mapfile. Some metrics, such as those from synchronization delay or heap tracing, and sorting by name or address, do not produce a meaningful ordering for a mapfile.
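
As a hedged sketch (file names are placeholders, and link-editor options may vary with your tool chain), the functions to be reordered are typically compiled with the Sun compilers' -xF option so that each function is placed in its own section, and the executable is then relinked with the generated mapfile passed to the static linker:

% cc -O -xF -c myprog.c
% cc -o myprog myprog.o -Wl,-M,mapfile.prog

Here mapfile.prog is the mapfile written by the Analyzer, and -Wl,-M passes the -M mapfile option through the compiler driver to ld.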


Defaults

The Analyzer processes directives from a .er.rc file in the current directory, if present; from a .er.rc file in the user's home directory, if present; and from a system-wide .er.rc file. These files can contain default settings for metrics, sorting, compiler commentary options, and highlighting thresholds for source and disassembly output. They also specify default settings for the Timeline tab, for name formatting, for Java™ mode (javamode), and for Data Space Display mode (datamode). They can also contain directives to control the search path for source and object files.
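
As an illustrative, hedged example, a .er.rc file might contain lines such as the following; javamode and datamode are described above, while the other directive names are assumptions that should be checked against the er_print(1) man page for your release:

dmetrics e.user:i.user
dsort e.user
scc all
sthresh 80
name long
javamode on
datamode on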