Changes to Performance Analyzer Tool
This section describes the new and changed features in this release of the Solaris Studio Performance Analyzer and related tools. For details, see the Oracle Solaris Studio 12.2: Performance Analyzer manual.
The experiment format has been extended, but the version number is currently unchanged (10.1).
The tools can read experiments created with the FCS version of Oracle Solaris Studio 12.2, as well as with the FCS and patched versions of Studio 12 Update 1, and Studio 12.
Experiments created with a version earlier than Sun Studio 12 cannot be read with Oracle Solaris Studio 12.2 tools.
The Performance Analyzer tool features the following enhancements.
The new Call Tree tab displays a dynamic call graph of the program as a tree with each function call shown as a node that you can expand and collapse. An expanded function node shows all the function calls made by the function, plus performance metrics for those function calls. When you select a node, the Summary tab on the right displays metrics for the function call and its callees. The percentages given for attributed metrics are the percentages of the total program metrics.
To find the branch that is consuming the most time, right-click any node and select Expand Hottest Branch.
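The tree structure and the Expand Hottest Branch action described above can be modeled with a minimal sketch. This is not the Analyzer's implementation: the `CallNode` class, the helper functions, and the sample data are all hypothetical, and the sketch assumes that each sample is a call stack (listed root first) with a single metric value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class CallNode:
    """One function call in the tree; `inclusive` covers the call and its callees."""
    name: str
    inclusive: float = 0.0
    children: Dict[str, "CallNode"] = field(default_factory=dict)


def build_call_tree(samples: Sequence[Tuple[Sequence[str], float]]) -> CallNode:
    """Merge (call stack, metric) samples into a tree; stacks are listed root first."""
    root = CallNode("<Total>")
    for stack, value in samples:
        root.inclusive += value
        node = root
        for func in stack:
            node = node.children.setdefault(func, CallNode(func))
            node.inclusive += value
    return root


def hottest_branch(node: CallNode) -> List[str]:
    """Follow the child with the largest inclusive metric until a leaf is reached."""
    path = [node.name]
    while node.children:
        node = max(node.children.values(), key=lambda child: child.inclusive)
        path.append(node.name)
    return path


def percent_of_total(node: CallNode, root: CallNode) -> float:
    """Attributed metric expressed as a percentage of the total program metric."""
    return 100.0 * node.inclusive / root.inclusive if root.inclusive else 0.0


# Hypothetical samples: (call stack, seconds of CPU time)
samples = [
    (["main", "parse"], 1.0),
    (["main", "compute", "kernel"], 6.0),
    (["main", "compute", "reduce"], 2.0),
    (["main"], 1.0),
]
tree = build_call_tree(samples)
compute = tree.children["main"].children["compute"]
print(hottest_branch(tree))             # ['<Total>', 'main', 'compute', 'kernel']
print(percent_of_total(compute, tree))  # 80.0
```

In this model, expanding a node corresponds to listing its `children`, and the percentage shown for a node is its inclusive metric relative to the root total.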
You can construct a call stack fragment in the center Stack Fragment panel, one call at a time, by adding callers and callees to the call stack. Callers are functions that call the fragment; callees are functions called from that fragment. Features include:
As you add and remove functions in the stack fragment, the metrics are computed for the entire fragment and displayed next to the last function in the fragment.
You can right-click on a caller to add a function to the top of the stack fragment, and right-click on a callee to add a function at the bottom. You can also use buttons above the Stack Fragment panel to manipulate the call stack fragment.
You can use the Back and Forward buttons located above the Stack Fragment panel to go through the history of your changes to the call stack fragment.
You can filter data in the Callers-Callees tab from the context (right-click) menu.
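One way to picture how metrics follow the stack fragment is the sketch below. It is an illustrative model only, not the Analyzer's algorithm: the function names and sample data are hypothetical, and it assumes a fragment matches a sample when its functions appear contiguously in that sample's call stack (only the first match per stack is counted, so recursion is not handled).

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Sample = Tuple[Sequence[str], float]  # (call stack listed caller first, metric value)


def find_fragment(stack: Sequence[str], fragment: Sequence[str]) -> int:
    """Return the index where `fragment` occurs contiguously in `stack`, or -1."""
    n = len(fragment)
    for i in range(len(stack) - n + 1):
        if list(stack[i:i + n]) == list(fragment):
            return i
    return -1


def fragment_view(
    samples: Sequence[Sample], fragment: Sequence[str]
) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Return (metric for the whole fragment, callers, callees)."""
    total = 0.0
    callers: Dict[str, float] = defaultdict(float)
    callees: Dict[str, float] = defaultdict(float)
    for stack, value in samples:
        start = find_fragment(stack, fragment)
        if start < 0:
            continue
        total += value
        if start > 0:
            callers[stack[start - 1]] += value  # function that calls the fragment
        end = start + len(fragment)
        if end < len(stack):
            callees[stack[end]] += value        # function called from the fragment
    return total, dict(callers), dict(callees)


# Hypothetical samples: (call stack, seconds of CPU time)
samples = [
    (["main", "compute", "kernel"], 6.0),
    (["main", "compute", "reduce"], 2.0),
    (["main", "io", "reduce"], 3.0),
]

fragment = ["compute"]
print(fragment_view(samples, fragment))
# (8.0, {'main': 8.0}, {'kernel': 6.0, 'reduce': 2.0})

fragment = fragment + ["reduce"]  # add a callee at the bottom of the fragment
print(fragment_view(samples, fragment))
# (2.0, {'main': 2.0}, {})
```

Adding a caller would prepend to the list in the same way, and removing the first or last element undoes the corresponding step.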
Performance Analyzer now enables you to compare experiments that have been collected on the same executable. This feature is only partially implemented and might change in a subsequent release. In the current release, comparing experiments works as follows:
If you open two or more experiments or experiment-groups, the data is aggregated by default.
If you add the line compare on to your .er.rc file and open two or more experiments or experiment-groups in Performance Analyzer, the data is shown in comparison mode.
In comparison mode, the data from the experiments or groups is shown in adjacent columns with an additional header line that shows the experiment or group name. The columns are shaded in color to distinguish the experiments or groups.
The tabs that support comparing experiments are Functions, Callers-Callees, Source, Disassembly, Lines, and PCs. You can disable and enable comparison mode from a context menu in any of these tabs.
You can also enable and disable comparison mode in Analyzer's Set Data Presentation dialog using the Compare Experiments option in the Formats tab.
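The difference between the default aggregated view and comparison mode can be sketched as follows. This is a simplified model, not the tool's code: the experiment names `test.1.er` and `test.2.er`, the function names, and the metric values are hypothetical, and each experiment is reduced to a single per-function metric.

```python
from typing import Dict, List

Metrics = Dict[str, float]  # function name -> metric value


def aggregate(experiments: Dict[str, Metrics]) -> Metrics:
    """Default behavior: sum each function's metric across all experiments."""
    totals: Metrics = {}
    for metrics in experiments.values():
        for func, value in metrics.items():
            totals[func] = totals.get(func, 0.0) + value
    return totals


def compare(experiments: Dict[str, Metrics]) -> List[List[str]]:
    """Comparison mode: one column per experiment, in load order, under a name header."""
    names = list(experiments)  # dicts preserve insertion (load) order
    funcs = sorted({func for metrics in experiments.values() for func in metrics})
    rows = [["Function"] + names]
    for func in funcs:
        rows.append([func] + [f"{experiments[n].get(func, 0.0):.2f}" for n in names])
    return rows


# Hypothetical experiments collected on the same executable
experiments = {
    "test.1.er": {"main": 1.0, "compute": 8.0},
    "test.2.er": {"main": 1.5, "compute": 5.0, "io": 0.5},
}

print(aggregate(experiments))
# {'main': 2.5, 'compute': 13.0, 'io': 0.5}

for row in compare(experiments):
    print("  ".join(cell.ljust(10) for cell in row))
```

A function missing from one experiment is shown here with a zero metric in that experiment's column, which is a choice made for the sketch rather than documented behavior.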
Highlighting in the Source tab shows hot (highest CPU usage) lines in orange, and shows non-zero metric lines in yellow.
A context menu in the Source tab allows navigation to the next or previous hot line, or non-zero metric line.
You can create JPG files of the Timeline, the MPI Timeline, and the MPI Charts through the Print menu.
Source and Disassembly of HotSpot-compiled code exploit better mappings, where recorded.
The er_print command is changed in this release as follows:
New commands for controlling the Callers-Callees list now support call stack-building. The new er_print subcommands cprepend, cappend, crmfirst, and crmlast add or remove functions from the call stack fragment you are building. After each command, the caller-callee data for the current fragment is written.
A new calltree command prints the dynamic call graph of the target showing the hierarchical metrics for all functions.
A new describe command describes the recorded data from the experiments, and prints the tokens available for filtering.
Source and Disassembly of HotSpot-compiled code exploit better mappings, where recorded.
The er_print command now enables you to compare experiments that have been collected on the same executable. This feature is only partially implemented and might change in a subsequent release. In the current release, comparing experiments works as follows:
When er_print is invoked on two or more experiments or experiment-groups, the data is aggregated by default.
If you add the line compare on to your .er.rc file and run er_print on two or more experiments or experiment-groups, the data is shown in comparison mode.
In comparison mode, the data from the experiments or groups is shown in adjacent columns on the Functions list, the Callers-Callees list, and the Source and Disassembly lists. The columns are shown in the order in which the experiments or groups were loaded, with an additional header line giving the experiment or group name. Comparison mode is enabled and disabled with the compare command.
The collect command is changed in this release as follows:
The default setting for following descendants has been changed to -F on.
MPI experiments with any release of Sun HPC ClusterTools, now known as Oracle Message Passing Toolkit, can be specified by -M OMPT or -M CT.
MPI experiments now also follow descendant processes by default.
Handling of post-processing for MPI tracing experiments is improved.
Support is added for hardware counter profiling on Oracle Enterprise Linux.
Hardware counter aliases have been improved, and support is added for hardware counter profiling on the following processors:
SPARC64 VI and VII
Intel Core i7: Family 6, Models 30, 31, 37, 44, and 46 (including Nehalem EP and EX)
AMD Family 10h and 11h
Experimental support for profiling scripts has been implemented, and may change in a subsequent release. To profile a script, set the environment variable SP_COLLECTOR_SKIP_CHECKEXEC and pass the script name to collect.
Java profiling has been enhanced to provide more detailed source-line mappings for HotSpot-compiled code. The enhancement is supported for JDK 1.6u20 or later JDK 1.6 updates, and for JDK 1.7.0-ea-b85 or later JDK 1.7 updates.
The default size limit for experiments has been removed. You can use the -L option to set a size limit.
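For the experimental script-profiling support listed above, a wrapper might prepare the invocation as in the following sketch. The release notes name the environment variable but not the value it expects, so setting it to "1" is an assumption; the helper function and the script name `run_tests.sh` are hypothetical. The sketch only builds the argument list and environment, and the actual launch is shown in a comment because it requires an installed collect command.

```python
import os
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def collect_script_invocation(
    script: str,
    script_args: Sequence[str] = (),
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build (argv, env) for profiling a script with collect."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the variable only needs to be set; "1" is an arbitrary value.
    env["SP_COLLECTOR_SKIP_CHECKEXEC"] = "1"
    # The script name is passed to collect in place of an executable.
    argv = ["collect", script, *script_args]
    return argv, env


argv, env = collect_script_invocation("./run_tests.sh", ["--quick"], base_env={})
print(argv)  # ['collect', './run_tests.sh', '--quick']
print(env)   # {'SP_COLLECTOR_SKIP_CHECKEXEC': '1'}

# Where the tools are installed, the command could then be launched with:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```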
The collector subcommand for the dbx debugger is changed as follows:
Hardware counter aliases have been improved, and support is added for hardware counter profiling on the following processors:
SPARC64 VI and VII
Intel Core i7: Family 6, Models 30, 31, 37, 44, and 46 (including Nehalem EP and EX)
AMD Family 10h and 11h
The default size limit for experiments has been removed. The collector limit command can be used to set a size limit.
The command for profiling a Solaris kernel is changed so that er_kernel does the following when any of the signals SIGINT, SIGTERM, or SIGQUIT is sent to the process:
Catch SIGINT, SIGTERM or SIGQUIT
Terminate the experiment
Run er_archive if -A off is not specified
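The three steps above follow a common shutdown pattern for long-running collectors, sketched below for a POSIX system. This is a generic illustration, not the er_kernel source: the `ExperimentSession` class and its log messages are hypothetical, and the archive step is represented by a log entry rather than a real er_archive run.

```python
import os
import signal
import time
from typing import List


class ExperimentSession:
    """Hypothetical stand-in for a running data-collection session."""

    def __init__(self, archive: bool = True) -> None:
        self.archive = archive  # False models the "-A off" case
        self.running = True
        self.log: List[str] = []

    def install_handlers(self) -> None:
        """Route SIGINT, SIGTERM, and SIGQUIT to the same orderly shutdown."""
        for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGQUIT):
            signal.signal(sig, self._on_signal)

    def _on_signal(self, signum, frame) -> None:
        self.log.append(f"caught {signal.Signals(signum).name}")
        self.stop()

    def stop(self) -> None:
        """Terminate the experiment once, then archive unless disabled."""
        if not self.running:
            return
        self.running = False
        self.log.append("experiment terminated")
        if self.archive:
            self.log.append("archive step run")


session = ExperimentSession(archive=True)
session.install_handlers()
os.kill(os.getpid(), signal.SIGTERM)  # simulate an external termination request

# Give the interpreter a moment to run the Python-level handler.
deadline = time.monotonic() + 2.0
while session.running and time.monotonic() < deadline:
    time.sleep(0.01)
print(session.log)
```

The guard at the top of `stop` keeps a second signal from repeating the shutdown work.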
The er_generic command generates an experiment from text files containing profile information. The simulated experiment can then be examined using the Performance Analyzer or er_print command. See the er_generic(1) man page for more information.
By default, the en_desc command now reads all descendants.