Sun Studio 12 Update 1: Performance Analyzer

Limitations on Data Collection

This section describes the limitations on data collection that are imposed by the hardware, the operating system, the way you run your program, or by the Collector itself.

There are no limitations on simultaneous collection of different data types: you can collect any data type with any other data type, with the exception of count data.

The Collector can support up to 16K user threads. Data from additional threads is discarded, and a collector error is generated. To support more threads, set the SP_COLLECTOR_NUMTHREADS environment variable to a larger number.

By default, the Collector collects stacks that are up to 256 frames deep. To support deeper stacks, set the SP_COLLECTOR_STACKBUFSZ environment variable to a larger number.
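As a rough sketch of how these two limits might be raised from a launcher script, the following Python helper builds an environment for a profiled run. The variable names come from the text above. The helper name collector_env and the values 32768 and 512 are illustrative assumptions, and this section does not state the units or valid ranges of either variable.

```python
import os


def collector_env(num_threads=None, stack_buf=None, base=None):
    """Return a copy of an environment with the Collector limits overridden.

    num_threads and stack_buf are written as decimal strings; a value of
    None leaves the corresponding variable untouched.
    """
    env = dict(os.environ if base is None else base)
    if num_threads is not None:
        env["SP_COLLECTOR_NUMTHREADS"] = str(num_threads)
    if stack_buf is not None:
        env["SP_COLLECTOR_STACKBUFSZ"] = str(stack_buf)
    return env


# Hypothetical values, chosen only to be larger than the documented defaults.
env = collector_env(num_threads=32768, stack_buf=512, base={})
print(sorted(env))
```

The resulting dictionary could then be passed as the env argument of subprocess.run when launching the collect command on your target.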

Limitations on Clock-Based Profiling

The minimum value of the profiling interval and the resolution of the clock used for profiling depend on the particular operating environment. The maximum value is set to 1 second. The value of the profiling interval is rounded down to the nearest multiple of the clock resolution. To find the minimum and maximum values and the clock resolution, type the collect command with no arguments.
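The rounding rule can be illustrated with a short Python sketch. It works in whole microseconds to avoid floating-point error. The function name, the decision to reject out-of-range requests with an exception, and the resolution and minimum used in the example are assumptions made for illustration. The real values for a given system are the ones reported by the collect command.

```python
def effective_interval_us(requested_us, resolution_us, minimum_us,
                          maximum_us=1_000_000):
    """Round a requested profiling interval down to a multiple of the
    clock resolution, after checking it against the allowed range.

    All arguments are integers in microseconds. The 1-second maximum is
    taken from the text; the other limits are platform dependent.
    """
    if resolution_us <= 0:
        raise ValueError("clock resolution must be positive")
    if not minimum_us <= requested_us <= maximum_us:
        raise ValueError("requested interval is outside the allowed range")
    return (requested_us // resolution_us) * resolution_us


# Hypothetical platform: 1000 us clock resolution, 500 us minimum interval.
print(effective_interval_us(10_500, 1_000, 500))  # 10000
```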

Runtime Distortion and Dilation with Clock-profiling

Clock-based profiling records data when a SIGPROF signal is delivered to the target. Processing that signal and unwinding the call stack dilates the run. The deeper the call stack, and the more frequent the signals, the greater the dilation. To a limited extent, clock-based profiling shows some distortion, deriving from greater dilation for those parts of the program executing with the deepest stacks.

Where possible, a default value is set not to an exact number of milliseconds, but to slightly more or less than an exact number (for example, 10.007 ms or 0.997 ms) to avoid correlations with the system clock, which can also distort the data. On SPARC platforms, set custom values the same way. This is not possible on Linux platforms.

Limitations on Collection of Tracing Data

You cannot collect any kind of tracing data from a program that is already running unless the Collector library, libcollector.so, has been preloaded. See Collecting Tracing Data From a Running Program for more information.

Runtime Distortion and Dilation with Tracing

Tracing data dilates the run in proportion to the number of events that are traced. If tracing is done together with clock-based profiling, the clock data is distorted by the dilation induced by tracing events.

Limitations on Hardware Counter Overflow Profiling

Hardware counter overflow profiling has several limitations:

You can only collect hardware counter overflow data on processors that have hardware counters and that support overflow profiling. On other systems, hardware counter overflow profiling is disabled. UltraSPARC® processors prior to the UltraSPARC III processor family do not support hardware counter overflow profiling.
You cannot collect hardware counter overflow data on a system running the Solaris OS while the cpustat(1) command is running, because cpustat takes control of the counters and does not let a user process use the counters. If cpustat is started during data collection, the hardware counter overflow profiling is terminated and an error is recorded in the experiment.
You cannot use the hardware counters in your own code if you are doing hardware counter overflow profiling. The Collector interposes on the libcpc library functions and returns -1 if the call did not come from the Collector. Your program should be coded to work correctly if it fails to get access to the hardware counters. A program that is not coded to handle this failure will fail under hardware counter profiling, if the superuser invokes system-wide tools that also use the counters, or if the counters are not supported on that system.
If you try to collect hardware counter data on a running program that is using the hardware counter library by attaching dbx to the process, the experiment may be corrupted.

Note –
To view a list of all available counters, run the collect command with no arguments.

Runtime Distortion and Dilation With Hardware Counter Overflow Profiling

Hardware counter overflow profiling records data when a SIGEMT signal (on Solaris platforms) or a SIGIO signal (on Linux platforms) is delivered to the target. Processing that signal and unwinding the call stack dilates the run. Unlike clock-based profiling, for some hardware counters, different parts of the program might generate events more rapidly than other parts, and so show greater dilation in those parts of the code. Any part of the program that generates such events very rapidly might be significantly distorted. Similarly, some events might be generated in one thread disproportionately to the other threads.

Limitations on Data Collection for Descendant Processes

You can collect data on descendant processes subject to some limitations.

If you want to collect data for all descendant processes that are followed by the Collector, you must use the collect command with one of the following options:

The -F on option enables you to collect data automatically for calls to fork and its variants and exec and its variants.
The -F all option causes the Collector to follow all descendant processes, including those due to calls to system, popen, and sh.
The -F '=regexp' option enables data to be collected on all descendant processes whose name or lineage matches the specified regular expression.

See Experiment Control Options for more information about the -F option.
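For scripts that launch profiled runs, the three -F forms listed above can be assembled into an argument list, as in the following sketch. The function name collect_argv, the target ./a.out, and the pattern worker.* are hypothetical. The sketch accepts only the forms named in this section and does not run anything.

```python
def collect_argv(target, follow=None, target_args=()):
    """Build an argument list for the collect command.

    follow may be "on", "all", or a string starting with "=" followed by
    a regular expression. No shell quoting is needed because the list is
    meant to be passed directly to a process-launching call.
    """
    argv = ["collect"]
    if follow is not None:
        if follow not in ("on", "all") and not follow.startswith("="):
            raise ValueError("unsupported -F value: %r" % (follow,))
        argv += ["-F", follow]
    argv.append(target)
    argv.extend(target_args)
    return argv


print(collect_argv("./a.out", follow="all"))
print(collect_argv("./a.out", follow="=worker.*"))
```

When the same command is typed in a shell, the regular expression form should be quoted, as in -F '=regexp', so that the shell does not expand it.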

Limitations on OpenMP Profiling

Collecting OpenMP data during the execution of the program can be very expensive. You can suppress that cost by setting the SP_COLLECTOR_NO_OMP environment variable. If you do so, the program will have substantially less dilation, but you will not see the data from slave threads propagate up to the caller, and eventually to main(), as it normally will if that variable is not set.

A new collector for OpenMP 3.0 is enabled by default in this release. It can profile programs that use explicit tasking. Programs built with earlier compilers can be profiled with the new collector only if a patched version of libmtsk.so is available. If this patched version is not installed, you can switch data collection to use the old collector by setting the SP_COLLECTOR_OLDOMP environment variable.

OpenMP profiling functionality is available only for applications compiled with the Sun Studio compilers, since it depends on the Sun Studio compiler runtime. For applications compiled with GNU compilers, only machine-level call stacks are displayed.

Limitations on Java Profiling

You can collect data on Java programs subject to the following limitations:

You should use a version of the Java 2 Software Development Kit (JDK) no earlier than JDK 6, Update 3. The Collector first looks for the JDK in the path set in either the JDK_HOME environment variable or the JAVA_PATH environment variable. If neither of these variables is set, it looks for a JDK in your PATH. If there is no JDK in your PATH, it looks for the java executable in /usr/java/bin/java. The Collector verifies that the version of the java executable it finds is an ELF executable, and if it is not, an error message is printed, indicating which environment variable or path was used, and the full path name that was tried.
You must use the collect command to collect data. You cannot use the dbx collector subcommands or the data collection capabilities of the IDE.
Applications that create descendant processes that run JVM software cannot be profiled.
If you want to use the 64-bit JVM software, you must use the -j on flag and specify the 64-bit JVM software as the target. Do not use java -d64 to collect data using the 64-bit JVM software. If you do so, no data is collected.
Some applications are not pure Java, but are C or C++ applications that invoke dlopen() to load libjvm.so, and then start the JVM software by calling into it. To profile such applications, set the SP_COLLECTOR_USE_JAVA_OPTIONS environment variable, and add the -j on option to the collect command line. Do not set the LD_LIBRARY_PATH environment variable for this scenario.
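The JDK lookup order described in the first item above can be sketched in Python as follows. This is an approximation for illustration, not the Collector's implementation. It assumes that the java executable sits in bin/java under the directory named by JDK_HOME or JAVA_PATH, that JDK_HOME is consulted before JAVA_PATH, and that the ELF check amounts to testing the four-byte ELF magic number. The function names are hypothetical.

```python
import os
import shutil

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path):
    """Return True if the file at path starts with the ELF magic number."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        return False


def find_java(env, fallback="/usr/java/bin/java"):
    """Return (source, path) for the first java candidate.

    Order: JDK_HOME, JAVA_PATH, PATH, then the fallback location.
    The bin/java layout under the two variables is an assumption.
    """
    for var in ("JDK_HOME", "JAVA_PATH"):
        root = env.get(var)
        if root:
            return var, os.path.join(root, "bin", "java")
    search_path = env.get("PATH")
    on_path = shutil.which("java", path=search_path) if search_path else None
    if on_path:
        return "PATH", on_path
    return "default", fallback


source, path = find_java(dict(os.environ))
if not is_elf(path):
    # Mirror the documented error: report which source and path were tried.
    print("not an ELF executable: %s (found via %s)" % (path, source))
```

A wrapper script that is named java but is not an ELF file, for example a shell script, would fail this check, which matches the error condition described above.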

Runtime Performance Distortion and Dilation for Applications Written in the Java Programming Language

Java profiling uses the Java Virtual Machine Tools Interface (JVMTI), which can cause some distortion and dilation of the run.

For clock-based profiling and hardware counter overflow profiling, the data collection process makes various calls into the JVM software, and handles profiling events in signal handlers. The overhead of these routines, together with the cost of writing the experiments to disk, dilates the runtime of the Java program. Such dilation is typically less than 10%.