Skip Navigation Links | |
Exit Print View | |
Oracle Solaris Studio 12.3: Performance Analyzer Oracle Solaris Studio 12.3 Information Library |
1. Overview of the Performance Analyzer
3. Collecting Performance Data
Compiling and Linking Your Program
Preparing Your Program for Data Collection and Analysis
Using Dynamically Allocated Memory
Program Control of Data Collection
The C, C++, Fortran, and Java API Functions
Limitations on Data Collection
Limitations on Clock-Based Profiling
Runtime Distortion and Dilation with Clock-profiling
Limitations on Collection of Tracing Data
Runtime Distortion and Dilation with Tracing
Limitations on Hardware Counter Overflow Profiling
Runtime Distortion and Dilation With Hardware Counter Overflow Profiling
Limitations on Data Collection for Descendant Processes
Experiments for Descendant Processes
Experiments on the Kernel and User Processes
Estimating Storage Requirements
Collecting Data Using the collect Command
-h counter_definition_1...[,counter_definition_n]
Collecting Data From a Running Process Using the collect Utility
To Collect Data From a Running Process Using the collect Utility
Collecting Data Using the dbx collector Subcommands
To Run the Collector From dbx:
Experiment Control Subcommands
Collecting Data From a Running Process With dbx on Oracle Solaris Platforms
To Collect Data From a Running Process That is Not Under the Control of dbx
Collecting Tracing Data From a Running Program
Collecting Data From MPI Programs
Running the collect Command for MPI
4. The Performance Analyzer Tool
5. The er_print Command Line Performance Analysis Tool
6. Understanding the Performance Analyzer and Its Data
This section describes the limitations on data collection that are imposed by the hardware, the operating system, the way you run your program, or by the Collector itself.
There are no limitations on simultaneous collection of different data types: you can collect any data type with any other data type, with the exception of count data.
The Collector can support up to 16K user threads. Data from additional threads is discarded, and a collector error is generated. To support more threads, set the SP_COLLECTOR_NUMTHREADS environment variable to a larger number.
By default, the Collector collects stacks that are, at most, up to 256 frames deep. To support deeper stacks, set the SP_COLLECTOR_STACKBUFSZ environment variable to a larger number.
The minimum value of the profiling interval and the resolution of the clock used for profiling depend on the particular operating environment. The maximum value is set to 1 second. The value of the profiling interval is rounded down to the nearest multiple of the clock resolution. The minimum and maximum value and the clock resolution can be found by typing the collect -h command with no additional arguments.
On Linux systems, clock-profiling of multithreaded applications might report inaccurate data for threads. The profile signal is not always delivered by the kernel to each thread at the specified interval; sometimes the signal is delivered to the wrong thread. If available, hardware counter profiling using the cycle counter will give more accurate data for threads.
Clock-based profiling records data when a SIGPROF signal is delivered to the target. It causes dilation to process that signal, and unwind the call stack. The deeper the call stack, and the more frequent the signals, the greater the dilation. To a limited extent, clock-based profiling shows some distortion, deriving from greater dilation for those parts of the program executing with the deepest stacks.
Where possible, a default value is set not to an exact number of milliseconds, but to slightly more or less than an exact number (for example, 10.007 ms or 0.997 ms) to avoid correlations with the system clock, which can also distort the data. Set custom values the same way on SPARC platforms (not possible on Linux platforms).
You cannot collect any kind of tracing data from a program that is already running unless the Collector library, libcollector.so, had been preloaded. See Collecting Tracing Data From a Running Program for more information.
Tracing data dilates the run in proportion to the number of events that are traced. If done with clock-based profiling, the clock data is distorted by the dilation induced by tracing events.
Hardware counter overflow profiling has several limitations:
You can only collect hardware counter overflow data on processors that have hardware counters and that support overflow profiling. On other systems, hardware counter overflow profiling is disabled. UltraSPARC processors prior to the UltraSPARC III processor family do not support hardware counter overflow profiling.
You cannot collect hardware counter overflow data on a system running Oracle Solaris while the cpustat(1) command is running, because cpustat takes control of the counters and does not let a user process use the counters. If cpustat is started during data collection, the hardware counter overflow profiling is terminated and an error is recorded in the experiment.
You cannot use the hardware counters in your own code if you are doing hardware counter overflow profiling. The Collector interposes on the libcpc library functions and returns with a return value of -1 if the call did not come from the Collector. Your program should be coded so as to work correctly if it fails to get access to the hardware counters. If not coded to handle this, the program will fail under hardware counter profiling, or if the superuser invokes system-wide tools that also use the counters, or if the counters are not supported on that system.
If you try to collect hardware counter data on a running program that is using the hardware counter library by attaching dbx to the process, the experiment may be corrupted.
Note - To view a list of all available counters, run the collect -h command with no additional arguments.
Hardware counter overflow profiling records data when a SIGEMT signal (on Solaris platforms) or a SIGIO signal (on Linux platforms) is delivered to the target. It causes dilation to process that signal, and unwind the call stack. Unlike clock-based profiling, for some hardware counters, different parts of the program might generate events more rapidly than other parts, and show dilation in that part of the code. Any part of the program that generates such events very rapidly might be significantly distorted. Similarly, some events might be generated in one thread disproportionately to the other threads.
You can collect data on descendant processes subject to some limitations.
If you want to collect data for all descendant processes that are followed by the Collector, you must use the collect command with the one of the following options:
-F on option enables you to collect data automatically for calls to fork and its variants and exec and its variants.
-F all option causes the Collector to follow all descendant processes, including those due to calls to system, popen, posix_spawn(3p), posix_spawnp(3p), and sh.
-F '=regexp' option enables data to be collected on all descendant processes whose name or lineage matches the specified regular expression.
See Experiment Control Options for more information about the -F option.
Collecting OpenMP data during the execution of the program can be very expensive. You can suppress that cost by setting the SP_COLLECTOR_NO_OMP environment variable. If you do so, the program will have substantially less dilation, but you will not see the data from slave threads propagate up to the caller, and eventually to main()(), as it normally will if that variable is not set.
A new collector for OpenMP 3.0 is enabled by default in this release. It can profile programs that use explicit tasking. Programs built with earlier compilers can be profiled with the new collector only if a patched version of libmtsk.so is available. If this patched version is not installed, you can switch data collection to use the old collector by setting the SP_COLLECTOR_OLDOMP environment variable.
OpenMP profiling functionality is available only for applications compiled with the Oracle Solaris Studio compilers, since it depends on the Oracle Solaris Studio compiler runtime. For applications compiled with GNU compilers, only machine-level call stacks are displayed.
You can collect data on Java programs subject to the following limitations:
You should use a version of the Java 2 Software Development Kit (JDK) no earlier than JDK 6, Update 18. The Collector first looks for the JDK in the path set in either the JDK_HOME environment variable or the JAVA_PATH environment variable. If neither of these variables is set, it looks for a JDK in your PATH. If there is no JDK in your PATH, it looks for the java executable in /usr/java/bin/java. The Collector verifies that the version of the java executable it finds is an ELF executable, and if it is not, an error message is printed, indicating which environment variable or path was used, and the full path name that was tried.
To get more detailed information for source-line mappings for HotSpot-compiled code, you should use a JDK version no earlier than JDK 6, Update 20 or JDK 7, build b85 early access release.
You must use the collect command to collect data. You cannot use the dbx collector subcommands.
Applications that create descendant processes that run JVM software cannot be profiled.
Some applications are not pure Java, but are C or C++ applications that invoke dlopen() to load libjvm.so, and then start the JVM software by calling into it. To profile such applications, set the SP_COLLECTOR_USE_JAVA_OPTIONS environment variable, and add the -j on option to the collect command line. Do not set the LD_LIBRARY_PATH environment variable for this scenario.
Java profiling uses the Java Virtual Machine Tools Interface (JVMTI), which can cause some distortion and dilation of the run.
For clock-based profiling and hardware counter overflow profiling, the data collection process makes various calls into the JVM software, and handles profiling events in signal handlers. The overhead of these routines, and the cost of writing the experiments to disk will dilate the runtime of the Java program. Such dilation is typically less than 10%.