JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Solaris Studio 12.3: Performance Analyzer     Oracle Solaris Studio 12.3 Information Library
search filter icon
search icon

Document Information

Preface

1.  Overview of the Performance Analyzer

2.  Performance Data

3.  Collecting Performance Data

Compiling and Linking Your Program

Source Code Information

Static Linking

Shared Object Handling

Optimization at Compile Time

Compiling Java Programs

Preparing Your Program for Data Collection and Analysis

Using Dynamically Allocated Memory

Using System Libraries

Using Signal Handlers

Using setuid and setgid

Program Control of Data Collection

The C and C++ Interface

The Fortran Interface

The Java Interface

The C, C++, Fortran, and Java API Functions

Dynamic Functions and Modules

collector_func_load()

collector_func_unload()

Limitations on Data Collection

Limitations on Clock-Based Profiling

Runtime Distortion and Dilation with Clock-profiling

Limitations on Collection of Tracing Data

Runtime Distortion and Dilation with Tracing

Limitations on Hardware Counter Overflow Profiling

Runtime Distortion and Dilation With Hardware Counter Overflow Profiling

Limitations on Data Collection for Descendant Processes

Limitations on OpenMP Profiling

Limitations on Java Profiling

Runtime Performance Distortion and Dilation for Applications Written in the Java Programming Language

Where the Data Is Stored

Experiment Names

Experiment Groups

Experiments for Descendant Processes

Experiments for MPI Programs

Experiments on the Kernel and User Processes

Moving Experiments

Estimating Storage Requirements

Collecting Data

Collecting Data Using the collect Command

Data Collection Options

-p option

-h counter_definition_1...[,counter_definition_n]

-s option

-H option

-M option

-m option

-S option

-c option

-I directory

-N library_name

-r option

Experiment Control Options

-F option

-j option

-J java_argument

-l signal

-t duration

-x

-y signal [ ,r]

Output Options

-o experiment_name

-d directory-name

-g group-name

-A option

-L size

-O file

Other Options

-P process_id

-C comment

-n

-R

-V

-v

Collecting Data From a Running Process Using the collect Utility

To Collect Data From a Running Process Using the collect Utility

Collecting Data Using the dbx collector Subcommands

To Run the Collector From dbx:

Data Collection Subcommands

profile option

hwprofile option

synctrace option

heaptrace option

tha option

sample option

dbxsample { on | off }

Experiment Control Subcommands

disable

enable

pause

resume

sample record name

Output Subcommands

archive mode

limit value

store option

Information Subcommands

show

status

Collecting Data From a Running Process With dbx on Oracle Solaris Platforms

To Collect Data From a Running Process That is Not Under the Control of dbx

Collecting Tracing Data From a Running Program

Collecting Data From MPI Programs

Running the collect Command for MPI

Storing MPI Experiments

Collecting Data From Scripts

Using collect With ppgsz

4.  The Performance Analyzer Tool

5.  The er_print Command Line Performance Analysis Tool

6.  Understanding the Performance Analyzer and Its Data

7.  Understanding Annotated Source and Disassembly Data

8.  Manipulating Experiments

9.  Kernel Profiling

Index

Limitations on Data Collection

This section describes the limitations on data collection that are imposed by the hardware, the operating system, the way you run your program, or by the Collector itself.

There are no limitations on simultaneous collection of different data types: you can collect any data type with any other data type, with the exception of count data.

The Collector can support up to 16K user threads. Data from additional threads is discarded, and a collector error is generated. To support more threads, set the SP_COLLECTOR_NUMTHREADS environment variable to a larger number.

By default, the Collector collects stacks that are, at most, up to 256 frames deep. To support deeper stacks, set the SP_COLLECTOR_STACKBUFSZ environment variable to a larger number.

Limitations on Clock-Based Profiling

The minimum value of the profiling interval and the resolution of the clock used for profiling depend on the particular operating environment. The maximum value is set to 1 second. The value of the profiling interval is rounded down to the nearest multiple of the clock resolution. The minimum and maximum value and the clock resolution can be found by typing the collect -h command with no additional arguments.

On Linux systems, clock-profiling of multithreaded applications might report inaccurate data for threads. The profile signal is not always delivered by the kernel to each thread at the specified interval; sometimes the signal is delivered to the wrong thread. If available, hardware counter profiling using the cycle counter will give more accurate data for threads.

Runtime Distortion and Dilation with Clock-profiling

Clock-based profiling records data when a SIGPROF signal is delivered to the target. It causes dilation to process that signal, and unwind the call stack. The deeper the call stack, and the more frequent the signals, the greater the dilation. To a limited extent, clock-based profiling shows some distortion, deriving from greater dilation for those parts of the program executing with the deepest stacks.

Where possible, a default value is set not to an exact number of milliseconds, but to slightly more or less than an exact number (for example, 10.007 ms or 0.997 ms) to avoid correlations with the system clock, which can also distort the data. Set custom values the same way on SPARC platforms (not possible on Linux platforms).

Limitations on Collection of Tracing Data

You cannot collect any kind of tracing data from a program that is already running unless the Collector library, libcollector.so, had been preloaded. See Collecting Tracing Data From a Running Program for more information.

Runtime Distortion and Dilation with Tracing

Tracing data dilates the run in proportion to the number of events that are traced. If done with clock-based profiling, the clock data is distorted by the dilation induced by tracing events.

Limitations on Hardware Counter Overflow Profiling

Hardware counter overflow profiling has several limitations:

Runtime Distortion and Dilation With Hardware Counter Overflow Profiling

Hardware counter overflow profiling records data when a SIGEMT signal (on Solaris platforms) or a SIGIO signal (on Linux platforms) is delivered to the target. It causes dilation to process that signal, and unwind the call stack. Unlike clock-based profiling, for some hardware counters, different parts of the program might generate events more rapidly than other parts, and show dilation in that part of the code. Any part of the program that generates such events very rapidly might be significantly distorted. Similarly, some events might be generated in one thread disproportionately to the other threads.

Limitations on Data Collection for Descendant Processes

You can collect data on descendant processes subject to some limitations.

If you want to collect data for all descendant processes that are followed by the Collector, you must use the collect command with the one of the following options:

See Experiment Control Options for more information about the -F option.

Limitations on OpenMP Profiling

Collecting OpenMP data during the execution of the program can be very expensive. You can suppress that cost by setting the SP_COLLECTOR_NO_OMP environment variable. If you do so, the program will have substantially less dilation, but you will not see the data from slave threads propagate up to the caller, and eventually to main()(), as it normally will if that variable is not set.

A new collector for OpenMP 3.0 is enabled by default in this release. It can profile programs that use explicit tasking. Programs built with earlier compilers can be profiled with the new collector only if a patched version of libmtsk.so is available. If this patched version is not installed, you can switch data collection to use the old collector by setting the SP_COLLECTOR_OLDOMP environment variable.

OpenMP profiling functionality is available only for applications compiled with the Oracle Solaris Studio compilers, since it depends on the Oracle Solaris Studio compiler runtime. For applications compiled with GNU compilers, only machine-level call stacks are displayed.

Limitations on Java Profiling

You can collect data on Java programs subject to the following limitations:

Runtime Performance Distortion and Dilation for Applications Written in the Java Programming Language

Java profiling uses the Java Virtual Machine Tools Interface (JVMTI), which can cause some distortion and dilation of the run.

For clock-based profiling and hardware counter overflow profiling, the data collection process makes various calls into the JVM software, and handles profiling events in signal handlers. The overhead of these routines, and the cost of writing the experiments to disk will dilate the runtime of the Java program. Such dilation is typically less than 10%.