Oracle® Solaris Studio 12.4: Overview


Updated: December 2014
 
 

Performance Analyzer Tools

The Oracle Solaris Studio software provides a set of advanced performance tools and utilities that work together. The Collector, Performance Analyzer, Thread Analyzer, and er_print utility help you assess the performance of your code, identify potential performance problems, and locate the part of the code where the problems occur. These tools together are referred to as the Performance Analyzer tools.

You can use options for the Oracle Solaris Studio C, C++, and Fortran compilers to target specific hardware and to apply advanced optimization techniques that improve your program's performance. The Performance Analyzer tools are also engineered for use on Oracle Sun hardware together with the compilers, and can help you improve your program's performance when running on Oracle Sun machines.

The Performance Analyzer tools let you control which data is collected, inspect the data in depth, and examine your program's interaction with the hardware. The tools are designed for and tested with complex compute-intensive applications running on current Oracle Sun hardware.

The Performance Analyzer tools can also profile OpenMP parallel applications and MPI-based distributed applications, to help you determine whether you are using these technologies effectively in your application.

To use the Performance Analyzer tools, you must perform two steps:

  1. Profile a target application in Performance Analyzer or collect performance data from the target application with the collect command.

  2. Examine the data with the Performance Analyzer graphical tool, with the er_print command-line utility, or with the Thread Analyzer graphical tool, which detects data races and deadlocks in multithreaded applications.

Collect Performance Data to Profile an Application

The Collector collects performance data using profiling and by tracing function calls. The data can include call stacks, microstate accounting information (on Oracle Solaris platforms only), thread synchronization delay data, hardware counter overflow data, Message Passing Interface (MPI) function call data, memory allocation data, and summary information for the operating system and the process. The Collector can collect all types of data for C, C++, and Fortran programs, and profiling data for applications written in the Java programming language. You can run the Collector using the collect command, or from the Profile Application dialog in Performance Analyzer, or by using the dbx debugger's collect subcommand.

The Oracle Solaris Studio IDE profiling tools also use the Collector to gather information.

To collect data with the collect command:

% collect [collect-options] executable executable-options

You can include options to the collect command to specify the type of data you want to collect. For example, the -i on option causes the Collector to perform input/output tracing. You can pass arguments to the target executable by specifying the arguments after the executable.

The Collector creates a data directory with the name test.1.er by default, but you can specify a different name on the command line. The test.1.er directory is known as an experiment, and the name must always end in .er in order for the tools to recognize it as an experiment.
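As a minimal sketch of the naming rule, the following POSIX shell helper checks that a chosen experiment name ends in .er before it builds a collect command line. The function name collect_cmd and the experiment name io_run.er are hypothetical; the sketch assumes the -o option of collect for naming the experiment and uses the -i on option mentioned above. It only prints the command and does not run the Collector.

```shell
#!/bin/sh
# Sketch only: print a collect command line for a named experiment.
# collect_cmd and the names passed to it are illustrative, not part of the tools.
collect_cmd() {
    exp_name=$1
    shift
    case $exp_name in
        ?*.er) ;;   # name ends in .er, so the tools can recognize it
        *) echo "error: experiment name must end in .er: $exp_name" >&2
           return 1 ;;
    esac
    # -o names the experiment and -i on enables I/O tracing; everything
    # after the options is the target executable and its arguments.
    echo "collect -o $exp_name -i on $*"
}

collect_cmd io_run.er synprog
collect_cmd io_run synprog || echo "rejected: io_run"
```

The first call prints a usable command line; the second is rejected because the name lacks the .er suffix.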

The following command shows how to use collect on the synprog program:

% collect synprog

Creating experiment database test.1.er (Process ID: 11103) ...
00:00:00.000  ===== (11103) synprog run
00:00:00.005  ===== (11103) Mon  22 Sep 14  17:05:51 Stopwatch calibration
  OS release 5.11 -- enabling microstate accounting 5.11.
        0.000096 s.  (22.4 % of 0.000426 s.) -- inner
	N = 1000, avg = 0.096 us., min = 0.090, max = 0.105
        0.000312 s.  (67.0 % of 0.000466 s.) -- outer
	N = 1000, avg = 0.312 us., min = 0.307, max = 0.457
00:00:00.006  ===== (11103)  Begin commandline
	icpu.md.cpu.rec.recd.dousl.gpf.fitos.uf.ec.tco.b.nap.sig.sys.so.sx.so
00:00:00.006  ===== (11103) start of icputime
    3.003069 wall-secs.,   2.978360 CPU-secs., in icputime
00:00:03.009  ===== (11103) start of muldiv
    3.007489 wall-secs.,   2.997647 CPU-secs., in muldiv
00:00:06.017  ===== (11103) start of cputime
    3.002315 wall-secs.,   2.989407 CPU-secs., in cputime
00:00:09.019  ===== (11103) start of recurse
    3.082371 wall-secs.,   3.069782 CPU-secs., in recurse
...
(output edited to conserve space)
...

The data is stored in the test.1.er directory, which can be viewed using Performance Analyzer or er_print.

See the Oracle Solaris Studio 12.4: Performance Analyzer Tutorials for step-by-step instructions for using Performance Analyzer on sample applications you can download.

For detailed information about profiling applications and using the Collector, see the Help menu in Performance Analyzer, the Oracle Solaris Studio 12.4: Performance Analyzer manual, and the collect(1) man page.

Examine Performance Data With Performance Analyzer

Performance Analyzer provides insight into the behavior of your application to enable you to find problem areas in your code. Performance Analyzer identifies which functions, code segments, and source lines are using the most system resources. Performance Analyzer can profile single-threaded, multithreaded, and multi-process applications, then present the profiling data to help you identify where you can improve your application's performance.

You can run Performance Analyzer with the analyzer command. The basic syntax of the analyzer command to start Performance Analyzer is:

% analyzer [experiment-list]

The experiment-list is one or more file names of experiments that were collected with the Collector. If you want to load more than one experiment, specify the names separated by spaces. When invoked on more than one experiment, Performance Analyzer aggregates the experiment data by default, but can also be used to compare the experiments if you specify the -c option on the command line before the experiment names.
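The two ways of loading several experiments can be sketched as follows. The helper analyzer_cmd and the second experiment name test.2.er are hypothetical; the sketch only prints the command lines, so it can be tried without the tools installed.

```shell
#!/bin/sh
# Sketch only: print analyzer command lines for several experiments.
# analyzer_cmd and test.2.er are illustrative names.
analyzer_cmd() {
    mode=$1
    shift
    case $mode in
        aggregate) echo "analyzer $*" ;;      # default: data is aggregated
        compare)   echo "analyzer -c $*" ;;   # -c comes before the experiment names
        *) echo "error: mode must be aggregate or compare" >&2
           return 1 ;;
    esac
}

analyzer_cmd aggregate test.1.er test.2.er
analyzer_cmd compare test.1.er test.2.er
```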

If you do not specify an experiment on the command line, Performance Analyzer displays a Welcome screen to help you get started.

To open the experiment test.1.er in Performance Analyzer:

% analyzer test.1.er

The initial view of the experiment is the Overview, where you can see a summary of the time and resources used by your program and select the performance metrics you want to see in the views of performance data.

The following figure shows Performance Analyzer's Functions view for a test.1.er experiment that was made on the synprog example. The Functions view shows the CPU time used by each function of the synprog program. When you click the function gpf_work, the Selection Details window on the right side shows details about the gpf_work function's resource usage. At the bottom of the Functions view, the Called-by/Calls area shows the functions that are called by gpf_work, and you can double-click the calls to navigate to them in the Functions view.

image:Screen shot of Performance Analyzer's Functions tab

For information about using Performance Analyzer, see the Oracle Solaris Studio 12.4: Performance Analyzer manual, Performance Analyzer integrated help, and the analyzer(1) man page.

See the Oracle Solaris Studio 12.4: Performance Analyzer Tutorials for step-by-step instructions for using Performance Analyzer on sample applications you can download.

Examine Performance Data With the er_print Utility

The er_print utility presents in plain text most of the displays that are presented in the Performance Analyzer except the Timeline display, the MPI Timeline display, and the MPI Chart display.

You can use the er_print utility to display the performance metrics for functions, callers and callees, the call tree, source code listing, disassembly listing, sampling information, dataspace data, thread analysis data, and execution statistics.

The general syntax of the er_print command is:

% er_print -command experiment-list

You can specify one or more commands to indicate the type of data you want to display. The experiment-list is one or more file names of experiments that were collected with the Collector. When invoked on more than one experiment, er_print aggregates the experiment data by default, but can also be used to compare the experiments.
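As a rough sketch of how several commands combine on one command line, the helper below prefixes each er_print command with a hyphen and places the experiment name last. The function name er_print_cmd is hypothetical; limit, functions, and callers-callees are assumed here to be er_print commands as described in the er_print(1) man page. The sketch prints the command line rather than running it.

```shell
#!/bin/sh
# Sketch only: print an er_print command line built from several commands.
# er_print_cmd is an illustrative helper, not part of the tools.
er_print_cmd() {
    exp=$1
    shift
    line="er_print"
    for c in "$@"; do
        line="$line -$c"    # each er_print command is introduced by a hyphen
    done
    echo "$line $exp"       # the experiment list comes last
}

# Limit reports to 10 entries, then show functions and callers/callees.
er_print_cmd test.1.er "limit 10" functions callers-callees
```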

The following example shows the command for displaying function information for a program. The output shown is for the same experiment that was used in the screen capture of Performance Analyzer in the previous section of this document.

%  er_print -functions test.1.er
Functions sorted by metric: Exclusive Total CPU Time

Excl.     Incl.      Name
Total     Total
CPU sec.  CPU sec.
50.806    50.806     <Total>
 5.994     5.994     so_burncpu
 5.914     5.914     real_recurse
 3.502     3.502     gpf_work
 3.012     3.012     sigtime_handler
 3.002     3.002     bounce_a
 3.002     3.002     cputime
 3.002     3.002     icputime
 2.992     2.992     sx_burncpu
 2.992     2.992     underflow
 2.792     2.792     muldiv
 2.532     2.532     my_irand
 1.831     1.831     gethrtime
 1.031     1.991     tailcall_b
 0.961     0.961     inc_middle
 0.961     0.961     tailcall_c
 0.941     0.941     gethrvtime
 0.941     0.941     gettimeofday
 0.911     2.902     tailcall_a
 0.801     0.801     dousleep
 0.650     0.650     inc_entry
 0.640     0.640     inc_exit
 0.480     3.012     fitos
 0.330     0.330     inc_func
 0.320     0.320     inc_body
 0.320     0.320     inc_brace
 0.290     4.003     systime
 0.260     0.260     ext_macro_code

lines deleted
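Because the output is plain text, it can be post-processed with standard tools. In the listing, exclusive time is the time spent in the function itself and inclusive time also counts the functions it calls, so a function whose inclusive time exceeds its exclusive time spends the difference in its callees. The following sketch, with the hypothetical helper name callers_of_others, uses awk to list such functions from a few rows copied from the listing above.

```shell
#!/bin/sh
# Sketch only: list functions whose inclusive CPU time exceeds their
# exclusive CPU time, with the difference (time spent in callees).
# callers_of_others is an illustrative helper name.
callers_of_others() {
    # Data rows have two numeric columns followed by the function name;
    # header lines do not match the numeric patterns and are skipped.
    awk '$1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.]+$/ && $3 != "<Total>" && $2 > $1 {
             printf "%s %.3f\n", $3, $2 - $1
         }'
}

# Sample rows copied from the listing; with the tools installed, the output
# of "er_print -functions test.1.er" could be piped into the function instead.
callers_of_others <<'EOF'
Excl.     Incl.      Name
Total     Total
CPU sec.  CPU sec.
50.806    50.806     <Total>
 3.502     3.502     gpf_work
 1.031     1.991     tailcall_b
 0.911     2.902     tailcall_a
 0.480     3.012     fitos
 0.290     4.003     systime
EOF
```

For the sample rows, this reports tailcall_b, tailcall_a, fitos, and systime, and omits gpf_work because its two times are equal.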

You can also use er_print interactively if you specify the experiment name and omit the command when starting er_print. You can type commands at an (er_print) prompt.

For information about the er_print utility, see the Oracle Solaris Studio 12.4: Performance Analyzer manual and the er_print(1) man page.

Analyze Multithreaded Application Performance With Thread Analyzer

Thread Analyzer is a specialized version of Performance Analyzer for examining multithreaded programs. Thread Analyzer can detect multithreaded programming errors that cause data races and deadlocks in code that is written using the POSIX thread API, the Oracle Solaris thread API, OpenMP directives, or a mix of these.

Thread Analyzer detects two common threading issues in multithreaded programs:

  • Data races, which occur when two threads in a single process access the same shared memory location concurrently and without holding any exclusive locks, and at least one of the accesses is a write.

  • Deadlocks, which occur when two or more threads are blocked because they are waiting for each other to complete a task.

Thread Analyzer is streamlined for multithreaded program analysis and shows only the Races, Deadlocks, and Dual Source data views of Performance Analyzer. For OpenMP programs, the OpenMP Parallel Region and OpenMP Task views are also shown.

You can detect data races on source code or binary code. In both cases, you have to instrument the code to enable the necessary data to be collected.

To use Thread Analyzer:

  1. Instrument your code for analysis of data races. For source code, use the -xinstrument=datarace compiler option when compiling. For binary code, use the discover -i datarace command to create instrumented binaries.

    Deadlock detection does not require instrumentation.

  2. Run the executable with the collect command with the -r race option to collect datarace data, the -r deadlock option to collect deadlock data, or the -r all option to collect both types of data.

  3. Start Thread Analyzer with the tha command or use the er_print command to display the resulting experiment.
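The three steps can be sketched as a short POSIX shell helper that prints the commands for one C source file. The function name tha_steps, the source file prog.c, and the experiment name derived from it are hypothetical. The sketch assumes the cc compiler driver with the -g option and the -o option of collect, and it omits any threading or OpenMP compiler options that the program itself needs.

```shell
#!/bin/sh
# Sketch only: print the compile, collect, and view commands for Thread Analyzer.
# tha_steps and the file names are illustrative, not part of the tools.
tha_steps() {
    kind=$1
    src=$2
    prog=${src%.c}
    case $kind in
        race|all) cflags="-g -xinstrument=datarace" ;;  # data races need instrumentation
        deadlock) cflags="-g" ;;                        # deadlock detection does not
        *) echo "error: kind must be race, deadlock, or all" >&2
           return 1 ;;
    esac
    echo "cc $cflags -o $prog $src"                   # step 1: build
    echo "collect -r $kind -o $prog.1.er ./$prog"     # step 2: collect data
    echo "tha $prog.1.er"                             # step 3: view the experiment
}

tha_steps race prog.c
```

Passing deadlock instead of race drops the instrumentation option from the compile line, which matches the note in step 1.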

The following figure shows the Thread Analyzer window with data races that were detected in an OpenMP program, and the call stacks that lead to the data races.

image:A screen shot of Thread Analyzer's Race Details window with call                             stack traces for a data race in an OpenMP program

For information about using Thread Analyzer, see the tha(1) man page and the Oracle Solaris Studio 12.4: Thread Analyzer User’s Guide.