Oracle® Developer Studio 12.5: Overview

Updated: June 2016

Tools for Tuning Application Performance

Oracle Developer Studio software features several tools you can use to examine your application's behavior, enabling you to tune its performance.

    The performance tools include the following:

  • Performance Analyzer and associated tools. A set of advanced performance tools and utilities to help you identify locations in your code where problems affect performance.

  • Simple Performance Optimization Tool (SPOT). A command-line tool that works with the Performance Analyzer tools and produces web pages to report the data gathered by the tools.

  • Profiling Tools in the IDE. Enable you to examine the performance of your projects from within the IDE.

Performance Analyzer Tools

The Oracle Developer Studio software provides a set of advanced performance tools and utilities that work together. The Collector, Performance Analyzer, Thread Analyzer, and er_print utility help you assess the performance of your code, identify potential performance problems, and locate the part of the code where the problems occur. These tools together are referred to as the Performance Analyzer tools.

You can use options for the Oracle Developer Studio C, C++, and Fortran compilers to target specific hardware and apply advanced optimization techniques that improve your program's performance. Performance Analyzer tools are also engineered for use on Oracle Sun hardware together with the compilers, and can help you improve your program's performance when running on Oracle Sun machines.

Performance Analyzer tools let you control which data is collected, inspect that data in depth, and examine your program's interaction with the hardware. Performance Analyzer tools are designed for and tested with complex compute-intensive applications running on current Oracle Sun hardware.

The Performance Analyzer tools also feature profiling of OpenMP parallel applications and MPI-based distributed applications, to help you to determine if you are using these technologies effectively in your application.

To use the Performance Analyzer tools, you must perform two steps:

  1. Profile a target application in Performance Analyzer or collect performance data from the target application with the collect command.

  2. Examine the data with the Performance Analyzer graphical tool, the er_print command-line utility, or, for data race and deadlock data, the Thread Analyzer graphical tool.

Collect Performance Data to Profile an Application

The Collector collects performance data using profiling and by tracing function calls. The data can include call stacks, microstate accounting information (on Oracle Solaris platforms only), thread synchronization delay data, hardware counter overflow data, Message Passing Interface (MPI) function call data, memory allocation data, and summary information for the operating system and the process. The Collector can collect all types of data for C, C++, and Fortran programs, and profiling data for applications written in the Java programming language. You can run the Collector using the collect command, or from the Profile Application dialog in Performance Analyzer, or by using the dbx debugger's collect subcommand.

The Oracle Developer Studio IDE profiling tools also use the Collector to gather information.

To collect data with the collect command:

% collect [collect-options] executable executable-options

You can include options to the collect command to specify the type of data you want to collect. For example, the -i on option causes the Collector to perform input/output tracing. You can pass arguments to the target executable by specifying the arguments after the executable.
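The ordering described above (collect options first, then the target, then the target's own arguments) can be sketched as follows. This is a minimal dry-run sketch, not an official script: the run helper, the ./myapp target, and its input.dat argument are hypothetical placeholders, and only the -i on option comes from the text above.

```shell
#!/bin/sh
# Dry-run by default: print the command instead of executing it, so the
# sketch is safe to try on a machine without Oracle Developer Studio.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command line, or execute it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%% %s\n' "$*"
    else
        "$@"
    fi
}

# collect options come first, then the target, then the target's arguments.
# ./myapp and input.dat are placeholders for your own program and its input.
run collect -i on ./myapp input.dat
```

Setting DRY_RUN=0 in the environment would execute the command instead of printing it, assuming collect is on your PATH.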

The Collector creates a data directory with the name test.1.er by default, but you can specify a different name on the command line. The test.1.er directory is known as an experiment, and the name must always end in .er in order for the tools to recognize it as an experiment.
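Because the tools only recognize directories whose names end in .er, a wrapper script might validate the name before passing it to collect. The following is a small sketch under stated assumptions: the helper names is_experiment_name and experiment_name, and the ./myapp target, are hypothetical; the -o option is the one the collect(1) man page documents for naming the experiment.

```shell
#!/bin/sh
# Succeed only if the argument ends in ".er" and has at least one
# character before the suffix.
is_experiment_name() {
    case "$1" in
        ?*.er) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: append ".er" when the caller left it off.
experiment_name() {
    if is_experiment_name "$1"; then
        printf '%s\n' "$1"
    else
        printf '%s.er\n' "$1"
    fi
}

# Show the collect command line that would use the normalized name.
# ./myapp is a placeholder for your own executable.
printf '%% collect -o %s ./myapp\n' "$(experiment_name synprog_run)"
```

The sketch only prints the command line; it does not invoke collect.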

The following command shows how to use collect on the synprog program:

% collect synprog

Creating experiment database test.1.er (Process ID: 11103) ...
00:00:00.000  ===== (11103) synprog run
00:00:00.005  ===== (11103) Mon  22 Sep 14  17:05:51 Stopwatch calibration
  OS release 5.11 -- enabling microstate accounting 5.11.
        0.000096 s.  (22.4 % of 0.000426 s.) -- inner
	N = 1000, avg = 0.096 us., min = 0.090, max = 0.105
        0.000312 s.  (67.0 % of 0.000466 s.) -- outer
	N = 1000, avg = 0.312 us., min = 0.307, max = 0.457
00:00:00.006  ===== (11103)  Begin commandline
	icpu.md.cpu.rec.recd.dousl.gpf.fitos.uf.ec.tco.b.nap.sig.sys.so.sx.so
00:00:00.006  ===== (11103) start of icputime
    3.003069 wall-secs.,   2.978360 CPU-secs., in icputime
00:00:03.009  ===== (11103) start of muldiv
    3.007489 wall-secs.,   2.997647 CPU-secs., in muldiv
00:00:06.017  ===== (11103) start of cputime
    3.002315 wall-secs.,   2.989407 CPU-secs., in cputime
00:00:09.019  ===== (11103) start of recurse
    3.082371 wall-secs.,   3.069782 CPU-secs., in recurse
...
(output edited to conserve space)
...

The data is stored in the test.1.er directory, which can be viewed using Performance Analyzer or er_print.

See the Oracle Developer Studio 12.5: Performance Analyzer Tutorials for step-by-step instructions for using Performance Analyzer on sample applications you can download.

For detailed information about profiling applications and using the Collector, see the Help menu in Performance Analyzer, the Oracle Developer Studio 12.5: Performance Analyzer manual, and the collect(1) man page.

Examine Performance Data With Performance Analyzer

Performance Analyzer provides insight into the behavior of your application to enable you to find problem areas in your code. Performance Analyzer identifies which functions, code segments, and source lines are using the most system resources. Performance Analyzer can profile single-threaded, multithreaded, and multi-process applications, then present the profiling data to help you identify where you can improve your application's performance.

You can run Performance Analyzer with the analyzer command. The basic syntax of the analyzer command to start Performance Analyzer is:

% analyzer [experiment-list]

The experiment-list is one or more file names of experiments that were collected with the Collector. If you want to load more than one experiment, specify the names separated by spaces. When invoked on more than one experiment, Performance Analyzer aggregates the experiment data by default, but can also be used to compare the experiments if you specify the -c option on the command line before the experiment names.

If you do not specify an experiment on the command line, Performance Analyzer displays a Welcome screen to help you get started.

To open the experiment test.1.er in Performance Analyzer:

% analyzer test.1.er

The initial view of the experiment is the Overview, where you can see at a glance the time and resources used by your program and select the performance metrics you want to see in the views of performance data.

The following figure shows Performance Analyzer's Functions view for a test.1.er experiment that was made on the synprog example. The Functions view shows the CPU time used by each function of the synprog program. When you click the function gpf_work, the Selection Details window on the right side shows details about the gpf_work function's resource usage. At the bottom of the Functions view, the Called-by/Calls area shows the functions that are called by gpf_work, and you can double-click the calls to navigate to them in the Functions view.

image:Screen shot of Performance Analyzer's Functions tab

For information about using Performance Analyzer, see the Oracle Developer Studio 12.5: Performance Analyzer manual, Performance Analyzer integrated help, and the analyzer(1) man page.

See the Oracle Developer Studio 12.5: Performance Analyzer Tutorials for step-by-step instructions for using Performance Analyzer on sample applications you can download.

Examine Performance Data With the er_print Utility

The er_print utility presents in plain text most of the displays that are presented in the Performance Analyzer except the Timeline display, the MPI Timeline display, and the MPI Chart display.

You can use the er_print utility to display the performance metrics for functions, callers and callees, the call tree, source code listing, disassembly listing, sampling information, dataspace data, thread analysis data, and execution statistics.

The general syntax of the er_print command is:

% er_print -command experiment-list

You can specify one or more commands to indicate the type of data you want to display. The experiment-list is one or more file names of experiments that were collected with the Collector. When invoked on more than one experiment, er_print aggregates the experiment data by default, but can also be used to compare the experiments.

The following example shows the command for displaying function information for a program. The output shown is for the same experiment that was used in the screen capture of Performance Analyzer in the previous section of this document.

%  er_print -functions test.1.er
Functions sorted by metric: Exclusive Total CPU Time

Excl.     Incl.      Name
Total     Total
CPU sec.  CPU sec.
50.806    50.806     <Total>
 5.994     5.994     so_burncpu
 5.914     5.914     real_recurse
 3.502     3.502     gpf_work
 3.012     3.012     sigtime_handler
 3.002     3.002     bounce_a
 3.002     3.002     cputime
 3.002     3.002     icputime
 2.992     2.992     sx_burncpu
 2.992     2.992     underflow
 2.792     2.792     muldiv
 2.532     2.532     my_irand
 1.831     1.831     gethrtime
 1.031     1.991     tailcall_b
 0.961     0.961     inc_middle
 0.961     0.961     tailcall_c
 0.941     0.941     gethrvtime
 0.941     0.941     gettimeofday
 0.911     2.902     tailcall_a
 0.801     0.801     dousleep
 0.650     0.650     inc_entry
 0.640     0.640     inc_exit
 0.480     3.012     fitos
 0.330     0.330     inc_func
 0.320     0.320     inc_body
 0.320     0.320     inc_brace
 0.290     4.003     systime
 0.260     0.260     ext_macro_code

lines deleted
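Because er_print writes plain text, its output can be filtered with standard tools. As an illustration, the sketch below uses awk to list functions whose inclusive CPU time exceeds their exclusive CPU time, that is, functions that spend time in the functions they call. The callee_time function name is hypothetical, and the column layout is assumed to match the listing above (exclusive time, inclusive time, name).

```shell
#!/bin/sh
# Read "er_print -functions" output on stdin. For each row whose inclusive
# time (column 2) is greater than its exclusive time (column 1), print the
# function name and the difference. Header rows are skipped because their
# first two fields are not numeric.
callee_time() {
    awk '$1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.]+$/ && ($2 + 0) > ($1 + 0) {
        printf "%s %.3f\n", $3, $2 - $1
    }'
}

# Sample rows copied from the listing above, so the sketch runs without
# an experiment on disk.
callee_time <<'EOF'
Excl.     Incl.      Name
Total     Total
CPU sec.  CPU sec.
50.806    50.806     <Total>
 5.994     5.994     so_burncpu
 1.031     1.991     tailcall_b
 0.911     2.902     tailcall_a
 0.480     3.012     fitos
 0.290     4.003     systime
EOF
```

With a real experiment, you would pipe the report into the function instead, for example er_print -functions test.1.er | callee_time. On the sample rows, the filter reports tailcall_b, tailcall_a, fitos, and systime.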

You can also use er_print interactively if you specify the experiment name and omit the command when starting er_print. You can type commands at an (er_print) prompt.

For information about the er_print utility, see the Oracle Developer Studio 12.5: Performance Analyzer manual and the er_print(1) man page.

Analyze Multithreaded Application Performance With Thread Analyzer

Thread Analyzer is a specialized version of Performance Analyzer for examining multithreaded programs. Thread Analyzer can detect multithreaded programming errors that cause data races and deadlocks in code that is written using the POSIX thread API, the Oracle Solaris thread API, OpenMP directives, or a mix of these.

Thread Analyzer detects two common threading issues in multithreaded programs:

  • Data races, which occur when two threads in a single process access the same shared memory location concurrently and without holding any exclusive locks, and at least one of the accesses is a write.

  • Deadlocks, which occur when two or more threads are blocked because they are waiting for each other to complete a task.

Thread Analyzer is streamlined for multithreaded program analysis and shows only the Races, Deadlocks, and Dual Source data views of Performance Analyzer. For OpenMP programs, the OpenMP Parallel Region and OpenMP Task views are also shown.

You can detect data races on source code or binary code. In both cases, you have to instrument the code to enable the necessary data to be collected.

To use Thread Analyzer:

  1. Instrument your code for analysis of data races. For source code, use the -xinstrument=datarace compiler option when compiling. For binary code, use the discover -i datarace command to create instrumented binaries.

    Deadlock detection does not require instrumentation.

  2. Run the executable with the collect command with the -r race option to collect data race data, the -r deadlock option to collect deadlock data, or the -r all option to collect both types of data.

  3. Start Thread Analyzer with the tha command or use the er_print command to display the resulting experiment.
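The three steps above can be sketched as a short script that prints the commands for a given detection mode. This is an illustrative dry run only: the function names race_option and tha_plan, the file names prog.c and prog, and the experiment name tha_demo.1.er are hypothetical, and any other flags your program needs (for example, OpenMP or threading options) are omitted. The -o option is the collect option for naming the experiment.

```shell
#!/bin/sh
# Map what you want to detect to the argument of collect -r.
race_option() {
    case "$1" in
        races)     echo race ;;
        deadlocks) echo deadlock ;;
        both)      echo all ;;
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the compile, collect, and view commands for the chosen mode.
# Nothing is executed; the commands are only echoed.
tha_plan() {
    mode=$(race_option "$1") || return 1
    if [ "$1" = deadlocks ]; then
        # Deadlock detection does not require instrumentation.
        echo "cc -g -o prog prog.c"
    else
        echo "cc -g -xinstrument=datarace -o prog prog.c"
    fi
    echo "collect -r $mode -o tha_demo.1.er ./prog"
    echo "tha tha_demo.1.er"
}

tha_plan both
```

To inspect the same experiment in plain text, the last command could be replaced with an er_print invocation on tha_demo.1.er.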

The following figure shows the Thread Analyzer window with data races that were detected in an OpenMP program, and the call stacks that lead to the data races.

image:A screen shot of Thread Analyzer's Race Details window with call stack traces for a data race in an OpenMP program

For information about using Thread Analyzer, see the tha(1) man page and the Oracle Developer Studio 12.5: Thread Analyzer User’s Guide.

Simple Performance Optimization Tool (SPOT)

The Simple Performance Optimization Tool (SPOT) can help you diagnose performance problems in an application. SPOT runs a set of performance tools on an application and produces web pages to report the data gathered by the tools. The tools can also be run independently of SPOT.

SPOT is complementary to the Oracle Developer Studio Performance Analyzer. Performance Analyzer tells you where the time was spent in running your application. In certain situations, however, you may need more information to help diagnose your application's problems. SPOT can assist you in these situations.

SPOT uses the collect utility as one of its tools. SPOT uses the er_print utility and an additional utility called er_html to display the profiling data as a web page.

Before you use SPOT, compile the application binary with some level of optimization using the -O option and with debugging information using the -g option, so that the SPOT tools can map performance information to lines of code.

SPOT can be used to gather performance data by launching an application or attaching to an already running application.

To run SPOT and launch your application:

% spot executable

To run SPOT on an already running application:

% spot -P process-id

SPOT produces a report for each run of your application, as well as a report that compares SPOT data from different runs.

When SPOT is used on a PID, multiple tools are attached to the PID in sequence to generate the report.

The following figure shows part of the SPOT run report, which shows information about the system on which SPOT was run, and about how the application was compiled. The report includes links to other pages with more information.

image:Screen capture of SPOT report

The SPOT report web pages are linked together to make it easy for you to examine all the data compiled.

For more information, see the Oracle Solaris Studio 12.2: Simple Performance Optimization Tool (SPOT) User’s Guide in the Oracle Solaris Studio 12.2 documentation library at http://docs.oracle.com/cd/E18659_01/index.html.

Profiling Tools in the IDE

Oracle Developer Studio IDE provides interactive graphical profiling tools to enable you to examine the performance of your projects as they run within the IDE. The profiling tools use Oracle Developer Studio utilities and operating system utilities to collect the data.

The profiling tools are available from the Profile Project button and include the following:

Monitor Project

Presents graphs that enable you to see a summary of resource usage of your program.

Memory Access Errors

Analyzes the program as it runs to detect memory access errors and memory leaks.

Data Races and Deadlocks Detection

Analyzes the program as it runs to detect actual and potential data races and deadlocks among the threads.

When you profile your project and choose Monitor Project, the Run Monitor window opens to display the output of the low-impact tools for CPU Usage, Memory Usage, and Thread Usage.

The following figure shows the IDE with the Run Monitor tools.

image:Screen capture of the IDE with Run Monitor tools

Additional tools for more detailed profiling have a greater performance impact on the system and the application, so those tools do not run automatically when you run Monitor Project. The advanced tools are linked to the Run Monitor tools and can be launched easily by clicking buttons to see Hot Spots, Memory Leaks, and Sync Problems.

The Data Races and Deadlocks Detection tool uses the same underlying technology as Thread Analyzer, described earlier in this document. The tool adds instrumentation to your threaded program and then analyzes the program as it runs to detect actual and potential data races and deadlocks among the threads. To start the tool, click the Profile Project button, select Data Races and/or Deadlocks, specify options for data collection, and click Start.

The following figure shows the Data Races and Deadlocks Detection tool after it has detected data races.

image:Screen capture of IDE with Data Race Detection running

If you click the details link in the Data Race Detection window, the Thread Details window opens to show where the data races occur. You can double-click the threads in the Thread Details window to open the source file where the problem occurs and go to the affected line of code.

The Memory Access Error tool uses the same underlying technology as discover, described earlier. The tool instruments your program and then analyzes the program as it runs to detect memory access errors and memory leaks. To start the tool, click the Profile Project button, select Memory Access Error, specify options for data collection, and click Start. The memory access error types are displayed in the Memory Analysis window. When you click on an error type, the errors of that type are displayed in the Memory Analysis Tool window, where you can see the call stack for each error.

The following figure shows the Memory Access Error tool after it has detected memory access errors.

image:Screen capture of IDE with Memory Access Errors running

For information about using the profiling tools, see the IDE integrated help, which you can access by pressing the F1 key or through the Help menu in the IDE. See "Profiling C/C++/Fortran Applications", "Detecting Data Races and Deadlocks", and "Finding Memory Access Errors in Your Project" in the help Contents tab.