Updated 2007/05/31

Sun[tm] Studio 12: Performance Analyzer Readme

Contents

  A. Introduction
  B. Starting the Performance Analyzer from the IDE
  C. About the Performance Analyzer
  D. New and Changed Features
  E. Software Corrections
  F. Problems and Workarounds
  G. Limitations and Incompatibilities
  H. Documentation Errata
  I. Required Patches

 


A. Introduction

This document contains information about the Performance Analyzer and its accompanying analysis tools.

Product Documentation

Note - If your Sun Studio compilers and tools have not been installed in the default /opt directory, ask your system administrator for the equivalent path on your system.

 


B. Starting the Performance Analyzer from the IDE

To start the Performance Analyzer from the IDE, do one of the following:

To open the Collector window from the IDE, do one of the following:

 


C. About the Performance Analyzer

The Performance Analyzer and analysis tools include commands for the collection and manipulation of program performance data, a graphical user interface, and a command-line interface, er_print, for the display of performance data. The term Collector is used in this document for the tools that collect performance data and their underlying libraries. These tools are the collect command, the dbx collector subcommands, and the performance data collection features in the IDE.

The program performance analysis tools collect statistical profiles of a program's performance and trace calls to critical library routines, and display the data in tabular and graphical form. The collected data is converted into performance metrics. Metrics can be viewed in tabular form at the level of the load-object, function, source line or instruction. The tools provide a means of navigating program structure that is useful for identifying functions and paths within the code that are responsible for resource usage, inefficiencies, or time delays. The Performance Analyzer GUI can also display the performance data in a timeline display.

This release of the Performance Analyzer collector allows profiling of applications written in the Java[tm] programming language, using underlying support in the Java[tm] Virtual Machine (JVM) software. JVM software version 5.0 Update 3 and later 5.0 updates, and version 6.0 and its updates, contain this support. The support may change in future JVM releases. If you use this release of the Performance Analyzer collector with a later JVM release, the collector may not be able to collect profiling information for applications written in the Java programming language. Sun Microsystems expects that future releases of the Performance Analyzer collector will be available to support Java profiling under the changing JVM profiling technology.

The terms "Java Virtual Machine" or "JVM" mean a virtual machine for the Java platform.

 


D. New and Changed Features

This section describes the new and changed features for the Performance Analyzer.

  1. Improved Handling of Descendant Processes
  2. New Index Objects Tab and Report
  3. dbx Profiling on Linux
  4. Improved Handling of MPI Profiling
  5. Java Mode Has Been Replaced by View Mode
  6. Improved OpenMP Support
  7. Other Improvements and New Features

 

  1. Improved Handling of Descendant Processes

    The Analyzer and the er_print utility process an en_desc on|off|reg-exp directive in a .er.rc file. If the directive specifies on, all descendant experiments are read immediately; if the directive specifies off, only the founder experiment is read. If the directive specifies a regular expression, only those descendants whose name or lineage matches the expression are read.
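
The three forms of the directive can be sketched in Python as follows. This is only an illustration of the selection rule described above, not the tools' implementation: the function name select_descendants, the sample experiment names, and the use of unanchored matching (re.search) are all assumptions.

```python
import re


def select_descendants(directive, founder, descendants):
    """Return the experiments that would be read for an en_desc setting.

    directive   -- "on", "off", or a regular expression
    founder     -- name of the founder experiment (always read)
    descendants -- names of the descendant experiments
    """
    if directive == "on":
        return [founder] + list(descendants)
    if directive == "off":
        return [founder]
    pattern = re.compile(directive)
    # Assumption: a descendant is selected when the pattern matches anywhere
    # in its name; the real tools may anchor the match differently.
    return [founder] + [d for d in descendants if pattern.search(d)]


if __name__ == "__main__":
    names = ["_f1.er", "_f1_x1.er", "_x1.er"]  # hypothetical descendant names
    for setting in ("on", "off", "_f1"):
        print(setting, select_descendants(setting, "test.1.er", names))
```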

  2. New Index Objects Tab and Report

    New tabs are available in the Analyzer to show performance data for index objects such as threads and CPUs. Several new er_print commands are available for index objects.

  3. dbx Profiling on Linux

    Profiling under dbx is now supported on Linux.

  4. Improved Handling of MPI Profiling

    Additional variables specifying process rank for OpenMPI versions of MPI are recognized.

  5. Java Mode Has Been Replaced by View Mode

    Java mode has been replaced by View mode. The javamode command is no longer accepted.

  6. Improved OpenMP Support

    The presentation of OpenMP data has been enhanced.

  7. Other Improvements and New Features
    • Improved integration with the Sun Studio IDE.
    • Support for the Sun Studio Thread Analyzer (THA).
    • Improved stack unwind on x86/x64 platforms for both Solaris and Linux.
    • Improved Linux support for: synchronization tracing, MPI tracing with OpenMPI, descendant process support.
    • Improved tracing: interposition only when data is requested gives improved performance when it is not requested.
    • Count data: collect -C {on|static} (SPARC only); requires Cool Tools supplement to Sun Studio 12.
    • Clock-based dataspace profiling.
    • Attach and collect data: collect -P pid

 


E. Software Corrections

This section describes problems that were fixed in this release.

  1. Studio 10 collector for MPI profiling does not create experiment according to MPI rank
  2. Leaklist tab should be shown if and only if one or more expts have leak data
  3. er_print should warn about overflow counts too high or too low
  4. Improve help when user mistypes *metric or *sort command
  5. Print Current Sort whenever we print Current Metrics
  6. Double counting of metrics in source lines

F. Problems and Workarounds

This section discusses known software problems and possible workarounds for those problems.

Some problems are the result of problems in the Solaris OS and can be fixed by installing the appropriate patches. For further information, see the Required Patches section in this readme. Some problems that appear to be in the Performance Analyzer might actually be Collector problems. Problems in the compilers and dbx can also affect the Performance Analyzer. Some issues that are not due to problems in the software are also described in this section.

Problems in the Performance Tools
  1. Java Synchronization Tracing May Have Performance Problems.

    Java synchronization tracing may have performance problems, especially in processing large experiments.

  2. Java Profiling Under dbx is Not Supported.

    Java profiling using the collector command in dbx or the Collector dialog in the Debugger in the Sun Studio IDE is not supported, because the JVM software cannot support both a debugging agent and a profiling agent. (4771337)

  3. Cannot Print Summary Tab or Event Tab Data.

    The Performance Analyzer cannot print the data in the Summary tab or the Event tab. To print summary data for a function or a load object, you can use the er_print command. (4286674)

  4. Legend Panel Not Always Updated When Colors are Changed

    The colors in the Legend panel are not always updated when you change colors in the color chooser for the Timeline tab. (4948522)

  5. Libraries with Different Architectures May Not Be Properly Handled when Archived

    Libraries with the same name but different architecture are not copied correctly when using the collect -A copy command. (4970739)

  6. Cannot Run More than One Experiment from a Single dbx session.

    Attempting to run multiple experiments from one dbx session fails. (4999242)

  7. Collector May Crash

    The collector may crash when programs on Linux use pthread_cancel and a profile tick occurs during the cancel.

Linux-Specific Performance Tools Problems
  1. MP HWC Profiling Under dbx Undercounts Data from Parallel Threads

    The problem occurs on some versions of RedHat Linux and SUSE Linux. Profile interrupts from some threads are dropped. The workaround is to use collect.

  2. Java Profiling Does Not Show Demangled Names from the JVM

    The problem occurs with versions of the JVM that are built with GNU 2.x, which is not a supported compiler. The problem is due to a bug in the demangler that will not be fixed. (no bug number)

  3. Sample Data for MP Programs May Be Incorrect

    CPU times may be inconsistent with sample times. (5025963)

Problems That Can Be Fixed With Solaris OS Patches

The following problems can be fixed by installing the appropriate patches to the Solaris OS. See the Required Patches section in this Readme for more information.

  1. Application Crash during Hardware Counter Profiling

    Under some circumstances hardware counter profiling interrupts trigger an OS bug on UltraSPARC-III processors that can cause the %y register to be corrupted. If the register is live at the time, the application may crash. This is fixed in Solaris 9 OS, update 4. The frequency of the problem is reduced by lower-resolution profiling, and/or the use of only one counter. (4793905)

Other Problems
  1. Clock-Based Profiling Inaccuracies on Loaded Systems

    Profiling an application when there is a load on the system can result in significant undercount of User CPU time, up to 20%. The missing User CPU time shows up as either System CPU time or as Wait-CPU time. The problem occurs only for x86 targets on the Solaris 9 OS. (4509116)

  2. Altered Behavior With Applications That Install Signal Handlers

    Collecting performance data on an application that installs a signal handler can cause altered behavior of the collector or of the application. When such behavior is detected, the collector library records a warning message in the experiment.

    When the collector library is preloaded, the collector's signal handler always re-installs itself as the primary handler, and the signal handler passes on signals that it does not use to any other handler. However, because the collector's signal handler does not interrupt system calls, an application that installs a signal handler that does interrupt system calls can show changed behavior. In the case of the asynchronous I/O library, libaio.so, which uses SIGPROF for asynchronous cancel operations, asynchronous cancel requests arrive late. (4397578)

    If you attach dbx to the application without preloading the collector library, the collector installs its signal handler as the primary handler when collection is enabled. However, any signal handler installed subsequently takes precedence over the collector's signal handler. If this signal handler does not pass on SIGPROF and SIGEMT signals to the collector's signal handler, profiling data is lost.
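
The "pass on" behavior described above can be illustrated with a small Python sketch. It is not the collector's code, and a real application would normally do this in C with sigaction(2). The names collector_handler and install_chaining_handler are hypothetical; the first stands in for a handler that was installed earlier, and the second shows an application handler that does its own work and then forwards the signal rather than swallowing it.

```python
import signal

forwarded = []  # signals that reached the earlier handler


def collector_handler(signum, frame):
    # Stand-in for a handler installed first (hypothetical, not the
    # real collector handler).
    forwarded.append(signum)


def install_chaining_handler(signum, app_action):
    """Install a handler that runs app_action, then passes the signal on
    to the Python-level handler that was installed before it."""
    previous = signal.getsignal(signum)

    def handler(sig, frame):
        app_action(sig)
        # SIG_DFL, SIG_IGN, and None are not callable and are skipped.
        if callable(previous):
            previous(sig, frame)

    signal.signal(signum, handler)
    return handler
```

An application handler installed without the forwarding step would leave the earlier handler uncalled, which corresponds to the lost profiling data described above.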

  3. Data Collection Problems When dbx is Attached to a Process

    If you attach dbx to a running process without preloading the collector library, libcollector.so, there are a number of errors that can occur.

    • You might not be able to collect any data when using synchronization wait tracing, heap tracing, or MPI tracing. Tracing data is collected by interposing on various libraries, and if libcollector.so is not preloaded, the interposition cannot be done.

    • If the program installs a signal handler after dbx is attached to the process, and the signal handler does not pass on the SIGPROF and SIGEMT signals, profiling data and sampling data are lost. (4397578)

    • If the program uses the asynchronous I/O library, libaio.so, clock-based profiling data and sampling data are lost, because libaio.so uses SIGPROF for asynchronous cancel operations.

    • If the program uses the hardware counter library, libcpc.so, hardware counter overflow profiling experiments are corrupted because both the collector and the program are using the library. If the hardware counter library is loaded after dbx is attached to the process, the hardware counter experiment can succeed provided references to the libcpc library functions are resolved by a general search rather than a search in libcpc.so.

    • If the program calls setitimer(2), clock-based profiling experiments can be corrupted because both the collector and the program are using the timer.

  4. Incorrect Values for Wait CPU Metric in Statistics Display and Samples

    Incorrect values for the Wait CPU metric are sometimes recorded in the sample packets and the global statistics. These values appear in the Statistics tab of the Performance Analyzer and affect the display of samples in the Timeline tab. (4615617)

  5. Lost Clock Profiling Data Over a Small Time Period

    Clock profiling data can appear to be lost over a period of several seconds when the system clock is being synchronized with an external source. During this time, the system clock is incremented until it is synchronized. Profile signals are delivered at the set interval, but the time stamp recorded in the profile packets includes any increment that is made between signal deliveries.

  6. Data Collection Aborts With a Stack Overflow.

    Sometimes the Collector can fail with a stack overflow error. This happens because the Collector uses the application's stack and the stack size for the application is too small to support use by the Collector. The workaround is to increase the stack size by at least 8 Kbytes. See the limit(1) man page for details. For parallel applications that use the multitasking library, the stack size for each thread must also be set using the STACKSIZE environment variable.
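
As a rough illustration of the arithmetic behind this workaround, the Python sketch below computes a stack limit raised by 8 Kbytes from the current process limits. It uses the standard resource module rather than the limit command, and the helper name raised_stack_limit is hypothetical. It does not set the STACKSIZE environment variable and does not change any limit itself.

```python
import resource

EXTRA_STACK_BYTES = 8 * 1024  # minimum increase suggested above


def raised_stack_limit(soft, hard, extra=EXTRA_STACK_BYTES):
    """Return a (soft, hard) pair with the soft limit raised by `extra`,
    capped at the hard limit. An unlimited soft limit is left unchanged."""
    if soft == resource.RLIM_INFINITY:
        return soft, hard
    new_soft = soft + extra
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    return new_soft, hard


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    print("current:", (soft, hard))
    print("raised: ", raised_stack_limit(soft, hard))
    # A launcher could pass the raised pair to resource.setrlimit() before
    # starting the profiled program; that step is omitted here.
```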

  7. Incomplete Experiment When Program Calls exec.

    When the program on which performance data is being collected successfully calls exec(2) or any of its variants, the experiment is terminated abnormally. Although the experiment can still be read by the Performance Analyzer or er_print, you should run er_archive(1) for the experiment on the computer on which the data was collected, to ensure that the load objects used by the program were archived correctly.

  8. False Recursion Shown With Tail Call Optimization.

    For some functions that make tail-call optimized calls from a shared object (PIC code) and require the determination of the global offset table address in order to reference a global variable, the optimized code is incorrectly reported as recursive, and the real caller is lost. (4656890)

  9. False Recursion Shown With Outline Function Optimization.

    For functions that are optimized by producing an outline function for seldom-executed code, false recursion might be shown due to the inability of the tools to determine the outline function's return address. (4800953)

Check the support page on the SDN Sun Studio portal, http://developers.sun.com/sunstudio/support/, for the latest information.

 


G. Limitations and Incompatibilities 

This section discusses limitations and incompatibilities with systems or other software.

  1. Performance Analyzer Requirements

    The Performance Analyzer requires the Java 2 Software Development Kit, Standard Edition (J2SE), in a version no earlier than 1.4.2_02. If you use an earlier version, the Performance Analyzer runs, but could fail, not function correctly, or perform poorly. The Analyzer does not run with the 64-bit J2SE 5.0 Update 3 technology. If you are using the J2SE 5.0 technology, you must have the 32-bit version available to run the Analyzer.

  2. Profiling Applications Written in the Java Programming Language

    To collect Java-mode or machine-mode profiling data on an application written in the Java programming language you must use a version of the Java[tm] 2 Software Development Kit, Standard Edition, no earlier than 1.5.0_03. There are bugs in the JVM software that may cause program failure with any version earlier than 1.5.0_03.

    For best results for Java-mode profiling, you should use the version of the Java 2 Software Development Kit, Standard Edition, available as an install option with this Sun Studio release.

    Java profiling is not supported under dbx.

  3. Hardware Counter Overflow Profiling

    Hardware counter overflow profiling is not supported on UltraSPARC® processors earlier than the UltraSPARC® III series.

    The Collector cannot collect hardware counter overflow data if cputrack is running on the system because cputrack takes control of the hardware counters.

    Hardware counter overflow profiling on Linux requires that you install the latest Perfctr patch, which you can download at http://user.it.uu.se/~mikpe/linux/perfctr/2.6/perfctr-2.6.15.tar.gz. Instructions for installing the patch are included in the tar file.

    Hardware counter overflow profiling does not work on Intel Core 2 Duo processors and Intel Xeon 5100 series processors due to a Solaris bug (6537929). There is no workaround. This problem can only be resolved when the bug is fixed in a future Solaris update or patch.

  4. Library Interposition

    The Collector interposes on various system functions, including signal handling, fork and exec calls, the hardware counter library and some timing functions, to ensure that it can collect valid data. If a program uses any of these functions, its behavior can change. In particular, the profiling timer and the hardware counters are not available to a program when profiling is enabled, and system calls are not interrupted to deliver signals. This behavior affects the use of the asynchronous I/O library, libaio.so, which does interrupt system calls to deliver signals. These interpositions do not take place if you attach dbx to a running process without preloading the collector library, libcollector.so, and then enable data collection.

  5. Finding Source and Object Files

    The executable name that is generated when the debugger is attached to a process can be a relative path, not an absolute path, or the path, even though absolute, might not be accessible to the Performance Analyzer. Similar problems can arise with object files loaded from an archive (.a).

    The Performance Analyzer extracts the basename (the name following the last "/") from the recorded path in the executable or object file, and searches for the files as follows:

    1. It searches for a file with the basename under directories given by addpath or setpath commands, or as set from the Search Path tab in the Set Data Presentation dialog box, in the order given. The default setpath is:
      • The experiment archive directories
      • The current working directory, that is, as ./<basename>
    2. It searches for the file using the original recorded path.

    If the Analyzer does not find the file, it generates an error or warning, showing the path as it originally appeared in the experiment.

    To enable the Performance Analyzer to find the source file, you can add the directory containing the file to the search path, or you can set up a symbolic link from the current directory that points to the actual location of the file, or you can copy the file into the experiment.
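
The search order described in this item can be sketched in Python as follows. This is a simplified model, not the Analyzer's implementation: the function name find_file and the parameter search_dirs (standing in for the directories given by addpath or setpath) are hypothetical, and the example directory names are placeholders.

```python
import os


def find_file(recorded_path, search_dirs):
    """Return the first existing candidate for recorded_path, or None.

    The basename is looked up under each directory in search_dirs, in the
    order given; the original recorded path is tried last.
    """
    basename = recorded_path.rsplit("/", 1)[-1]  # name after the last "/"
    for directory in search_dirs:
        candidate = os.path.join(directory, basename)
        if os.path.isfile(candidate):
            return candidate
    if os.path.isfile(recorded_path):
        return recorded_path
    return None  # the Analyzer would report an error or warning here


if __name__ == "__main__":
    # Placeholder paths: an experiment archive directory, then the
    # current working directory.
    print(find_file("/build/src/main.c", ["test.1.er/archives", "."]))
```

Because the search path is consulted before the recorded path, a copy of the file placed in a search-path directory takes precedence over the file at its original location.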

  6. Experiment Incompatibility

    The Performance Analyzer cannot load experiments created with versions of the Collector prior to the Forte[tm] Developer 7 software release.

  7. Use of setuid

    If the process calls setuid or executes setuid files, the Collector can fail to create an experiment due to permission problems.

    See the collect(1) man page for more information about restrictions on data collection.

  8. Hardware Counters using Pentium Processors

    Pentium IV processors with HyperThreading Technology have only one set of hardware counters per physical processor. To use hardware counters on a system with Pentium IV HT processors, a system administrator must first take the processors in the system off-line until each physical processor has only one hardware thread online. See the -v and -p options to psrinfo(1M) and the -f option to psradm(1M) for more information.

    When using multiple hardware counters on a Pentium IV processor, some combinations of counters cannot be bound due to resource constraints. For example, the branch_retired metric cannot be measured on registers 12 and 13 simultaneously because both counters require the Pentium IV CRU_ESCR2 ESCR to measure this event. See the processor documentation for more details.

 


H. Documentation Errata

An updated Performance Analyzer manual is not yet available. Check the Sun Studio Developer's portal, http://developers.sun.com/sunstudio/, for the latest information.

 


I. Required Patches

Some of the problems with the performance analysis tools originate in bugs in the Solaris OS. To fix these problems, you should install the relevant patches. To obtain a list of required patches, you can type the collect command at the command prompt with no arguments. The patches can be downloaded from http://sunsolve.sun.com.

The following problems can be encountered by the Collector and Performance Analyzer when the patches are not installed:

 


Copyright © 2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.