Program Performance Analysis Tools

Sun™ ONE Studio 8

817-0922-10



Contents

Before You Begin

How This Book Is Organized

Typographic Conventions

Shell Prompts

Accessing Compiler Collection Tools and Man Pages

Accessing Compiler Collection Documentation

Accessing Related Solaris Documentation

Resources for Developers

Contacting Sun Technical Support

Sun Welcomes Your Comments

1. Overview of Program Performance Analysis Tools

2. Learning to Use the Performance Tools

Setting Up the Examples for Execution

System Requirements

Choosing Alternative Compiler Options

Basic Features of the Performance Analyzer

Example 1: Basic Performance Analysis

Collecting Data for synprog

Simple Metric Analysis

Extension Exercise for Simple Metric Analysis

Metric Attribution and the gprof Fallacy

The Effects of Recursion

Loading Dynamically Linked Shared Objects

Descendant Processes

Example 2: Analyzing the Performance of a Mixed Java/C++ Application

jsynprog Program Structure and Control Flow

Example 3: OpenMP Parallelization Strategies

Collecting Data for omptest

Comparing Parallel Sections and Parallel Do Strategies

Comparing Critical Section and Reduction Strategies

Example 4: Locking Strategies in Multithreaded Programs

Collecting Data for mttest

How Locking Strategies Affect Wait Time

How Data Management Affects Cache Performance

Extension Exercises for mttest

Example 5: Cache Behavior and Optimization

Collecting Data for cachetest

Execution Speed

Program Structure and Cache Behavior

Program Optimization and Performance

3. Performance Data

What Data the Collector Collects

Clock Data

Hardware-Counter Overflow Profiling Data

Synchronization Wait Tracing Data

Heap Tracing (Memory Allocation) Data

MPI Tracing Data

Global (Sampling) Data

How Metrics Are Assigned to Program Structure

Function-Level Metrics: Exclusive, Inclusive, and Attributed

Interpreting Function-Level Metrics: An Example

How Recursion Affects Function-Level Metrics

4. Collecting Performance Data

Compiling and Linking Your Program

Source Code Information

Static Linking

Optimization

Intermediate Files

Compiling Java Programs

Preparing Your Program for Data Collection and Analysis

Use of System Libraries

Use of Signal Handlers

Use of setuid

Controlling Data Collection From Your Program

Dynamic Functions and Modules

Limitations on Data Collection

Limitations on Clock-based Profiling

Limitations on Collection of Tracing Data

Limitations on Hardware-Counter Overflow Profiling

Runtime Distortion and Dilation With HWC Overflow Profiling

Limitations on Data Collection for Descendant Processes

Limitations on Java Profiling

Runtime Performance Distortion and Dilation for Applications Written in the Java Programming Language

Where the Data Is Stored

Experiment Names

Moving Experiments

Estimating Storage Requirements

Collecting Data Using the collect Command

Data Collection Options

Experiment Control Options

Output Options

Other Options

Collecting Data Using the dbx collector Subcommands

Data Collection Subcommands

Experiment Control Subcommands

Output Subcommands

Information Subcommands

Collecting Data From a Running Process

Collecting Data From MPI Programs

Storing MPI Experiments

Running the collect Command Under MPI

Collecting Data by Starting dbx Under MPI

5. The Performance Analyzer Graphical User Interface

Running the Performance Analyzer

Starting the Analyzer from the Command Line

Starting the Analyzer from the IDE

The Performance Analyzer Displays

The Menu Bar

The Toolbar

The Functions Tab

The Callers-Callees Tab

The Source Tab

The Lines Tab

The Disassembly Tab

The PCs Tab

The Data Objects Tab

The Timeline Tab

The LeakList Tab

The Statistics Tab

The Experiments Tab

The Summary Tab

The Event Tab

The Legend Tab

The Leak Tab

Using the Performance Analyzer

Comparing Metrics

Selecting Experiments

Selecting the Data to Be Displayed

Setting Defaults

Searching for Names or Metric Values

Generating and Using a Mapfile

6. The er_print Command Line Performance Analysis Tool

er_print Syntax

Metric Lists

Commands Controlling the Function List

Command Controlling the Callers-Callees List

Commands Controlling the Leak and Allocation Lists

Commands Controlling the Source and Disassembly Listings

Commands Controlling the Data Space List

Commands Listing Experiments, Samples, Threads, and LWPs

Commands Controlling Selections

Commands Controlling Load Object Selection

Commands That List Metrics

Commands That Control Output

Commands That Print Other Displays

Default-Setting Commands

Default-Setting Commands Affecting Only the Performance Analyzer

Miscellaneous Commands

7. Understanding the Performance Analyzer and Its Data

How Data Collection Works

Experiment Format

Recording Experiments

Interpreting Performance Metrics

Clock-Based Profiling

Synchronization Wait Tracing

Hardware-Counter Overflow Profiling

Heap Tracing

MPI Tracing

Call Stacks and Program Execution

Single-Threaded Execution and Function Calls

Explicit Multithreading

Overview of Java Technology-Based Software Execution

Java Processing Representations

Parallel Execution and Compiler-Generated Body Functions

Incomplete Stack Unwinds

Mapping Addresses to Program Structure

The Process Image

Load Objects and Functions

Aliased Functions

Non-Unique Function Names

Static Functions From Stripped Shared Libraries

Fortran Alternate Entry Points

Cloned Functions

Inlined Functions

Compiler-Generated Body Functions

Outline Functions

Dynamically Compiled Functions

The <Unknown> Function

The <no Java callstack recorded> Function

The <Total> Function

Mapping Data Addresses to Program Data Objects

Dataobject Descriptors

Annotated Code Listings

Annotated Source Code

Annotated Disassembly Code

8. Manipulating Experiments and Viewing Annotated Code Listings

Manipulating Experiments

Viewing Annotated Code Listings With er_src

Other Utilities

The er_archive Utility

The er_export Utility

A. Profiling Programs With prof, gprof, and tcov

Using prof to Generate a Program Profile

Using gprof to Generate a Call Graph Profile

Using tcov for Statement-Level Analysis

Creating tcov Profiled Shared Libraries

Locking Files

Errors Reported by tcov Runtime Functions

Using tcov Enhanced for Statement-Level Analysis

Creating Profiled Shared Libraries for tcov Enhanced

Locking Files

tcov Directories and Environment Variables

Index