Program Performance Analysis Tools

Forte Developer 7

816-2458-10



Contents

Before You Begin

How This Book Is Organized

Typographic Conventions

Shell Prompts

Accessing Forte Developer Development Tools and Man Pages

Accessing Forte Developer Documentation

Accessing Related Solaris Documentation

Sending Your Comments

1. Overview of Program Performance Analysis Tools

2. Learning to Use the Performance Tools

Setting Up the Examples for Execution

System Requirements

Choosing Alternative Compiler Options

Basic Features of the Performance Analyzer

Example 1: Basic Performance Analysis

Collecting Data for synprog

Simple Metric Analysis

Extension Exercise for Simple Metric Analysis

Metric Attribution and the gprof Fallacy

The Effects of Recursion

Loading Dynamically Linked Shared Objects

Descendant Processes

Example 2: OpenMP Parallelization Strategies

Collecting Data for omptest

Comparing Parallel Sections and Parallel Do Strategies

Comparing Critical Section and Reduction Strategies

Example 3: Locking Strategies in Multithreaded Programs

Collecting Data for mttest

How Locking Strategies Affect Wait Time

How Data Management Affects Cache Performance

Extension Exercises for mttest

Example 4: Cache Behavior and Optimization

Collecting Data for cachetest

Execution Speed

Program Structure and Cache Behavior

Program Optimization and Performance

3. Performance Data

What Data the Collector Collects

Clock Data

Hardware-Counter Overflow Data

Synchronization Wait Tracing Data

Heap Tracing (Memory Allocation) Data

MPI Tracing Data

Global (Sampling) Data

How Metrics Are Assigned to Program Structure

Function-Level Metrics: Exclusive, Inclusive, and Attributed

Interpreting Function-Level Metrics: An Example

How Recursion Affects Function-Level Metrics

4. Collecting Performance Data

Preparing Your Program for Data Collection and Analysis

Use of System Libraries

Use of Signal Handlers

Use of setuid

Controlling Data Collection From Your Program

Dynamic Functions and Modules

Compiling and Linking Your Program

Source Code Information

Static Linking

Optimization

Intermediate Files

Limitations on Data Collection

Limitations on Clock-based Profiling

Limitations on Collection of Tracing Data

Limitations on Hardware-Counter Overflow Profiling

Limitations on Data Collection for Descendant Processes

Limitations on Java Profiling

Where the Data Is Stored

Experiment Names

Moving Experiments

Estimating Storage Requirements

Collecting Data Using the collect Command

Data Collection Options

Experiment Control Options

Output Options

Other Options

Obsolete Options

Collecting Data From the Integrated Development Environment

Collecting Data Using the dbx collector Subcommands

Data Collection Subcommands

Experiment Control Subcommands

Output Subcommands

Information Subcommands

Obsolete Subcommands

Collecting Data From a Running Process

Collecting Data From MPI Programs

Storing MPI Experiments

Running the collect Command Under MPI

Collecting Data by Starting dbx Under MPI

5. The Performance Analyzer Graphical User Interface

Running the Performance Analyzer

The Performance Analyzer Displays

The Functions Tab

The Callers-Callees Tab

The Source Tab

The Disassembly Tab

The Timeline Tab

The LeakList Tab

The Statistics Tab

The Experiments Tab

The Summary Tab

The Event Tab

The Legend Tab

Using the Performance Analyzer

Comparing Metrics

Selecting Experiments

Selecting the Data to Be Displayed

Setting Defaults

Searching for Names or Metric Values

Generating and Using a Mapfile

6. The er_print Command Line Performance Analysis Tool

er_print Syntax

Metric Lists

Function List Commands

Callers-Callees List Commands

Source and Disassembly Listing Commands

Memory Allocation List Commands

Filtering Commands

Selection Lists

Selection Commands

Listing of Selections

Metric List Commands

Defaults Commands

Output Commands

Other Display Commands

Mapfile Generation Command

Control Commands

Information Commands

Obsolete Commands

7. Understanding the Performance Analyzer and Its Data

Interpreting Performance Metrics

Clock-Based Profiling

Synchronization Wait Tracing

Hardware-Counter Overflow Profiling

Heap Tracing

MPI Tracing

Call Stacks and Program Execution

Single-Threaded Execution and Function Calls

Explicit Multithreading

Parallel Execution and Compiler-Generated Body Functions

Incomplete Stack Unwinds

Mapping Addresses to Program Structure

The Process Image

Load Objects and Functions

Aliased Functions

Non-Unique Function Names

Static Functions From Stripped Shared Libraries

Fortran Alternate Entry Points

Cloned Functions

Inlined Functions

Compiler-Generated Body Functions

Outline Functions

Dynamically Compiled Functions

The <Unknown> Function

The <Total> Function

Annotated Code Listings

Annotated Source Code

Annotated Disassembly Code

8. Manipulating Experiments and Viewing Annotated Code Listings

Manipulating Experiments

Viewing Annotated Code Listings With er_src

Other Utilities

The er_archive Utility

The er_export Utility

A. Profiling Programs With prof, gprof, and tcov

Using prof to Generate a Program Profile

Using gprof to Generate a Call Graph Profile

Using tcov for Statement-Level Analysis

Creating tcov Profiled Shared Libraries

Locking Files

Errors Reported by tcov Runtime Functions

Using tcov Enhanced for Statement-Level Analysis

Creating Profiled Shared Libraries for tcov Enhanced

Locking Files

tcov Directories and Environment Variables

Index