Sun HPC ClusterTools™ 6 Software Performance Guide





Code Samples


1. Quick Reference

Compilation and Linking


Analyzer Profiling

Job Launch on a Multinode Cluster

MPI Programming Tips

2. Introduction: The Sun HPC ClusterTools Solution

Sun HPC Hardware




Sun HPC ClusterTools Software


Cluster Runtime Environment

3. Choosing Your Programming Model and Hardware

Starting Out

Programming Models


Amdahl's Law

Scaling Laws of Algorithms

Characterizing Platforms

Basic Hardware Factors

Other Factors

4. Performance Programming

General Good Programming

Clean Programming

Optimizing Local Computation

Optimizing MPI Communications

Reducing Message Volume

Reducing Serialization

Load Balancing



Nonblocking Operations


Sun MPI Collectives

Contiguous Data Types

Special Considerations for Message Passing Over TCP

MPI Communications Case Study

Algorithms Used

Algorithm 1

Algorithm 2

Algorithm 3

Algorithm 4

Algorithm 5

Making a Complete Program

Timing Experiments With the Algorithms

Baseline Results

Directed Polling

Increasing Sun MPI Internal Buffering

Use of MPI_Testall

5. One-Sided Communication

Introducing One-Sided Communication

Comparing Two-Sided and One-Sided Communications

Basic Sun MPI Performance Advice

Case Study: Matrix Transposition

Test Program A

Test Program B

Test Program C

Test Program D

Utility Routines


6. Compilation and Linking

Compiler Version

The mp* Utilities

The -fast Switch

The -xarch Switch

The -xalias Switch

The -g Switch

Other Useful Switches

7. Runtime Considerations and Tuning

Running on a Dedicated System

Setting Sun MPI Environment Variables

Are You Running on a Dedicated System?

Does the Code Use System Buffers Safely?

Are You Willing to Trade Memory for Performance?

Do You Want to Initialize Sun MPI Resources?

Is More Runtime Diagnostic Information Needed?

Launching Jobs on a Multinode Cluster

Minimizing Communication Costs

Load Balancing

Controlling Bisection Bandwidth

Considering the Role of I/O Servers

Running Jobs in the Background

Limiting Core Dumps

Using Line-Buffered Output

Multinode Job Launch Under CRE

Collocal Blocks of Processes

Multithreaded Job

Round-Robin Distribution of Processes

Detailed Mapping

8. Profiling

General Profiling Methodology

Basic Approaches

MPProf Profiling Tool

Sample MPProf Output


Load Balance

Sun MPI Environment Variables

Breakdown by MPI Routine

Time Dependence


Multithreaded Programs

The mpdump Utility

Managing Disk Files

Incorporating Environment Variable Suggestions

Performance Analyzer Profiling of Sun MPI Programs

Data Collection

Data Volume

Data Organization


Other Data Collection Issues

Analyzing Profiling Data

Case Study

Overview of Functions

MPI Wait Times

Other Profiling Approaches

Using the MPI Profiling Interface

Inserting MPI Timer Calls

Using the gprof Utility

A. Sun MPI Implementation

Yielding and Descheduling

Progress Engine

Shared-Memory Point-to-Point Message Passing

Postboxes and Buffers

Connection Pools Versus Send-Buffer Pools

Eager Versus Rendezvous

Performance Considerations

Full Versus Lazy Connections

Optimizations for Collective Operations

Network Awareness

Shared-Memory Optimizations


Multiple Algorithms

One-Sided Message Passing Using Remote Process

B. Sun MPI Environment Variables

Yielding and Descheduling


Shared-Memory Point-to-Point Message Passing

Memory Considerations

Performance Considerations


Shared-Memory Collectives

Running Over TCP

Summary Table of Environment Variables
