Sun HPC ClusterTools™ 6 Software User's Guide

819-4131-10



Contents

Figures

Tables

Preface

1. Introduction to Sun HPC ClusterTools Software

Supported Configurations

Sun HPC Cluster Runtime Environment (CRE)

Executing Programs With mprun

Killing Programs

Displaying Job Information

Displaying Node Information

Integration With Distributed Resource Management Systems

Sun MPI and MPI I/O

Debugging With TotalView

MPProf

2. Fundamental Concepts

Clusters and Nodes

Partitions

How Partitions Are Enabled and Selected

Load Balancing

Processes

Jobs

How the CRE Environment Is Integrated With Distributed Resource Management Systems

How Programs Are Launched

How Distributed Resource Managers Work

How CRE Works With Zones in the Solaris 10 Operating System

3. Before You Begin

Prerequisites

Command and Man Page Paths

Authentication Methods

Core Files

4. Running Programs With mprun

Syntax

Pre-Entering Command Options With MPRUN_FLAGS

Environment Variables Available for Scripts

Controlling Where the Program Runs

Precedence for Program Execution

To Run a Program With Default Settings

To Run on a Different Cluster (-c)

To Run on a Different Partition (-p)

To Run as Multiple Processes (-np)

To Share Nodes (-j)

To Enable Process Spawning (-Ys)

To Disable Process Spawning (-Ns)

To Wrap Multiple Processes (-W)

To Settle for Available Processes (-S)

To Include Independent Nodes (-u)

To Combine Process Placement Options

Mapping MPI Processes to Nodes

To Distribute Processes Among Nodes (-l)

To Distribute Processes by Block (-Z and -Zt)

To Distribute Processes by Rank Map (-m)

Restrictions

To Reserve Resources for Spawning or Multithreading (-nr)

To Select Nodes by Resource Requirement (-R)

Examples

Controlling Input/Output

To Redirect Output to mprun (-D)

To Redirect Output to Individual Files (-B)

To Shut Off All Standard I/O (-N)

To Redirect With an Argument Vector (-A)

To Read Standard Input From /dev/null (-n)

To Redirect With a Custom Configuration (-I)

Redirecting Output to Other File Descriptors

Redirecting File Descriptor Output to a File

Maximum Number of File Descriptors

Using mprun Options Instead of Shell Syntax

Controlling Other Job Attributes

To Include Shell-Specific Actions

To Move a Process to the Background

To Change the Working Directory (-C)

To Use a Different User Name (-U)

To Use a Different Group Name (-G)

To Run a Job on a Different Project (-P)

To Specify Verbose Output (-v)

To Display Command Help (-h)

To Display the Command's Version (-V)

To Display Job Status Information (-J)

To Store Job Name in a File (-d)

To Tag Output With Its Rank Number (-o)

Command Reference (mprun)

5. Running Programs With mprun in Distributed Resource Management Systems

mprun Options for DRM Integration

Improper Flag Combinations for Batch Jobs

Running Parallel Jobs in the PBS Environment

To Run an Interactive Job in PBS

To Run a Script Job in PBS

Running Parallel Jobs in the LSF Environment

To Run an Interactive Job in LSF

To Run a Script Job in LSF

Running Parallel Jobs in the SGE Environment

To Run an Interactive Job in SGE

To Run a Script Job in SGE

6. Killing or Sending Signals to Programs With mpkill

What You Can Do

Return Values

To Kill a Running Program

To Remove All Traces of a Job

To Display a List of Supported Signals (-l -d)

To Send a Signal to a Job

7. Displaying Program Information With mpps

What You Can Do

To Display Job Status

To Display Information About Individual Jobs (-J)

To Display Job Name, PID, and Host of Current Job (-b)

To Display Information About All Jobs (-e)

To Display a Job's Start Time (-f)

To Display Job Information by Partition (-A -a)

To Display Job Information by Process (-p -P)

Command Reference (mpps)

8. Profiling Programs With MPPROF

Enabling MPI Profiling

Controlling Data Collection

MPI_PROFDATADIR

MPI_PROFINDEXDIR

MPI_PROFINTERVAL

MPI_PROFMAXFILESIZE

Using mpprof to Generate Reports

mpprof Command Syntax

Generating a Message Passing Report

Reporting on Specific Processes

Reporting Processes That Occur After a Specified Time Interval

To Save Report Output for Later Use

A Sample Report

Using mpdump to Convert Intermediate Binary Files to ASCII Files

The mpdump Command Syntax

A Sample mpdump File

9. Using the DTrace Utility With Sun MPI

mprun Privileges

Running DTrace with MPI Programs

Running an MPI Program Under DTrace

Attaching to MPI Processes

Simple MPI Tracing

Tracking Down Resource Leaks

10. Displaying Information With mpinfo

What You Can Do

To Display Information About Published Names (-T)

To Display Information About Any Cluster (-c)

To Display Information About the Current Cluster (-C)

To Display Information About Individual Partitions (-p)

To Display Information About All Partitions (-P)

To Display Information About Individual Nodes (-n)

To Display Information About All Nodes (-N)

To Display an Online List of Valid Attributes (-lc, -lp, -ln)

To Restrict Output to Individual Attributes (-A)

To Display Information in Verbose Mode (-v)

Command Reference (mpinfo)

A. Troubleshooting

MPI Messages

Error Messages

Warning Messages

Standard Error Classes

MPI I/O Error Handling

Exceeding the File Descriptor Limit

Exceeding the TCP Port Limit

Index