Sun HPC ClusterTools 8.2.1c Software

User’s Guide

821-1319-10



Contents

Preface

1. Introduction to Sun HPC ClusterTools Software

Supported Configurations

Open Runtime Environment (ORTE)

Executing Programs With mpirun

Integration With Distributed Resource Management Systems

Open MPI Features

Debugging With TotalView

Performance Analysis With Sun Studio Analyzer

Communications Failover Feature In Multi-Rail Infiniband Configurations

2. Fundamental Concepts

Clusters and Nodes

Processes

How Programs Are Launched

How the Open MPI Environment Is Integrated With Distributed Resource Management Systems

Using Sun Grid Engine With ORTE

Submitting Jobs Under Sun Grid Engine Integration

MCA Parameters

How ORTE Works With Zones in the Solaris 10 Operating System

3. Setting Up Your Environment

Prerequisites

Command and Man Page Paths

Setting Up Your Path

Core Files

4. Compiling MPI Programs

Supported Compilers

Using the Compiler Wrappers

Using Non-Default Error Handlers

Compiling Fortran 90 Programs

5. Running Programs With the mpirun Command

About the mpirun Command

Syntax for the mpirun Command

mpirun Options

Using Environment Variables With the mpirun Command

Using MCA Parameters With the mpirun Command

mpirun Command Examples

To Run a Program With Default Settings

To Run Multiple Processes

To Direct mpirun By Using an Appfile

Mapping MPI Processes to Nodes

Specifying Available Hosts

Specifying Hosts By Using a Hostfile

Specifying Hosts By Using the --host Option

To Specify Multiple Slots Using the --host Option

Excluding Hosts From Scheduling By Using the --host Option

Oversubscribing Nodes

Scheduling Policies

Scheduling By Slot

To Specify By-Slot Scheduling

Scheduling By Node

To Specify By-Node Scheduling

Comparing By-Slot to By-Node Scheduling

Binding MPI Processes

mpirun Options

MCA Parameters

Rankfiles

Controlling Input/Output

To Redirect Standard I/O

Controlling Other Job Attributes

To Change the Working Directory

To Specify Debugging Output

To Display Command Help (-h)

Submitting Jobs Under Sun Grid Engine Integration

Defining Parallel Environment (PE) and Queue

To Use PE Commands

To Use Queue Commands

Submitting Jobs in Interactive Mode

To Set the Interactive Display

To Submit Jobs Interactively

To Verify That Sun Grid Engine Is Running

To Start an Interactive Session Using qrsh

Using MPI Client/Server Applications

To Launch the Client/Server Job

Using Name Publishing

Troubleshooting Client/Server Jobs

Handling Network Failures In Multi-Rail Infiniband Configurations

Optimizing Failover Timing

Viewing Failures

Forcing Failovers for Port Event Errors

Standard Out Example

Syslog Example

For More Information

6. Running Programs With mpirun in Distributed Resource Management Systems

mpirun Options for Third-Party Resource Manager Integration

Checking Your Open MPI Configuration

To Check for rsh/ssh

To Check for PBS/Torque

To Check for Sun Grid Engine

Running Parallel Jobs in the PBS Environment

To Run an Interactive Job in PBS

To Run a Batch Job in PBS

Running Parallel Jobs in the Sun Grid Engine Environment

Defining Parallel Environment (PE) and Queue

To Use PE Commands

To Use Queue Commands

Submitting Jobs Under Sun Grid Engine Integration

To Set the Interactive Display

To Submit Jobs in Batch Mode

To See a Running Job

To Delete a Running Job

rsh Limitations

Using rsh as the Job Launcher

Using Sun Grid Engine as the Job Launcher

For More Information

7. Using MCA Parameters With mpirun

About the Modular Component Architecture

Open MPI Frameworks

The ompi_info Command

Using the ompi_info Command With MCA Parameters

To List All MCA Parameters

To List All MCA Parameters For a Framework

To Display All MCA Parameters For a Selected Component

Using MCA Parameters

To Set MCA Parameters From the Command Line

Using MCA Parameters As Environment Variables

To Set MCA Parameters in the sh Shell

To Set MCA Parameters in the C Shell

To Specify MCA Parameters Using a Text File

Including and Excluding Components

To Include and Exclude Components Using the Command Line

Using MCA Parameters With Sun Grid Engine

Changing the Default Values in MCA Parameters

For More Information

8. Using the DTrace Utility With Open MPI

Checking the mpirun Privileges

To Determine the Correct Privileges on the Cluster

Running DTrace with MPI Programs

Running an MPI Program Under DTrace

To Trace a Program Using the mpitrace.d Script

To Trace a Parallel Program and Get Separate Trace Files

Attaching DTrace to a Running MPI Program

To Attach DTrace to a Running MPI Program

Simple MPI Tracing

Tracking Down Resource Leaks

Using the DTrace mpiperuse Provider

DTrace Support in the ClusterTools Software

Available mpiperuse Probes

Specifying an mpiperuse Probe in a D Script

Available Arguments

How To Use mpiperuse Probes to See Message Queues

mpiperuse Usage Examples

To Count the Number of Messages To or From a Host

To Count the Number of Messages To or From Specific BTLs

To Obtain Distribution Plots of Message Sizes Sent or Received From a Host

To Create Distribution Plots of Message Sizes By Communicator, Rank, and Send/Receive

A. Troubleshooting

MPI Messages

Standard Error Classes

MPI I/O Error Handling

Exceeding the File Descriptor Limit

Increasing the Number of Available File Descriptors

To View the Hard Limit from the C Shell

To View the Hard Limit from the Bourne Shell

To Increase the Number of File Descriptors

Setting File Descriptor Limits When Using Sun Grid Engine

Index