

Sun HPC ClusterTools™ 6 Software Administrator's Guide

819-4132-10



Contents

Preface

1. Introduction

Sun HPC Clusters

Cluster Runtime Environment Daemons

Sun HPC ClusterTools Software

Sun CRE's Integration With Batch Processing Systems

Sun MPI and MPI I/O

Loadable Protocol Modules

Related Tools

Sun Compilers

Cluster Console Manager

2. Getting Started

Fundamental Sun CRE Concepts

Cluster of Nodes

Security

Partitions

Load Balancing

Jobs and Processes

Communication Protocols

Activating the Sun HPC ClusterTools Software

Activating Specified Nodes From a Central Host

Activating the Local Node

Verifying Basic Functionality

Checking That the Nodes Are Up

Creating a Default Partition

Verifying That Sun CRE Executes Jobs

Verifying MPI Communications

Stopping and Restarting Sun CRE

Stopping and Starting Sun CRE Daemons From a Central Host

Stopping Daemons on Specified Cluster Nodes

Starting Daemons on Specified Cluster Nodes

Stopping and Starting Sun CRE Daemons on the Local Node

Stopping Daemons Locally

Starting Daemons Locally

3. Overview of Administration Controls

Sun CRE Daemons

Master Daemon tm.rdb

Master Daemon tm.mpmd

Master Daemon tm.watchd

Nodal Daemon tm.omd

Nodal Daemon tm.spmd

Spin Daemon spind

mpadmin: Administration Interface

Introduction to mpadmin

Commonly Used mpadmin Options

Understanding Objects, Attributes, and Contexts

Objects and Attributes

Contexts

mpadmin Prompts

Performing Sample mpadmin Tasks

Listing Names of Nodes

Enabling Nodes

Creating and Enabling Partitions

Customizing Cluster Attributes

Quitting mpadmin

Cluster Configuration File hpc.conf

Preparing to Edit hpc.conf

Stopping the Sun CRE Daemons

Copying the hpc.conf Template

Specifying MPI Options

Updating the Sun CRE Database

Authentication and Security

Setting the Sun CRE Cluster Password

Establishing the Current Authentication Method

Setting Up the Default Authentication

Setting Up DES Authentication

Setting Up Kerberos Authentication

4. Cluster Configuration Notes

Nodes

Number of CPUs

Memory

Swap Space

Interconnects

Sun HPC ClusterTools Internode Communication

Administrative Traffic

Sun CRE-Generated Traffic

Sun MPI Interprocess Traffic

Parallel I/O Traffic

Network Characteristics

Bandwidth

Latency

Performance Under Load

Close Integration With Batch Processing Systems

How Close Integration Works

How Close Integration Is Used

To Enable Close Integration

To Configure the hpc.conf File

To Configure the sunhpc.allow File

To Configure PBS For Close Integration

To Configure LSF for Close Integration

To Configure SGE For Close Integration

5. mpadmin: Detailed Description

mpadmin Syntax

Command-Line Options

-c command - Single Command Option

-f file-name - Take Input From a File

-h - Display Help

-q - Suppress Warning Message

-s cluster-name - Connect to Specified Cluster

-V - Version Display Option

mpadmin Objects, Attributes, and Contexts

mpadmin Objects and Attributes

mpadmin Contexts

mpadmin Command Overview

Types of mpadmin Commands

Configuration Control

create

delete

Attribute Control

set

unset

Context Navigation

current

top

up

node

partition

Information Retrieval

dump

list

show

Miscellaneous Commands

connect

echo

help

quit/exit

Additional mpadmin Functionality

Multiple Commands on a Line

Command Abbreviation

Using mpadmin

Note on Naming Partitions and Custom Attributes

Logging In to the Cluster

Customizing Cluster-Level Attributes

default_interactive_partition

logfile

administrator

Managing Nodes

Node Commands

Node Attributes

Deleting Nodes

Managing Partitions

Partition Commands

Viewing Existing Partitions

Creating a Partition

Configuring Partitions

Partition Attributes

Enabling Partitions

Disabling Partitions

Deleting Partitions

Setting Custom Attributes

6. hpc.conf Configuration File

ShmemResource Section

Guidelines for Setting Limits

MPIOptions Section

Setting MPI Spin Policy

CREOptions Section

Specifying the Cluster

Logging System Events

Enabling Core Files

Enabling Authentication

Changing the Maximum Number of Published Names

Identifying A Default Resource Manager

Limiting mprun's Ability to Launch Programs in Batch Mode

HPCNodes Section

PMODULES Section

PM Section

NAME Column

RANK Column

TCP-IP PM Section

Propagating hpc.conf Information

7. Maintenance and Troubleshooting

Cleaning Up Defunct Sun CRE Jobs

Removing Sun CRE Jobs That Have Exited

Removing Sun CRE Jobs That Have Not Terminated

Killing Orphaned Processes

Using Diagnostics

Using Network Diagnostics

Checking Load Averages

Using Interval Diagnostics

Interpreting Sun CRE Error Messages

Anticipating Common Problems

Understanding Protocol-Related Errors

Errors When Sun CRE Daemons Load Protocol Modules

Errors When Protocol Modules Discover Interfaces

Recovering From System Failure

To Reboot Sun CRE:

A. Cluster Console Manager Tools

Cluster Console Manager

Launching Cluster Console Tools

Common Window

Hosts Menu

Select Hosts Dialog

To Add a Single Node

To Add All Nodes in a Cluster

To Remove a Node

Options Menu

Help Menu

Text Field

Term Windows

Using the Cluster Console

Administering Configuration Files

The clusters File

The serialports File

Index