Sun HPC ClusterTools 6 Software Administrator's Guide
819-4132-10
Preface
1. Introduction
Sun HPC Clusters
Cluster Runtime Environment Daemons
Sun HPC ClusterTools Software
Sun CRE's Integration With Batch Processing Systems
Sun MPI and MPI I/O
Loadable Protocol Modules
Related Tools
Sun Compilers
Cluster Console Manager
2. Getting Started
Fundamental Sun CRE Concepts
Cluster of Nodes
Security
Partitions
Load Balancing
Jobs and Processes
Communication Protocols
Activating the Sun HPC ClusterTools Software
Activating Specified Nodes From a Central Host
Activating the Local Node
Verifying Basic Functionality
Checking That the Nodes Are Up
Creating a Default Partition
Verifying That Sun CRE Executes Jobs
Verifying MPI Communications
Stopping and Restarting Sun CRE
Stopping and Starting Sun CRE Daemons From a Central Host
Stopping Daemons on Specified Cluster Nodes
Starting Daemons on Specified Cluster Nodes
Stopping and Starting Sun CRE Daemons on the Local Node
Stopping Daemons Locally
Starting Daemons Locally
3. Overview of Administration Controls
Sun CRE Daemons
Master Daemon tm.rdb
Master Daemon tm.mpmd
Master Daemon tm.watchd
Nodal Daemon tm.omd
Nodal Daemon tm.spmd
Spin Daemon spind
mpadmin: Administration Interface
Introduction to mpadmin
Commonly Used mpadmin Options
Understanding Objects, Attributes, and Contexts
Objects and Attributes
Contexts
mpadmin Prompts
Performing Sample mpadmin Tasks
Listing Names of Nodes
Enabling Nodes
Creating and Enabling Partitions
Customizing Cluster Attributes
Quitting mpadmin
Cluster Configuration File hpc.conf
Preparing to Edit hpc.conf
Stopping the Sun CRE Daemons
Copying the hpc.conf Template
Specifying MPI Options
Updating the Sun CRE Database
Authentication and Security
Setting the Sun CRE Cluster Password
Establishing the Current Authentication Method
Setting Up the Default Authentication
Setting Up DES Authentication
Setting Up Kerberos Authentication
4. Cluster Configuration Notes
Nodes
Number of CPUs
Memory
Swap Space
Interconnects
Sun HPC ClusterTools Internode Communication
Administrative Traffic
Sun CRE-Generated Traffic
Sun MPI Interprocess Traffic
Parallel I/O Traffic
Network Characteristics
Bandwidth
Latency
Performance Under Load
Close Integration With Batch Processing Systems
How Close Integration Works
How Close Integration Is Used
To Enable Close Integration
To Configure the hpc.conf File
To Configure the sunhpc.allow File
To Configure PBS for Close Integration
To Configure LSF for Close Integration
To Configure SGE for Close Integration
5. mpadmin: Detailed Description
mpadmin Syntax
Command-Line Options
-c command - Single Command Option
-f file-name - Take Input From a File
-h - Display Help
-q - Suppress Warning Message
-s cluster-name - Connect to Specified Cluster
-V - Version Display Option
mpadmin Objects, Attributes, and Contexts
mpadmin Objects and Attributes
mpadmin Contexts
mpadmin Command Overview
Types of mpadmin Commands
Configuration Control
create
delete
Attribute Control
set
unset
Context Navigation
current
top
up
node
partition
Information Retrieval
dump
list
show
Miscellaneous Commands
connect
echo
help
quit/exit
Additional mpadmin Functionality
Multiple Commands on a Line
Command Abbreviation
Using mpadmin
Note on Naming Partitions and Custom Attributes
Logging In to the Cluster
Customizing Cluster-Level Attributes
default_interactive_partition
logfile
administrator
Managing Nodes
Node Commands
Node Attributes
Deleting Nodes
Managing Partitions
Partition Commands
Viewing Existing Partitions
Creating a Partition
Configuring Partitions
Partition Attributes
Enabling Partitions
Disabling Partitions
Deleting Partitions
Setting Custom Attributes
6. hpc.conf Configuration File
ShmemResource Section
Guidelines for Setting Limits
MPIOptions Section
Setting MPI Spin Policy
CREOptions Section
Specifying the Cluster
Logging System Events
Enabling Core Files
Enabling Authentication
Changing the Maximum Number of Published Names
Identifying a Default Resource Manager
Limiting mprun's Ability to Launch Programs in Batch Mode
HPCNodes Section
PMODULES Section
PM Section
NAME Column
RANK Column
TCP-IP PM Section
Propagating hpc.conf Information
7. Maintenance and Troubleshooting
Cleaning Up Defunct Sun CRE Jobs
Removing Sun CRE Jobs That Have Exited
Removing Sun CRE Jobs That Have Not Terminated
Killing Orphaned Processes
Using Diagnostics
Using Network Diagnostics
Checking Load Averages
Using Interval Diagnostics
Interpreting Sun CRE Error Messages
Anticipating Common Problems
Understanding Protocol-Related Errors
Errors When Sun CRE Daemons Load Protocol Modules
Errors When Protocol Modules Discover Interfaces
Recovering From System Failure
To Reboot Sun CRE
A. Cluster Console Manager Tools
Cluster Console Manager
Launching Cluster Console Tools
Common Window
Hosts Menu
Select Hosts Dialog
To Add a Single Node
To Add All Nodes in a Cluster
To Remove a Node
Options Menu
Help Menu
Text Field
Term Windows
Using the Cluster Console
Administering Configuration Files
The clusters File
The serialports File
Index
Copyright © 2006, Sun Microsystems, Inc. All Rights Reserved.