Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE
Book Information
Preface
Chapter 1 Introduction
Sun HPC System Hardware
The Cluster Runtime Environment
Sun HPC ClusterTools 3.0 Software
Sun MPI and MPI I/O
Parallel File System
Prism
Sun S3L
Sun Compilers
Cluster Console Manager
Switch Management Agent
Chapter 2 Getting Started
Start the CRE Daemons
Verify Basic Functionality
Run mpinfo
Create a Default Partition
Verify That CRE Executes Jobs
Running Basic MPI Tests
Verify MPI Communications
Stopping and Restarting the CRE
To Shut Down the CRE Without Shutting Down Solaris
To Restart the CRE Without Rebooting Solaris
Overview of the CRE Daemons
The Role of tm.rdb
The Role of tm.mpmd
The Role of tm.watchd
The Role of tm.omd
The Role of tm.spmd
Chapter 3 Cluster Administration: A Primer
CRE Environment Variables
SUNHPC_CLUSTER
SUNHPC_CONFIG_DIR
SUNHPC_PART
mpadmin: Administration Interface
Introduction to mpadmin
mpadmin Syntax
-c command - Single Command Option
-f file-name - Take Input From a File
-h - Display Help
-q - Suppress Warning Message
-s cluster-name - Connect to Specified Cluster
-V - Version Display Option
mpadmin Objects, Attributes, and Contexts
mpadmin Objects and Attributes
mpadmin Contexts
mpadmin Prompts
List Names of Nodes
Enabling Nodes
Creating and Enabling Partitions
Example: Creating a Two-Node Partition
Example: Two Partitions Sharing a Node
Shared vs. Dedicated Partitions
Customizing Cluster Administration
Changing the logfile Attribute
Changing the administrator Attribute
Quitting mpadmin
hpc.conf: Cluster Configuration File
Prepare to Edit hpc.conf
Stop the CRE Daemons
Copy the hpc.conf Template
Create PFS I/O Servers
Create PFS File Systems
Parallel File System Name
Server Node Hostnames
Storage Device Names
Thread Limits
Set Up Network Interfaces
Interface Names
Rank Attribute
MTU Attribute
Stripe Attribute
Protocol Attribute
Latency Attribute
Bandwidth Attribute
Specify MPI Options
Update the CRE Database
Chapter 4 PFS Operations
Starting PFS I/O Daemons
Starting PFS Proxy Daemons
Stopping PFS I/O Daemons
Stopping PFS Proxy Daemons
PFS Node or I/O Daemon Failures
Chapter 5 Cluster Configuration Notes
Nodes
Number of CPUs
Memory
Swap Space
Interconnects
ClusterTools Internode Communication
Administrative Traffic
CRE-Generated Traffic
Sun MPI Interprocess Traffic
Prism Traffic
Parallel I/O Traffic
Network Characteristics
Bandwidth
Latency
Performance Under Load
Storage and the Parallel File System
PFS on SMPs and Clusters
PFS Using Individual Disks or Storage Arrays
PFS and Storage Placement
Separate Functions
Mixed Functions
Balancing Bandwidth for PFS Performance
Chapter 6 mpadmin: Detailed Description
mpadmin Syntax
Command-Line Options
-c command - Single Command Option
-f file-name - Take Input From a File
-h - Display Help
-q - Suppress Warning Message
-s cluster-name - Connect to Specified Cluster
-V - Version Display Option
mpadmin Objects, Attributes, and Contexts
mpadmin Objects and Attributes
mpadmin Command Overview
Types of mpadmin Commands
Configuration Control
create
delete
Attribute Control
set
unset
Context Navigation
current
top
up
node
partition
network
Information Retrieval
dump
list
show
Miscellaneous Commands
connect
echo
help
quit/exit
Additional mpadmin Functionality
Multiple Commands on a Line
Command Abbreviation
Using mpadmin
Introductory Notes
Naming Partitions and Custom Attributes
Separate Name Spaces
Log In to the Cluster
Customizing Cluster-Level Attributes
default_interactive_partition
logfile
administrator
lock_max_age
Nodes and Network Interfaces
Node Commands
Node Attributes
enabled
master
max_locked_mem and min_unlocked_mem
max_total_procs
name
partition
shmem_minfree
Deleting Nodes
Recommendations
Using the delete Command
Partitions
Partition Commands
Creating Partitions
Prerequisites
Viewing Existing Partitions
Creating a Partition
Configuring Partitions
Partition Attributes
enabled
max_locked_mem and min_unlocked_mem
max_total_procs
name
no_logins
no_mp_tasks
nodes
shmem_minfree
Enabling Partitions
Prerequisite
Setting enabled
Disabling Partitions
Deleting Partitions
Setting Custom Attributes
Chapter 7 hpc.conf: Detailed Description
ShmemResource Section
Guidelines for Setting Limits
Netif Section
Interface Names
Rank Attribute
MTU Attribute
Stripe Attribute
Protocol Attribute
Latency Attribute
Bandwidth Attribute
MPIOptions Section
Overview
PFSFileSystem Section
Parallel File System Name
Server Node Hostnames
Storage Device Names
Thread Limits
PFSServers Section
PFS I/O Server Hostnames
Buffer Size
HPCNodes Section
Propagate hpc.conf Information
Chapter 8 Troubleshooting
Cleaning Up Defunct CRE Jobs
Removing CRE Jobs That Have Exited
CRE Jobs That Have Not Terminated
Orphaned Processes
Diagnostics
Network Diagnostics
Checking Load Averages
Using Interval Diagnostics
Error Conditions and Troubleshooting Tips
Error Messages
Troubleshooting Tips
Procedures for Recovery
Re-creating the CRE Database
Appendix A Installing and Removing the Software
Installing at the Command Line
Before Installation
The hpc_config File
Accessing hpc_config
If You Have Already Installed the Software
Editing hpc_config
Supported Software Installation
LSF Support
LSF Parameter Modification
Name of the LSF Cluster
General Installation Information
Type of Installation
Installation Location
CD-ROM Mount Point
Information for NFS and Cluster-Local Installations
Installation Method Options
Hardware Information
SCI Support
Information for NFS Installations Only
NFS Server Host Name
Location of the Software on the Server
Sample hpc_config Files
Run cluster_tool_setup
Installing Software Packages
Removing the Software
Removing the Software: Configuration Tool
Removing the Software: Command Line
Removing and Reinstalling Individual Packages
Appendix B Cluster Management Tools
Launching Cluster Console Tools
Common Window
Menu Bar
Hosts Menu
Select Hosts Dialog Box
Options Menu
Help Menu
Text Field
Term Windows
Using CCM
Administering Configuration Files
The clusters File
The serialports File
© 2010, Oracle Corporation and/or its affiliates