The Sun HPC Cluster Runtime Environment (CRE) is a program execution environment that provides basic job launching and load-balancing capabilities.
This manual provides information needed to administer Sun HPC clusters on which MPI programs run under the CRE. The system administration topics covered in this manual are organized in the following manner:
Chapter 2, Getting Started provides quick-start instructions for getting an MPI job running on a Sun HPC cluster with newly installed Sun HPC ClusterTools software.
Chapter 3, Cluster Administration: A Primer provides an introduction to configuring a Sun HPC cluster, using both the administration command interface, mpadmin, and the cluster configuration file, hpc.conf.
Chapter 4, PFS Operations explains how to start and stop PFS daemons. If your cluster does not use PFS file systems, you can skip this chapter.
Chapter 5, Cluster Configuration Notes discusses various considerations that can influence how a Sun HPC cluster will be configured.
Chapter 6, mpadmin: Detailed Description provides a more comprehensive description of mpadmin features.
Chapter 7, hpc.conf: Detailed Description provides a more comprehensive description of the hpc.conf configuration file.
Chapter 8, Troubleshooting provides guidelines for performing routine maintenance and for recognizing and troubleshooting error conditions.
Appendix A, Installing and Removing the Software describes the procedure for installing Sun HPC ClusterTools 3.0 from the command line rather than using the graphical user interface supplied with the ClusterTools software.
Appendix B, Cluster Management Tools describes the Cluster Console Manager (CCM), a set of cluster administration tools.
The balance of this chapter provides an overview of the Sun HPC ClusterTools 3.0 software and the Sun HPC cluster hardware on which it runs.
A Sun HPC cluster configuration can range from a single Sun SMP (symmetric multiprocessor) server to a cluster of SMPs connected via any Sun-supported, TCP/IP-capable interconnect.
An individual SMP server within a Sun HPC cluster is referred to as a node.
The recommended interconnect technology for clustering Sun HPC servers is the Scalable Coherent Interface (SCI). SCI's bandwidth and latency characteristics make it the preferred choice for the cluster's primary network. An SCI network can be used to create Sun HPC clusters with up to four nodes.
Larger Sun HPC clusters can be built using a Sun-supported, TCP/IP interconnect, such as 100BaseT Ethernet or ATM. The CRE supports parallel jobs running on clusters of up to 64 nodes containing up to 256 CPUs.
Any Sun HPC node that is connected to a disk storage system can be configured as a Parallel File System (PFS) I/O server. PFS file systems are configured by editing the appropriate sections of the system configuration file, hpc.conf. See Chapter 7, hpc.conf: Detailed Description for details.
The CRE comprises two sets of daemons: the master daemons and the nodal daemons.
The master daemons are tm.rdb, tm.mpmd, and tm.watchd. They run on a single node, called the master node. The two nodal daemons, tm.omd and tm.spmd, run on all the nodes in the cluster.
These two sets of daemons work cooperatively to maintain the state of the cluster and manage program execution. See "Overview of the CRE Daemons" for individual descriptions of the CRE daemons.
Sun HPC ClusterTools 3.0 Software is an integrated ensemble of parallel development tools that extend Sun's network computing solutions to high-end distributed-memory applications. The Sun HPC ClusterTools products can be used either in the Cluster Runtime Environment or with LSF Suite 3.2.2, Platform Computing Corporation's resource management software, extended with parallel support.
Sun HPC ClusterTools components run under Solaris 2.6 or Solaris 7 (32- or 64-bit).
Sun MPI is a highly optimized version of the Message-Passing Interface (MPI) communications library. Sun MPI implements all of the MPI 1.2 standard as well as a significant subset of the MPI 2.0 feature list. For example, Sun MPI provides the following features:
Support for multithreaded programming.
Seamless use of different network protocols; for example, code compiled on a Sun HPC cluster that has a Scalable Coherent Interface (SCI) network can run without change on a cluster that has an ATM network.
Multiprotocol support such that MPI picks the fastest available medium for each type of connection (such as shared memory, SCI, or ATM).
Communication via shared memory for fast performance on clusters of SMPs.
Finely tunable shared memory communication.
Optimized collectives for symmetric multiprocessors (SMPs).
Prism support: users can develop, run, and debug programs in the Prism programming environment.
MPI I/O support for parallel file I/O.
Sun MPI is implemented as a dynamic library. It provides full F77, C, and C++ support, as well as basic F90 support.
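To give a sense of what the C bindings look like, the following minimal sketch uses only calls defined by the MPI standard that Sun MPI implements. The file name hello.c is illustrative, and compilation and launch details depend on your installation.

    /* hello.c - illustrative MPI example (not taken from this manual) */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* start the MPI runtime           */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process in the job */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes       */

        printf("Process %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut down the MPI runtime       */
        return 0;
    }

When the job is launched, each process prints its rank and the total number of processes in the job.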
The Parallel File System (PFS) component of Sun HPC ClusterTools provides high-performance file I/O for multiprocess applications running in a cluster-based, distributed-memory environment.
PFS file systems closely resemble UFS file systems, but provide significantly higher file I/O performance by striping files across multiple PFS I/O server nodes. As a result, the time required to read or write a PFS file can decrease roughly in proportion to the number of I/O server nodes in the file system.
PFS is optimized for the large files and complex data access patterns that are characteristic of parallel scientific applications.
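Parallel applications typically reach PFS files through the MPI I/O interface mentioned earlier. The following C sketch is illustrative only: each process writes its own block of a shared file, and the path /pfs/data.out is an assumed mount point rather than one defined in this manual.

    /* pfs_write.c - illustrative MPI I/O example (not taken from this manual) */
    #include <mpi.h>

    #define BLOCK 1024                          /* bytes written per process */

    int main(int argc, char *argv[])
    {
        int rank, i;
        char buf[BLOCK];
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (i = 0; i < BLOCK; i++)             /* fill the buffer with a per-rank pattern */
            buf[i] = (char)('a' + rank % 26);

        offset = (MPI_Offset)rank * BLOCK;      /* each process owns one contiguous block */

        /* /pfs/data.out is an assumed PFS path, not one defined by this manual */
        MPI_File_open(MPI_COMM_WORLD, "/pfs/data.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, offset, buf, BLOCK, MPI_CHAR, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }

Because all processes write through a shared file, striping across the PFS I/O servers is handled by the file system rather than by the application.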
Prism is the Sun HPC graphical programming environment. It allows you to develop, execute, and debug message-passing programs and to visualize their data. With Prism you can:
Control program execution, such as:
Start and stop execution.
Set breakpoints and traces.
Print values of variables and expressions.
Display the call stack.
Visualize data in various formats.
Analyze performance of MPI programs.
Control entire multiprocess parallel jobs, aggregating processes into meaningful groups, called process sets or psets.
Prism can be used with applications written in F77, F90, C, and C++.
The Sun Scalable Scientific Subroutine Library (Sun S3L) provides a set of parallel and scalable functions and tools that are used widely in scientific and engineering computing. It is built on top of MPI and provides the following functionality for Sun MPI programmers:
Vector and dense matrix operations (level 1, 2, 3 Parallel BLAS)
Iterative solvers for sparse systems
Matrix-vector multiply for sparse systems
FFT
LU factor and solve
Autocorrelation
Convolution/deconvolution
Tridiagonal solvers
Banded solvers
Eigensolvers
Singular value decomposition
Least squares
One-dimensional and multidimensional sorts
Selected ScaLAPACK and BLACS application program interfaces
Conversion between ScaLAPACK and S3L
Matrix transpose
Random number generators (linear congruential and lagged Fibonacci)
Random number generator and I/O for sparse systems
Matrix inverse
Array copy
Safety mechanism
An array syntax interface callable from message-passing programs
Toolkit functions for operations on distributed data
Support for the multiple instance paradigm (allowing an operation to be applied concurrently to multiple, disjoint data sets in a single call)
Thread safety
Detailed programming examples and support documentation provided online
Sun S3L routines can be called from applications written in F77, F90, C, and C++.
The Sun HPC ClusterTools 3.0 release supports the following Sun compilers:
Sun WorkShop Compilers C/C++ 4.2 (also included in Sun Visual WorkShop C++ 3.0)
Sun WorkShop Compilers Fortran 4.2 (also included in Sun Performance WorkShop Fortran 3.0)
Sun Visual WorkShop C++ 5.0
Sun Performance WorkShop Fortran 5.0
The Cluster Console Manager is a suite of applications (cconsole, ctelnet, and crlogin) that simplify cluster administration by enabling the administrator to initiate commands on all nodes in the cluster simultaneously. When invoked, the selected Cluster Console Manager application opens a master window and a set of terminal windows, one for each node in the cluster. Any command entered in the master window is broadcast to all the nodes in the cluster. The commands are echoed in the terminal windows, as are messages received from the respective nodes.
The Switch Management Agent (SMA) supports management of the Scalable Coherent Interface (SCI), including SCI session management and various link and switch states.