Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE

Chapter 1 Introduction

The Sun HPC Cluster Runtime Environment (CRE) is a program execution environment that provides basic job launching and load-balancing capabilities.

This manual provides information needed to administer Sun HPC clusters on which MPI programs run under the CRE. The system administration topics covered in this manual are organized in the following manner:

The balance of this chapter provides an overview of the Sun HPC ClusterTools 3.0 software and the Sun HPC cluster hardware on which it runs.

Sun HPC System Hardware

A Sun HPC cluster configuration can range from a single Sun SMP (symmetric multiprocessor) server to a cluster of SMPs connected via any Sun-supported, TCP/IP-capable interconnect.


Note -

An individual SMP server within a Sun HPC cluster is referred to as a node.


The recommended interconnect technology for clustering Sun HPC servers is the Scalable Coherent Interface (SCI). Its high bandwidth and low latency make SCI the preferred choice for the cluster's primary network. An SCI network can be used to build Sun HPC clusters of up to four nodes.

Larger Sun HPC clusters can be built using a Sun-supported, TCP/IP-capable interconnect, such as 100BaseT Ethernet or ATM. The CRE supports parallel jobs running on clusters of up to 64 nodes containing up to 256 CPUs.
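The size limits above can be summarized as a simple configuration check. The following C sketch is illustrative only: the enum, the helper name `check_cluster`, and the idea of validating a planned configuration in code are assumptions, not part of ClusterTools. Only the numeric limits (four nodes for SCI; 64 nodes and 256 CPUs for the CRE) come from this chapter.

```c
#include <stddef.h>

/* Interconnect classes discussed in this chapter (illustrative). */
enum interconnect { IC_SCI, IC_TCPIP };

/* Limits stated in the text above. */
#define SCI_MAX_NODES 4   /* SCI networks: up to four nodes */
#define CRE_MAX_NODES 64  /* CRE: up to 64 nodes */
#define CRE_MAX_CPUS  256 /* CRE: up to 256 CPUs in the cluster */

/* Hypothetical helper: returns NULL if the planned cluster is within the
   documented limits, otherwise a short reason string. */
static const char *check_cluster(enum interconnect ic, int nodes, int cpus)
{
    /* Every node is an SMP server with at least one CPU. */
    if (nodes < 1 || cpus < nodes)
        return "invalid node or CPU count";
    if (ic == IC_SCI && nodes > SCI_MAX_NODES)
        return "SCI clusters are limited to 4 nodes";
    if (nodes > CRE_MAX_NODES)
        return "the CRE supports at most 64 nodes";
    if (cpus > CRE_MAX_CPUS)
        return "the CRE supports at most 256 CPUs";
    return NULL;
}
```

For example, `check_cluster(IC_SCI, 8, 64)` reports that an eight-node cluster exceeds the SCI limit, while `check_cluster(IC_TCPIP, 64, 256)` returns `NULL`.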

Any Sun HPC node that is connected to a disk storage system can be configured as a Parallel File System (PFS) I/O server. PFS file systems are configured by editing the appropriate sections of the system configuration file, hpc.conf. See Chapter 7, "hpc.conf: Detailed Description," for details.

The Cluster Runtime Environment

The CRE comprises two sets of daemons: the master daemons and the nodal daemons.

The master daemons are tm.rdb, tm.mpmd, and tm.watchd. They run exclusively on one node, which is called the master node. The two nodal daemons, tm.omd and tm.spmd, run on all the nodes.

These two sets of daemons work cooperatively to maintain the state of the cluster and manage program execution. See "Overview of the CRE Daemons" for individual descriptions of the CRE daemons.
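As a quick reference for checking which processes should be running where, the placement rule above can be sketched in C. The helper `expected_daemons` and its array-based interface are hypothetical; the daemon names and the rule (master daemons only on the master node, nodal daemons on every node) are taken from the text.

```c
#include <stddef.h>
#include <string.h>

/* Daemon names as listed in this chapter. */
static const char *const MASTER_DAEMONS[] = { "tm.rdb", "tm.mpmd", "tm.watchd" };
static const char *const NODAL_DAEMONS[]  = { "tm.omd", "tm.spmd" };

#define N_MASTER (sizeof MASTER_DAEMONS / sizeof MASTER_DAEMONS[0])
#define N_NODAL  (sizeof NODAL_DAEMONS / sizeof NODAL_DAEMONS[0])

/* Hypothetical helper: fills `out` with the CRE daemons expected on
   `node`, given the name of the master node, and returns the count.
   `out` must have room for N_MASTER + N_NODAL entries. */
static size_t expected_daemons(const char *node, const char *master_node,
                               const char *out[])
{
    size_t n = 0;

    /* Nodal daemons run on every node, including the master node. */
    for (size_t i = 0; i < N_NODAL; i++)
        out[n++] = NODAL_DAEMONS[i];

    /* Master daemons run only on the master node. */
    if (strcmp(node, master_node) == 0)
        for (size_t i = 0; i < N_MASTER; i++)
            out[n++] = MASTER_DAEMONS[i];

    return n;
}
```

On a non-master node the helper yields only tm.omd and tm.spmd; on the master node it yields all five daemons.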

Sun HPC ClusterTools 3.0 Software

Sun HPC ClusterTools 3.0 software is an integrated suite of parallel development tools that extend Sun's network computing solutions to high-end distributed-memory applications. The Sun HPC ClusterTools products can be used either with the Cluster Runtime Environment or with Platform Computing Corporation's resource management software, LSF Suite 3.2.2, extended with parallel support.

Sun HPC ClusterTools components run under Solaris 2.6 or Solaris 7 (32- or 64-bit).

Sun MPI and MPI I/O

Sun MPI is a highly optimized version of the Message-Passing Interface (MPI) communications library. Sun MPI implements all of the MPI 1.2 standard as well as a significant subset of the MPI 2.0 feature list. For example, Sun MPI provides the following features:

Sun MPI provides full F77, C, and C++ support and basic F90 support.

Parallel File System

The Sun HPC ClusterTools Parallel File System (PFS) component provides high-performance file I/O for multiprocess applications running in a cluster-based, distributed-memory environment.

PFS file systems closely resemble UFS file systems but provide significantly higher file I/O performance by striping files across multiple PFS I/O server nodes. Because each server handles only part of a striped file, the time required to read or write a large PFS file is roughly inversely proportional to the number of I/O server nodes in the PFS file system.
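To see why striping shortens I/O time, consider a simple model in which a file is divided into fixed-size stripe units that are assigned round-robin to the I/O servers. This C sketch assumes that layout; the stripe unit size, the starting server, and the names `locate` and `bytes_on_server` are illustrative assumptions and are not taken from the PFS implementation, whose actual layout this chapter does not describe.

```c
#include <assert.h>
#include <stdint.h>

/* Location of one byte of a striped file (hypothetical type). */
struct stripe_loc {
    unsigned server;       /* index of the I/O server holding the byte */
    uint64_t local_offset; /* byte offset within that server's portion */
};

/* Maps a logical file offset to (server, local offset) under simple
   round-robin striping with a fixed stripe unit, starting at server 0. */
static struct stripe_loc locate(uint64_t offset, uint64_t stripe_unit,
                                unsigned nservers)
{
    assert(stripe_unit > 0 && nservers > 0);
    uint64_t block = offset / stripe_unit; /* global stripe-unit index */
    struct stripe_loc loc;
    loc.server = (unsigned)(block % nservers);
    /* Each full round of nservers blocks adds one block per server. */
    loc.local_offset = (block / nservers) * stripe_unit + offset % stripe_unit;
    return loc;
}

/* Bytes that `server` holds for a file of `size` bytes. In this model a
   large transfer is bounded by the busiest server, so this share
   approximates the per-server work (about size / nservers). */
static uint64_t bytes_on_server(uint64_t size, uint64_t stripe_unit,
                                unsigned nservers, unsigned server)
{
    assert(stripe_unit > 0 && nservers > 0);
    uint64_t full_blocks = size / stripe_unit;
    uint64_t tail = size % stripe_unit;
    uint64_t n = (full_blocks / nservers) * stripe_unit;
    if (server < full_blocks % nservers)
        n += stripe_unit;  /* one extra full block */
    else if (server == full_blocks % nservers)
        n += tail;         /* the partial last block, if any */
    return n;
}
```

With a 64-byte stripe unit and four servers, a 1000-byte file places 256 bytes on each of servers 0 through 2 and 232 bytes on server 3, so each server transfers roughly a quarter of the data. Real stripe units are much larger; the small numbers only keep the arithmetic easy to follow.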

PFS is optimized for the large files and complex data access patterns that are characteristic of parallel scientific applications.

Prism

Prism is the Sun HPC graphical programming environment. It allows you to develop, execute, debug, and visualize data in message-passing programs. With Prism you can:

Prism can be used with applications written in F77, F90, C, and C++.

Sun S3L

The Sun Scalable Scientific Subroutine Library (Sun S3L) provides a set of parallel and scalable functions and tools that are used widely in scientific and engineering computing. It is built on top of MPI and provides the following functionality for Sun MPI programmers:

Sun S3L routines can be called from applications written in F77, F90, C, and C++.

Sun Compilers

The Sun HPC ClusterTools 3.0 release supports the following Sun compilers:

Cluster Console Manager

The Cluster Console Manager is a suite of applications (cconsole, ctelnet, and crlogin) that simplifies cluster administration by enabling the administrator to initiate commands on all nodes in the cluster simultaneously. When invoked, the selected Cluster Console Manager application opens a master window and one terminal window for each node in the cluster. Any command entered in the master window is broadcast to all the nodes in the cluster. The commands are echoed in the terminal windows, as are messages received from the respective nodes.

Switch Management Agent

The Switch Management Agent (SMA) supports management of the Scalable Coherent Interface (SCI), including SCI session management and various link and switch states.