CHAPTER 1

Introduction

The Sun Cluster Runtime Environment (CRE) is a program execution environment that provides basic job launching and load-balancing capabilities.

This manual provides information needed to administer Sun HPC clusters on which MPI programs run under Sun CRE or any other resource manager. The topics covered are organized in the following manner:

The balance of this chapter provides an overview of the Sun HPC ClusterTools software and the Sun HPC cluster hardware on which it runs.


Sun HPC Clusters

A Sun HPC hardware configuration can be a single Sun SMP (symmetric multiprocessor) server, or multiple SMP or x64-based servers interconnected into a cluster. Sun HPC ClusterTools software supports parallel jobs of up to 2048 processes running on clusters of up to 256 nodes.
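The limits above can be expressed as a simple admission check. The following C sketch uses hypothetical names (job_within_limits, busiest_node_load) that are not part of any ClusterTools API. It also shows the arithmetic of spreading processes evenly: at the stated maxima, 2048 processes over 256 nodes places 8 processes on each node.

```c
#include <stdbool.h>

/* Limits stated for Sun HPC ClusterTools software. */
#define MAX_PROCS_PER_JOB     2048
#define MAX_NODES_PER_CLUSTER 256

/* Hypothetical check: true if a job of nprocs processes placed on
 * nnodes nodes stays within the stated limits. */
bool job_within_limits(int nprocs, int nnodes)
{
    if (nprocs < 1 || nnodes < 1)
        return false;
    return nprocs <= MAX_PROCS_PER_JOB && nnodes <= MAX_NODES_PER_CLUSTER;
}

/* Hypothetical helper: when nprocs processes are spread as evenly as
 * possible over nnodes nodes, the busiest node hosts ceil(nprocs / nnodes)
 * of them. Assumes nprocs >= 0 and nnodes >= 1. */
int busiest_node_load(int nprocs, int nnodes)
{
    return (nprocs + nnodes - 1) / nnodes;
}
```

For example, a 10-process job on 4 nodes passes the limit check, and at least one node must host 3 of its processes.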



Note - An individual SMP server within a Sun HPC cluster is referred to as a node.



Sun HPC clusters can also be built using any Sun-supported TCP/IP interconnect, such as Gigabit Ethernet or InfiniBand.


Cluster Runtime Environment Daemons

Sun CRE comprises two sets of daemons: the master daemons and the nodal daemons. These two sets work cooperatively to maintain the state of the cluster and manage program execution.

The master daemons, tm.rdb, tm.mpmd, and tm.watchd, run exclusively on one node, called the master node. The two nodal daemons, tm.omd and tm.spmd, run on every node in the cluster.
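To make this division of labor concrete, the following C sketch lists which Sun CRE daemons an administrator would expect to find on a given node. The function name expected_daemons is hypothetical; only the daemon names come from the text. Because the nodal daemons run on all nodes, the master node runs both sets.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Daemon names as listed in the text. */
static const char *const MASTER_DAEMONS[] = { "tm.rdb", "tm.mpmd", "tm.watchd" };
static const char *const NODAL_DAEMONS[]  = { "tm.omd", "tm.spmd" };

/* Hypothetical helper: store in out[] the names of the Sun CRE daemons
 * expected on a node. Every node runs the nodal daemons; the master node
 * also runs the master daemons. Returns the number of names stored, or 0
 * if cap is too small to hold them all. */
size_t expected_daemons(bool is_master, const char **out, size_t cap)
{
    size_t needed = ARRAY_LEN(NODAL_DAEMONS)
                    + (is_master ? ARRAY_LEN(MASTER_DAEMONS) : 0);
    size_t k = 0;

    if (cap < needed)
        return 0;
    if (is_master)
        for (size_t i = 0; i < ARRAY_LEN(MASTER_DAEMONS); i++)
            out[k++] = MASTER_DAEMONS[i];
    for (size_t i = 0; i < ARRAY_LEN(NODAL_DAEMONS); i++)
        out[k++] = NODAL_DAEMONS[i];
    return k;
}
```

A list like this could be compared against a node's process listing when checking cluster health; how that listing is obtained is outside the scope of the sketch.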


Sun HPC ClusterTools Software

Sun HPC ClusterTools software is an integrated suite of parallel development tools that extend Sun's network computing solutions to high-end distributed-memory applications.

Sun HPC ClusterTools components run under the Solaris 10 Operating System (OS).


Sun CRE's Integration With Batch Processing Systems

Sun CRE provides close integration with several batch processing systems, also known as distributed resource managers (DRMs). You can launch parallel jobs from a batch system to control resource allocation, and continue to use Sun CRE to monitor job status. The currently supported distributed resource managers are:

To launch a parallel job through a batch processing system, follow these general guidelines:

You can launch the parallel job either through a script or interactively. For details, see Close Integration With Batch Processing Systems.

The architecture used to implement close integration can easily accommodate new resource managers. For this purpose, it provides a Sun CRE wrapper library and a resource manager plugin interface. Both are described in the Sun MPI Software Programming and Reference Manual.

Sun MPI and MPI I/O

Sun MPI is a highly optimized version of the Message-Passing Interface (MPI) communications library. Sun MPI implements the MPI-2 standard. In addition, Sun MPI provides extensions such as support for multithreaded programming, MPI I/O support for parallel file I/O, and others as detailed in the Sun MPI documentation.

Sun MPI provides full F77, C, and C++ support and basic F90 support.

Loadable Protocol Modules

The Sun MPI library can provide high-performance communications over several different protocols. Sun HPC ClusterTools software makes the following protocols available to MPI programs: Shared Memory (SHM) and Transmission Control Protocol (TCP).



Note - Remote Shared Memory (RSM) support is no longer available in Sun HPC ClusterTools.



Protocols are provided as dynamically loaded library modules, separate from the MPI library. The cluster administrator determines which protocols are available on a cluster and their relative priorities. The user need not be concerned with the details of any protocol underlying MPI communications.
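The following C sketch illustrates one way an administrator-configured, priority-ordered protocol table could drive protocol selection. It is a hypothetical model, not the Sun MPI implementation: the struct protocol layout, the priority values, and select_protocol are assumptions. It relies only on the fact that shared memory can connect processes on the same node, whereas TCP can connect processes on any nodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one entry in an administrator-configured table. */
struct protocol {
    const char *name;      /* e.g. "shm" or "tcp" */
    bool enabled;          /* made available on this cluster */
    int priority;          /* higher value is preferred */
    bool intra_node_only;  /* shared memory links only same-node processes */
};

/* Return the highest-priority enabled protocol usable between two
 * processes, or NULL if no configured protocol applies. */
const struct protocol *select_protocol(const struct protocol *table,
                                       size_t n, bool same_node)
{
    const struct protocol *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct protocol *p = &table[i];
        if (!p->enabled)
            continue;
        if (p->intra_node_only && !same_node)
            continue;
        if (best == NULL || p->priority > best->priority)
            best = p;
    }
    return best;
}
```

Keeping the table as data, separate from the selection logic, mirrors the separation described above between the MPI library and its loadable protocol modules: the administrator changes the table, and MPI programs need no changes.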


Related Tools

Sun HPC ClusterTools software provides or makes use of several related tools, including Sun compilers and the Cluster Console Manager.

Sun Compilers

Sun HPC ClusterTools software supports the C, C++, and Fortran compilers in Sun Studio Compiler Collection versions 8, 9, 10, and 11.

Cluster Console Manager

The Cluster Console Manager (CCM) is a suite of applications (cconsole, ctelnet, and crlogin) that simplify cluster administration by enabling you to initiate commands on all nodes in the cluster simultaneously. Any command entered in the CCM's master window is broadcast to all nodes in the cluster.

These applications are described in Appendix A of this manual.
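The broadcast behavior of the Cluster Console Manager amounts to a fan-out: one input line is delivered to every node's session. The C sketch below models only that fan-out, using standard I/O streams as stand-ins for per-node sessions. The name broadcast_command is hypothetical, and the real tools manage terminal sessions rather than plain streams.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fan-out: write one command line to every session stream.
 * Returns the number of sessions that accepted the complete line. */
size_t broadcast_command(const char *cmd, FILE *const *sessions,
                         size_t nsessions)
{
    size_t ok = 0;

    for (size_t i = 0; i < nsessions; i++) {
        /* fputs returns EOF on error; fflush returns 0 on success. */
        if (fputs(cmd, sessions[i]) != EOF
            && fputc('\n', sessions[i]) != EOF
            && fflush(sessions[i]) == 0)
            ok++;
    }
    return ok;
}
```

Returning a count rather than stopping at the first failure reflects the administrative use case: a command should still reach the healthy nodes even if one session is unavailable.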



Note - The Cluster Console tools, cconsole, ctelnet, and crlogin, are no longer shipped with ClusterTools 6 software. The Cluster Console Tools can be downloaded as part of the Sun Java Enterprise System (JES) at the following URL:

http://www.sun.com/software/javaenterprisesystem/index.xml