Sun HPC ClusterTools 3.0 Administrator's Guide: With LSF

Chapter 1 Introduction

This manual contains Sun HPC-specific system administration information that is not available in the LSF Batch Administrator's Guide.

Sun HPC system administrators should read the LSF documentation first and then read this manual to learn about issues not covered in the LSF documentation set.

The following list summarizes the topics covered in this manual:

The balance of this chapter provides an overview of the Sun HPC ClusterTools 3.0 release.

Sun HPC System Overview

A Sun HPC ClusterTools 3.0 system can be a single Sun SMP (symmetric multiprocessor) server or a cluster of these SMPs running both the Sun HPC ClusterTools 3.0 software and LSF Base, Batch, and Parallel software.

Sun HPC System Hardware

A Sun HPC system configuration can range from a single Sun SMP server to a cluster of SMPs connected by any Sun-supported, TCP/IP-capable interconnect.


Note - An individual SMP server within a Sun HPC cluster is referred to as a node.


The recommended interconnect technology for clustering Sun HPC servers is the Scalable Coherent Interface (SCI). SCI's bandwidth and latency characteristics make it the preferred choice for the cluster's primary network. An SCI network can be used to create Sun HPC clusters with up to four nodes.

Larger Sun HPC clusters can be built using a Sun-supported TCP/IP interconnect, such as 100BaseT Ethernet or ATM. Individual parallel Sun HPC jobs can have up to 1024 processes running on as many as 64 nodes.

Any Sun HPC node that is connected to a disk storage system can be configured as a Parallel File System (PFS) I/O server. See Chapter 4, PFS Configuration Notes and Chapter 5, Starting and Stopping PFS Daemons for additional information about PFS I/O servers and PFS file systems.

Sun HPC ClusterTools 3.0 Software and LSF Suite 3.2.3 Software

Sun HPC ClusterTools 3.0 software is an integrated ensemble of parallel development tools that extend Sun's network computing solutions to high-end distributed-memory applications. The Sun HPC ClusterTools products are teamed with LSF Suite 3.2.3, Platform Computing Corporation's resource management software.

The Sun HPC ClusterTools 3.0 software runs under Solaris 2.6 or Solaris 7 (32-bit or 64-bit).

Load Sharing Facility

LSF Suite 3.2.3 is a collection of resource-management products that provide distributed batch scheduling, load balancing, job execution, and job termination services across a network of computers. The LSF products required by Sun HPC ClusterTools 3.0 software are LSF Base, LSF Batch, and LSF Parallel.

Refer to the LSF Administrator's Guide for a fuller description of LSF Base and LSF Batch and to the LSF Parallel User's Guide for more information about LSF Parallel.

LSF supports interactive batch execution of Sun HPC jobs as well as the conventional batch method. Interactive batch mode allows users to submit jobs through the LSF Batch system and remain attached to the job throughout execution.
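
For example, the following command line shows one way a user might submit a four-process job in interactive batch mode (the bsub -I option), where pam, the LSF Parallel Application Manager, launches the parallel processes. The queue name hpc and the program name a.out are placeholders; substitute values appropriate to your site:

    % bsub -I -q hpc -n 4 pam a.out

Because the terminal remains attached to the job, its standard output and standard error are displayed as the job runs.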

Sun MPI and MPI I/O

Sun MPI is a highly optimized version of the Message-Passing Interface (MPI) communications library. Sun MPI implements all of the MPI 1.2 standard as well as a significant subset of the MPI 2.0 feature list. For example, Sun MPI provides the following features:

Sun MPI and MPI I/O provide full support for F77, C, and C++, and basic support for F90.
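
For example, a C program that calls Sun MPI routines might be built and launched as follows. This is a sketch that assumes the mpcc compiler wrapper installed with Sun HPC ClusterTools is on your PATH and that the program links with -lmpi; the file, executable, and process-count values are placeholders:

    % mpcc -o mpi_app mpi_app.c -lmpi
    % bsub -n 8 pam mpi_app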

Parallel File System

The Sun HPC ClusterTools Parallel File System (PFS) component provides high-performance file I/O for multiprocess applications running in a cluster-based, distributed-memory environment.

PFS file systems closely resemble UFS file systems, but provide significantly higher file I/O performance by striping files across multiple PFS I/O server nodes. This means the time required to read or write a PFS file can be reduced by an amount roughly proportional to the number of I/O server nodes in the PFS file system.

PFS is optimized for the large files and complex data access patterns that are characteristic of parallel scientific applications.

Prism

Prism is the Sun HPC graphical programming environment. It allows you to develop, execute, debug, and visualize data in message-passing programs. With Prism you can

Prism can be used with applications written in F77, F90, C, and C++.
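
For example, to debug a message-passing program under Prism, you might compile it with the -g option and then start Prism on the resulting executable (a.out here is a placeholder):

    % prism a.out

See the Prism documentation for the options that control how multiprocess programs are loaded and run.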

Sun S3L

The Sun Scalable Scientific Subroutine Library (Sun S3L) provides a set of parallel and scalable functions and tools that are used widely in scientific and engineering computing. It is built on top of MPI and provides the following functionality for Sun MPI programmers:

Sun S3L routines can be called from applications written in F77, F90, C, and C++.
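
For example, an F77 application that calls Sun S3L routines would typically be linked against both the S3L and Sun MPI libraries. The sketch below assumes the mpf77 compiler wrapper and the conventional -ls3l and -lmpi link flags; check the S3L documentation for the exact link line required by your installation:

    % mpf77 -o s3l_app s3l_app.f -ls3l -lmpi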

Sun Compilers

The Sun HPC ClusterTools 3.0 release supports the following Sun compilers:

Cluster Console Manager

The Cluster Console Manager is a suite of applications (cconsole, ctelnet, and crlogin) that simplify cluster administration by enabling the administrator to initiate commands on all nodes in the cluster simultaneously. When invoked, the selected Cluster Console Manager application opens a master window and a set of terminal windows, one for each node in the cluster. Any command entered in the master window is broadcast to all the nodes in the cluster. The commands are echoed in the terminal windows, as are messages received from the respective nodes.
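
For example, to open a terminal window on each of four cluster nodes, plus the common master window, you might invoke cconsole with the node names as arguments (the node names are placeholders; see the cconsole man page for the exact argument syntax):

    % cconsole node0 node1 node2 node3

Commands typed in the master window are then sent to all four nodes at once, while each terminal window can still be used to address its node individually.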

Switch Management Agent

The Switch Management Agent (SMA) supports management of the Scalable Coherent Interface (SCI), including SCI session management and various link and switch states.