CHAPTER 2

Fundamental Concepts

This chapter summarizes a few basic concepts that you should understand to get the most out of Sun’s HPC ClusterTools software. It contains the following sections:

■ Clusters and Nodes
■ Processes
■ MCA Parameters
■ How ORTE Works With Zones in the Solaris 10 Operating System


Clusters and Nodes

High performance computing clusters[1] are groups of servers interconnected by any Sun-supported interconnect. Each server in a cluster is called a node. A cluster can consist of a single node.

ORTE (Open Run-Time Environment) is the runtime support system for Open MPI that allows users to execute their applications in a distributed clustering environment.

When using ORTE, you can select the cluster and nodes on which your MPI programs will run and how your processes will be distributed among them. For instructions, see Chapter 5, “Running Programs With the mpirun Command.”
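For example, a minimal sketch of selecting nodes at launch time might look like the following (the executable name a.out and the host file name my_hosts are assumptions; my_hosts lists the cluster nodes to use):

    % mpirun -np 8 --hostfile my_hosts a.out

This asks ORTE to start eight processes and distribute them across the nodes listed in my_hosts.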

For more information about how Open MPI allocates computing resources, see the FAQ entitled “Running MPI Jobs” at:

http://www.open-mpi.org/faq/?category=running


Processes

Open MPI allows you to control several aspects of job and process execution. These are described in the following sections:

■ How Programs Are Launched
■ How the Open MPI Environment Is Integrated With Distributed Resource Management Systems

How Programs Are Launched

The exact instructions vary from one resource manager to another, and are affected by your Open MPI configuration, but they all follow these general guidelines:

1. You can launch the job either interactively or through a script. Instructions for both are provided in Chapter 5 and Chapter 6.

2. You can enter the DRM processing environment (for example, Sun Grid Engine) before launching jobs with mpirun.

3. You can reserve resources for the parallel job and set other job control parameters from within the DRM, or use a hosts file to specify the parameters (a sample hosts file is shown below).

For tasks and instructions, see Chapter 5.
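For illustration, a hosts file in Open MPI’s hostfile format might look like the following (the node names node0 and node1 and the slot counts are assumptions; substitute the nodes of your own cluster):

    # hypothetical hosts file: one node per line, with the number of available slots
    node0 slots=4
    node1 slots=4

Passing this file to mpirun with the --hostfile option lets ORTE distribute the job’s processes over the listed nodes.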


How the Open MPI Environment Is Integrated With Distributed Resource Management Systems

As described in Chapter 1, the Open MPI/Sun HPC ClusterTools 8 environment provides close integration between ORTE and several different DRM systems, such as Sun Grid Engine.

The integration process is similar for all DRM systems, with some individual differences. At run time, mpirun calls the specified DRM system (launcher), which in turn launches the job.

For information on the ways in which mpirun interacts with DRM systems, see Chapter 5. In addition, see the FAQ on running MPI jobs at:

http://www.open-mpi.org/faq/?category=running

Chapter 6 provides instructions for script-based and interactive job launching.

Using Sun Grid Engine With ORTE

HPC sites use batch systems to share resources fairly and accountably, and also to guarantee that a job can obtain the resources it needs to run at maximum efficiency. To properly monitor a job’s resource consumption, the batch system must be the agent that launches the job.

Sun Grid Engine, like many other batch systems, cannot launch multiple-process jobs (such as MPI applications) on its own. In Sun HPC ClusterTools 8, ORTE launches these multiple-process jobs and sets up the environment required by Open MPI.

When Sun Grid Engine launches a parallel job in cooperation with ORTE, Sun Grid Engine “owns” the resulting launched processes. Sun Grid Engine monitors the resources consumed by these processes, creating a tightly integrated environment for resource accounting, while ORTE handles the actual execution of the parallel application.
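As a hedged sketch of this interaction (the parallel environment name orte is an assumption; use the PE configured at your site), you could start an interactive Sun Grid Engine session and launch the MPI job from within it, so that the resulting processes are started, and accounted for, under Sun Grid Engine:

    % qsh -pe orte 8       # request an interactive 8-slot SGE session (PE name assumed)
    % mpirun -np 8 a.out   # run inside that session; ORTE launches the processes within the SGE allocation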



Note - There is also an open source version of Grid Engine (GE), hosted at http://www.sunsource.net. Although the Sun HPC ClusterTools 8/Open MPI integration is developed with Sun Grid Engine, it should also work with the open source Grid Engine.


Submitting Jobs Under Sun Grid Engine Integration

To submit jobs under Sun Grid Engine integration in Sun HPC ClusterTools 8, you must first create a Sun Grid Engine (SGE) environment using qsub, qsh, and so on. Instructions for setting up the parallel environment (PE) and queue in Sun Grid Engine are provided in the Sun HPC ClusterTools 8 Software User’s Guide.

There are two ways to submit jobs under Sun Grid Engine integration: interactive mode and batch mode. The section “Running Parallel Jobs in the Sun Grid Engine Environment” explains how to submit jobs in both modes.
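As a minimal sketch of batch-mode submission (the job name, the script file name run_mpi.sh, the parallel environment name orte, and the executable a.out are all assumptions; the PE and queue must already be set up as described in the User’s Guide):

    #!/bin/sh
    # hypothetical Sun Grid Engine batch script: run_mpi.sh
    #$ -N mpi_example        # job name (assumed)
    #$ -pe orte 8            # request 8 slots in an assumed PE named "orte"
    #$ -cwd                  # run the job from the current working directory
    mpirun -np 8 a.out       # ORTE launches the MPI processes under SGE control

Submit the script with qsub:

    % qsub run_mpi.sh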


MCA Parameters

Open MPI provides MCA (Modular Component Architecture) parameters for use with the mpirun command. These parameters and their values direct mpirun to perform specified functions. To specify an MCA parameter, use the -mca flag followed by the parameter name and its value on the mpirun command line.
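For example, a hedged sketch of specifying an MCA parameter on the command line (the executable a.out is an assumption; btl is one of many available parameters, here restricting the byte transfer layer to the TCP and self components):

    % mpirun -mca btl tcp,self -np 4 a.out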

For more information about how to use MCA parameters, see Chapter 7.


How ORTE Works With Zones in the Solaris 10 Operating System

The Solaris 10 Operating System (Solaris 10 OS) enables you to create secure, isolated areas within a single instance of the Solaris 10 OS. These areas, called zones, provide secure environments for running applications. Applications that execute in one zone cannot monitor or affect activity in another zone. You can create multiple non-global zones to run as virtual instances of the Solaris OS on the same hardware.

The global zone is the default zone for the Solaris system. You install Sun HPC ClusterTools software into the global zone. Any non-global zones running under that Solaris system “inherit” that installation. This means that, once the software is installed and configured in the global zone, you can compile, run, and debug your programs in either the global zone or a non-global zone.



Note - The non-global zones do not inherit the links set up in the global zone. This means that you must either type out the full path to the Sun HPC ClusterTools executables on the command line (for example, you would type /opt/SUNWhpc/HPC8.0/bin/mpirun instead of /opt/SUNWhpc/bin/mpirun), or run the ctact utility in the non-global zone to set up the links. For more information about the ctact utility, refer to the Sun HPC ClusterTools 8 Software Installation Guide.
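For example, to launch a program from a non-global zone without first running ctact, you would use the full path shown above (a.out is an assumed executable name):

    % /opt/SUNWhpc/HPC8.0/bin/mpirun -np 4 a.out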



1 (Footnote) Sun™ Cluster is a completely different technology used for high availability (HA) applications.