CHAPTER 2

Fundamental Concepts

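This chapter summarizes a few basic concepts that you should understand to get the most out of Sun’s HPC ClusterTools software. It covers clusters and nodes, processes, integration with distributed resource management (DRM) systems, how programs are launched, and how ORTE works with zones in the Solaris 10 Operating System.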


Clusters and Nodes

High performance computing clusters[1] are groups of servers interconnected by any Sun-supported, TCP/IP-capable interconnect. Each server in a cluster is called a node. A cluster can consist of a single node.

When using ORTE, you can select the cluster and nodes on which your MPI programs will run and how your processes will be distributed among them. For instructions, see Chapter 4, “Running Programs With the mpirun Command.”
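As a rough illustration of selecting nodes and a process count, the following Python sketch assembles an mpirun command line. It assumes Open MPI's -np and --host options. The helper name build_mpirun_command, the node names node0 and node1, and the program name ./a.out are hypothetical. Chapter 4 remains the reference for the options your installation supports.

```python
def build_mpirun_command(program, num_procs, hosts=None, mpirun="mpirun"):
    """Assemble an mpirun argument list (hypothetical helper, for illustration)."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    cmd = [mpirun, "-np", str(num_procs)]
    if hosts:
        # --host takes a comma-separated list of node names.
        cmd += ["--host", ",".join(hosts)]
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    # node0 and node1 are placeholder node names, not real hosts.
    print(" ".join(build_mpirun_command("./a.out", 4, ["node0", "node1"])))
```

Returning a list rather than a single string lets the result be passed directly to a process launcher without shell quoting concerns.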

For more information about how Open MPI allocates computing resources, see the FAQ entitled “Running MPI Jobs” at:

http://www.open-mpi.org/faq/?category=running


Processes

Open MPI allows you to control several aspects of job and process execution. For tasks and instructions, see Chapter 4.


How the Open MPI Environment Is Integrated With Distributed Resource Management Systems

As described in Chapter 1, the Open MPI/Sun HPC ClusterTools 7.1 environment provides close integration between ORTE and several different distributed resource management (DRM) systems, such as Sun Grid Engine.

The integration process is similar for all DRM systems, with some individual differences. At run time, mpirun calls the specified DRM system (launcher), which in turn launches the job.
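To make the idea of DRM detection concrete, the Python sketch below checks whether a process appears to be running inside a Sun Grid Engine job. It assumes that the four environment variables SGE_ROOT, ARC, PE_HOSTFILE, and JOB_ID mark such a job, which is how the Open MPI FAQ describes Grid Engine detection. The function name inside_sge_job is hypothetical, and this is not the logic mpirun itself runs.

```python
import os

# Variables assumed to indicate a Sun Grid Engine job environment.
SGE_VARS = ("SGE_ROOT", "ARC", "PE_HOSTFILE", "JOB_ID")


def inside_sge_job(env=None):
    """Return True if every assumed SGE variable is set and non-empty in env."""
    env = os.environ if env is None else env
    return all(env.get(name) for name in SGE_VARS)


if __name__ == "__main__":
    print("Inside an SGE job:", inside_sge_job())
```

Other DRM systems would need their own checks. Only the Sun Grid Engine case is sketched here.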

For information on the ways in which mpirun interacts with DRM systems, see Chapter 4. In addition, see the FAQ on running MPI jobs at:

http://www.open-mpi.org/faq/?category=running

Instructions for script-based and interactive job launching are provided in Chapter 5.

How Programs Are Launched

The exact instructions vary from one resource manager to another, and are affected by your Open MPI configuration, but they all follow these general guidelines:

1. You can launch the job either interactively or through a script. Instructions for both are provided in Chapter 4 and Chapter 5.

2. You can enter the DRM processing environment (for example, Sun Grid Engine) before launching jobs with mpirun.

3. You can reserve resources for the parallel job and set other job control parameters from within the DRM, or use a hosts file to specify the parameters.

For more information about launching programs using ORTE or Sun Grid Engine, see Chapters 4 and 5.
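As an illustration of the hosts file mentioned in step 3, the Python sketch below parses text in the common Open MPI hostfile layout, with one host per line and an optional slots=N field. The function name parse_hostfile and the node names are hypothetical. Treating a host without slots= as one slot, and summing repeated entries, are assumptions of this sketch.

```python
def parse_hostfile(text):
    """Map each host name to its slot count (1 when slots= is absent)."""
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        host, count = fields[0], 1
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                count = int(value)
        # Assumption: repeated entries for the same host are summed.
        slots[host] = slots.get(host, 0) + count
    return slots


if __name__ == "__main__":
    example = """\
# hypothetical hosts file
node0 slots=2
node1 slots=4
node2
"""
    hosts = parse_hostfile(example)
    print(hosts, "total slots:", sum(hosts.values()))
```

A summary like this can help you confirm that a hosts file offers enough slots for the number of processes you intend to request.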


How ORTE Works With Zones in the Solaris 10 Operating System

The Solaris 10 Operating System (Solaris 10 OS) enables you to create secure, isolated areas within a single instance of the Solaris 10 OS. These areas, called zones, provide secure environments for running applications. Applications that execute in one zone cannot monitor or affect activity in another zone. You can create multiple non-global zones to run as virtual instances of the Solaris OS on the same hardware.

The global zone is the default zone for the Solaris system. You install Sun HPC ClusterTools software into the global zone. Any non-global zones running under that Solaris system “inherit” that installation. This means that you may install and configure Sun HPC ClusterTools and compile/run/debug your programs in either a global or a non-global zone.



Note - The non-global zones do not inherit the links set up in the global zone. This means that you must either type out the full path to the Sun HPC ClusterTools executables on the command line (for example, you would type /opt/SUNWhpc/HPC7.1/bin/mpirun instead of /opt/SUNWhpc/bin/mpirun), or run the ctact utility in the non-global zone to set up the links.
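The choice described in the note can be sketched in Python as follows: use the linked path when it exists and fall back to the full versioned path otherwise. Both paths come from the note. The function name resolve_mpirun and the injectable exists parameter are hypothetical conveniences, not part of Sun HPC ClusterTools.

```python
import os

LINKED_MPIRUN = "/opt/SUNWhpc/bin/mpirun"            # link set up in the global zone
VERSIONED_MPIRUN = "/opt/SUNWhpc/HPC7.1/bin/mpirun"  # full path from the note


def resolve_mpirun(exists=os.path.exists):
    """Prefer the linked path when present, else fall back to the full path."""
    return LINKED_MPIRUN if exists(LINKED_MPIRUN) else VERSIONED_MPIRUN


if __name__ == "__main__":
    print(resolve_mpirun())
```

In a non-global zone where ctact has not been run, the link is absent and the function returns the full path.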



1 (Footnote) Sun™ Cluster is a completely different technology used for high availability (HA) applications.