Sun HPC ClusterTools 3.0 Administrator's Guide: With CRE

Overview

The MPIOptions section of hpc.conf provides a set of options that control MPI communication behavior in ways that are likely to affect message-passing performance. It contains two templates with predefined option settings, which are shown in Example 7-4 and discussed below.

General-Purpose, Multiuser Template - The first template in the MPIOptions section is designed for general-purpose use at times when multiple message-passing jobs will be running concurrently.

Performance Template - The second template is designed to maximize the performance of message-passing jobs when only one job is allowed to run at a time.


Note - The first line of each template contains the phrase "Queue=xxxx". This is because the queue-based LSF workload management runtime environment uses the same hpc.conf file as the CRE.


The options in the general-purpose template are the same as the default settings for the Sun MPI library. In other words, you do not have to uncomment the general-purpose template for its option values to be in effect. This template is provided in the MPIOptions section so you can see which options are most beneficial when operating in multiuser mode.

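If you want to use the performance template, do the following:

1. Delete the "Queue=performance" phrase from the Begin MPIOptions line.
2. Delete the comment character (#) from the beginning of each line of the template.

The resulting template should appear as follows: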

Begin MPIOptions
coscheduling			off
spin			on
End MPIOptions
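
The edit that turns the commented-out performance template of Example 7-4 into the active form shown above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name activate_template is hypothetical, and the sketch works on a string rather than on the real hpc.conf file, which you would normally edit by hand.

```python
import re

# The performance template as it appears, commented out, in Example 7-4.
COMMENTED_TEMPLATE = """\
# Begin MPIOptions Queue=performance
# coscheduling             off
# spin                      on
# End MPIOptions
"""


def activate_template(commented: str) -> str:
    """Uncomment a template and drop the Queue=... phrase from its Begin line.

    Hypothetical helper for illustration; not part of ClusterTools.
    """
    active_lines = []
    for line in commented.splitlines():
        # Remove the leading comment character and one following space.
        line = re.sub(r"^#\s?", "", line)
        # Drop the LSF-only queue phrase so that CRE recognizes the section.
        if line.startswith("Begin MPIOptions"):
            line = re.sub(r"\s+[Qq]ueue=\S+", "", line)
        active_lines.append(line)
    return "\n".join(active_lines) + "\n"


if __name__ == "__main__":
    print(activate_template(COMMENTED_TEMPLATE), end="")
```

The column spacing of the option lines is left untouched; only the comment markers and the queue phrase are removed.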

Table 7-1 provides brief descriptions of the MPI runtime options that can be set in hpc.conf. Each description identifies the default value and describes the effect of each legal value.

Some MPI options not only control a parameter directly but can also be set to a value that passes control of the parameter to an environment variable. Where an MPI option has an associated environment variable, Table 7-1 names the environment variable.
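
As a minimal sketch of that hand-off, the following Python function models how an on/off/avail option such as coscheduling or pbind combines with its environment variable, following the descriptions in Table 7-1. The function name resolve_switch is hypothetical, and treating any environment value other than 1 as "disabled" is an assumption, since the table defines only 0, 1, and unset. This is not code from the Sun MPI library.

```python
import os


def resolve_switch(option_value, env_name, environ=None):
    """Return True if the feature ends up enabled.

    option_value is the hpc.conf setting: "on", "off", or "avail".
    With "avail", the named environment variable decides.
    """
    environ = os.environ if environ is None else environ
    if option_value == "on":
        return True   # overrides ENV=0
    if option_value == "off":
        return False  # overrides ENV=1
    if option_value == "avail":
        # Unset or 0 means disabled; 1 means enabled.
        # Assumption: any other value is treated as disabled.
        return environ.get(env_name, "0") == "1"
    raise ValueError(f"unexpected option value: {option_value!r}")


if __name__ == "__main__":
    # hpc.conf says "avail", so the user's environment decides.
    print(resolve_switch("avail", "MPI_COSCHED", {"MPI_COSCHED": "1"}))
    # hpc.conf says "off", so the user's request is overridden.
    print(resolve_switch("off", "MPI_PROCBIND", {"MPI_PROCBIND": "1"}))
```

Passing the environment as a dictionary keeps the sketch easy to try without changing the real process environment.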


Example 7-4 MPIOptions Section Example


# Following is an example of the options that affect the runtime
# environment of the MPI library. The listings below are identical
# to the default settings of the library. The "queue=hpc" phrase
# makes it an LSF-specific entry, and only for the queue named hpc.
# These options are a good choice for a multiuser queue. To be
# recognized by CRE, the "Queue=hpc" needs to be removed.
#
# Begin MPIOptions queue=hpc
# coscheduling  avail
# pbind         avail
# spindtimeout   1000
# progressadjust   on
# spin            off
#
# shm_numpostbox       16
# shm_shortmsgsize    256
# rsm_numpostbox       15
# rsm_shortmsgsize    401
# rsm_maxstripe         2
# End MPIOptions

# The listing below is a good choice when trying to get maximum
# performance out of MPI jobs that are running in a queue that
# allows only one job to run at a time.
#
# Begin MPIOptions Queue=performance
# coscheduling             off
# spin                      on
# End MPIOptions
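
To make the layout of Example 7-4 concrete, here is a small Python sketch that reads MPIOptions sections out of hpc.conf-style text. It assumes only what the example shows: a section starts with a Begin MPIOptions line (optionally carrying a queue phrase), ends with End MPIOptions, holds one "name value" pair per line, and is ignored while commented out with #. The function name parse_mpioptions and the sample text are hypothetical, and the real CRE parser may behave differently.

```python
def parse_mpioptions(text):
    """Return a list of (queue_or_None, {option: value}) for active sections."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        words = line.split()
        if words[:2] == ["Begin", "MPIOptions"]:
            queue = None
            for word in words[2:]:
                key, _, value = word.partition("=")
                if key.lower() == "queue":
                    queue = value
            current = (queue, {})
        elif words[:2] == ["End", "MPIOptions"]:
            if current is not None:
                sections.append(current)
            current = None
        elif current is not None and len(words) == 2:
            current[1][words[0]] = words[1]
    return sections


SAMPLE = """\
# Begin MPIOptions queue=hpc
# coscheduling  avail
# End MPIOptions

Begin MPIOptions
coscheduling  off
spin          on
End MPIOptions
"""

if __name__ == "__main__":
    for queue, options in parse_mpioptions(SAMPLE):
        print(queue, options)
```

In the sample, the commented general-purpose block is skipped and only the uncommented block is reported, with no queue name attached.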

Table 7-1 MPI Runtime Options

| Option | Default value | Other values | Description |
|---|---|---|---|
| coscheduling | avail | | Allows spind use to be controlled by the environment variable MPI_COSCHED. If MPI_COSCHED=0 or is not set, spind is not used. If MPI_COSCHED=1, spind must be used. |
| | | on | Enables coscheduling; spind is used. This value overrides MPI_COSCHED=0. |
| | | off | Disables coscheduling; spind is not to be used. This value overrides MPI_COSCHED=1. |
| pbind | avail | | Allows processor binding state to be controlled by the environment variable MPI_PROCBIND. If MPI_PROCBIND=0 or is not set, no processes will be bound to a processor. This is the default. If MPI_PROCBIND=1, all processes on a node will be bound to a processor. |
| | | on | All processes will be bound to processors. This value overrides MPI_PROCBIND=0. |
| | | off | No processes on a node are bound to a processor. This value overrides MPI_PROCBIND=1. |
| spindtimeout | 1000 | | When polling for messages, a process waits 1000 milliseconds for spind to return. This equals the value to which the environment variable MPI_SPINDTIMEOUT is set. |
| | | integer | To change the default timeout, enter an integer value specifying the number of milliseconds the timeout should be. |
| progressadjust | on | | Allows user to set the environment variable MPI_SPIN. |
| | | off | Disables user's ability to set the environment variable MPI_SPIN. |
| shm_numpostbox | 16 | | Sets to 16 the number of postbox entries that are dedicated to a connection endpoint. This equals the value to which the environment variable MPI_SHM_NUMPOSTBOX is set. |
| | | integer | To change the number of dedicated postbox entries, enter an integer value specifying the desired number. |
| shm_shortmsgsize | 256 | | Sets to 256 the maximum number of bytes a short message can contain. This equals the default value to which the environment variable MPI_SHM_SHORTMSGSIZE is set. |
| | | integer | To change the maximum-size definition of a short message, enter an integer specifying the maximum number of bytes it can contain. |
| rsm_numpostbox | 15 | | Sets to 15 the number of postbox entries that are dedicated to a connection endpoint. This equals the value to which the environment variable MPI_RSM_NUMPOSTBOX is set. |
| | | integer | To change the number of dedicated postbox entries, enter an integer value specifying the desired number. |
| rsm_shortmsgsize | 401 | | Sets to 401 the maximum number of bytes a short message can contain. This equals the value to which the environment variable MPI_RSM_SHORTMSGSIZE is set. |
| | | integer | To change the maximum-size definition of a short message, enter an integer specifying the maximum number of bytes it can contain. |
| rsm_maxstripe | 2 | | Sets to 2 the maximum number of stripes that can be used. This equals the value to which the environment variable MPI_RSM_MAXSTRIPE is set. |
| | | integer | To change the maximum number of stripes that can be used, enter an integer specifying the desired limit. This value cannot be greater than 2. |
| spin | off | | Sets the MPI library to avoid spinning while waiting for status. This equals the value to which the environment variable MPI_SPIN is set. |
| | | on | Sets the MPI library to spin. |
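
The legal values in Table 7-1 lend themselves to a simple pre-flight check before an edited hpc.conf is put into service. The Python sketch below encodes the table as data and reports values that fall outside it. The names SWITCH_VALUES, INTEGER_OPTIONS, and check_option are hypothetical. Requiring integers to be non-negative is an assumption; the table states only that rsm_maxstripe cannot be greater than 2 and gives no other bounds.

```python
# Options that take keyword values, with the values listed in Table 7-1.
SWITCH_VALUES = {
    "coscheduling": {"avail", "on", "off"},
    "pbind": {"avail", "on", "off"},
    "progressadjust": {"on", "off"},
    "spin": {"on", "off"},
}

# Options that take an integer value.
INTEGER_OPTIONS = {
    "spindtimeout",
    "shm_numpostbox",
    "shm_shortmsgsize",
    "rsm_numpostbox",
    "rsm_shortmsgsize",
    "rsm_maxstripe",
}


def check_option(name, value):
    """Return an error string, or None if value looks legal for name."""
    if name in SWITCH_VALUES:
        allowed = SWITCH_VALUES[name]
        if value not in allowed:
            return f"{name}: expected one of {sorted(allowed)}, got {value!r}"
        return None
    if name in INTEGER_OPTIONS:
        # Assumption: only non-negative decimal integers are accepted.
        if not (value.isascii() and value.isdigit()):
            return f"{name}: expected an integer, got {value!r}"
        if name == "rsm_maxstripe" and int(value) > 2:
            return "rsm_maxstripe: value cannot be greater than 2"
        return None
    return f"{name}: not listed in Table 7-1"


if __name__ == "__main__":
    for name, value in [("spin", "on"), ("rsm_maxstripe", "3")]:
        print(name, value, "->", check_option(name, value) or "ok")
```

A check like this catches typing mistakes only; it says nothing about whether a legal value is a good choice for a given workload.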