Sun MPI 4.0 User's Guide: With CRE

Appendix A Environment Variables

Many environment variables are available for fine-tuning your Sun MPI environment. All 39 Sun MPI environment variables are listed here with brief descriptions. The same descriptions are also available on the MPI man page. If you want to return to the default setting after having set a variable, simply unset it (using unsetenv). The effects of some of the variables are explained in more detail in Chapter 6, Performance Tuning.
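
For example, under the C shell you can set a variable for subsequent runs and later restore its default by unsetting it. In the commands below, a.out is a placeholder for any MPI executable, launched with the CRE command mprun:

    % setenv MPI_QUIET 1
    % mprun -np 4 a.out
    % unsetenv MPI_QUIET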

The environment variables are listed here in six groups: Informational, General Performance Tuning, Tuning Memory for Point-to-Point Performance, Numerics, Tuning Rendezvous, and Miscellaneous.

Informational

MPI_PRINTENV

When set to 1, causes the environment variables and hpc.conf parameters associated with the MPI job to be printed out. The default is 0.

MPI_QUIET

If set to 1, suppresses Sun MPI warning messages. The default value is 0.

MPI_SHOW_ERRORS

If set to 1, the MPI_ERRORS_RETURN error handler prints the error message and returns the error. The default value is 0.

MPI_SHOW_INTERFACES

When set to 1, 2, or 3, information about which interfaces are being used by an MPI application is printed to stdout. Set MPI_SHOW_INTERFACES to 1 to print the selected internode interface. Set it to 2 to print all the interfaces and their rankings. Set it to 3 for verbose output. With the default value, 0, no information is printed.
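
For example, to confirm which settings and interfaces a job picks up, you might combine the informational variables before launching (csh syntax; a.out is a placeholder for your MPI program):

    % setenv MPI_PRINTENV 1
    % setenv MPI_SHOW_INTERFACES 2
    % mprun -np 8 a.out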

General Performance Tuning

MPI_POLLALL

When set to 1, the default value, all connections are polled for receives; this is known as full polling. When set to 0, only those connections on which receives are posted are polled. Full polling helps drain system buffers, lessening the chance of deadlock for "unsafe" codes. Well-written codes should set MPI_POLLALL to 0 for best performance.
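
For example, if you know your code is "safe" (every receive is posted in time), you might turn off full polling for the duration of a run:

    % setenv MPI_POLLALL 0
    % mprun -np 4 a.out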

MPI_PROCBIND

Binds each MPI process to its own processor. By default, MPI_PROCBIND is set to 0, which means processor binding is off. To turn processor binding on, set it to 1. The system administrator can allow or disable processor binding by setting the pbind parameter in the hpc.conf file to on or off; if that parameter is set, the MPI_PROCBIND environment variable is ignored. Processor binding can enhance performance, but it yields very poor performance when used for multithreaded jobs or for more than one job at a time.

MPI_SPIN

Sets the spin policy. The default value is 0, which causes MPI processes to spin nonaggressively, allowing best performance when the load is at least as great as the number of CPUs. A value of 1 causes MPI processes to spin aggressively, leading to best performance if extra CPUs are available on each node to handle system daemons and other background activities.
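
For example, on nodes with CPUs left over for daemons and background activity, one plausible tuning (subject to the caveats above about multithreaded jobs and shared nodes) is to combine processor binding with aggressive spinning:

    % setenv MPI_PROCBIND 1
    % setenv MPI_SPIN 1
    % mprun -np 4 a.out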

Tuning Memory for Point-to-Point Performance

MPI_RSM_CPOOLSIZE

The requested size, in bytes, to be allocated per stripe for buffers for each remote-shared-memory connection. This value may be overridden when connections are established. The default value is 16384 bytes.

MPI_RSM_NUMPOSTBOX

The number of postboxes per stripe per remote-shared-memory connection. The default is 15 postboxes.

MPI_RSM_PIPESIZE

The limit on the size (in bytes) of a message that can be sent over remote shared memory via the buffer list of one postbox per stripe. The default is 8192 bytes.

MPI_RSM_SBPOOLSIZE

If set, MPI_RSM_SBPOOLSIZE is the requested size, in bytes, of each RSM send buffer pool. An RSM send buffer pool is the pool of buffers on a node that a remote process would use to send to processes on that node. The value must be a multiple of 1024. If unset, pools of buffers are dedicated to connections rather than to senders.

MPI_RSM_SHORTMSGSIZE

The maximum size, in bytes, of a message that will be sent via remote shared memory without using buffers. The default value is 401 bytes.

MPI_SHM_CPOOLSIZE

The amount of memory, in bytes, that can be allocated to each connection pool. When MPI_SHM_SBPOOLSIZE is not set, the default value is 24576 bytes. Otherwise, the default value is MPI_SHM_SBPOOLSIZE.

MPI_SHM_CYCLESIZE

The limit, in bytes, on the portion of a shared-memory message that will be sent via the buffer list of a single postbox during a cyclic transfer. The default value is 8192 bytes. The value must be a multiple of 1024 and no greater than MPI_SHM_CPOOLSIZE/2.
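
For example, with MPI_SHM_CPOOLSIZE at its default of 24576 bytes, MPI_SHM_CYCLESIZE may be at most 24576/2 = 12288 bytes. The following illustrative settings satisfy both constraints:

    % setenv MPI_SHM_CPOOLSIZE 24576
    % setenv MPI_SHM_CYCLESIZE 12288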

MPI_SHM_CYCLESTART

Shared-memory transfers that are larger than MPI_SHM_CYCLESTART bytes will be cyclic. The default value is 24576 bytes.

MPI_SHM_NUMPOSTBOX

The number of postboxes dedicated to each shared-memory connection. The default value is 16.

MPI_SHM_PIPESIZE

The limit, in bytes, on the portion of a shared-memory message that will be sent via the buffer list of a single postbox during a pipeline transfer. The default value is 8192 bytes. The value must be a multiple of 1024.

MPI_SHM_PIPESTART

The size, in bytes, at which shared-memory transfers will start to be pipelined. The default value is 2048 bytes. The value must be a multiple of 1024.

MPI_SHM_SBPOOLSIZE

If set, MPI_SHM_SBPOOLSIZE is the size, in bytes, of the pool of shared-memory buffers dedicated to each sender. The value must be a multiple of 1024. If unset, pools of shared-memory buffers are dedicated to connections rather than to senders.

MPI_SHM_SHORTMSGSIZE

The size (in bytes) of the section of a postbox that contains either data or a buffer list. The default value is 256 bytes.


Note -

If MPI_SHM_PIPESTART, MPI_SHM_PIPESIZE, or MPI_SHM_CYCLESIZE is increased to a size larger than 31744 bytes, then MPI_SHM_SHORTMSGSIZE may also have to be increased. See Chapter 6, Performance Tuning for more information.
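
For example, if you raise MPI_SHM_PIPESIZE well beyond 31744 bytes, you might raise MPI_SHM_SHORTMSGSIZE along with it. The values below are purely illustrative; see Chapter 6, Performance Tuning for how to choose them:

    % setenv MPI_SHM_PIPESIZE 65536
    % setenv MPI_SHM_SHORTMSGSIZE 512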


Numerics

MPI_CANONREDUCE

Prevents reduction operations from using any optimizations that take advantage of the physical location of processors. This may provide more consistent results in the case of floating-point addition, for example. However, the operation may take longer to complete. The default value is 0, meaning optimizations are allowed. To prevent optimizations, set the value to 1.
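
For example, to favor run-to-run reproducibility of floating-point reductions at some cost in speed:

    % setenv MPI_CANONREDUCE 1
    % mprun -np 16 a.out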

Tuning Rendezvous

MPI_EAGERONLY

When set to 1, the default, only the eager protocol is used. When set to 0, both eager and rendezvous protocols are used.

MPI_RSM_RENDVSIZE

Messages communicated by remote shared memory that are greater than this size will use the rendezvous protocol unless the environment variable MPI_EAGERONLY is set to 1. The default value is 16384 bytes.

MPI_SHM_RENDVSIZE

Messages communicated by shared memory that are greater than this size will use the rendezvous protocol unless the environment variable MPI_EAGERONLY is set to 1. The default value is 24576 bytes.

MPI_TCP_RENDVSIZE

Messages communicated by TCP that contain data of this size or greater will use the rendezvous protocol unless the environment variable MPI_EAGERONLY is set to 1. The default value is 49152 bytes.
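
For example, to enable the rendezvous protocol and have shared-memory messages switch to it at a lower threshold than the default, you might use settings such as these (values illustrative):

    % setenv MPI_EAGERONLY 0
    % setenv MPI_SHM_RENDVSIZE 16384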

Miscellaneous

MPI_COSCHED

Specifies the user's preference regarding use of the spind daemon for coscheduling. Values can be 0 (prefer no use) or 1 (prefer use). This preference may be overridden by the system administrator's policy. This policy is set in the hpc.conf file and can be 0 (forbid use), 1 (require use), or 2 (no policy). If no policy is set and no user preference is specified, coscheduling is not used.


Note -

If no user preference is specified, the value 2 will be shown when environment variables are printed with MPI_PRINTENV.


MPI_FLOWCONTROL

Limits the number of unexpected messages that can be queued from a particular connection. Once this quantity of unexpected messages has been received, polling the connection for incoming messages stops. The default value, 0, indicates that no limit is set. To limit flow, set the value to some integer greater than zero.

MPI_FULLCONNINIT

Ensures that all connections are established during initialization. By default, connections are established lazily. However, you can override this default by setting the environment variable MPI_FULLCONNINIT to 1, forcing full-connection initialization mode. The default value is 0.

MPI_MAXFHANDLES

Specifies the upper limit on the number of concurrently allocated Fortran handles for MPI objects other than requests. The default value is 1024. Users should take care to free MPI objects that are no longer in use. There is no limit on handle allocation for C codes. This variable is ignored in the default 32-bit library.

MPI_MAXREQHANDLES

Specifies the upper limit on the number of concurrently allocated Fortran handles for MPI requests. The default value is 1024. Users must take care to free request handles by properly completing requests. This variable is ignored in the default 32-bit library.
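
For example, a Fortran code built against the 64-bit library that keeps many requests outstanding might raise the limit; the value below is illustrative:

    % setenv MPI_MAXREQHANDLES 4096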

MPI_OPTCOLL

The MPI collectives are implemented using a variety of optimizations. Some of these optimizations can inhibit performance of point-to-point messages for "unsafe" programs. By default, this variable is 1, and optimized collectives are used. The optimizations can be turned off by setting the value to 0.

MPI_RSM_MAXSTRIPE

Defines the maximum number of stripes that can be used during communication via remote shared memory. The default value is the number of stripes in the cluster, with a maximum default of 2.

MPI_SHM_BCASTSIZE

On SMPs, MPI_Bcast() is implemented for large messages using a double-buffering scheme. The size of each buffer, in bytes, can be set with this environment variable. The default value is 32768 bytes.

MPI_SHM_GBPOOLSIZE

The amount of memory available, in bytes, to the general buffer pool for use by collective operations. The default value is 20971520 bytes.

MPI_SHM_REDUCESIZE

On SMPs, calling MPI_Reduce() causes all processors to participate in the reduce. Each processor works on a piece of data equal to the MPI_SHM_REDUCESIZE setting. The default value is 256 bytes. Take care when setting this variable: the system reserves MPI_SHM_REDUCESIZE * np * np bytes of memory to execute the reduce.
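
For example, doubling the default for a 32-process job would cause the system to reserve 512 x 32 x 32 = 524288 bytes for the reduce:

    % setenv MPI_SHM_REDUCESIZE 512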

MPI_SPINDTIMEOUT

When coscheduling is enabled, limits the length of time (in milliseconds) a message will remain in the poll waiting for the spind daemon to return. If the timeout occurs before the daemon finds any messages, the process re-enters the polling loop. The default value is 1000 ms. A default can also be set by a system administrator in the hpc.conf file.

MPI_TCP_CONNLOOP

Sets the number of times MPI_TCP_CONNTIMEOUT occurs before signaling an error. The default value for this variable is 0, meaning that the program will abort on the first occurrence of MPI_TCP_CONNTIMEOUT.

MPI_TCP_CONNTIMEOUT

Sets the timeout value in seconds that is used for an accept() call. The default value for this variable is 600 seconds (10 minutes). This timeout can be triggered in both full- and lazy-connection initialization. After the timeout is reached, a warning message will be printed. If MPI_TCP_CONNLOOP is set to 0, then the first timeout will cause the program to abort.
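
For example, to shorten the accept() timeout while tolerating a few timeouts before the job aborts, you might combine the two TCP connection variables (values illustrative):

    % setenv MPI_TCP_CONNTIMEOUT 120
    % setenv MPI_TCP_CONNLOOP 3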

MPI_TCP_SAFEGATHER

Allows use of a congestion-avoidance algorithm for MPI_Gather() and MPI_Gatherv() over TCP. By default, MPI_TCP_SAFEGATHER is set to 1, which means use of this algorithm is on. If you know that your underlying network can handle gathering large amounts of data on a single node, you may want to override this algorithm by setting MPI_TCP_SAFEGATHER to 0.