Sun MPI 4.0 User's Guide: With CRE

Trading Memory for Performance

Depending on message traffic, performance can degrade if system buffers become congested, but it can improve if buffers are large. This section examines the performance of on-node messages, which pass through shared-memory buffers.

It is helpful to think of data traffic per connection (the path from a particular sender to a particular receiver), since many Sun MPI buffering resources are allocated on a per-connection basis. A sender may emit bursts of messages on a connection while the corresponding receiver is not depleting the buffers. For example, a sender may execute a sequence of send operations to one receiver during a period in which that receiver is making no MPI calls whatsoever.
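
For illustration only, the following sketch (not taken from this guide; the message count, message length, and tags are arbitrary) shows such a burst: rank 0 issues a series of blocking sends on its connection to rank 1 while rank 1 computes for a while before posting any receives, so the messages accumulate in that connection's buffers. It assumes the job is run with exactly two processes.

#include <mpi.h>

#define NMSGS  64
#define MSGLEN 8192                     /* doubles per message */

int main(int argc, char **argv)
{
    int rank, i;
    MPI_Status status;
    static double buf[MSGLEN];          /* static, so zero-initialized */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Burst of sends on the connection to rank 1; no receives
           are pending on the other side yet. */
        for (i = 0; i < NMSGS; i++)
            MPI_Send(buf, MSGLEN, MPI_DOUBLE, 1, i, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* ... a long stretch of computation with no MPI calls; the
           incoming messages sit in this connection's buffers ... */
        for (i = 0; i < NMSGS; i++)
            MPI_Recv(buf, MSGLEN, MPI_DOUBLE, 0, i, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
    return 0;
}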

You may need to use profiling to diagnose such conditions. For more information on profiling, see the Prism User's Guide and the Sun MPI Programming and Reference Guide.

Rendezvous or Eager Protocol?

Is your program sending many long, unexpected messages? Sun MPI offers message rendezvous, in which the receiver must echo a ready signal to the sender before data transmission can begin. This can improve performance for a pair of processes whose sends are issued in a different order than the matching receives, since it reduces receive-side buffering. To allow rendezvous behavior for long messages, set the MPI_EAGERONLY environment variable to 0:

% setenv MPI_EAGERONLY 0 

The threshold message size above which rendezvous behavior is used can be tuned independently for each protocol with the MPI_SHM_RENDVSIZE, MPI_TCP_RENDVSIZE, and MPI_RSM_RENDVSIZE environment variables.
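
As an illustration of the reordered case described above (a sketch only, not taken from this guide; the message length and count are arbitrary), the following program has rank 0 post nonblocking sends of long messages in one tag order while rank 1 receives them in the reverse order. Under the eager protocol, the early messages arrive unexpected and must be buffered by the receiver; under rendezvous, the bulk data is not transferred until the matching receive has been posted. It assumes exactly two processes.

#include <mpi.h>

#define NMSGS  8
#define MSGLEN 100000                   /* doubles; "long" messages */

int main(int argc, char **argv)
{
    int rank, i;
    MPI_Request reqs[NMSGS];
    MPI_Status stats[NMSGS];
    static double buf[NMSGS][MSGLEN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Sends are posted with tags 0, 1, 2, ... */
        for (i = 0; i < NMSGS; i++)
            MPI_Isend(buf[i], MSGLEN, MPI_DOUBLE, 1, i,
                      MPI_COMM_WORLD, &reqs[i]);
        MPI_Waitall(NMSGS, reqs, stats);
    } else if (rank == 1) {
        /* Receives are posted in the reverse tag order. */
        for (i = NMSGS - 1; i >= 0; i--)
            MPI_Recv(buf[0], MSGLEN, MPI_DOUBLE, 0, i,
                     MPI_COMM_WORLD, &stats[0]);
    }

    MPI_Finalize();
    return 0;
}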


Note -

Rendezvous often degrades performance by coupling senders to receivers. For some "unsafe" codes, it can also produce deadlock (see the sketch following this note).
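
As a sketch of that deadlock risk (illustrative only, not taken from this guide; the message length is arbitrary and exactly two processes are assumed), the following "unsafe" exchange has each process issue a blocking send before posting its receive. With eager delivery and sufficient buffering, the sends happen to complete, but under rendezvous each MPI_Send waits for the peer to post a matching receive, neither process reaches its MPI_Recv, and the program hangs.

#include <mpi.h>

#define MSGLEN 100000   /* long enough to exceed the rendezvous threshold */

int main(int argc, char **argv)
{
    int rank, peer;
    MPI_Status status;
    static double sendbuf[MSGLEN], recvbuf[MSGLEN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;            /* assumes exactly two processes */

    /* Both ranks send first ... */
    MPI_Send(sendbuf, MSGLEN, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    /* ... and only then receive; under rendezvous, this line is never reached. */
    MPI_Recv(recvbuf, MSGLEN, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}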


Many Broadcasts or Reductions?

Does your program include many broadcasts or reductions on large messages? Large broadcasts may benefit from increased values of MPI_SHM_BCASTSIZE, and large reductions from increased values of MPI_SHM_REDUCESIZE. Also, if many different communicators are involved, you may want to increase MPI_SHM_GBPOOLSIZE. In most cases, however, the default values provide the best performance.
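
For reference, here is a minimal sketch (not taken from this guide; the element count and the duplicated communicator are arbitrary choices) of the kind of collective operations these variables govern: a large MPI_Bcast and a large MPI_Reduce performed on a communicator other than MPI_COMM_WORLD.

#include <mpi.h>

#define COUNT 500000                    /* doubles; a large collective payload */

int main(int argc, char **argv)
{
    MPI_Comm subcomm;
    static double data[COUNT], sum[COUNT];

    MPI_Init(&argc, &argv);

    /* An additional communicator performing collectives draws on the
       shared-memory buffer pool governed by MPI_SHM_GBPOOLSIZE. */
    MPI_Comm_dup(MPI_COMM_WORLD, &subcomm);

    /* Large broadcast: a candidate for tuning MPI_SHM_BCASTSIZE. */
    MPI_Bcast(data, COUNT, MPI_DOUBLE, 0, subcomm);

    /* Large reduction: a candidate for tuning MPI_SHM_REDUCESIZE. */
    MPI_Reduce(data, sum, COUNT, MPI_DOUBLE, MPI_SUM, 0, subcomm);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}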