Sun MPI 4.0 User's Guide: With CRE

Shared-Memory Collectives

Collective operations in Sun MPI are highly optimized and make use of a "general buffer pool" within shared memory.

MPI_SHM_GBPOOLSIZE sets the amount of space, in bytes, available on a node for the "optimized" collectives. By default, it is set to 20971520 bytes (20 Mbytes). This space is used by MPI_Bcast, MPI_Reduce, MPI_Allreduce, MPI_Reduce_scatter, and MPI_Barrier, provided that two or more of the MPI processes are on the node.
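The following minimal C program is an illustration (it is not taken from this guide). It calls the collectives listed above on MPI_COMM_WORLD; when two or more of its processes run on the same node, those calls draw their shared-memory buffers from the general buffer pool.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = rank;

    /* Broadcast rank 0's value to every process. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Reduce: sum the values; the result lands on rank 0. */
    MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Allreduce: the same sum, delivered to every process. */
    MPI_Allreduce(&value, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    /* Barrier: synchronize all processes. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}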

When a communicator is created, space is reserved in the general buffer pool for barriers, short broadcasts, and a few other purposes.

For larger broadcasts, shared memory is allocated out of the general buffer pool. The maximum buffer-memory footprint, in bytes, of a broadcast operation is controlled by the MPI_SHM_BCASTSIZE environment variable and is given by

(n/4) * 2 * MPI_SHM_BCASTSIZE

where n is the number of MPI processes on the node. If less memory is needed than this, then less memory is used. After the broadcast operation, the memory is returned to the general buffer pool.
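For example, with 16 MPI processes on the node and MPI_SHM_BCASTSIZE set to 32768 bytes (an illustrative value, not necessarily your release's default), a single broadcast can occupy at most (16/4) * 2 * 32768 = 262144 bytes of the pool.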

For reduce operations,

n * n * MPI_SHM_REDUCESIZE

bytes are borrowed from the general buffer pool.
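For example, with 16 MPI processes on the node and MPI_SHM_REDUCESIZE set to 256 bytes (again an illustrative value, not necessarily the default), a reduce operation borrows 16 * 16 * 256 = 65536 bytes.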

For very large messages, the broadcast and reduce operations are pipelined. Increasing MPI_SHM_BCASTSIZE and MPI_SHM_REDUCESIZE can improve the efficiency of these collective operations on such messages, but it also increases the time required to fill the pipeline.

If MPI_SHM_GBPOOLSIZE proves to be too small and a collective operation is unable to borrow memory from the pool, the operation reverts to slower algorithms. Hence, under certain circumstances, performance could dictate increasing MPI_SHM_GBPOOLSIZE.
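As an illustration of when that might happen (using hypothetical settings, not defaults): with 16 processes on the node and MPI_SHM_REDUCESIZE raised to 65536 bytes, a single reduce operation borrows 16 * 16 * 65536 = 16777216 bytes, most of the default 20971520-byte pool. That leaves little room for any other collective in progress on the node at the same time, so raising MPI_SHM_GBPOOLSIZE would help avoid the fallback to the slower algorithms.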