One way to tune Sun MPI environment variables for performance is to increase buffer sizes. In our case study, individual messages of up to roughly 40 Kbytes travel over a local connection. We therefore increase the shared-memory buffers by a large margin, following the prescriptions given earlier and adding
% setenv MPI_SHM_CPOOLSIZE 102400
% setenv MPI_SHM_CYCLESTART 102400
% setenv MPI_SHM_RENDVSIZE 102400
% setenv MPI_SHM_NUMPOSTBOX 50
to our list of run-time environment variables. For further information about Sun MPI environment variables, see the Sun MPI 4.0 Programming and Reference Guide.
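The setenv commands above assume a csh-style shell. As a minimal sketch for sh-family shells (sh, ksh, bash), the equivalent settings follow; the variable names and values are taken from the list above, and the final grep is just a quick sanity check, not part of the tuning itself (see the Sun MPI 4.0 Programming and Reference Guide for the meaning of each variable):

```shell
# Bourne-shell equivalents of the csh setenv commands above.
export MPI_SHM_CPOOLSIZE=102400
export MPI_SHM_CYCLESTART=102400
export MPI_SHM_RENDVSIZE=102400
export MPI_SHM_NUMPOSTBOX=50

# Sanity check: confirm the variables are present in the environment.
env | grep '^MPI_SHM_'
```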
When we reran the program under Prism, the benchmark time dropped from 146 seconds to 137 seconds. This 6% overall improvement came from slicing off a portion of the roughly 20% of run time that had been spent in MPI_Wait calls. In particular, the time spent in MPI_Wait calls that complete MPI_Isend operations has practically vanished.
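The quoted figure can be verified with a one-line computation; the timings 146 s and 137 s are those reported above:

```shell
# Percentage improvement from 146 s to 137 s: (146 - 137) / 146 * 100
awk 'BEGIN { printf "%.1f%%\n", (146 - 137) / 146 * 100 }'
# prints 6.2%
```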
One notable signature of our changes appears in the scatter plot of MPI_Wait times. The new data, shown in Figure 7-6 and akin to Figure 7-3, indicate that MPI_Wait times are considerably reduced. There are still occasional long waits, but they are reproducible from iteration to iteration and result from waits on receives for data that the senders have simply not yet sent. Such waits indicate critical paths inherent to the application; they have little to do with how fast MPI moves data.
Another notable figure is the scatter plot of MPI_Isend times, shown in Figure 7-7. In this case, we set the X-axis Field: to bytes and then used the Refresh button to refresh the plot. The figure indicates that MPI_Isend time grows roughly linearly with message size, corresponding to a transfer rate of about 100 Mbyte/second, a reasonable rate for on-node data transfer.
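As a rough consistency check on that rate, a 40-Kbyte message at 100 Mbyte/second should take on the order of 0.4 ms. The sketch below assumes 1 Kbyte = 1024 bytes and 1 Mbyte/s = 10^6 bytes/s; the text does not specify which conventions the plot uses:

```shell
# Expected MPI_Isend time for a 40-Kbyte message at ~100 Mbyte/s.
# Assumes 1 Kbyte = 1024 bytes, 1 Mbyte/s = 1e6 bytes/s (conventions not given in the text).
awk 'BEGIN { printf "%.1f microseconds\n", 40 * 1024 / 100e6 * 1e6 }'
# prints 409.6 microseconds
```

This is consistent with the longest MPI_Isend calls being far shorter than the multi-second waits seen earlier, which is why the remaining long waits point to the application's critical path rather than to data movement.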