Sun MPI 4.0 Programming and Reference Guide

Overlapping I/O With Computation and Communication

MPI I/O also supports nonblocking versions of the data-access routines. These are the routines that have the letter i (for immediate) before read or write in the routine name, such as MPI_File_iread_at. A nonblocking I/O routine returns immediately after the I/O request has been issued; it does not wait for the request to complete. This allows the program to perform computation and communication while the I/O is in progress. Because large I/O requests can take a long time to complete, nonblocking I/O lets your program make use of time it would otherwise spend waiting.

As in our example above, parallel jobs often partition large matrices stored in files. Such jobs may use many large matrices, or matrices too large to fit into memory at once, so each process may access them in stages. During each stage, a process reads in a chunk of data and then performs some computation on it (which may involve communicating with the other processes in the parallel job). While performing the computation and communication, the process could issue a nonblocking read request for the next chunk of data. Similarly, once the computation on a particular chunk has completed, the process could issue a nonblocking write request before performing computation and communication on the next chunk.
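The staged pattern described above is often called double buffering. The following is a minimal sketch of the ordering only, not a Sun MPI routine: the callback types and the names run_pipeline and demo_trace are hypothetical, introduced here for illustration. In a real program, start_read would presumably wrap MPI_File_iread_at and wait_read would wrap MPI_Wait; here the callbacks only record the order of operations so that the schedule can be inspected without an MPI runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callbacks (not part of MPI). In an MPI program:
 *   start_read would call MPI_File_iread_at and save the MPI_Request,
 *   wait_read  would call MPI_Wait on that saved request,
 *   compute    would do the per-chunk computation and communication. */
typedef void (*start_read_fn)(int stage, char *buf, void *ctx);
typedef void (*wait_read_fn)(int stage, void *ctx);
typedef void (*compute_fn)(int stage, const char *buf, void *ctx);

/* Process nstages chunks with two buffers: while stage s is computed
 * from one buffer, the read for stage s+1 fills the other. */
void run_pipeline(int nstages, char *buf_a, char *buf_b,
                  start_read_fn start_read, wait_read_fn wait_read,
                  compute_fn compute, void *ctx)
{
    char *cur = buf_a, *next = buf_b, *tmp;
    int s;

    assert(nstages >= 0);
    assert(buf_a != buf_b);      /* the two buffers must be distinct */
    if (nstages == 0) return;

    start_read(0, cur, ctx);     /* prime the pipeline with chunk 0 */
    wait_read(0, ctx);

    for (s = 0; s < nstages; s++) {
        int more = (s + 1 < nstages);
        if (more) start_read(s + 1, next, ctx); /* prefetch next chunk */
        compute(s, cur, ctx);                   /* overlaps the read   */
        if (more) wait_read(s + 1, ctx);        /* next is now valid   */
        tmp = cur; cur = next; next = tmp;      /* swap buffer roles   */
    }
}

/* Demo callbacks: append "R<s> ", "W<s> " or "C<s> " to a text trace
 * instead of touching a file. */
struct trace { char text[128]; };

static void trace_add(struct trace *t, const char *op, int stage)
{
    size_t used = strlen(t->text);
    snprintf(t->text + used, sizeof(t->text) - used, "%s%d ", op, stage);
}

static void demo_start(int stage, char *buf, void *ctx)
{
    (void)buf;
    trace_add((struct trace *)ctx, "R", stage);
}

static void demo_wait(int stage, void *ctx)
{
    trace_add((struct trace *)ctx, "W", stage);
}

static void demo_compute(int stage, const char *buf, void *ctx)
{
    (void)buf;
    trace_add((struct trace *)ctx, "C", stage);
}

/* Run the pipeline with the demo callbacks and copy the trace to out. */
void demo_trace(int nstages, char *out, size_t outlen)
{
    char a[8], b[8];
    struct trace t;

    t.text[0] = '\0';
    run_pipeline(nstages, a, b, demo_start, demo_wait, demo_compute, &t);
    snprintf(out, outlen, "%s", t.text);
}
```

For three stages, demo_trace records R0 W0 R1 C0 W1 R2 C1 W2 C2: each read after the first is started before the computation on the previous chunk and waited for only afterward. The same ordering would apply to nonblocking writes of finished chunks.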

The following example code illustrates the use of nonblocking data-access routines. Like the nonblocking communication routines, the nonblocking I/O routines require either a call to MPI_Wait, which blocks until the request completes, or repeated calls to MPI_Test, which report whether the request has completed. Only after the request has completed may the program use the read or write buffer again.


Example 4-2 Example code in which each process reads NUM_BYTES bytes from a file and then writes NUM_BYTES bytes to it, using the nonblocking MPI I/O routines MPI_File_iread_at and MPI_File_iwrite_at, respectively. Note the use of MPI_Wait and MPI_Test to determine when the nonblocking requests have completed.

/* iwr_at.c
 *
 * Example to demonstrate use of MPI_File_iwrite_at and MPI_File_iread_at
 *
*/

#include <stdio.h>
#include <stdlib.h>   /* malloc, free, exit */
#include <string.h>   /* string copy routines */
#include "mpi.h"

#define NUM_BYTES 100

void sample_error(int error, const char *string)
{
  fprintf(stderr, "Error %d in %s\n", error, string);
  /* Abort the whole job; the other processes may not reach MPI_Finalize */
  MPI_Abort(MPI_COMM_WORLD, error);
  exit(-1);
}

int
main( int argc, char **argv )
{
  char filename[128];
  char *buff;
  MPI_File fh;
  MPI_Offset offset;
  MPI_Request request;
  MPI_Status status;
  int i, rank, flag, result;

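  if(argc < 2) {
    fprintf(stderr, "Missing argument: filename\n");
    exit(-1);
  }
  /* Bounded copy so that a long argument cannot overflow filename */
  strncpy(filename, argv[1], sizeof(filename) - 1);
  filename[sizeof(filename) - 1] = '\0';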

  MPI_Init(&argc, &argv);

  result = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if(result != MPI_SUCCESS) 
    sample_error(result, "MPI_Comm_rank");

  result = MPI_File_open(MPI_COMM_WORLD, filename, 
			 MPI_MODE_RDWR | MPI_MODE_CREATE, 
			 MPI_INFO_NULL, &fh);
  if(result != MPI_SUCCESS) 
    sample_error(result, "MPI_File_open");

  buff = (char *)malloc(NUM_BYTES*sizeof(char));
  if(buff == NULL)
    sample_error(MPI_ERR_OTHER, "malloc");
  for(i=0;i<NUM_BYTES;i++) buff[i] = i;

  offset = rank * NUM_BYTES;
  result = MPI_File_iread_at(fh, offset, buff, NUM_BYTES, 
			     MPI_BYTE, &request); 
  if(result != MPI_SUCCESS) 
    sample_error(result, "MPI_File_iread_at");

  /* Perform some useful computation and/or communication */

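  /* Block until the read completes; buff must not be used before this */
  result = MPI_Wait(&request, &status);
  if(result != MPI_SUCCESS) 
    sample_error(result, "MPI_Wait");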

  /* Reuse the existing buffer; allocating it again would leak the first one */
  for(i=0;i<NUM_BYTES;i++) buff[i] = i;
  result = MPI_File_iwrite_at(fh, offset, buff, NUM_BYTES, 
			      MPI_BYTE, &request);
  if(result != MPI_SUCCESS) 
    sample_error(result, "MPI_File_iwrite_at");
  
  /* Perform some useful computation and/or communication */

  flag = 0;
  i = 0;
  while(!flag) {
     result = MPI_Test(&request, &flag, &status);
     if(result != MPI_SUCCESS) 
       sample_error(result, "MPI_Test");
     i++;
     /* Perform some more computation or communication, if possible */
  }

  result = MPI_File_close(&fh);
  if(result != MPI_SUCCESS) 
    sample_error(result, "MPI_File_close");

  MPI_Finalize();

  fprintf(stdout, "Successful completion\n");

  free(buff);
}