This section gives a brief description of the routines in the Sun MPI library. All the Sun MPI routines are listed in Appendix A, Sun MPI and Sun MPI I/O Routines, with brief descriptions and their C syntax. For detailed descriptions of individual routines, see the man pages. For more complete information, refer to the MPI standard (see "Related Publications" in the preface).
Point-to-point routines include the basic send and receive routines in both blocking and nonblocking forms and in four modes.
A blocking send blocks until its message buffer can be written with a new message. A blocking receive blocks until the received message is in the receive buffer.
Nonblocking sends and receives differ from blocking sends and receives in that they return immediately and their completion must be waited for or tested for. It is expected that eventually nonblocking send and receive calls will allow the overlap of communication and computation.
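The following is a minimal sketch contrasting a blocking send with a nonblocking receive whose completion is tested with MPI_Wait; the buffer size, tag, and ranks are illustrative assumptions, not values from this guide.

```c
#include <mpi.h>

/* Sketch: blocking vs. nonblocking point-to-point communication.
 * Buffer size, tag, and ranks are illustrative assumptions. */
void exchange(int rank)
{
    double buf[100];
    MPI_Status status;

    if (rank == 0) {
        /* Blocking send: returns once buf may safely be reused. */
        MPI_Send(buf, 100, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Nonblocking receive: returns immediately; completion must be
         * waited or tested for before buf is read. */
        MPI_Request request;
        MPI_Irecv(buf, 100, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &request);
        /* ... unrelated computation can proceed here ... */
        MPI_Wait(&request, &status);   /* the message is now in buf */
    }
}
```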
MPI's four modes for point-to-point communication, each illustrated in the sketch following this list, are:
Standard, in which the completion of a send implies that the message either is buffered internally or has been received. Users are free to overwrite the buffer that they passed in with any of the blocking send or receive routines, after the routine returns.
Buffered, in which the completion of a send implies that the message has been copied into a buffer that the user has explicitly attached (with MPI_Buffer_attach), whether or not a matching receive has been posted.
Synchronous, in which rendezvous semantics occur between sender and receiver; that is, a send blocks until the corresponding receive has occurred.
Ready, in which a send can be started only if the matching receive is already posted. The ready mode for sends is a way for the programmer to notify the system that the receive has been posted, so that the underlying system can use a faster protocol if it is available.
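Each mode has its own send call. The sketch below shows their C syntax; the counts, tags, and destination rank are illustrative assumptions, and it assumes the destination has already posted matching receives (as the ready mode requires).

```c
#include <stdlib.h>
#include <mpi.h>

/* Sketch of the four send modes.  Counts, tags, and the destination rank
 * are illustrative assumptions; the receiver is assumed to have pre-posted
 * matching receives for all four messages (required for MPI_Rsend). */
void send_modes(double *buf, int count, int dest)
{
    MPI_Send(buf, count, MPI_DOUBLE, dest, 1, MPI_COMM_WORLD);   /* standard    */
    MPI_Ssend(buf, count, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);  /* synchronous */
    MPI_Rsend(buf, count, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);  /* ready       */

    /* Buffered mode requires the user to attach buffer space first. */
    int size = count * (int)sizeof(double) + MPI_BSEND_OVERHEAD;
    void *attach = malloc(size);
    MPI_Buffer_attach(attach, size);
    MPI_Bsend(buf, count, MPI_DOUBLE, dest, 4, MPI_COMM_WORLD);  /* buffered */
    MPI_Buffer_detach(&attach, &size);
    free(attach);
}
```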
Collective communication routines are blocking routines that involve all processes in a communicator. Collective communication includes broadcasts and scatters, reductions and gathers, all-gathers and all-to-alls, scans, and a synchronizing barrier call.
Table 2-1 Collective Communication Routines
| Routine | Description |
|---|---|
| MPI_Bcast | Broadcasts from one process to all others in a communicator. |
| MPI_Scatter | Scatters from one process to all others in a communicator. |
| MPI_Reduce | Reduces from all to one in a communicator. |
| MPI_Allreduce | Reduces, then broadcasts result to all nodes in a communicator. |
| MPI_Reduce_scatter | Scatters a vector that contains results across the nodes in a communicator. |
| MPI_Gather | Gathers from all to one in a communicator. |
| MPI_Allgather | Gathers, then broadcasts the results of the gather in a communicator. |
| MPI_Alltoall | Performs a set of gathers in which each process receives a specific result in a communicator. |
| MPI_Scan | Scans (parallel prefix) across processes in a communicator. |
| MPI_Barrier | Synchronizes processes in a communicator (no data is transmitted). |
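A minimal sketch of two routines from the table follows; the choice of root rank and the data involved are illustrative assumptions.

```c
#include <mpi.h>

/* Sketch: broadcast a problem size from rank 0, then reduce per-process
 * partial sums back to rank 0.  The root rank and data are illustrative
 * assumptions. */
void collective_example(MPI_Comm comm)
{
    int    n = 0;          /* set to a meaningful value on rank 0 */
    double partial = 0.0;  /* each process's contribution */
    double total   = 0.0;  /* significant only on the root */

    /* All processes in comm must call the same collective routines. */
    MPI_Bcast(&n, 1, MPI_INT, 0, comm);
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
}
```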
Many of the collective communication calls have alternative vector forms, with which different amounts of data can be sent to or received from different processes (see the sketch following the list below).
The syntax and semantics of these routines are basically consistent with the point-to-point routines (upon which they are built), but there are restrictions to keep them from getting too complicated:
The amount of data sent must exactly match the amount of data specified by the receiver.
There is only one mode, a mode analogous to the standard mode of point-to-point routines.
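As an example of a vector form, MPI_Gatherv lets each process contribute a different amount of data, with per-process counts and displacements supplied at the root. The following sketch assumes the caller has already filled in those arrays; all names and values are illustrative.

```c
#include <mpi.h>

/* Sketch of a vector-form collective: each process sends mycount elements
 * and the root gathers them using per-process counts and displacements.
 * The argument values are illustrative assumptions. */
void gatherv_example(MPI_Comm comm, double *mydata, int mycount,
                     double *recvbuf, int *counts, int *displs)
{
    /* counts[i] and displs[i] describe where process i's data lands in
     * recvbuf; they are significant only at the root (rank 0 here). */
    MPI_Gatherv(mydata, mycount, MPI_DOUBLE,
                recvbuf, counts, displs, MPI_DOUBLE, 0, comm);
}
```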
A distinguishing feature of the MPI standard is that it includes a mechanism for creating separate worlds of communication, accomplished through communicators, contexts, and groups.
A communicator specifies a group of processes that will conduct communication operations within a specified context without affecting or being affected by operations occurring in other groups or contexts elsewhere in the program. A communicator also guarantees that, within any group and context, point-to-point and collective communication are isolated from each other.
A group is an ordered collection of processes. Each process has a rank in the group; the rank runs from 0 to n-1. A process can belong to more than one group; its rank in one group has nothing to do with its rank in any other group.
A context is the internal mechanism by which a communicator guarantees safe communication space to the group.
At program startup, two default communicators are defined: MPI_COMM_WORLD, which has as a process group all the processes of the job; and MPI_COMM_SELF, which is equivalent to an identity communicator. The process group that corresponds to MPI_COMM_WORLD is not predefined, but can be accessed using MPI_Comm_group. One MPI_COMM_SELF communicator is defined for each process, each of which has rank zero in its own communicator. For many programs, these are the only communicators needed.
Communicators are of two kinds: intracommunicators, which conduct operations within a given group of processes; and intercommunicators, which conduct operations between two groups of processes.
Communicators provide a caching mechanism, which allows an application to attach attributes to communicators. Attributes can be user data or any other kind of information.
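A sketch of this caching interface, using the MPI-1 attribute routines, appears below; the cached integer and its value are illustrative assumptions.

```c
#include <mpi.h>

/* Sketch: cache a user-defined attribute on a communicator.  The cached
 * value (42) is an illustrative assumption. */
void cache_attribute(MPI_Comm comm)
{
    static int my_value = 42;     /* hypothetical user data */
    int keyval, *retrieved, flag;

    /* Create a key, attach the attribute, then retrieve it. */
    MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN, &keyval, NULL);
    MPI_Attr_put(comm, keyval, &my_value);

    MPI_Attr_get(comm, keyval, &retrieved, &flag);
    /* flag is true and *retrieved == 42 if the attribute was found */
}
```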
New groups and new communicators are constructed from existing ones. Group constructor routines are local, and their execution does not require interprocess communication. Communicator constructor routines are collective, and their execution may require interprocess communication.
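For illustration, the following sketch builds a new communicator from a subset of MPI_COMM_WORLD; the even/odd split and the array bound are assumptions made for the example.

```c
#include <mpi.h>

/* Sketch: build a communicator containing only the even-numbered processes
 * of MPI_COMM_WORLD.  The even/odd split and the 1024-process bound are
 * illustrative assumptions. */
void make_even_comm(MPI_Comm *newcomm)
{
    MPI_Group world_group, even_group;
    int nprocs, i, n = 0;
    int ranks[1024];   /* assumed upper bound on the job size */

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);         /* local     */

    for (i = 0; i < nprocs; i += 2)
        ranks[n++] = i;
    MPI_Group_incl(world_group, n, ranks, &even_group);   /* local     */

    /* Collective over MPI_COMM_WORLD; processes outside even_group
     * receive MPI_COMM_NULL. */
    MPI_Comm_create(MPI_COMM_WORLD, even_group, newcomm);

    MPI_Group_free(&even_group);
    MPI_Group_free(&world_group);
}
```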
Users who do not need any communicator other than the default MPI_COMM_WORLD communicator -- that is, who do not need any sub- or supersets of processes -- can simply plug in MPI_COMM_WORLD wherever a communicator argument is requested. In these circumstances, users can ignore this section and the associated routines. (These routines can be identified from the listing in Appendix A, Sun MPI and Sun MPI I/O Routines.)
All Sun MPI communication routines have a data type argument. This argument may specify a primitive data type, such as an integer or a floating-point number, or a user-defined, derived data type, which is specified in terms of primitive types.
Derived data types allow users to specify more general, mixed, and noncontiguous communication buffers, such as array sections and structures that contain combinations of primitive data types.
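For instance, a derived type can describe a strided, noncontiguous region such as one column of a row-major matrix. In the sketch below, the matrix dimensions, destination, and tag are illustrative assumptions.

```c
#include <mpi.h>

/* Sketch: send one column of a 10x10 row-major matrix using a derived
 * data type.  Matrix size, column index, destination, and tag are
 * illustrative assumptions. */
void send_column(double a[10][10], int dest)
{
    MPI_Datatype column;

    /* 10 blocks of 1 double, separated by a stride of 10 doubles */
    MPI_Type_vector(10, 1, 10, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    MPI_Send(&a[0][3], 1, column, dest, 0, MPI_COMM_WORLD);  /* column 3 */

    MPI_Type_free(&column);
}
```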
The basic data types that can be specified for the data-type argument correspond to the basic data types of the host language. Values for the data-type argument for Fortran and the corresponding Fortran types are listed in the following table.
Table 2-2 Possible Values for the Data Type Argument for Fortran
| MPI Data Type | Fortran Data Type |
|---|---|
| MPI_INTEGER | INTEGER |
| MPI_REAL | REAL |
| MPI_DOUBLE_PRECISION | DOUBLE PRECISION |
| MPI_COMPLEX | COMPLEX |
| MPI_LOGICAL | LOGICAL |
| MPI_CHARACTER | CHARACTER(1) |
| MPI_BYTE | |
| MPI_PACKED | |
Values for the data-type argument in C and the corresponding C types are listed in the following table.
Table 2-3 Possible Values for the Data Type Argument for C
| MPI Data Type | C Data Type |
|---|---|
| MPI_CHAR | signed char |
| MPI_SHORT | signed short int |
| MPI_INT | signed int |
| MPI_LONG | signed long int |
| MPI_UNSIGNED_CHAR | unsigned char |
| MPI_UNSIGNED_SHORT | unsigned short int |
| MPI_UNSIGNED | unsigned int |
| MPI_UNSIGNED_LONG | unsigned long int |
| MPI_FLOAT | float |
| MPI_DOUBLE | double |
| MPI_LONG_DOUBLE | long double |
| MPI_BYTE | |
| MPI_PACKED | |
The data types MPI_BYTE and MPI_PACKED have no corresponding Fortran or C data types.
Sometimes within an inner loop of a parallel computation, a communication with the same argument list is executed repeatedly. The communication can be slightly improved by using a persistent communication request, which reduces the overhead for communication between the process and the communication controller. A persistent request can be thought of as a communication port or "half-channel."
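The persistent-request pattern is sketched below; the message size, tag, destination, and iteration count are illustrative assumptions.

```c
#include <mpi.h>

/* Sketch: reuse one persistent send request inside an inner loop.
 * Count, destination, tag, and loop bound are illustrative assumptions. */
void persistent_send(double *buf, int count, int dest)
{
    MPI_Request request;
    MPI_Status  status;
    int i;

    /* Create the "half-channel" once, outside the loop. */
    MPI_Send_init(buf, count, MPI_DOUBLE, dest, 7, MPI_COMM_WORLD, &request);

    for (i = 0; i < 1000; i++) {
        /* ... refill buf ... */
        MPI_Start(&request);          /* begin the communication */
        MPI_Wait(&request, &status);  /* complete it; the request stays reusable */
    }

    MPI_Request_free(&request);
}
```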
Process topologies are associated with communicators; they are optional attributes that can be given to an intracommunicator (not to an intercommunicator).
Recall that processes in a group are ranked from 0 to n-1. This linear ranking often reflects nothing of the logical communication pattern of the processes, which may be, for instance, a 2- or 3-dimensional grid. The logical communication pattern is referred to as a virtual topology (separate and distinct from any hardware topology). In MPI, there are two types of virtual topologies that can be created: Cartesian (grid) topology and graph topology.
You can use virtual topologies in your programs to let the system take the physical processor organization into account and provide a ranking of processes that optimizes communication.
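A sketch of attaching a two-dimensional Cartesian topology to a communicator follows; the 4x4 grid shape, nonperiodic boundaries, and the assumption that the job has at least 16 processes are all illustrative.

```c
#include <mpi.h>

/* Sketch: create a 4x4 nonperiodic Cartesian (grid) topology and find a
 * process's grid coordinates and its neighbors.  The grid shape is an
 * illustrative assumption and requires at least 16 processes. */
void make_grid(void)
{
    MPI_Comm grid_comm;
    int dims[2]    = {4, 4};
    int periods[2] = {0, 0};
    int coords[2], left, right;

    /* reorder = 1 lets MPI renumber ranks to match the hardware */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);

    if (grid_comm != MPI_COMM_NULL) {
        int rank;
        MPI_Comm_rank(grid_comm, &rank);
        MPI_Cart_coords(grid_comm, rank, 2, coords);
        MPI_Cart_shift(grid_comm, 1, 1, &left, &right); /* neighbors along dim 1 */
    }
}
```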
Environmental inquiry functions include routines for starting up and shutting down, error-handling routines, and timers.
Only a few MPI routines may be called before MPI_Init or after MPI_Finalize; examples include MPI_Initialized and MPI_Get_version. MPI_Finalize may be called only if there are no outstanding communications involving that process.
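The usual shape of a program's startup, inquiry, timing, and shutdown is sketched below; the work done between the timer calls is an illustrative assumption.

```c
#include <stdio.h>
#include <mpi.h>

/* Sketch: minimal startup, inquiry, timing, and shutdown. */
int main(int argc, char **argv)
{
    int rank, size;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    t0 = MPI_Wtime();
    /* ... computation and communication ... */
    t1 = MPI_Wtime();

    printf("process %d of %d: %f seconds\n", rank, size, t1 - t0);

    /* All outstanding communications must be complete before this call. */
    MPI_Finalize();
    return 0;
}
```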
The set of errors handled by MPI is dependent upon the implementation. See Appendix B, Troubleshooting for tables listing the Sun MPI 4.0 error classes.
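As a sketch of error handling, the code below switches MPI_COMM_WORLD from the default MPI_ERRORS_ARE_FATAL handler to MPI_ERRORS_RETURN and maps a returned error code to its class and message text; the deliberately invalid send is only an illustration.

```c
#include <stdio.h>
#include <mpi.h>

/* Sketch: return error codes instead of aborting, then map a code to its
 * error class and message text.  The failing call is an illustrative
 * assumption (the destination rank is out of range). */
void check_errors(void)
{
    int code, class, len, size;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    code = MPI_Send(NULL, 0, MPI_INT, size, 0, MPI_COMM_WORLD); /* invalid rank */
    if (code != MPI_SUCCESS) {
        MPI_Error_class(code, &class);
        MPI_Error_string(code, msg, &len);
        fprintf(stderr, "error class %d: %s\n", class, msg);
    }
}
```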