Sun MPI 4.0 Programming and Reference Guide

Sun MPI Routines

This section gives a brief description of the routines in the Sun MPI library. All the Sun MPI routines are listed in Appendix A, Sun MPI and Sun MPI I/O Routines, with brief descriptions and their C syntax. For detailed descriptions of individual routines, see the man pages. For more complete information, see the MPI standard (refer to "Related Publications" in the preface).

Point-to-Point Routines

Point-to-point routines include the basic send and receive routines in both blocking and nonblocking forms and in four modes.

A blocking send blocks until its message buffer can safely be reused (that is, overwritten with a new message). A blocking receive blocks until the received message is in the receive buffer.

Nonblocking sends and receives differ from blocking sends and receives in that they return immediately; their completion must later be waited for or tested for. It is expected that, eventually, nonblocking send and receive calls will allow communication and computation to overlap.

MPI's four modes for point-to-point communication are:

- Standard, in which the completion of a send does not imply that a matching receive has started; the system decides whether the outgoing message is buffered.
- Buffered, in which the sender supplies buffer space explicitly (with MPI_Buffer_attach), so the send can complete independently of any matching receive.
- Synchronous, in which the send completes only after a matching receive has started.
- Ready, in which the send may be started only if a matching receive has already been posted.
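As an illustration, the following minimal C sketch (the payload, tag value, and process ranks are illustrative assumptions) pairs a standard-mode blocking send on one process with a nonblocking receive on another:

    /* Sketch: process 0 sends one integer to process 1 with a
       blocking standard-mode send; process 1 posts a nonblocking
       receive and later waits for it to complete. */
    #include <mpi.h>

    void exchange(int rank)
    {
        int value = 42;                 /* hypothetical payload */
        MPI_Request request;
        MPI_Status  status;

        if (rank == 0) {
            MPI_Send(&value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Irecv(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &request);
            /* ... unrelated computation could overlap the transfer ... */
            MPI_Wait(&request, &status);
        }
    }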

Collective Communication

Collective communication routines are blocking routines that involve all processes in a communicator. Collective communication includes broadcasts and scatters, reductions and gathers, all-gathers and all-to-alls, scans, and a synchronizing barrier call.

Table 2-1 Collective Communication Routines

MPI_Bcast               Broadcasts from one process to all others in a communicator.
MPI_Scatter             Scatters from one process to all others in a communicator.
MPI_Reduce              Reduces from all to one in a communicator.
MPI_Allreduce           Reduces, then broadcasts the result to all nodes in a communicator.
MPI_Reduce_scatter      Scatters a vector that contains results across the nodes in a communicator.
MPI_Gather              Gathers from all to one in a communicator.
MPI_Allgather           Gathers, then broadcasts the results of the gather in a communicator.
MPI_Alltoall            Performs a set of gathers in which each process receives a specific result in a communicator.
MPI_Scan                Scans (parallel prefix) across processes in a communicator.
MPI_Barrier             Synchronizes processes in a communicator (no data is transmitted).

Many of the collective communication calls have alternative vector forms, with which different amounts of data can be sent to or received from different processes.

The syntax and semantics of these routines are largely consistent with the point-to-point routines (upon which they are built), but there are restrictions that keep them from getting too complicated:

- The amount of data sent must exactly match the amount of data specified by the receiver.
- Collective functions come in blocking versions only.
- Collective functions do not use a tag argument.
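For example, the following hedged C sketch (the problem size and the local computation are placeholders) broadcasts a parameter from the root process and then reduces partial results back to it; every process in the communicator must make both calls:

    /* Sketch: MPI_Bcast distributes a parameter from rank 0, and
       MPI_Reduce sums each process's partial result back at rank 0. */
    #include <mpi.h>

    void broadcast_and_reduce(MPI_Comm comm)
    {
        int rank, n = 0;
        double partial, total = 0.0;

        MPI_Comm_rank(comm, &rank);
        if (rank == 0)
            n = 1000;                      /* hypothetical problem size */
        MPI_Bcast(&n, 1, MPI_INT, 0, comm);

        partial = (double)rank * n;        /* stand-in for local work */
        MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, comm);
    }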

Managing Groups, Contexts, and Communicators

A distinguishing feature of the MPI standard is that it includes a mechanism for creating separate worlds of communication, accomplished through communicators, contexts, and groups.

A communicator specifies a group of processes that will conduct communication operations within a specified context without affecting or being affected by operations occurring in other groups or contexts elsewhere in the program. A communicator also guarantees that, within any group and context, point-to-point and collective communication are isolated from each other.

A group is an ordered collection of processes. Each process has a rank in the group; the rank runs from 0 to n-1, where n is the number of processes in the group. A process can belong to more than one group; its rank in one group has nothing to do with its rank in any other group.

A context is the internal mechanism by which a communicator guarantees safe communication space to the group.

At program startup, two default communicators are defined: MPI_COMM_WORLD, which has as a process group all the processes of the job; and MPI_COMM_SELF, which is equivalent to an identity communicator. The process group that corresponds to MPI_COMM_WORLD is not predefined, but can be accessed using MPI_Comm_group. One MPI_COMM_SELF communicator is defined for each process, and each process has rank zero in its own MPI_COMM_SELF communicator. For many programs, these are the only communicators needed.

Communicators are of two kinds: intracommunicators, which conduct operations within a given group of processes; and intercommunicators, which conduct operations between two groups of processes.

Communicators provide a caching mechanism, which allows an application to attach attributes to communicators. Attributes can be user data or any other kind of information.
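A minimal sketch of attribute caching, using the MPI-1 keyval interface, might look like the following (the cached value is a placeholder):

    /* Sketch: create a keyval, attach a user value to a communicator,
       and read it back.  The value must outlive the call because only
       its address is cached. */
    #include <mpi.h>

    void cache_attribute(MPI_Comm comm)
    {
        static int my_value = 7;           /* hypothetical user data */
        int my_keyval, *fetched, flag;

        MPI_Keyval_create(MPI_NULL_COPY_FN, MPI_NULL_DELETE_FN,
                          &my_keyval, NULL);
        MPI_Attr_put(comm, my_keyval, &my_value);
        MPI_Attr_get(comm, my_keyval, &fetched, &flag);
        /* flag is nonzero and *fetched == 7 once the attribute is cached */
    }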

New groups and new communicators are constructed from existing ones. Group constructor routines are local, and their execution does not require interprocess communication. Communicator constructor routines are collective, and their execution may require interprocess communication.
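For example, the following sketch (the choice of even-numbered ranks is arbitrary) extracts the group of MPI_COMM_WORLD with MPI_Comm_group, builds a subgroup locally with MPI_Group_incl, and then creates a new communicator collectively with MPI_Comm_create:

    /* Sketch: derive a communicator containing the even-ranked
       processes of MPI_COMM_WORLD. */
    #include <mpi.h>
    #include <stdlib.h>

    MPI_Comm make_even_comm(void)
    {
        MPI_Group world_group, even_group;
        MPI_Comm  even_comm;
        int size, i, nranks = 0, *ranks;

        MPI_Comm_size(MPI_COMM_WORLD, &size);
        ranks = (int *)malloc(size * sizeof(int));
        for (i = 0; i < size; i += 2)
            ranks[nranks++] = i;            /* ranks 0, 2, 4, ... */

        MPI_Comm_group(MPI_COMM_WORLD, &world_group);   /* local      */
        MPI_Group_incl(world_group, nranks, ranks, &even_group);
        MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm); /* collective */

        MPI_Group_free(&even_group);
        MPI_Group_free(&world_group);
        free(ranks);
        return even_comm;   /* MPI_COMM_NULL on odd-ranked processes */
    }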


Note -

Users who do not need any communicator other than the default MPI_COMM_WORLD communicator -- that is, who do not need any sub- or supersets of processes -- can simply plug in MPI_COMM_WORLD wherever a communicator argument is requested. In these circumstances, users can ignore this section and the associated routines. (These routines can be identified from the listing in Appendix A, Sun MPI and Sun MPI I/O Routines.)


Data Types

All Sun MPI communication routines have a data type argument. These may be primitive data types, such as integers or floating-point numbers, or they may be user-defined, derived data types, which are specified in terms of primitive types.

Derived data types allow users to specify more general, mixed, and noncontiguous communication buffers, such as array sections and structures that contain combinations of primitive data types.

The basic data types that can be specified for the data-type argument correspond to the basic data types of the host language. Values for the data-type argument for Fortran and the corresponding Fortran types are listed in the following table.

Table 2-2 Possible Values for the Data Type Argument for Fortran

MPI Data Type            Fortran Data Type
MPI_INTEGER              INTEGER
MPI_REAL                 REAL
MPI_DOUBLE_PRECISION     DOUBLE PRECISION
MPI_COMPLEX              COMPLEX
MPI_LOGICAL              LOGICAL
MPI_CHARACTER            CHARACTER(1)
MPI_DOUBLE_COMPLEX       DOUBLE COMPLEX
MPI_REAL4                REAL*4
MPI_REAL8                REAL*8
MPI_INTEGER2             INTEGER*2
MPI_INTEGER4             INTEGER*4
MPI_BYTE
MPI_PACKED

Values for the data-type argument in C and the corresponding C types are listed in the following table.

Table 2-3 Possible Values for the Data Type Argument for C

MPI Data Type            C Data Type
MPI_CHAR                 signed char
MPI_SHORT                signed short int
MPI_INT                  signed int
MPI_LONG                 signed long int
MPI_UNSIGNED_CHAR        unsigned char
MPI_UNSIGNED_SHORT       unsigned short int
MPI_UNSIGNED             unsigned int
MPI_UNSIGNED_LONG        unsigned long int
MPI_FLOAT                float
MPI_DOUBLE               double
MPI_LONG_DOUBLE          long double
MPI_LONG_LONG_INT        long long int
MPI_BYTE
MPI_PACKED

The data types MPI_BYTE and MPI_PACKED have no corresponding Fortran or C data types.
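As an illustration of a derived data type, the following C sketch (the array dimensions are arbitrary assumptions) uses MPI_Type_vector to describe one column of a row-major array and then sends that noncontiguous column with a single call:

    /* Sketch: send one column of an NX-by-NY array of doubles. */
    #include <mpi.h>

    #define NX 8
    #define NY 10

    void send_column(double a[NX][NY], int col, int dest, MPI_Comm comm)
    {
        MPI_Datatype column_type;

        /* NX blocks of 1 double each, successive blocks NY doubles apart */
        MPI_Type_vector(NX, 1, NY, MPI_DOUBLE, &column_type);
        MPI_Type_commit(&column_type);

        MPI_Send(&a[0][col], 1, column_type, dest, 0, comm);

        MPI_Type_free(&column_type);
    }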

Persistent Communication Requests

Sometimes within an inner loop of a parallel computation, a communication with the same argument list is executed repeatedly. The communication can be slightly improved by using a persistent communication request, which reduces the overhead for communication between the process and the communication controller. A persistent request can be thought of as a communication port or "half-channel."
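A minimal sketch of this pattern (the iteration count and buffer contents are placeholders) sets up a persistent send once and restarts it on each pass through the loop:

    /* Sketch: a persistent standard-mode send reused across iterations. */
    #include <mpi.h>

    void repeated_send(double *buf, int count, int dest, int niter)
    {
        MPI_Request request;
        MPI_Status  status;
        int i;

        MPI_Send_init(buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD,
                      &request);

        for (i = 0; i < niter; i++) {
            /* ... refill buf with this iteration's data ... */
            MPI_Start(&request);         /* begin the communication        */
            MPI_Wait(&request, &status); /* complete it; the request stays */
        }

        MPI_Request_free(&request);      /* release the persistent request */
    }

MPI_Recv_init, MPI_Bsend_init, MPI_Rsend_init, and MPI_Ssend_init set up persistent requests for receives and for the other send modes in the same way.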

Managing Process Topologies

Process topologies are associated with communicators; they are optional attributes that can be given to an intracommunicator (not to an intercommunicator).

Recall that processes in a group are ranked from 0 to n-1. This linear ranking often reflects nothing of the logical communication pattern of the processes, which may be, for instance, a 2- or 3-dimensional grid. The logical communication pattern is referred to as a virtual topology (separate and distinct from any hardware topology). In MPI, there are two types of virtual topologies that can be created: Cartesian (grid) topology and graph topology.

You can use virtual topologies in your programs to obtain a ranking of processes that takes the physical organization of the processors into account and thereby optimizes communication.
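For example, the following sketch (which assumes the job was started with 12 processes, so the 4x3 grid shape is an illustrative choice) imposes a periodic two-dimensional Cartesian topology on MPI_COMM_WORLD and finds each process's neighbors along the first dimension:

    /* Sketch: create a 4x3 periodic grid and query coordinates
       and neighbor ranks. */
    #include <mpi.h>

    void make_grid(void)
    {
        MPI_Comm grid_comm;
        int dims[2]    = {4, 3};
        int periods[2] = {1, 1};     /* wrap around in both dimensions */
        int coords[2], rank, source, dest;

        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);
        MPI_Comm_rank(grid_comm, &rank);
        MPI_Cart_coords(grid_comm, rank, 2, coords);
        MPI_Cart_shift(grid_comm, 0, 1, &source, &dest);
        /* coords[] now holds this process's grid position; source and
           dest are the ranks of its neighbors along dimension 0 */
    }

MPI_Graph_create plays the analogous role for graph topologies.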

Environmental Inquiry Functions

Environmental inquiry functions include routines for starting up and shutting down, error-handling routines, and timers.

Only a few MPI routines may be called before MPI_Init or after MPI_Finalize; examples include MPI_Initialized and MPI_Get_version. MPI_Finalize may be called only if there are no outstanding communications involving that process.
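A minimal sketch of the startup and shutdown sequence, together with the MPI_Wtime timer (the work section is a placeholder and error handling is left to the default MPI_ERRORS_ARE_FATAL handler), follows:

    /* Sketch: initialize MPI, time a section of work, and shut down. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t0 = MPI_Wtime();
        /* ... parallel work ... */
        t1 = MPI_Wtime();

        printf("rank %d: elapsed %f seconds\n", rank, t1 - t0);
        MPI_Finalize();
        return 0;
    }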

The set of errors handled by MPI is dependent upon the implementation. See Appendix B, Troubleshooting for tables listing the Sun MPI 4.0 error classes.