CHAPTER 1
OpenMP API Summary
OpenMP is a portable, parallel programming model for shared memory multiprocessor architectures, developed in collaboration with a number of computer vendors. The specifications were created and are published by the OpenMP Architecture Review Board. For more information on the OpenMP developer community, including tutorials and other resources, see their web site at:
http://www.openmp.org
The OpenMP API is the recommended parallel programming model for all Forte Developer compilers. See Chapter 4 for guidelines on converting legacy Fortran and C parallelization directives to OpenMP.
This chapter summarizes the directives, run-time library routines, and environment variables comprising the OpenMP Application Program Interfaces, as implemented by the Forte Developer Fortran 95, C and C++ compilers.
The material presented in this chapter is only a summary with many details left out intentionally for the sake of brevity. In all cases, refer to the OpenMP specification documents for complete details.
The Fortran 2.0 and C/C++ 1.0 OpenMP specifications can be found on the official OpenMP web site, http://www.openmp.org/, and are linked from the Forte Developer documentation index installed with the software, at:
file:/opt/SUNWspro/docs/index.html
In the tables and examples that follow, Fortran directives and source code are shown in upper case, but are case-insensitive.
The term structured-block refers to a block of Fortran or C/C++ statements having no transfers into or out of the block.
Constructs within square brackets, [...], are optional.
Throughout this manual, "Fortran" refers to the Fortran 95 language and compiler, f95.
The terms "directive" and "pragma" are used interchangeably in this manual.
Only one directive-name can be specified on a directive line.
Fortran fixed format accepts three directive sentinels; free format accepts only one. The Fortran examples that follow use free format.
C and C++ use the standard preprocessing directive starting with #pragma omp.
The OpenMP API defines the preprocessor symbol _OPENMP to be used for conditional compilation. In addition, OpenMP Fortran API accepts a conditional compilation sentinel.
Compiling with OpenMP enabled defines the macro _OPENMP.
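For example, a minimal C sketch (the helper name report_threads is illustrative, not part of the OpenMP API) can use _OPENMP for conditional compilation so the same source builds with or without OpenMP support:

```c
#ifdef _OPENMP
#include <omp.h>   /* declares omp_get_max_threads() */
#endif

/* Returns the number of threads OpenMP may use for a parallel
   region, or 1 when compiled without OpenMP support. */
int report_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```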
The PARALLEL directive defines a parallel region, which is a region of the program that is to be executed by multiple threads in parallel.
TABLE 1-1 identifies the clauses that can appear with this construct.
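As a sketch in C (the function name is illustrative), each thread of the team executes the structured block of a parallel region once; compiled without OpenMP, the pragmas are ignored and the block simply runs once:

```c
/* Each thread in the team executes the structured block of the
   PARALLEL region exactly once; the shared counter records how
   many threads entered the region. */
int team_block_count(void) {
    int count = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        count++;              /* one increment per thread */
    }
    return count;
}
```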
Work-sharing constructs divide the execution of the enclosed code region among the members of the team of threads that encounter it. Work-sharing constructs must be enclosed within a parallel region for the construct to execute in parallel.
There are many special conditions and restrictions on these directives and the code they apply to. Programmers are urged to refer to the appropriate OpenMP specification document for the details.
Specifies that the iterations of the DO or for loop that follows must be executed in parallel.
The DO directive specifies that the iterations of the DO loop that immediately follows must be executed in parallel. This directive must appear within a parallel region to be effective.
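A C sketch of the for work-sharing directive inside an explicit parallel region (the function name square_all is illustrative); iterations are divided among the threads of the team, and each array element is written exactly once:

```c
/* The iterations of the for loop are divided among the threads
   of the enclosing parallel region.  Without OpenMP, the loop
   runs serially with the same result. */
void square_all(int *a, int n) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = i * i;
    }
}
```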
SECTIONS encloses a non-iterative block of code to be divided among threads in the team. Each block is executed once by a thread in the team.
Each section is preceded by a SECTION directive, which is optional for the first section.
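A C sketch of SECTIONS (the function name fill_pair is illustrative): each SECTION block is executed once by some thread of the team, so independent pieces of work can proceed concurrently:

```c
/* Each section is executed exactly once by a thread in the team.
   Compiled without OpenMP, the two blocks simply run in order. */
void fill_pair(int *x, int *y) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            { *x = 10; }
            #pragma omp section
            { *y = 20; }
        }
    }
}
```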
The structured block enclosed by SINGLE is executed by only one thread in the team. Threads in the team that are not executing the SINGLE block wait at the end of the block unless NOWAIT is specified.
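A C sketch of SINGLE (the function name is illustrative): exactly one thread executes the enclosed block, which makes it suitable for one-time setup inside a parallel region:

```c
/* The single block is executed by only one thread of the team;
   the other threads wait at the implicit barrier at the end of
   the block, so the update is visible to all of them afterward. */
int single_init(void) {
    int initialized = 0;
    #pragma omp parallel
    {
        #pragma omp single
        initialized++;    /* runs once, regardless of team size */
    }
    return initialized;
}
```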
Divides the work of executing the enclosed code block into separate units of work, and causes the threads of the team to share the work such that each unit is executed only once.
There is no C/C++ equivalent to the Fortran WORKSHARE construct.
TABLE 1-1 identifies the clauses that can appear with these constructs.
The combined parallel work-sharing constructs are shortcuts for specifying a parallel region that contains one work-sharing construct.
There are many special conditions and restrictions on these directives and the code they apply to. Programmers are urged to refer to the appropriate OpenMP specification document for the details.
TABLE 1-1 identifies the clauses that can appear with these constructs.
Shortcut for specifying a parallel region that contains a single DO or for loop. Equivalent to a PARALLEL directive followed immediately by a DO or for directive. clause can be any of the clauses accepted by the PARALLEL and DO/for directives, except the NOWAIT modifier.
Shortcut for specifying a parallel region that contains a single SECTIONS directive. Equivalent to a PARALLEL directive followed by a SECTIONS directive. clause can be any of the clauses accepted by the PARALLEL and SECTIONS directives, except the NOWAIT modifier.
Provides a shortcut for specifying a parallel region that contains a single WORKSHARE directive. clause can be one of the clauses accepted by either the PARALLEL or WORKSHARE directive.
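A C sketch of the combined form (the function name scale is illustrative): parallel for expresses a parallel region containing a single work-sharing loop in one directive:

```c
/* parallel for is the shortcut for a parallel region that
   contains exactly one work-sharing loop. */
void scale(double *a, int n, double s) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= s;
}
```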
The following constructs specify thread synchronization. There are many special conditions and restrictions regarding these constructs that are too numerous to summarize here. Programmers are urged to refer to the appropriate OpenMP specification document for the details.
Only the master thread of the team executes the block enclosed by this directive. The other threads skip this block and continue. There is no implied barrier on entry to or exit from the master section.
Restricts access to the structured block to only one thread at a time. The optional name argument identifies the critical region. All unnamed CRITICAL directives map to the same name. Critical section names are global entities of the program and must be unique. For Fortran, if name appears on the CRITICAL directive, it must also appear on the END CRITICAL directive. For C/C++, the identifier used to name a critical region has external linkage and is in a name space which is separate from the name spaces used by labels, tags, members, and ordinary identifiers.
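A C sketch of a named critical section (the function and region names are illustrative): the compare and the store on the shared variable execute as a unit, one thread at a time:

```c
/* The named critical section serializes updates to "best", so
   concurrent threads cannot interleave the compare and the store.
   Serially (no OpenMP) the loop produces the same maximum. */
int array_max(const int *a, int n) {
    int best = a[0];
    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        #pragma omp critical (maxupdate)
        {
            if (a[i] > best)
                best = a[i];
        }
    }
    return best;
}
```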
Synchronizes all the threads in a team. Each thread waits until all the others in the team have reached this point.
Ensures that a specific memory location is to be updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
This implementation replaces all ATOMIC directives by enclosing the expression-statement in a critical section.
The pragma applies only to the immediately following statement, which must have one of the forms x binop= expr, x++, ++x, x--, or --x, where x is an lvalue expression with scalar type, expr is a scalar expression that does not reference x, and binop is one of +, *, -, /, &, ^, |, <<, or >>.
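A C sketch using ATOMIC (the function name is illustrative): the single-location update uses the x binop= expr form, which is typically cheaper than a full critical section:

```c
/* The atomic directive guarantees the update of "hits" is free
   of races; the statement has the form  x binop= expr. */
int count_even(const int *a, int n) {
    int hits = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            #pragma omp atomic
            hits += 1;
        }
    }
    return hits;
}
```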
Thread-visible Fortran variables or C objects are written back to memory at the point at which this directive appears. The FLUSH directive only provides consistency between operations within the executing thread and global memory. The optional list consists of a comma-separated list of variables or objects that need to be flushed. A flush directive without a list synchronizes all thread-visible shared variables or objects.
The enclosed block is executed in the order that iterations would be executed in a sequential execution of the loop.
The following directives control the data environment during execution of parallel constructs.
Makes the list of objects (Fortran common blocks and named variables, C named variables) private to a thread but global within the thread.
See the OpenMP specifications (section 2.6.1 in the Fortran 2.0 specifications, section 2.7.1 in the C/C++) for the complete details and restrictions.
Common block names must appear between slashes. To make a common block THREADPRIVATE, this directive must appear after every COMMON declaration of that block.
Each variable of list must have a file-scope or namespace-scope declaration preceding the pragma.
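A C sketch of threadprivate (the names counter and bump are illustrative): the file-scope declaration precedes the pragma, and each thread subsequently operates on its own copy of the variable:

```c
/* Each thread gets its own copy of "counter"; the file-scope
   declaration precedes the threadprivate pragma as required.
   Without OpenMP, "counter" is an ordinary static variable. */
static int counter = 0;
#pragma omp threadprivate(counter)

int bump(void) {
    counter++;        /* updates this thread's private copy */
    return counter;
}
```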
This section summarizes the data scoping and scheduling clauses that can appear on OpenMP directives.
Several directives accept clauses that allow a user to control the scope attributes of variables within the extent of the construct. If no data scope clause is specified for a directive, the default scope for variables affected by the directive is SHARED.
Fortran: list is a comma-separated list of named variables or common blocks that are accessible in the scoping unit. Common block names must appear within slashes (for example, /ABLOCK/).
There are important restrictions on the use of these scoping clauses. Refer to section 2.6.2 in the Fortran 2.0 specification, and section 2.7.2 in the C/C++ specification for complete details.
TABLE 1-1 identifies the directives on which these clauses can appear.
Declares the variables in the comma-separated list to be private to each thread in a team.
All the threads in the team share the variables that appear in list, and access the same storage area.
Specifies the scoping attribute for all variables within a parallel region. THREADPRIVATE variables are not affected by this clause. If not specified, DEFAULT(SHARED) is assumed.
Variables on list are PRIVATE. In addition, private copies of the variables are initialized from the original object existing before the construct.
Variables on the list are PRIVATE. In addition, when the LASTPRIVATE clause appears on a DO or for directive, the thread that executes the sequentially last iteration updates the version of the variable it had before the construct. On a SECTIONS directive, the thread that executes the lexically last SECTION updates the version of the object it had before the construct.
The REDUCTION clause is intended to be used on a region in which the reduction variable is used only in reduction statements. Variables on list must be SHARED in the enclosing context. A private copy of each variable is created for each thread as if it were PRIVATE. At the end of the reduction, the shared variable is updated by combining the original value with the final value of each of the private copies.
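A C sketch of a sum reduction (the function name is illustrative): each thread accumulates into a private copy of sum initialized to zero, and the private copies are combined into the shared variable at the end of the loop:

```c
/* reduction(+:sum) gives each thread a private copy of "sum",
   combined into the shared variable when the loop completes. */
long sum_to(int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```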
The SCHEDULE clause specifies how iterations in a Fortran DO loop or C/C++ for loop are divided among the threads in a team. TABLE 1-1 shows which directives allow the SCHEDULE clause.
There are important restrictions on the use of these scheduling clauses. Refer to section 2.3.1 in the Fortran 2.0 specification, and section 2.4.1 in the C/C++ specification for complete details.
Specifies how iterations of the DO or for loop are divided among the threads of the team. type can be one of STATIC, DYNAMIC, GUIDED, or RUNTIME. In the absence of a SCHEDULE clause, STATIC scheduling is used. chunk must be an integer expression.
Iterations are divided into pieces of a size specified by chunk. The pieces are statically assigned to threads in the team in a round-robin fashion in the order of the thread number. If not specified, chunk is chosen to divide the iterations into contiguous chunks nearly equal in size with one chunk assigned to each thread.
Iterations are broken into pieces of a size specified by chunk. As each thread finishes a piece of the iteration space, it dynamically obtains the next set of iterations. When no chunk is specified, it defaults to 1.
With GUIDED, the chunk size is reduced in an exponentially decreasing manner with each dispatched piece of the iterations. chunk specifies the minimum number of iterations to dispatch each time. (The size of the initial chunk of the iterations is implementation dependent; see Chapter 2.) When no chunk is specified, it defaults to 1.
Scheduling is deferred until run time. The schedule type and chunk size are determined from the setting of the OMP_SCHEDULE environment variable. (The default is SCHEDULE(STATIC).)
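A C sketch of an explicit schedule (the function name is illustrative; the squared term stands in for iterations of uneven cost, a typical reason to choose DYNAMIC): idle threads grab the next chunk of four iterations as they finish, and the result is the same under any schedule:

```c
/* schedule(dynamic, 4): iterations are handed out in chunks of 4
   to whichever thread finishes first.  The reduction makes the
   accumulation race-free under any schedule. */
long weighted_sum(int n) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long)i * i;   /* stand-in for variable work */
    return total;
}
```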
The Fortran OpenMP API provides a NUM_THREADS clause on the PARALLEL, PARALLEL SECTIONS, PARALLEL DO, and PARALLEL WORKSHARE directives.
TABLE 1-1 shows the clauses that can appear on these directives and pragmas:
IF, PRIVATE, SHARED, FIRSTPRIVATE, LASTPRIVATE, DEFAULT, REDUCTION, COPYIN, COPYPRIVATE, ORDERED, SCHEDULE, NOWAIT, NUM_THREADS
1. Fortran only: COPYPRIVATE can appear on the END SINGLE directive.
2. For Fortran, a NOWAIT modifier can appear on the END DO, END SECTIONS, END SINGLE, or END WORKSHARE directives.
3. Only Fortran supports WORKSHARE and PARALLEL WORKSHARE.
OpenMP provides a set of callable library routines to control and query the parallel execution environment, a set of general purpose lock routines, and two portable timer routines.
The Fortran run-time library routines are external procedures. In the following summary, int_expr is a scalar integer expression, and logical_expr is a scalar logical expression.
OMP_ functions returning INTEGER(4) and LOGICAL(4) are not intrinsic and must be declared properly; otherwise the compiler will assume REAL. Interface declarations for the OpenMP Fortran runtime library routines summarized below are provided by the Fortran include file omp_lib.h and a Fortran MODULE omp_lib, as described in the Fortran OpenMP 2.0 specifications.
Supply an INCLUDE 'omp_lib.h' statement or #include "omp_lib.h" preprocessor directive, or a USE omp_lib statement in every program unit that references these library routines.
Compiling with -Xlist will report any type mismatches.
The integer parameter omp_lock_kind defines the KIND type parameters used for simple lock variables in the OMP_*_LOCK routines.
The integer parameter omp_nest_lock_kind defines the KIND type parameters used for the nestable lock variables in the OMP_*_NEST_LOCK routines.
The integer parameter openmp_version has a value of the form YYYYMM, where YYYY and MM are the year and month designations of the version of the OpenMP Fortran API.
The C/C++ run-time library functions are external functions.
The header <omp.h> declares two types, several functions that can be used to control and query the parallel execution environment, and lock functions that can be used to synchronize access to data.
The type omp_lock_t is an object type capable of representing that a lock is available, or that a thread owns a lock. These locks are referred to as simple locks.
The type omp_nest_lock_t is an object type capable of representing that a lock is available, or that a thread owns a lock. These locks are referred to as nestable locks.
For details, refer to the appropriate OpenMP specifications.
Sets the number of threads to use for subsequent parallel regions.
Returns the number of threads currently in the team executing the parallel region from which it is called.
Returns the maximum value that can be returned by calls to the OMP_GET_NUM_THREADS function.
Returns the thread number, within its team, of the thread executing the call to this function. This number lies between 0 and OMP_GET_NUM_THREADS()-1, with 0 being the master thread.
Returns the number of processors available to the program.
Returns true if called from within the dynamic extent of a region executing in parallel; otherwise returns false.
Enables or disables dynamic adjustment of the number of available threads. (Dynamic adjustment is enabled by default.)
Determines whether dynamic thread adjustment is enabled.
Enables or disables nested parallelism. (Nested parallelism is not supported, and is disabled by default.)
Determines whether nested parallelism is enabled. (Nested parallelism is not supported, and is disabled by default.)
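A C sketch wrapping two of the query routines (the wrapper names are illustrative), with serial fallbacks guarded by _OPENMP; when the program runs serially there is one thread, numbered 0, and no region is executing in parallel:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread number of the calling thread (0 is the master thread;
   always 0 when compiled without OpenMP). */
int my_thread_num(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Nonzero only when called from the dynamic extent of a region
   executing in parallel. */
int in_parallel(void) {
#ifdef _OPENMP
    return omp_in_parallel();
#else
    return 0;
#endif
}
```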
Two types of locks are supported: simple locks and nestable locks. Nestable locks may be locked multiple times by the same thread before being unlocked; simple locks may not be locked if they are already in a locked state. Simple lock variables may only be passed to simple lock routines, and nested lock variables only to nested lock routines.
Initialize a lock variable for subsequent calls.
Disassociates a lock variable from any locks.
Forces the executing thread to wait until the specified lock is available. The thread is granted ownership of the lock when it is available.
Releases the executing thread from ownership of the lock. Behavior is undefined if the thread does not own that lock.
OMP_TEST_LOCK attempts to set the lock associated with the lock variable. The call does not block execution of the thread.
OMP_TEST_NEST_LOCK returns the new nesting count if the lock was set successfully; otherwise it returns 0. The call does not block execution of the thread.
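A C sketch of the simple-lock protocol (the function name locked_sum is illustrative): initialize the lock, acquire and release it around each shared update, then destroy it. The serial path behind the _OPENMP guard lets the sketch compile without OpenMP, in which case no real lock is involved:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums an array with each shared update protected by a simple
   lock: omp_init_lock, omp_set_lock / omp_unset_lock around the
   critical update, omp_destroy_lock when done. */
long locked_sum(const long *a, int n) {
    long sum = 0;
#ifdef _OPENMP
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        omp_set_lock(&lock);    /* wait until the lock is available */
        sum += a[i];
        omp_unset_lock(&lock);  /* release ownership of the lock */
    }
    omp_destroy_lock(&lock);
#else
    for (int i = 0; i < n; i++)
        sum += a[i];
#endif
    return sum;
}
```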
Two functions support a portable wall clock timer.
Returns the elapsed wall clock time in seconds "since some arbitrary time in the past".
Returns the number of seconds between successive clock ticks.
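A C sketch of portable wall-clock timing (the helper names are illustrative), falling back to the standard time() function, at whole-second resolution, when compiled without OpenMP:

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds since some arbitrary time in the
   past; only differences between calls are meaningful. */
double wall_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)time(NULL);   /* coarse serial fallback */
#endif
}

/* Seconds elapsed since a time previously returned by
   wall_seconds(). */
double elapsed(double start) {
    return wall_seconds() - start;
}
```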
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.