C H A P T E R 1
OpenMP API Summary
The OpenMP Application Program Interface is a portable, parallel programming model for shared memory multiprocessor architectures, developed in collaboration with a number of computer vendors. The specifications were created and are published by the OpenMP Architecture Review Board. For more information on OpenMP, including tutorials and other resources, see their web site at: http://www.openmp.org/.
The OpenMP API is the recommended parallel programming model for all Sun Studio compilers on Solaris OS platforms. See Chapter 6 for guidelines on converting legacy Fortran and C parallelization directives to OpenMP.
This chapter summarizes the directives, run-time library routines, and environment variables comprising the OpenMP Version 2.0 Application Program Interfaces, as implemented by the Sun Studio Fortran 95, C and C++ compilers.
The material presented in this chapter is only a summary with many details left out intentionally for the sake of brevity. In all cases, refer to the OpenMP specification documents for complete details.
The Fortran and C/C++ OpenMP 2.0 specifications can be found on the official OpenMP website, http://www.openmp.org/.
In the tables and examples that follow, Fortran directives and source code are shown in upper case, but are case-insensitive.
The term structured-block refers to a block of Fortran or C/C++ statements having no transfers into or out of the block.
Constructs within square brackets, [...], are optional.
Throughout this manual, "Fortran" refers to the Fortran 95 language and compiler, f95.
The terms "directive" and "pragma" are used interchangeably in this manual.
Only one directive-name can be specified per directive line, and the directive applies to the succeeding program statement.
Fortran fixed format accepts three directive "sentinels"; free format accepts only one. In the Fortran examples that follow, free format is used.
C and C++ use the standard preprocessing directive starting with #pragma omp.
The OpenMP API defines the preprocessor symbol _OPENMP to be used for conditional compilation. In addition, OpenMP Fortran API accepts a conditional compilation sentinel.
Compiling with OpenMP enabled defines the macro _OPENMP.
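For example, a program can use _OPENMP to guard OpenMP-specific code so that the same source also compiles serially. A minimal C sketch (illustrative only, not from the specification):

    #include <stdio.h>
    #ifdef _OPENMP
    #include <omp.h>
    #endif

    int main(void)
    {
    #ifdef _OPENMP
        printf("OpenMP version macro: %d\n", _OPENMP);
        printf("processors available: %d\n", omp_get_num_procs());
    #else
        printf("compiled without OpenMP\n");
    #endif
        return 0;
    }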
The PARALLEL directive defines a parallel region, which is a region of the program that is to be executed by multiple threads in parallel.
There are many special conditions and restrictions. Programmers are urged to refer to the appropriate OpenMP specification document for the details.
TABLE 1-1 identifies the clauses that can appear with this construct.
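As an illustration (a minimal C sketch, not taken from the specification), each thread in the team executes the block enclosed by the parallel pragma once:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            /* every thread in the team executes this block */
            printf("hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }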
Work-sharing constructs divide the execution of the enclosed code region among the members of the team of threads that encounter it. A work-sharing construct must be enclosed within a parallel region for the construct to execute in parallel.
There are many special conditions and restrictions on these directives and the code they apply to. Programmers are urged to refer to the appropriate OpenMP specification document for the details.
Specifies that the iterations of the DO or for loop that follows should be executed in parallel.
TABLE 1-1 identifies the clauses that can appear with this construct.
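For example, a C sketch in which the loop iterations are divided among the threads of the enclosing parallel region (the function scale and its arguments are hypothetical):

    void scale(double *a, const double *b, int n)
    {
        int i;
        #pragma omp parallel shared(a, b, n)
        {
            #pragma omp for  /* the loop variable i is implicitly private */
            for (i = 0; i < n; i++)
                a[i] = 2.0 * b[i];
        }  /* implicit barrier at the end of the for construct */
    }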
The SECTIONS construct encloses a set of structured blocks of code to be divided among threads in the team. Each block is executed once by a thread in the team.
Each section is preceded by a SECTION directive, which is optional for the first section.
TABLE 1-1 identifies the clauses that can appear with this construct.
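For example, a C sketch in which two independent tasks run as separate sections (init_field and init_mesh are hypothetical):

    extern void init_field(void);  /* hypothetical independent tasks */
    extern void init_mesh(void);

    void setup(void)
    {
        #pragma omp parallel
        {
            #pragma omp sections
            {
                #pragma omp section
                init_field();  /* executed once, by some thread */
                #pragma omp section
                init_mesh();   /* executed once, by another thread if available */
            }
        }
    }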
The structured block enclosed by SINGLE is executed by only one thread in the team. Threads in the team that are not executing the SINGLE block wait at the end of the block unless NOWAIT is specified.
TABLE 1-1 identifies the clauses that can appear with this construct.
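A brief C sketch (do_work is hypothetical):

    #include <stdio.h>

    extern void do_work(void);  /* hypothetical */

    void demo(void)
    {
        #pragma omp parallel
        {
            #pragma omp single
            printf("printed by exactly one thread\n");
            /* the other threads wait here, since NOWAIT was not specified */
            do_work();  /* executed by every thread in the team */
        }
    }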
The WORKSHARE construct divides the work of executing the enclosed code block into separate units of work, and causes the threads of the team to share the work such that each unit is executed only once.
There is no C/C++ equivalent to the Fortran WORKSHARE construct.
The combined parallel work-sharing constructs are shortcuts for specifying a parallel region that contains one work-sharing construct.
There are many special conditions and restrictions on these directives and the code they apply to. Refer to the appropriate OpenMP specification document for the complete details. The description that follows is intended only as a summary and is not complete.
TABLE 1-1 identifies the clauses that can appear with these constructs.
Shortcut for specifying a parallel region that contains a single DO or for loop. Equivalent to a PARALLEL directive followed immediately by a DO or for directive. clause can be any of the clauses accepted by the PARALLEL and DO/for directives, except the NOWAIT modifier.
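For example, a C sketch of the combined form (the function axpy is hypothetical):

    void axpy(int n, double alpha, const double *x, double *y)
    {
        int i;
        /* shortcut for a parallel region containing a single for loop */
        #pragma omp parallel for
        for (i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }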
Shortcut for specifying a parallel region that contains a single SECTIONS directive. Equivalent to a PARALLEL directive followed by a SECTIONS directive. clause can be any of the clauses accepted by the PARALLEL and SECTIONS directives, except the NOWAIT modifier.
The Fortran PARALLEL WORKSHARE construct provides a shortcut for specifying a parallel region that contains a single WORKSHARE directive. clause can be one of the clauses accepted by the PARALLEL directive.
The following constructs specify thread synchronization. There are many special conditions and restrictions regarding these constructs that are too numerous to summarize here. Programmers are urged to refer to the appropriate OpenMP specification document for the complete details.
Only the master thread of the team executes the block enclosed by this directive. The other threads skip this block and continue. There is no implied barrier on entry to or exit from the master construct.
Restrict access to the structured block to only one thread at a time. The optional name argument identifies the critical region. All unnamed CRITICAL directives map to the same name. Critical section names are global entities of the program and must be unique. For Fortran, if name appears on the CRITICAL directive, it must also appear on the END CRITICAL directive. For C/C++, the identifier used to name a critical region has external linkage and is in a name space which is separate from the name spaces used by labels, tags, members, and ordinary identifiers.
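For example, a C sketch in which threads push results onto a shared stack one at a time (the names stack_update, stack, top, and push are hypothetical):

    double stack[1000];  /* shared state protected by the critical region */
    int top = 0;

    void push(double v)  /* called from within a parallel region */
    {
        #pragma omp critical (stack_update)
        {
            stack[top] = v;
            top++;
        }
    }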
Synchronizes all the threads in a team. Each thread waits until all the others in the team have reached this point.
After all threads in the team have encountered the barrier, each thread in the team begins executing the statements after the BARRIER directive in parallel.
Note that because the barrier pragma does not have a C/C++ statement as part of its syntax, there are restrictions on its placement within a program. See the C/C++ OpenMP specifications for details.
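A brief C sketch (phase1 and phase2 are hypothetical):

    #include <omp.h>

    extern void phase1(int id);  /* hypothetical */
    extern void phase2(int id);

    void demo(void)
    {
        #pragma omp parallel
        {
            phase1(omp_get_thread_num());
            #pragma omp barrier  /* no thread starts phase2 until all
                                    threads have finished phase1 */
            phase2(omp_get_thread_num());
        }
    }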
Ensures that a specific memory location is to be updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
The directive applies only to the expression-statement immediately following the directive. For C/C++, the statement must have one of the forms x binop= expr, x++, ++x, x--, or --x, where x is an lvalue of scalar type and binop is one of +, *, -, /, &, ^, |, <<, or >>. For Fortran, the statement must have one of the forms x = x operator expr, x = expr operator x, x = intrinsic(x, expr), or x = intrinsic(expr, x).
This implementation replaces all ATOMIC directives by enclosing the expression-statement in a critical section.
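For example, a C sketch in which a shared counter is incremented atomically (the predicate test is hypothetical):

    extern int test(int i);  /* hypothetical predicate */

    int count(int n)
    {
        int i, hits = 0;
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            if (test(i)) {
                #pragma omp atomic
                hits++;  /* hits++ is one of the permitted forms */
            }
        }
        return hits;
    }

A REDUCTION clause would usually be the more efficient choice for this particular pattern; ATOMIC is shown here only to illustrate the directive.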
Thread-visible Fortran variables or C objects are written back to memory at the point at which this directive appears. The FLUSH directive only provides consistency between operations within the executing thread and global memory. The optional variable-list consists of a comma-separated list of variables or objects that need to be flushed. A FLUSH directive without a variable-list synchronizes all thread-visible shared variables or objects.
Note that because the flush pragma does not have a C/C++ statement as part of its syntax, there are restrictions on its placement within a program. See the C/C++ OpenMP specifications for details.
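A classic use is pairwise signaling through a shared flag. A C sketch (illustrative only; it assumes the team actually gets two threads):

    #include <stdio.h>
    #include <omp.h>

    int data, flag = 0;

    void signal_demo(void)
    {
        #pragma omp parallel num_threads(2) shared(data, flag)
        {
            if (omp_get_thread_num() == 0) {
                data = 42;
                #pragma omp flush(data)  /* publish data first */
                flag = 1;
                #pragma omp flush(flag)  /* then publish the flag */
            } else {
                do {
                    #pragma omp flush(flag)
                } while (flag == 0);     /* spin until the flag is seen */
                #pragma omp flush(data)  /* data is now guaranteed visible */
                printf("received %d\n", data);
            }
        }
    }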
The enclosed block is executed in the order that iterations would be executed in a sequential execution of the loop.
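For example, a C sketch in which results are printed in loop order even though the iterations execute in parallel (heavy_compute is hypothetical):

    #include <stdio.h>

    extern double heavy_compute(int i);  /* hypothetical */

    void demo(int n)
    {
        int i;
        #pragma omp parallel for ordered
        for (i = 0; i < n; i++) {
            double r = heavy_compute(i);  /* runs in parallel */
            #pragma omp ordered
            printf("%d: %g\n", i, r);     /* printed in sequential order */
        }
    }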
The following directives control the data environment during execution of parallel constructs.
Makes the list of objects (Fortran common blocks and named variables, C and C++ named variables) private to a thread but global within the thread.
See the OpenMP specifications for the complete details and restrictions.
Common block names must appear between slashes. To make a common block THREADPRIVATE, this directive must appear after every COMMON declaration of that block.
Each variable in list at file, namespace, or block scope must refer to a variable declaration at file, namespace, or block scope that lexically precedes the pragma.
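For example, a C sketch in which each thread keeps its own copy of a file-scope counter:

    #include <stdio.h>
    #include <omp.h>

    int counter = 0;  /* file-scope; one copy per thread */
    #pragma omp threadprivate(counter)

    int main(void)
    {
        #pragma omp parallel
        {
            counter++;  /* updates this thread's private copy */
            printf("thread %d: counter = %d\n",
                   omp_get_thread_num(), counter);
        }
        return 0;
    }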
This section summarizes the data scoping and scheduling clauses that can appear on OpenMP directives.
Several directives accept clauses that allow a user to control the scope attributes of variables within the extent of the construct. If no data scope clause is specified for a directive, the default scope for variables affected by the directive is SHARED.
Fortran: list is a comma-separated list of named variables or common blocks that are accessible in the scoping unit. Common block names must appear within slashes (for example, /ABLOCK/).
There are important restrictions on the use of these scoping clauses. Refer to the appropriate sections of the OpenMP specifications for complete details.
TABLE 1-1 identifies the directives on which these clauses can appear.
Declares the variables in the optional comma-separated list to be private to each thread in a team.
All the threads in the team share the variables that appear in list, and access the same storage area.
DEFAULT(PRIVATE | SHARED | NONE)
Specify scoping attributes for all variables within a parallel region. THREADPRIVATE variables are not affected by this clause. If not specified, DEFAULT(SHARED) is assumed. A variable's default data-sharing attribute can be overridden by using the private, firstprivate, lastprivate, reduction, and shared clauses.
The variables in list are PRIVATE. In addition, private copies of the variables are initialized from the original object existing before the construct.
The variables in the list are PRIVATE. In addition, when the LASTPRIVATE clause appears on a DO or for directive, the thread that executes the sequentially last iteration updates the original object. On a SECTIONS directive, the thread that executes the lexically last SECTION updates the original object.
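A C sketch combining the two clauses (process is hypothetical):

    extern void process(double v);  /* hypothetical */

    void demo(int n)
    {
        int i, last = -1;
        double x = 10.0;
        #pragma omp parallel for firstprivate(x) lastprivate(last)
        for (i = 0; i < n; i++) {
            last = i;        /* the thread that runs iteration n-1
                                writes its value back to the original */
            process(x + i);  /* each thread's x is initialized to 10.0 */
        }
        /* here last == n - 1 */
    }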
Fortran: The COPYIN clause applies only to variables, common blocks, and variables in common blocks that are declared as THREADPRIVATE. In a parallel region, COPYIN specifies that the data in the master thread of the team be copied to the threadprivate copies at the beginning of the parallel region.
C/C++: The COPYIN clause applies only to variables that are declared as THREADPRIVATE. In a parallel region, COPYIN specifies that the data in the master thread of the team be copied to the threadprivate copies at the beginning of the parallel region.
Fortran: Uses a private variable to broadcast a value, or a pointer to a shared object, from one member of a team to the other members. The COPYPRIVATE clause can only appear on the END SINGLE directive. The broadcast occurs after the execution of the structured block associated with the single construct, and before any threads in the team have left the barrier at the end of the construct. The variables in list must not appear in a PRIVATE or FIRSTPRIVATE clause of the SINGLE construct specifying COPYPRIVATE.
C/C++: Uses a private variable to broadcast a value from one member of a team to the other members. The copyprivate clause can only appear on the single directive. The broadcast occurs after the execution of the structured block associated with the single construct, and before any threads in the team have left the barrier at the end of the construct. The variables in list must not appear in a private or firstprivate clause for the same single directive.
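For example, a C sketch that broadcasts a value read by one thread to the private copies of all team members (read_input is hypothetical):

    extern float read_input(void);  /* hypothetical */

    void demo(void)
    {
        float x;
        #pragma omp parallel private(x)
        {
            #pragma omp single copyprivate(x)
            x = read_input();  /* one thread reads the value */
            /* after the single, every thread's private x holds it */
        }
    }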
REDUCTION(operator|intrinsic:list)
Fortran: operator is one of: +, *, -, .AND., .OR., .EQV., .NEQV.
intrinsic is one of: MAX, MIN, IAND, IOR, IEOR
Variables in list must be named variables of intrinsic type.
C/C++: operator is one of: +, *, -, &, ^, |, &&, ||
The REDUCTION clause is intended to be used on a region in which the reduction variable is used only in reduction statements. The variables in list must be SHARED in the enclosing context. A private copy of each variable is created for each thread as if it were PRIVATE. At the end of the reduction, the shared variable is updated by combining the original value with the final value of each of the private copies.
See the appropriate sections of the OpenMP specifications for complete details and restrictions on REDUCTION clauses and constructs.
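For example, a C sketch of a sum reduction; each thread accumulates into a private copy of sum, and the copies are combined into the shared variable at the end:

    double sum_array(const double *a, int n)
    {
        int i;
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }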
The SCHEDULE clause specifies how iterations in a Fortran DO loop or C/C++ for loop are divided among the threads in a team. TABLE 1-1 shows which directives allow the SCHEDULE clause.
There are important restrictions on the use of these scheduling clauses. Refer to section 2.3.1 in the Fortran specification, and section 2.4.1 in the C/C++ specification for complete details.
Specifies how iterations of the DO or for loop are divided among the threads of the team. type can be one of STATIC, DYNAMIC, GUIDED, or RUNTIME. In the absence of a SCHEDULE clause, Sun Studio compilers use STATIC scheduling. chunk must be an integer expression.
Iterations are divided into pieces of a size specified by chunk. The pieces are statically assigned to threads in the team in a round-robin fashion in the order of the thread number. If not specified, chunk is chosen so that the iterations divide into contiguous chunks nearly equal in size with one chunk assigned to each thread.
Iterations are divided into pieces of a size specified by chunk, and assigned to a waiting thread. As each thread finishes its piece of the iteration space, it dynamically obtains the next set of iterations. When no chunk is specified, it defaults to 1.
With GUIDED, the chunk size is reduced in an exponentially decreasing manner with each dispatched piece of the iterations. chunk specifies the minimum number of iterations to dispatch each time. (The size of the chunks is determined by an implementation-dependent formula; see GUIDED: Determination of Chunk Sizes.) When no chunk is specified, it defaults to 1.
Scheduling is deferred until runtime. The schedule type and chunk size are determined from the value of the OMP_SCHEDULE environment variable. (Default is SCHEDULE(STATIC).)
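For example, a C sketch where iteration costs vary widely, so DYNAMIC scheduling with a small chunk keeps the threads busy (expensive is hypothetical):

    extern double expensive(int i);  /* hypothetical; cost varies with i */

    void demo(double *result, int n)
    {
        int i;
        #pragma omp parallel for schedule(dynamic, 4)
        for (i = 0; i < n; i++)
            result[i] = expensive(i);
    }

Using schedule(runtime) instead would defer the choice to the OMP_SCHEDULE environment variable.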
The OpenMP API provides a NUM_THREADS clause on the PARALLEL, PARALLEL SECTIONS, PARALLEL DO, PARALLEL for, and PARALLEL WORKSHARE directives.
num_threads(scalar_integer_expression)
Specifies the number of threads in the team created when a thread enters a parallel region. scalar_integer_expression is the number of threads requested, and supersedes the number of threads defined by a prior call to the OMP_SET_NUM_THREADS library function, or the value of the OMP_NUM_THREADS environment variable. If dynamic thread management is enabled, the request is the maximum number of threads to use.
Note that num_threads does not apply to subsequent regions.
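A brief C sketch (work is hypothetical):

    #include <omp.h>

    extern void work(int id);  /* hypothetical */

    void demo(void)
    {
        #pragma omp parallel num_threads(4)  /* request a team of 4 threads
                                                for this region only */
        work(omp_get_thread_num());
    }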
TABLE 1-1 shows the clauses that can appear on these directives and pragmas:

TABLE 1-1 Directives and Pragmas Accepting Each Clause

IF: PARALLEL, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
PRIVATE: PARALLEL, DO/for, SECTIONS, SINGLE, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
SHARED: PARALLEL, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
FIRSTPRIVATE: PARALLEL, DO/for, SECTIONS, SINGLE, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
LASTPRIVATE: DO/for, SECTIONS, PARALLEL DO/for, PARALLEL SECTIONS
DEFAULT: PARALLEL, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
REDUCTION: PARALLEL, DO/for, SECTIONS, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
COPYIN: PARALLEL, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
COPYPRIVATE (1): SINGLE
ORDERED: DO/for, PARALLEL DO/for
SCHEDULE: DO/for, PARALLEL DO/for
NOWAIT (2): DO/for, SECTIONS, SINGLE, WORKSHARE (3)
NUM_THREADS: PARALLEL, PARALLEL DO/for, PARALLEL SECTIONS, PARALLEL WORKSHARE (3)
1. Fortran only: COPYPRIVATE can appear on the END SINGLE directive.
2. For Fortran, a NOWAIT modifier can only appear on the END DO, END SECTIONS, END SINGLE, or END WORKSHARE directives.
3. Only Fortran supports WORKSHARE and PARALLEL WORKSHARE.
OpenMP provides a set of callable library routines to control and query the parallel execution environment, a set of general purpose lock routines, and two portable timer routines. Full details appear in the Fortran and C/C++ OpenMP specifications.
The Fortran run-time library routines are external procedures. In the following summary, int_expr is a scalar integer expression, and logical_expr is a scalar logical expression.
The OMP_ functions returning INTEGER(4) and LOGICAL(4) are not intrinsic and must be declared properly; otherwise, the compiler will assume REAL. Interface declarations for the OpenMP Fortran runtime library routines summarized below are provided by the Fortran include file omp_lib.h and a Fortran MODULE omp_lib, as described in the Fortran OpenMP specifications.
Supply an INCLUDE 'omp_lib.h' statement or #include "omp_lib.h" preprocessor directive, or a USE omp_lib statement in every program unit that references these library routines.
Compiling with -Xlist will report any type mismatches.
The integer parameter omp_lock_kind defines the KIND type parameters used for simple lock variables in the OMP_*_LOCK routines.
The integer parameter omp_nest_lock_kind defines the KIND type parameters used for the nestable lock variables in the OMP_*_NEST_LOCK routines.
The integer parameter openmp_version is defined with the value YYYYMM, where YYYY and MM are the year and month designations of the version of the OpenMP Fortran API; this matches the value of the preprocessor macro _OPENMP.
The C/C++ run-time library functions are external functions.
The header <omp.h> declares two types, several functions that can be used to control and query the parallel execution environment, and lock functions that can be used to synchronize access to data.
The type omp_lock_t is an object type capable of representing that a lock is available, or that a thread owns a lock. These locks are referred to as simple locks.
The type omp_nest_lock_t is an object type capable of representing that a lock is available, or that a thread owns a lock. These locks are referred to as nestable locks.
For details, refer to the appropriate OpenMP specifications.
Sets the number of threads to use for subsequent parallel regions not specified with a num_threads() clause. This call affects only the subsequent parallel regions encountered by the calling thread at the same or inner nesting level.
SUBROUTINE OMP_SET_NUM_THREADS(int_expr)
#include <omp.h>
void omp_set_num_threads(int num_threads);
Returns the number of threads currently in the team executing the parallel region from which it is called.
INTEGER(4) FUNCTION OMP_GET_NUM_THREADS()
#include <omp.h>
int omp_get_num_threads(void);
Returns the maximum number of threads that would be used to form a team if an active parallel region specified without a num_threads() clause were to be encountered at this point in the program.
INTEGER(4) FUNCTION OMP_GET_MAX_THREADS()
#include <omp.h>
int omp_get_max_threads(void);
Returns the thread number, within its team, of the thread executing the call to this function. This number lies between 0 and OMP_GET_NUM_THREADS()-1, with 0 being the master thread.
INTEGER(4) FUNCTION OMP_GET_THREAD_NUM()
#include <omp.h>
int omp_get_thread_num(void);
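These two functions are often used together to divide work by hand. A C sketch (handle is hypothetical):

    #include <omp.h>

    extern void handle(int i);  /* hypothetical */

    void divide(int n)
    {
        #pragma omp parallel
        {
            int nt = omp_get_num_threads();
            int id = omp_get_thread_num();  /* 0 .. nt-1 */
            int i;
            for (i = (n * id) / nt; i < (n * (id + 1)) / nt; i++)
                handle(i);  /* this thread's share of the iterations */
        }
    }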
Returns the number of processors available to the program.
INTEGER(4) FUNCTION OMP_GET_NUM_PROCS()
#include <omp.h>
int omp_get_num_procs(void);
Determines whether or not the calling thread is executing within the dynamic extent of a parallel region.
LOGICAL(4) FUNCTION OMP_IN_PARALLEL()
Returns .TRUE. if called within the dynamic extent of an active parallel region, .FALSE. otherwise.
#include <omp.h>
int omp_in_parallel(void);
Returns nonzero if called within the dynamic extent of an active parallel region, zero otherwise.
An active parallel region is a parallel region where the IF clause evaluates to TRUE.
Enables or disables dynamic adjustment of the number of available threads. (Dynamic adjustment is enabled by default.) This call affects only the subsequent parallel regions encountered by the calling thread at the same or inner nesting level.
SUBROUTINE OMP_SET_DYNAMIC(logical_expr)
Dynamic adjustment is enabled when logical_expr evaluates to .TRUE., and is disabled otherwise.
#include <omp.h>
void omp_set_dynamic(int dynamic);
If dynamic evaluates to nonzero, dynamic adjustment is enabled; otherwise it is disabled.
Determines whether or not dynamic thread adjustment is enabled at this point in the program.
LOGICAL(4) FUNCTION OMP_GET_DYNAMIC()
Returns .TRUE. if dynamic thread adjustment is enabled, .FALSE. otherwise.
#include <omp.h>
int omp_get_dynamic(void);
Returns nonzero if dynamic thread adjustment is enabled, zero otherwise.
Enables or disables nested parallelism. This call affects only the subsequent parallel regions encountered by the calling thread at the same or inner nesting level.
SUBROUTINE OMP_SET_NESTED(logical_expr)
Nested parallelism is enabled if logical_expr evaluates to .TRUE., and is disabled otherwise.
#include <omp.h>
void omp_set_nested(int nested);
Nested parallelism is enabled if nested evaluates to non-zero, and is disabled otherwise.
Nested parallelism is disabled by default. See Chapter 2 for information on nested parallelism.
Determines whether or not nested parallelism is enabled at this point in the program.
LOGICAL(4) FUNCTION OMP_GET_NESTED()
Returns .TRUE. if nested parallelism is enabled, .FALSE. otherwise.
#include <omp.h>
int omp_get_nested(void);
Returns nonzero if nested parallelism is enabled, zero otherwise.
See Chapter 2 for information on nested parallelism.
Two types of locks are supported: simple locks and nestable locks. Nestable locks may be locked multiple times by the same thread before being unlocked; simple locks may not be locked if they are already in a locked state. Simple lock variables may only be passed to simple lock routines, and nested lock variables only to nested lock routines.
The lock variable var must be accessed only through these routines. Use the parameters OMP_LOCK_KIND and OMP_NEST_LOCK_KIND (defined in omp_lib.h INCLUDE file and the omp_lib MODULE) for this purpose. For example,
INTEGER(KIND=OMP_LOCK_KIND) :: var
INTEGER(KIND=OMP_NEST_LOCK_KIND) :: nvar
Simple lock variables must have type omp_lock_t and must be accessed only through these functions. All simple lock functions require an argument that points to omp_lock_t type.
Nested lock variables must have type omp_nest_lock_t, and similarly all nested lock functions require an argument that points to omp_nest_lock_t type.
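For example, a C sketch that serializes access to shared state with a simple lock:

    #include <omp.h>

    omp_lock_t lk;

    void demo(void)
    {
        omp_init_lock(&lk);  /* initialize before first use */
        #pragma omp parallel
        {
            omp_set_lock(&lk);    /* block until ownership is granted */
            /* ... exclusive access to shared state ... */
            omp_unset_lock(&lk);  /* release ownership */
        }
        omp_destroy_lock(&lk);    /* discard when no longer needed */
    }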
Initialize a lock variable for subsequent calls.
SUBROUTINE OMP_INIT_LOCK(var)
SUBROUTINE OMP_INIT_NEST_LOCK(nvar)
#include <omp.h>
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
Disassociates the given lock variable from any locks.
SUBROUTINE OMP_DESTROY_LOCK(var)
SUBROUTINE OMP_DESTROY_NEST_LOCK(nvar)
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
Forces the executing thread to wait until the specified lock is available. The thread is granted ownership of the lock when it is available.
SUBROUTINE OMP_SET_LOCK(var)
SUBROUTINE OMP_SET_NEST_LOCK(nvar)
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
Releases the executing thread from ownership of the lock. Behavior is undefined if the thread does not own that lock.
SUBROUTINE OMP_UNSET_LOCK(var)
SUBROUTINE OMP_UNSET_NEST_LOCK(nvar)
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
OMP_TEST_LOCK attempts to set the lock associated with the lock variable; the call does not block execution of the thread.
OMP_TEST_NEST_LOCK attempts to set the nestable lock associated with the lock variable, and returns the new nesting count if the lock was set successfully, zero otherwise; the call does not block execution of the thread.
LOGICAL(4) FUNCTION OMP_TEST_LOCK(var)
Returns .TRUE. if the lock was set, .FALSE. otherwise.
INTEGER(4) FUNCTION OMP_TEST_NEST_LOCK(nvar)
Returns nesting count if lock was set successfully, zero otherwise.
#include <omp.h>
int omp_test_lock(omp_lock_t *lock);
Returns a nonzero value if lock was set successfully, zero otherwise.
int omp_test_nest_lock(omp_nest_lock_t *lock);
Returns lock nest count if lock was set successfully, zero otherwise.
Two functions support a portable wall clock timer.
Returns the elapsed wall clock time in seconds "since some arbitrary time in the past".
REAL(8) FUNCTION OMP_GET_WTIME()
#include <omp.h>
double omp_get_wtime(void);
Returns the number of seconds between successive clock ticks.
REAL(8) FUNCTION OMP_GET_WTICK()
#include <omp.h>
double omp_get_wtick(void);
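For example, a C sketch that times a section of code (compute is hypothetical):

    #include <stdio.h>
    #include <omp.h>

    extern void compute(void);  /* hypothetical work */

    void timed(void)
    {
        double start = omp_get_wtime();
        compute();
        printf("elapsed = %g s (clock tick = %g s)\n",
               omp_get_wtime() - start, omp_get_wtick());
    }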