This chapter describes how to compile programs that utilize the OpenMP API.
To run a parallelized program in a multithreaded environment, you must set the OMP_NUM_THREADS environment variable before executing the program. This variable tells the runtime system the maximum number of threads the program can create. The default is 1. In general, set OMP_NUM_THREADS to the number of processors available on the target platform.
The compiler README files contain information about limitations and known deficiencies of their OpenMP implementations. View these README files directly by invoking the compiler with the -xhelp=readme flag, or by pointing an HTML browser at the Forte Developer documentation index at
file:/opt/SUNWspro/docs/index.html
3.1 Fortran 95
To enable explicit parallelization with OpenMP directives, compile the program with the f95 option flag -openmp. This flag is a macro for the following combination of f95 options:
-mp=openmp -explicitpar -stackvar -D_OPENMP=200011
-openmp=stubs links in stub routines for the OpenMP API routines. Use this option if your application calls the OpenMP API but must compile and execute serially. -openmp=stubs also defines the _OPENMP preprocessor token.
See the f95(1) man page for details on these options.
3.1.1 Validation of OpenMP Directives With -XlistMP
You can obtain a static, interprocedural validation of a program's OpenMP directives with the f95 compiler's global program checking feature. Enable OpenMP checking by compiling with the -XlistMP flag. (Diagnostic messages from -XlistMP appear in a separate file whose name is the source file name with a .lst extension.) The compiler diagnoses the following violations:
Violations in the specifications of parallel directives:
- If ordered sections are contained in the dynamic extent of a DO directive, the ORDERED clause must be present on that DO directive.
- A variable specified in the LASTPRIVATE list of a DO directive must be SHARED in the enclosing PARALLEL region.
- An ORDERED directive can appear only in the dynamic extent of a DO or PARALLEL DO directive.
- If a variable is PRIVATE (explicitly or implicitly) or THREADPRIVATE in a PARALLEL region and is assigned a value in that region, it is incorrect to use the variable after the PARALLEL region.
- Variables in the COPYPRIVATE list must be private in the enclosing context.
- Variables that appear in the FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses of a work-sharing directive must have shared scope in the enclosing parallel region.
- DO, SECTIONS, SINGLE, and WORKSHARE directives that bind to the same PARALLEL directive are not allowed to be nested one inside the other.
- DO, SECTIONS, SINGLE, and WORKSHARE directives are not permitted in the dynamic extent of CRITICAL, ORDERED, and MASTER directives.
- BARRIER directives are not permitted in the dynamic extent of DO, SECTIONS, SINGLE, WORKSHARE, MASTER, CRITICAL, and ORDERED directives.
- MASTER directives are not permitted in the dynamic extent of DO, SECTIONS, SINGLE, WORKSHARE, MASTER, CRITICAL, and ORDERED directives.
- ORDERED directives are not allowed in the dynamic extent of SECTIONS, SINGLE, WORKSHARE, CRITICAL, and MASTER directives.
- Multiple ORDERED sections are not permitted in the dynamic extent of PARALLEL DO.
Obstacles to parallelization determined by interprocedural data dependence analysis:
- Variables declared as PRIVATE are undefined for each thread on entering the construct.
- It is incorrect to use a REDUCTION variable outside its reduction statement.
- Variables that are declared LASTPRIVATE or REDUCTION for a work-sharing directive for which NOWAIT appears must not be used prior to a barrier.
- Assignment to a shared scalar variable inside a parallel construct may lead to incorrect results.
- Using a SHARED variable as an ATOMIC variable may cause performance degradation.
- The value of a private variable can be undefined if the variable was assigned inside a MASTER or SINGLE block.
Additional diagnostics:
- An ATOMIC directive applies only to the statement immediately following it, which must have one of the special forms defined by the OpenMP specification.
- Syntax or usage is wrong for a REDUCTION statement.
- The operator declared in a REDUCTION clause should be the same as the one in the REDUCTION statement.
- ATOMIC variables must be scalars.
- CRITICAL directives with the same name are not allowed to be nested one inside the other.
For example, compiling a source file ord.f with -XlistMP produces a diagnostic file ord.lst:
FILE "ord.f"
1 !$OMP PARALLEL
2 !$OMP DO ORDERED
3 do i=1,100
4 call work(i)
5 end do
6 !$OMP END DO
7 !$OMP END PARALLEL
8
9 !$OMP PARALLEL
10 !$OMP DO
11 do i=1,100
12 call work(i)
13 end do
14 !$OMP END DO
15 !$OMP END PARALLEL
16 end
17 subroutine work(k)
18 !$OMP ORDERED
^
**** ERR-OMP: It is illegal for an ORDERED directive to bind to a
directive (ord.f, line 10, column 2) that does not have the
ORDERED clause specified.
19 write(*,*) k
20 !$OMP END ORDERED
21 return
22 end
In this example, the ORDERED directive in subroutine WORK receives a diagnostic that refers to the second DO directive because it lacks an ORDERED clause.
3.2 C and C++
To enable explicit parallelization with OpenMP directives, compile your program with the option flag -xopenmp. This flag can take an optional keyword argument.
If you specify -xopenmp but do not include a keyword, the compiler assumes -xopenmp=parallel. If you do not specify -xopenmp, the compiler assumes -xopenmp=none.
-xopenmp=parallel enables recognition of OpenMP pragmas and applies to SPARC platforms only. The optimization level under -xopenmp=parallel is -xO3; the compiler issues a warning if it raises the optimization level of your program from a lower level to -xO3. -xopenmp=parallel defines the _OPENMP preprocessor token in the form YYYYMM (specifically 199810L).
-xopenmp=stubs links with the stubs routines for the OpenMP API routines. Use this option if you need to compile your application to execute serially.
-xopenmp=stubs also predefines the _OPENMP preprocessor token.
-xopenmp=none does not enable recognition of OpenMP pragmas, makes no change to the optimization level of your program, and does not predefine any preprocessor tokens.
With C, do not compile with -xopenmp together with -xparallel or -xexplicitpar.
The C++ implementation is limited to the OpenMP C API Version 1.0 specification.
3.3 OpenMP Environment Variables
The OpenMP specifications define four environment variables that control the execution of OpenMP programs; these are summarized in TABLE 3-1. Additional multiprocessing environment variables affect the execution of OpenMP programs but are not part of the OpenMP specifications; these are summarized in TABLE 3-2.
TABLE 3-1  OpenMP Environment Variables (set with: setenv VARIABLE value)

OMP_SCHEDULE
    Sets the schedule type for DO, PARALLEL DO, for, and parallel for directives/pragmas that specify schedule type RUNTIME. If not defined, a default value of STATIC is used. value is "type[,chunk]".
    Example: setenv OMP_SCHEDULE "GUIDED,4"

OMP_NUM_THREADS or PARALLEL
    Sets the number of threads to use during execution, unless that number is set by a NUM_THREADS clause or a call to OMP_SET_NUM_THREADS(). If not set, a default of 1 is used. value is a positive integer (the current maximum is 128). For compatibility with legacy programs, setting the PARALLEL environment variable has the same effect as setting OMP_NUM_THREADS. However, if both are set to different values, the runtime library issues an error message.
    Example: setenv OMP_NUM_THREADS 16

OMP_DYNAMIC
    Enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. If not set, a default value of TRUE is used. value is either TRUE or FALSE.
    Example: setenv OMP_DYNAMIC FALSE

OMP_NESTED
    Enables or disables nested parallelism. value is either TRUE or FALSE. (Nested parallelism is not supported, so this variable has no effect.)
    Example: setenv OMP_NESTED FALSE
TABLE 3-2  Multiprocessing Environment Variables

SUNW_MP_WARN
    Controls warning messages issued by the OpenMP runtime library. If set to TRUE, the runtime library issues warning messages to stderr; FALSE disables warning messages. The default is FALSE.
    Example: setenv SUNW_MP_WARN FALSE

SUNW_MP_THR_IDLE
    Controls the end-of-task status of each thread executing the parallel part of a program. You can set the value to SPIN, SLEEP(ns), or SLEEP(nms). The default is SPIN: a thread spins (busy-waits) after completing a parallel task until a new parallel task arrives.
    SLEEP(time) specifies the amount of time a thread should spin-wait after completing a parallel task. If a new task arrives for the thread while it is spinning, the thread executes the new task immediately. Otherwise, the thread goes to sleep and is awakened when a new task arrives. time may be specified in seconds, (ns) or just (n), or milliseconds, (nms).
    SLEEP with no argument puts the thread to sleep immediately after completing a parallel task. SLEEP, SLEEP(0), SLEEP(0s), and SLEEP(0ms) are all equivalent.
    Example: setenv SUNW_MP_THR_IDLE SLEEP(50ms)

STACKSIZE
    Sets the stack size for each thread. The value is in kilobytes. The default thread stack sizes are 4 Mb on 32-bit SPARC V8 platforms and 8 Mb on 64-bit SPARC V9 platforms.
    Example: setenv STACKSIZE 8192 (sets the thread stack size to 8 Mb)
3.4 Stacks and Stack Sizes
The executing program maintains a main memory stack for the initial thread, as well as a distinct stack for each helper thread. Stacks are temporary memory address spaces used to hold arguments and automatic variables across subprogram or function calls.
The default main stack is about 8 megabytes. Compiling Fortran programs with the f95 option -stackvar forces the allocation of local variables and arrays on the stack, as if they were automatic variables. -stackvar is required with explicitly parallelized OpenMP programs because it improves the optimizer's ability to parallelize calls in loops. (See the Fortran User's Guide for a discussion of the -stackvar flag.) However, -stackvar may lead to stack overflow if not enough memory is allocated for the stack.
Use the limit C-shell command, or the ulimit ksh/sh command, to display or set the size of the main stack.
Each helper thread of a multithreaded program has its own thread stack. This stack mimics the initial thread stack but is unique to the thread. The thread's PRIVATE arrays and variables (local to the thread) are allocated on the thread stack. The default size is 4 megabytes on 32-bit systems and 8 megabytes on 64-bit systems. The size of the thread stack is set with the STACKSIZE environment variable.
demo% setenv STACKSIZE 16384     Set thread stack size to 16 Mb (C shell)
demo$ STACKSIZE=16384            Same, using Bourne/Korn shell
demo$ export STACKSIZE
Finding the best stack size might require some trial and error. If the stack size is too small for a thread to run, the program may fail with a segmentation fault or silently corrupt data in neighboring threads. If you are unsure about stack overflows, compile your Fortran or C programs with the -xcheck=stkovf flag to report runtime stack overflow situations in the compiled code.
OpenMP API User's Guide    816-2468-10
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.