This chapter notes specific behaviors in the OpenMP 2.5 specification that are implementation dependent. For the latest information regarding current compiler releases, see the compiler documentation on the Sun Developer Network portal, http://developers.sun.com/sunstudio
Memory Model
There is no guarantee that memory accesses by multiple threads to the same variable without synchronization are atomic with respect to each other.
Several implementation-dependent and application-dependent factors affect whether accesses are atomic. Some variables might be larger than the largest atomic memory operation on the target platform. Some variables might be misaligned or of unknown alignment, in which case the compiler or the runtime system may need to use multiple loads and stores to access the variable. In some cases, faster code sequences are available that use more loads and stores.
Internal Control Variables
The OpenMP runtime library maintains the following internal control variables:
nthreads-var - stores the number of threads requested for future parallel regions.
dyn-var - controls whether dynamic adjustment of the number of threads to be used for future parallel regions is enabled.
nest-var - controls whether nested parallelism is enabled for future parallel regions.
run-sched-var - stores scheduling information to be used for loop regions using the RUNTIME schedule clause.
def-sched-var - stores implementation defined default scheduling information for loop regions.
The runtime library maintains separate copies of each of nthreads-var, dyn-var, and nest-var for each thread. On the other hand, the runtime library maintains one copy of each of run-sched-var and def-sched-var that applies to all threads.
The default value of nthreads-var is 1. That is, without an explicit num_threads() clause, a call to the omp_set_num_threads() routine, or an explicit definition of the OMP_NUM_THREADS environment variable, the default number of threads in a team is 1.
A call to omp_set_num_threads() modifies the value of nthreads-var for the calling thread only and applies to parallel regions at the same or inner nesting level encountered by the calling thread.
If the requested number of threads is greater than the number of threads the implementation can support, or if the value is not a positive integer, then a warning message is issued, provided that SUNW_MP_WARN is set to TRUE or a callback function has been registered by a call to sunw_mp_register_warn().
Nested parallelism is supported. Nested parallel regions can be executed by multiple threads.
The default value of nest-var is false. That is, nested parallelism is disabled by default. Set the OMP_NESTED environment variable, or call the omp_set_nested() routine to enable it.
A call to omp_set_nested() modifies the value of nest-var for the calling thread only and applies to parallel regions at the same or inner nesting level encountered by the calling thread.
By default, the maximum number of active nesting levels supported is 4. You can change that maximum by setting the environment variable SUNW_MP_MAX_NESTED_LEVELS.
The default value of dyn-var is true. That is, dynamic adjustment is enabled by default. Set the OMP_DYNAMIC environment variable, or call the omp_set_dynamic() routine to disable dynamic adjustment.
A call to omp_set_dynamic() modifies the value of dyn-var for the calling thread only and applies to parallel regions at the same or inner nesting level encountered by the calling thread.
If dynamic adjustment is enabled, then the number of threads in the team is adjusted to be the minimum of:
the number of threads the user requested
1 + the number of available threads in the pool
the number of available virtual processors
On the other hand, if dynamic adjustment is disabled, then the number of threads in the team will be the minimum of:
the number of threads the user requested
1 + the number of available threads in the pool
In exceptional situations, such as a lack of system resources, the number of threads supplied will be less than described above. In these situations, if SUNW_MP_WARN is set to TRUE or a callback function has been registered via a call to sunw_mp_register_warn(), a warning message is issued.
Refer to Chapter 2 for more information about the pool of threads and the nested parallelism execution model.
Loop Scheduling
The default value of def-sched-var is STATIC scheduling. To specify a different schedule for a loop region, use the SCHEDULE clause.
The default value of run-sched-var is also STATIC scheduling. You can change the default by setting the OMP_SCHEDULE environment variable.
GUIDED: Determination of Chunk Sizes
The default chunk size for SCHEDULE(GUIDED) when chunksize is not specified is 1. The OpenMP runtime library uses the following formula to compute the chunk sizes for a loop with GUIDED scheduling:

chunksize = unassigned_iterations / (weight * num_threads)

where:
unassigned_iterations is the number of iterations in the loop that have not yet been assigned to any thread;
weight is a floating-point constant that the user can specify at run time with the SUNW_MP_GUIDED_WEIGHT environment variable (2.3 OpenMP Environment Variables); if not specified, the default weight is 2.0;
num_threads is the number of threads used to execute the loop.

The choice of the weight value affects the sizes of the initial and subsequent chunks of iterations assigned to threads, and has a direct effect on load balancing. Experimental results show that the default weight of 2.0 generally works well; however, some applications could benefit from a different weight value.
Programs that are explicitly threaded using POSIX or Solaris threads can contain OpenMP directives or call routines that contain OpenMP directives.
Setting the SUNW_MP_WARN environment variable (2.3 OpenMP Environment Variables) enables runtime validity checking by the OpenMP runtime library.
For example, the following code will fall into an endless loop as threads wait at different barriers, and must be terminated with a control-C from the terminal:
% cat bad1.c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int i = omp_get_thread_num();
        if (i % 2) {
            printf("At barrier 1.\n");
            #pragma omp barrier
        }
    }
    return 0;
}
% cc -xopenmp -xO3 bad1.c
% ./a.out
At barrier 1.
At barrier 1.
(The program hangs in an endless loop and must be terminated with Control-C.)
But if we set SUNW_MP_WARN before execution, the runtime library will detect the problem:
% setenv SUNW_MP_WARN TRUE
% ./a.out
WARNING (libmtsk): Environment variable SUNW_MP_WARN is set to TRUE. Runtime error checking will be enabled.
At barrier 1.
At barrier 1.
WARNING (libmtsk): Threads at barrier from different directives.
    Thread at barrier from bad1.c:8.
    Thread at barrier from bad1.c:13.
    Possible Reasons:
    Worksharing constructs not encountered by all threads in the team in the same order.
    Incorrect placement of barrier directives.
WARNING (libmtsk): Runtime shutting down while some parallel region is still active.
The C and C++ compilers also provide a function that can be used to register a callback function when errors are detected. When an error is detected, the registered callback function is called and passed a pointer to an error message string as an argument.
int sunw_mp_register_warn(void (*func) (void *) )
To access the prototype for this function, add the following line to the program:

#include <sunw_mp_misc.h>
For example:
% cat bad2.c
#include <omp.h>
#include <sunw_mp_misc.h>
#include <stdio.h>

void handle_warn(void *msg)
{
    printf("handle_warn: %s\n", (char *)msg);
}

void set(int i)
{
    static int k;
    #pragma omp critical
    { k++; }
    #pragma omp barrier
}

int main(void)
{
    int i, rc;
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    if (sunw_mp_register_warn(handle_warn) != 0) {
        printf("Installing callback failed\n");
    }
    #pragma omp parallel for
    for (i = 0; i < 20; i++) {
        set(i);
    }
    return 0;
}
% cc -xopenmp -xO3 bad2.c
% a.out
WARNING (libmtsk): Environment variable SUNW_MP_WARN is set to TRUE. Runtime error checking will be enabled.
handle_warn: WARNING (libmtsk): at bad2.c:15. BARRIER is not permitted in the dynamic extent of FOR / DO.
handle_warn() is installed as the callback function to be invoked when the OpenMP runtime library detects an error. The callback function in this example merely prints the error message passed to it by the library, but it could be used to trap specific errors.
Regarding Specific Constructs:
sections construct
The structured blocks in a sections construct are divided among the members of the team executing the sections region, so that the threads execute an approximately equal number of sections.
single construct
The structured block of a single construct will be executed by the thread that encounters the single region first.
atomic construct
This implementation replaces all ATOMIC directives and pragmas by enclosing the target statement in a CRITICAL construct.
Binding Thread Set for OpenMP Library Routines:
omp_set_num_threads routine
When called from within an explicit parallel region, the binding thread set for the omp_set_num_threads region is the calling thread only.
omp_get_max_threads routine
When called from within an explicit parallel region, the binding thread set for the omp_get_max_threads region is the calling thread only.
omp_set_dynamic routine
When called from within any explicit parallel region, the binding thread set for the omp_set_dynamic region is the calling thread only.
omp_get_dynamic routine
When called from within an explicit parallel region, the binding thread set for the omp_get_dynamic region is the calling thread only.
omp_set_nested routine
When called from within an explicit parallel region, the binding thread set for the omp_set_nested region is the calling thread only.
omp_get_nested routine
When called from within an explicit parallel region, the binding thread set for the omp_get_nested region is the calling thread only.
Fortran 95-Specific Issues:
threadprivate directive
If the conditions for the values of data in the threadprivate objects of threads (other than the initial thread) to persist between two consecutive active parallel regions do not all hold, then an allocatable array in the second region may have an allocation status of "not currently allocated".
shared clause
Passing a shared variable to a non-intrinsic procedure may result in the value of the shared variable being copied into temporary storage before the procedure reference, and back out of the temporary storage into the actual argument storage after the procedure reference. This copying into and out of temporary storage can occur only if conditions a, b, and c in Section 2.8.3.2 of the OpenMP 2.5 Specification hold.
Include and module files
Both the include file omp_lib.h and the module file omp_lib are provided in this implementation.
On Solaris, the OpenMP runtime library routines that take an argument are extended with a generic interface so arguments of different Fortran KIND types can be accommodated.