CHAPTER 4

Implementation-Defined Behaviors

This chapter notes specific behaviors in the OpenMP 2.5 specification that are implementation dependent. For last-minute information regarding the latest compiler releases, see the compiler documentation on the Sun Developer Network portal: http://developers.sun.com/sunstudio

There is no guarantee that memory accesses by multiple threads to the same variable without synchronization are atomic with respect to each other.

Several implementation-dependent and application-dependent factors affect whether accesses are atomic. Some variables might be larger than the largest atomic memory operation on the target platform. Some variables might be misaligned, or of unknown alignment, so the compiler or the runtime system must use multiple loads/stores to access them. Sometimes a faster code sequence exists that uses multiple loads/stores, even when a single atomic access is possible.

The OpenMP runtime library maintains the following internal control variables:

nthreads-var - stores the number of threads requested for future parallel regions.

dyn-var - controls whether dynamic adjustment of the number of threads to be used for future parallel regions is enabled.

nest-var - controls whether nested parallelism is enabled for future parallel regions.

run-sched-var - stores scheduling information to be used for loop regions using the RUNTIME schedule clause.

def-sched-var - stores implementation defined default scheduling information for loop regions.

The runtime library maintains separate copies of each of nthreads-var, dyn-var, and nest-var for each thread. On the other hand, the runtime library maintains one copy of each of run-sched-var and def-sched-var that applies to all threads.

The default value of nthreads-var is 1. That is, without an explicit num_threads() clause, a call to the omp_set_num_threads() routine, or an explicit definition of the OMP_NUM_THREADS environment variable, the default number of threads in a team is 1.

A call to omp_set_num_threads() modifies the value of nthreads-var for the calling thread only and applies to parallel regions at the same or inner nesting level encountered by the calling thread.

If the requested number of threads is greater than the number of threads an implementation can support or if the value is not a positive integer, then if SUNW_MP_WARN is set to TRUE or a callback function is registered by a call to sunw_mp_register_warn(), a warning message will be issued.

Nested parallelism is supported. Nested parallel regions can be executed by multiple threads.

The default value of nest-var is false. That is, nested parallelism is disabled by default. Set the OMP_NESTED environment variable, or call the omp_set_nested() routine to enable it.

A call to omp_set_nested() modifies the value of nest-var for the calling thread only and applies to parallel regions at the same or inner nesting level encountered by the calling thread.

By default, the maximum number of active nesting levels supported is 4. You can change that maximum by setting the environment variable SUNW_MP_MAX_NESTED_LEVELS.

The default value of dyn-var is true. That is, dynamic adjustment is enabled by default. Set the OMP_DYNAMIC environment variable, or call the omp_set_dynamic() routine to disable dynamic adjustment.

A call to omp_set_dynamic() modifies the value of dyn-var for the calling thread only and applies to parallel regions at the same or inner nesting level encountered by the calling thread.

If dynamic adjustment is enabled, then the number of threads in the team is adjusted to be the minimum of:

- the number of threads the user requested

- 1 + the number of available threads in the pool

- the number of available processors

On the other hand, if dynamic adjustment is disabled, then the number of threads in the team will be the minimum of:

- the number of threads the user requested

- 1 + the number of available threads in the pool

In exceptional situations, such as when there is lack of system resources, the number of threads supplied will be less than described above. In these situations, if SUNW_MP_WARN is set to TRUE or a callback function is registered via a call to sunw_mp_register_warn(), a warning message will be issued.

Refer to Chapter 2 for more information about the pool of threads and the nested parallelism execution model.

The default value of def-sched-var is STATIC scheduling. To specify a different schedule for a loop region, use the SCHEDULE clause.

The default value of run-sched-var is also STATIC scheduling. You can change the default by setting the OMP_SCHEDULE environment variable.

When no chunksize is specified, the default chunk size for SCHEDULE(GUIDED) is 1. The OpenMP runtime library uses the following formula for computing the chunk sizes for a loop with GUIDED scheduling:

chunksize = unassigned_iterations / (weight * num_threads)

where:

unassigned_iterations is the number of iterations in the loop that have not yet been assigned to any thread;

weight is a floating-point constant that can be specified by the user at runtime with the SUNW_MP_GUIDED_WEIGHT environment variable (Section 5.3, OpenMP Environment Variables). If not specified, the default weight is 2.0;

num_threads is the number of threads used to execute the loop.

The choice of weight value affects the sizes of the initial and subsequent chunks of iterations assigned to threads, and has a direct effect on load balancing. Experimental results show that the default weight of 2.0 generally works well. However, some applications could benefit from a different weight value.

 

Programs that are explicitly threaded using POSIX or Solaris threads can contain OpenMP directives or call routines that contain OpenMP directives.

 

For example, the following code will fall into an endless loop as threads wait at different barriers, and must be terminated with Control-C from the terminal:


% cat bad1.c
 
#include <omp.h>
#include <stdio.h>
 
int
main(void)
{
   omp_set_dynamic(0);
   omp_set_num_threads(4);
 
   #pragma omp parallel
   {
       int i = omp_get_thread_num();
 
       if (i % 2) {
           printf("At barrier 1.\n");
           #pragma omp barrier
       }
   }
   return 0;
}
% cc -xopenmp -xO3 bad1.c
% ./a.out                    (run the program)
At barrier 1.
At barrier 1.
                             (program hangs in an endless loop)
Control-C                    (type Control-C to terminate execution)
 

But if we set SUNW_MP_WARN before execution, the runtime library will detect the problem:


% setenv SUNW_MP_WARN TRUE
% ./a.out
At barrier 1.
At barrier 1.
WARNING (libmtsk): Threads at barrier from different directives.
    Thread at barrier from bad1.c:11.
    Thread at barrier from bad1.c:17.
    Possible Reasons:
    Worksharing constructs not encountered by all threads in the team in the same order.
    Incorrect placement of barrier directives.

int sunw_mp_register_warn(void (*func)(void *))

To access the prototype for this function, add the following line to your program:
#include <sunw_mp_misc.h>

For example:


% cat bad2.c
#include <omp.h>
#include <sunw_mp_misc.h>
#include <stdio.h>
 
void handle_warn(void *msg)
{
    printf("handle_warn: %s\n", (char *)msg);
}
 
void set(int i)
{
    static int k;
#pragma omp critical
    {
        k++;
    }
#pragma omp barrier
}
 
int main(void)
{
  int i, rc;
  omp_set_dynamic(0);
  omp_set_num_threads(4);
  if (sunw_mp_register_warn(handle_warn) != 0) {
      printf ("Installing callback failed\n");
  }
#pragma omp parallel for
  for (i = 0; i < 20; i++) {
      set(i);
  }
  return 0;
}
 
% cc -xopenmp -xO3 bad2.c
% ./a.out
handle_warn: WARNING (libmtsk): at bad2.c:21 Barrier is not permitted in dynamic extent of for / DO.

handle_warn() is installed as the callback function to be invoked when an error is detected by the OpenMP runtime library. The callback function in this example merely prints the error message passed to it by the library, but it could also be used to trap certain errors.

 

sections construct

The structured blocks in a sections construct are divided among the members of the team executing the sections region, so that the threads execute an approximately equal number of sections.

single construct

The structured block of a single construct will be executed by the thread that encounters the single region first.

atomic construct

This implementation replaces all ATOMIC directives and pragmas by enclosing the target statement in a CRITICAL construct.

omp_set_num_threads routine

When called from within an explicit parallel region, the binding thread set for the omp_set_num_threads region is the calling thread.

omp_get_max_threads routine

When called from within an explicit parallel region, the binding thread set for the omp_get_max_threads region is the calling thread.

omp_set_dynamic routine

When called from within any explicit parallel region, the binding thread set for the omp_set_dynamic region is the calling thread only.

omp_get_dynamic routine

When called from within an explicit parallel region, the binding thread set for the omp_get_dynamic region is the calling thread only.

omp_set_nested routine

When called from within an explicit parallel region, the binding thread set for the omp_set_nested region is the calling thread only.

omp_get_nested routine

When called from within an explicit parallel region, the binding thread set for the omp_get_nested region is the calling thread only.

 

threadprivate directive

If the conditions for values of data in the threadprivate objects of threads (other than the initial thread) to persist between two consecutive active parallel regions do not all hold, then the allocation status of an allocatable array in the second region may be "not currently allocated".

shared clause

Passing a shared variable to a non-intrinsic procedure may result in the value of the shared variable being copied into temporary storage before the procedure reference, and back out of the temporary storage into the actual argument storage after the procedure reference. This copying into and out of temporary storage can occur only if conditions a, b, and c in Section 2.8.3.2 of the OpenMP 2.5 Specification hold.

Include and module files

Both the include file omp_lib.h and the module file omp_lib are provided in this implementation.

The OpenMP runtime library routines that take an argument are extended with a generic interface so arguments of different Fortran KIND type can be accommodated.