Sun Studio 12 Update 1: OpenMP API User's Guide

5.5 Programming Considerations

Tasking introduces a layer of complexity to an OpenMP program. The programmer needs to pay special attention to how a program with tasks works. Here are some programming issues to consider.

5.5.1 THREADPRIVATE and Thread-Specific Information

When a thread encounters a task scheduling point, the implementation may choose to suspend the current task and schedule the thread to work on another task. This implies that the value of a threadprivate variable, or other thread-specific information such as the thread number, may change across a task scheduling point.

If the suspended task is tied, then the thread that resumes executing the task will be the same thread that suspended it. Therefore, the thread number will remain the same after the task is resumed. However, the value of a threadprivate variable may change because the thread may have been scheduled to work on another task that modified the threadprivate variable before resuming the suspended task.

If the suspended task is untied, then the thread that resumes executing the task may be different from the thread that suspended it. Therefore, both the thread number and the value of a threadprivate variable before and after the task scheduling point may be different.

5.5.2 Locks

OpenMP 3.0 specifies that locks are no longer owned by threads, but by tasks. Once a lock is acquired, the current task owns it, and the same task must release it before task completion.

The critical construct, on the other hand, remains as a thread-based mutual exclusion mechanism.

The change in lock ownership requires extra care when using locks. The following program (it appears as Example A.43.1c in the OpenMP Specification version 3.0) is conforming in OpenMP 2.5 because the thread that releases the lock lck in the parallel region is the same thread that acquired the lock in the sequential part of the program (the master thread of a parallel region and the initial thread are the same). However, it is not conforming in OpenMP 3.0, because the task region that releases the lock lck is different from the task region that acquires the lock.


Example 5–2 Example Using Locks: Non-Conformingn in OpenMP 3.0


#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main()
{
  int x;
  omp_lock_t lck;

  omp_init_lock (&lck);
  omp_set_lock (&lck);
  x = 0;

  #pragma omp parallel shared (x)
  {
    #pragma omp master
    {
      x = x + 1;
      omp_unset_lock (&lck);
    }
  }
  omp_destroy_lock (&lck);
}

5.5.3 References to Stack Data

A task is likely to have references to data on the stack of the routine where the task construct appears. Since the execution of a task may be deferred until the next implicit or explicit barrier, it is possible that a given task will execute after the stack of the routine where it appears has already been popped and the stack data overwritten, thereby destroying the stack data listed as shared by the task.

It is the programmer's responsibility to insert the needed synchronizations to ensure that variables are still on the stack when the task references them. Here are two examples.

In the first example, i is specified to be shared in the task construct, and the task accesses the copy of i that is allocated on the stack of work().

Task execution may be deferred, so tasks are executed at the implicit barrier at the end of the parallel region in main() after the work() routine has already returned. So when a task references i, it accesses some undetermined value that happens to be on the stack at that time.

For correct results, the programmer needs to make sure that work() does not exit before the tasks have completed. This is done by inserting a taskwait directive after the task construct. Alternatively, i can be specified to be firstprivate in the task construct, instead of shared.


Example 5–3 Stack Data: First Example — Incorrect Version


#include <stdio.h>
#include <omp.h>
void work()
 {
   int i;

   i = 10;
   #pragma omp task shared(i)
   {
     #pragma omp critical
     printf("In Task, i = %d\n",i);
   }
 }

int main(int argc, char** argv)
 {
    omp_set_num_threads(8);
    omp_set_dynamic(0);

    #pragma omp parallel 
    {
      work();
    }
 }


Example 5–4 Stack Data: First Example — Corrected Version


#include <stdio.h>
#include <omp.h>

void work()
 {
   int i;

   i = 10;
   #pragma omp task shared(i)
   {
     #pragma omp critical
     printf("In Task, i = %d\n",i);
   }

   /* Use TASKWAIT for synchronization. */
   #pragma omp taskwait
 }

int main(int argc, char** argv)
 {
    omp_set_num_threads(8);
    omp_set_dynamic(0);

    #pragma omp parallel 
    {
      work();
    }
 }

In our second example, j in the task construct is shared with the j in the sections construct. So the task accesses the firstprivate copy of j in the sections construct, which (in some implementations, including the Sun Studio compilers) is a local variable on the stack of the outlined routine for the sections construct.

Task execution may deferred so the task is executed at the implicit barrier at the end of the sections region, after the outlined routine for the sections construct has exited. So when the task references j, it accesses some undetermined value on the stack.

For correct results, the programmer needs to make sure that the task is executed before the sections region reaches its implicit barrier. This can be done by inserting a taskwait directive after the task construct. Alternatively, j can be specified to be firstprivate in the task construct, instead of shared.


Example 5–5 Second Example — Incorrect Version


#include <stdio.h>
#include <omp.h>

int main(int argc, char** argv)
 {
    omp_set_num_threads(2);
    omp_set_dynamic(0);
    int j=100;

    #pragma omp parallel shared(j)
    {
       #pragma omp sections firstprivate(j)
       {
          #pragma omp section
          {
             #pragma omp task shared(j)
             {
               #pragma omp critical
               printf("In Task, j = %d\n",j);
             }
          }
       }
    }

    printf("After parallel, j = %d\n",j);
 }


Example 5–6 Second Example — Corrected Version


#include <stdio.h>
#include <omp.h>

int main(int argc, char** argv)
 {
    omp_set_num_threads(2);
    omp_set_dynamic(0);
    int j=100;

    #pragma omp parallel shared(j)
    {
       #pragma omp sections firstprivate(j)
       {
          #pragma omp section
          {
             #pragma omp task shared(j)
             {
               #pragma omp critical
               printf("In Task, j = %d\n",j);
             }

             /* Use TASKWAIT for synchronization. */
             #pragma omp taskwait
          }
       }
    }

    printf("After parallel, j = %d\n",j);
 }

5.5.4 Data Scoping Attributes

The OpenMP 3.0 specification version 3.0 (in section 2.9) describes how the data-sharing attributes of variables referenced in parallel, task, and worksharing regions are determined.

The data-sharing attributes of variables referenced in a construct may be one of the following: predetermined, explicitly determined, or implicitly determined. Variables with explicitly determined data-sharing attributes are those that are referenced in a given construct and are listed in a data-sharing attribute clause on the construct. Variables with implicitly determined data-sharing attributes are those that are referenced in a given construct, do not have predetermined data-sharing attributes, and are not listed in a data-sharing attribute clause on the construct.

The rules for how the data-sharing attributes of variables are implicitly determined may not always be obvious (see 5.2 Data Environment). It is therefore recommended that to avoid any surprises, the programmer explicitly scope all variables that are referenced in a task construct (using the default, shared, private, and firstprivate clauses), rather than rely on the OpenMP implicit scoping rules.