Sun Studio 12 Update 1: OpenMP API User's Guide

Chapter 5 Tasking

This chapter describes the OpenMP 3.0 Tasking Model.

5.1 The Tasking Model

OpenMP specification version 3.0 introduced a new feature called tasking. Tasking facilitates the parallelization of applications where units of work are generated dynamically, as in recursive structures or while loops.

In OpenMP, an explicit task is specified using the task directive. The task directive defines the code associated with the task and its data environment. The task construct can be placed anywhere in the program; whenever a thread encounters a task construct, a new task is generated.

When a thread encounters a task construct, it may choose to execute the task immediately or defer its execution until a later time. If task execution is deferred, then the task is placed in a conceptual pool of tasks that is associated with the current parallel region. The threads in the current team will take tasks out of the pool and execute them until the pool is empty. A thread that executes a task may be different from the thread that originally encountered it.
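For example, a single thread can traverse a linked list and generate one task per element, while the other threads in the team execute the tasks as they are generated. The following is a minimal sketch only; the node type and the process() routine are hypothetical names, not part of this guide's examples.


#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Hypothetical node type and per-element routine, for illustration only. */
typedef struct node {
  int value;
  struct node *next;
} node_t;

void process(node_t *p)
{
  #pragma omp critical
  printf("processing %d\n", p->value);
}

int main()
{
  /* Build a small list. */
  node_t *head = NULL;
  int i;
  for (i = 0; i < 10; i++) {
    node_t *n = (node_t *) malloc(sizeof(node_t));
    n->value = i;
    n->next = head;
    head = n;
  }

  #pragma omp parallel
  {
    #pragma omp single        /* one thread walks the list and generates    */
    {                         /* one task per element ...                   */
      node_t *p;
      for (p = head; p != NULL; p = p->next) {
        #pragma omp task firstprivate(p)
        process(p);           /* ... but any thread in the team may run it  */
      }
    }
  }                           /* implicit barrier: all tasks are complete   */
  return 0;
}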

The code associated with a task construct will be executed only once. A task is tied if the code is executed by the same thread from beginning to end. A task is untied if the code can be executed by more than one thread, so that different threads execute different parts of the code. By default, tasks are tied; a task can be specified to be untied by using the untied clause with the task directive.

Threads are allowed to suspend execution of a task region at a task scheduling point in order to execute a different task. If the suspended task is tied, then the same thread later resumes execution of the suspended task. If the suspended task is untied, then any thread in the current team may resume the task execution.

The OpenMP specification defines the following task scheduling points for tied tasks:

- The point of encountering a task construct
- The point of encountering a taskwait construct
- The point of encountering an implicit or explicit barrier
- The completion point of the task

As implemented in the Sun Studio compilers, the above scheduling points are also the task scheduling points for untied tasks.

In addition to explicit tasks specified using the task directive, the OpenMP specification version 3.0 introduces the notion of implicit tasks. An implicit task is a task generated by the implicit parallel region, or generated when a parallel construct is encountered during execution. The code for each implicit task is the code inside the parallel construct. Each implicit task is assigned to a different thread in the team and is tied; that is, an implicit task is always executed from beginning to end by the thread to which it is initially assigned.

All implicit tasks generated when a parallel construct is encountered are guaranteed to be complete when the master thread exits the implicit barrier at the end of the parallel region. On the other hand, all explicit tasks generated within a parallel region are guaranteed to be complete on exit from the next implicit or explicit barrier within the parallel region.

When an if clause is present on a task construct and the value of the scalar-expression evaluates to false, the thread that encounters the task must immediately execute the task. The if clause can be used to avoid the overhead of generating many finely grained tasks and placing them in the conceptual pool.
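For example (a sketch only; the routine name fib_cutoff and the threshold of 20 are arbitrary choices), the task constructs of a recursive routine such as the one in Example 5–1 below can carry an if clause so that small subproblems are executed immediately by the encountering thread rather than deferred.


#include <stdio.h>
#include <omp.h>

/* Sketch: tasks are generated only while n is above the cut-off; below it,
   the encountering thread executes the task immediately.                  */
int fib_cutoff(int n)
{
  int i, j;
  if (n < 2)
    return n;

  #pragma omp task shared(i) firstprivate(n) if (n > 20)
  i = fib_cutoff(n-1);

  #pragma omp task shared(j) firstprivate(n) if (n > 20)
  j = fib_cutoff(n-2);

  #pragma omp taskwait
  return i + j;
}

int main()
{
  #pragma omp parallel
  {
    #pragma omp single
    printf("fib_cutoff(30) = %d\n", fib_cutoff(30));  /* prints 832040 */
  }
  return 0;
}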

5.2 Data Environment

The task directive takes the following data attribute clauses that define the data environment of the task:

- default
- shared (list)
- private (list)
- firstprivate (list)

All references within a task to a variable listed in the shared clause refer to the variable with that same name known immediately prior to the task directive.

For each private and firstprivate variable, new storage is created and all references to the original variable in the lexical extent of the task construct are replaced by references to the new storage. A firstprivate variable is initialized with the value of the original variable at the moment the task is encountered.
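As a small illustration (a sketch only; the variable k and its values are arbitrary), the task below always prints the value that k had when the task was generated, even though the original k is modified afterwards.


#include <stdio.h>
#include <omp.h>

int main()
{
  #pragma omp parallel num_threads(2)
  {
    #pragma omp single
    {
      int k = 1;

      /* The task's firstprivate copy of k is initialized with the value
         k has at the moment the task is generated ...                   */
      #pragma omp task firstprivate(k)
      {
        #pragma omp critical
        printf("In task, k = %d\n", k);   /* always prints 1 */
      }

      k = 2;    /* ... so this later assignment is not seen by the task. */
    }           /* implicit barrier at the end of single: task completes */
  }
  return 0;
}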

The data-sharing attributes of variables that are not listed in data attribute clauses of a task construct, and are not predetermined according to the OpenMP rules, are implicitly determined as follows:

- If no default clause is present, a variable that is determined to be shared in all enclosing constructs, up to and including the innermost enclosing parallel construct, is implicitly shared in the task.
- If no default clause is present, a variable whose data-sharing attribute is not determined by the rule above is implicitly firstprivate in the task.

It follows that:

- Variables that are global, or that are shared in the enclosing parallel region, are implicitly shared in a task construct that has no default clause.
- Variables that are private in the enclosing context, such as private variables of the enclosing parallel region or local variables of the routine in which an orphaned task construct appears, are implicitly firstprivate in such a task construct.
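The following sketch (not one of this guide's numbered examples; the variable names are arbitrary) shows how these rules apply when a task construct carries no data attribute clauses.


#include <stdio.h>
#include <omp.h>

int a = 1;                      /* file scope: shared in the parallel region */

int main()
{
  int b = 2;
  int c;

  #pragma omp parallel shared(b) private(c) num_threads(2)
  {
    int d;                      /* declared inside the region: private       */

    #pragma omp single
    {
      c = 3;                    /* this thread's private copy of c           */
      d = 4;

      /* No data attribute clauses on the task:
         a and b are shared in all enclosing constructs up to and including
         the parallel construct, so they are implicitly shared in the task;
         c and d are not, so they are implicitly firstprivate.              */
      #pragma omp task
      {
        #pragma omp critical
        printf("a=%d b=%d c=%d d=%d\n", a, b, c, d);
      }
    }                           /* implicit barrier: the task completes here */
  }
  return 0;
}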

5.3 TASKWAIT Directive

Completion of a subset of all explicit tasks bound to a given parallel region may be specified through the use of the taskwait directive. The taskwait directive specifies a wait on the completion of child tasks generated since the beginning of the current (implicit or explicit) task. Note that the taskwait directive specifies a wait on the completion of direct child tasks only, not all descendant tasks.

5.4 Tasking Example

The following C/C++ program illustrates how the OpenMP task and taskwait directives can be used to compute Fibonacci numbers recursively.

In the example, the parallel directive denotes a parallel region that will be executed by a team of four threads. Within the parallel construct, the single directive is used to indicate that only one of the threads will execute the print statement that calls fib(n).

The call to fib(n) generates two tasks, indicated by the task directive. One of the tasks computes fib(n-1) and the other computes fib(n-2), and the return values are added together to produce the value returned by fib(n). Each of the calls to fib(n-1) and fib(n-2) will in turn generate two tasks. Tasks will be recursively generated until the argument passed to fib() is less than 2.

The taskwait directive ensures that the two tasks generated in an invocation of fib() are completed (that is, the tasks that compute i and j) before that invocation of fib() returns.

Note that although only one thread executes the single directive and hence the call to fib(n), all four threads will participate in executing the tasks generated.

The example is compiled using the Sun Studio 12 Update 1 C++ compiler.


Example 5–1 Tasking Example: Computing Fibonacci Numbers


#include <stdio.h>
#include <omp.h>
int fib(int n)
{
  int i, j;
  if (n<2)
    return n;
  else
    {
       #pragma omp task shared(i) firstprivate(n)
       i=fib(n-1);              /* first task: computes fib(n-1)  */

       #pragma omp task shared(j) firstprivate(n)
       j=fib(n-2);              /* second task: computes fib(n-2) */

       #pragma omp taskwait     /* wait for both tasks before using i and j */
       return i+j;
    }
}

int main()
{
  int n = 10;

  omp_set_dynamic(0);
  omp_set_num_threads(4);

  #pragma omp parallel shared(n)
  {
    #pragma omp single
    printf ("fib(%d) = %d\n", n, fib(n));
  }
}


% CC -xopenmp -xO3 task_example.cc
% a.out
fib(10) = 55

5.5 Programming Considerations

Tasking introduces a layer of complexity to an OpenMP program. The programmer needs to pay special attention to how a program with tasks works. Here are some programming issues to consider.

5.5.1 THREADPRIVATE and Thread-Specific Information

When a thread encounters a task scheduling point, the implementation may choose to suspend the current task and schedule the thread to work on another task. This implies that the value of a threadprivate variable, or other thread-specific information such as the thread number, may change across a task scheduling point.

If the suspended task is tied, then the thread that resumes executing the task will be the same thread that suspended it. Therefore, the thread number will remain the same after the task is resumed. However, the value of a threadprivate variable may change because the thread may have been scheduled to work on another task that modified the threadprivate variable before resuming the suspended task.

If the suspended task is untied, then the thread that resumes executing the task may be different from the thread that suspended it. Therefore, both the thread number and the value of a threadprivate variable before and after the task scheduling point may be different.
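The following sketch (the variable name tp_counter is hypothetical) illustrates the point for a tied task: the same thread resumes the task after the scheduling point, yet its threadprivate copy may have been modified in between.


#include <stdio.h>
#include <omp.h>

int tp_counter = 0;
#pragma omp threadprivate(tp_counter)

int main()
{
  #pragma omp parallel num_threads(4)
  {
    #pragma omp single
    {
      #pragma omp task                 /* tied parent task                  */
      {
        tp_counter = 1;

        /* Encountering a task construct is a task scheduling point. The
           thread may suspend the tied parent here and execute the child
           itself, updating its own threadprivate copy of tp_counter.      */
        #pragma omp task
        tp_counter = 2;

        /* The same thread resumes the tied parent, but the value of its
           threadprivate copy may have changed across the scheduling point. */
        #pragma omp critical
        printf("tp_counter = %d\n", tp_counter);   /* may print 1 or 2 */
      }
    }
  }
  return 0;
}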

5.5.2 Locks

OpenMP 3.0 specifies that locks are no longer owned by threads, but by tasks. Once a lock is acquired, the current task owns it, and the same task must release it before task completion.

The critical construct, on the other hand, remains as a thread-based mutual exclusion mechanism.

The change in lock ownership requires extra care when using locks. The following program (it appears as Example A.43.1c in the OpenMP Specification version 3.0) is conforming in OpenMP 2.5 because the thread that releases the lock lck in the parallel region is the same thread that acquired the lock in the sequential part of the program (the master thread of the parallel region and the initial thread are the same thread). However, it is not conforming in OpenMP 3.0, because the task region that releases the lock lck is different from the task region that acquired it.


Example 5–2 Example Using Locks: Non-Conforming in OpenMP 3.0


#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

int main()
{
  int x;
  omp_lock_t lck;

  omp_init_lock (&lck);
  omp_set_lock (&lck);     /* lock acquired in the task region of the sequential part */
  x = 0;

  #pragma omp parallel shared (x)
  {
    #pragma omp master
    {
      x = x + 1;
      omp_unset_lock (&lck);   /* lock released in a different task region */
    }
  }
  omp_destroy_lock (&lck);
}
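
One way to make the program conforming in OpenMP 3.0 (a sketch, not taken from the OpenMP specification examples) is to keep both the acquisition and the release of the lock within the same task region, in this case the implicit task of the thread that executes the master construct.


#include <stdio.h>
#include <omp.h>

int main()
{
  int x;
  omp_lock_t lck;

  omp_init_lock (&lck);
  x = 0;

  #pragma omp parallel shared (x)
  {
    #pragma omp master
    {
      /* The lock is acquired and released within the same task region. */
      omp_set_lock (&lck);
      x = x + 1;
      omp_unset_lock (&lck);
    }
  }
  omp_destroy_lock (&lck);
  return 0;
}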

5.5.3 References to Stack Data

A task is likely to have references to data on the stack of the routine where the task construct appears. Since the execution of a task may be deferred until the next implicit or explicit barrier, it is possible that a given task will execute after the stack of the routine where it appears has already been popped and the stack data overwritten, thereby destroying the stack data listed as shared by the task.

It is the programmer's responsibility to insert the needed synchronizations to ensure that variables are still on the stack when the task references them. Here are two examples.

In the first example, i is specified to be shared in the task construct, and the task accesses the copy of i that is allocated on the stack of work().

Because task execution may be deferred, the tasks may not execute until the implicit barrier at the end of the parallel region in main(), after the work() routine has already returned. When a task then references i, it accesses whatever undetermined value happens to be on the stack at that time.

For correct results, the programmer needs to make sure that work() does not exit before the tasks have completed. This is done by inserting a taskwait directive after the task construct. Alternatively, i can be specified to be firstprivate in the task construct, instead of shared.


Example 5–3 Stack Data: First Example — Incorrect Version


#include <stdio.h>
#include <omp.h>
void work()
 {
   int i;

   i = 10;
   #pragma omp task shared(i)
   {
     #pragma omp critical
     printf("In Task, i = %d\n",i);
   }
 }

int main(int argc, char** argv)
 {
    omp_set_num_threads(8);
    omp_set_dynamic(0);

    #pragma omp parallel 
    {
      work();
    }
 }


Example 5–4 Stack Data: First Example — Corrected Version


#include <stdio.h>
#include <omp.h>

void work()
 {
   int i;

   i = 10;
   #pragma omp task shared(i)
   {
     #pragma omp critical
     printf("In Task, i = %d\n",i);
   }

   /* Use TASKWAIT for synchronization. */
   #pragma omp taskwait
 }

int main(int argc, char** argv)
 {
    omp_set_num_threads(8);
    omp_set_dynamic(0);

    #pragma omp parallel 
    {
      work();
    }
 }
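
The alternative correction mentioned above, scoping i as firstprivate instead of shared, is sketched below; main() is unchanged.


#include <stdio.h>
#include <omp.h>

void work()
 {
   int i;

   i = 10;
   /* i is firstprivate: the task receives its own copy, initialized when
      the task is generated, so the task no longer depends on the stack
      of work() still being live when it executes.                       */
   #pragma omp task firstprivate(i)
   {
     #pragma omp critical
     printf("In Task, i = %d\n",i);
   }
 }

int main(int argc, char** argv)
 {
    omp_set_num_threads(8);
    omp_set_dynamic(0);

    #pragma omp parallel
    {
      work();
    }
 }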

In the second example, the j referenced in the task construct is the j of the sections construct. That is, the task accesses the firstprivate copy of j created for the sections construct, which (in some implementations, including the Sun Studio compilers) is a local variable on the stack of the outlined routine for the sections construct.

Because task execution may be deferred, the task may not execute until the implicit barrier at the end of the sections region, after the outlined routine for the sections construct has exited. When the task then references j, it accesses some undetermined value on the stack.

For correct results, the programmer needs to make sure that the task is executed before the sections region reaches its implicit barrier. This can be done by inserting a taskwait directive after the task construct. Alternatively, j can be specified to be firstprivate in the task construct, instead of shared.


Example 5–5 Stack Data: Second Example — Incorrect Version


#include <stdio.h>
#include <omp.h>

int main(int argc, char** argv)
 {
    omp_set_num_threads(2);
    omp_set_dynamic(0);
    int j=100;

    #pragma omp parallel shared(j)
    {
       #pragma omp sections firstprivate(j)
       {
          #pragma omp section
          {
             #pragma omp task shared(j)
             {
               #pragma omp critical
               printf("In Task, j = %d\n",j);
             }
          }
       }
    }

    printf("After parallel, j = %d\n",j);
 }


Example 5–6 Stack Data: Second Example — Corrected Version


#include <stdio.h>
#include <omp.h>

int main(int argc, char** argv)
 {
    omp_set_num_threads(2);
    omp_set_dynamic(0);
    int j=100;

    #pragma omp parallel shared(j)
    {
       #pragma omp sections firstprivate(j)
       {
          #pragma omp section
          {
             #pragma omp task shared(j)
             {
               #pragma omp critical
               printf("In Task, j = %d\n",j);
             }

             /* Use TASKWAIT for synchronization. */
             #pragma omp taskwait
          }
       }
    }

    printf("After parallel, j = %d\n",j);
 }

5.5.4 Data Scoping Attributes

The OpenMP specification version 3.0 (in Section 2.9) describes how the data-sharing attributes of variables referenced in parallel, task, and worksharing regions are determined.

The data-sharing attributes of variables referenced in a construct may be one of the following: predetermined, explicitly determined, or implicitly determined. Variables with explicitly determined data-sharing attributes are those that are referenced in a given construct and are listed in a data-sharing attribute clause on the construct. Variables with implicitly determined data-sharing attributes are those that are referenced in a given construct, do not have predetermined data-sharing attributes, and are not listed in a data-sharing attribute clause on the construct.

The rules for how the data-sharing attributes of variables are implicitly determined are not always obvious (see 5.2 Data Environment). To avoid surprises, it is therefore recommended that the programmer explicitly scope all variables that are referenced in a task construct (using the default, shared, private, and firstprivate clauses), rather than rely on the OpenMP implicit scoping rules.
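
As a sketch of this recommendation (the variables and values are arbitrary), a default(none) clause on the task construct makes the compiler report any referenced variable that has not been scoped explicitly.


#include <stdio.h>
#include <omp.h>

int main()
{
  int n = 10;
  int result = 0;

  #pragma omp parallel shared(n, result)
  {
    #pragma omp single
    {
      /* default(none): every variable referenced in the task must be
         scoped explicitly, so nothing is left to the implicit rules.  */
      #pragma omp task default(none) firstprivate(n) shared(result)
      result = n * 2;

      #pragma omp taskwait   /* make sure the task has completed */
    }
  }
  printf ("result = %d\n", result);
  return 0;
}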