Automatic Scoping of Variables
Declaring the scope attributes of variables in an OpenMP parallel region is called scoping. In general, if a variable is scoped as SHARED, all threads share a single copy of the variable. If a variable is scoped as PRIVATE, each thread has its own copy of the variable. OpenMP has a rich data environment. In addition to SHARED and PRIVATE, the scope of a variable can also be declared FIRSTPRIVATE, LASTPRIVATE, REDUCTION, or THREADPRIVATE.
OpenMP requires the user to declare the scope of each variable used in a parallel region. This is a tedious and error-prone process, and many find it the hardest part of using OpenMP to parallelize programs.
The Sun Studio C, C++, and Fortran 95 compilers provide an automatic scoping feature. The compilers analyze the execution and synchronization pattern of a parallel region and determine automatically what the scope of a variable should be, based on a set of scoping rules.
3.1 The Autoscoping Data Scope Clause
The autoscoping data scope clause is a Sun extension to the OpenMP specification. A user can specify a variable to be autoscoped by using one of the following two clauses.
3.1.1 __AUTO Clause
__AUTO(list-of-variables) on Fortran 95 directives
__auto(list-of-variables) on C and C++ pragmas
The compiler will determine the scope of the variables listed within a parallel region. (Note the two underscores before AUTO and auto).
The __AUTO or __auto clause can appear on a PARALLEL, PARALLEL DO, PARALLEL SECTIONS, or on a Fortran 95 PARALLEL WORKSHARE directive.
If a variable is listed in the clause, then it cannot be specified in any other data scope clause.
3.1.2 DEFAULT(__AUTO) Clause
DEFAULT(__AUTO) on Fortran 95 directives
default(__auto) on C and C++ pragmas
Sets the default scoping in the parallel region to __AUTO.
The DEFAULT(__AUTO) clause can appear on a PARALLEL, PARALLEL DO, PARALLEL SECTIONS, or on a Fortran 95 PARALLEL WORKSHARE directive.
3.2 Scoping Rules
Under automatic scoping, the compiler applies the following rules to determine the scope of a variable in a parallel region.
These rules do not apply to variables scoped implicitly by the OpenMP specification, such as loop index variables of worksharing DO or FOR loops.
3.2.1 Scoping Rules For Scalar Variables
- S1: If the use of the variable in the parallel region is free of data race conditions for the threads in the team executing the region, then the variable is scoped SHARED.
- S2: If, in each thread executing the parallel region, the variable is always written before being read by the same thread, then the variable is scoped PRIVATE. It is scoped LASTPRIVATE instead if it can be scoped PRIVATE, its value is read after the parallel region before being written again, and the construct is a PARALLEL DO or PARALLEL SECTIONS.
- S3: If the variable is used in a reduction operation that can be recognized by the compiler, then the variable is scoped REDUCTION with that particular operation type.
3.2.2 Scoping Rules for Arrays
- A1: If the use of the array in the parallel region is free of data race conditions for the threads in the team executing the region, then the array is scoped as SHARED.
3.3 General Comments About Autoscoping
When autoscoping a variable that does not have implicit scope, the compiler checks the use of the variable against the rules above, in the given order. If a rule matches, the compiler scopes the variable according to that rule; otherwise it tries the next rule. If no rule matches, the compiler gives up trying to determine the scope of that variable: the variable is scoped SHARED, and the binding parallel region is serialized as if an IF (.FALSE.) or if(0) clause had been specified.
Autoscoping can fail for two reasons. One is that the use of the variable matches none of the rules. The other is that the source code is too complex for the compiler to analyze; function calls, complicated array subscripts, memory aliasing, and user-implemented synchronization are typical causes. (See Section 3.5, Known Limitations of the Current Implementation.)
3.3.1 Autoscoping Rules for Fortran 95
For Fortran, specifying the following kinds of variables to be autoscoped by an __AUTO or DEFAULT(__AUTO) directive causes the compiler to scope the variable according to the implicit scoping rules in the OpenMP specification:
- A THREADPRIVATE variable.
- A Cray pointee.
- A loop iteration variable used only in sequential loops in the lexical extent of the region or worksharing DO loops that bind to the region.
- Implied DO or FORALL indices.
- Variables that are used only in work-sharing constructs that bind to the region, and are specified in a data scope attribute clause for each such construct.
3.3.2 Autoscoping Rules for C/C++
For C/C++, specifying the following kinds of variables to be autoscoped by a __auto or default(__auto) pragma causes the compiler to scope the variable according to the implicit scoping rules of the OpenMP specification:
- A variable declared within the parallel construct.
- A variable with the THREADPRIVATE attribute.
- A variable with a const-qualified type.
- The loop control variable of a for loop that immediately follows a for or parallel for pragma, where the variable is referenced only inside the loop.
Autoscoping in C and C++ applies only to basic data types: integer, floating point, and pointer. If a user specifies a structure or class variable to be autoscoped, the compiler scopes the variable as SHARED and the enclosing parallel region is executed by a single thread.
3.4 Checking the Results of Autoscoping
Use compiler commentary to check autoscoping results and to see whether any parallel regions were serialized because autoscoping failed.
A good place to start is to compile with the -vpara option (f95) or the -xvpara option (cc). A warning message is printed if autoscoping fails, as shown in CODE EXAMPLE 3-1.
The compiler also produces inline commentary when the program is compiled with the -g debug option. This commentary can be viewed with the er_src command, as shown in CODE EXAMPLE 3-2. (The er_src command is provided as part of the Sun Studio software; for more information, see the er_src(1) man page or the Sun Studio Performance Analyzer manual.)
CODE EXAMPLE 3-1 Compiling With -vpara
>cat t.f
      INTEGER X(100), Y(100), I, T
C$OMP PARALLEL DO DEFAULT(__AUTO)
      DO I=1, 100
         T = Y(I)
         CALL FOO(X)
         X(I) = T*T
      END DO
C$OMP END PARALLEL DO
      END
>f95 -xopenmp -xO3 -vpara -c t.f
"t.f", line 3: Warning: parallel region is serialized
because the autoscoping of following variables failed
- x
Use -vpara with f95 and -xvpara with cc. (This option has not yet been implemented in CC.)
CODE EXAMPLE 3-2 Using Compiler Commentary
>cat t.f
      INTEGER X(100), Y(100), I, T
C$OMP PARALLEL DO DEFAULT(__AUTO)
      DO I=1, 100
         T = Y(I)
         X(I) = T*T
      END DO
C$OMP END PARALLEL DO
      END
>f95 -xopenmp -xO3 -g -c t.f
>er_src t.o
Source file: ./t.f
Object file: ./t.o
Load Object: ./t.o
1. INTEGER X(100), Y(100), I, T
<Function: MAIN_>
Source OpenMP region below has tag R1
Variables autoscoped as PRIVATE in R1: t, i
Variables autoscoped as SHARED in R1: x, y
Private variables in R1: i, t
Shared variables in R1: y, x
2. C$OMP PARALLEL DO DEFAULT(__AUTO)
Source loop below has tag L1
L1 parallelized by explicit user directive
Discovered loop below has tag L2
L2 scheduled with steady-state cycle count = 3
L2 unrolled 4 times
L2 has 0 loads, 0 stores, 2 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration
L2 has 1 int-loads, 1 int-stores, 4 alu-ops, 1 muls, 0 int-divs and 1 shifts per iteration
3. DO I=1, 100
4. T = Y(I)
5. X(I) = T*T
6. END DO
7. C$OMP END PARALLEL DO
8. END
Next, consider a more complicated example that illustrates how the autoscoping rules work.
CODE EXAMPLE 3-3 A More Complicated Example
1. REAL FUNCTION FOO (N, X, Y)
2. INTEGER N, I
3. REAL X(*), Y(*)
4. REAL W, MM, M
5.
6. W = 0.0
7.
8. C$OMP PARALLEL DEFAULT(__AUTO)
9.
10. C$OMP SINGLE
11. M = 0.0
12. C$OMP END SINGLE
13.
14. MM = 0.0
15.
16. C$OMP DO
17. DO I = 1, N
18. T = X(I)
19. Y(I) = T
20. IF (T .GT. MM) THEN
21. W = W + T
22. MM = T
23. END IF
24. END DO
25. C$OMP END DO
26.
27. C$OMP CRITICAL
28. IF ( MM .GT. M ) THEN
29. M = MM
30. END IF
31. C$OMP END CRITICAL
32.
33. C$OMP END PARALLEL
34.
35. FOO = W - M
36.
37. RETURN
38. END
The function FOO() contains a parallel region, which in turn contains a SINGLE construct, a work-sharing DO construct, and a CRITICAL construct. Ignoring the OpenMP constructs, the code in the parallel region does the following:
1. Copy the value in array X to array Y
2. Find the maximum positive value in X, and store it in M
3. Accumulate the values of some elements of X into variable W.
Let's see how the compiler uses the above rules to find the appropriate scopes for the variables in the parallel region.
The following variables are used in the parallel region: I, N, MM, T, W, M, X, and Y. The compiler determines their scopes as follows.
- Scalar I is the loop index of the work-sharing DO loop. The OpenMP specification mandates that I be scoped PRIVATE.
- Scalar N is only read in the parallel region and therefore cannot cause a data race, so it is scoped SHARED following rule S1.
- Any thread executing the parallel region will execute statement 14, which sets the value of scalar MM to 0.0. This write would cause a data race if MM were shared, so rule S1 does not apply. The write happens before any read of MM by the same thread, so MM is scoped PRIVATE according to rule S2.
- Similarly, scalar T is scoped as PRIVATE.
- Scalar W is read and then written at statement 21, so rules S1 and S2 do not apply. The addition operation is both associative and commutative; therefore, W is scoped REDUCTION(+) according to rule S3.
- Scalar M is written in statement 11, which is inside a SINGLE construct. The implicit barrier at the end of the SINGLE construct ensures that the write in statement 11 cannot happen concurrently with either the read in statement 28 or the write in statement 29, and the latter two cannot happen at the same time because both are inside the same CRITICAL construct. No two threads can access M at the same time. Therefore, the writes and reads of M in the parallel region do not cause a data race, and, following rule S1, M is scoped SHARED.
- Array X is only read and not written in the region, so it is scoped as SHARED by rule A1.
- The writes to array Y are distributed among the threads, and no two threads write to the same element of Y. As there is no data race, Y is scoped SHARED according to rule A1.
3.5 Known Limitations of the Current Implementation
Here are the known limitations to autoscoping in the current Sun Studio Fortran 95 compiler.
- Only OpenMP directives are recognized and used in the analysis. Calls to OpenMP runtime routines are not recognized. For example, if a program uses OMP_SET_LOCK() and OMP_UNSET_LOCK() to implement a critical section, the compiler is not able to detect the existence of the critical section. Use CRITICAL and END CRITICAL directives if possible.
- Only synchronizations specified by using OpenMP synchronization directives, such as BARRIER and MASTER, are recognized and used in the analysis. User-implemented synchronizations, such as busy-waiting, are not recognized.
- Autoscoping is not supported when compiling with -xopenmp=noopt.