CHAPTER 3

Automatic Scoping in Fortran

Declaring the scope attributes of variables in an OpenMP parallel region is called scoping. In general, if a variable is scoped as SHARED, all threads share a single copy of the variable. If a variable is scoped as PRIVATE, each thread has its own copy of the variable. OpenMP has a rich data environment. In addition to SHARED and PRIVATE, the scope of a variable can also be declared FIRSTPRIVATE, LASTPRIVATE, REDUCTION, or THREADPRIVATE.
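
As an illustration of explicit scoping, the following minimal sketch uses only standard OpenMP clauses. The program name SCOPES and the variables A, T, and TOTAL are hypothetical and chosen for this example only.

```fortran
      PROGRAM SCOPES
      INTEGER N
      PARAMETER (N = 100)
      INTEGER I
      REAL A(N), T, TOTAL
C     Fill the input array so that A(I) = I.
      DO I = 1, N
         A(I) = REAL(I)
      END DO
      TOTAL = 0.0
C     A is SHARED: all threads read the same array.
C     T is PRIVATE: each thread has its own scratch copy.
C     TOTAL is a REDUCTION variable: per-thread partial sums
C     are combined when the loop ends.
C$OMP PARALLEL DO SHARED(A) PRIVATE(T) REDUCTION(+:TOTAL)
      DO I = 1, N
         T = 2.0 * A(I)
         TOTAL = TOTAL + T
      END DO
C$OMP END PARALLEL DO
C     The sum is 2*(1 + 2 + ... + 100) = 10100.
      PRINT *, 'TOTAL = ', TOTAL
      END
```

If T were scoped as SHARED instead, threads could overwrite each other's value of T between the two statements in the loop body, which is the kind of mistake that makes manual scoping error-prone.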

OpenMP requires the user to declare the scope of each variable used in a parallel region. This is a tedious and error-prone process, and many users find it the hardest part of using OpenMP to parallelize programs.

The Sun Studio 9 release of the Fortran 95 compiler, f95, provides an automatic scoping feature. The compiler analyzes the execution and synchronization pattern of a parallel region and determines what the scope of a variable should be, based on a set of scoping rules.


3.1 The Autoscoping Data Scope Clause

The autoscoping data scope clause is a Sun extension to the Fortran OpenMP specification. A user can specify a variable to be autoscoped by using one of the following two clauses.

3.1.1 __AUTO Clause

__AUTO(list-of-variables)

The compiler will determine the scope of the listed variables within the parallel region. (Note the two underscores before AUTO.)

The __AUTO clause can appear on a PARALLEL, PARALLEL DO, PARALLEL SECTIONS, or PARALLEL WORKSHARE directive.

If a variable is listed in the __AUTO clause, then it cannot be specified in any other data scope clause.
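
A minimal sketch of the __AUTO clause follows. The subroutine name SCALE2 and its arguments are hypothetical. Because __AUTO is a Sun extension, this fragment is expected to compile only with the Sun Studio f95 compiler and the -xopenmp option; other compilers will likely reject the clause.

```fortran
      SUBROUTINE SCALE2(N, X, Y)
      INTEGER N, I
      REAL X(N), Y(N), T
C     Only T is autoscoped. X, Y, and N are scoped by hand, so
C     none of them may also appear in the __AUTO list.
C$OMP PARALLEL DO SHARED(X, Y, N) __AUTO(T)
      DO I = 1, N
         T = X(I)
         Y(I) = T * T
      END DO
C$OMP END PARALLEL DO
      END
```

T is used here in the same way as in CODE EXAMPLE 3-2, where the compiler commentary reports it as autoscoped PRIVATE, so a similar result would be expected for this loop.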

3.1.2 DEFAULT(__AUTO) Clause

This clause sets the default scoping in the parallel region to __AUTO.

The DEFAULT(__AUTO) clause can appear on a PARALLEL, PARALLEL DO, PARALLEL SECTIONS, or PARALLEL WORKSHARE directive.


3.2 Scoping Rules

Under automatic scoping, the compiler applies the following rules to determine the scope of a variable in a parallel region.

These rules do not apply to variables scoped implicitly by the OpenMP Specification, such as loop index variables of worksharing DO loops.
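
The implicit scoping of a loop index can be seen with standard OpenMP alone. In the sketch below (the program name LOOPIDX and the array A are hypothetical), DEFAULT(NONE) would normally require every variable to appear in a data scope clause, yet the index I needs none.

```fortran
      PROGRAM LOOPIDX
      INTEGER N
      PARAMETER (N = 8)
      INTEGER I, A(N)
C     I is the index of the parallel DO loop, so OpenMP already
C     treats it as PRIVATE and it is not listed in any clause.
C     N is a named constant, not a variable, so it is not
C     listed either.
C$OMP PARALLEL DO DEFAULT(NONE) SHARED(A)
      DO I = 1, N
         A(I) = I * I
      END DO
C$OMP END PARALLEL DO
      PRINT *, A
      END
```

Autoscoping leaves variables such as I alone for the same reason: their scope is already fixed by the OpenMP Specification.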

3.2.1 Scoping Rules For Scalar Variables

3.2.2 Scoping Rules for Arrays


3.3 General Comments About Autoscoping

If a user specifies the following variables to be autoscoped by __AUTO(list-of-variables) or DEFAULT(__AUTO), the compiler will scope the variable according to the implicit scoping rules in the OpenMP Specification.

When autoscoping a variable that does not have implicit scope, the compiler checks the use of the variable against the rules above, in the order shown. If a rule matches, the compiler will scope the variable according to the matching rule. If a rule does not match, the compiler tries the next rule. If the compiler is unable to find a match, autoscoping fails for that variable.

When autoscoping of a variable fails, the variable is scoped as SHARED, and the binding parallel region will be serialized as if an IF (.FALSE.) clause were specified.
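
The effect of serialization can be reproduced with standard OpenMP by writing the IF (.FALSE.) clause explicitly. This is only a sketch of the resulting behavior, not of the compiler's internal handling; the program name SERIAL and the variable NT are hypothetical.

```fortran
      PROGRAM SERIAL
      INTEGER OMP_GET_NUM_THREADS
      EXTERNAL OMP_GET_NUM_THREADS
      INTEGER NT
      NT = 0
C     IF (.FALSE.) makes the region run on a team of one thread,
C     which is how a region behaves after autoscoping fails.
C$OMP PARALLEL IF (.FALSE.) SHARED(NT)
      NT = OMP_GET_NUM_THREADS()
C$OMP END PARALLEL
C     NT is 1 here, regardless of OMP_NUM_THREADS.
      PRINT *, 'Threads in region: ', NT
      END
```

The program still produces correct results, but the region gains nothing from parallel execution, so a serialized region is worth finding and scoping by hand.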

There are two reasons why autoscoping fails. One is that the use of the variable does not match any of the rules. The other is that the source code is too complex for the compiler to do a sufficient analysis. Function calls, complicated array subscripts, memory aliasing, and user-implemented synchronization are some typical causes. (See Section 3.5, Known Limitations of the Current Implementation.)


3.4 Checking the Results of Autoscoping

Use compiler commentary to check autoscoping results and to see if any parallel regions are serialized because autoscoping failed.

The compiler produces inline commentary when the source is compiled with the -g debug option. The generated commentary can be viewed with the er_src command, as shown in CODE EXAMPLE 3-2. (The er_src command is provided as part of the Sun Studio software; for more information, see the er_src(1) man page or the Sun Studio Performance Analyzer manual.)

A good place to start is to compile with the -vpara option. A warning message will be printed out if autoscoping fails, as shown in CODE EXAMPLE 3-1.


CODE EXAMPLE 3-1 Compiling With -vpara
>cat t.f
      INTEGER X(100), Y(100), I, T

C$OMP PARALLEL DO DEFAULT(__AUTO)
      DO I=1, 100
         T = Y(I)
         CALL FOO(X)
         X(I) = T*T
      END DO
C$OMP END PARALLEL DO
      END
>f95 -xopenmp -xO3 -vpara -c t.f
"t.f", line 3: Warning: parallel region is serialized
	because the autoscoping of following variables failed 
	- x


CODE EXAMPLE 3-2 Using Compiler Commentary
>cat t.f
      INTEGER X(100), Y(100), I, T

C$OMP PARALLEL DO DEFAULT(__AUTO)
      DO I=1, 100
         T = Y(I)
         X(I) = T*T
      END DO
C$OMP END PARALLEL DO

      END
 
>f95 -xopenmp -xO3 -g -c t.f
>er_src t.o
Source file: ./t.f
Object file: ./t.o
Load Object: ./t.o
   
     1.       INTEGER X(100), Y(100), I, T
     2. 
   
Private variables in OpenMP construct below: t,i
Shared variables in OpenMP construct below: y,x
Variables autoscoped as PRIVATE in OpenMP construct below: 
	i, t
Variables autoscoped as SHARED in OpenMP construct below: 
	y, x
     3. C$OMP PARALLEL DO DEFAULT(__AUTO) 
   
Loop below parallelized by explicit user directive
     4.       DO I=1, 100
   
Loop below scheduled with steady-state cycle count = 3
Loop below unrolled 2 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration
     5.          T = Y(I)
     6.          X(I) = T*T
     7.       END DO
     8. C$OMP END PARALLEL DO
     9. 
    10.       END
 

The next example is more complicated and illustrates how the autoscoping rules work.


CODE EXAMPLE 3-3 A More Complicated Example
      REAL FUNCTION FOO (N, X, Y)
      INTEGER       N, I
      REAL          X(*), Y(*)
      REAL          W, MM, M

      W = 0.0

C$OMP PARALLEL DEFAULT(__AUTO)

C$OMP SINGLE
      M = 0.0
C$OMP END SINGLE

      MM = 0.0

C$OMP DO
      DO I = 1, N
         T = X(I)
         Y(I) = T
         IF (T .GT. MM) THEN
            W = W + T
            MM = T
         END IF
      END DO
C$OMP END DO

C$OMP CRITICAL
      IF ( MM .GT. M ) THEN
         M = MM
      END IF
C$OMP END CRITICAL

C$OMP END PARALLEL

      FOO = W - M

      RETURN
      END

The function FOO() contains a parallel region, which contains a SINGLE construct, a worksharing DO construct, and a CRITICAL construct. Ignoring the OpenMP constructs, the code in the parallel region does the following:

1. Copy the values in array X to array Y.

2. Find the maximum positive value in X, and store it in M.

3. Accumulate the values of some elements of X into variable W.

Let's see how the compiler uses the above rules to find the appropriate scopes for the variables in the parallel region.

The following variables are used in the parallel region: I, N, MM, T, W, M, X, and Y. The compiler will determine the following.


3.5 Known Limitations of the Current Implementation

Here are the known limitations to autoscoping in the Sun Studio 9 Fortran 95 compiler.


1 (Footnote) A data race exists when two threads can access the same shared variable at the same time with at least one thread modifying the variable. To remove a data race condition, put the accesses in a critical section or synchronize the threads.
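
The remedy described in this footnote can be sketched with standard OpenMP as follows. The program name RACEFIX and the counter NHITS are hypothetical.

```fortran
      PROGRAM RACEFIX
      INTEGER I, NHITS
      NHITS = 0
C     NHITS is SHARED and every iteration modifies it, so an
C     unprotected update would be a data race. The CRITICAL
C     section lets only one thread at a time do the update.
C$OMP PARALLEL DO SHARED(NHITS)
      DO I = 1, 1000
C$OMP CRITICAL
         NHITS = NHITS + 1
C$OMP END CRITICAL
      END DO
C$OMP END PARALLEL DO
C     With the critical section in place, NHITS is 1000.
      PRINT *, 'NHITS = ', NHITS
      END
```

For a simple counter like this, a REDUCTION(+:NHITS) clause would usually be cheaper than a critical section; the critical section is shown because it is the remedy the footnote names.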