Sun Studio 12: Fortran Programming Guide

10.3 Explicit Parallelization

This section describes the source code directives recognized by f95 to explicitly indicate which loops to parallelize and what strategy to use.

The Fortran 95 compiler now fully supports the OpenMP Fortran API as the primary parallelization model. See the OpenMP API User’s Guide for additional information.

Legacy Sun-style and Cray-style parallelization directives are no longer supported by Sun Studio compilers on SPARC platforms, and are not accepted by the compilers on x86 platforms.

Explicit parallelization of a program requires prior analysis and deep understanding of the application code as well as the concepts of shared-memory parallelization.

DO loops are marked for parallelization by directives placed immediately before them. Compile with -openmp to enable recognition of OpenMP Fortran 95 directives and generation of parallelized DO loop code. Parallelization directives are comment lines that tell the compiler to parallelize (or not to parallelize) the DO loop that follows the directive. Directives are also called pragmas.
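
For example, a minimal sketch of directive placement and compilation (the subroutine name, array names, and file name here are illustrative, not taken from a real program):

      subroutine add (a, b, c, n)
      integer n, i
      real a(n), b(n), c(n)
!$OMP PARALLEL DO
      do i = 1, n
        a(i) = b(i) + c(i)      ! iterations are independent
      end do
      end

demo% f95 -c -openmp add.f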

Take care when choosing which loops to mark for parallelization. The compiler generates threaded, parallel code for all loops marked with parallelization directives, even if there are data dependencies that will cause the loop to compute incorrect results when run in parallel.
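
For example, in a sketch like the following (the routine and array names are illustrative), each iteration reads a value written by the previous iteration. If the loop is marked anyway, the compiler still parallelizes it, and the results depend on thread timing:

      subroutine accum (a, b, n)
      integer n, i
      real a(n), b(n)
!$OMP PARALLEL DO
      do i = 2, n
        a(i) = a(i-1) + b(i)    ! reads the previous iteration's result
      end do
      end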

If you do your own multithreaded coding using the libthread primitives, do not use any of the compilers’ parallelization options—the compilers cannot parallelize code that has already been parallelized with user calls to the threads library.

10.3.1 Parallelizable Loops

A loop is appropriate for explicit parallelization if it meets the criteria discussed in the subsections that follow.

10.3.1.1 Scoping Rules: Private and Shared

A private variable or array is private to a single iteration of a loop. The value assigned to a private variable or array in one iteration is not propagated to any other iteration of the loop.

A shared variable or array is shared with all other iterations. The value assigned to a shared variable or array in an iteration is seen by other iterations of the loop.

If an explicitly parallelized loop contains shared references, then you must ensure that sharing does not cause correctness problems. The compiler does not synchronize on updates or accesses to shared variables.

If you specify a variable as private in one loop and its only initialization occurs in some other loop, the value of that variable may be undefined in the parallelized loop.
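
The following sketch (the routine and variable names are illustrative) shows both kinds of scoping. The scalar t is private scratch storage that each iteration must assign before using, while a, b, and n are shared by all iterations:

      subroutine scale (a, b, n)
      integer n, i
      real a(n), b(n), t
!$OMP PARALLEL DO PRIVATE(i,t) SHARED(a,b,n)
      do i = 1, n
        t = 2.0 * b(i)          ! t must be assigned in the iteration;
        a(i) = t + b(i)         ! a value set before the loop is not seen
      end do
      end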

10.3.1.2 Subprogram Call in a Loop

A subprogram call in a loop (or in any subprograms called from within the called routine) may introduce data dependencies that could go unnoticed without a deep analysis of the data and control flow through the chain of calls. While it is best to parallelize outermost loops that do a significant amount of the work, these tend to be the very loops that involve subprogram calls.

Because such an interprocedural analysis is difficult and could greatly increase compilation time, automatic parallelization modes do not attempt it. With explicit parallelization, the compiler generates parallelized code for a loop marked with a PARALLEL DO or DOALL directive even if it contains calls to subprograms. It is still the programmer’s responsibility to ensure that no data dependencies exist within the loop or within anything the loop encloses, including called subprograms.

Multiple invocations of a routine by different threads can cause problems resulting from references to local static variables that interfere with each other. Making all the local variables in a routine automatic rather than static prevents this. Each invocation of a subprogram then has its own unique store of local variables maintained on the stack, and no two invocations will interfere with each other.

Local subprogram variables can be made automatic variables that reside on the stack either by listing them in an AUTOMATIC statement or by compiling the subprogram with the -stackvar option. However, local variables initialized in DATA statements must be rewritten so that they are initialized with assignment statements instead.
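
For example, a sketch of such a rewrite (the routine and variable names are illustrative): the local accumulator is declared AUTOMATIC and its DATA initialization is replaced by an assignment:

      subroutine tally (b, y, n)
      integer n, i
      real b(n), y(n), t
      automatic t               ! or compile with -stackvar
*     data t /0.0/              ! a DATA statement would make t static
      t = 0.0                   ! initialize with an assignment instead
      do i = 1, n
        t = t + y(i)
      end do
      b(1) = t
      end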


Note –

Allocating local variables to the stack can cause stack overflow. See 10.1.6 Stacks, Stack Sizes, and Parallelization for information about increasing the size of the stack.


10.3.1.3 Inhibitors to Explicit Parallelization

In general, the compiler parallelizes a loop if you explicitly direct it to. There are exceptions—some loops the compiler will not parallelize.

Certain detectable conditions inhibit explicit parallelization of a DO loop even when it is marked with a directive.

By compiling with -vpara and -loopinfo, you will get diagnostic messages if the compiler detects a problem while explicitly parallelizing a loop.

The following table lists typical parallelization problems detected by the compiler:

Table 10–3 Explicit Parallelization Problems

Problem                                                       Parallelized   Warning Message

Loop is nested inside another loop that is parallelized.     No             No

Loop is in a subroutine called within the body of a          No             No
parallelized loop.

Jumping out of loop is allowed by a flow control statement.  No             Yes

Index variable of loop is subject to side effects.           Yes            No

Some variable in the loop has a loop-carried dependency.     Yes            Yes

I/O statement in the loop (usually unwise, because the order Yes            No
of the output is not predictable).

Example: Nested loops:


      ...
!$OMP PARALLEL DO
      do 900 i = 1, 1000      !  Parallelized (outer loop)
        do 200 j = 1, 1000    !  Not parallelized, no warning
            ...
  200   continue
  900 continue
      ...

Example: A parallelized loop in a subroutine:


      program main
      ...
!$OMP PARALLEL DO
      do 100 i = 1, 200          ! <- Parallelized
        ...
        call calc (a, x)
        ...
  100 continue
      ...
      end

      subroutine calc ( b, y )
      ...
!$OMP PARALLEL DO
      do 1 m = 1, 1000           ! <- Not parallelized
        ...
    1 continue
      return
      end

In the example, the loop within the subroutine is not parallelized because the subroutine itself is run in parallel.

Example: Jumping out of a loop:


!$omp parallel do
      do i = 1, 1000     ! <- Not parallelized, error issued
        ...
        if (a(i) .gt. min_threshold ) go to 20
        ...
      end do
   20 continue
      ...

The compiler issues an error diagnostic if there is a jump outside a loop marked for parallelization.

Example: A variable in a loop has a loop-carried dependency:


demo% cat vpfn.f
      real function fn (n,x,y,z)
      real y(*),x(*),z(*)
      s = 0.0
!$omp parallel do private(i,s) shared(x,y,z)
      do  i = 1, n
          x(i) = s
          s = y(i)*z(i)
      enddo
      fn=x(10)
      return
      end
demo% f95 -c -vpara -loopinfo -openmp -O4 vpfn.f
"vpfn.f", line 5: Warning: the loop may have parallelization inhibiting reference
"vpfn.f", line 5: PARALLELIZED, user pragma used

Here the loop is parallelized, but the possible loop-carried dependency is diagnosed in a warning. However, be aware that not all loop dependencies can be diagnosed by the compiler.

10.3.1.4 I/O With Explicit Parallelization

You can do I/O in a loop that executes in parallel, provided that correct operation does not depend on the order in which output from the different threads appears, and that the I/O is not recursive.

Example: I/O statement in loop


!$OMP PARALLEL DO PRIVATE(k)
      do i = 1, 10     !  Parallelized
        k = i
        call show ( k )
      end do
      end
      subroutine show( j )
      write(6,1) j
    1 format('Line number ', i3, '.')
      end
demo% f95 -openmp t13.f
demo% setenv PARALLEL 4
demo% a.out

Line number 9.
Line number 4.
Line number 5.
Line number 6.
Line number 1.
Line number 2.
Line number 3.
Line number 7.
Line number 8.

However, recursive I/O, in which an I/O statement calls a function that itself performs I/O, will cause a runtime error.
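
For example, a sketch of the pattern to avoid (the function and program names are illustrative): the WRITE statement in the main program references a function that itself contains a WRITE:

      real function flog (x)
      real x
      write(6,*) 'evaluating flog'      ! I/O inside the function
      flog = log(x)
      end

      program show
      real x, flog
      x = 2.0
*     The next WRITE invokes flog, which also does I/O: recursive I/O
      write(6,*) 'result = ', flog(x)
      end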

10.3.2 OpenMP Parallelization Directives

OpenMP is a parallel programming model for multi-processor platforms that is becoming standard programming practice for Fortran 95, C, and C++ applications. It is the preferred parallel programming model for Sun Studio compilers.

To enable OpenMP directives, compile with the -openmp option. Fortran 95 OpenMP directives are identified by the comment-like sentinel !$OMP followed by the directive name and subordinate clauses.

The !$OMP PARALLEL directive identifies the parallel regions in a program. The !$OMP DO directive identifies DO loops within a parallel region that are to be parallelized. These directives can be combined into a single !$OMP PARALLEL DO directive that must be placed immediately before the DO loop.
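
For example, the following two sketches (the loop and array names are illustrative) are equivalent. The first uses separate PARALLEL and DO directives; the second uses the combined form placed immediately before the loop:

!$OMP PARALLEL
!$OMP DO
      do i = 1, n
        a(i) = b(i) + c(i)
      end do
!$OMP END DO
!$OMP END PARALLEL

!$OMP PARALLEL DO
      do i = 1, n
        a(i) = b(i) + c(i)
      end do
!$OMP END PARALLEL DO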

The OpenMP specification includes a number of directives for sharing and synchronizing work in a parallel region of a program, and subordinate clauses for data scoping and control.

One major difference between OpenMP and legacy Sun-style directives is that OpenMP requires explicit data scoping as either private or shared, but an automatic scoping feature is provided.
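
For example, adding a standard OpenMP DEFAULT(NONE) clause (a sketch with illustrative names) forces every variable referenced in the parallel region to be scoped explicitly; the compiler reports an error for any variable left unscoped:

!$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(i,t) SHARED(a,b,n)
      do i = 1, n
        t = 2.0 * b(i)
        a(i) = t + b(i)
      end do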

For more information, including guidelines for converting legacy programs using Sun and Cray parallelization directives, see the OpenMP API User’s Guide.