DO loops that have no cross-iteration data dependencies are automatically parallelized when the program is compiled with -autopar or -parallel. The general criteria for automatic parallelization are:
DO loops are parallelized, but not DO WHILE or Fortran 90 array operations.
The values of array variables for each iteration of the loop must not depend on the values of array variables for any other iteration of the loop.
Calculations within the loop must not conditionally change any pure scalar variable that is referenced after the loop terminates.
Calculations within the loop must not change a scalar variable in one iteration that another iteration then reads. A value carried between iterations in this way is called a loop-carried dependency.
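The criteria above can be illustrated with a minimal sketch. The program name depdemo, the array size, and the variable names are assumptions made for illustration; the comments describe how each loop relates to the stated criteria and are not observed compiler output.

```fortran
      program depdemo
      integer n
      parameter (n = 8)
      real a(n), b(n), s
      integer i

c     Fill b so the loops below have data to work on.
      do i = 1, n
        b(i) = real(i)
      end do

c     Independent iterations: a(i) depends only on b(i), so no
c     iteration needs a value computed by another iteration.
      do i = 1, n
        a(i) = 2.0 * b(i)
      end do

c     Loop-carried dependency: a(i) needs a(i-1), which is
c     computed by the previous iteration.
      do i = 2, n
        a(i) = a(i-1) + b(i)
      end do

c     Conditional scalar change: s is assigned only when the test
c     holds, and s is referenced after the loop terminates.
      s = 0.0
      do i = 1, n
        if (b(i) .gt. 4.0) s = b(i)
      end do

      print *, a(n), s
      end
```

By the criteria listed, only the second loop would be expected to qualify without further compiler transformations; the third and fourth each violate one of the stated conditions.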
The f77 compiler may automatically eliminate a reference that appears to create a dependency by transforming the compiled code. One of many such transformations makes use of private versions of some of the arrays. Typically, the compiler does this if it can determine that such arrays are used in the original loops only as temporary storage.
Example: Using -autopar, with dependencies eliminated by private arrays:
      parameter (n=1000)
      real a(n), b(n), c(n,n)
      do i = 1, 1000           ! <-- Parallelized
        do k = 1, n
          a(k) = b(k) + 2.0
        end do
        do j = 1, n
          c(i,j) = a(j) + 2.3
        end do
      end do
      end
In the preceding example, the outer loop is parallelized and its iterations run on independent processors. Although the references to array a(*) in the inner loops appear to result in a data dependency, the compiler generates temporary private copies of the array to make the outer loop iterations independent.
Under automatic parallelization, the compilers do not parallelize a loop if:
The DO loop is nested inside another DO loop that is parallelized.
Flow control allows jumping out of the DO loop.
A user-level subprogram is invoked inside the loop.
An I/O statement is in the loop.
Calculations within the loop change an aliased scalar variable.
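Three of the conditions above are sketched below. The program name nopar and the function twice are hypothetical names introduced for illustration; the comments restate the conditions from the list and do not show actual compiler diagnostics.

```fortran
      program nopar
      integer n
      parameter (n = 8)
      real a(n)
      integer i
      real twice
      external twice

      do i = 1, n
        a(i) = real(i)
      end do

c     Flow control allows jumping out of the loop.
      do i = 1, n
        if (a(i) .gt. 5.0) goto 10
        a(i) = a(i) + 1.0
      end do
 10   continue

c     A user-level subprogram is invoked inside the loop.
      do i = 1, n
        a(i) = twice(a(i))
      end do

c     An I/O statement is in the loop.
      do i = 1, n
        print *, a(i)
      end do
      end

c     Hypothetical user-level function used by the second loop.
      real function twice(x)
      real x
      twice = 2.0 * x
      end
```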
On multiprocessor systems, it is most effective to parallelize the outermost loop in a loop nest, rather than the innermost. Because parallel processing typically involves relatively large loop overhead, parallelizing the outermost loop minimizes the overhead and maximizes the work done for each processor. Under automatic parallelization, the compilers start their loop analysis from the outermost loop in a nest and work inward until a parallelizable loop is found. Once a loop within the nest is parallelized, loops contained within the parallel loop are passed over.
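As a rough sketch of the outermost-first analysis, consider a simple two-level nest. The program name nest and the array dimensions are assumptions chosen for illustration, and the comments describe the order of analysis stated above, not measured behavior.

```fortran
      program nest
      integer n, m
      parameter (n = 4, m = 1000)
      real c(m, n)
      integer i, j

c     Outer loop: analyzed first. Each iteration writes a
c     distinct column of c, so iterations are independent.
      do j = 1, n
c       Inner loop: also independent, but passed over if the
c       outer loop has already been parallelized.
        do i = 1, m
          c(i,j) = real(i) + real(j)
        end do
      end do

      print *, c(m,n)
      end
```

If the outer loop is the one parallelized, the parallel loop overhead is incurred once for the whole nest, and each processor executes complete inner loops. Parallelizing only the inner loop would incur that overhead on every one of the n outer iterations.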