Fortran Programming Guide

Loop Parallelization

The compiler's dependency analysis transforms a DO loop into a parallelizable task. The compiler may restructure the loop to split out unparallelizable sections that will run serially. It then distributes the work evenly over the available processors. Each processor executes a different chunk of iterations.

For example, with four CPUs and a parallelized loop with 1000 iterations:

Processor 1 executing iterations 1 through 250

Processor 2 executing iterations 251 through 500

Processor 3 executing iterations 501 through 750

Processor 4 executing iterations 751 through 1000
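The iteration ranges in this example can be reproduced with a little integer arithmetic. The following is only a sketch of how equal blocks might be computed, not the compiler's actual scheduling code. The program name and the variables n_iter, n_proc, chunk, first, and last are hypothetical names chosen for illustration.

```fortran
program chunk_bounds
  implicit none
  ! Values taken from the example: 1000 iterations on four CPUs.
  integer, parameter :: n_iter = 1000
  integer, parameter :: n_proc = 4
  integer :: p, chunk, first, last

  ! Ceiling division (an assumption of this sketch), so the blocks
  ! cover every iteration even when n_iter is not a multiple of n_proc.
  chunk = (n_iter + n_proc - 1) / n_proc

  do p = 1, n_proc
     first = (p - 1) * chunk + 1
     last  = min(p * chunk, n_iter)   ! the last block may be shorter
     print '(a,i0,a,i0,a,i0)', 'Processor ', p, ': iterations ', &
           first, ' through ', last
  end do
end program chunk_bounds
```

With the values shown, the program prints the ranges 1 through 250, 251 through 500, 501 through 750, and 751 through 1000.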

Only loops that do not depend on the order in which the computations are performed can be successfully parallelized. The compiler's dependency analysis rejects loops with inherent data dependencies. If it cannot fully determine the data flow in a loop, the compiler acts conservatively and does not parallelize. Also, it may choose not to parallelize a loop if it determines the performance gain does not justify the overhead.
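The difference between an order-independent loop and one with an inherent data dependency can be illustrated with a small serial program. This is a sketch for illustration only. It shows the property the dependency analysis looks for, not how the analysis itself works, and the array names b, fwd, and rev are hypothetical.

```fortran
program dependence_demo
  implicit none
  integer, parameter :: n = 8
  real :: b(n), fwd(n), rev(n)
  integer :: i

  b = 1.0

  ! Independent loop: iteration i touches only element i, so running
  ! the iterations forward or backward gives the same result.
  do i = 1, n
     fwd(i) = 2.0 * b(i)
  end do
  do i = n, 1, -1
     rev(i) = 2.0 * b(i)
  end do
  print *, 'independent loop, same result: ', all(fwd == rev)

  ! Loop-carried dependency: iteration i reads the element written by
  ! iteration i-1, so the result depends on the execution order.
  do i = 2, n
     fwd(i) = fwd(i-1) + b(i)
  end do
  do i = n, 2, -1
     rev(i) = rev(i-1) + b(i)
  end do
  print *, 'dependent loop, same result:   ', all(fwd == rev)
end program dependence_demo
```

The first comparison reports true and the second reports false. A loop of the second kind is one that cannot be safely split across processors.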

Note that the compiler always parallelizes loops using a chunk distribution, which simply divides the work in the loop into equal blocks of iterations. Other distribution schemes may be specified using the explicit parallelization directives described later in this chapter.
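As a rough illustration of choosing a distribution scheme explicitly, the sketch below uses OpenMP directives. This is an assumption: the directives described later in this chapter may use a different syntax. With OpenMP, schedule(static) requests roughly equal contiguous blocks, while schedule(dynamic, 10) hands out ten iterations at a time as threads become free. The program and array names are hypothetical.

```fortran
program schedule_demo
  implicit none
  integer, parameter :: n = 1000
  real :: a(n)
  integer :: i

  ! Block distribution: each thread gets one contiguous block
  ! of approximately equal size.
  !$omp parallel do schedule(static)
  do i = 1, n
     a(i) = real(i)
  end do
  !$omp end parallel do

  ! Dynamic distribution: threads take 10 iterations at a time,
  ! which can help when iterations vary in cost.
  !$omp parallel do schedule(dynamic, 10)
  do i = 1, n
     a(i) = 2.0 * a(i)
  end do
  !$omp end parallel do

  print *, a(1), a(n)
end program schedule_demo
```

Without an OpenMP-enabling compiler option, the directives are treated as comments and the program runs serially with the same result.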