Analyzing Program Performance With Sun WorkShop

Compiler Hints

LoopTool and LoopReport present hints about the optimizations applied to a particular loop, and about why a loop might not have been parallelized. The hints are heuristics gathered by the compiler during optimization. They should be understood in that context; they are not absolute facts about the code generated for a given loop. However, the hints are often very useful indications of how you can transform your code so that the compiler can perform more aggressive optimizations, including parallelizing loops.

For some useful explanations and tips, read the sections in the Sun WorkShop Fortran User's Guide that address parallelization.

Table 3-2 lists the hints about optimizations applied to loops.

Table 3-2 Loop Optimization Hints

Hint #    Hint Definition

0         No hint available
1         Loop contains procedure call
2         Compiler generated two versions of this loop
3         The variable(s) "list" cause a data dependency in this loop
4         Loop was significantly transformed during optimization
5         Loop may or may not hold enough work to be profitably parallelized
6         Loop was marked by user-inserted pragma, DOALL
7         Loop contains multiple exits
8         Loop contains I/O, or other function calls, that are not MT safe
9         Loop contains backward flow of control
10        Loop may have been distributed
11        Two or more loops may have been fused
12        Two or more loops may have been interchanged

0. No Hint Available

None of the other hints applied to this loop. This does not mean that no other hint could apply; it means only that the compiler did not infer any of the other hints while optimizing the loop.

1. Loop contains procedure call

The loop could not be parallelized because it contains a procedure call that is not MT (multithread) safe. If such a loop were parallelized, multiple copies of the loop might invoke the procedure simultaneously, trample on each other's use of variables local to that procedure, overwrite each other's return values, and generally invalidate the procedure's purpose. If you are certain that the procedure calls in this loop are MT safe, you can direct the compiler to parallelize the loop unconditionally by inserting the DOALL pragma before the loop. For example, if foo is an MT-safe function, you can force the loop to be parallelized by inserting c$par DOALL:


c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
            call foo()
 29      continue
 19   continue

The compiler interprets the DOALL pragmas only when you compile with -parallel or -explicitpar; if you compile with -autopar, the compiler ignores the DOALL pragmas.

2. Compiler generated two versions of this loop

The compiler could not tell at compile time whether the loop contained enough work to be profitably parallelized. It therefore generated two versions of the loop, one serial and one parallel, together with a runtime check that chooses which version to execute. The check estimates the amount of work in the loop from the loop iteration values.
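Conceptually, the generated code behaves as if the compiler had written something like the following sketch. The loop body, the bound n, and the cutoff value 800 are all illustrative; the actual threshold and dispatch mechanism are internal to the compiler:

      if (n .gt. 800) then
c        enough iterations: run the parallel version of the loop
         do 10 i = 1, n
            a(i) = b(i) + c(i)
 10      continue
      else
c        too little work: run the serial version of the loop
         do 20 i = 1, n
            a(i) = b(i) + c(i)
 20      continue
      endif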

3. The variable(s) "list" cause a data dependency in this loop

A variable inside the loop is affected by the value of a variable in a previous iteration of the loop. For example:


      do 99 i = 1, n
         do 99 j = 1, m
            a(i,j+1) = a(i,j) + a(i,j-1)
 99   continue

This example is contrived: for such a simple loop, the optimizer would simply interchange the inner and outer loops so that the inner loop could be parallelized. But it demonstrates the concept of a data dependency, often called a loop-carried data dependency.

The compiler can often tell you the names of the variables that cause the loop-carried data dependency. If you rearrange your program to remove (or minimize) such dependencies, then the compiler can perform more aggressive optimizations.

4. Loop was significantly transformed during optimization

The compiler performed some optimizations on this loop that might make it almost impossible to associate the generated code with the source code. For this reason, line numbers may be incorrect. Examples of optimizations that can radically alter a loop are loop distribution, loop fusion, and loop interchange (see Hint 10, Hint 11, and Hint 12).

5. Loop may or may not hold enough work to be profitably parallelized

The compiler was not able to determine at compile time whether this loop held enough work to warrant parallelizing. Loops labeled with this hint are often also labeled "parallelized," meaning that the compiler generated two versions of the loop (see Hint 2) and defers to runtime the choice between the parallel and serial versions.

Since all the compiler hints, including the flag that indicates whether or not a loop is parallelized, are generated at compile time, there's no way to be certain that a loop labeled "parallelized" actually executes in parallel.

6. Loop was marked by user-inserted pragma, DOALL

This loop was parallelized because the compiler was instructed to do so by the DOALL pragma. This hint is a useful reminder to help you easily identify those loops that you explicitly wanted to parallelize.

The DOALL pragmas are interpreted by the compiler only when you compile with -parallel or -explicitpar; if you compile with -autopar, then the compiler will ignore the DOALL pragmas.

7. Loop contains multiple exits

The loop contains a GOTO or some other branch out of the loop at a point other than its natural end. For this reason, it is not safe to parallelize the loop, since the compiler has no way of predicting which iteration will take the early exit.
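A hypothetical example of such a loop (the array names and label numbers are illustrative): the branch to label 50 leaves the loop before its natural end, so the compiler cannot safely divide the iterations among threads.

      do 30 i = 1, n
         if (a(i) .lt. 0.0) goto 50
         b(i) = sqrt(a(i))
 30   continue
 50   continue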

8. Loop contains I/O, or other function calls, that are not MT safe

This hint is similar to Hint 1. The difference is that this hint often focuses on I/O that is not multithread-safe, whereas Hint 1 can refer to any sort of multithread-unsafe function call.
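For instance, a loop such as the following sketch (names illustrative) triggers this hint: Fortran formatted I/O is not MT safe, and if the iterations ran in parallel the output lines could interleave unpredictably.

      do 40 i = 1, n
         a(i) = a(i) * 2.0
         write (6,*) 'a(', i, ') = ', a(i)
 40   continue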

9. Loop contains backward flow of control

The loop contains a GOTO or other control transfer that branches backward out of the body of the loop. That is, some statement inside the loop appears to the compiler to jump back to some previously executed portion of code. As with a loop that contains multiple exits, this loop is not safe to parallelize.
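A hypothetical sketch of such a loop (the variables x, tol, and the label numbers are illustrative): the GOTO inside the loop branches back to label 60, a statement executed before the loop was entered.

 60   x = x * 0.5
      do 70 i = 1, n
         a(i) = a(i) + x
         if (x .gt. tol) goto 60
 70   continue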

If you can reduce or minimize backward flows of control, the compiler will be able to perform more aggressive optimizations.

10. Loop may have been distributed

The statements in the body of the loop may have been distributed across two or more separate loops. That is, the compiler may have been able to rewrite the body of the loop so that the parts that can be parallelized are isolated from the parts that cannot. However, since this rewriting takes place in the language of the internal representation of the optimizer, it is very difficult to associate the original source code with the rewritten version. For this reason, hints about a distributed loop may refer to line numbers that don't correspond to line numbers in your source code.
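A hypothetical sketch of the idea (array names illustrative): a loop that mixes an independent statement with a recurrence may be split so that the first part can be parallelized while the second stays serial.

c     original loop: the d(i-1) recurrence blocks parallelization
      do 80 i = 2, n
         a(i) = b(i) + c(i)
         d(i) = d(i-1) + a(i)
 80   continue

c     after distribution (conceptually): loop 81 is parallelizable,
c     loop 82 remains serial
      do 81 i = 2, n
         a(i) = b(i) + c(i)
 81   continue
      do 82 i = 2, n
         d(i) = d(i-1) + a(i)
 82   continue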

11. Two or more loops may have been fused

Two consecutive loops were combined into one, so that the resulting larger loop contains enough work to be profitably parallelized. Again, in this case, source line numbers for the fused loop may be misleading.
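The transformation looks conceptually like the following sketch (array names illustrative): two small adjacent loops over the same range become one loop with a larger body.

c     before fusion: two small loops over the same range
      do 91 i = 1, n
         a(i) = b(i) + 1.0
 91   continue
      do 92 i = 1, n
         c(i) = a(i) * 2.0
 92   continue

c     after fusion (conceptually): one loop with more work per iteration
      do 93 i = 1, n
         a(i) = b(i) + 1.0
         c(i) = a(i) * 2.0
 93   continue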

12. Two or more loops may have been interchanged

The loop indices of an inner and an outer loop have been swapped, to move data dependencies as far away from the inner loop as possible, and to enable this nested loop to be parallelized. In the case of deeply nested loops, the interchange may have occurred with more than two loops.
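A hypothetical sketch of an interchange (array names illustrative): the dependency is carried by the j index, so swapping the loops leaves the dependency-free i loop innermost, where it can be parallelized.

c     before interchange: the inner j loop carries the dependency
      do 95 i = 1, n
         do 94 j = 2, m
            a(i,j) = a(i,j-1) + b(i,j)
 94      continue
 95   continue

c     after interchange (conceptually): the i loop carries no
c     dependency and can be parallelized
      do 97 j = 2, m
         do 96 i = 1, n
            a(i,j) = a(i,j-1) + b(i,j)
 96      continue
 97   continue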