Analyzing Program Performance with Sun WorkShop
Loop Analysis Tools
The Fortran and C compilers automatically parallelize loops for which they determine that it is safe and profitable to do so. LoopTool is a performance analysis tool that reads loop timing files created by these compilers. LoopTool uses a graphical user interface (GUI). LoopReport is the command-line version of LoopTool.
This chapter covers the following topics:
- Basic Concepts
- Setting Up Your Environment
- Creating a Loop Timing File
- Starting LoopTool
- Using LoopTool
- Starting LoopReport
- Compiler Hints
- How Optimization Affects Loops
Basic Concepts
LoopTool and LoopReport enable you to:
- Time all loops, whether serial or parallel.
- Produce a table of loop timings.
- Collect hints from the compiler during compilation.
LoopTool displays a graph of loop runtimes showing which loops were parallelized. You can go directly from the graphical display of loops to the source code for any loop you want, so you can edit your source code while in LoopTool.
LoopReport reports loop runtimes in an ASCII file instead of a graphical display.
There are four basic steps for using LoopTool and LoopReport:
- Setting up environment variables
- Compiling the program with the options required to create a timing file for loop analysis
- Running the program to generate a timing file
- Invoking LoopTool or LoopReport on the timing file
Note The examples in this section use the Fortran (f77 and f95) compilers. The options shown (such as -xparallel, -Zlp) also work for C.
Setting Up Your Environment
Before running an executable compiled with -Zlp, set the environment variable PARALLEL to the number of processors on your machine. The following command makes use of psrinfo, a system utility. Note the backquotes:
% setenv PARALLEL `/usr/sbin/psrinfo | wc -l`
You may want to put this command in a shell startup file (such as .cshrc or .profile).
Creating a Loop Timing File
To create a loop timing file, compile your program with compiler options that automatically parallelize and optimize your code (-xparallel and -xO4). Also add the -Zlp option to compile for LoopTool or LoopReport. When you run the program compiled with these options, Sun WorkShop creates a timing file for LoopTool or LoopReport to process.
The three compiler options are illustrated in this example:
% f77 -xO4 -xparallel -Zlp source_file
Note All examples apply to FORTRAN 77, Fortran 95, and C programs.
There are a number of other useful options for looking at and parallelizing loops. TABLE 7-1 lists these options.
TABLE 7-1 Compilation Options
Option           Effect
-o program       Renames the executable to program
-xexplicitpar    Parallelizes loops marked with the DOALL pragma
-xloopinfo       Prints hints to stderr
Other Compilation Options
Many combinations of compiler options work for LoopTool and LoopReport.
To compile for automatic parallelization, typical compiler options are -xparallel and -xO4. To compile for LoopTool and LoopReport, add -Zlp.
% f77 -xO4 -xparallel -Zlp source_file
You can use either -xO3 or -xO4 with -xparallel. If you do not specify -xO3 or -xO4 but you do use -xparallel, then the compiler uses -xO3. TABLE 7-2 summarizes how optimization level options are added for specific options.
Other compilation options include -xexplicitpar and -xloopinfo.
The Fortran compiler option -xexplicitpar is used with the pragma DOALL. If you insert DOALL before a loop in your source code, you are explicitly marking that loop for parallelization. The compiler parallelizes the loop when you compile with -xexplicitpar.
The following code fragment shows how to mark a loop explicitly for parallelization.
      subroutine adj(a,b,c,x,n)
      real*8 a(n), b(n), c(-n:0), x
      integer n
c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
29       continue
19    continue
      return
      end
When you use -Zlp by itself, -xdepend and -xO3 are added. The -xdepend option instructs the compiler to perform the data dependency analysis that it needs to do to identify loops. The option -xparallel includes -xdepend, but -xdepend does not imply (or trigger) -xparallel.
The -xloopinfo option prints hints about loops to stderr (the UNIX standard error file, on file descriptor 2) when you compile your program. The hints include the routine names, the line number for the start of the loop, whether the loop was parallelized, and the reason it was not parallelized, if applicable.
The following example redirects hints about loops in the source file gamteb.F to the file gamteb.loopinfo:
% f77 -xO3 -parallel -xloopinfo -Zlp gamteb.F 2> gamteb.loopinfo
The main difference between -Zlp and -xloopinfo is that in addition to providing compiler hints about loops, -Zlp also instruments your program so that timing statistics are recorded at runtime. For this reason, LoopTool and LoopReport can analyze only programs that have been compiled with -Zlp.
Running the Program
After compiling with -Zlp, run the executable. This creates the loop timing file, program.looptimes. Both LoopTool and LoopReport process two files: the instrumented executable and the loop timing file.
Starting LoopTool
You can start LoopTool by giving it the name of a program (an executable) to load:
% looptool program &
If you start LoopTool without specifying a file, the Open File dialog box opens, which allows you to select a file to examine:
% looptool &
LoopTool reads the timing file associated with your program. The timing file contains information about loops. Typically, this file has a name of the format program.looptimes and is in the same directory as your program.
By default, LoopTool looks in the executable's directory for a timing file. Therefore, if the timing file is there (the usual case), you do not need to specify where to look for it:
% looptool program &
If you name a timing file on the command line, then LoopTool and LoopReport use that file.
% looptool program program.looptimes &
If you use the command-line option -p, LoopTool and LoopReport check for a timing file in the directory indicated by -p:
% looptool -p timing_file_directory program &
If the environment variable LVPATH is set, the tools check that directory for a timing file.
% setenv LVPATH timing_file_directory
% looptool program &
Using LoopTool
The main window displays the runtimes of your program's loops in a bar chart arranged in the order that the source files were presented to the compiler.
FIGURE 7-1 shows the components of the LoopTool main window.
FIGURE 7-1 The LoopTool Main Window
Opening Files
To open executable and timing files, choose File > Open in the main window.
There are two ways to specify the files you want to open:
Once you enter the executable's path, you do not need to type in the timing file, unless it is in a different directory or has a non-default name (or both).
For more information about opening files, see the Analyzing the Loops in Your Program section of the Sun WorkShop Online Help.
Creating a Report on All Loops
To open a window with detailed information on all the loops in your program:
- Choose File > Create Report in the main window (see FIGURE 7-2).
The generated report is identical to that produced by LoopReport.
The Help button in the report window links to the Sun WorkShop online help section containing compiler hints.
FIGURE 7-2 The LoopReport Window
Printing the LoopTool Graph
To print the LoopTool graph, choose File > Print Graph in the main window and type the name of your chosen printer. To save the graph to a file, type a filename instead of a printer name.
For more information about printing see the Sun WorkShop online help.
Choosing an Editor
Choose File > Options in the main window to open the Text Editor Options dialog box, where you can choose an editor for editing source code. The available editors are vi, gnuemacs, and xemacs.
Note vi and xemacs are installed with LoopTool into your install directory (usually /opt/SUNWspro/bin) if they are not already on your system. You must provide gnuemacs yourself. In all cases, the editor you use must be in a directory in your search path in order for LoopTool to find it. For example, your PATH environment variable should include /usr/local if that is where gnuemacs is located on your system.
For more information about choosing an editor, see "Changing Text Editors" in the Sun WorkShop online help.
Editing Source Code and Getting Hints
Clicking a loop in the main window (see FIGURE 7-1) does two things:
- It brings up a window in which you can edit your source code (see FIGURE 7-3). The available editors are vi, xemacs, and gnuemacs.
  - For information on vi, see the vi(1) manual page. xemacs and gnuemacs have online help (click the Help button).
  - The Sun WorkShop vi editor has a special Version menu that allows you to make use of the Source Code Control System (SCCS) utility for sharing files. See the online help, as well as the sccs(1) manual page, for more information.
- It brings up a separate window that displays one or more hints about the loop you have selected. The Help button in this window displays the Sun WorkShop online help compiler hints section. See also Compiler Hints, which explains the hints in detail.
FIGURE 7-3 shows an xemacs editor window with a loop selected, and a hint window with an explanation of a compiler hint.
FIGURE 7-3 The Text Editor and Hints Windows
Caution If you edit your source code, line numbers shown by LoopTool may become inconsistent with the source. You must save and recompile the edited source and then run LoopTool with the new executable, producing new loop information, for the line numbers to remain consistent.
Starting LoopReport
When you start LoopReport, you usually enter the name of your program. Type loopreport and the name of the program (an executable) you want to examine.
% loopreport program
You can also start LoopReport with no file specified. If you invoke LoopReport without giving it the name of a program, it looks for a file named a.out in the current working directory.
% loopreport > a.out.loopreport
You can also direct the output into a file, or pipe it into another command:
% loopreport program > program.loopreport
% loopreport program | more
Timing File
LoopReport also reads the timing file associated with your program. The timing file is created when you run a program compiled with the -Zlp option, and it contains information about loops. Typically, this file has a name of the format program.looptimes and is found in the same directory as your program.
However, there are four ways to specify the location of a timing file. LoopReport chooses a timing file according to the following rules.
- If a timing file is named on the command line, LoopReport uses that file.
% loopreport program newtimes > program.loopreport
- If the command-line option -p is used, LoopReport looks in the directory named by -p for a timing file.
% loopreport program -p /home/timingfiles > program.loopreport
- If the environment variable LVPATH is set, LoopReport looks in that directory for a timing file.
% setenv LVPATH /home/timingfiles
% loopreport program > program.loopreport
LoopReport writes the table of loop statistics to standard output, stdout. You can also redirect the output to a file, or pipe it into another command:
% loopreport program > program.loopreport
% loopreport program | more
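The timing-file lookup rules described in this section, together with the default location in the program's directory, can be sketched as a small C function. This is only an illustration: choose_timing_file and base_name are hypothetical names, not part of LoopReport, and the precedence shown (command-line file, then -p, then LVPATH, then the program's own directory) is an assumption that this chapter does not state explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the final path component of `path` (no allocation). */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/*
 * Hypothetical helper: writes the timing-file path a tool might try
 * into `out`. Assumed precedence (not confirmed by the manual):
 *   1. timing file named on the command line
 *   2. directory given with -p
 *   3. directory named by LVPATH
 *   4. the executable's own directory (the default)
 * Pass NULL for any source that is not set. Returns `out`.
 */
char *choose_timing_file(const char *program, const char *named_file,
                         const char *p_dir, const char *lvpath,
                         char *out, size_t outlen)
{
    assert(program != NULL && out != NULL && outlen > 0);

    if (named_file != NULL)
        snprintf(out, outlen, "%s", named_file);
    else if (p_dir != NULL)
        snprintf(out, outlen, "%s/%s.looptimes", p_dir, base_name(program));
    else if (lvpath != NULL)
        snprintf(out, outlen, "%s/%s.looptimes", lvpath, base_name(program));
    else
        snprintf(out, outlen, "%s.looptimes", program);
    return out;
}
```

A caller would pass the value of getenv("LVPATH") as the lvpath argument and NULL for any command-line item that was not supplied.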
FIGURE 7-4 shows a sample loop report.
FIGURE 7-4 Sample Loop Report
Fields in the Loop Report
The following descriptions apply equally to LoopTool's "Create Report" output and LoopReport's output.
The loop report contains the following information:
LoopID
- An arbitrary number, assigned by the compiler at compile time. This is just an internal loop ID, useful for talking about loops, but not related in any way to your program.
Line #
- The line number of the first statement of the loop in the source file.
Par?
- Par is an abbreviation for "Parallelized by the compiler?" Y means that this loop was marked for parallelization; N means that the loop was not.
Hints
- Number corresponding to hint text in the "Legend for compiler hints" list.
Entries
- Number of times this loop was entered from above. This is distinct from the number of loop iterations, which is the total number of times the loop body executes. For example, these are two loops in Fortran.
      do 10 i=1,17
      do 10 j=1,50
      ...some code...
10    continue
- The first loop is entered once, and it iterates 17 times. The second loop is entered 17 times, and it iterates 17*50 = 850 times.
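The difference between entries and iterations can be checked by counting both by hand. The following C sketch mirrors the 17-by-50 loop nest above; the loop_counts structure and the count_nested function are names invented for this illustration and are not how LoopTool instruments a program.

```c
#include <assert.h>
#include <stddef.h>

/* Counters for one loop: how often it was entered, how often its body ran. */
struct loop_counts {
    long entries;
    long iterations;
};

/*
 * Runs an outer loop of n_outer iterations around an inner loop of
 * n_inner iterations and records entries and iterations for each loop.
 */
void count_nested(int n_outer, int n_inner,
                  struct loop_counts *outer, struct loop_counts *inner)
{
    assert(outer != NULL && inner != NULL);
    outer->entries = outer->iterations = 0;
    inner->entries = inner->iterations = 0;

    outer->entries++;              /* control reaches the outer loop once */
    for (int i = 0; i < n_outer; i++) {
        outer->iterations++;
        inner->entries++;          /* entered once per outer iteration */
        for (int j = 0; j < n_inner; j++)
            inner->iterations++;
    }
}
```

Calling count_nested(17, 50, ...) gives 1 entry and 17 iterations for the outer loop, and 17 entries and 850 iterations for the inner loop, matching the figures in the text.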
Nest
- Nesting level of the loop. If a loop is a top-level loop, its nesting level is 0. If the loop is the child of another loop, its nesting level is one greater than that of its parent.
- For example, in this C code, the i loop has a nesting level of 0, the j loop has a nesting level of 1, and the k loop has a nesting level of 2.
for (i=0; i<17; i++)
    for (j=0; j<42; j++)
        for (k=0; k<1000; k++)
            do something;
Wallclock
- The total amount of elapsed wall-clock time spent executing this loop for the whole program. The elapsed time for an outer loop includes the elapsed time for an inner loop. For example:
for (i=1; i<10; i++)
    for (j=1; j<10; j++)
        do something;
- The time assigned to the outer loop (the i loop) might be 10 seconds, and the time assigned to the inner loop (the j loop) might be 9.9 seconds.
Percentage
- The percentage of total program runtime measured as wall-clock time spent executing this loop. As with wall-clock time, outer loops are credited with time spent in loops they contain.
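Because both the Wallclock and Percentage fields are inclusive, the numbers for an outer loop always cover its inner loops. The following C sketch works through the 10-second and 9.9-second example; the function names are invented for this illustration, and the total program runtime of 12.5 seconds used in the usage note is an assumed figure, not a value from a real report.

```c
#include <assert.h>

/* Share of total program wall-clock time, as a percentage. */
double percent_of_runtime(double loop_seconds, double total_seconds)
{
    assert(total_seconds > 0.0);
    return 100.0 * loop_seconds / total_seconds;
}

/* Time spent in a loop's own body, excluding one directly nested loop. */
double self_seconds(double loop_seconds, double nested_seconds)
{
    assert(loop_seconds >= nested_seconds);
    return loop_seconds - nested_seconds;
}
```

With an assumed total runtime of 12.5 seconds, the outer loop is credited with about 80 percent and the inner loop with about 79.2 percent, even though the outer loop's own body accounts for only about 0.1 second.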
Variables
- The names of the variables that cause a data dependency in this loop. This field only appears when the compiler hint indicates that this loop suffers from a data dependency. A data dependency occurs when parallelization of a loop cannot be done safely because the values computed in one iteration of a loop are used in another. The following illustrates a data dependency:
      do i = 1, N
         a(i) = b(i) + c(i)
         b(i) = 2 * a(i + 1)
      end do
- If the example loop above is run in parallel, iteration 1, which recomputes b(1) based on the value of a(2), may run after iteration 2, which has recomputed a(2). The value of b(1) is then determined by the new value of a(2) rather than by the original value, as would happen if the loop were not parallelized.
Compiler Hints
LoopTool and LoopReport present hints about the optimizations applied to a particular loop, and about why a loop might not have been parallelized. The hints are heuristics gathered by the compiler during optimization. They should be understood in that context; they are not absolute facts about the code generated for a given loop. However, the hints are often very useful indications of how you can transform your code so that the compiler can perform more aggressive optimizations, including parallelizing loops.
For some useful explanations and tips, read the sections in the Sun WorkShop Fortran User's Guide that address parallelization.
0. No hint available
None of the other hints applied to this loop. This hint does not mean that none of the other hints might apply; it means that the compiler did not infer any of those hints.
1. Loop contains procedure call
The loop could not be parallelized since it contains a procedure call that is not MT safe. If such a loop were parallelized, multiple copies of the loop might instantiate the function call simultaneously, trample on each other's use of any variables local to that function, or trample on return values, and generally invalidate the function's purpose. If you are certain that the procedure calls in this loop are MT safe, you can direct the compiler to parallelize this loop no matter what by inserting the DOALL pragma before the body of the loop. For example, if foo is an MT-safe function call, then you can force the loop to be parallelized by inserting c$par DOALL:
c$par DOALL
      do 19 i = 1, n*n
         do 29 k = i, n*n
            a(i) = a(i) + x*b(k)*c(i-k)
            call foo()
29       continue
19    continue
The compiler interprets the DOALL pragmas only when you compile with -parallel or -explicitpar; if you compile with -autopar, then the compiler ignores the DOALL pragmas.
2. Compiler generated two versions of this loop
The compiler could not tell at compile time if the loop contained enough work to be profitable to parallelize. The compiler generated two versions of the loop, a serial version and a parallel version, and a runtime check that will choose at runtime which version to execute. The runtime check determines the amount of work that the loop has to do by checking the loop iteration values.
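The idea of a two-version loop can be sketched in C as follows. This is a hand-written illustration, not the code the compiler generates: the cutoff MIN_PARALLEL_ITERATIONS is a made-up value, the function names are hypothetical, and the "parallel" branch is a sequential stand-in that only shows how the iteration range would be split.

```c
#include <assert.h>

/* Hypothetical cutoff; a real compiler derives its own profitability test. */
#define MIN_PARALLEL_ITERATIONS 1000

enum loop_version { SERIAL_VERSION, PARALLEL_VERSION };

/* Runtime check: choose a version from the iteration count. */
enum loop_version pick_version(long iterations)
{
    return iterations >= MIN_PARALLEL_ITERATIONS ? PARALLEL_VERSION
                                                 : SERIAL_VERSION;
}

/* Loop body applied to the half-open index range [lo, hi). */
static void axpy_range(double *y, const double *x, double alpha,
                       long lo, long hi)
{
    for (long i = lo; i < hi; i++)
        y[i] += alpha * x[i];
}

/* Dispatches to one of the two versions and reports which one ran. */
enum loop_version axpy(double *y, const double *x, double alpha, long n)
{
    assert(n >= 0);
    enum loop_version v = pick_version(n);
    if (v == SERIAL_VERSION) {
        axpy_range(y, x, alpha, 0, n);
    } else {
        /* Stand-in for the parallel version: each half could be
           handed to its own thread. Here they run one after another. */
        axpy_range(y, x, alpha, 0, n / 2);
        axpy_range(y, x, alpha, n / 2, n);
    }
    return v;
}
```

Both branches compute the same result; only the amount of work known at runtime decides which one executes.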
3. The variable(s) "list" cause a data dependency in this loop
A variable inside the loop is affected by the value of a variable in a previous iteration of the loop. For example:
      do 99 i=1,n
      do 99 j = 1,m
      a(i, j+1) = a(i,j) + a(i,j-1)
99    continue
This is a contrived example, since for such a simple loop the optimizer would simply swap the inner and outer loops, so that the inner loop could be parallelized. But this example demonstrates the concept of data dependency, which is often referred to as "loop-carried data dependency."
The compiler can often tell you the names of the variables that cause the loop-carried data dependency. If you rearrange your program to remove (or minimize) such dependencies, then the compiler can perform more aggressive optimizations.
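One way to see why a loop-carried dependency blocks parallelization is to run the same loop body in two different iteration orders and compare the results. The C sketch below uses the loop from the Variables field description, rewritten with 0-based indices; run_loop is a name invented for this illustration, and descending order simply stands in for one of the many orders a parallel run might produce.

```c
#include <assert.h>

/*
 * Body of the loop
 *     a[i] = b[i] + c[i];
 *     b[i] = 2 * a[i + 1];
 * executed for i = 0 .. n-1, either in ascending order (the serial
 * order) or in descending order. The array a must hold n + 1
 * elements; b and c must hold n elements.
 */
void run_loop(double *a, double *b, const double *c, int n, int ascending)
{
    assert(n > 0);
    for (int step = 0; step < n; step++) {
        int i = ascending ? step : n - 1 - step;
        a[i] = b[i] + c[i];
        b[i] = 2.0 * a[i + 1];   /* reads a value another iteration writes */
    }
}
```

Starting from the same input, the two orders leave different values in b, because iteration i reads a[i + 1] either before or after iteration i + 1 has overwritten it.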
4. Loop was significantly transformed during optimization
The compiler performed some optimizations on this loop that might make it almost impossible to associate the generated code with the source code. For this reason, line numbers may be incorrect. Examples of optimizations that can radically alter a loop are loop distribution, loop fusion, and loop interchange (see Hint 10, Hint 11, and Hint 12).
5. Loop may or may not hold enough work to be profitably parallelized
The compiler was not able to determine at compile time whether this loop held enough work to warrant parallelizing. Often loops that are labeled with this hint may also be labeled "parallelized," meaning that the compiler generated two versions of the loop (see Hint 2), and that it will be decided at runtime whether the parallel version or the serial version should be used.
Since all the compiler hints, including the flag that indicates whether or not a loop is parallelized, are generated at compile time, there's no way to be certain that a loop labeled "parallelized" actually executes in parallel.
6. Loop was marked by user-inserted pragma, DOALL
This loop was parallelized because the compiler was instructed to do so by the DOALL pragma. This hint is a useful reminder to help you easily identify those loops that you explicitly wanted to parallelize.
The DOALL pragmas are interpreted by the compiler only when you compile with -parallel or -explicitpar; if you compile with -autopar, then the compiler will ignore the DOALL pragmas.
7. Loop contains multiple exits
The loop contains a GOTO or some other branch out of the loop other than the natural loop end point. For this reason, it is not safe to parallelize the loop, since the compiler has no way of predicting the loop's runtime behavior.
8. Loop contains I/O, or other function calls, that are not MT safe
This hint is similar to Hint 1. The difference is that this hint often focuses on I/O that is not multithread-safe, whereas Hint 1 can refer to any sort of multithread-unsafe function call.
9. Loop contains backward flow of control
The loop contains a GOTO or other control flow up and out of the body of the loop. That is, some statement inside the loop appears to the compiler to jump back to some previously executed portion of code. As with the case of a loop that contains multiple exits, this loop is not safe to parallelize.
If you can remove or minimize backward flows of control, the compiler will be able to perform more aggressive optimizations.
10. Loop may have been distributed
The body of the loop may have been distributed over several loops. That is, the compiler may have been able to rewrite the body of the loop so that it could be parallelized. However, since this rewriting takes place in the language of the internal representation of the optimizer, it is very difficult to associate the original source code with the rewritten version. For this reason, hints about a distributed loop may refer to line numbers that do not correspond to line numbers in your source code.
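Loop distribution is commonly understood as splitting one loop into several loops over the same index range. The following C sketch shows the idea by hand; it is an illustration under that assumption, with invented function names, and does not show the optimizer's actual output.

```c
#include <assert.h>

/* Original loop: the second statement depends on the previous iteration. */
void original_loop(double *a, double *b, const double *c, int n)
{
    assert(n > 0);
    for (int i = 1; i < n; i++) {
        a[i] = c[i] * 2.0;        /* independent across iterations */
        b[i] = b[i - 1] + a[i];   /* carries a dependency from i - 1 */
    }
}

/* After distribution: the first loop has no loop-carried dependency. */
void distributed_loop(double *a, double *b, const double *c, int n)
{
    assert(n > 0);
    for (int i = 1; i < n; i++)   /* candidate for parallelization */
        a[i] = c[i] * 2.0;
    for (int i = 1; i < n; i++)   /* stays serial */
        b[i] = b[i - 1] + a[i];
}
```

Both functions produce the same arrays, but the distributed form contains two loops where the source has one, which is why reported line numbers can be hard to match.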
11. Two or more loops may have been fused
Two consecutive loops were combined into one, so the resulting larger loop contains enough work to be profitably parallelized. Again, in this case, source line numbers for the loop may be misleading.
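As a hand-written illustration of loop fusion, the C sketch below combines two consecutive loops over the same range into one. The function names are invented for this example, and the code is not the optimizer's actual output.

```c
#include <assert.h>

/* Source form: two consecutive loops over the same index range. */
void separate_loops(double *a, double *b, const double *c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        a[i] = c[i] + 1.0;
    for (int i = 0; i < n; i++)
        b[i] = a[i] * 2.0;
}

/* Fused form: one loop whose body does the work of both. */
void fused_loop(double *a, double *b, const double *c, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = c[i] + 1.0;
        b[i] = a[i] * 2.0;   /* uses the a[i] just computed */
    }
}
```

The fused loop does more work per iteration, which is what can make it worth parallelizing, but it no longer corresponds to a single loop in the source file.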
12. Two or more loops may have been interchanged
The loop indices of an inner and an outer loop have been swapped, to move data dependencies as far away from the inner loop as possible, and to enable this nested loop to be parallelized. In the case of deeply nested loops, the interchange may have occurred with more than two loops.
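Loop interchange can be illustrated with the loop nest from Hint 3, rewritten in C. In the sketch below, the dependency is carried by the j loop, so moving j to the outside leaves an inner i loop whose iterations are independent. The function names and the fixed column count COLS are choices made for this example only.

```c
#include <assert.h>

#define COLS 6

/* Source order: i outer, j inner. The inner j loop carries the dependency. */
void source_order(double a[][COLS], int rows)
{
    assert(rows > 0);
    for (int i = 0; i < rows; i++)
        for (int j = 1; j < COLS - 1; j++)
            a[i][j + 1] = a[i][j] + a[i][j - 1];
}

/* Interchanged: j outer, i inner. Inner i iterations are independent. */
void interchanged_order(double a[][COLS], int rows)
{
    assert(rows > 0);
    for (int j = 1; j < COLS - 1; j++)
        for (int i = 0; i < rows; i++)
            a[i][j + 1] = a[i][j] + a[i][j - 1];
}
```

Each row is updated only from its own earlier columns, so both orders compute the same values; the interchange is legal here because no element depends on a different row.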
How Optimization Affects Loops
As you might infer from the descriptions of the compiler hints, associating optimized code with source code can be tricky. Clearly, you would prefer to see information from the compiler presented to you in a way that relates as directly as possible to your source code. Unfortunately, the compiler optimizer "reads" your program in terms of its internal language, and although it tries to relate that to your source code, it is not always successful.
Some particular optimizations that can cause confusion are described in the following sections.
Inlining
Inlining is an optimization applied only at optimization level -O4 and only for functions contained within one file. That is, if one file contains 17 Fortran functions, 16 of those can be inlined into the first function, and you compile at -O4, then the source code for those 16 functions may be copied into the body of the first function. Then, when further optimizations are applied, it becomes difficult to determine which loop on which source line number was subjected to which optimization.
If the compiler hints seem particularly opaque, consider compiling with -O3 -parallel -Zlp, so that you can see what the compiler says about your loops before it tries to inline any of your functions.
In particular, "phantom" loops--that is, loops that the compiler claims exist, but you know do not exist in your source code--could well be a symptom of inlining.
Loop Transformations: Unrolling, Jamming, Splitting, and Transposing
The compiler performs many loop optimizations that radically change the body of the loop. These optimizations include unrolling, jamming, splitting, and transposing.
LoopTool and LoopReport attempt to provide hints that make as much sense as possible, but given the nature of the problem of associating optimized code with source code, the hints may be misleading.
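As one example of how much a transformation can change a loop, the C sketch below shows a simple summation loop next to a version unrolled by four by hand. The function names and the unroll factor are choices made for this illustration; the compiler picks its own factors and may combine unrolling with other transformations.

```c
#include <assert.h>

/* Source form: one addition per iteration. */
double sum_rolled(const double *x, int n)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Unrolled by four, with a cleanup loop for the leftover iterations. */
double sum_unrolled(const double *x, int n)
{
    assert(n >= 0);
    double s = 0.0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s += x[i];
        s += x[i + 1];
        s += x[i + 2];
        s += x[i + 3];
    }
    for (; i < n; i++)            /* at most three remaining elements */
        s += x[i];
    return s;
}
```

The unrolled version contains two loops and four copies of the original statement, so a hint attached to it may not map cleanly onto a single source line.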
Parallel Loops Nested Inside Serial Loops
If a parallel loop is nested inside a serial loop, the runtime information reported by LoopTool and LoopReport may be misleading because each loop is stipulated to use the wall-clock time of each of its loop iterations. If an inner loop is parallelized, it is assigned the wall-clock time of each iteration, although some of those iterations are running in parallel.
However, the outer loop is assigned only the runtime of its child, the parallel loop, which will be the runtime of the longest parallel instantiation of the inner loop. This double timing leads to the anomaly of the outer loop apparently consuming less time than the inner loop.
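The anomaly can be reproduced with a little arithmetic. The C sketch below follows one reading of the accounting described above: the parallel inner loop is charged the sum of the wall-clock times of all its threads, while the serial outer loop is charged only the longest of them. The function names and the per-thread times in the usage note are invented for this illustration.

```c
#include <assert.h>

/* Time charged to the parallel inner loop: every thread's time is added. */
double inner_loop_time(const double *thread_seconds, int nthreads)
{
    assert(nthreads > 0);
    double total = 0.0;
    for (int t = 0; t < nthreads; t++)
        total += thread_seconds[t];
    return total;
}

/* Time charged to the serial outer loop: only the slowest thread counts. */
double outer_loop_time(const double *thread_seconds, int nthreads)
{
    assert(nthreads > 0);
    double longest = thread_seconds[0];
    for (int t = 1; t < nthreads; t++)
        if (thread_seconds[t] > longest)
            longest = thread_seconds[t];
    return longest;
}
```

For four threads that take 2.0, 2.5, 2.0, and 1.5 seconds, the inner loop is charged 8.0 seconds and the outer loop only 2.5 seconds, so the outer loop appears to consume less time than the loop it contains.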
Sun Microsystems, Inc. All rights reserved.