Fortran Programming Guide

Debugging Parallelized Programs

Compiling with the -g option cancels any of the parallelization options -autopar, -explicitpar, and -parallel, as well as -reduction and -depend. Some alternative ways to debug parallelized code are suggested in the following section.

Debugging Without `dbx`

Debugging parallelized programs requires some cleverness. The following schemes suggest ways to approach the problem:

Turn off parallelization.

You can do one of the following:
- Turn off the parallelization options: verify that the program works correctly by compiling with -O3 or -O4, but without any parallelization.
- Set the number of CPUs to one: run the program with the environment variable PARALLEL=1.
  
  If the problem disappears, then you know it was due to parallelization.
  
  Check also for out of bounds array references by compiling with -C.
  
  Problems using -autopar may indicate that the compiler is parallelizing something it should not.

Turn off -reduction.

If you are using the -reduction option, summation reduction may be occurring and yielding slightly different answers. Try running without this option.

Reduce the number of compile options.

Compile with just -parallel -O3 and check the results.

Use fsplit or f90split.

If your program has many subroutines, use fsplit(1) to break them into separate files. (Use f90split(1) on Fortran 90 source code.) Then compile some files with -parallel and some without, and use f77 or f90 to link the .o files. You must specify -parallel on this link step as well. (See the Fortran User's Guide section on consistent compiling and linking.)

Execute the binary and verify results.

Repeat this process until the problem is narrowed down to one subroutine.

You can proceed using a dummy subroutine or explicit parallelization to track down the loop that causes the problem.

Use -loopinfo.

Check which loops are being parallelized and which loops are not.

Use a dummy subroutine.

Create a dummy subroutine or function that does nothing. Put calls to this subroutine in a few of the loops that are being parallelized. Recompile and execute. Use -loopinfo to see which loops are being parallelized.

Continue this process until you start getting the correct results.

Then remove the calls from the other loops, compile, and execute to verify that you are getting the correct results.
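
The dummy-subroutine technique can be sketched as follows. This is a minimal illustration, not code from the guide: the names DMYTST and NOOP and the arrays A and B are hypothetical. The sketch assumes, as the steps above imply, that the compiler will not automatically parallelize a loop that contains a subprogram call; confirm this for your own loops with -loopinfo.

```fortran
      PROGRAM DMYTST
      INTEGER N, I
      PARAMETER (N=8)
      REAL A(N), B(N)
C     Loop 1: left unchanged, so the compiler may still parallelize it.
      DO I=1,N
        A(I) = REAL(I)
      ENDDO
C     Loop 2: the added call to NOOP is the only change to this loop.
C     (Assumption: the call keeps the loop from being auto-parallelized.)
      DO I=1,N
        B(I) = 2.0*A(I)
        CALL NOOP
      ENDDO
      PRINT *, B(N)
      END

      SUBROUTINE NOOP
C     Hypothetical dummy routine: does nothing and returns.
      RETURN
      END
```

If the wrong results disappear once the call is in place, the loop holding the call is a likely suspect. Move the call from loop to loop to narrow the search.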

Use explicit parallelization.

Add the C$PAR DOALL directive to a couple of the loops that are being parallelized. Compile with -explicitpar, then execute and verify the results. Use -loopinfo to see which loops are being parallelized. This method permits the addition of I/O statements to the parallelized loop.

Repeat this process until you find the loop that causes the wrong results.

Note -
If you need -explicitpar only (without -autopar), do not compile with both -explicitpar and -depend. That combination is the same as compiling with -parallel, which includes -autopar.

Run loops backward serially.

Replace DO I=1,N with DO I=N,1,-1. Different results point to data dependencies.
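
As a small illustration of the backward-loop check, the following sketch runs the same loop body forward and then backward, serially, on a loop that carries a dependence: each iteration reads the element written by the previous one. The program name REVTST and the arrays A and B are hypothetical, chosen only for this example.

```fortran
      PROGRAM REVTST
      INTEGER N, I
      PARAMETER (N=5)
      REAL A(N), B(N)
C     Give both arrays the same starting values 1..N.
      DO I=1,N
        A(I) = REAL(I)
        B(I) = REAL(I)
      ENDDO
C     Forward order: A(I-1) has already been updated when it is read.
      DO I=2,N
        A(I) = A(I-1) + A(I)
      ENDDO
C     Backward order: B(I-1) still holds its original value when read.
      DO I=N,2,-1
        B(I) = B(I-1) + B(I)
      ENDDO
      PRINT *, 'forward : ', A
      PRINT *, 'backward: ', B
      END
```

Run forward, A ends as 1, 3, 6, 10, 15. Run backward, B ends as 1, 3, 5, 7, 9. The mismatch shows that the iterations are not independent, so the loop is not safe to parallelize as written.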

Avoid using the loop index directly. It is safer to copy the index to a separate variable in the loop body and use that copy, especially if the index is used as an argument in a call.

Replace:
    DO I=1,N                               
      ...
      CALL SNUBBER(I)
      ...
    ENDDO

With:
    DO I1=1,N
      I=I1
      ...
      CALL SNUBBER(I)
      ...
    ENDDO

Using `dbx`

To use dbx on a parallel loop, temporarily rewrite the program as follows:

Isolate the body of the loop in a file and subroutine of its own.
In the original routine, replace the loop body with a call to the new subroutine.
Compile the new subroutine with -g and no parallelization options.

Compile the changed original routine with parallelization and no -g.

Example: Manually transform a loop to allow using dbx in parallel:

Original code:
demo% cat loop.f
C$PAR DOALL
      DO i = 1,10
            WRITE(0,*) 'Iteration ', i
      END DO
      END
Split into two parts: caller loop and loop body as a subroutine
demo% cat loop1.f
C$PAR DOALL
      DO i = 1,10
            k = i
            CALL loop_body ( k )
      END DO
      END

demo% cat loop2.f
      SUBROUTINE loop_body ( k )
      WRITE(0,*) 'Iteration ', k 
      RETURN
      END
Compile caller loop with parallelization but no debugging
demo% f77 -O3 -c -explicitpar loop1.f
Compile the subprogram with debugging but not parallelized
demo% f77 -c -g loop2.f
Link together both parts into a.out
demo% f77 loop1.o loop2.o -explicitpar
Run a.out under dbx and set a breakpoint in the loop body subroutine
demo% dbx a.out          <- Various dbx messages not shown
(dbx) stop in loop_body
(2) stop in loop_body
(dbx) run
Running: a.out
(process id 28163)
dbx stops at breakpoint
t@1 (l@1) stopped in loop_body at line 2 in file "loop2.f"
    2           write(0,*) 'Iteration ', k
Now show value of k
(dbx) print k
k = 1                  <- Various values other than 1 are possible
(dbx)
