Debugging parallelized programs requires some cleverness. The following schemes suggest ways to approach the problem:
Turn off parallelization.
You can do one of the following:
Turn off the parallelization options: verify that the program works correctly when compiled with -O3 or -O4 but without any parallelization options.
Set the number of CPUs to one: run the program with the environment variable PARALLEL set to 1.
If the problem disappears, then you know it was due to parallelization.
Also check for out-of-bounds array references by compiling with -C.
Problems that appear only with -autopar may indicate that the compiler is parallelizing something it should not.
Turn off -reduction.
If you are using the -reduction option, a summation reduction may be reordering the additions and yielding slightly different answers. Try compiling without this option.
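The different answers come from floating-point addition not being associative. The following Fortran 90 sketch illustrates this; it is not what the compiler actually generates. The module `sum_order` and the functions `serial_sum` and `split_sum` are hypothetical, and `split_sum` assumes, for illustration only, that a two-thread reduction amounts to summing each half of the array separately and then adding the two partial sums.

```fortran
! Hypothetical illustration: floating-point addition is not associative,
! so splitting a sum into partial sums can change the rounded result.
module sum_order
  implicit none
contains

  ! Serial order: (((x(1) + x(2)) + x(3)) + ...)
  real function serial_sum(x)
    real, intent(in) :: x(:)
    integer :: i
    serial_sum = 0.0
    do i = 1, size(x)
      serial_sum = serial_sum + x(i)
    end do
  end function serial_sum

  ! Assumed two-thread reduction: each half is summed separately,
  ! then the two partial sums are combined.
  real function split_sum(x)
    real, intent(in) :: x(:)
    real :: s1, s2
    integer :: i, half
    half = size(x) / 2
    s1 = 0.0
    s2 = 0.0
    do i = 1, half
      s1 = s1 + x(i)
    end do
    do i = half + 1, size(x)
      s2 = s2 + x(i)
    end do
    split_sum = s1 + s2
  end function split_sum

end module sum_order
```

On a target with IEEE single-precision REAL, the input (/ 1.0e8, 1.0, -1.0e8, 1.0 /) gives 1.0 from serial_sum but 0.0 from split_sum: in split_sum both 1.0 terms are absorbed into partial sums of magnitude 1.0e8, while the serial loop adds the last 1.0 after the large terms have cancelled.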
Reduce the number of compile options.
Compile with just -parallel -O3 and check the results.
Use fsplit or f90split.
If your program has many subroutines, use fsplit(1) to break them into separate files (use f90split(1) for Fortran 90 source code). Then compile some of the files with -parallel and the rest without it, and use f77 or f90 to link the .o files. You must specify -parallel on the link step as well. (See the section on consistent compiling and linking in the Fortran User's Guide.)
Execute the binary and verify the results.
Repeat this process until the problem is narrowed down to one subroutine.
You can proceed using a dummy subroutine or explicit parallelization to track down the loop that causes the problem.
Use -loopinfo.
Check which loops are being parallelized and which loops are not.
Use a dummy subroutine.
Create a dummy subroutine or function that does nothing. Put calls to it in a few of the loops that are being parallelized; because the compiler cannot tell that the call has no side effects, it will not parallelize those loops automatically. Recompile and execute. Use -loopinfo to see which loops are being parallelized.
Continue this process until you start getting the correct results.
Then remove the calls from the other loops, compile, and execute to verify that you are getting the correct results.
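A sketch of the dummy-subroutine technique follows. The names `dummy` and `scale_and_add` are hypothetical, and the sketch assumes, as the technique requires, that the compiler will not automatically parallelize a loop containing a call to a routine whose body it cannot see. For that reason `dummy` would normally be compiled in its own file.

```fortran
! Hypothetical do-nothing routine.  Compile it in a separate file so
! the compiler cannot see that the call has no side effects.
subroutine dummy()
end subroutine dummy

! Hypothetical routine with two loops that -autopar could parallelize.
subroutine scale_and_add(a, b, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(inout) :: b(n)
  integer :: i
  external dummy

  ! Loop 1: left alone, so it remains a candidate for parallelization
  do i = 1, n
    a(i) = 2.0 * a(i)
  end do

  ! Loop 2: the call to dummy is meant to keep this loop serial
  do i = 1, n
    call dummy()
    b(i) = b(i) + a(i)
  end do
end subroutine scale_and_add
```

After recompiling, -loopinfo should report loop 2 as not parallelized. If the wrong results disappear, the problem lies in one of the loops that now contain the call.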
Use explicit parallelization.
Add the C$PAR DOALL directive to a couple of the loops that are being parallelized. Compile with -explicitpar, then execute and verify the results. Use -loopinfo to see which loops are being parallelized. This method permits the addition of I/O statements to the parallelized loop.
Repeat this process until you find the loop that causes the wrong results.
If you want explicit parallelization only (without -autopar), do not compile with both -explicitpar and -depend; that combination is the same as compiling with -parallel, which includes -autopar.
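A minimal sketch of marking one suspect loop follows, to be compiled with something like f90 -explicitpar -loopinfo. The routine `add_arrays` is hypothetical. The text's C$PAR sentinel is the fixed-form spelling; the sketch uses free-form source and assumes the compiler also accepts the !$PAR sentinel there. Compilers that do not recognize the directive treat it as an ordinary comment, so the code still runs serially.

```fortran
! Hypothetical routine: one suspect loop marked for explicit
! parallelization with the Sun directive (free-form sentinel assumed).
! Compilers that do not recognize !$PAR treat it as a plain comment.
subroutine add_arrays(a, b, c, n)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a(n), b(n)
  real, intent(out) :: c(n)
  integer :: i

!$PAR DOALL
  do i = 1, n
    c(i) = a(i) + b(i)
    ! A diagnostic such as the following may be added while hunting
    ! for the bad loop (in parallel, output order is not guaranteed):
    ! write (*, *) 'i =', i, '  c(i) =', c(i)
  end do
end subroutine add_arrays
```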
Run loops backward serially.
Replace DO I=1,N with DO I=N,1,-1. If the results differ, the loop has a data dependency (its result depends on the order of the iterations), so it cannot safely be parallelized.
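The following sketch runs the same loop body in both directions. The module `loop_order` and its two subroutines are hypothetical; the body reads a(i-1), which the previous iteration of the forward loop has just written, so it carries a dependence from one iteration to the next.

```fortran
! Hypothetical module: the same loop body run forward and backward.
! The body reads a(i-1), which the previous forward iteration wrote,
! so the iteration order matters (a loop-carried dependence).
module loop_order
  implicit none
contains

  subroutine running_sum_forward(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i-1) + a(i)
    end do
  end subroutine running_sum_forward

  ! Same body with the loop header reversed, as in DO I=N,1,-1
  subroutine running_sum_backward(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = size(a), 2, -1
      a(i) = a(i-1) + a(i)
    end do
  end subroutine running_sum_backward

end module loop_order
```

Starting from four 1s, the forward loop gives 1, 2, 3, 4 and the backward loop gives 1, 2, 2, 2. A loop without such a dependence, such as a(i) = 2*a(i), gives the same result in either order.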