The OpenMP™ application programming interface (API) is a portable, parallel programming model for shared memory multiprocessor architectures, developed in collaboration with a number of computer vendors. Support for debugging Fortran, C++, and C OpenMP programs with dbx is based on the general multi-threaded debugging features of dbx. All of the dbx commands that operate on threads and LWPs can be used for OpenMP debugging. dbx does not support asynchronous thread control in OpenMP debugging.
See the OpenMP API User's Guide for information on the directives, run-time library routines, and environment variables comprising the OpenMP Version 2.0 Application Program Interfaces, as implemented by the Sun Studio Fortran 95 and C compilers.
To better describe OpenMP debugging, it is helpful to understand how OpenMP code is transformed by the compilers. Consider the following Fortran example:
     1  program example
     2    integer i, n
     3    parameter (n = 1000000)
     4    real sum, a(n)
     5
     6    do i = 1, n
     7      a(i) = i*i
     8    end do
     9
    10    sum = 0
    11
    12  !$OMP PARALLEL DO DEFAULT(PRIVATE), SHARED(a, sum)
    13
    14    do i = 1, n
    15      sum = sum + a(i)
    16    end do
    17
    18  !$OMP END PARALLEL DO
    19
    20    print*, sum
    21  end program example
The code in line 12 through line 18 is a parallel region. The f95 compiler converts this section of code to an outlined subroutine that will be called from the OpenMP runtime library. This outlined subroutine has an internally generated name, in this case _$d1A12.MAIN_. The f95 compiler then replaces the code for the parallel region with a call to the OpenMP runtime library and passes the outlined subroutine as one of its arguments. The OpenMP runtime library handles all the thread-related issues and dispatches slave threads that execute the outlined subroutine in parallel. The C compiler works in the same way.
When debugging an OpenMP program, the outlined subroutine is treated by dbx as any other function, with the exception that you cannot explicitly set a breakpoint in that function by using its internally generated name.
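The outlining described above can be imitated by hand. The following C sketch is only an illustration, not the code any compiler generates: the names outlined_fn, shared_vars, outlined_region, fake_runtime_run, and host_sum are hypothetical, and the stand-in "runtime" calls the outlined body serially once per thread id instead of dispatching real threads. Each simulated thread writes its own partial sum, so the sketch has no shared update to race on.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_THREADS = 4 };   /* assumed team size for this sketch */

/* Hypothetical signature for an outlined parallel-region body.
   Real compilers generate their own names and argument layouts. */
typedef void (*outlined_fn)(void *shared, int thread_id, int num_threads);

/* Variables that the region shares are reached through one struct. */
struct shared_vars {
    const float *a;
    int n;
    float partial[NUM_THREADS];   /* one slot per simulated thread */
};

/* Stand-in for the outlined subroutine that holds the region's code. */
static void outlined_region(void *arg, int thread_id, int num_threads)
{
    struct shared_vars *s = arg;
    float local_sum = 0.0f;
    int i;   /* private loop index: a local variable of the outlined routine */

    /* Each simulated thread takes every num_threads-th iteration. */
    for (i = thread_id; i < s->n; i += num_threads)
        local_sum += s->a[i];
    s->partial[thread_id] = local_sum;
}

/* Stand-in for the runtime library entry point. A real runtime would
   dispatch threads; this sketch runs the body serially per thread id. */
static void fake_runtime_run(outlined_fn body, void *shared, int num_threads)
{
    for (int t = 0; t < num_threads; t++)
        body(shared, t, num_threads);
}

/* The host routine after the region has been replaced by a runtime call. */
float host_sum(const float *a, int n)
{
    struct shared_vars s = { a, n, { 0 } };
    float sum = 0.0f;

    assert(a != NULL && n >= 0);
    fake_runtime_run(outlined_region, &s, NUM_THREADS);

    /* Combine the per-thread partial sums. */
    for (int t = 0; t < NUM_THREADS; t++)
        sum += s.partial[t];
    return sum;
}
```

In a debugger, a breakpoint on a source line inside the loop would stop in outlined_region, with fake_runtime_run and host_sum below it on the stack. This loosely mirrors the layering of outlined subroutine and runtime library calls that dbx shows for real OpenMP code.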
In addition to the usual functionality for debugging multithreaded programs, dbx allows you to do the following in an OpenMP program:
Single step into a parallel region. Because a parallel region is outlined and called from the OpenMP runtime library, a single step of execution actually involves several layers of runtime library calls that are executed by slave threads created for this purpose. When you single step into the parallel region, the first thread that reaches the breakpoint causes the program to stop. This thread might be a slave thread rather than the master thread that initiated the stepping.
For example, refer to the Fortran code in How Compilers Transform OpenMP Code, and assume that master thread t@1 is at line 10. You single step into line 12, and slave threads t@2, t@3, and t@4 are created to execute the runtime library calls. Thread t@3 reaches the breakpoint first and causes the program execution to stop. So the single step that was initiated by thread t@1 ends on thread t@3. This behavior is different from normal stepping, in which you are usually on the same thread after the single step as before.
Print shared, private, and threadprivate variables. dbx can print all shared, private, and threadprivate variables. If you try to print a threadprivate variable outside of a parallel region, the master thread’s copy is printed. The whatis command does not tell you whether a variable is shared, private, or threadprivate.
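As a small target for trying these print operations, the following C sketch declares one variable of each kind. It is an illustration under assumptions: the names iterations_seen, sum_array, and master_iterations_seen are hypothetical, and the code is assumed to be built with an OpenMP-enabled C compiler. Without OpenMP enabled, the pragmas are ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Threadprivate: every thread gets its own copy of this file-scope variable. */
static int iterations_seen = 0;
#pragma omp threadprivate(iterations_seen)

/* Sums a[0..n-1] in a parallel loop. */
int sum_array(const int *a, int n)
{
    int total = 0;   /* reduction variable: per-thread copies, combined at the end */
    int scratch;     /* private: listed in the private clause */
    int i;           /* private: the index of a parallel loop is always private */

    assert(a != 0 && n >= 0);

    /* a is shared: all threads read the same array. */
    #pragma omp parallel for shared(a) private(scratch) reduction(+:total)
    for (i = 0; i < n; i++) {
        scratch = a[i];
        total += scratch;
        iterations_seen++;   /* updates only the executing thread's copy */
    }
    return total;
}

/* Called outside any parallel region, so this reads the master thread's copy. */
int master_iterations_seen(void)
{
    return iterations_seen;
}
```

Stopping inside the loop and printing scratch or iterations_seen on different threads should show per-thread values, while a refers to the same array on every thread. The value returned by master_iterations_seen depends on how many iterations the master thread happened to execute.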
When execution is stopped in a parallel region, a where command shows a stack trace that contains the outlined subroutine as well as several runtime library calls. Using the Fortran example from How Compilers Transform OpenMP Code, and stopping execution at line 15, the where command produces the following stack trace.
    [t@4 l@4]: where
    current thread: t@4
    =>[1] _$d1A12.MAIN_(), line 15 in "example.f90"
      [2] __mt_run_my_job_(0x45720, 0xff82ee48, 0x0, 0xff82ee58, 0x0, 0x0), at 0x16860
      [3] __mt_SlaveFunction_(0x45720, 0x0, 0xff82ee48, 0x0, 0x455e0, 0x1), at 0x1aaf0
The top frame on the stack is the frame of the outlined function. Even though the code is outlined, the source line number still maps back to 15. The other two frames are for runtime library routines.
When execution is stopped in a parallel region, a where command from a slave thread does not have a stack traceback to its parent thread, as shown in the above example. A where command from the master thread, however, has a full traceback:
    [t@4 l@4]: thread t@1
    t@1 (l@1) stopped in _$d1A12.MAIN_ at line 15 in file "example.f90"
       15       sum = sum + a(i)
    [t@1 l@1]: where
    current thread: t@1
    =>[1] _$d1A12.MAIN_(), line 15 in "example.f90"
      [2] __mt_run_my_job_(0x41568, 0xff82ee48, 0x0, 0xff82ee58, 0x0, 0x0), at 0x16860
      [3] __mt_MasterFunction_(0x1, 0x0, 0x6, 0x0, 0x0, 0x40d78), at 0x16150
      [4] MAIN(), line 12 in "example.f90"
If the number of threads is not large, you might be able to determine how execution reached the breakpoint in a slave thread by using the threads command (see threads Command) to list all the threads, and then switch to each thread to determine which one is the master thread.
When execution is stopped in a parallel region, a dump command may print more than one copy of private variables. In the following example, the dump command prints two copies of the variable i:
    [t@1 l@1]: dump
    i = 1
    sum = 0.0
    a = ARRAY
    i = 1000001
Two copies of variable i are printed because the outlined routine is implemented as a nested function of the hosting routine, and private variables are implemented as local variables of the outlined routine. Since a dump command prints all the variables in scope, both the i in the hosting routine and the i in the outlined routine are displayed.
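The two values of i can be reproduced with a serial C sketch. This is an illustration only: C has no nested functions in the standard language, so the hypothetical outlined_body receives a pointer to the host's i instead, and the names snapshot, outlined_body, and run_example are assumptions. The "breakpoint" is simulated by recording both variables on the first iteration of the second loop.

```c
#include <assert.h>

/* Values a debugger would show at the simulated breakpoint. */
struct snapshot {
    int host_i;      /* i of the hosting routine */
    int private_i;   /* i local to the outlined routine */
};

/* Hypothetical outlined body with its own private loop index. */
static struct snapshot outlined_body(const int *host_i, int n)
{
    struct snapshot snap = { 0, 0 };
    int i;   /* private copy: a local variable of the outlined routine */

    for (i = 1; i <= n; i++) {
        if (i == 1) {            /* simulated breakpoint on the first iteration */
            snap.private_i = i;
            snap.host_i = *host_i;
        }
    }
    return snap;
}

/* Hosting routine: runs an initialization loop, then the outlined body. */
struct snapshot run_example(int n)
{
    int i;

    assert(n >= 1);
    for (i = 1; i <= n; i++) {
        /* stands in for the initialization loop of the Fortran example */
    }
    /* The host's i is now n + 1, one past the last iteration. */
    return outlined_body(&i, n);
}
```

With n set to 1000000, the snapshot holds 1 for the private copy and 1000001 for the host copy, the same pair of values that the dump output shows.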
When you are single stepping inside of a parallel region in an OpenMP program, the execution sequence may not be the same as the source code sequence. This difference in sequence occurs because the code in the parallel region is usually transformed and rearranged by the compiler. Single stepping in OpenMP code is similar to single stepping in optimized code where the optimizer has usually moved code around.