Sun Studio 12 Update 1: Debugging a Program With dbx

Chapter 13 Debugging OpenMP Programs

The OpenMP™ application programming interface (API) is a portable, parallel programming model for shared memory multiprocessor architectures, developed in collaboration with a number of computer vendors. Support for debugging Fortran, C++, and C OpenMP programs with dbx is based on the general multithreaded debugging features of dbx. All of the dbx commands that operate on threads and LWPs can be used for OpenMP debugging. dbx does not support asynchronous thread control in OpenMP debugging.

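This chapter is organized into the following sections: How Compilers Transform OpenMP Code, dbx Functionality Available for OpenMP Code, Using Stack Traces With OpenMP Code, Using the dump Command on OpenMP Code, and Execution Sequence of OpenMP Code.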

See the OpenMP API User's Guide for information on the directives, run-time library routines, and environment variables comprising the OpenMP Version 2.0 Application Program Interfaces, as implemented by the Sun Studio Fortran 95 and C compilers.

How Compilers Transform OpenMP Code

To better describe OpenMP debugging, it is helpful to understand how OpenMP code is transformed by the compilers. Consider the following Fortran example:


 1    program example
 2        integer i, n
 3        parameter (n = 1000000)
 4        real sum, a(n)
 5
 6        do i = 1, n
 7            a(i) = i*i
 8        end do
 9
10        sum = 0
11
12    !$OMP PARALLEL DO DEFAULT(PRIVATE), SHARED(a, sum)
13
14        do i = 1, n
15            sum = sum + a(i)
16        end do
17
18    !$OMP END PARALLEL DO
19
20        print*, sum
21    end program example

The code in line 12 through line 18 is a parallel region. The f95 compiler converts this section of code to an outlined subroutine that will be called from the OpenMP runtime library. This outlined subroutine has an internally generated name, in this case _$d1A12.MAIN_. The f95 compiler then replaces the code for the parallel region with a call to the OpenMP runtime library and passes the outlined subroutine as one of its arguments. The OpenMP runtime library handles all the thread-related issues and dispatches slave threads that execute the outlined subroutine in parallel. The C compiler works in the same way.
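
The transformation described above can be sketched by hand in C. The following is only an illustration, not the code that the f95 or C compiler actually generates. The names shared_vars, outlined_fn, outlined_region, fake_runtime_run, and host_sum are hypothetical, and the stand-in runtime calls the outlined function once per worker in sequence instead of dispatching real threads.

```c
#include <stddef.h>

/* Variables the parallel region shares; mirrors SHARED(a, sum) in the
   Fortran example. The struct itself is a hypothetical device. */
struct shared_vars {
    const double *a;   /* input array */
    size_t n;          /* number of elements in a */
    double sum;        /* shared accumulator */
};

/* Assumed signature of an outlined region: the shared data, the index of
   the calling worker, and the total number of workers. */
typedef void (*outlined_fn)(struct shared_vars *shared, int worker, int nworkers);

/* Hypothetical stand-in for the outlined subroutine built from the loop
   body. i is an ordinary local here, which is what makes it private. */
static void outlined_region(struct shared_vars *shared, int worker, int nworkers)
{
    size_t i;
    for (i = (size_t)worker; i < shared->n; i += (size_t)nworkers)
        shared->sum += shared->a[i];
}

/* Hypothetical stand-in for the runtime library entry point. A real runtime
   would run each call on its own thread; this one makes the calls in turn. */
static void fake_runtime_run(outlined_fn fn, struct shared_vars *shared, int nworkers)
{
    int w;
    for (w = 0; w < nworkers; w++)
        fn(shared, w, nworkers);
}

/* The hosting routine after the transformation: the loop is gone, and a
   call into the runtime, with the outlined function as an argument, takes
   its place. */
double host_sum(const double *a, size_t n, int nworkers)
{
    struct shared_vars shared;
    shared.a = a;
    shared.n = n;
    shared.sum = 0.0;
    fake_runtime_run(outlined_region, &shared, nworkers);
    return shared.sum;
}
```

In this sketch, host_sum(a, n, 4) returns the total of the n values in a. The loop body no longer lives in the hosting routine, which is why a debugger sees the parallel region as a separate function called from runtime library code.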

When debugging an OpenMP program, the outlined subroutine is treated by dbx as any other function, with the exception that you cannot explicitly set a breakpoint in that function by using its internally generated name.

dbx Functionality Available for OpenMP Code

In addition to the usual functionality for debugging multithreaded programs, dbx allows you to do the following in an OpenMP program:

Using Stack Traces With OpenMP Code

When execution is stopped in a parallel region, a where command shows a stack trace that contains the outlined subroutine as well as several runtime library calls. Using the Fortran example from How Compilers Transform OpenMP Code, and stopping execution at line 15, the where command produces the following stack trace.


[t@4 l@4]: where
current thread: t@4
=>[1] _$d1A12.MAIN_(), line 15 in "example.f90"
[2] __mt_run_my_job_(0x45720, 0xff82ee48, 0x0, 0xff82ee58, 0x0, 0x0), at 0x16860
[3] __mt_SlaveFunction_(0x45720, 0x0, 0xff82ee48, 0x0, 0x455e0, 0x1), at 0x1aaf0

The top frame on the stack is the frame of the outlined function. Even though the code is outlined, the source line number still maps back to 15. The other two frames are for runtime library routines.

When execution is stopped in a parallel region, a where command from a slave thread does not have a stack traceback to its parent thread, as shown in the above example. A where command from the master thread, however, has a full traceback:


[t@4 l@4]: thread t@1
t@1 (l@1) stopped in _$d1A12.MAIN_ at line 15 in file "example.f90"
15           sum = sum + a(i)
[t@1 l@1]: where
current thread: t@1
=>[1] _$d1A12.MAIN_(), line 15 in "example.f90"
[2] __mt_run_my_job_(0x41568, 0xff82ee48, 0x0, 0xff82ee58, 0x0, 0x0), at 0x16860
[3] __mt_MasterFunction_(0x1, 0x0, 0x6, 0x0, 0x0, 0x40d78), at 0x16150
[4] MAIN(), line 12 in "example.f90"

If the number of threads is not large, you might be able to determine how execution reached the breakpoint in a slave thread by using the threads command (see threads Command) to list all the threads, and then switch to each thread to determine which one is the master thread.

Using the dump Command on OpenMP Code

When execution is stopped in a parallel region, a dump command may print more than one copy of private variables. In the following example, the dump command prints two copies of the variable i:


[t@1 l@1]: dump
i = 1
sum = 0.0
a = ARRAY
i = 1000001

Two copies of variable i are printed because the outlined routine is implemented as a nested function of the hosting routine, and private variables are implemented as local variables of the outlined routine. Because a dump command prints all the variables in scope, both the i in the hosting routine and the i in the outlined routine are displayed.
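
The scoping behind the two copies can be imitated in C. This is a sketch under assumptions: standard C has no nested functions, so the hosting routine's i is passed by pointer to a hypothetical outlined_body, and the names snapshot, host_routine, and N are illustrative only.

```c
#define N 5   /* small stand-in for n = 1000000 in the Fortran example */

/* Values of the two variables named i, recorded at the point where
   execution would be stopped inside the parallel region. */
struct snapshot {
    int private_i;   /* the outlined routine's own i */
    int host_i;      /* the hosting routine's i */
};

/* Hypothetical outlined routine. Its private i is a local variable; the
   hosting routine's i stays reachable, here through a pointer. */
static void outlined_body(const int *host_i, struct snapshot *snap)
{
    int i = 1;                 /* private copy, at the first iteration */
    snap->private_i = i;
    snap->host_i = *host_i;
}

/* Hypothetical hosting routine. Its first loop leaves i at N + 1, just as
   the first do loop in the Fortran example leaves i at n + 1. */
struct snapshot host_routine(void)
{
    struct snapshot snap;
    int a[N];
    int i;

    for (i = 1; i <= N; i++)
        a[i - 1] = i * i;
    (void)a;                   /* a is filled only to mirror the example */

    outlined_body(&i, &snap);
    return snap;
}
```

Here the snapshot holds 1 for the private i and N + 1 for the hosting routine's i, which corresponds to the pair i = 1 and i = 1000001 in the dump output above.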

Execution Sequence of OpenMP Code

When you single-step inside a parallel region of an OpenMP program, the execution sequence might not match the source code sequence. The difference occurs because the compiler usually transforms and rearranges the code in the parallel region. Single-stepping in OpenMP code is similar to single-stepping in optimized code, where the optimizer has usually moved code around.