Debugging Multithreaded Programs

By Ann Rice, May 2008

While a traditional UNIX® process contains a single thread of control, multithreading separates a process into many execution threads, each of which runs independently. Multithreading your code has a number of benefits, but it can also introduce bugs that might be difficult to find. This article suggests ways of avoiding such bugs in your code as well as strategies for finding these bugs using the dbx command-line debugger.

Benefits From Multithreading

Multithreading your code can help in the following areas.

Improving Application Responsiveness

Any program in which many activities are not dependent upon each other can be redesigned so that each independent activity is defined as a thread. For example, the user of a multithreaded GUI does not have to wait for one activity to complete before starting another.

Using Multiprocessors Efficiently

Typically, applications that express concurrency requirements with threads need not take into account the number of available processors. The performance of the application improves transparently with additional processors because the operating system takes care of scheduling threads for the number of processors that are available. When multicore processors and multithreaded processors are available, a multithreaded application's performance scales appropriately because the cores and threads are viewed by the OS as processors.

Numerical algorithms and applications with a high degree of parallelism, such as matrix multiplications, can run much faster when implemented with threads on a multiprocessor.
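
As a rough illustration of that kind of parallelism, the sketch below splits a small matrix multiplication into bands of rows, one band per thread. It is a minimal example under stated assumptions, not a tuned implementation: it uses POSIX threads rather than the Solaris thr_* calls shown later in this article, and the names mat_a, mat_b, mat_c, multiply_band, and parallel_multiply, as well as the sizes N and NUM_THREADS, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define N 4            /* matrix dimension (illustrative) */
#define NUM_THREADS 2  /* number of worker threads (illustrative) */

double mat_a[N][N], mat_b[N][N], mat_c[N][N];

/* Half-open range of rows of mat_c that one worker computes. */
struct band {
    int first_row;
    int last_row;
};

/* Thread body: compute rows [first_row, last_row) of mat_c = mat_a * mat_b. */
static void *multiply_band(void *arg)
{
    const struct band *b = arg;

    for (int i = b->first_row; i < b->last_row; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += mat_a[i][k] * mat_b[k][j];
            mat_c[i][j] = sum;  /* bands are disjoint, so no lock is needed */
        }
    }
    return NULL;
}

/* Fill the inputs with sample data and multiply them in parallel.
 * Returns 0 on success, -1 if a thread could not be created. */
int parallel_multiply(void)
{
    pthread_t tid[NUM_THREADS];
    struct band bands[NUM_THREADS];
    int rows_per_thread = N / NUM_THREADS;
    int started = 0;

    assert(N % NUM_THREADS == 0);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            mat_a[i][j] = i + 1;     /* every entry of row i is i + 1 */
            mat_b[i][j] = (i == j);  /* identity matrix */
        }
    }

    for (; started < NUM_THREADS; started++) {
        bands[started].first_row = started * rows_per_thread;
        bands[started].last_row = (started + 1) * rows_per_thread;
        if (pthread_create(&tid[started], NULL, multiply_band,
                           &bands[started]) != 0)
            break;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    return started == NUM_THREADS ? 0 : -1;
}
```

Because each thread writes only its own rows of mat_c, the workers need no synchronization beyond the final joins. Calling parallel_multiply() from main() is enough to try it.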

Improving Program Structure

Many programs are more efficiently structured as multiple independent or semi-independent units of execution instead of as a single, monolithic thread. For example, a non-threaded program that performs many different tasks might need to devote much of its code just to coordinating the tasks. When the tasks are programmed as threads, the code can be simplified. Multithreaded programs, especially programs that provide service to multiple concurrent users, can be more adaptive to variations in user demands than single-threaded programs.

Using Fewer System Resources

Programs that use two or more processes that access common data through shared memory are applying more than one thread of control.

However, each process has a full address space and operating environment state. The cost of creating and maintaining this large amount of state information makes each process much more expensive than a thread in both time and space.

In addition, the inherent separation between processes can require a major effort by the programmer. This effort includes handling communication between the threads in different processes, or synchronizing their actions. When the threads are in the same process, communication and synchronization become much easier.
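
To make the last point concrete, here is a minimal sketch of threads in one process communicating through an ordinary variable and synchronizing with a mutex. It assumes POSIX threads rather than the Solaris thr_* interface, and the names shared_total, add_to_total, and run_shared_counter are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define WORKERS 4         /* number of threads (illustrative) */
#define INCREMENTS 10000  /* updates per thread (illustrative) */

/* Ordinary process memory: every thread sees the same variable. */
static long shared_total;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add to the shared total, one locked update at a time. */
static void *add_to_total(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&total_lock);
        shared_total++;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Start the workers, wait for them, and return the final total. */
long run_shared_counter(void)
{
    pthread_t tid[WORKERS];
    int started = 0;

    shared_total = 0;
    for (; started < WORKERS; started++) {
        if (pthread_create(&tid[started], NULL, add_to_total, NULL) != 0)
            break;
    }
    for (int t = 0; t < started; t++) {
        int rc = pthread_join(tid[t], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return shared_total;
}
```

No shared-memory segment or interprocess channel has to be set up here. Without the mutex, the increments from different threads could interleave and the total would vary from run to run.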

Oversights That Can Cause Bugs

The following list points out some of the more frequent oversights that can cause bugs in multithreaded programs:

Multithreaded programs, especially those containing bugs, often behave differently in two successive runs, given identical inputs, because of differences in the thread scheduling order.

In general, multithreading bugs are statistical instead of deterministic. Tracing is usually a more effective method of finding order of execution problems than is breakpoint-based debugging.
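
One lightweight way to trace from inside the program is to have each thread append events to a shared in-memory log. The sketch below is only an illustration of that idea, assuming POSIX threads; trace_event, run_traced_workers, dump_trace, and the buffer sizes are hypothetical names and values, not part of dbx or of any library.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define TRACE_CAPACITY 64  /* maximum number of recorded events */
#define MAX_WORKERS 8      /* maximum number of worker threads */

struct trace_entry {
    int thread_no;      /* application-assigned thread number */
    const char *event;  /* static string describing the event */
};

static struct trace_entry trace_log[TRACE_CAPACITY];
static int trace_count;
static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append one event. The mutex gives the log a single, consistent order. */
void trace_event(int thread_no, const char *event)
{
    assert(event != NULL);
    pthread_mutex_lock(&trace_lock);
    if (trace_count < TRACE_CAPACITY) {
        trace_log[trace_count].thread_no = thread_no;
        trace_log[trace_count].event = event;
        trace_count++;
    }
    pthread_mutex_unlock(&trace_lock);
}

/* Thread body: record when the thread starts and finishes its work. */
static void *worker(void *arg)
{
    int thread_no = *(int *)arg;

    trace_event(thread_no, "start");
    trace_event(thread_no, "finish");
    return NULL;
}

/* Run nthreads workers; return the number of events logged, or -1 if
 * nthreads is out of range. */
int run_traced_workers(int nthreads)
{
    pthread_t tid[MAX_WORKERS];
    int ids[MAX_WORKERS];
    int started = 0;

    if (nthreads < 0 || nthreads > MAX_WORKERS)
        return -1;
    trace_count = 0;
    for (; started < nthreads; started++) {
        ids[started] = started + 1;
        if (pthread_create(&tid[started], NULL, worker, &ids[started]) != 0)
            break;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return trace_count;  /* safe to read: all workers have been joined */
}

/* Print the recorded events in the order in which they were logged. */
void dump_trace(void)
{
    for (int i = 0; i < trace_count; i++)
        printf("%d: thread %d %s\n", i, trace_log[i].thread_no,
               trace_log[i].event);
}
```

Calling run_traced_workers(3) and then dump_trace() prints six events. The interleaving of threads in that listing can differ between runs, which is exactly the information that is hard to obtain by stopping at breakpoints.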

Debugging Multithreaded Programs With dbx

When it detects a multithreaded program, dbx tries to load libthread_db.so, a special system library for thread debugging located in /usr/lib. dbx is synchronous; when any thread or lightweight process (LWP) stops, all other threads and LWPs sympathetically stop. (An LWP is a thread in the Solaris™ kernel that executes kernel code and system calls.) This behavior is sometimes referred to as the “stop the world” model.

Setting Breakpoints in Multithreaded Code

You can set breakpoints in multithreaded code using the stop command, trace command, or when command. The basic syntax of these commands is:

stop event-specification [modifier]
trace event-specification [modifier]
when event-specification [ modifier ] { command; ... }

Two thread-specific events were added in Sun Studio 11 dbx: thr_create, which occurs when a thread is created, and thr_exit, which occurs when a thread exits.

You can get an idea of how often your application creates and destroys threads by using the thr_create event and thr_exit event as in the following example:

(dbx) trace thr_create
(dbx) trace thr_exit
(dbx) run

trace: thread created t@2 on l@2
trace: thread created t@3 on l@3
trace: thread created t@4 on l@4
trace: thr_exit t@4
trace: thr_exit t@3
trace: thr_exit t@2

The application created three threads. Note that the threads exited in the reverse order of their creation, which might indicate that if the application had more threads, they would accumulate and consume resources.

To get more interesting information, you could try the following in a different session:

(dbx) when thr_create { echo "XXX thread $newthread created by $thread"; }
XXX thread t@2 created by t@1
XXX thread t@3 created by t@1
XXX thread t@4 created by t@1

The output shows that all three threads were created by thread t@1, which is a common multithreading pattern.

Suppose you want to debug thread t@3 from its outset. You could stop the application at the point that thread t@3 is created as follows:

(dbx) stop thr_create t@3
(dbx) run
t@1 (l@1) stopped in tdb_event_create at 0xff38409c
0xff38409c: tdb_event_create       :    retl
Current function is main
216       stat = (int) thr_create(NULL, 0, consumer, q, tflags, &tid_cons2);
(dbx)

If your application occasionally spawns a new thread from thread t@5 instead of thread t@1, you could capture that event as follows:

(dbx) stop thr_create -thread t@5

See the Sun Studio 12: Debugging a Program With dbx manual for a complete list of event specifications. Bear in mind that the event you specify might occur in more than one thread, so your program might hit the breakpoint many times. You can specify a thread_id or lwp_id as a modifier to the stop command and the trace command. The action associated with the event is then executed only if the thread or LWP that caused the event matches the thread_id or lwp_id. However, the specific thread or LWP you have in mind might be assigned a different thread_id or lwp_id from one execution of the program to the next.

Stepping Through Multithreaded Code

dbx supports two basic single-step commands: next and step, plus two variants of step, called step up and step to. Both the next command and the step command let the program execute one source line before stopping again. The basic difference between the next and step commands is in how they handle function calls. If the line executed contains a function call, the next command lets the call execute and stops at the following line, while the step command stops at the first line of the called function.

The syntax of the next command is:

next [ n ] [ -sig signal ] [ thread_id ] [lwp_id ]

The syntax of the step command is:

step [ n ] [ up ] [ -sig signal ] [ thread_id ] [lwp_id ] [ to function ]

To step one line in the current thread or LWP, type:

next

or

step

To step multiple (n) lines in the current thread or LWP, type:

next n

or

step n

To step one line in a different thread, type:

step thread_id

For example:

(dbx) step t@2

With multithreaded programs, when a function call is stepped into or stepped over, all LWPs are implicitly resumed for the duration of that function call in order to avoid deadlock.

You can specify a specific thread_id or lwp_id to the next command or the step command, thus changing the current thread or LWP. However, if you do so, this deadlock avoidance measure is defeated. So to avoid deadlocks, it is safer to change the current thread or LWP using the thread command or lwp command, and then use the next command or step command to step in the new current thread or LWP.

Whenever you give a command that steps a single thread or LWP, you need to be aware of potential deadlocks. If the thread that continues executing needs to acquire a lock that is held by a thread that has not resumed execution, your program deadlocks. If such a deadlock occurs, you can break it only by typing ctrl-C and then resuming all threads.
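
The situation can be reproduced outside the debugger. In the following sketch, one thread takes a lock and is then parked, standing in for a thread that has not been resumed, while a second thread probes the same lock. This is an illustration under assumptions, not dbx behavior: it uses POSIX threads, and queue_lock, holder, and probe_lock_while_holder_stopped are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* The lock that the "stepped" thread needs. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Handshake used only to simulate a thread that has not been resumed. */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int holder_has_lock;
static int holder_may_continue;

/* Thread body: take queue_lock, then stay parked while still owning it. */
static void *holder(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&queue_lock);

    pthread_mutex_lock(&gate_lock);
    holder_has_lock = 1;
    pthread_cond_broadcast(&gate_cond);
    while (!holder_may_continue)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);

    pthread_mutex_unlock(&queue_lock);
    return NULL;
}

/* Returns what pthread_mutex_trylock() reports while the holder is parked,
 * or -1 if the holder thread could not be created. */
int probe_lock_while_holder_stopped(void)
{
    pthread_t tid;
    int rc;

    holder_has_lock = 0;
    holder_may_continue = 0;
    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;

    /* Wait until the holder really owns queue_lock. */
    pthread_mutex_lock(&gate_lock);
    while (!holder_has_lock)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);

    /* A blocking pthread_mutex_lock() here would wait until the holder
     * runs again. trylock reports the contention instead of hanging. */
    rc = pthread_mutex_trylock(&queue_lock);
    if (rc == 0)
        pthread_mutex_unlock(&queue_lock);

    /* The equivalent of resuming all threads: release the holder. */
    pthread_mutex_lock(&gate_lock);
    holder_may_continue = 1;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);
    pthread_join(tid, NULL);

    return rc;
}
```

The probe returns EBUSY while the holder is parked. If the probing thread had called pthread_mutex_lock() instead, it would make no progress until the holder ran again, which is the deadlock described above.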

To step up and out of the current function in the current thread or LWP, type:

step up

or

step up lwp_id

To step into a specified function at the current source line, type:

step to function_name

To step into the last function called as determined by the assembly code for the current source line, type:

step to

To deliver a signal while stepping, you can add -sig signal to any of the above next and step commands.

Resuming Execution

To resume execution of your multithreaded program after hitting a breakpoint or after single-stepping through your code, use the cont command. For multithreaded programs, the syntax is:

cont [ at line ] [ thread_id | lwp_id ] [ -sig signal ]

To continue execution of all threads, type:

cont

To continue execution of a specific thread or LWP, type:

cont thread_id

or

cont lwp_id

For example:

(dbx) cont l@3

To continue execution at a specific source line, type:

cont at line_number thread_id

or

cont at line_number lwp_id

To continue execution with a specific signal, you can add -sig signal in any of the above cont commands. Whenever you give a command that resumes a single thread or LWP, you need to be aware of potential deadlocks. If the thread that continues executing needs to acquire a lock that is held by a thread that has not resumed execution, your program deadlocks. If such a deadlock occurs, you can break it only by typing ctrl-C and then resuming all threads.

Viewing the Threads List

To view the threads list, use the threads command. The syntax is:

threads [ -all ] [ -mode [ all|filter ] [ auto|manual ] ] 

The threads command displays the thread information shown in the following example:

(dbx) threads
    t@1 a l@1  ?()  running   in main()
    t@2      ?() asleep on 0xef751450  in_swtch()
    t@3 b l@2  ?()  running in sigwait()
    t@4     consumer()  asleep on 0x22bb0 in _lwp_sema_wait()
  *>t@5 b l@4 consumer()  breakpoint     in Queue_dequeue()
    t@6 b l@5 producer()     running       in _thread_start()
(dbx)

For native code, each line of information displayed by the threads command is composed of the following:

Table 1 Thread and LWP States

State               Description
suspended           The thread has been explicitly suspended.
runnable            The thread is runnable and is waiting for an LWP as a computational resource.
zombie              When a non-detached thread exits (thr_exit()), it is in a zombie state until it has been rejoined through the use of thr_join(). THR_DETACHED is a flag specified at thread creation time (thr_create()). A detached thread that exits is in a zombie state until it has been reaped.
asleep on syncobj   The thread is blocked on the given synchronization object. Depending on what level of support libthread.so and libthread_db.so provide, syncobj might be as simple as a hexadecimal address or something with more information content.
active              The thread is active on an LWP, but dbx cannot access the LWP.
unknown             dbx cannot determine the state.
lwpstate            A bound or active thread state has the state of the LWP associated with it.
running             The LWP was running but was stopped in synchrony with some other LWP.
syscall num         The LWP stopped on an entry into the given system call number.
syscall return num  The LWP stopped on an exit from the given system call number.
job control         The LWP stopped due to job control.
LWP suspended       The LWP is blocked in the kernel.
single stepped      The LWP has just completed a single step.
breakpoint          The LWP has just hit a breakpoint.
fault num           The LWP has incurred the given fault number.
signal name         The LWP has incurred the given signal.
process sync        The process to which this LWP belongs has just started executing.
LWP death           The LWP is in the process of exiting.
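
The zombie state is tied to whether a thread can be joined. The sketch below shows the two cases in POSIX threads terms, which is an assumption on our part because the article itself uses the Solaris thr_* calls: an exited joinable thread keeps its exit status until pthread_join() collects it, and a thread created with the detached attribute cannot be joined at all. The function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Thread body: exit immediately, returning the argument as exit status. */
static void *return_status(void *arg)
{
    return arg;
}

/* Create a joinable thread and collect its exit status with pthread_join().
 * Until the join, the exited thread's status must be kept by the system.
 * Returns the status, or -1 on error. */
int join_collects_status(int status)
{
    pthread_t tid;
    void *result = NULL;

    assert(status >= 0);
    if (pthread_create(&tid, NULL, return_status,
                       (void *)(intptr_t)status) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (int)(intptr_t)result;
}

/* Create a thread that is detached from the start, so nothing has to
 * join it. Returns the detach state stored in the attribute object,
 * or -1 on error. */
int create_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int state = -1;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_attr_getdetachstate(&attr, &state);
    rc = pthread_create(&tid, &attr, return_status, NULL);
    pthread_attr_destroy(&attr);
    return rc == 0 ? state : -1;
}
```

A program that creates many joinable threads and never joins them is one way to end up with the zombie entries that threads -all displays.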

To print the list of all known threads, type:

threads

The output of this command might be:

*>    t@1  a  l@1   ?()   signal SIGINT in  _XFlushInt() 
      t@2  b  l@2   ?()   running          in  _signotifywait() 
      t@3  b  l@3   ?()   running          in  _lwp_sema_wait() 
      t@4      ?()   sleep on (unknown) in  _reap_wait() 

To print threads normally not printed (zombies), type:

threads -all

The output of this command might be:

*>    t@1  a  l@1   ?()   signal SIGINT in  _XFlushInt() 
      t@2  b  l@2   ?()   running          in  _signotifywait() 
      t@3  b  l@3   ?()   running          in  _lwp_sema_wait() 
      t@4      ?()   sleep on (unknown) in  _reap_wait() 
      t@5       myThread()   zombie  in  in 

By default, the threads command runs in filter mode, meaning that hidden threads and zombie threads are not printed. To print all threads including hidden threads and zombies, type:

threads -mode all

Displaying, Changing, Suspending, or Resuming the Current Thread

The thread command lists or changes the current thread. The syntax is:

thread [ -blocks ] [ -blockedby ] [ -info ] [ -hide ] [ -unhide ] [ -suspend ] [ -resume ] thread_id

To change the current thread, type:

thread thread_id

To print everything known about the current thread or given thread, type:

thread -info [thread_id]

For example, this command might produce the following output:

thread -info t@4

Thread t@4 (0xfe60bd70) at priority 127
        state: asleep on (unknown)
        base function: 0x0: 0x00000000() stack: 0xfe60bd70[1047920]
        flags: DETACHED|DAEMON 
        masked signals: HUP INT QUIT ILL TRAP ABRT EMT FPE BUS SEGV SYS PIPE ALRM TERM USR1 USR2 CLD PWR WINCH 
         URG POLL TSTP CONT TTIN TTOU VTALRM PROF XCPU XFSZ WAITING FREEZE THAW LOST RTMIN RTMIN+1 
         RTMIN+2 RTMIN+3 RTMIN+4 RTMIN+5 RTMIN+6 RTMIN+7 
        Currently inactive in _reap_wait 

To print all locks held by the current thread or given thread blocking other threads, type:

thread -blocks [thread_id]

To show which synchronization object the current thread or given thread is blocked by, if any, type:

thread -blockedby [thread_id]

To suspend the current thread or given thread, type:

thread -suspend [thread_id]

To resume (unsuspend) the current thread or given thread, type:

thread -resume [thread_id]

To hide the current thread or given thread so that it will not be displayed by the threads command, type:

thread -hide [thread_id] 

To unhide the current thread or given thread so that it will be displayed by the threads command, type:

thread -unhide [thread_id]

Displaying LWP Information

Normally, you need not be aware of LWPs, but there are times when thread level queries cannot be completed. In these cases, you can use the lwp command and lwps command to show information about LWPs.

The syntax of the lwp command is:

lwp [ -info ] lwp_id

To list the current LWP, type:

lwp

For example:

lwp l@3

To change the current LWP, type:

lwp lwp_id

To display the name, home, and masked signals of the current LWP, type:

lwp -info

For example, this command might produce the following output:

lwp -info l@2

l@2 running          in _signotifywait()
masked signals are: 

To list all LWPs in the current process, use the lwps command:

lwps

The lwps command displays the LWP information shown in the following example:

(dbx) lwps
    l@1 running in main()
    l@2 running in sigwait()
    l@3 running in _lwp_sema_wait()
  *>l@4 breakpoint in Queue_dequeue()
    l@5 running in _thread_start()
(dbx) 

Each line of the LWP list contains the following:

Runtime Checking Multithreaded Applications

Runtime checking in dbx supports multithreaded applications. Along with each access checking error report, RTC prints the ID of the thread on which the error occurred. The leak report generated by RTC includes the leaks from all the threads in the program.

Potential Problems With Dynamic Function Calls

If you are accustomed to using dynamic function calls from dbx when debugging single-threaded programs, take care in using the same technique when debugging multithreaded code.

dbx allows you to use function calls in expressions. For example, the following command forces the target program to call foo():

print foo()

Forcing a function call can be useful because it lets you use the program code to examine the state of the program.

You can use the when command to stop execution at particular locations in the program, print data, and then resume execution:

when in bar {print var;}

If you combine these two examples, as in the following command, you are stopping the program at various locations, forcing it to call a function, and then continuing execution:

when in bar {print foo();}

If you give dbx such a command, you must be sure that calling foo() at the times you stop execution in bar() does not interfere with your program's intended execution.

If your program is multithreaded, it is more difficult to predict when it is safe to force a call to foo(). One thread might be stopped in bar(), which you know is safe, but other threads might be in the process of modifying data that foo() relies on.
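
One defensive option is to write the functions you intend to call from the debugger so that they refuse to read data another thread might be updating. The following sketch is only one possible approach, assuming POSIX threads and a data structure protected by a mutex; struct queue, queue_push, and debug_queue_count are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

struct queue {
    pthread_mutex_t lock;
    int head;   /* index of the next item to remove */
    int tail;   /* index of the next free slot */
    int count;  /* invariant while unlocked: count == tail - head */
};

struct queue q = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };

/* Normal program code: updates two fields under the lock. Between the
 * two updates the invariant does not hold. */
void queue_push(struct queue *qp)
{
    pthread_mutex_lock(&qp->lock);
    qp->tail++;
    qp->count++;
    pthread_mutex_unlock(&qp->lock);
}

/* Helper intended to be called from the debugger.
 * Returns the item count, or -1 if the lock is held, in which case the
 * fields might be mid-update and are not read. */
int debug_queue_count(struct queue *qp)
{
    int n;

    if (pthread_mutex_trylock(&qp->lock) != 0)
        return -1;  /* busy: do not block and do not read */
    assert(qp->count == qp->tail - qp->head);
    n = qp->count;
    pthread_mutex_unlock(&qp->lock);
    return n;
}
```

With such a helper, a command like print debug_queue_count(&q) reports -1 instead of blocking or returning a half-updated value when another thread was stopped inside queue_push(). It does not help with data that the program modifies without holding a lock.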

For More Information

For further details, see:

Multithreaded Programming Guide

Sun Studio 12: Debugging a Program With dbx
