Understanding Multithreaded Debugging

dbx recognizes a multithreaded program by detecting whether it utilizes libthread.so. The program uses libthread.so either by explicitly being compiled with -lthread or -mt, or implicitly by being compiled with -lpthread.

When it detects a multithreaded program, dbx tries to load libthread_db.so, a special system library for thread debugging located in /usr/lib.
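
As a point of reference, the following is a minimal sketch of the kind of program dbx would treat as multithreaded. It assumes POSIX threads (the -lpthread case mentioned above) rather than the Solaris thr_create interface, and the names worker, run_workers, and NUM_WORKERS are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4 /* hypothetical thread count for this sketch */

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long total; /* shared state, protected by total_lock */

/* Start function for each worker thread: adds its argument to the total. */
static void *worker(void *arg)
{
    long amount;

    assert(arg != NULL);
    amount = *(long *)arg;

    pthread_mutex_lock(&total_lock);
    total += amount;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Starts NUM_WORKERS threads, joins them, and returns 1 + 2 + ... +
 * NUM_WORKERS, or -1 if any thread could not be created. */
long run_workers(void)
{
    pthread_t tids[NUM_WORKERS];
    long amounts[NUM_WORKERS];
    long result;
    int created = 0;
    int i;

    pthread_mutex_lock(&total_lock);
    total = 0;
    pthread_mutex_unlock(&total_lock);

    for (i = 0; i < NUM_WORKERS; i++) {
        amounts[i] = i + 1;
        if (pthread_create(&tids[i], NULL, worker, &amounts[i]) != 0)
            break;
        created++;
    }
    for (i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_lock(&total_lock);
    result = total;
    pthread_mutex_unlock(&total_lock);
    return created == NUM_WORKERS ? result : -1;
}
```

Calling run_workers() from main() and linking the thread library as described above should be enough for dbx to apply its thread support; the exact compiler options depend on your toolchain.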

dbx is synchronous, so when any thread or lightweight process (LWP) stops, all other threads and LWPs sympathetically stop. This behavior is sometimes referred to as the “stop the world” model.

Note - For information on multithreaded programming and LWPs, see the Oracle Solaris Multithreaded Programming Guide.

Thread Information

The thread information shown in the following example is available in dbx.

(dbx) threads
    t@1 a l@1  ?()  running   in main()
    t@2      ?() asleep on 0xef751450  in _swtch()
    t@3 b l@2  ?()  running in sigwait()
    t@4     consumer()  asleep on 0x22bb0 in _lwp_sema_wait()
  *>t@5 b l@4 consumer()  breakpoint     in Queue_dequeue()
    t@6 b l@5 producer()     running       in _thread_start()
(dbx)

For native code, each line of information is composed of the following:

The * (asterisk) indicates that an event requiring user attention has occurred in this thread. Usually this is a breakpoint.

An ’o’ instead of an asterisk indicates that a dbx internal event has occurred.
The > (arrow) denotes the current thread.
t@number, the thread id, refers to a particular thread. The number is the thread_t value passed back by thr_create.
b l@number or a l@number means the thread is bound to or active on the designated LWP, meaning the thread is actually runnable by the operating system.
The “Start function” of the thread as passed to thr_create. A ?() means that the start function is not known.
The thread state.
The function that the thread is currently executing.
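
The leading fields described in the list above can be pulled apart mechanically. The following is a rough sketch, based solely on the sample listing shown earlier; the exact column layout of dbx output is not specified here, and the names thread_prefix and parse_thread_prefix are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the leading fields of one line of threads output. */
struct thread_prefix {
    int attention;      /* '*': an event needs user attention */
    int internal_event; /* 'o': a dbx internal event occurred */
    int current;        /* '>': this is the current thread */
    unsigned thread_id; /* N from t@N */
    char lwp_kind;      /* 'a' (active), 'b' (bound), or 0 if no LWP is shown */
    unsigned lwp_id;    /* N from l@N, meaningful only when lwp_kind != 0 */
};

/* Returns 0 on success, -1 if the line does not begin with a thread entry. */
int parse_thread_prefix(const char *line, struct thread_prefix *out)
{
    const char *p = line;
    char kind = 0;
    unsigned lwp = 0;
    int fields;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    /* Skip indentation, then read the optional event and current markers. */
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '*') {
        out->attention = 1;
        p++;
    } else if (*p == 'o') {
        out->internal_event = 1;
        p++;
    }
    if (*p == '>') {
        out->current = 1;
        p++;
    }

    /* t@N is mandatory; "a l@N" or "b l@N" follows only for some threads. */
    fields = sscanf(p, "t@%u %c l@%u", &out->thread_id, &kind, &lwp);
    if (fields < 1)
        return -1;
    if (fields == 3 && (kind == 'a' || kind == 'b')) {
        out->lwp_kind = kind;
        out->lwp_id = lwp;
    }
    return 0;
}
```

The start function, state, and current function are left unparsed because the state can span several words (for example, asleep on an address), which makes a fixed format string a poor fit.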

For Java code, each line of information is composed of the following:

t@number, a dbx-style thread ID
The thread state
The thread name in single quotation marks
A number indicating the thread priority

Thread and LWP States

suspended: The thread has been explicitly suspended.
runnable: The thread is runnable and is waiting for an LWP as a computational resource.
zombie: When a detached thread exits (thr_exit()), it is in a zombie state until it has rejoined through the use of thr_join(). THR_DETACHED is a flag specified at thread creation time (thr_create()). A non-detached thread that exits is in a zombie state until it has been reaped.
asleep on syncobj: Thread is blocked on the given synchronization object. Depending on what level of support libthread and libthread_db provide, syncobj might be as simple as a hexadecimal address or something with more information content.
active: The thread is active on an LWP but dbx cannot access the LWP.
unknown: dbx cannot determine the state.
lwpstate: A bound or active thread has the state of the LWP associated with it.
running: LWP was running but was stopped in synchrony with some other LWP.
syscall num: LWP stopped on an entry into the given system call #.
syscall return num: LWP stopped on an exit from the given system call #.
job control: LWP stopped due to job control.
LWP suspended: LWP is blocked in the kernel.
single stepped: LWP has just completed a single step.
breakpoint: LWP has just hit a breakpoint.
fault num: LWP has incurred the given fault #.
signal name: LWP has incurred the given signal.
process sync: The process to which this LWP belongs has just started executing.
LWP death: LWP is in the process of exiting.
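
To make the zombie state more concrete, the following sketch creates a non-detached thread that exits at once and is then reaped. It assumes POSIX pthread_create and pthread_join as stand-ins for thr_create and thr_join, and the names finish_quickly and run_and_reap are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical start function: exits immediately, handing its argument back. */
static void *finish_quickly(void *arg)
{
    return arg;
}

/* Creates a joinable (non-detached) thread and reaps it.
 * Returns the value the thread handed back, or -1 on error. */
int run_and_reap(int value)
{
    pthread_t tid;
    void *ret = NULL;
    int slot = value;

    if (pthread_create(&tid, NULL, finish_quickly, &slot) != 0)
        return -1;

    /* Once finish_quickly() returns, the thread has exited but its exit
     * status is still held for a joiner. That window corresponds to the
     * zombie state described above; the join below ends it. */
    if (pthread_join(tid, &ret) != 0)
        return -1;

    assert(ret == &slot);
    return *(int *)ret;
}
```

A breakpoint placed between the create and the join would be one way to catch the thread in this window, in which case threads -all would be the command to list it.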

Viewing the Context of Another Thread

To switch the viewing context to another thread, use the thread command. The syntax is:

thread [-blocks] [-blockedby] [-info] [-hide] [-unhide] [-suspend] [-resume] thread_id

To display the current thread:

thread

To switch to thread thread-ID:

thread thread-ID

For more information, see thread Command.

Viewing the Threads List

To view the threads list, use the threads command. The syntax is:

threads [-all] [-mode [all|filter] [auto|manual]]

To print the list of all known threads:

threads

To print threads normally not printed (zombies):

threads -all

For an explanation of the threads list, see Thread Information.

For more information on the threads command, see threads Command.

Resuming Execution

Use the cont command to resume program execution. Currently, threads use synchronous breakpoints, so all threads resume execution. However, you can resume a single thread using the call command with the -resumeone option.

Consider the following two scenarios when debugging a multithreaded application where many threads call the function lookup():

You set a conditional breakpoint:
```
stop in lookup -if strcmp(name, "troublesome") == 0
```
When t@1 stops at the call to lookup(), dbx attempts to evaluate the condition and calls strcmp().
You set a breakpoint:
```
stop in lookup
```
When t@1 stops at the call to lookup(), you issue the command:
```
call strcmp(name, "troublesome")
```

When calling strcmp(), dbx would resume all threads for the duration of the call, which is similar to what dbx does when you are single-stepping with the next command. It does so because resuming only t@1 has the potential to cause a deadlock if strcmp() tries to grab a lock that is owned by another thread.

A drawback to resuming all threads in this case is that dbx cannot handle another thread, such as t@2, hitting the breakpoint at lookup() while strcmp() is being called. It emits a warning like one of the following:

event infinite loop causes missed events in following handlers:

Event reentrancy
first event BPT(VID 6, TID 6, PC echo+0x8)
second event BPT(VID 10, TID 10, PC echo+0x8)
the following handlers will miss events:

In such cases, if you can ascertain that the function called in the conditional expression will not grab a mutex, you can use the -resumeone event modifier to force dbx to resume only t@1:

stop in lookup -resumeone -if strcmp(name, "troublesome") == 0

Only the thread that hit the breakpoint in lookup() would be resumed in order to evaluate strcmp().

This approach does not help in cases such as the following examples:

If the second breakpoint on lookup() happens in the same thread because the conditional recursively calls lookup()
If the thread on which the conditional runs yields, sleeps, or in some manner relinquishes control to another thread
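
The deadlock risk that makes -resumeone unsafe for functions that take locks can be sketched in C. This is an illustration under stated assumptions, not dbx behavior: locked_name_matches is a hypothetical condition helper that, unlike strcmp(), needs a mutex, and it uses pthread_mutex_trylock so that the sketch reports contention instead of hanging the way a blocking lock call would.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical condition helper. Returns 1 on a match, 0 on no match,
 * and -1 if table_lock is held elsewhere. With a blocking lock call,
 * the -1 case is where the only resumed thread would wait forever. */
int locked_name_matches(const char *name, const char *wanted)
{
    int match;

    assert(name != NULL && wanted != NULL);
    if (pthread_mutex_trylock(&table_lock) != 0)
        return -1;
    match = (strcmp(name, wanted) == 0);
    pthread_mutex_unlock(&table_lock);
    return match;
}

/* Plays the role of t@1, the one thread resumed to evaluate the condition. */
static void *evaluate_condition(void *arg)
{
    int *result = arg;

    *result = locked_name_matches("troublesome", "troublesome");
    return NULL;
}

/* The caller plays the role of a stopped thread that may own the lock.
 * Returns the helper's result, or -2 if the thread could not be started. */
int evaluate_in_other_thread(int owner_holds_lock)
{
    pthread_t tid;
    int result = -2;

    if (owner_holds_lock)
        pthread_mutex_lock(&table_lock);
    if (pthread_create(&tid, NULL, evaluate_condition, &result) == 0)
        pthread_join(tid, NULL);
    if (owner_holds_lock)
        pthread_mutex_unlock(&table_lock);
    return result;
}
```

With the lock free, the evaluation completes and reports a match. With the lock held by the other thread, it cannot proceed, which is the situation to rule out before adding -resumeone to a breakpoint condition.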