dbx can debug multithreaded applications that use either Solaris threads or POSIX threads. With dbx, you can examine stack traces of each thread, resume all threads, step or next a specific thread, and navigate between threads.
dbx recognizes a multithreaded program by detecting whether it utilizes libthread.so. The program uses libthread.so either by explicitly being compiled with -lthread or -mt, or implicitly by being compiled with -lpthread.
This chapter describes how to find information about and debug threads using the dbx thread commands.
When it detects a multithreaded program, dbx tries to load libthread_db.so, a special system library for thread debugging located in /usr/lib.
dbx is synchronous; when any thread or lightweight process (LWP) stops, all other threads and LWPs sympathetically stop. This behavior is sometimes referred to as the “stop the world” model.
For information on multithreaded programming and LWPs, see the Solaris Multithreaded Programming Guide.
The following thread information is available in dbx:
    (dbx) threads
          t@1 a l@1  ?()         running in main()
          t@2        ?()         asleep on 0xef751450 in_swtch()
          t@3 b l@2  ?()         running in sigwait()
          t@4        consumer()  asleep on 0x22bb0 in _lwp_sema_wait()
        *>t@5 b l@4  consumer()  breakpoint in Queue_dequeue()
          t@6 b l@5  producer()  running in _thread_start()
    (dbx)
For native code, each line of information is composed of the following:
The * (asterisk) indicates that an event requiring user attention has occurred in this thread. Usually this is a breakpoint.
An 'o' instead of an asterisk indicates that a dbx internal event has occurred.
The > (arrow) denotes the current thread.
t@number, the thread id, refers to a particular thread. The number is the thread_t value passed back by thr_create.
b l@number or a l@number means the thread is bound to or active on the designated LWP, meaning the thread is actually runnable by the operating system.
The start function of the thread, as passed to thr_create. A ?() means that the start function is not known.
The thread state (see Table 11–1 for descriptions of the thread states).
The function that the thread is currently executing.
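The field layout described above can be checked against real listings with a small script. The following is a minimal Python sketch for post-processing saved `threads` output. It is not part of dbx, and the names `ThreadLine` and `parse_thread_line` are hypothetical. It assumes that fields are separated by whitespace and that the thread state is followed by the word `in` and then the current function, as in most lines of the sample listing. Lines that do not fit that assumption return `None`.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: [*|o][>] t@N [a|b l@N] start() state in function()
LINE_RE = re.compile(
    r"""^\s*
    (?P<event>[*o])?                             # '*' user event, 'o' dbx internal event
    (?P<current>>)?\s*                           # '>' marks the current thread
    t@(?P<tid>\d+)\s+                            # thread id
    (?:(?P<binding>[ab])\s+l@(?P<lwp>\d+)\s+)?   # optional 'a l@N' or 'b l@N'
    (?P<start>\S+\(\))\s+                        # start function, '?()' if unknown
    (?P<state>.+?)\s+in\s+                       # thread state
    (?P<func>\S+\(\))\s*$                        # function currently executing
    """,
    re.VERBOSE,
)


class ThreadLine(NamedTuple):
    event: Optional[str]    # '*', 'o', or None
    current: bool           # True for the current thread
    tid: int
    binding: Optional[str]  # 'a' (active), 'b' (bound), or None
    lwp: Optional[int]
    start: str
    state: str
    func: str


def parse_thread_line(line: str) -> Optional[ThreadLine]:
    """Split one line of a threads listing into its fields, or return None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    lwp = m.group("lwp")
    return ThreadLine(
        event=m.group("event"),
        current=m.group("current") is not None,
        tid=int(m.group("tid")),
        binding=m.group("binding"),
        lwp=int(lwp) if lwp is not None else None,
        start=m.group("start"),
        state=m.group("state"),
        func=m.group("func"),
    )
```

For example, passing the `t@5` line of the sample listing to `parse_thread_line` yields a record whose `current` field is true and whose `lwp` field is 4.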
For Java code, each line of information is composed of the following:
t@number, a dbx-style thread ID
The thread state (See Table 11–1 for descriptions of the thread states.)
The thread name in single quotation marks
A number indicating the thread priority
Thread and LWP States | Description |
---|---|
suspended | The thread has been explicitly suspended. |
runnable | The thread is runnable and is waiting for an LWP as a computational resource. |
zombie | A non-detached thread that exits (thr_exit()) is in a zombie state until it has been reaped through the use of thr_join(). Whether a thread is detached is determined by the THR_DETACHED flag specified at thread creation time (thr_create()). |
asleep on syncobj | The thread is blocked on the given synchronization object. Depending on what level of support libthread and libthread_db provide, syncobj might be as simple as a hexadecimal address or something with more information content. |
active | The thread is active on an LWP, but dbx cannot access the LWP. |
unknown | dbx cannot determine the state. |
lwpstate | A bound or active thread has the state of the LWP associated with it. |
running | The LWP was running but was stopped in synchrony with some other LWP. |
syscall num | The LWP stopped on an entry into the given system call #. |
syscall return num | The LWP stopped on an exit from the given system call #. |
job control | The LWP stopped due to job control. |
LWP suspended | The LWP is blocked in the kernel. |
single stepped | The LWP has just completed a single step. |
breakpoint | The LWP has just hit a breakpoint. |
fault num | The LWP has incurred the given fault #. |
signal name | The LWP has incurred the given signal. |
process sync | The process to which this LWP belongs has just started executing. |
LWP death | The LWP is in the process of exiting. |
To switch the viewing context to another thread, use the thread command. The syntax is:
    thread [-blocks] [-blockedby] [-info] [-hide] [-unhide] [-suspend] [-resume] thread_id
To display the current thread, type:
    thread
To switch to thread thread_id, type:
    thread thread_id
For more information on the thread command, see thread Command.
To view the threads list, use the threads command. The syntax is:
    threads [-all] [-mode [all|filter] [auto|manual]]
To print the list of all known threads, type:
    threads
To print threads normally not printed (zombies), type:
    threads -all
For an explanation of the threads list, see Thread Information.
For more information on the threads command, see threads Command.
Use the cont command to resume program execution. Currently, threads use synchronous breakpoints, so all threads resume execution.
However, you can resume a single thread using the call command with the -resumeone option (see call Command).
Consider the following two scenarios when debugging a multithreaded application where many threads call the function lookup():
You set a conditional breakpoint:
    stop in lookup -if strcmp(name, "troublesome") == 0
When t@1 stops at the call to lookup(), dbx attempts to evaluate the condition and calls strcmp().
You set a breakpoint:
    stop in lookup
When t@1 stops at the call to lookup(), you issue the command:
    call strcmp(name, "troublesome")
When calling strcmp(), dbx resumes all threads for the duration of the call, which is similar to what dbx does when you are single stepping with the next command. It does so because resuming only t@1 has the potential to cause a deadlock if strcmp() tries to grab a lock that is owned by another thread.
A drawback to resuming all threads in this case is that dbx cannot handle another thread, such as t@2, hitting the breakpoint at lookup() while strcmp() is being called. It emits a warning like one of the following:
    event infinite loop causes missed events in following handlers:

    Event reentrancy
        first event BPT(VID 6, TID 6, PC echo+0x8)
        second event BPT(VID 10, TID 10, PC echo+0x8)
    the following handlers will miss events:
In such cases, if you can ascertain that the function called in the conditional expression will not grab a mutex, you can use the -resumeone event modifier to force dbx to resume only t@1:
    stop in lookup -resumeone -if strcmp(name, "troublesome") == 0
Only the thread that hit the breakpoint in lookup() would be resumed in order to evaluate strcmp().
This approach does not help in cases such as the following:
If the second breakpoint on lookup() happens in the same thread because the conditional recursively calls lookup()
If the thread on which the conditional runs yields, sleeps, or in some manner relinquishes control to another thread
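The deadlock risk that leads dbx to resume all threads during a call can be modeled outside the debugger. The following Python sketch is only an analogy, not a description of how dbx is implemented: a helper thread owns a lock and is held "stopped" by an event, while the calling thread runs a function that needs the same lock. All names (`needs_lock`, `call_while_lock_held`, `resume_all`) are hypothetical, and a timeout stands in for what would be an indefinite hang in a real debugging session.

```python
import threading


def needs_lock(lock: threading.Lock, timeout: float) -> bool:
    """Stand-in for a called function that must take `lock` to finish."""
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


def call_while_lock_held(resume_all: bool, timeout: float = 0.5) -> bool:
    """Return True if the call completes, False if it would have deadlocked."""
    lock = threading.Lock()
    holding = threading.Event()  # set once the other thread owns the lock
    resumed = threading.Event()  # models the debugger resuming the other thread

    def other_thread() -> None:
        with lock:
            holding.set()
            resumed.wait()       # "stopped" while still owning the lock

    worker = threading.Thread(target=other_thread)
    worker.start()
    holding.wait()

    if resume_all:
        resumed.set()            # the lock owner runs and releases the lock
    completed = needs_lock(lock, timeout)

    resumed.set()                # always let the helper thread finish
    worker.join()
    return completed
```

With `resume_all=True` the call completes because the lock owner is allowed to run. With `resume_all=False` the call times out, which corresponds to the case where resuming only one thread is unsafe because the called function grabs a mutex.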
You can get an idea of how often your application creates and destroys threads by using the thr_create event and thr_exit event as in the following example:
    (dbx) trace thr_create
    (dbx) trace thr_exit
    (dbx) run

    trace: thread created t@2 on l@2
    trace: thread created t@3 on l@3
    trace: thread created t@4 on l@4
    trace: thr_exit t@4
    trace: thr_exit t@3
    trace: thr_exit t@2
The application created three threads. Note that the threads exited in reverse order of their creation, which might indicate that, if the application had more threads, they would accumulate and consume resources.
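For longer runs, the trace can be saved and tallied with a script instead of being read by eye. The following Python sketch counts creations and exits and reports the peak number of threads alive at once. It assumes the trace lines have exactly the two shapes shown in the example, and the function name `summarize_trace` is hypothetical.

```python
import re

# Line shapes taken from the example trace; other dbx output is ignored.
CREATED_RE = re.compile(r"^trace: thread created (t@\d+) on (l@\d+)$")
EXITED_RE = re.compile(r"^trace: thr_exit (t@\d+)$")


def summarize_trace(lines):
    """Tally thread creation and exit events from saved trace output."""
    live = set()
    created = exited = peak = 0
    for raw in lines:
        line = raw.strip()
        match = CREATED_RE.match(line)
        if match:
            created += 1
            live.add(match.group(1))
            peak = max(peak, len(live))
            continue
        match = EXITED_RE.match(line)
        if match:
            exited += 1
            live.discard(match.group(1))
    return {
        "created": created,
        "exited": exited,
        "peak_live": peak,
        "still_live": sorted(live),
    }
```

A `peak_live` value that keeps growing with the length of the run, or a non-empty `still_live` list at exit, is the kind of accumulation the trace is meant to reveal.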
To get more interesting information, you could try the following in a different session:
    (dbx) when thr_create { echo "XXX thread $newthread created by $thread"; }
    XXX thread t@2 created by t@1
    XXX thread t@3 created by t@1
    XXX thread t@4 created by t@1
The output shows that all three threads were created by thread t@1, which is a common pattern in multithreaded programs.
Suppose you want to debug thread t@3 from its outset. You could stop the application at the point that thread t@3 is created as follows:
    (dbx) stop thr_create t@3
    (dbx) run
    t@1 (l@1) stopped in tdb_event_create at 0xff38409c
    0xff38409c: tdb_event_create       :    retl
    Current function is main
    216       stat = (int) thr_create(NULL, 0, consumer, q, tflags, &tid_cons2);
    (dbx)
If your application occasionally spawns a new thread from thread t@5 instead of thread t@1, you could capture that event as follows:
    (dbx) stop thr_create -thread t@5
Normally, you need not be aware of LWPs. There are times, however, when thread-level queries cannot be completed. In these cases, use the lwps command to show information about LWPs.
    (dbx) lwps
        l@1 running in main()
        l@2 running in sigwait()
        l@3 running in _lwp_sema_wait()
      *>l@4 breakpoint in Queue_dequeue()
        l@5 running in _thread_start()
    (dbx)
Each line of the LWP list contains the following:
The * (asterisk) indicates that an event requiring user attention has occurred in this LWP.
The arrow denotes the current LWP.
l@number refers to a particular LWP.
The next item represents the LWP state.
in function_name() identifies the function that the LWP is currently executing.
Use the lwp command to list or change the current LWP.
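The LWP list format described above is simple enough to scan with a script, for example to find the current LWP in a saved listing. The following Python sketch assumes each line has the form `[*][>]l@number state in function_name()`. The names `parse_lwp_line` and `current_lwp` are hypothetical and are not dbx commands.

```python
import re

# Assumed layout: optional '*', optional '>', l@N, state, "in", function().
LWP_RE = re.compile(r"^\s*(\*)?(>)?\s*l@(\d+)\s+(.+?)\s+in\s+(\S+\(\))\s*$")


def parse_lwp_line(line):
    """Split one line of an lwps listing into its fields, or return None."""
    match = LWP_RE.match(line)
    if match is None:
        return None
    event, current, number, state, func = match.groups()
    return {
        "event": event == "*",      # an event requiring attention occurred
        "current": current == ">",  # this is the current LWP
        "lwp": int(number),
        "state": state,
        "func": func,
    }


def current_lwp(lines):
    """Return the number of the LWP marked with '>', or None if there is none."""
    for line in lines:
        entry = parse_lwp_line(line)
        if entry is not None and entry["current"]:
            return entry["lwp"]
    return None
```

Prompt lines such as `(dbx) lwps` do not match the pattern and are skipped.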