Multithreaded Programming Guide

Debugging a Multithreaded Program

Common Oversights

The following list points out some of the more frequent oversights that can cause bugs in multithreaded programs.

Passing a pointer to the caller's stack as an argument to a new thread.

Accessing global memory (shared changeable state) without the protection of a synchronization mechanism.

Creating deadlocks when two threads try to acquire the same pair of global resources in opposite order, so that one thread holds the first resource, the other holds the second, and neither can proceed until the other releases what it holds.

Trying to reacquire a lock already held (recursive deadlock).

Creating a hidden gap in synchronization protection. This is caused when a code segment protected by a synchronization mechanism contains a call to a function that frees and then reacquires the synchronization mechanism before it returns to the caller. The result is that it appears to the caller that the global data has been protected when it actually has not.

Mixing UNIX signals with threads. It is better to use the sigwait(2) model for handling asynchronous signals.

Using setjmp(3C) and longjmp(3C), and then long-jumping away without releasing the mutex locks.

Failing to reevaluate the conditions after returning from a call to *_cond_wait() or *_cond_timedwait().

Forgetting that threads are created PTHREAD_CREATE_JOINABLE by default and must be reclaimed with pthread_join(3THR). Note that pthread_exit(3THR) does not free the exiting thread's storage space.

Making deeply nested, recursive calls and using large automatic arrays. Both can cause problems because multithreaded programs have a more limited stack size than single-threaded programs.

Specifying an inadequate stack size, or using nondefault stacks.
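Several of the oversights above can be avoided with a few habits, sketched below under stated assumptions. The names `run_workers`, `worker`, `struct work`, and `MAX_WORKERS` are hypothetical and exist only for this illustration. The sketch passes each thread a heap-allocated argument instead of a pointer into the creator's stack, touches shared state only with a mutex held, rechecks the condition in a loop after every return from pthread_cond_wait(), and reclaims each joinable thread with pthread_join().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 16      /* hypothetical limit for this sketch */

/* Shared state: every access happens with `lock` held. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int finished = 0;    /* number of workers that have completed */
static long total = 0;      /* sum of the values handed to the workers */

/* Hypothetical per-thread argument, allocated on the heap by the creator. */
struct work {
    long value;
};

static void *worker(void *arg)
{
    struct work *w = arg;

    assert(w != NULL);

    pthread_mutex_lock(&lock);
    total += w->value;
    finished++;
    pthread_cond_signal(&all_done);
    pthread_mutex_unlock(&lock);

    free(w);                /* the worker owns its argument */
    return NULL;
}

/* Starts n workers, waits for all of them, and returns the accumulated
 * total, or -1 if n is out of range or a worker could not be started. */
long run_workers(int n)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    long result;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    pthread_mutex_lock(&lock);
    finished = 0;
    total = 0;
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++) {
        /* Heap storage stays valid until the worker frees it. The address
         * of a local variable might not stay valid that long. */
        struct work *w = malloc(sizeof *w);
        if (w == NULL)
            break;
        w->value = i + 1;
        if (pthread_create(&tids[started], NULL, worker, w) != 0) {
            free(w);
            break;
        }
        started++;
    }

    pthread_mutex_lock(&lock);
    while (finished < started)      /* recheck after every wakeup */
        pthread_cond_wait(&all_done, &lock);
    result = total;
    pthread_mutex_unlock(&lock);

    /* Joinable threads hold their resources until they are joined. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == n ? result : -1;
}
```

A `main` that calls `run_workers(4)` is enough to exercise the sketch. The condition variable is not strictly needed here, because pthread_join() already waits for each worker. It is included only to show the recheck loop.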

Note also that multithreaded programs (especially those containing bugs) often behave differently in two successive runs, given identical inputs, because of differences in the thread scheduling order.

In general, multithreading bugs are statistical rather than deterministic. Tracing is usually a more effective method of finding order-of-execution problems than breakpoint-based debugging.
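To illustrate why tracing suits order-of-execution problems, the following is a minimal sketch of an in-memory trace buffer. It is not a TNF interface. The names `trace`, `trace_buf`, `TRACE_CAP`, and `run_traced` are hypothetical, and the sketch assumes a C11 compiler that provides `<stdatomic.h>`. Each thread claims the next slot with an atomic increment, so the slot number records the order in which events were claimed without stopping any thread the way a breakpoint would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_CAP   1024    /* hypothetical capacity of the trace buffer */
#define MAX_THREADS 16      /* hypothetical limit for this sketch */

struct trace_rec {
    int thread_id;          /* application-level thread number */
    const char *event;      /* points to a string literal */
};

static struct trace_rec trace_buf[TRACE_CAP];
static atomic_uint trace_next;      /* next free slot */

/* Records one event. Events past TRACE_CAP are silently dropped. */
static void trace(int thread_id, const char *event)
{
    unsigned slot = atomic_fetch_add(&trace_next, 1);

    if (slot < TRACE_CAP) {
        trace_buf[slot].thread_id = thread_id;
        trace_buf[slot].event = event;
    }
}

static void *traced_worker(void *arg)
{
    /* The thread number is passed by value inside the pointer, so no
     * pointer to the creator's stack is involved. */
    int id = (int)(intptr_t)arg;

    trace(id, "start");
    trace(id, "end");
    return NULL;
}

/* Runs n traced threads, prints the trace in slot order, and returns the
 * number of records captured, or -1 if n is out of range. */
int run_traced(int n)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;
    unsigned count;

    if (n < 0 || n > MAX_THREADS)
        return -1;

    atomic_store(&trace_next, 0);

    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[started], NULL, traced_worker,
                           (void *)(intptr_t)i) != 0)
            break;
        started++;
    }

    /* Joining first guarantees that every record is completely written
     * before the buffer is read. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    count = atomic_load(&trace_next);
    if (count > TRACE_CAP)
        count = TRACE_CAP;

    for (unsigned i = 0; i < count; i++)
        printf("%u: thread %d %s\n", i, trace_buf[i].thread_id,
               trace_buf[i].event);

    return (int)count;
}
```

Calling `run_traced(3)` from `main` prints six records. Their interleaving can differ from run to run, which is the behavior the paragraph above describes.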

Tracing and Debugging With the TNF Utilities

Use the TNF utilities (included as part of the Solaris system) to trace, debug, and gather performance analysis information from your applications and libraries. The TNF utilities integrate trace information from the kernel and from multiple user processes and threads, and so are especially useful for multithreaded code.

See the prex(1) and tnfdump(1) manual pages for detailed information on using the TNF utilities.

Using truss(1)

See truss(1) for information on tracing system calls, signals, and user-level function calls.

Using mdb(1)

The following mdb commands can be used to access the LWPs of a multithreaded program.

Table 7–3 MT mdb Commands


`pid:A`	Attaches to process # `pid`. This stops the process and all its LWPs.
`:R`	Detaches from process. This resumes the process and all its LWPs.
`$L`	Lists all active LWPs in the (stopped) process.
`n:l`	Switches focus to LWP # `n`.
`$l`	Shows the LWP currently focused.
`num:i`	Ignores signal number `num`.

The following commands for setting conditional breakpoints are often useful.

Table 7–4 Setting mdb Breakpoints


`[label],[count]:b [expression]`	Breakpoint is detected when `expression` equals zero.
`foo,ffff:b <g7-0xabcdef`	Stops at `foo` when `g7` equals the hex value `0xABCDEF`.

Using `dbx`

With the dbx utility, you can debug and execute source programs written in C++, ANSI C, and FORTRAN. dbx accepts the same commands as the Debugger, but uses a standard terminal (TTY) interface. Both dbx and the Debugger support debugging multithreaded programs. For a full overview of dbx and Debugger features, see the dbx(1) reference manual page and the Using Sun Workshop user's guide.

All the dbx options listed in Table 7–5 can support multithreaded applications.

Table 7–5 dbx Options for MT Programs


Option	Meaning
`cont at line [sig signo id]`	Continues execution at `line` with signal `signo`. The `id`, if present, specifies which thread or LWP to continue. The default value is `all`.
`lwp [lwpid]`	Displays the current LWP. With `lwpid`, switches to the given LWP.
`lwps`	Lists all LWPs in the current process.
`next ... tid`	Steps the given thread. When a function call is skipped, all LWPs are implicitly resumed for the duration of that function call. Nonactive threads cannot be stepped.
`next ... lid`	Steps the given LWP. Does not implicitly resume all LWPs when skipping a function.
`step ... tid`	Steps the given thread. When a function call is skipped, all LWPs are implicitly resumed for the duration of that function call. Nonactive threads cannot be stepped.
`step ... lid`	Steps the given LWP. Does not implicitly resume all LWPs when skipping a function.
`stepi ... lid`	Steps the given LWP by one machine instruction.
`stepi ... tid`	Steps, by one machine instruction, the LWP on which the given thread is active.
`thread [ tid ]`	Displays the current thread. With `tid`, switches to thread `tid`. In all the following variations, an omitted `tid` implies the current thread.
`thread -info [ tid ]`	Prints everything known about the given thread.
`thread -locks [ tid ]`	Prints all locks held by the given thread.
`thread -suspend [ tid ]`	Puts the given thread into suspended state.
`thread -continue [ tid ]`	Unsuspends the given thread.
`thread -hide [ tid ]`	Hides the given (or current) thread. It will not appear in the generic `threads` listing.
`thread -unhide [ tid ]`	Unhides the given (or current) thread.
`thread -unhideall`	Unhides all threads.
`threads`	Prints the list of all known threads.
`threads -all`	Prints threads that are not usually printed (zombies).
`threads -mode all|filter`	Controls whether `threads` prints all threads or filters them by default.
`threads -mode auto|manual`	Enables automatic updating of the thread listing.
`threads -mode`	Echoes the current modes. Any of the previous forms can be followed by a thread or LWP ID to get the traceback for the specified entity.