The following discussion describes characteristics that can cause bugs in multithreaded programs. Utilities that you can use to help debug your program are also described.
The following list points out some of the more frequent oversights that can cause bugs in multithreaded programs.
A pointer to the caller's stack passed as an argument to a new thread.
Shared, changeable global memory accessed without the protection of a synchronization mechanism, which leads to a data race. A data race occurs when two or more threads in a single process access the same memory location concurrently, and at least one of the threads writes to the location. When the threads do not use exclusive locks to control their accesses to that memory, the order of accesses is nondeterministic, and the computation may give different results from run to run depending on that order. Some data races may be benign (for example, when the memory access is used for a busy-wait), but many data races are bugs in the program. The Thread Analyzer tool is useful for detecting data races. See Detecting Data Races and Deadlocks Using Thread Analyzer.
Deadlocks caused by two threads trying to acquire rights to the same pair of global resources in opposite order. One thread holds the first resource and the other holds the second resource. Neither thread can proceed until the other gives up its resource. The Thread Analyzer tool is also useful for detecting deadlocks. See Detecting Data Races and Deadlocks Using Thread Analyzer.
Trying to reacquire a lock already held (recursive deadlock).
Creating a hidden gap in synchronization protection. This gap occurs when a protected code segment calls a function that releases and then reacquires the synchronization mechanism before returning to the caller. The result is misleading: to the caller, the global data appears to have been protected throughout, when in fact another thread could have changed it while the mechanism was released.
Mixing UNIX signals with threads without using the sigwait(2) model for handling asynchronous signals.
Calling setjmp(3C) and longjmp(3C), and then long-jumping away without releasing the mutex locks.
Failing to re-evaluate the conditions after returning from a call to *_cond_wait() or *_cond_timedwait().
Forgetting that threads are created PTHREAD_CREATE_JOINABLE by default and must be reclaimed with pthread_join(3C). Note that pthread_exit(3C) does not free up the thread's storage space.
Making deeply nested, recursive calls and using large automatic arrays can cause problems because multithreaded programs have a more limited stack size than single-threaded programs.
Specifying an inadequate stack size, or using nondefault stacks.
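For the first pitfall in the list above, a pointer to the caller's stack passed to a new thread, one common fix is to give each thread its own heap-allocated argument that the thread frees when it is done. The following is a minimal sketch assuming POSIX threads; `struct worker_arg`, `worker`, and `start_workers` are hypothetical names, and the 64-thread limit is arbitrary.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread argument type. */
struct worker_arg {
    int id;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;      /* this thread now owns the argument */
    intptr_t result = (intptr_t)arg->id * 10;

    free(arg);                       /* release it when done */
    return (void *)result;
}

/* Starts n workers (0 <= n <= 64) and returns the sum of their results. */
long start_workers(int n)
{
    pthread_t tids[64];
    long total = 0;

    if (n < 0 || n > 64)
        return -1;
    for (int i = 0; i < n; i++) {
        /*
         * Buggy pattern: passing &i to pthread_create(). Every thread would
         * then read i through a pointer into this stack frame and see
         * whatever value the loop holds at that moment.
         */
        struct worker_arg *arg = malloc(sizeof *arg);
        if (arg == NULL)
            abort();
        arg->id = i;
        if (pthread_create(&tids[i], NULL, worker, arg) != 0)
            abort();
    }
    for (int i = 0; i < n; i++) {
        void *ret;
        pthread_join(tids[i], &ret);
        total += (long)(intptr_t)ret;
    }
    return total;
}
```

Calling `start_workers(4)` returns 60 (0 + 10 + 20 + 30). Because each argument is a separate allocation, no thread can observe a value that the loop has since overwritten.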
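The data-race pitfall in the list above can be illustrated with a shared counter. In this sketch, which assumes POSIX threads and uses hypothetical names, every increment of `counter` happens while `counter_lock` is held, so the final total does not depend on how the threads are scheduled. Removing the lock and unlock calls reintroduces exactly the kind of race that the Thread Analyzer is meant to detect.

```c
#include <pthread.h>
#include <stdint.h>

static long counter;                 /* shared, changeable global state */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *p)
{
    long iterations = (long)(intptr_t)p;   /* passed by value, not by pointer */

    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                   /* unprotected, this would be a data race */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads (1..16) threads that each add iterations to counter. */
long count_concurrently(int nthreads, long iterations)
{
    pthread_t tids[16];

    if (nthreads < 1 || nthreads > 16)
        return -1;
    counter = 0;                     /* no worker thread is running yet */
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, increment, (void *)(intptr_t)iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;                  /* safe: all writers have been joined */
}
```

`count_concurrently(4, 100000)` returns 400000 on every run; without the mutex, the result could vary from run to run.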
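One standard way to avoid the opposite-order deadlock described in the list above is to impose a single global lock order. The sketch below, assuming POSIX threads, orders two mutexes by address; the `account` type and the transfer functions are hypothetical. Two threads transfer in opposite directions, which is the pattern that can deadlock when each thread simply locks its source account first.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical account type: each balance has its own mutex. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Locks two distinct accounts in a fixed order: lower address first. */
static void lock_pair(struct account *a, struct account *b)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void transfer(struct account *from, struct account *to, long amount)
{
    lock_pair(from, to);             /* never "from first, then to" */
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&to->lock);
    pthread_mutex_unlock(&from->lock);
}

static struct account acct_a = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct account acct_b = { PTHREAD_MUTEX_INITIALIZER, 1000 };

static void *a_to_b(void *unused)
{
    (void)unused;
    for (int i = 0; i < 100000; i++)
        transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *b_to_a(void *unused)
{
    (void)unused;
    for (int i = 0; i < 100000; i++)
        transfer(&acct_b, &acct_a, 1);
    return NULL;
}

/* Runs opposing transfers concurrently; returns the combined balance. */
long run_transfers(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance + acct_b.balance;
}
```

`run_transfers()` returns 2000, the conserved total. Ordering by address is only one choice; any fixed order that every thread follows works.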
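A recursive deadlock on a default mutex typically hangs without any message. During debugging, an error-checking mutex can turn the hang into a reported error: POSIX specifies that relocking a `PTHREAD_MUTEX_ERRORCHECK` mutex from the thread that already owns it fails with `EDEADLK`. The function name below is hypothetical, and the `_XOPEN_SOURCE` definition is an assumption made so that the mutex type is declared under strict compiler modes.

```c
#define _XOPEN_SOURCE 600            /* assumption: exposes PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Locks a mutex twice from the same thread; returns the second call's result. */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int err;

    pthread_mutexattr_init(&attr);
    /* An error-checking mutex reports a relock instead of hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    /* With a default mutex, this second call could hang forever. */
    err = pthread_mutex_lock(&m);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return err;
}
```

`relock_result()` returns `EDEADLK`. If a code path genuinely needs to relock, a `PTHREAD_MUTEX_RECURSIVE` mutex is the alternative, although restructuring the code is often clearer.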
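The sigwait(2) model mentioned in the list above blocks asynchronous signals in every thread and dedicates one thread to receiving them synchronously. This sketch assumes the POSIX two-argument form of `sigwait()`; on Solaris, defining `_POSIX_C_SOURCE` as the sketch does (or compiling with `-D_POSIX_PTHREAD_SEMANTICS`) should select that form. `signal_thread` and `wait_for_sigusr1` are hypothetical names, and `kill()` on the process itself stands in for a signal that would normally arrive from outside.

```c
#define _POSIX_C_SOURCE 200112L      /* assumption: request POSIX sigwait() */
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

static sigset_t wait_set;            /* signals handled by the signal thread */

static void *signal_thread(void *unused)
{
    int sig = 0;

    (void)unused;
    /* Sleeps until a signal in wait_set is pending, then accepts it. */
    if (sigwait(&wait_set, &sig) != 0)
        sig = -1;
    return (void *)(intptr_t)sig;
}

/* Returns the signal number received by the dedicated thread. */
int wait_for_sigusr1(void)
{
    pthread_t tid;
    void *ret;

    sigemptyset(&wait_set);
    sigaddset(&wait_set, SIGUSR1);
    /*
     * Block SIGUSR1 before creating threads: new threads inherit the mask,
     * so the signal stays pending until sigwait() accepts it instead of
     * interrupting an arbitrary thread.
     */
    pthread_sigmask(SIG_BLOCK, &wait_set, NULL);
    pthread_create(&tid, NULL, signal_thread, NULL);
    kill(getpid(), SIGUSR1);
    pthread_join(tid, &ret);
    return (int)(intptr_t)ret;
}
```

`wait_for_sigusr1()` returns `SIGUSR1`. In a real program the signal thread would typically loop on `sigwait()` and dispatch each signal rather than return after one.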
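For the setjmp(3C) and longjmp(3C) pitfall in the list above, the essential rule is that any path that jumps out of a locked region must release the lock first, because longjmp() skips the unlock calls later in the function. The following is a minimal sketch with hypothetical names; the single global `jmp_buf` is for illustration only, since each thread would need its own.

```c
#include <pthread.h>
#include <setjmp.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static jmp_buf recovery;             /* illustration only: one per thread in real code */

/* Hypothetical update that may bail out while holding data_lock. */
static void update_or_bail(int fail)
{
    pthread_mutex_lock(&data_lock);
    if (fail) {
        /* Release first: longjmp() skips the unlock at the end of this function. */
        pthread_mutex_unlock(&data_lock);
        longjmp(recovery, 1);
    }
    /* The protected update would go here. */
    pthread_mutex_unlock(&data_lock);
}

/* Returns 1 if the update bailed out through longjmp(), 0 otherwise. */
int try_update(int fail)
{
    if (setjmp(recovery) != 0)
        return 1;
    update_or_bail(fail);
    return 0;
}

/* Returns 1 if no thread holds data_lock. */
int lock_is_free(void)
{
    if (pthread_mutex_trylock(&data_lock) != 0)
        return 0;
    pthread_mutex_unlock(&data_lock);
    return 1;
}
```

After `try_update(1)` returns 1, `lock_is_free()` still returns 1. Deleting the unlock before `longjmp()` would leave `data_lock` held, and the next `pthread_mutex_lock()` on it could block forever.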
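The condition-variable pitfall in the list above is usually avoided by waiting in a loop that re-tests the predicate. In the sketch below, which assumes POSIX threads and a hypothetical `items` counter, each new item wakes every waiting consumer with `pthread_cond_broadcast()`, but only one of them can take it; the `while` loop sends the others back to waiting. The same loop also protects against spurious wakeups, which POSIX permits.

```c
#include <pthread.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static int items;                    /* hypothetical count of queued work */

static void *consumer(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&q_lock);
    /*
     * "while", not "if": after any wakeup (a broadcast or a spurious
     * wakeup), another consumer may already have taken the item.
     */
    while (items == 0)
        pthread_cond_wait(&q_nonempty, &q_lock);
    items--;
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

static void produce(void)
{
    pthread_mutex_lock(&q_lock);
    items++;
    pthread_cond_broadcast(&q_nonempty);   /* wakes every waiting consumer */
    pthread_mutex_unlock(&q_lock);
}

/* Starts n consumers (1..16), produces n items, returns the items left. */
int run_consumers(int n)
{
    pthread_t tids[16];

    if (n < 1 || n > 16)
        return -1;
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, consumer, NULL);
    for (int i = 0; i < n; i++)
        produce();
    for (int i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    return items;                    /* 0 once every consumer has run */
}
```

`run_consumers(8)` returns 0. With `if` in place of `while`, a consumer that woke after another consumer had taken the item could decrement `items` below zero.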
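The last three pitfalls in the list above (reclaiming joinable threads, and providing enough stack for deep recursion or large automatic arrays) can be handled when the thread is created. This sketch assumes POSIX threads; `big_frame_worker` and `run_with_stack` are hypothetical, and the 512 KB array and 4 MB stack are arbitrary sizes chosen for illustration. `pthread_attr_setstacksize()` rejects sizes below `PTHREAD_STACK_MIN`, so the function reports that case as a failure.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical worker with a large automatic array on its stack. */
static void *big_frame_worker(void *unused)
{
    unsigned char buf[512 * 1024];
    intptr_t sum = 0;

    (void)unused;
    memset(buf, 1, sizeof buf);
    for (size_t i = 0; i < sizeof buf; i++)
        sum += buf[i];
    return (void *)sum;
}

/* Runs the worker on a stack of the given size; returns -1 on failure. */
long run_with_stack(size_t stacksize)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *ret;
    int err;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* Joinable is already the default; set here only to make it explicit. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    err = pthread_attr_setstacksize(&attr, stacksize);
    if (err == 0)
        err = pthread_create(&tid, &attr, big_frame_worker, NULL);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return -1;
    /* pthread_join() reclaims the thread's storage; thread exit alone does not. */
    pthread_join(tid, &ret);
    return (long)(intptr_t)ret;
}
```

`run_with_stack((size_t)4 * 1024 * 1024)` returns 524288 (one for each byte of the array), while `run_with_stack(1)` returns -1 because the requested size is below the minimum.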
Multithreaded programs, especially those containing bugs, often behave differently in two successive runs, even with identical inputs. This behavior is caused by differences in the order that threads are scheduled.
In general, multithreading bugs are statistical instead of deterministic. Tracing is usually a more effective method than breakpoint-based debugging for finding order-of-execution problems.
DTrace is a comprehensive dynamic tracing facility that is built into the Solaris OS. The DTrace facility can be used to examine the behavior of your multithreaded program. DTrace inserts probes into running programs to collect data at points in the execution path that you specify. The collected data can be examined to determine problem areas. See the Solaris Dynamic Tracing Guide and the DTrace User Guide for more information about using DTrace.
The Sun Developers Network web site contains several articles about DTrace, including the DTrace Quick Reference Guide.
The Performance Analyzer tool, included in the Sun Studio software, can be used for extensive profiling of multithreaded and single-threaded programs. The tool enables you to see in detail what a thread is doing at any given point. See the Sun Studio web page and Sun Studio Information Center for more information.
The Sun Studio software includes a tool called the Thread Analyzer. This tool enables you to analyze the execution of a multithreaded program. It can detect multithreaded programming errors such as data races or deadlocks in code that is written using the Pthread API, the Solaris thread API, OpenMP directives, Sun parallel directives, Cray® parallel directives, or a mix of these technologies.
See the Sun Studio 12: Thread Analyzer User’s Guide.
The dbx utility is a debugger included in the Sun Studio developer tools, available from http://developers.sun.com/sunstudio/. With the Sun Studio dbx command-line debugger, you can debug and execute source programs that are written in C, C++, and Fortran. You can use dbx by starting it in a terminal window and interactively debugging your program with dbx commands. If you prefer a graphical interface, you can use the same dbx functionality in the Debugging windows of the Sun Studio IDE (Integrated Development Environment). For a description of how to start dbx, see the dbx(1) man page. See the manual Sun Studio 12: Debugging a Program With dbx for an overview of dbx. The Debugging features in the Sun Studio IDE are described in the IDE online help.
See Chapter 11, Debugging Multithreaded Applications, in Sun Studio 12: Debugging a Program With dbx for detailed information about debugging multithreaded programs. The dbx debugger provides commands to manipulate event handlers for thread events, which are described in Appendix B, Event Management, in Sun Studio 12: Debugging a Program With dbx.
All the dbx options that are listed in Table 8–1 can support multithreaded applications.
Table 8–1 dbx Options for MT Programs
Option | Action |
---|---|
cont at line [-sig signo id] | Continues execution at line with signal signo. The id, if present, specifies which thread or LWP to continue. The default value is all. |
lwp [lwpid] | Displays the current LWP, or switches to the LWP lwpid if one is given. |
lwps | Lists all LWPs in the current process. |
next ... tid | Steps the given thread. When a function call is skipped, all LWPs are implicitly resumed for the duration of that function call. Nonactive threads cannot be stepped. |
next ... lwpid | Steps the given LWP. Does not implicitly resume all LWPs when skipping a function. |
step ... tid | Steps the given thread. When a function call is skipped, all LWPs are implicitly resumed for the duration of that function call. Nonactive threads cannot be stepped. |
step ... lwpid | Steps the given LWP. Does not implicitly resume all LWPs when skipping a function. |
stepi ... lwpid | Steps machine instructions (stepping into calls) in the given LWP. |
stepi ... tid | Steps machine instructions in the LWP on which the given thread is active. |
thread [ tid ] | Displays the current thread, or switches to thread tid. In all the following variations, omitting tid implies the current thread. |
thread -info [ tid ] | Prints everything known about the given thread. |
thread -blocks [ tid ] | Prints all locks held by the given thread that are blocking other threads. |
thread -suspend [ tid ] | Puts the given thread into a suspended state, which prevents it from running. A suspended thread is displayed with an “S” in the threads listing. |
thread -resume [ tid ] | Unsuspends the given thread so that it resumes running. |
thread -hide [ tid ] | Hides the given or current thread. The thread does not appear in the generic threads listing. |
thread -unhide [ tid ] | Unhides the given or current thread. |
thread -unhide all | Unhides all threads. |
threads | Prints the list of all known threads. |
threads -all | Prints threads that are not usually printed (zombies). |
threads -mode all\|filter | Controls whether threads prints all threads or filters threads by default. When filtering is on, threads that have been hidden by the thread -hide command are not listed. |
threads -mode auto\|manual | Enables (auto) or disables (manual) automatic updating of the thread listing. |
threads -mode | Echoes the current modes. Any of the previous forms can be followed by a thread or LWP ID to get the traceback for the specified entity. |
Although DTrace, Performance Analyzer, Thread Analyzer, and dbx are more modern tools, you can still use the older TNF utilities to trace, debug, and gather performance analysis information from your applications and libraries. The TNF utilities integrate trace information from the kernel as well as from multiple user processes and threads. The TNF utilities have long been included as part of the Solaris software. See the tracing(3TNF) man page for information about these utilities.
See the truss(1) man page for information on tracing system calls, signals, and user-level function calls.
For information about mdb, see the Solaris Modular Debugger Guide.
The following mdb commands can be used to access the LWPs of a multithreaded program.
Prints the LWP ID of the representative thread if the target is a user process.
Prints the LWP IDs of each LWP in the target if the target is a user process.
Attaches to process # pid.
Releases the previously attached process or core file. The process can subsequently be continued with prun(1), or it can be resumed by attaching MDB or another debugger.
The following commands for setting conditional breakpoints are often useful.
Sets a breakpoint at the specified locations.
Deletes the event specifiers with the given ID number.