Multithreaded Programming Guide

Chapter 5 Programming With the Solaris Software

This chapter describes how multithreading interacts with the Solaris software and how the software has changed to support multithreading.

Forking Issues in Process Creation

The default handling of fork() in the Solaris 9 product and earlier Solaris releases is somewhat different from the way fork() is handled in POSIX threads. For Solaris releases after Solaris 9, fork() behaves as specified for POSIX threads in all cases.

Table 5–1 compares the differences and similarities of fork() handling in Solaris threads and pthreads. When the comparable interface is not available either in POSIX threads or in Solaris threads, the `—' character appears in the table column.

Table 5–1 Comparing POSIX and Solaris fork() Handling

Solaris Interface 

POSIX Threads Interface 

Fork-one model 




Fork-all model 



Fork safety 



Fork-One Model

As shown in Table 5–1, the behavior of the pthreads fork(2) function is the same as the behavior of the Solaris fork1(2) function. Both the pthreads fork(2) function and the Solaris fork1(2) function create a new process, duplicating the complete address space in the child. However, both functions duplicate only the calling thread in the child process.

Duplication of the calling thread in the child process is useful when the child process immediately calls exec(), which is what happens after most calls to fork(). In this case, the child process does not need a duplicate of any thread other than the thread that called fork().

In the child, do not call any library functions after calling fork() and before calling exec(). One of the library functions might use a lock that was held in the parent at the time of the fork(). The child process may execute only Async-Signal-Safe operations until one of the exec() handlers is called. See Signal Handlers and Async-Signal Safety for more information about Async-Signal-Safe functions.

Fork-One Safety Problem and Solution

Besides the usual concerns such as locking shared data, a library should be well behaved with respect to forking a child process when only the thread that called fork() is running. The problem is that the sole thread in the child process might try to grab a lock held by a thread not duplicated in the child.

Most programs are not likely to encounter this problem. Most programs call exec() in the child right after the return from fork(). However, if the program has to carry out actions in the child before calling exec(), or never calls exec(), then the child could encounter deadlocks. Each library writer should provide a safe solution, although not providing a fork-safe library is not a large concern because this condition is rare.

For example, assume that T1 is in the middle of printing something and holds a lock for printf(), when T2 forks a new process. In the child process, if the sole thread (T2) calls printf(), T2 promptly deadlocks.

The POSIX fork() or Solaris fork1() function duplicates only the thread that calls fork() or fork1() . If you call Solaris forkall() to duplicate all threads, this issue is not a concern.

However, forkall() can cause other problems and should be used with care. For instance, if a thread calls forkall(), the parent thread performing I/O to a file is replicated in the child process. Both copies of the thread will continue performing I/O to the same file, one in the parent and one in the child, leading to malfunctions or file corruption.

To prevent deadlock when calling fork1(), ensure that no locks are being held at the time of forking. The most obvious way to prevent deadlock is to have the forking thread acquire all the locks that could possibly be used by the child. Because you cannot acquire all locks for printf() because printf() is owned by libc, you must ensure that printf() is not being used at fork() time.

Tip –

The Thread Analyzer utility included in the Sun Studio software enables you to detect deadlocks in a running program. See Sun Studio 12: Thread Analyzer User’s Guide for more information.

To manage the locks in your library, you should perform the following actions:

In the following example, the list of locks used by the library is { L1,...Ln}. The locking order for these locks is also L1...Ln.


When using either Solaris threads or POSIX threads, you can add a call to pthread_atfork(f1, f2, f3) in your library's .init() section. The f1(), f2(), f3() are defined as follows:

f1() /* This is executed just before the process forks. */
 mutex_lock(L1); |
 mutex_lock(...); | -- ordered in lock order
 mutex_lock(Ln); |
 } V

f2() /* This is executed in the child after the process forks. */

f3() /* This is executed in the parent after the process forks. */

Virtual Forks–vfork

The standard vfork(2) function is unsafe in multithreaded programs. vfork(2) , like fork1(2), copies only the calling thread in the child process. As in nonthreaded implementations, vfork() does not copy the address space for the child process.

Be careful that the thread in the child process does not change memory before the thread calls exec(2). vfork() gives the parent address space to the child. The parent gets its address space back after the child calls exec() or exits. The child must not change the state of the parent.

For example, disastrous problems occur if you create new threads between the call to vfork() and the call to exec().

Solution: pthread_atfork

Use pthread_atfork() to prevent deadlocks whenever you use the fork-one model.

#include <pthread.h>

int pthread_atfork(void (*prepare) (void), void (*
parent) (void),
    void (*child) (void) );

The pthread_atfork() function declares fork() handlers that are called before and after fork() in the context of the thread that called fork().

Any handler argument can be set to NULL. The order in which successive calls to pthread_atfork() are made is significant.

For example, a prepare handler could acquire all the mutexes needed. Then the parent and child handlers could release the mutexes. The prepare handler acquiring all required mutexes ensures that all relevant locks are held by the thread calling the fork function before the process is forked. This technique prevents a deadlock in the child.

See the pthread_atfork(3C) man page for more information.

Fork-All Model

The Solaris forkall(2) function duplicates the address space and all the threads in the child. Address space duplication is useful, for example, when the child process never calls exec(2) but does use its copy of the parent address space.

When one thread in a process calls Solaris forkall(2), threads that are blocked in an interruptible system call will return EINTR.

Be careful not to create locks that are held by both the parent and child processes. Locks held in both parent and child processes occur when locks are allocated in shared memory by calling mmap() with the MAP_SHARED flag. This problem does not occur if the fork-one model is used.

Choosing the Right Fork

Starting with the Solaris 10 release, a call to fork() is identical to a call to fork1(). Specifically, only the calling thread is replicated in the child process. The behavior is the same as the POSIX fork().

In previous releases of the Solaris software, the behavior of fork() was dependent on whether the application was linked with the POSIX threads library. When linked with -lthread (Solaris threads) but not linked with -lpthread (POSIX threads), fork() was the same as forkall(). When linked with -lpthread, regardless of whether fork() was also linked with -lthread , fork() was the same as fork1().

Starting with the Solaris 10 release, neither -lthread nor -lpthread is required for multithreaded applications. The -mt option is used to indicate that you are compiling a multithreaded application. The standard C library provides all threading support for both sets of application program interfaces. Applications that require replicate all fork semantics must call forkall().

Process Creation: exec and exit Issues

Both the exec(2) and exit(2) system calls work as these functions do in single-threaded processes with the following exception. In a multithreaded application, the functions destroy all the threads in the address space. Both calls block until all the execution resources, and so all active threads, are destroyed.

When exec() rebuilds the process, exec() creates a single lightweight process (LWP). The process startup code builds the initial thread. As usual, if the initial thread returns, the thread calls exit() and the process is destroyed.

When all the threads in a process exit, the process exits. A call to any exec() function from a process with more than one thread terminates all threads, and loads and executes the new executable image. No destructor functions are called.

Timers, Alarms, and Profiling

Over several releases, the Solaris OS has evolved to a per-process mode for alarms, interval timers, and profiling.


All timers are per-process except for the real time profile interval timer, which is per_LWP. See the setitimer(2) man page for a description of the ITIMER_REALPROF timer.

The timer IDs of per-process timers are usable from any LWP. The expiration signals are generated for the process rather than directed to a specific LWP.

The per-process timers are deleted only by timer_delete(3RT), or when the process terminates.


Alarms operate at the process level, not at the thread level. The alarm() function sends the signal SIGALRM to the calling process rather than the calling thread.

Profiling a Multithreaded Program

The profil() system call for multithreaded processes has global impact on all LWPs and threads in the process. Threads cannot use profil() for individual thread profiling. See the profil(2) man page for more information.

Tip –

The Performance Analyzer tool, included in the Sun Studio software, can be used for extensive profiling of multithreaded and single threaded programs. The tool enables you to see in detail what a thread is doing at any given point. See the Sun Studio web page and Sun Studio Information Center for more information.

Nonlocal Goto: setjmp and longjmp

The scope of setjmp() and longjmp() is limited to one thread, which is acceptable most of the time. However, the limited scope does mean that a thread that handles a signal can execute a longjmp() only when a setjmp() is performed in the same thread.

Resource Limits

Resource limits are set on the entire process and are determined by adding the resource use of all threads in the process. When a soft resource limit is exceeded, the offending thread is sent the appropriate signal. The sum of the resources that are used in the process is available through getrusage(3C).

LWPs and Scheduling Classes

The Solaris kernel has three ranges of dispatching priority. The highest-priority range (100 to 159) corresponds to the Realtime (RT) scheduling class. The middle-priority range (60 to 99) corresponds to the system (SYS) scheduling class. The system class cannot be applied to a user process. The lowest-priority range (0 to 59) is shared by the timesharing (TS), interactive (IA), fair-share (FSS), and fixed priority (FX) scheduling classes.

A scheduling class is maintained for each LWP. When a process is created, the initial LWP inherits the scheduling class and priority of the creating LWP in the parent process. As more threads are created, their associated LWPs also inherit this scheduling class and priority.

Threads have the scheduling class and priority of their underlying LWPs. Each LWP in a process can have a unique scheduling class and priority that are visible to the kernel.

Thread priorities regulate contention for synchronization objects. By default, LWPs are in the timesharing class. For compute-bound multithreading, thread priorities are not very useful. For multithreaded applications that use the MT libraries to do synchronization frequently, thread priorities are more meaningful.

The scheduling class is set by priocntl(2). How you specify the first two arguments determines whether only the calling LWP or all the LWPs of one or more processes are affected. The third argument of priocntl() is the command, which can be one of the following commands.

The user-level priority of an LWP is its priority within its class, not its dispatch priority. This does not change over time except by the application of the priocntl() system call. The kernel determines the dispatch priority of an LWP based on its scheduling class, its priority within that class, and possibly other factors such as its recently-used CPU time.

Timeshare Scheduling

Timeshare scheduling attempts to distribute processor resources fairly among the LWPs in the timesharing (TS) and interactive (IA) scheduling classes.

The priocntl(2) call sets the class priority of one or more processes or LWPs. The normal range of timesharing class priorities is -60 to +60. The higher the value, the higher the kernel dispatch priority. The default timesharing class priority is 0.

The old concept of a nice value for a process, where a lower nice value means a higher priority, is maintained for all of the TS, IA, and FSS scheduling classes. The old nice-based setpriority(3C) and nice(2) interfaces continue to work by mapping nice values into priority values. Setting a nice value changes the priority and vice-versa. The range of nice values is -20 to +20. A nice value of 0 corresponds to a priority of 0. A nice value of -20 corresponds to a priority of +60.

The dispatch priority of time-shared LWPs is calculated from the instantaneous CPU use rate of the LWP and from its class priority. The class priority indicates the relative priority of the LWPs to the timeshare scheduler.

LWPs with a smaller class priority value get a smaller, but nonzero, share of the total processing. An LWP that has received a larger amount of processing is given lower dispatch priority than an LWP that has received little or no processing.

Realtime Scheduling

The Realtime class (RT) can be applied to a whole process or to one or more LWPs in a process. You must have superuser privilege to use the Realtime class.

The normal range of realtime class priorities is 0 to 59. The dispatch priority of an LWP in the realtime class is fixed at its class priority plus 100.

The scheduler always dispatches the highest-priority Realtime LWP. The high-priority Realtime LWP preempts a lower-priority LWP when a higher-priority LWP becomes runnable. A preempted LWP is placed at the head of its level queue.

A Realtime LWP retains control of a processor until the LWP is preempted, the LWP suspends, or its Realtime priority is changed. LWPs in the RT class have absolute priority over processes in the TS class.

A new LWP inherits the scheduling class of the parent process or LWP. An RT class LWP inherits the parent's time slice, whether finite or infinite.

A finite time slice LWP runs until the LWP terminates, blocks on an I/O event, gets preempted by a higher-priority runnable Realtime process, or the time slice expires.

An LWP with an infinite time slice ceases execution only when the LWP terminates, blocks, or is preempted.

Fair Share Scheduling

The fair share scheduler (FSS) scheduling class allows allocation of CPU time based on shares.

The normal range of fair share scheduler class priorities is -60 to 60, which get mapped by the scheduler into dispatch priorities in the same range (0 to 59) as the TS and IA scheduling classes. All LWPs in a process must run in the same scheduling class. The FSS class schedules individual LWPs, not whole processes. Thus, a mix of processes in the FSS and TS/IA classes could result in unexpected scheduling behavior in both cases.

The TS/IA or the FSS scheduling class processes do not compete for the same CPUs. Processor sets enable mixing TS/IA with FSS in a system. However, all processes in each processor set must be in either the TS/IA or the FSS scheduling class.

Fixed Priority Scheduling

The FX, fixed priority, scheduling class assigns fixed priorities and time quantum not adjusted to accommodate resource consumption. Process priority can be changed only by the process that assigned the priority or an appropriately privileged process. For more information about FX, see the priocntl(1) and dispadmin(1M) man pages.

The normal range of fixed priority scheduler class priorities is 0 to 60, which get mapped by the scheduler into dispatch priorities in the same range (0 to 59) as the TS and IA scheduling classes.

Extending Traditional Signals

The traditional UNIX signal model is extended to threads in a fairly natural way. The key characteristics are that the signal disposition is process-wide, but the signal mask is per-thread. The process-wide disposition of signals is established using the traditional mechanisms signal(3C), sigaction(2), and so on.

When a signal handler is marked SIG_DFL or SIG_IGN, the action on receipt of a signal is performed on the entire receiving process. These signals include exit, core dump, stop, continue, and ignore. The action on receipt of these signals is carried out on all threads in the process. Therefore, the issue of which thread picks the signal is nonexistent. The exit, core dump, stop, continue, and ignore signals have no handlers. See the signal.h(3HEAD) man page for basic information about signals.

Each thread has its own signal mask. The signal mask lets a thread block some signals while the thread uses memory or another state that is also used by a signal handler. All threads in a process share the set of signal handlers that are set up by sigaction(2) and its variants.

A thread in one process cannot send a signal to a specific thread in another process. A signal sent by kill(2), sigsend(2), or sigqueue(3RT) to a process is handled by any receptive threads in the process.

Signals are divided into the following categories: traps, exceptions, and interrupts. Traps and exceptions are synchronously generated signals. Interrupts are asynchronously generated signals.

As in traditional UNIX, if a signal is pending, additional occurrences of that signal normally have no additional effect. A pending signal is represented by a bit, not by a counter. However, signals that are posted through the sigqueue(3RT) interface allow multiple instances of the same signal to be queued to the process.

As is the case with single-threaded processes, when a thread receives a signal while blocked in a system call, the thread might return early. When a thread returns early, the thread either returns an EINTR error code, or, in the case of I/O calls, with fewer bytes transferred than requested.

Of particular importance to multithreaded programs is the effect of signals on pthread_cond_wait(3C). This call usually returns without error, a return value of zero, only in response to a pthread_cond_signal(3C) or a pthread_cond_broadcast(3C). However, if the waiting thread receives a traditional UNIX signal, pthread_cond_wait() returns with a return value of zero even though the wakeup was spurious.

Synchronous Signals

Traps, such as SIGILL, SIGFPE, and SIGSEGV, result from an operation on the thread, such as dividing by zero or making reference to nonexistent memory. A trap is handled only by the thread that caused the trap. Several threads in a process can generate and handle the same type of trap simultaneously.

The idea of signals to individual threads is easily extended for synchronously generated signals. The handler is invoked on the thread that generated the synchronous signal.

However, if the process chooses not to establish an appropriate signal handler, the default action is taken when a trap occurs. The default action occurs even if the offending thread is blocked on the generated signal. The default action for such signals is to terminate the process, perhaps with a core dump.

Such a synchronous signal usually means that something is seriously wrong with the whole process, not just with a thread. In this case, terminating the process is often a good choice.

Asynchronous Signals

Interrupts, such as SIGINT and SIGIO, are asynchronous with any thread and result from some action outside the process. These interrupts might be signals sent explicitly by another process, or might represent external actions such as a user typing a Control-C.

An interrupt can be handled by any thread whose signal mask allows the interrupt. When more than one thread is able to receive the interrupt, only one thread is chosen.

When multiple occurrences of the same signal are sent to a process, then each occurrence can be handled by a separate thread. However, the available threads must not have the signal masked. When all threads have the signal masked, then the signal is marked pending and the first thread to unmask the signal handles the signal.

Continuation Semantics

Continuation semantics are the traditional way to deal with signals. When a signal handler returns, control resumes where the process was at the time of the interruption. This control resumption is well suited for asynchronous signals in single-threaded processes, as shown in Example 5–1.

This control resumption is also used as the exception-handling mechanism in other programming languages, such as PL/1.

Example 5–1 Continuation Semantics

unsigned int nestcount;

unsigned int A(int i, int j) {

    if (i==0)
    else if (j==0)
        return(A(i-1, 1));
        return(A(i-1, A(i, j-1)));

void sig(int i) {
    printf("nestcount = %d\n", nestcount);

main() {
    sigset(SIGINT, sig);

Operations on Signals

This section describes the operations on signals.

Setting the Thread's Signal Mask

Sending a Signal to a Specific Thread

Waiting for a Specified Signal

Waiting for Specified Signal Within a Given Time

Setting the Thread's Signal Mask

pthread_sigmask(3C) does for a thread what sigprocmask(2) does for a process. pthread_sigmask() sets the thread's signal mask. When a new thread is created, its initial mask is inherited from its creator.

The call to sigprocmask() in a multithreaded process is equivalent to a call to pthread_sigmask(). See the sigprocmask(2) man page for more information.

Sending a Signal to a Specific Thread

pthread_kill(3C) is the thread analog of kill(2). A pthread_kill() call sends a signal to a specific thread. A signal that is sent to a specified thread is different from a signal that is sent to a process. When a signal is sent to a process, the signal can be handled by any thread in the process. A signal sent by pthread_kill() can be handled only by the specified thread.

You can use pthread_kill() to send signals only to threads in the current process. Because the thread identifier, type thread_t, is local in scope, you cannot name a thread outside the scope of the current process.

On receipt of a signal by the target thread, the action invoked (handler, SIG_DFL, or SIG_IGN) is global, as usual. If you send SIGXXX to a thread, and SIGXXX to kill a process, the whole process is killed when the target thread receives the signal.

Waiting for a Specified Signal

For multithreaded programs, sigwait(2) is the preferred interface to use because sigwait() deals well with asynchronously generated signals.

sigwait() causes the calling thread to wait until any signal identified by the sigwait() function's set argument is delivered to the thread. While the thread is waiting, signals identified by the set argument are unmasked, but the original mask is restored when the call returns.

All signals identified by the set argument must be blocked on all threads, including the calling thread. Otherwise, sigwait() might not work correctly.

Use sigwait() to separate threads from asynchronous signals. You can create one thread that listens for asynchronous signals while you create other threads to block any asynchronous signals set to this process.

The following example shows the syntax of sigwait() .

#include <signal.h>
int sigwait(const sigset_t *set, int *sig

When the signal is delivered, sigwait() clears the pending signal and places the signal number in sig. Many threads can call sigwait() at the same time, but only one thread returns for each signal that is received.

With sigwait(), you can treat asynchronous signals synchronously. A thread that deals with such signals calls sigwait() and returns as soon as a signal arrives. By ensuring that all threads, including the caller of sigwait(), mask asynchronous signals, ensures signals are handled only by the intended handler and are handled safely.

By always masking all signals in all threads and calling sigwait() as necessary, your application is much safer for threads that depend on signals.

Usually, you create one or more threads that call sigwait() to wait for signals. Because sigwait() retrieves even masked signals, be sure to block the signals of interest in all other threads so the signals are not accidentally delivered.

When a signal arrives, a signal-handling thread returns from sigwait() , handles the signal, and calls sigwait() again to wait for more signals. The signal-handling thread is not restricted to using Async-Signal-Safe functions. The signal-handling thread can synchronize with other threads in the usual way. The Async-Signal-Safe category is defined in MT Interface Safety Levels.

Note –

sigwait() cannot receive synchronously generated signals.

Waiting for Specified Signal Within a Given Time

sigtimedwait(3RT) is similar to sigwait(2) except that sigtimedwait() fails and returns an error when a signal is not received in the indicated amount of time. See the sigtimedwait(3RT) man page for more information.

Thread-Directed Signals

The UNIX signal mechanism is extended with the idea of thread-directed signals. Thread-directed signals are just like ordinary asynchronous signals, except that thread-directed signals are sent to a particular thread instead of to a process.

A separate thread that waits for asynchronous signals can be safer and easier than installing a signal handler that processes the signals.

A better way to deal with asynchronous signals is to treat these signals synchronously. By calling sigwait(2), a thread can wait until a signal occurs. See Waiting for a Specified Signal.

Example 5–2 Asynchronous Signals and sigwait(2)

main() {
    sigset_t set;
    void runA(void);
    int sig;

    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    pthread_create(NULL, 0, runA, NULL, PTHREAD_DETACHED, NULL);

    while (1) {
        sigwait(&set, &sig);
        printf("nestcount = %d\n", nestcount);
        printf("received signal %d\n", sig);

void runA() {

This example modifies the code of Example 5–1. The main routine masks the SIGINT signal, creates a child thread that calls function A of the previous example, and issues sigwait() to handle the SIGINT signal.

Note that the signal is masked in the compute thread because the compute thread inherits its signal mask from the main thread. The main thread is protected from SIGINT while, and only while, the thread is not blocked inside of sigwait().

Also, note that no danger exists of having system calls interrupted when you use sigwait().

Completion Semantics

Another way to deal with signals is with completion semantics.

Use completion semantics when a signal indicates that something so catastrophic has happened that no reason exists to continue executing the current code block. The signal handler runs instead of the remainder of the block that had the problem. In other words, the signal handler completes the block.

In Example 5–3, the block in question is the body of the then part of the if statement. The call to setjmp(3C) saves the current register state of the program in jbuf and returns 0, thereby executing the block.

Example 5–3 Completion Semantics

sigjmp_buf jbuf;
void mult_divide(void) {
    int a, b, c, d;
    void problem();

    sigset(SIGFPE, problem);
    while (1) {
        if (sigsetjmp(&jbuf) == 0) {
            printf("Three numbers, please:\n");
            scanf("%d %d %d", &a, &b, &c);
            d = a*b/c;
            printf("%d*%d/%d = %d\n", a, b, c, d);

void problem(int sig) {
    printf("Couldn't deal with them, try again\n");
    siglongjmp(&jbuf, 1);

If a SIGFPE floating-point exception occurs, the signal handler is invoked.

The signal handler calls siglongjmp(3C), which restores the register state saved in jbuf, causing the program to return from sigsetjmp() again. The registers that are saved include the program counter and the stack pointer.

This time, however, sigsetjmp(3C) returns the second argument of siglongjmp(), which is 1. Notice that the block is skipped over, only to be executed during the next iteration of the while loop.

You can use sigsetjmp(3C) and siglongjmp(3C) in multithreaded programs. Be careful that a thread never does a siglongjmp() that uses the results of another thread's sigsetjmp().

Also, sigsetjmp() and siglongjmp() restore as well as save the signal mask, but setjmp(3C) and longjmp(3C) do not.

Use sigsetjmp() and siglongjmp() when you work with signal handlers.

Completion semantics are often used to deal with exceptions. In particular, the Sun AdaTM programming language uses this model.

Note –

Remember, sigwait(2) should never be used with synchronous signals.

Signal Handlers and Async-Signal Safety

A concept that is similar to thread safety is Async-Signal safety. Async-Signal-Safe operations are guaranteed not to interfere with operations that are being interrupted.

The problem of Async-Signal safety arises when the actions of a signal handler can interfere with the operation that is being interrupted.

For example, suppose a program is in the middle of a call to printf(3C), and a signal occurs whose handler calls printf(). In this case, the output of the two printf() statements would be intertwined. To avoid the intertwined output, the handler should not directly call printf() when printf() might be interrupted by a signal.

This problem cannot be solved by using synchronization primitives. Any attempt to synchronize between the signal handler and the operation being synchronized would produce an immediate deadlock.

Suppose that printf() is to protect itself by using a mutex. Now, suppose that a thread that is in a call to printf() and so holds the lock on the mutex is interrupted by a signal.

If the handler calls printf(), the thread that holds the lock on the mutex attempts to take the mutex again. Attempting to take the mutex results in an instant deadlock.

To avoid interference between the handler and the operation, ensure that the situation never arises. Perhaps you can mask off signals at critical moments, or invoke only Async-Signal-Safe operations from inside signal handlers.

The only routines that POSIX guarantees to be Async-Signal-Safe are listed in Table 5–2. Any signal handler can safely call in to one of these functions.

Table 5–2 Async-Signal-Safe Functions

























































































































Interrupted Waits on Condition Variables

When an unmasked caught signal is delivered to a thread waiting on a condition variable, when the signal handler returns, the thread returns from the condition wait function with a spurious wakeup: pthread_cond_wait() and pthread_cond_timedwait() return 0 even though no call to pthread_cond_signal() or pthread_cond_broadcast() was made by another thread. Whether SA_RESTART has been specified as a flag to sigaction() has no effect here. The pthread_cond_wait() and pthread_cond_timedwait() functions are not automatically restarted. In all cases, the associated mutex lock is reacquired before returning from the condition wait.

Re-acquisition of the associated mutex lock does not imply that the mutex is locked while the thread is executing the signal handler. The state of the mutex in the signal handler is undefined.

I/O Issues

One of the attractions of multithreaded programming is I/O performance. The traditional UNIX API gave you little assistance in this area. You either used the facilities of the file system or bypassed the file system entirely.

This section shows how to use threads to get more flexibility through I/O concurrency and multibuffering. This section also discusses the differences and similarities between the approaches of synchronous I/O with threads, and asynchronous I/O with and without threads.

I/O as a Remote Procedure Call

In the traditional UNIX model, I/O appears to be synchronous, as if you were placing a remote procedure call to the I/O device. Once the call returns, then the I/O has completed, or at least appears to have completed. A write request, for example, might merely result in the transfer of the data to a buffer in the operating environment.

The advantage of this model is familiar concept of procedure calls.

An alternative approach not found in traditional UNIX systems is the asynchronous model, in which an I/O request merely starts an operation. The program must somehow discover when the operation completes.

The asynchronous model is not as simple as the synchronous model. But, the asynchronous model has the advantage of allowing concurrent I/O and processing in traditional, single-threaded UNIX processes.

Tamed Asynchrony

You can get most of the benefits of asynchronous I/O by using synchronous I/O in a multithreaded program. With asynchronous I/O, you would issue a request and check later to determine when the I/O completes. You can instead have a separate thread perform the I/O synchronously. The main thread can then check for the completion of the operation at some later time perhaps by calling pthread_join(3C).

Asynchronous I/O

In most situations, asynchronous I/O is not required because its effects can be achieved with the use of threads, with each thread execution of synchronous I/O. However, in a few situations, threads cannot achieve what asynchronous I/O can.

The most straightforward example is writing to a tape drive to make the tape drive stream. Streaming prevents the tape drive from stopping while the drive is being written to. The tape moves forward at high speed while supplying a constant stream of data that is written to tape.

To support streaming, the tape driver in the kernel should use threads. The tape driver in the kernel must issue a queued write request when the tape driver responds to an interrupt. The interrupt indicates that the previous tape-write operation has completed.

Threads cannot guarantee that asynchronous writes are ordered because the order in which threads execute is indeterminate. You cannot, for example, specify the order of a write to a tape.

Asynchronous I/O Operations

#include <aio.h>

int aio_read(struct aiocb *aiocbp);

int aio_write(struct aiocb *aiocbp);

int aio_error(const struct aiocb *aiocbp);

ssize_t aio_return(struct aiocb *aiocbp);

int aio_suspend(struct aiocb *list[], int nent,
    const struct timespec *timeout);

int aio_waitn(struct aiocb *list[], uint_t nent, uint_t *nwait,
    const struct timespec *timeout);

int aio_cancel(int fildes, struct aiocb *aiocbp);

aio_read(3RT) and aio_write(3RT) are similar in concept to pread(2) and pwrite(2), except that the parameters of the I/O operation are stored in an asynchronous I/O control block (aiocbp) that is passed to aio_read() or aio_write():

    aiocbp->aio_fildes;    /* file descriptor */
    aiocbp->aio_buf;       /* buffer */
    aiocbp->aio_nbytes;    /* I/O request size */
    aiocbp->aio_offset;    /* file offset */

In addition, if desired, an asynchronous notification type (most commonly a queued signal) can be specified in the 'struct sigevent' member:

    aiocbp->aio_sigevent;  /* notification type */

A call to aio_read() or aio_write() results in the initiation or queueing of an I/O operation. The call returns without blocking.

The aiocbp value may be used as an argument to aio_error(3RT) and aio_return(3RT) in order to determine the error status and return status of the asynchronous operation while it is proceeding.

Waiting for I/O Operation to Complete

You can wait for one or more outstanding asynchronous I/O operations to complete by calling aio_suspend() or aio_waitn(). Use aio_error() and aio_return() on the completed asynchronous I/O control blocks to determine the success or failure of the I/O operation.

The aio_suspend() and aio_waitn() functions take a timeout argument, which indicates how long the caller is willing to wait. A NULL pointer means that the caller is willing to wait indefinitely. A pointer to a structure containing a zero value means that the caller is unwilling to wait at all.

You might start an asynchronous I/O operation, do some work, then call aio_suspend() or aio_waitn() to wait for the request to complete. Or you can rely on the asynchronous notification event specified in aio_sigevent() to occur to notify you when the operation completes.

Finally, a pending asynchronous I/O operation can be cancelled by calling aio_cancel(). This function is called with the address of the I/O control block that was used to initiate the I/O operation.

Shared I/O and New I/O System Calls

When multiple threads perform concurrent I/O operations with the same file descriptor, you might discover that the traditional UNIX I/O interface is not thread safe. The problem occurs with nonsequential I/O where the lseek(2) system call sets the file offset. The file offset is then used in the next read(2) or write(2) call to indicate where in the file the operation should start. When two or more threads are issuing an lseek() to the same file descriptor, a conflict results.

To avoid this conflict, use the pread() and pwrite() system calls.

#include <sys/types.h>
#include <unistd.h>

ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset);

ssize_t pwrite(int filedes, void *buf, size_t nbyte,
    off_t offset);

pread(2) and pwrite(2) behave just like read (2) and write(2) except that pread(2) and pwrite(2) take an additional argument, the file offset. With this argument, you specify the offset without using lseek(2), so multiple threads can use these routines safely for I/O on the same file descriptor.

Alternatives to getc and putc

An additional problem occurs with standard I/O. Programmers are accustomed to routines, such as getc(3C) and putc(3C) , that are implemented as macros, being very quick. Because of the speed of getc(3C) and putc(3C), these macros can be used within the inner loop of a program with no concerns about efficiency.

However, when getc(3C) and putc(3C) are made thread safe the macros suddenly become more expensive. The macros now require at least two internal subroutine calls, to lock and unlock a mutex.

To get around this problem, alternative versions of these routines are supplied: getc_unlocked(3C) and putc_unlocked(3C).

getc_unlocked(3C) and putc_unlocked(3C) do not acquire locks on a mutex. These getc_unlocked() or putc_unlocked() macros are as quick as the original, nonthread-safe versions of getc(3C) and putc(3C).

However, to use these macros in a thread-safe way, you must explicitly lock and release the mutexes that protect the standard I/O streams, using flockfile(3C) and funlockfile(3C). The calls to these latter routines are placed outside the loop. Calls to getc_unlocked() or putc_unlocked() are placed inside the loop.