Multithreaded Programming Guide

Chapter 5 Programming With the Operating Environment

This chapter describes how multithreading interacts with the Solaris operating environment and how the operating environment has changed to support multithreading.

Process Creation—Forking Issues

The default handling of the fork() function in the Solaris operating environment is somewhat different from the way fork() is handled in POSIX threads, although the Solaris operating environment does support both mechanisms.

Table 5–1 compares the differences and similarities of Solaris and pthreads fork() handling. When the comparable interface is not available either in POSIX threads or in Solaris threads, the `—' character appears in the table column.

Table 5–1 Comparing POSIX and Solaris fork() Handling


	Solaris Operating Environment Interface	POSIX Threads Interface
Fork-one model	`fork1(2)`	`fork(2)`
Fork-all model	`fork(2)`	—
Fork safety	—	`pthread_atfork(3THR)`

The Fork-One Model

As shown in Table 5–1, the behavior of the pthreads fork(2) function is the same as that of the Solaris fork1(2) function. Both the pthreads fork(2) function and the Solaris fork1(2) create a new process, duplicating the complete address space in the child, but duplicating only the calling thread in the child process.

This is useful when the child process immediately calls exec(), which is what happens after most calls to fork(). In this case, the child process does not need a duplicate of any thread other than the one that called fork().

In the child, do not call any library functions after calling fork() and before calling exec() because one of the library functions might use a lock that was held in the parent at the time of the fork(). The child process might execute only Async-Signal-Safe operations until one of the exec() handlers is called.

The Fork-One Safety Problem and Solution

In addition to all of the usual concerns such as locking shared data, a library should be well behaved with respect to forking a child process when only one thread is running (the one that called fork()). The problem is that the sole thread in the child process might try to grab a lock that is held by a thread that wasn't duplicated in the child.

This is not a problem most programs are likely to encounter. Most programs call exec() in the child right after the return from fork(). However, if the program wishes to carry out some actions in the child before the call to exec(), or never calls exec(), then the child could encounter deadlock scenarios.

Each library writer should provide a safe solution, although not providing a fork-safe library is not a large concern because this condition is rare.

For example, assume that T1 is in the middle of printing something (and so is holding a lock for printf()), when T2 forks a new process. In the child process, if the sole thread (T2) calls printf(), it promptly deadlocks.

The POSIX fork() or Solaris fork1() duplicates only the thread that calls it. (Calling the Solaris fork() duplicates all threads, so this issue does not come up.)

To prevent deadlock, ensure that no such locks are being held at the time of forking. The most obvious way to do this is to have the forking thread acquire all the locks that could possibly be used by the child. Because you cannot do this for locks like those in printf() (because printf() is owned by libc), you must ensure that printf() is not being used at fork() time.

To manage the locks in your library:

Identify all the locks used by the library.

Identify the locking order for the locks used by the library. (If a strict locking order is not used, then lock acquisition must be managed carefully.)

Arrange to acquire those locks at fork time. In Solaris threads this must be done manually, obtaining the locks just before calling fork1(), and releasing them right after.

In the following example, the list of locks used by the library is {L1,...Ln}, and the locking order for these locks is also L1...Ln.

mutex_lock(L1);
mutex_lock(L2);
fork1(...);
mutex_unlock(L1);
mutex_unlock(L2);

In pthreads, you can add a call to pthread_atfork(f1, f2, f3) in your library's .init() section, where f1(), f2(), f3() are defined as follows:

f1() /* This is executed just before the process forks. */
{
 mutex_lock(L1); |
 mutex_lock(...); | -- ordered in lock order
 mutex_lock(Ln); |
 } V

f2() /* This is executed in the child after the process forks. */
 {
 mutex_unlock(L1);
 mutex_unlock(...);
 mutex_unlock(Ln);
 }

f3() /* This is executed in the parent after the process forks. */
 {
 mutex_unlock(L1);
 mutex_unlock(...);
 mutex_unlock(Ln);
 }

Another example of deadlock would be a thread in the parent process—other than the one that called Solaris fork1(2)—that has locked a mutex. This mutex is copied into the child process in its locked state, but no thread is copied over to unlock the mutex. So, any thread in the child that tries to lock the mutex waits forever.

Virtual Forks–vfork(2)

The standard vfork(2) function is unsafe in multithreaded programs. vfork(2) is like fork1(2) in that only the calling thread is copied in the child process. As in nonthreaded implementations, vfork() does not copy the address space for the child process.

Be careful that the thread in the child process does not change memory before it calls exec(2). Remember that vfork() gives the parent address space to the child. The parent gets its address space back after the child calls exec() or exits. It is important that the child not change the state of the parent.

For example, it is disastrous to create new threads between the call to vfork() and the call to exec().

The Solution—pthread_atfork(3THR)

Use pthread_atfork() to prevent deadlocks whenever you use the fork-one model.

#include <pthread.h>

int pthread_atfork(void (*prepare) (void), void (*parent) (void),
    void (*child) (void) );

The pthread_atfork() function declares fork() handlers that are called before and after fork() in the context of the thread that called fork().

The prepare handler is called before fork() starts.
The parent handler is called after fork() returns in the parent.
The child handler is called after fork() returns in the child.

Any one of these can be set to NULL. The order in which successive calls to pthread_atfork() are made is significant.

For example, a prepare handler could acquire all the mutexes needed, and then the parent and child handlers could release them. This ensures that all the relevant locks are held by the thread that calls the fork function before the process is forked, preventing the deadlock in the child.

Using the fork all model avoids the deadlock problem described in The Fork-One Safety Problem and Solution.

Return Values

pthread_atfork() returns a zero when it completes successfully. Any other return value indicates that an error occurred. If the following condition is detected, pthread_atfork(3THR) fails and returns the corresponding value.

ENOMEM: Insufficient table space exists to record the fork handler addresses.

The Fork-All Model

The Solaris fork(2) function duplicates the address space and all the threads (and LWPs) in the child. This is useful, for example, when the child process never calls exec(2) but does use its copy of the parent address space. The fork-all functionality is not available in POSIX threads.

Note that when one thread in a process calls Solaris fork(2), threads that are blocked in an interruptible system call return EINTR.

Also, be careful not to create locks that are held by both the parent and child processes. This can happen when locks are allocated in memory that is sharable (that is use mmap() with the MAP_SHARED flag). Note that this is not a problem if the fork-one model is used.

Choosing the Right Fork

You determine whether fork() has a “fork-all” or a “fork-one” semantic in your application by linking with the appropriate library. Linking with -lthread gives you the “fork-all” semantic for fork(), and linking with -lpthread gives the “fork-one” semantic for fork() (see Figure 7–1for an explanation of compiling options).

Cautions for Any Fork

Be careful when using global state after a call to any fork() function.

For example, when one thread reads a file serially and another thread in the process successfully calls one of the forks, each process then contains a thread that is reading the file. Because the seek pointer for a file descriptor is shared after a fork(), the thread in the parent gets some data while the thread in the child gets the other. This introduces gaps in the sequential read accesses.

Process Creation—exec(2) and exit(2) Issues

Both the exec(2) and exit(2) system calls work as they do in single-threaded processes except that they destroy all the threads in the address space. Both calls block until all the execution resources (and so all active threads) are destroyed.

When exec() rebuilds the process, it creates a single lightweight process (LWP) . The process startup code builds the initial thread. As usual, if the initial thread returns, it calls exit() and the process is destroyed.

When all the threads in a process exit, the process exits. A call to any exec() function from a process with more than one thread terminates all threads, and loads and executes the new executable image. No destructor functions are called.

Timers, Alarms, and Profiling

The “End of Life” announcements for per-LWP timers (see timer_create(3RT)) and per-thread alarms (see alarm(2) or setitimer(2)) were made in the Solaris 2.5 release. Both features are now replaced with the per-process variants described in this section.

Originally, each LWP had a unique realtime interval timer and alarm that a thread bound to the LWP could use. The timer or alarm delivered one signal to the thread when the timer or alarm expired.

Each LWP also had a virtual time or profile interval timer that a thread bound to the LWP could use. When the interval timer expired, either SIGVTALRM or SIGPROF, as appropriate, was sent to the LWP that owned the interval timer.

Per-LWP POSIX Timers

In the Solaris 2.3 and 2.4 releases, the timer_create(3RT) function returned a timer object with a timer ID meaningful only within the calling LWP and with expiration signals delivered to that LWP. Because of this, the only threads that could use the POSIX timer facility were bound threads.

Even with this restricted use, POSIX timers in the Solaris 2.3 and 2.4 releases for multithreaded applications were unreliable about masking the resulting signals and delivering the associated value from the sigvent structure.

Beginning with the Solaris 2.5 release, an application that is compiled defining the macro _POSIX_PER_PROCESS_TIMERS, or with a value greater that 199506L for the symbol _POSIX_C_SOURCE, can create per-process timers.

Effective with the Solaris 9 Operating Environment, all timers are per-process except for the virtual time and profile interval timers (see setitimer(2) for ITIMER_VIRTUAL and ITIMER_PROF), which remain per-LWP.

The timer IDs of per-process timers are usable from any LWP, and the expiration signals are generated for the process rather than directed to a specific LWP.

The per-process timers are deleted only by timer_delete(3RT) or when the process terminates.

Per-Thread Alarms

In the Solaris Operating Environment 2.3 and 2.4 releases, a call to alarm(2) or setitimer(2) was meaningful only within the calling LWP. Such timers were deleted automatically when the creating LWP terminated. Because of this, the only threads that could use alarm() or setitimer() were bound threads.

Even with this restricted use, alarm() and setitimer() timers in Solaris Operating Environment 2.3 and 2.4 multithreaded applications were unreliable about masking the signals from the bound thread that issued these calls. When such masking was not required, then these two system calls worked reliably from bound threads.

With the Solaris Operating Environment 2.5 release, an application linking with -lpthread (POSIX) threads got per-process delivery of SIGALRM when calling alarm(). The SIGALRM generated by alarm() is generated for the process rather than directed to a specific LWP. Also, the alarm is reset when the process terminates.

Applications compiled with a release before the Solaris Operating Environment 2.5 release, or not linked with -lpthread, will continue to see a per-LWP delivery of signals generated by alarm() and setitimer()

Effective with the Solaris 9 Operating Environment, calls to alarm() or to setitimer(ITIMER_REAL) will cause the resulting SIGALRM signal to be sent to the process.

Profiling

In Solaris releases prior to 2.6, calling profil() in a multithreaded program would impact only the calling LWP; the profile state was not inherited at LWP creation time. To profile a multithreaded program with a global profile buffer, each thread needed to issue a call to profil() at threads start-up time, and each thread had to be a bound thread. This was cumbersome and did not easily support dynamically turning profiling on and off. In Solaris 2.6 and later releases, the profil() system call for multithreaded processes has global impact—that is, a call to profil() impacts all LWPs/threads in a process. This may cause applications that depend on the previous per-LWP semantic to break, but it is expected to improve multithreaded programs that wish to turn profiling on and off dynamically at runtime.

Nonlocal Goto—setjmp(3C) and longjmp(3C)

The scope of setjmp() and longjmp() is limited to one thread, which is fine most of the time. However, this does mean that a thread that handles a signal can longjmp() only when setjmp() is performed in the same thread.

Resource Limits

Resource limits are set on the entire process and are determined by adding the resource use of all threads in the process. When a soft resource limit is exceeded, the offending thread is sent the appropriate signal. The sum of the resources used in the process is available through getrusage(3C).

LWPs and Scheduling Classes

The Solaris kernel has three classes of scheduling. The highest-priority scheduling class is Realtime (RT). The middle-priority scheduling class is system. The system class cannot be applied to a user process. The lowest-priority scheduling class is timeshare (TS), which is also the default class.

Scheduling class is maintained for each LWP. When a process is created, the initial LWP inherits the scheduling class and priority of the creating LWP in the parent process. As more LWPs are created to run unbound threads, they also inherit this scheduling class and priority.

Threads have the scheduling class and priority of their underlying LWPs. Each LWP in a process can have a unique scheduling class and priority that is visible to the kernel. If a thread is bound, it will always be associated with the same LWP.

Thread priorities regulate contention for synchronization objects. By default LWPs are in the timesharing class. For compute-bound multithreading, thread priorities are not very useful. For multithreaded applications that do a lot of synchronization using the MT libraries, thread priorities become more meaningful.

The scheduling class is set by priocntl(2). How you specify the first two arguments determines whether just the calling LWP or all the LWPs of one or more processes are affected. The third argument of priocntl() is the command, which can be one of the following.

PC_GETCID. Get the class ID and class attributes for a specific class.

PC_GETCLINFO. Get the class name and class attributes for a specific class.

PC_GETPARMS. Get the class identifier and the class-specific scheduling parameters of a process, an LWP with a process, or a group of processes.

PC_SETPARMS. Set the class identifier and the class-specific scheduling parameters of a process, an LWP with a process, or a group of processes.

Note that priocntl() affects the scheduling of the LWP associated with the calling thread. For unbound threads, the calling thread is not guaranteed to be associated with the affected LWP after the call to priocntl() returns.

Timeshare Scheduling

Timeshare scheduling distributes the processing resource fairly among the LWPs in this scheduling class. Other parts of the kernel can monopolize the processor for short intervals without degrading response time as seen by the user.

The priocntl(2) call sets the nice(2) level of one or more processes. The priocntl() call also affects the nice() level of all the timesharing class LWPs in the process. The nice() level ranges from 0 to +20 normally and from -20 to +20 for processes with superuser privilege. The lower the value, the higher the priority.

The dispatch priority of time shared LWPs is calculated from the instantaneous CPU use rate of the LWP and from its nice() level. The nice() level indicates the relative priority of the LWPs to the timeshare scheduler.

LWPs with a greater nice() value get a smaller, but nonzero, share of the total processing. An LWP that has received a larger amount of processing is given lower priority than one that has received little or no processing.

Realtime Scheduling

The Realtime class (RT) can be applied to a whole process or to one or more LWPs in a process. This requires superuser privilege.

Unlike the nice(2) level of the timeshare class, LWPs that are classified Realtime can be assigned priorities either individually or jointly. A priocntl(2) call affects the attributes of all the Realtime LWPs in the process.

The scheduler always dispatches the highest-priority Realtime LWP. It preempts a lower-priority LWP when a higher-priority LWP becomes runnable. A preempted LWP is placed at the head of its level queue.

A Realtime LWP retains control of a processor until it is preempted, it suspends, or its Realtime priority is changed. LWPs in the RT class have absolute priority over processes in the TS class.

A new LWP inherits the scheduling class of the parent process or LWP. An RT class LWP inherits the parent's time slice, whether finite or infinite.

An LWP with a finite time slice runs until it terminates, blocks (for example, to wait for an I/O event), is preempted by a higher-priority runnable Realtime process, or the time slice expires.

An LWP with an infinite time slice ceases execution only when it terminates, blocks, or is preempted.

Fair Share Scheduling

The fair share scheduler (FSS) scheduling class allows allocation of CPU time based on shares.

By default, the FSS scheduling class uses the same range of priorities (0 to 59) as the TS and interactive (IA) scheduling classes. All LWPs in a process must run in the same scheduling class. The FSS class schedules individual LWPs, not whole processes. Thus, a mix of processes in the FSS and TS/IA classes could result in unexpected scheduling behavior in both cases.

With the use of processor sets, you can mix TS/IA with FSS in one system as long as all the processes running on each processor set are in either the TS/IA or the FSS scheduling class, so they do not compete for the same CPUs.

Fixed Priority Scheduling

The fixed priority scheduling class (FX) assigns fixed priorities and time quantum that are not adjusted to accomodate resource consumption. Process priority can be changed only by the process itself or another appropriately privileged process. For more information about FX, see the priocntl(1) and dispadmin(1M) manual pages.

Threads in this class share the same range of priorities (0 to 59) as the TS and interactive (IA) scheduling classes. TS is usually the default. FX will usually be used in conjunction with TS.

Extending Traditional Signals

The traditional UNIX signal model is extended to threads in a fairly natural way. The key characteristics are that the signal disposition is process-wide, but the signal mask is per-thread. The process-wide disposition of signals is established using the traditional mechanisms (signal(3C), sigaction(2), and so on).

When a signal handler is marked SIG_DFL or SIG_IGN, the action on receipt of the signal (exit, core dump, stop, continue, or ignore) is performed on the entire receiving process, affecting all threads in the process. For these signals that don't have handlers, the issue of which thread picks the signal is unimportant, because the action on receipt of the signal is carried out on the whole process. See signal(5) for basic information about signals.

Each thread has its own signal mask. This lets a thread block some signals while it uses memory or another state that is also used by a signal handler. All threads in a process share the set of signal handlers set up by sigaction(2) and its variants.

A thread in one process cannot send a signal to a specific thread in another process. A signal sent by kill(2), sigsend(2) or sigqueue(3RT) to a process is handled by any one of the receptive threads in the process.

Signals are divided into two categories: traps and exceptions (synchronously generated signals) and interrupts (asynchronously generated signals).

As in traditional UNIX, if a signal is pending, additional occurrences of that signal normally have no additional effect—a pending signal is represented by a bit, not by a counter. However, signals posted via the sigqueue(3RT) interface allow multiple instances of the same signal to be queued to the process.

As is the case with single-threaded processes, when a thread receives a signal while blocked in a system call, the thread might return early, either with the EINTR error code, or, in the case of I/O calls, with fewer bytes transferred than requested.

Of particular importance to multithreaded programs is the effect of signals on pthread_cond_wait(3THR). This call usually returns without error ( a return value of zero) only in response to a pthread_cond_signal(3THR) or a pthread_cond_broadcast(3THR), but, if the waiting thread receives a traditional UNIX signal, it returns with a return value of zero even though the wakeup was spurious. The Solaris threads cond_wait(3THR) function returns EINTR in this circumstance. See Interrupted Waits on Condition Variablesfor more information.

Synchronous Signals

Traps (such as SIGILL, SIGFPE, SIGSEGV) result from something a thread does to itself, such as dividing by zero or making reference to nonexistent memory. A trap is handled only by the thread that caused it. Several threads in a process can generate and handle the same type of trap simultaneously.

Extending the idea of signals to individual threads is easy for synchronously-generated signals—the handler is invoked on the thread that generated the synchronous signal.

However, if the process has not chosen to deal with such problems by establishing an appropriate signal handler, then the default action will be taken when a trap occurs, even if the offending thread has the generated signal blocked. The default action for such signals is to terminate the process, perhaps with a core dump.

Because such a synchronous signal usually means that something is seriously wrong with the whole process, and not just with a thread, terminating the process is often a good choice.

Asynchronous Signals

Interrupts (such as SIGINT and SIGIO) are asynchronous with any thread and result from some action outside the process. They might be signals sent explicitly by another process, or they might represent external actions such as a user typing Control-c.

An interrupt can be handled by any thread whose signal mask allows it. When more than one thread is able to receive the interrupt, only one is chosen.

When multiple occurrences of the same signal are sent to a process, then each occurrence can be handled by a separate thread, as long as threads are available that do not have it masked. When all threads have the signal masked, then the signal is marked pending and the first thread to unmask the signal handles it.

Continuation Semantics

Continuation semantics are the traditional way to deal with signals. The idea is that when a signal handler returns, control resumes where it was at the time of the interruption. This is well suited for asynchronous signals in single-threaded processes, as shown in Example 5–1.

This is also used as the exception-handling mechanism in some programming languages, such as PL/1.

Example 5–1 Continuation Semantics

unsigned int nestcount;

unsigned int A(int i, int j) {
    nestcount++;

    if (i==0)
        return(j+1)
    else if (j==0)
        return(A(i-1, 1));
    else
        return(A(i-1, A(i, j-1)));
}

void sig(int i) {
    printf("nestcount = %d\n", nestcount);
}

main() {
    sigset(SIGINT, sig);
    A(4,4);
}

Operations on Signals

pthread_sigmask(3THR)

pthread_sigmask(3THR) does for a thread what sigprocmask(2) does for a process—it sets the thread's signal mask. When a new thread is created, its initial mask is inherited from its creator.

The call to sigprocmask() in a multithreaded process is equivalent to a call to pthread_sigmask(). See the sigprocmask(2) page for more information.

pthread_kill(3THR)

pthread_kill(3THR) is the thread analog of kill(2)—it sends a signal to a specific thread. This, of course, is different from sending a signal to a process. When a signal is sent to a process, the signal can be handled by any thread in the process. A signal sent by pthread_kill() can be handled only by the specified thread.

Note than you can use pthread_kill() to send signals only to threads in the current process. This is because the thread identifier (type thread_t) is local in scope—it is not possible to name a thread in any process but your own.

Note also that the action taken (handler, SIG_DFL, SIG_IGN) on receipt of a signal by the target thread is global, as usual. This means, for example, that if you send SIGXXX to a thread, and the SIGXXX signal disposition for the process is to kill the process, then the whole process is killed when the target thread receives the signal.

sigwait(2)

For multithreaded programs, sigwait(2) is the preferred interface to use, because it deals well with asynchronously generated signals.

sigwait() causes the calling thread to wait until any signal identified by its set argument is delivered to the thread. While the thread is waiting, signals identified by the set argument are unmasked, but the original mask is restored when the call returns.

All signals identified by the set argument must be blocked on all threads, including the calling thread; otherwise, sigwait() might not work correctly.

Use sigwait() to separate threads from asynchronous signals. You can create one thread that is listening for asynchronous signals while your other threads are created to block any asynchronous signals that might be set to this process.

New `sigwait()` Implementations

Two versions of sigwait() are available beginning with the Solaris Operating Environment 2.5 release: the new Solaris Operating Environment 2.5 version, and the POSIX.1c version. New applications and libraries should use the POSIX standard interface, as the Solaris Operating Environment version might not be available in future releases.

The syntax for the two versions of sigwait() is shown below.

#include <signal.h>

/* the Solaris 2.5 version*/
int sigwait(sigset_t *set);

/* the POSIX.1c version */
int sigwait(const sigset_t *set, int *sig);

When the signal is delivered, the POSIX.1c sigwait() clears the pending signal and places the signal number in sig. Many threads can call sigwait() at the same time, but only one thread returns for each signal that is received.

With sigwait() you can treat asynchronous signals synchronously—a thread that deals with such signals simply calls sigwait() and returns as soon as a signal arrives. By ensuring that all threads (including the caller of sigwait()) have such signals masked, you can be sure that signals are handled only by the intended handler and that they are handled safely.

By always masking all signals in all threads, and just calling sigwait() as necessary, your application will be much safer for threads that depend on signals.

Usually, you create one or more threads that call sigwait() to wait for signals. Because sigwait() can retrieve even masked signals, be sure to block the signals of interest in all other threads so they are not accidentally delivered.

When a signal arrives, a thread returns from sigwait(), handles the signal, and calls sigwait() again to wait for more signals. The signal-handling thread is not restricted to using Async-Signal-Safe functions and can synchronize with other threads in the usual way. (The Async-Signal-Safe category is defined in MT Interface Safety Levels.)

Note –

sigwait() cannot receive synchronously-generated signals.

sigtimedwait(3RT)

sigtimedwait(3RT) is similar to sigwait(2) except that it fails and returns an error when a signal is not received in the indicated amount of time.

Thread-Directed Signals

The UNIX signal mechanism is extended with the idea of thread-directed signals. These are just like ordinary asynchronous signals, except that they are sent to a particular thread instead of to a process.

Waiting for asynchronous signals in a separate thread can be safer and easier than installing a signal handler and processing the signals there.

A better way to deal with asynchronous signals is to treat them synchronously. By calling sigwait(2), discussed in sigwait(2), a thread can wait until a signal occurs.

Example 5–2 Asynchronous Signals and sigwait(2)

main() {
    sigset_t set;
    void runA(void);
    int sig;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    pthread_create(NULL, 0, runA, NULL, PTHREAD_DETACHED, NULL);

    while (1) {
        sigwait(&set, &sig);
        printf("nestcount = %d\n", nestcount);
        printf("received signal %d\n", sig);
    }
}

void runA() {
    A(4,4);
    exit(0);
}

This example modifies the code of Example 5–1: the main routine masks the SIGINT signal, creates a child thread that calls the function A of the previous example, and then issues sigwait() to handle the SIGINT signal.

Note that the signal is masked in the compute thread because the compute thread inherits its signal mask from the main thread. The main thread is protected from SIGINT while, and only while, it is not blocked inside of sigwait().

Also, note that there is never any danger of having system calls interrupted when you use sigwait().

Completion Semantics

Another way to deal with signals is with completion semantics.

Use completion semantics when a signal indicates that something so catastrophic has happened that there is no reason to continue executing the current code block. The signal handler runs instead of the remainder of the block that had the problem. In other words, the signal handler completes the block.

In Example 5–3, the block in question is the body of the then part of the if statement. The call to setjmp(3C) saves the current register state of the program in jbuf and returns 0, thereby executing the block.

Example 5–3 Completion Semantics

sigjmp_buf jbuf;
void mult_divide(void) {
    int a, b, c, d;
    void problem();

    sigset(SIGFPE, problem);
    while (1) {
        if (sigsetjmp(&jbuf) == 0) {
            printf("Three numbers, please:\n");
            scanf("%d %d %d", &a, &b, &c);
            d = a*b/c;
            printf("%d*%d/%d = %d\n", a, b, c, d);
        }
    }
}

void problem(int sig) {
    printf("Couldn't deal with them, try again\n");
    siglongjmp(&jbuf, 1);
}

If a SIGFPE (a floating-point exception) occurs, the signal handler is invoked.

The signal handler calls siglongjmp(3C), which restores the register state saved in jbuf, causing the program to return from sigsetjmp() again (among the registers saved are the program counter and the stack pointer).

This time, however, sigsetjmp(3C) returns the second argument of siglongjmp(), which is 1. Notice that the block is skipped over, only to be executed during the next iteration of the while loop.

Note that you can use sigsetjmp(3C) and siglongjmp(3C) in multithreaded programs, but be careful that a thread never does a siglongjmp() using the results of another thread's sigsetjmp().

Also, sigsetjmp() and siglongjmp() save and restore the signal mask, but setjmp(3C) and longjmp(3C) do not.

It is best to use sigsetjmp() and siglongjmp() when you work with signal handlers.

Completion semantics are often used to deal with exceptions. In particular, the Sun Ada^TM programming language uses this model.

Note –

Remember, sigwait(2) should never be used with synchronous signals.

Signal Handlers and Async-Signal Safety

A concept similar to thread safety is Async-Signal safety. Async-Signal-Safe operations are guaranteed not to interfere with operations that are being interrupted.

The problem of Async-Signal safety arises when the actions of a signal handler can interfere with the operation that is being interrupted.

For example, suppose a program is in the middle of a call to printf(3S) and a signal occurs whose handler itself calls printf(). In this case, the output of the two printf() statements would be intertwined. To avoid this, the handler should not call printf() itself when printf() might be interrupted by a signal.

This problem cannot be solved by using synchronization primitives because any attempted synchronization between the signal handler and the operation being synchronized would produce immediate deadlock.

Suppose that printf() is to protect itself by using a mutex. Now suppose that a thread that is in a call to printf(), and so holds the lock on the mutex, is interrupted by a signal.

If the handler (being called by the thread that is still inside of printf()) itself calls printf(), the thread that holds the lock on the mutex will attempt to take it again, resulting in an instant deadlock.

To avoid interference between the handler and the operation, either ensure that the situation never arises (perhaps by masking off signals at critical moments) or invoke only Async-Signal-Safe operations from inside signal handlers.

The only routines that POSIX guarantees to be Async-Signal-Safe are listed in Table 5–2. Any signal handler can safely call in to one of these functions.

Table 5–2 Async-Signal-Safe Functions


`_exit()`	`fstat()`	`read()`	`sysconf()`
`access()`	`getegid()`	`rename()`	`tcdrain()`
`alarm()`	`geteuid()`	`rmdir()`	`tcflow()`
`cfgetispeed()`	`getgid()`	`setgid()`	`tcflush()`
`cfgetospeed()`	`getgroups()`	`setpgid()`	`tcgetattr()`
`cfsetispeed()`	`getpgrp()`	`setsid()`	`tcgetpgrp()`
`cfsetospeed()`	`getpid()`	`setuid()`	`tcsendbreak()`
`chdir()`	`getppid()`	`sigaction()`	`tcsetattr()`
`chmod()`	`getuid()`	`sigaddset()`	`tcsetpgrp()`
`chown()`	`kill()`	`sigdelset()`	`time()`
`close()`	`link()`	`sigemptyset()`	`times()`
`creat()`	`lseek()`	`sigfillset()`	`umask()`
`dup2()`	`mkdir()`	`sigismember()`	`uname()`
`dup()`	`mkfifo()`	`sigpending()`	`unlink()`
`execle()`	`open()`	`sigprocmask()`	`utime()`
`execve()`	`pathconf()`	`sigsuspend()`	`wait()`
`fcntl()`	`pause()`	`sleep()`	`waitpid()`
`fork()`	`pipe()`	`stat()`	`write()`

Interrupted Waits on Condition Variables

When an unmasked, caught signal is delivered to a thread while the thread is waiting on a condition variable, then when the signal handler returns, the thread returns from the condition wait with a spurious wakeup (one not caused by a condition signal call from another thread). In this case, the Solaris threads interfaces (cond_wait() and cond_timedwait()) return EINTR while the POSIX threads interfaces (pthread_cond_wait() and pthread_cond_timedwait()) return 0. In all cases, the associated mutex lock is reacquired before returning from the condition wait.

This does not imply that the mutex is locked while the thread is executing the signal handler. The state of the mutex in the signal handler is undefined.

The implementation of libthread in releases of Solaris prior to the Solaris 9 release guaranteed that the mutex was held while in the signal handler. Applications that rely on this old behavior will require revision for Solaris 9 and subsequent releases.

Handler cleanup is illustrated by Example 5–4.

Example 5–4 Condition Variables and Interrupted Waits

int sig_catcher() {
    sigset_t set;
    void hdlr();

    mutex_lock(&mut);

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigsetmask(SIG_UNBLOCK, &set, 0);

    if (cond_wait(&cond, &mut) == EINTR) {
        /* signal occurred and lock is held */
        cleanup();
        mutex_unlock(&mut);
        return(0);
    }
    normal_processing();
    mutex_unlock(&mut);
    return(1);
}

void hdlr() {
    /* state of the lock is undefined */
    ...
}

Assume that the SIGINT signal is blocked in all threads on entry to sig_catcher() and that hdlr() has been established (with a call to sigaction(2)) as the handler for the SIGINT signal. When an unmasked and caught instance of the SIGINT signal is delivered to the thread while it is in cond_wait(), the thread calls hdlr(), then returns to the cond_wait() function where the lock on the mutex is reacquired, if necessary, and then returns EINTR from cond_wait().

Note that whether SA_RESTART has been specified as a flag to sigaction() has no effect here; cond_wait(3THR) is not a system call and is not automatically restarted. When a caught signal occurs while a thread is blocked in cond_wait(), the call always returns EINTR.

I/O Issues

One of the attractions of multithreaded programming is I/O performance. The traditional UNIX API gave you little assistance in this area—you either used the facilities of the file system or bypassed the file system entirely.

This section shows how to use threads to get more flexibility through I/O concurrency and multibuffering. This section also discusses the differences and similarities between the approaches of synchronous I/O (with threads) and asynchronous I/O (with and without threads).

I/O as a Remote Procedure Call

In the traditional UNIX model, I/O appears to be synchronous, as if you were placing a remote procedure call to the I/O device. Once the call returns, then the I/O has completed (or at least it appears to have completed—a write request, for example, might merely result in the transfer of the data to a buffer in the operating environment).

The advantage of this model is that it is easy to understand because, as a programmer you are very familiar with the concept of procedure calls.

An alternative approach not found in traditional UNIX systems is the asynchronous model, in which an I/O request merely starts an operation. The program must somehow discover when the operation completes.

This approach is not as simple as the synchronous model, but it has the advantage of allowing concurrent I/O and processing in traditional, single-threaded UNIX processes.

Tamed Asynchrony

You can get most of the benefits of asynchronous I/O by using synchronous I/O in a multithreaded program. Where, with asynchronous I/O, you would issue a request and check later to determine when it completes, you can instead have a separate thread perform the I/O synchronously. The main thread can then check (perhaps by calling pthread_join(3THR)) for the completion of the operation at some later time.

Asynchronous I/O

In most situations there is no need for asynchronous I/O, since its effects can be achieved with the use of threads, with each thread doing synchronous I/O. However, in a few situations, threads cannot achieve what asynchronous I/O can.

The most straightforward example is writing to a tape drive to make the tape drive stream. Streaming prevents the tape drive from stopping while it is being written to and moves the tape forward at high speed while supplying a constant stream of data that is written to tape.

To do this, the tape driver in the kernel must issue a queued write request when the tape driver responds to an interrupt that indicates that the previous tape-write operation has completed.

Threads cannot guarantee that asynchronous writes will be ordered because the order in which threads execute is indeterminate. Specifying the order of a write to a tape, for example, is not possible.

Asynchronous I/O Operations

#include <sys/asynch.h>

int aioread(int fildes, char *bufp, int bufs, off_t offset,
    int whence, aio_result_t *resultp);

int aiowrite(int filedes, const char *bufp, int bufs,
    off_t offset, int whence, aio_result_t *resultp);

aio_result_t *aiowait(const struct timeval *timeout);

int aiocancel(aio_result_t *resultp);

aioread(3AIO) and aiowrite(3AIO) are similar in form to pread(2) and pwrite(2), except for the addition of the last argument. Calls to aioread() and aiowrite() result in the initiation (or queueing) of an I/O operation.

The call returns without blocking, and the status of the call is returned in the structure pointed to by resultp. This is an item of type aio_result_t that contains the following:

int aio_return;
int aio_errno;

When a call fails immediately, the failure code can be found in aio_errno. Otherwise, this field contains AIO_INPROGRESS, meaning that the operation has been successfully queued.

You can wait for an outstanding asynchronous I/O operation to complete by calling aiowait(3AIO). This returns a pointer to the aio_result_t structure supplied with the original aioread(3AIO) or aiowrite(3) call.

This time aio_result_t contains whatever read(2) or write(2) would have returned if one of them had been called instead of the asynchronous version. If the read() or write() is successful, aio_return contains the number of bytes that were read or written; if it was not successful, aio_return is -1, and aio_errno contains the error code.

aiowait() takes a timeout argument, which indicates how long the caller is willing to wait. As usual, a NULL pointer here means that the caller is willing to wait indefinitely, and a pointer to a structure containing a zero value means that the caller is unwilling to wait at all.

You might start an asynchronous I/O operation, do some work, then call aiowait() to wait for the request to complete. Or you can use SIGIO to be notified, asynchronously, when the operation completes.

Finally, a pending asynchronous I/O operation can be cancelled by calling aiocancel(). This routine is called with the address of the result area as an argument. This result area identifies which operation is being cancelled.

Shared I/O and New I/O System Calls

When multiple threads are performing I/O operations at the same time with the same file descriptor, you might discover that the traditional UNIX I/O interface is not thread safe. The problem occurs with nonsequential I/O. This uses the lseek(2) system call to set the file offset, which is then used in the next read(2) or write(2) call to indicate where in the file the operation should start. When two or more threads are issuing lseeks() to the same file descriptor, a conflict results.

To avoid this conflict, use the pread(2) and pwrite(2) system calls.

#include <sys/types.h>
#include <unistd.h>

ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset);

ssize_t pwrite(int filedes, void *buf, size_t nbyte,
    off_t offset);

These behave just like read(2) and write(2) except that they take an additional argument, the file offset. With this argument, you specify the offset without using lseek(2), so multiple threads can use these routines safely for I/O on the same file descriptor.

Alternatives to getc(3C) and putc(3C)

An additional problem occurs with standard I/O. Programmers are accustomed to routines such as getc(3C) and putc(3C) being very quick—they are implemented as macros. Because of this, they can be used within the inner loop of a program with no concerns about efficiency.

However, when they are made thread safe they suddenly become more expensive—they now require (at least) two internal subroutine calls, to lock and unlock a mutex.

To get around this problem, alternative versions of these routines are supplied, getc_unlocked(3S) and putc_unlocked(3C).

These do not acquire locks on a mutex and so are as quick as the original, nonthread-safe versions of getc(3C) and putc(3C).

However, to use them in a thread-safe way, you must explicitly lock and release the mutexes that protect the standard I/O streams, using flockfile(3C) and funlockfile(3C). The calls to these latter routines are placed outside the loop, and the calls to getc_unlocked() or putc_unlocked() are placed inside the loop.