Multithreaded Programming Guide

Creating and Using Threads

The threads packages will cache the threads data structure, stacks, and LWPs so that the repetitive creation of unbound threads can be inexpensive.

Unbound thread creation has considerable overhead when compared to process creation or even to bound thread creation. In fact, the overhead is similar to unbound thread synchronization when you include the context switches to stop one thread and start another.

So, creating and destroying threads as they are required is usually better than attempting to manage a pool of threads that wait for independent work.

A good example of this is an RPC server that creates a thread for each request and destroys it when the reply is delivered, instead of trying to maintain a pool of threads to service requests.

While thread creation has less overhead compared to that of process creation, it is not efficient when compared to the cost of a few instructions. Create threads for processing that lasts at least a couple of thousand machine instructions.

Lightweight Processes

Figure 9-1 illustrates the relationship between LWPs and the user and kernel levels.

Figure 9-1 Multithreading Levels and Relationships

The user-level threads library, with help from the programmer and the operating environment, ensures that the number of LWPs available is adequate for the currently active user-level threads. However, there is no one-to-one mapping between user threads and LWPs, and user-level threads can freely migrate from one LWP to another.

With Solaris threads, a programmer can tell the threads library how many threads should be "running" at the same time.

For example, if the programmer says that up to three threads should run at the same time, then at least three LWPs should be available. If there are three available processors, the threads run in parallel. If there is only one processor, then the operating environment multiplexes the three LWPs on that one processor. If all the LWPs block, the threads library adds another LWP to the pool.

When a user thread blocks due to synchronization, its LWP transfers to another runnable thread. This transfer is done with a coroutine linkage and not with a system call.

The operating environment decides which LWP should run on which processor and when. It has no knowledge about what user threads are or how many are active in each process.

The kernel schedules LWPs onto CPU resources according to their scheduling classes and priorities. The threads library schedules threads on the process pool of LWPs in much the same way.

Each LWP is independently dispatched by the kernel, performs independent system calls, incurs independent page faults, and runs in parallel on a multiprocessor system.

An LWP has some capabilities that are not exported directly to threads, such as a special scheduling class.

Unbound Threads

The library invokes LWPs as needed and assigns them to execute runnable threads. The LWP assumes the state of the thread and executes its instructions. If the thread becomes blocked on a synchronization mechanism, or if another thread should be run, the thread state is saved in process memory and the threads library assigns another thread to the LWP to run.

Bound Threads

Sometimes having more threads than LWPs, as can happen with unbound threads, is a disadvantage.

For example, a parallel array computation divides the rows of its arrays among different threads. If there is one LWP for each processor, but multiple threads for each LWP, each processor spends time switching between threads. In this case, it is better to have one thread for each LWP, divide the rows among a smaller number of threads, and reduce the number of thread switches.

A mixture of threads that are permanently bound to LWPs and unbound threads is also appropriate for some applications.

An example of this is a realtime application that has some threads with system-wide priority and realtime scheduling, and other threads that attend to background computations. Another example is a window system with unbound threads for most operations and a mouse serviced by a high-priority, bound, realtime thread.

When a user-level thread issues a system call, the LWP running the thread calls into the kernel and remains attached to the thread at least until the system call completes.

Bound threads require more overhead than unbound threads. Because bound threads can change the attributes of the underlying LWP, the LWPs are not cached when the bound threads exit. Instead, the operating environment provides a new LWP when a bound thread is created and destroys it when the bound thread exits.

Use bound threads only when a thread needs resources that are available only through the underlying LWP, such as a virtual time interval timer or an alternate stack, or when the thread must be visible to the kernel to be scheduled with respect to all other active threads in the system, as in realtime scheduling.

Use unbound threads even when you expect all threads to be active simultaneously. This enables Solaris threads to cache LWP and thread resources efficiently so that thread creation and destruction are fast. Use thr_setconcurrency(3T) to tell Solaris threads how many threads you expect to be simultaneously active.

Thread Concurrency (Solaris Threads Only)

By default, Solaris threads attempts to adjust the system execution resources (LWPs) used to run unbound threads to match the real number of active threads. While the Solaris threads package cannot make perfect decisions, it at least ensures that the process continues to make progress.

When you have some idea of the number of unbound threads that should be simultaneously active (executing code or system calls), tell the library through thr_setconcurrency(3THR).

For example:

A database server that has a thread for each user should tell Solaris threads the expected number of simultaneously active users.

A window server that has one thread for each client should tell Solaris threads the expected number of simultaneously active clients.

A file copy program that has one reader thread and one writer thread should tell Solaris threads that the desired concurrency level is two.

Alternatively, the concurrency level can be incremented by one through the THR_NEW_LWP flag as each thread is created.

Include unbound threads blocked on interprocess (USYNC_PROCESS) synchronization variables as active when you compute thread concurrency. Exclude bound threads--they do not require concurrency support from Solaris threads because they are equivalent to LWPs.

Efficiency

A new thread is created with thr_create(3T) in less time than an existing thread can be restarted. This means that it is more efficient to create a new thread when one is needed and have it call thr_exit(3T) when it has completed its task than it would be to stockpile an idle thread and restart it.

Thread Creation Guidelines

Here are some simple guidelines for using threads.

Use threads for independent activities that must do a meaningful amount of work.

Use Solaris threads to take advantage of CPU concurrency.

Use bound threads only when absolutely necessary, that is, when some facility of the underlying LWP is required.