Writing Device Drivers

Chapter 4 Multithreading

This chapter describes the locking primitives and thread synchronization mechanisms of the SunOS multithreaded kernel.

Threads

A thread of control, or thread, is a sequence of instructions executed within a program. A thread can share data and code with other threads and can run concurrently with other threads. There are two kinds of threads: user threads and kernel threads. See Multithreaded Programming Guide for more information on threads.

User Threads

Each process in the SunOS operating system has an address space that contains one or more lightweight processes (LWPs), each of which in turn runs one or more user threads. Figure 4-1 shows the relationship between threads, LWPs, and processes. An LWP schedules its user threads and runs one user thread at a time, though multiple LWPs may run concurrently. User threads are handled in user space.

The LWP is the interface between user threads and the kernel. The LWP can be thought of as a virtual CPU that schedules user thread execution. When a user thread issues a system call, the LWP running the thread calls into the kernel and remains bound to the thread at least until the system call is complete. When an LWP is running in the kernel, executing a system call on behalf of a user thread, it runs one kernel thread. Each LWP is therefore associated with exactly one kernel thread.

Kernel Threads

There are two types of kernel threads: those bound to an LWP and those not associated with an LWP. Threads not associated with LWPs are system threads, such as those created to handle hardware interrupts. For those threads bound to an LWP, there is one and only one kernel thread per LWP. On a multiprocessor system, several kernel threads can run simultaneously. Even on uniprocessors, running kernel threads can be preempted at any time to run other threads. Drivers are mainly concerned with kernel threads as most device driver routines run as kernel threads. Figure 4-1 illustrates the relationship between threads and lightweight processes.

Figure 4-1 Threads and Lightweight Processes


A multithreaded kernel requires consideration of locking primitives and thread synchronization.

Multiprocessing Changes Since the SunOS 4.1 System

Here is a simplified view of how the earlier releases of the SunOS kernel ran on multiprocessors. Only one processor could run kernel code at any one time, and this was enforced by using a master lock around the entire kernel. When a processor needed to execute kernel code, it acquired the master lock, blocking other processors from accessing kernel code. It released the lock on exiting the kernel.

Figure 4-2 SunOS 4.1 Kernels on a Multiprocessor


In Figure 4-2, CPU1 is executing kernel code. All other processors are locked out of the kernel, although they can still run user code.

In the SunOS 5.7 system, instead of one master lock, there are many locks that protect smaller regions of code or data. In the example shown in Figure 4-3, there is a kernel lock that controls access to data structure A, and another that controls access to data structure B. Using these locks, only one processor at a time can be executing code dealing with data structure A, but another could be accessing data within structure B. This allows a greater degree of concurrency.

Figure 4-3 SunOS 5.7 on a Multiprocessor


In Figure 4-3, CPU1 and CPU3 are executing kernel code simultaneously.
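
The idea of one lock per data structure can be sketched outside the kernel. The following is a user-space illustration that uses POSIX threads, not the kernel interfaces described later in this chapter. The structures table_a and table_b and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared structures, each guarded by its own lock. */
struct table_a {
	pthread_mutex_t	lock;
	int		entries;
};

struct table_b {
	pthread_mutex_t	lock;
	int		entries;
};

static struct table_a a = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct table_b b = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Touches only structure A, so it takes only A's lock. */
int
add_to_a(void)
{
	int n;

	pthread_mutex_lock(&a.lock);
	n = ++a.entries;
	pthread_mutex_unlock(&a.lock);
	assert(n > 0);
	return (n);
}

/* Touches only structure B; may run while another thread is in add_to_a(). */
int
add_to_b(void)
{
	int n;

	pthread_mutex_lock(&b.lock);
	n = ++b.entries;
	pthread_mutex_unlock(&b.lock);
	assert(n > 0);
	return (n);
}

/*
 * Returns 1 if B's lock can be taken while A's lock is held, which shows
 * that the two locks are independent of each other.
 */
int
b_available_while_a_held(void)
{
	int ok;

	pthread_mutex_lock(&a.lock);
	ok = (pthread_mutex_trylock(&b.lock) == 0);
	if (ok)
		pthread_mutex_unlock(&b.lock);
	pthread_mutex_unlock(&a.lock);
	return (ok);
}
```

With a single master lock, the check in b_available_while_a_held() would fail, because holding the lock for A would also exclude all access to B.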

Locking Primitives

In traditional UNIX systems, any section of kernel code runs until it explicitly gives up the processor by calling sleep() or is interrupted by hardware. This is not true in SunOS 5.7! A kernel thread can be preempted at any time to run another thread. Because all kernel threads share kernel address space, and often need to read and modify the same data, the kernel provides a number of locking primitives to prevent threads from corrupting shared data. These mechanisms include mutual exclusion locks, readers/writer locks, and semaphores.

Storage Classes of Driver Data

The storage class of data is a guide to whether the driver might need to take explicit steps to control access to the data. There are three types of data storage classes:

Mutual-Exclusion Locks

A mutual-exclusion lock, or mutex, is usually associated with a set of data and regulates access to that data. Mutexes provide a way to allow only one thread at a time access to that data.

Table 4-1 Mutex Routines

Name                  Description

mutex_init(9F)        Initializes a mutex.
mutex_destroy(9F)     Releases any associated storage.
mutex_enter(9F)       Acquires a mutex.
mutex_tryenter(9F)    Acquires a mutex if it is available, but does not block.
mutex_exit(9F)        Releases a mutex.
mutex_owned(9F)       Tests if the mutex is held by the current thread. To be used in ASSERT(9F) only.

Setting Up Mutexes

Device drivers usually allocate a mutex for each driver data structure. The mutex is typically a field in the structure and is of type kmutex_t. mutex_init(9F) is called to prepare the mutex for use. This is usually done at attach(9E) time for per-device mutexes and _init(9E) time for global driver mutexes.

For example,

	struct xxstate *xsp;
	...
 	mutex_init(&xsp->mu, "xx mutex", MUTEX_DRIVER, NULL);
 	...

For a more complete example of mutex initialization see Chapter 5, Autoconfiguration.

The driver must destroy the mutex with mutex_destroy(9F) before being unloaded. This is usually done at detach(9E) time for per-device mutexes and _fini(9E) time for global driver mutexes.

Using Mutexes

Every section of the driver code that needs to read or write the shared data structure must acquire the mutex, access the data, and then release the mutex.

For example, to protect access to the busy flag in the state structure:

	...
 	mutex_enter(&xsp->mu);
 	xsp->busy = 0;
 	mutex_exit(&xsp->mu);
 	...

The scope of a mutex--the data it protects--is entirely up to the programmer. A mutex protects some particular data structure because the programmer chooses to do so and uses it accordingly. A mutex protects a data structure only if every code path that accesses the data structure does so while holding the mutex. For additional guidelines on using mutexes see Appendix G, Advanced Topics.
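
The rule that every code path must hold the mutex can be tried out in user space. The following sketch uses POSIX threads as a stand-in for the kernel routines: pthread_mutex_lock() plays the role of mutex_enter(9F) and pthread_mutex_unlock() the role of mutex_exit(9F). The count field, the worker() routine, and the thread and loop counts are assumptions made for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define	NTHREADS	4
#define	NLOOPS		10000

/* User-space stand-in for a driver state structure. */
struct xxstate {
	pthread_mutex_t	mu;	/* protects count */
	long		count;
};

/* Every access to count is made while holding mu. */
static void *
worker(void *arg)
{
	struct xxstate *xsp = arg;
	int i;

	for (i = 0; i < NLOOPS; i++) {
		pthread_mutex_lock(&xsp->mu);	/* like mutex_enter(9F) */
		xsp->count++;
		pthread_mutex_unlock(&xsp->mu);	/* like mutex_exit(9F) */
	}
	return (NULL);
}

/* Runs NTHREADS workers concurrently and returns the final count. */
long
run_workers(void)
{
	struct xxstate xs;
	pthread_t tid[NTHREADS];
	int i, rc;

	pthread_mutex_init(&xs.mu, NULL);
	xs.count = 0;
	for (i = 0; i < NTHREADS; i++) {
		rc = pthread_create(&tid[i], NULL, worker, &xs);
		assert(rc == 0);
		(void) rc;
	}
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	pthread_mutex_destroy(&xs.mu);
	return (xs.count);
}
```

Because no increment happens outside the mutex, run_workers() returns NTHREADS * NLOOPS. If any one code path updated count without the lock, updates could be lost.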

Readers/Writer Locks

A readers/writer lock regulates access to a set of data. The readers/writer lock is so called because many threads can hold the lock simultaneously for reading, but only one thread can hold it for writing.

Most device drivers do not use readers/writer locks. These locks are slower than mutexes and provide a performance gain only when protecting data that is not frequently written but is commonly read by many concurrent threads. In this case, contention for a mutex could become a bottleneck, so using a readers/writer lock might be more efficient. See rwlock(9F) for more information.
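
The sharing rule can be illustrated in user space with a POSIX readers/writer lock. This is only an analogue of the rwlock(9F) interfaces, not the kernel API itself, and the cfg_value variable and function names are hypothetical.

```c
#define	_POSIX_C_SOURCE	200809L	/* request the POSIX rwlock interfaces */
#include <assert.h>
#include <pthread.h>

static pthread_rwlock_t cfg_lock = PTHREAD_RWLOCK_INITIALIZER;
static int cfg_value = 42;	/* rarely written, frequently read */

/* Many threads may be inside read_cfg() at the same time. */
int
read_cfg(void)
{
	int v;

	pthread_rwlock_rdlock(&cfg_lock);
	v = cfg_value;
	pthread_rwlock_unlock(&cfg_lock);
	return (v);
}

/* Only one thread at a time may be inside write_cfg(). */
void
write_cfg(int v)
{
	int rc;

	rc = pthread_rwlock_wrlock(&cfg_lock);
	assert(rc == 0);
	(void) rc;
	cfg_value = v;
	pthread_rwlock_unlock(&cfg_lock);
}

/*
 * Returns 1 if, while one read hold is in place, a second read hold
 * succeeds and a write attempt is refused.
 */
int
readers_share_writer_excluded(void)
{
	int second_reader_ok, writer_refused;

	pthread_rwlock_rdlock(&cfg_lock);
	second_reader_ok = (pthread_rwlock_tryrdlock(&cfg_lock) == 0);
	writer_refused = (pthread_rwlock_trywrlock(&cfg_lock) != 0);
	if (second_reader_ok)
		pthread_rwlock_unlock(&cfg_lock);
	pthread_rwlock_unlock(&cfg_lock);
	return (second_reader_ok && writer_refused);
}
```

The payoff comes only when reads greatly outnumber writes. For data that is written often, a plain mutex is usually the simpler and faster choice.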

Semaphores

Counting semaphores are available as an alternative primitive for managing threads within device drivers. See semaphore(9F) for more information.
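
As a rough illustration of how a counting semaphore manages a fixed pool of resources, the following user-space sketch uses POSIX semaphores. It is not the semaphore(9F) interface. The pool of NBUFS transfer buffers and the function names are hypothetical.

```c
#define	_POSIX_C_SOURCE	200809L	/* request the POSIX semaphore interfaces */
#include <assert.h>
#include <semaphore.h>

#define	NBUFS	2	/* hypothetical number of transfer buffers */

static sem_t buf_sem;	/* counts the buffers that are still free */

void
bufs_init(void)
{
	int rc;

	rc = sem_init(&buf_sem, 0, NBUFS);
	assert(rc == 0);
	(void) rc;
}

/* Claims a buffer without blocking. Returns 1 on success, 0 if none is free. */
int
buf_tryclaim(void)
{
	return (sem_trywait(&buf_sem) == 0);
}

/* Returns a buffer to the pool, allowing one more claim to succeed. */
void
buf_release(void)
{
	sem_post(&buf_sem);
}

void
bufs_fini(void)
{
	sem_destroy(&buf_sem);
}
```

With NBUFS set to 2, two claims succeed and a third fails until a buffer is released. A blocking claim would call sem_wait() instead of sem_trywait().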

Thread Synchronization

In addition to protecting shared data, drivers often need to synchronize execution among multiple threads.

Condition Variables

Condition variables are a standard form of thread synchronization. They are designed to be used with mutexes. The associated mutex is used to ensure that a condition can be checked atomically, and that the thread can block on the associated condition variable without missing either a change to the condition or a signal that the condition has changed. Condition variables must be initialized by calling cv_init(9F) and must be destroyed by calling cv_destroy(9F).


Note -

Condition variable routines are approximately equivalent to the routines sleep() and wakeup() used in SunOS 4.1.


Table 4-2 lists the condvar(9F) interfaces. The four wait routines - cv_wait(9F), cv_timedwait(9F), cv_wait_sig(9F), and cv_timedwait_sig(9F) - take a pointer to a mutex as an argument.

Table 4-2 Condition Variable Routines

Name                     Description

cv_init(9F)              Initializes a condition variable.
cv_destroy(9F)           Destroys a condition variable.
cv_wait(9F)              Waits for condition.
cv_timedwait(9F)         Waits for condition or timeout.
cv_wait_sig(9F)          Waits for condition, or returns zero on receipt of a signal.
cv_timedwait_sig(9F)     Waits for condition, timeout, or signal.
cv_signal(9F)            Signals one thread waiting on the condition variable.
cv_broadcast(9F)         Signals all threads waiting on the condition variable.

Initializing Condition Variables

Declare a condition variable (type kcondvar_t) for each condition. Usually, this is done in the driver's soft-state structure. Use cv_init(9F) to initialize each one. Similar to mutexes, condition variables are usually initialized at attach(9E) time. For example:

	...
 	cv_init(&xsp->cv, NULL, CV_DRIVER, NULL);
 	...

For a more complete example of condition variable initialization see Chapter 5, Autoconfiguration.

Multithreading Additions to the State Structure

This section adds the following fields to the state structure. See "Software State Structure" for more information.

	int		busy;	/* device busy flag */
	kmutex_t	mu;	/* mutex to protect state structure */
	kcondvar_t	cv;	/* threads wait for access here */

Using Condition Variables

To use condition variables, follow these steps in the code path waiting for the condition:

  1. Acquire the mutex guarding the condition.

  2. Test the condition.

  3. If the test results do not allow the thread to continue, use cv_wait(9F) to block the current thread on the condition. cv_wait(9F) releases the mutex before blocking. Upon return from cv_wait(9F) (which will reacquire the mutex before returning), repeat the test.

  4. Once the test allows the thread to continue, set the condition to its new value. For example, set a device flag to busy.

  5. Release the mutex.

Follow these steps in the code path signaling the condition:

  1. Acquire the mutex guarding the condition.

  2. Set the condition.

  3. Signal the blocked thread with cv_signal(9F).

  4. Release the mutex.

Example 4-1 uses a busy flag, a mutex, and a condition variable to force the read(9E) routine to wait until the device is no longer busy before starting a transfer.


Example 4-1 Using Mutexes and Condition Variables

static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
{
	struct xxstate *xsp;
	...
	mutex_enter(&xsp->mu);
	/* Wait until the device is no longer busy. */
	while (xsp->busy)
		cv_wait(&xsp->cv, &xsp->mu);
	xsp->busy = 1;		/* claim the device */
	mutex_exit(&xsp->mu);
	/* perform the data access */
}

static u_int
xxintr(caddr_t arg)
{
	struct xxstate *xsp = (struct xxstate *)arg;

	mutex_enter(&xsp->mu);
	xsp->busy = 0;		/* transfer complete */
	cv_broadcast(&xsp->cv);	/* wake all threads waiting for the device */
	mutex_exit(&xsp->mu);
	return (DDI_INTR_CLAIMED);
}

In Example 4-1, xxintr() always calls cv_broadcast(9F), even if there are no threads waiting on the condition. This extra call can be avoided by using a want flag in the state structure, as shown in Example 4-2. Before a thread blocks on the condition variable (for example, because the device is busy), it sets the want flag, indicating that it wants to be signaled when the condition occurs. When the condition occurs (the device finishes the transfer), the call to cv_broadcast(9F) is made only if the want flag is set.


Example 4-2 Using a want Flag

static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
{
	struct xxstate *xsp;
	...
	mutex_enter(&xsp->mu);
	while (xsp->busy) {
		xsp->want = 1;	/* ask to be woken when the device is free */
		cv_wait(&xsp->cv, &xsp->mu);
	}
	xsp->busy = 1;
	mutex_exit(&xsp->mu);
	/* perform the data access */
}

static u_int
xxintr(caddr_t arg)
{
	struct xxstate *xsp = (struct xxstate *)arg;

	mutex_enter(&xsp->mu);
	xsp->busy = 0;
	if (xsp->want) {
		/* At least one thread is waiting, so wake the waiters. */
		xsp->want = 0;
		cv_broadcast(&xsp->cv);
	}
	mutex_exit(&xsp->mu);
	return (DDI_INTR_CLAIMED);
}
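
The want-flag technique can be exercised in user space. The following sketch substitutes POSIX threads for the kernel routines, so pthread_cond_wait() and pthread_cond_broadcast() stand in for cv_wait(9F) and cv_broadcast(9F). The broadcasts counter and the xx_demo() driver routine are additions for illustration only and would not appear in a real driver.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

struct xxstate {
	pthread_mutex_t	mu;
	pthread_cond_t	cv;
	int		busy;
	int		want;
	int		broadcasts;	/* illustration only: wakeups issued */
};

void
xx_init(struct xxstate *xsp)
{
	pthread_mutex_init(&xsp->mu, NULL);
	pthread_cond_init(&xsp->cv, NULL);
	xsp->busy = xsp->want = xsp->broadcasts = 0;
}

/* Waiting side: set want before blocking, then claim the device. */
void
xx_acquire(struct xxstate *xsp)
{
	pthread_mutex_lock(&xsp->mu);
	while (xsp->busy) {
		xsp->want = 1;
		pthread_cond_wait(&xsp->cv, &xsp->mu);
	}
	xsp->busy = 1;
	pthread_mutex_unlock(&xsp->mu);
}

/* Signaling side: broadcast only if some thread asked to be woken. */
void
xx_release(struct xxstate *xsp)
{
	pthread_mutex_lock(&xsp->mu);
	xsp->busy = 0;
	if (xsp->want) {
		xsp->want = 0;
		xsp->broadcasts++;
		pthread_cond_broadcast(&xsp->cv);
	}
	pthread_mutex_unlock(&xsp->mu);
}

static void *
waiter(void *arg)
{
	xx_acquire(arg);
	return (NULL);
}

/* Releases once with no waiter and once with a waiter; returns broadcasts. */
int
xx_demo(void)
{
	struct xxstate xs;
	pthread_t tid;
	int rc, wanted;

	xx_init(&xs);
	xx_acquire(&xs);
	xx_release(&xs);		/* nobody waiting: no broadcast */
	assert(xs.broadcasts == 0);

	xx_acquire(&xs);		/* make the device busy again */
	rc = pthread_create(&tid, NULL, waiter, &xs);
	assert(rc == 0);
	(void) rc;
	do {				/* wait until the waiter has blocked */
		pthread_mutex_lock(&xs.mu);
		wanted = xs.want;
		pthread_mutex_unlock(&xs.mu);
		if (!wanted)
			sched_yield();
	} while (!wanted);
	xx_release(&xs);		/* waiter present: one broadcast */
	pthread_join(tid, NULL);
	return (xs.broadcasts);
}
```

In this sketch, xx_demo() returns 1: the first release finds no waiter and skips the broadcast, and the second release finds the want flag set.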

cv_timedwait(9F)

If a thread blocks on a condition with cv_wait(9F), and that condition does not occur, it may wait forever. One way to prevent this is to establish a callback with timeout(9F). This callback sets a flag indicating that the condition did not occur normally, and then unblocks the thread. The notified thread then notices that the condition did not occur and can return an error (such as device broken).

A better solution is to use cv_timedwait(9F). An absolute wait time is passed to cv_timedwait(9F), which returns -1 if the time is reached and the event has not occurred. Otherwise, it returns a value greater than zero. This saves a lot of work setting up separate timeout(9F) routines and avoids having threads get stuck in the driver.

cv_timedwait(9F) requires an absolute wait time expressed in clock ticks since the system was last rebooted. This can be determined by retrieving the current value with ddi_get_lbolt(9F). The driver usually has a maximum number of seconds or microseconds to wait, so this value is converted to clock ticks with drv_usectohz(9F) and added to the value from ddi_get_lbolt(9F).
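
The arithmetic behind the absolute wait time can be worked through with ordinary integers. The sketch below is not the implementation of drv_usectohz(9F). It assumes a clock rate of hz ticks per second and assumes that partial ticks round up. The type tick_t and both function names are hypothetical.

```c
#include <assert.h>

typedef long tick_t;	/* hypothetical stand-in for clock_t */

/*
 * Converts microseconds to clock ticks at hz ticks per second.
 * Assumption: a partial tick is rounded up to a whole tick.
 */
tick_t
usec_to_ticks(long usec, long hz)
{
	assert(usec >= 0 && hz > 0);
	return ((tick_t)(((long long)usec * hz + 999999) / 1000000));
}

/* Absolute deadline: the current tick count plus the converted interval. */
tick_t
deadline_ticks(tick_t now, long usec, long hz)
{
	return (now + usec_to_ticks(usec, hz));
}
```

For example, at an assumed rate of 100 ticks per second, 5,000,000 microseconds is 500 ticks. If the current tick count is 12000, the absolute wait time is 12500.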

Example 4-3 shows how to use cv_timedwait(9F) to wait up to five seconds to access the device before returning EIO to the caller.


Example 4-3 Using cv_timedwait(9F)

	clock_t			cur_ticks, to;

	mutex_enter(&xsp->mu);
	while (xsp->busy) {
		cur_ticks = ddi_get_lbolt();
		to = cur_ticks + drv_usectohz(5000000); /* 5 seconds from now */
		if (cv_timedwait(&xsp->cv, &xsp->mu, to) == -1) {
			/*
			 * The timeout time 'to' was reached without the
			 * condition being signalled.
			 */
			/* tidy up and exit */
			mutex_exit(&xsp->mu);
			return (EIO);
		}
	}
	xsp->busy = 1;
	mutex_exit(&xsp->mu);
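
The same pattern can be tried in user space. The following sketch uses pthread_cond_timedwait(), which also takes an absolute time, as an analogue of cv_timedwait(9F). It is not the kernel interface: the deadline is a struct timespec rather than a tick count, and a timeout is reported as ETIMEDOUT rather than -1. The xx_claim() routine and its millisecond parameter are hypothetical.

```c
#define	_POSIX_C_SOURCE	200809L	/* for clock_gettime() */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

struct xxstate {
	pthread_mutex_t	mu;
	pthread_cond_t	cv;
	int		busy;
};

void
xx_init(struct xxstate *xsp)
{
	pthread_mutex_init(&xsp->mu, NULL);
	pthread_cond_init(&xsp->cv, NULL);
	xsp->busy = 0;
}

/*
 * Waits up to timeout_ms milliseconds for the device to become free.
 * Returns 0 with the device claimed, or EIO if the deadline passes first.
 */
int
xx_claim(struct xxstate *xsp, long timeout_ms)
{
	struct timespec to;
	int rc = 0;

	assert(timeout_ms >= 0);

	/* Absolute deadline: the current time plus the timeout. */
	clock_gettime(CLOCK_REALTIME, &to);
	to.tv_sec += timeout_ms / 1000;
	to.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (to.tv_nsec >= 1000000000L) {
		to.tv_sec++;
		to.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&xsp->mu);
	while (xsp->busy && rc == 0)
		rc = pthread_cond_timedwait(&xsp->cv, &xsp->mu, &to);
	if (xsp->busy) {
		/* Deadline reached and the device is still busy. */
		pthread_mutex_unlock(&xsp->mu);
		return (EIO);
	}
	xsp->busy = 1;
	pthread_mutex_unlock(&xsp->mu);
	return (0);
}
```

This sketch computes the deadline once, before the loop, so the total wait is bounded by the timeout even if the thread is woken several times while the device stays busy. Recomputing the deadline inside the loop, as in Example 4-3, restarts the interval after each wakeup.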

cv_wait_sig(9F)

There is always the possibility that either the driver accidentally waits for a condition that will never occur (as described in "cv_timedwait(9F)") or that the condition will not happen for a long time. In either case, the user may want to abort the thread by sending it a signal. Whether the signal causes the driver to wake up depends upon the driver.

cv_wait_sig(9F) allows a signal to unblock the thread. This allows the user to break out of potentially long waits by sending a signal to the thread with kill(1) or by typing the interrupt character. cv_wait_sig(9F) returns zero if it is returning because of a signal, or nonzero if the condition occurred.

Example 4-4 shows how to use cv_wait_sig(9F) to allow a signal to unblock the thread.


Example 4-4 Using cv_wait_sig(9F)

	mutex_enter(&xsp->mu);
	while (xsp->busy) {
		if (cv_wait_sig(&xsp->cv, &xsp->mu) == 0) {
			/* Signalled while waiting for the condition. */
			/* tidy up and exit */
			mutex_exit(&xsp->mu);
			return (EINTR);
		}
	}
	xsp->busy = 1;
	mutex_exit(&xsp->mu);

cv_timedwait_sig(9F)

cv_timedwait_sig(9F) is similar to cv_timedwait(9F) and cv_wait_sig(9F), except that it returns -1 if the timeout is reached without the condition being signaled, or 0 if a signal (for example, from kill(2)) is sent to the thread.

For both cv_timedwait(9F) and cv_timedwait_sig(9F), time is measured in absolute clock ticks since the last system reboot.
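
The three outcomes of cv_timedwait_sig(9F) can be summarized as a small decision function. The helper below is hypothetical and is not part of the DDI. It takes a return value of the kind described above and chooses an errno value the way Example 4-3 and Example 4-4 do, using EIO for a timeout and EINTR for a signal.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: maps a cv_timedwait_sig(9F)-style return value
 * to the errno value a driver routine might return.
 */
int
xx_wait_errno(int rv)
{
	assert(rv >= -1);
	if (rv == -1)
		return (EIO);	/* timeout reached, condition not signaled */
	if (rv == 0)
		return (EINTR);	/* wait interrupted by a signal */
	return (0);		/* woken normally: retest the condition */
}
```

A result of 0 from this helper means only that the thread was woken. The caller would still retest the condition in a loop before proceeding, as the earlier examples do.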

Choosing a Locking Scheme

The locking scheme for most device drivers should be kept straightforward. Using additional locks may allow more concurrency but increases overhead. Using fewer locks is less time consuming but allows less concurrency. Generally, use one mutex per data structure, a condition variable for each event or condition the driver must wait for, and a mutex for each major set of data global to the driver. Avoid holding mutexes for long periods of time.
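
One common way to keep hold times short is to copy the protected data while holding the mutex and do any slow work after releasing it. The following user-space sketch, written with POSIX threads rather than the kernel routines, shows the shape of that approach. The xxstats structure and xx_format_stats() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

struct xxstats {
	pthread_mutex_t	mu;	/* protects reads and writes */
	unsigned long	reads;
	unsigned long	writes;
};

/*
 * Formats the counters into buf. The mutex is held only long enough to
 * copy the two values; the slower formatting runs without the lock.
 * Returns the number of characters snprintf() reports.
 */
int
xx_format_stats(struct xxstats *sp, char *buf, size_t len)
{
	unsigned long r, w;

	assert(buf != NULL && len > 0);

	pthread_mutex_lock(&sp->mu);
	r = sp->reads;
	w = sp->writes;
	pthread_mutex_unlock(&sp->mu);

	/* Slow work happens after the mutex is released. */
	return (snprintf(buf, len, "reads=%lu writes=%lu", r, w));
}
```

The copied values form a consistent snapshot because both are read under the same mutex, although they can be out of date by the time the formatted text is used.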

To look at lock usage, use lockstat(1M). lockstat(1M) monitors all kernel lock events, gathers frequency and timing data about the events, and displays the data.

For more information on locking schemes, see Appendix G, Advanced Topics. Also see the Multithreaded Programming Guide for more details on multithreading operations.