Writing Device Drivers

Chapter 3 Multithreading

This chapter describes the locking primitives and thread synchronization mechanisms of the Solaris multithreaded kernel. Device drivers should be designed to take advantage of multithreading. The chapter covers locking primitives, thread synchronization, and how to choose a locking scheme.

Locking Primitives

In traditional UNIX systems, every section of kernel code runs until it explicitly gives up the processor by calling sleep() or is interrupted by hardware. This is not true in the Solaris operating environment. A kernel thread can be preempted at any time to run another thread. Because all kernel threads share kernel address space and often need to read and modify the same data, the kernel provides a number of locking primitives to prevent threads from corrupting shared data. These mechanisms include mutual-exclusion locks (mutexes), readers/writer locks, and semaphores.

Storage Classes of Driver Data

The storage class of data is a guide to whether the driver might need to take explicit steps to control access to the data. The three types of data storage classes are:

Automatic (stack) data – Every thread has a private stack, so drivers never need to lock automatic variables.
Global and static data – Global and static data can be shared by any number of threads in the driver; the driver might need to lock this type of data at times.
Kernel heap data – Any number of threads in the driver might share kernel heap data, such as data allocated by kmem_alloc(9F). If this data is shared, the driver needs to protect it at times.

Mutual-Exclusion Locks

A mutual-exclusion lock, or mutex, is usually associated with a set of data and regulates access to that data. Mutexes provide a way to allow only one thread at a time access to that data.

Table 3–1 Mutex Routines


Name	Description
mutex_init(9F)	Initializes a mutex
mutex_destroy(9F)	Releases any associated storage
mutex_enter(9F)	Acquires a mutex
mutex_tryenter(9F)	Acquires a mutex if available, but does not block
mutex_exit(9F)	Releases a mutex
mutex_owned(9F)	Tests to determine if the mutex is held by the current thread. To be used in ASSERT(9F) only

Setting Up Mutexes

Device drivers usually allocate a mutex for each driver data structure. The mutex is typically a field in the structure and is of type kmutex_t. mutex_init(9F) is called to prepare the mutex for use. This is usually done at attach(9E) time for per-device mutexes and _init(9E) time for global driver mutexes.

For example:

struct xxstate *xsp;
...
mutex_init(&xsp->mu, NULL, MUTEX_DRIVER, NULL);
...

For a more complete example of mutex initialization, see Chapter 5, Driver Autoconfiguration.

The driver must destroy the mutex with mutex_destroy(9F) before being unloaded. This is usually done at detach(9E) time for per-device mutexes and _fini(9E) time for global driver mutexes.

Using Mutexes

Every section of the driver code that needs to read or write the shared data structure must do the following:

Acquire the mutex
Access the data
Release the mutex

The scope of a mutex—the data it protects—is entirely up to the programmer. A mutex protects some particular data structure because the programmer chooses to do so and uses it accordingly. A mutex protects a data structure only if every code path that accesses the data structure does so while holding the mutex.
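
The acquire, access, release pattern can be sketched outside the kernel. The following is a userland analog, not driver code: it assumes POSIX threads, with pthread_mutex_lock() and pthread_mutex_unlock() standing in for mutex_enter(9F) and mutex_exit(9F). The structure layout, the count field, and the xx_ function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-device state: the lock sits next to the data it guards. */
struct xxstate {
        pthread_mutex_t mu;     /* userland stand-in for kmutex_t */
        long            count;  /* protected by mu */
};

/* Every code path that touches count goes through the same three steps. */
static void
xx_bump(struct xxstate *xsp)
{
        pthread_mutex_lock(&xsp->mu);           /* acquire the mutex */
        xsp->count++;                           /* access the data */
        pthread_mutex_unlock(&xsp->mu);         /* release the mutex */
}

static void *
xx_worker(void *arg)
{
        struct xxstate *xsp = arg;
        int i;

        for (i = 0; i < 100000; i++)
                xx_bump(xsp);
        return (NULL);
}

/* Runs two workers against one structure; returns the final count or -1. */
long
xx_run_workers(void)
{
        struct xxstate xs;
        pthread_t t1, t2;
        long result;

        xs.count = 0;
        if (pthread_mutex_init(&xs.mu, NULL) != 0)
                return (-1);
        if (pthread_create(&t1, NULL, xx_worker, &xs) != 0) {
                pthread_mutex_destroy(&xs.mu);
                return (-1);
        }
        if (pthread_create(&t2, NULL, xx_worker, &xs) != 0) {
                pthread_join(t1, NULL);
                pthread_mutex_destroy(&xs.mu);
                return (-1);
        }
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        result = xs.count;      /* both workers have exited */
        pthread_mutex_destroy(&xs.mu);
        return (result);
}
```

Because xx_bump() is the only path to count and it always holds mu, no increment is lost. If any other function modified count without taking mu, the mutex would no longer protect the field.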

Readers/Writer Locks

A readers/writer lock regulates access to a set of data. The readers/writer lock is so called because many threads can hold the lock simultaneously for reading, but only one thread can hold it for writing.

Most device drivers do not use readers/writer locks. These locks are slower than mutexes and provide a performance gain only when protecting data that is not frequently written but is commonly read by many concurrent threads. In this case, contention for a mutex could become a bottleneck, so using a readers/writer lock might be more efficient. The readers/writer functions are summarized in the following table. See the rwlock(9F) man page for detailed information.

Table 3–2 Readers/Writer Locks


Name	Description
rw_init(9F)	Initializes a readers/writer lock
rw_destroy(9F)	Destroys a readers/writer lock
rw_enter(9F)	Acquires a readers/writer lock
rw_tryenter(9F)	Attempts to acquire a readers/writer lock without waiting
rw_tryupgrade(9F)	Attempts to upgrade readers/writer lock holding from reader to writer
rw_downgrade(9F)	Downgrades a readers/writer lock holding from writer to reader
rw_exit(9F)	Releases a readers/writer lock
rw_read_locked(9F)	Determines whether readers/writer lock is held for read or write
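
The read-mostly case described above might look like the following userland sketch. It assumes POSIX threads rather than the kernel interfaces: pthread_rwlock_rdlock() plays the role of rw_enter(9F) with RW_READER, pthread_rwlock_wrlock() the role of rw_enter(9F) with RW_WRITER, and pthread_rwlock_unlock() the role of rw_exit(9F). The xxconfig structure and its fields are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical read-mostly settings shared by many threads. */
struct xxconfig {
        pthread_rwlock_t rw;     /* userland stand-in for krwlock_t */
        int              speed;  /* protected by rw */
        int              duplex; /* protected by rw */
};

int
xx_config_init(struct xxconfig *cp)
{
        cp->speed = 0;
        cp->duplex = 0;
        return (pthread_rwlock_init(&cp->rw, NULL));
}

/* Reader: any number of threads can be in here at the same time. */
void
xx_config_get(struct xxconfig *cp, int *speedp, int *duplexp)
{
        pthread_rwlock_rdlock(&cp->rw);
        *speedp = cp->speed;
        *duplexp = cp->duplex;
        pthread_rwlock_unlock(&cp->rw);
}

/* Writer: excludes readers and other writers, so both fields change together. */
void
xx_config_set(struct xxconfig *cp, int speed, int duplex)
{
        pthread_rwlock_wrlock(&cp->rw);
        cp->speed = speed;
        cp->duplex = duplex;
        pthread_rwlock_unlock(&cp->rw);
}

void
xx_config_fini(struct xxconfig *cp)
{
        pthread_rwlock_destroy(&cp->rw);
}
```

A reader always sees a matching speed and duplex pair, because the writer updates both fields while holding the lock exclusively.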

Semaphores

Counting semaphores are available as an alternative primitive for managing threads within device drivers. See the semaphore(9F) man page for more information.

Table 3–3 Semaphores


Name	Description
sema_init(9F)	Initializes a semaphore
sema_destroy(9F)	Destroys a semaphore
sema_p(9F)	Decrements semaphore and possibly blocks
sema_tryp(9F)	Attempts to decrement semaphore, but does not block
sema_p_sig(9F)	Decrements semaphore, but does not block if a signal is pending
sema_v(9F)	Increments semaphore and possibly unblocks a waiter
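
One possible use of a counting semaphore is to limit how many requests are outstanding at once. The following userland sketch assumes POSIX semaphores in place of the kernel routines: sem_trywait() stands in for sema_tryp(9F) and sem_post() for sema_v(9F). The slot count and the xx_ names are hypothetical.

```c
#include <semaphore.h>

#define XX_NSLOTS       2       /* hypothetical number of command slots */

struct xxslots {
        sem_t sem;              /* userland stand-in for ksema_t */
};

/* The initial count is the number of free slots. */
int
xx_slots_init(struct xxslots *sp)
{
        return (sem_init(&sp->sem, 0, XX_NSLOTS));
}

/* Claims a slot without blocking; returns 1 if a slot was claimed, else 0. */
int
xx_slot_tryclaim(struct xxslots *sp)
{
        return (sem_trywait(&sp->sem) == 0);
}

/* Returns a slot and wakes one waiter, if any. */
void
xx_slot_release(struct xxslots *sp)
{
        sem_post(&sp->sem);
}

void
xx_slots_fini(struct xxslots *sp)
{
        sem_destroy(&sp->sem);
}
```

With two slots, the first two claims succeed and the third fails until a slot is released. A blocking claim would use sem_wait() in this analog, corresponding to sema_p(9F).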

Thread Synchronization

In addition to protecting shared data, drivers often need to synchronize execution among multiple threads.

Condition Variables in Thread Synchronization

Condition variables are a standard form of thread synchronization. They are designed to be used with mutexes. The associated mutex is used to ensure that a condition can be checked atomically, and that the thread can block on the associated condition variable without missing either a change to the condition or a signal that the condition has changed.

Table 3–4 lists the condvar(9F) interfaces.

Table 3–4 Condition Variable Routines


Name	Description
cv_init(9F)	Initializes a condition variable
cv_destroy(9F)	Destroys a condition variable
cv_wait(9F)	Waits for condition
cv_timedwait(9F)	Waits for condition or timeout
cv_wait_sig(9F)	Waits for condition, or returns zero on receipt of a signal
cv_timedwait_sig(9F)	Waits for condition or timeout or signal
cv_signal(9F)	Signals one thread waiting on the condition variable
cv_broadcast(9F)	Signals all threads waiting on the condition variable

Initializing Condition Variables

Declare a condition variable (type kcondvar_t) for each condition. Usually, this is done in the driver's soft-state structure. Use cv_init(9F) to initialize each one. Similar to mutexes, condition variables are usually initialized at attach(9E) time. For example:

cv_init(&xsp->cv, NULL, CV_DRIVER, NULL);

For a more complete example of condition variable initialization, see Chapter 5, Driver Autoconfiguration.

Waiting for the Condition

To use condition variables, follow these steps in the code path waiting for the condition:

Acquire the mutex guarding the condition.
Test the condition.
If the test results do not allow the thread to continue, use cv_wait(9F) to block the current thread on the condition. cv_wait(9F) releases the mutex before blocking. Upon return from cv_wait(9F) (which will reacquire the mutex before returning), repeat the test.
Once the test allows the thread to continue, set the condition to its new value. For example, set a device flag to busy.
Release the mutex.

Signaling the Condition

Follow these steps in the code path signaling the condition:

Acquire the mutex guarding the condition.
Set the condition.
Wake the blocked threads with cv_broadcast(9F).
Release the mutex.

Example 3–1 uses a busy flag along with a mutex and a condition variable to force the read(9E) routine to wait until the device is no longer busy before starting a transfer.

Example 3–1 Using Mutexes and Condition Variables

static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
{
        struct xxstate *xsp;
        ...
        mutex_enter(&xsp->mu);
        while (xsp->busy)
                cv_wait(&xsp->cv, &xsp->mu);
        xsp->busy = 1;
        mutex_exit(&xsp->mu);
        /* perform the data access */
}

static uint_t
xxintr(caddr_t arg)
{
        struct xxstate *xsp = (struct xxstate *)arg;
        mutex_enter(&xsp->mu);
        xsp->busy = 0;
        cv_broadcast(&xsp->cv);
        mutex_exit(&xsp->mu);
        return (DDI_INTR_CLAIMED);
}

`cv_wait()` and `cv_timedwait()` Functions

If a thread blocks on a condition with cv_wait(9F) and that condition does not occur, the thread can wait forever, because cv_wait(9F) depends upon another thread to perform a wakeup. For that reason, it is often preferable to use cv_timedwait(9F), which also returns when a timeout expires. cv_timedwait(9F) takes an absolute wait time as an argument and returns -1 if that time is reached before the event occurs. It returns a positive value if the condition variable is signaled before the timeout.

cv_timedwait(9F) requires an absolute wait time expressed in clock ticks since the system was last rebooted. This can be determined by retrieving the current value with ddi_get_lbolt(9F). The driver usually has a maximum number of seconds or microseconds to wait, so this value is converted to clock ticks with drv_usectohz(9F) and added to the value from ddi_get_lbolt(9F).
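
The arithmetic behind the absolute wait time can be sketched in plain C. The helper names, the use of long long in place of clock_t, and the tick rate passed as a parameter are assumptions of this sketch. In a driver, the conversion is done by drv_usectohz(9F) and the current tick count comes from ddi_get_lbolt(9F). Rounding up is also an assumption here, chosen so that a short nonzero wait never becomes zero ticks.

```c
/* Converts microseconds to clock ticks at hz ticks per second, rounding up. */
long long
xx_usec_to_ticks(long long usec, long long hz)
{
        return ((usec * hz + 999999LL) / 1000000LL);
}

/* Absolute deadline: the current tick count plus the relative wait. */
long long
xx_deadline(long long now_ticks, long long usec, long long hz)
{
        return (now_ticks + xx_usec_to_ticks(usec, hz));
}
```

For instance, at a tick rate of 100 per second, a five-second wait is 500 ticks, so a current tick count of 123456 gives a deadline of 123956.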

Example 3–2 shows how to use cv_timedwait(9F) to wait up to five seconds to access the device before returning EIO to the caller.

Example 3–2 Using `cv_timedwait()`

clock_t            to;
mutex_enter(&xsp->mu);
/* Compute the deadline once so that repeated wakeups cannot extend the wait. */
to = ddi_get_lbolt() + drv_usectohz(5000000); /* 5 seconds from now */
while (xsp->busy) {
        if (cv_timedwait(&xsp->cv, &xsp->mu, to) == -1) {
                /*
                 * The timeout time 'to' was reached without the
                 * condition being signalled.
                 */
                /* tidy up and exit */
                mutex_exit(&xsp->mu);
                return (EIO);
        }
}
xsp->busy = 1;
mutex_exit(&xsp->mu);

Although device driver writers generally find it preferable to use cv_timedwait(9F) over cv_wait(9F), there are situations in which cv_wait(9F) is a better choice. For example, cv_wait(9F) would be better when a driver is waiting on:

Internal driver state changes, where such a state change may require some command to be executed, or a set amount of time to pass.
Something the driver needs to single-thread.
Some situation that is already managing a possible timeout, as when “A” depends on “B,” and “B” itself is using cv_timedwait(9F).

`cv_wait_sig()` Function

There is always the possibility that either the driver accidentally waits for a condition that will never occur or that the condition will not happen for a long time. In either case, the user might want to abort the thread by sending it a signal. Whether the signal causes the driver to wake up depends upon the driver.

cv_wait_sig(9F) allows a signal to unblock the thread. This enables the user to break out of potentially long waits by sending a signal to the thread with kill(1) or by typing the interrupt character. cv_wait_sig(9F) returns zero if it is returning because of a signal, or nonzero if the condition occurred.

Example 3–3 shows how to use cv_wait_sig(9F) to allow a signal to unblock the thread.

Example 3–3 Using `cv_wait_sig()`

mutex_enter(&xsp->mu);
while (xsp->busy) {
        if (cv_wait_sig(&xsp->cv, &xsp->mu) == 0) {
                /* Signalled while waiting for the condition */
                /* tidy up and exit */
                mutex_exit(&xsp->mu);
                return (EINTR);
        }
}
xsp->busy = 1;
mutex_exit(&xsp->mu);

`cv_timedwait_sig()` Function

cv_timedwait_sig(9F) combines the behavior of cv_timedwait(9F) and cv_wait_sig(9F). It returns -1 if the timeout is reached before the condition is signaled, 0 if a signal (for example, from kill(2)) is delivered to the thread, and a positive value otherwise.

For both cv_timedwait(9F) and cv_timedwait_sig(9F), time is measured in absolute clock ticks since the last system reboot.
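
A driver might fold the return values of the timed and signal-aware waits into an error code along the following lines. This is a sketch only: the function name is hypothetical, xx_clock_t stands in for clock_t, and the choice of EIO for a timeout and EINTR for a signal follows Examples 3–2 and 3–3 rather than any requirement of the interfaces.

```c
#include <errno.h>

typedef long xx_clock_t;        /* stand-in for clock_t in this sketch */

/*
 * Maps a cv_timedwait_sig()-style return value to an errno value.
 * Returns 0 when the thread was woken normally, in which case the
 * caller must still retest the condition while holding the mutex.
 */
int
xx_wait_status(xx_clock_t rv)
{
        if (rv == -1)
                return (EIO);           /* timeout reached */
        if (rv == 0)
                return (EINTR);         /* interrupted by a signal */
        return (0);                     /* woken; retest the condition */
}
```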

Choosing a Locking Scheme

The locking scheme for most device drivers should be kept straightforward. Using additional locks allows more concurrency but increases overhead. Using fewer locks is less time consuming but allows less concurrency. Generally, use one mutex per data structure, a condition variable for each event or condition the driver must wait for, and a mutex for each major set of data global to the driver. Avoid holding mutexes for long periods of time.

Use the multithreading semantics of the entry point to your advantage.
Make all entry points re-entrant, and reduce the amount of shared data by changing static variables to automatic variables.
If your driver acquires multiple mutexes, acquire and release the mutexes in the same order in all code paths.
Hold and release locks within the same functional space.
Avoid holding driver mutexes when calling DDI interfaces that can block, such as kmem_alloc(9F) with KM_SLEEP.
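
The guideline about acquiring multiple mutexes in a consistent order can be illustrated with a userland sketch. It assumes POSIX threads in place of the kernel mutex routines, and it picks address order as the fixed ordering, which is one common convention and not something the DDI prescribes. The xxqueue structure and the function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical driver queue with its own lock. */
struct xxqueue {
        pthread_mutex_t mu;      /* userland stand-in for kmutex_t */
        int             nitems;  /* protected by mu */
};

int
xx_queue_init(struct xxqueue *qp, int nitems)
{
        qp->nitems = nitems;
        return (pthread_mutex_init(&qp->mu, NULL));
}

/* Moves n items between two queues; returns 0 on success, -1 otherwise. */
int
xx_move_items(struct xxqueue *from, struct xxqueue *to, int n)
{
        struct xxqueue *first = from, *second = to;
        int rv = -1;

        if (from == to || n < 0)
                return (-1);
        /* Fixed order: lower address first, regardless of direction. */
        if ((uintptr_t)first > (uintptr_t)second) {
                first = to;
                second = from;
        }
        pthread_mutex_lock(&first->mu);
        pthread_mutex_lock(&second->mu);
        if (from->nitems >= n) {
                from->nitems -= n;
                to->nitems += n;
                rv = 0;
        }
        pthread_mutex_unlock(&second->mu);
        pthread_mutex_unlock(&first->mu);
        return (rv);
}
```

Because the order depends only on the two addresses, a thread moving items from A to B and another moving items from B to A take the locks in the same sequence, so neither can hold one lock while waiting indefinitely for the other.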

To look at lock usage, use lockstat(1M). lockstat(1M) monitors all kernel lock events, gathers frequency and timing data about the events, and displays the data.

See the Multithreaded Programming Guide for more details on multithreaded operations.

Potential Locking Pitfalls

Mutexes are not re-entrant by the same thread. If you already own the mutex, attempting to claim it again leads to this panic:

panic: recursive mutex_enter. mutex %x caller %x

Releasing a mutex that the current thread does not hold causes this panic:

panic: mutex_adaptive_exit: mutex not held by thread
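
Both mistakes can be reproduced safely in userland. The sketch below is an analog, not kernel code: it assumes a POSIX error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), which reports these errors through return values where the kernel would panic. The function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Initializes mu as an error-checking mutex; returns 0 on success. */
static int
xx_errorcheck_init(pthread_mutex_t *mu)
{
        pthread_mutexattr_t attr;
        int err;

        if ((err = pthread_mutexattr_init(&attr)) != 0)
                return (err);
        err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        if (err == 0)
                err = pthread_mutex_init(mu, &attr);
        pthread_mutexattr_destroy(&attr);
        return (err);
}

/* Locks a mutex twice from one thread and returns the second call's error. */
int
xx_relock_error(void)
{
        pthread_mutex_t mu;
        int err;

        if (xx_errorcheck_init(&mu) != 0)
                return (-1);
        pthread_mutex_lock(&mu);
        err = pthread_mutex_lock(&mu);          /* owner locks again */
        pthread_mutex_unlock(&mu);
        pthread_mutex_destroy(&mu);
        return (err);
}

/* Unlocks a mutex the caller never locked and returns the error. */
int
xx_unowned_unlock_error(void)
{
        pthread_mutex_t mu;
        int err;

        if (xx_errorcheck_init(&mu) != 0)
                return (-1);
        err = pthread_mutex_unlock(&mu);        /* not held by this thread */
        pthread_mutex_destroy(&mu);
        return (err);
}
```

xx_relock_error() returns EDEADLK and xx_unowned_unlock_error() returns EPERM. In a driver, the usual remedy is structural: decide for each function whether the caller or the function itself takes the lock, and check the assumption with mutex_owned(9F) inside ASSERT(9F).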

The following panic occurs only on uniprocessors:

panic: lock_set: lock held and only one CPU

It indicates that a thread is trying to acquire a spin mutex that is already held. The thread would spin forever, because there is no other CPU on which the holder could run to release the mutex. This could happen because the driver forgot to release the mutex on one code path, or blocked while holding it.

A common cause of this panic is that the device's interrupt is high-level and is calling a routine that blocks the interrupt handler while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but might not be so if the driver is blocking while grabbing an adaptive mutex with mutex_enter(9F).