Writing Device Drivers

Chapter 3 Multithreading

This chapter describes the locking primitives and thread synchronization mechanisms of the Solaris multithreaded kernel. Device drivers should be designed to take advantage of multithreading. This chapter covers locking primitives, thread synchronization, and how to choose a locking scheme.

Locking Primitives

In traditional UNIX systems, every section of kernel code runs until it explicitly gives up the processor by calling sleep() or is interrupted by hardware. This is not true in the Solaris operating environment. A kernel thread can be preempted at any time to run another thread. Because all kernel threads share kernel address space and often need to read and modify the same data, the kernel provides a number of locking primitives to prevent threads from corrupting shared data. These mechanisms include mutual exclusion locks (mutexes), readers/writer locks, and semaphores.

Storage Classes of Driver Data

The storage class of data is a guide to whether the driver might need to take explicit steps to control access to the data. The three types of data storage classes are:

Mutual-Exclusion Locks

A mutual-exclusion lock, or mutex, is usually associated with a set of data and regulates access to that data. Mutexes provide a way to allow only one thread at a time access to that data.

Table 3–1 Mutex Routines

Name                  Description
mutex_init(9F)        Initializes a mutex
mutex_destroy(9F)     Releases any associated storage
mutex_enter(9F)       Acquires a mutex
mutex_tryenter(9F)    Acquires a mutex if available, but does not block
mutex_exit(9F)        Releases a mutex
mutex_owned(9F)       Tests whether the mutex is held by the current thread; to be used in ASSERT(9F) only

Setting Up Mutexes

Device drivers usually allocate a mutex for each driver data structure. The mutex is typically a field in the structure and is of type kmutex_t. mutex_init(9F) is called to prepare the mutex for use. This is usually done at attach(9E) time for per-device mutexes and _init(9E) time for global driver mutexes.

For example,

struct xxstate *xsp;
...
mutex_init(&xsp->mu, NULL, MUTEX_DRIVER, NULL);
...

For a more complete example of mutex initialization, see Chapter 5, Driver Autoconfiguration.

The driver must destroy the mutex with mutex_destroy(9F) before being unloaded. This is usually done at detach(9E) time for per-device mutexes and _fini(9E) time for global driver mutexes.

Using Mutexes

Every section of the driver code that needs to read or write the shared data structure must acquire the mutex, access the data, and then release the mutex.

The scope of a mutex—the data it protects—is entirely up to the programmer. A mutex protects some particular data structure because the programmer chooses to do so and uses it accordingly. A mutex protects a data structure only if every code path that accesses the data structure does so while holding the mutex.
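The rule that every access must happen while holding the mutex can be tried out in user space. The following is a minimal sketch, not kernel code: it assumes POSIX threads as a stand-in, and the kmutex_t typedef and the mutex_enter() and mutex_exit() wrappers are local shims that only imitate the call shape of the 9F routines. The count field, xx_bump(), xx_worker(), and xx_demo() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* User-space stand-ins (assumption): NOT the kernel implementations. */
typedef pthread_mutex_t kmutex_t;

static void mutex_enter(kmutex_t *mp) { (void) pthread_mutex_lock(mp); }
static void mutex_exit(kmutex_t *mp)  { (void) pthread_mutex_unlock(mp); }

/* Hypothetical per-device state; 'mu' protects 'count'. */
struct xxstate {
        kmutex_t mu;
        int      count;
};

/* Every access to xsp->count happens between mutex_enter and mutex_exit. */
static int
xx_bump(struct xxstate *xsp)
{
        int newval;

        mutex_enter(&xsp->mu);          /* 1. acquire the mutex */
        newval = ++xsp->count;          /* 2. access the data   */
        mutex_exit(&xsp->mu);           /* 3. release the mutex */
        return (newval);
}

/* Each worker increments the shared counter 10000 times. */
static void *
xx_worker(void *arg)
{
        int i;

        for (i = 0; i < 10000; i++)
                (void) xx_bump(arg);
        return (NULL);
}

/* Runs two workers concurrently and returns the final count. */
static int
xx_demo(void)
{
        struct xxstate xs = { PTHREAD_MUTEX_INITIALIZER, 0 };
        pthread_t t1, t2;
        int rc;

        rc = pthread_create(&t1, NULL, xx_worker, &xs);
        assert(rc == 0);
        rc = pthread_create(&t2, NULL, xx_worker, &xs);
        assert(rc == 0);
        (void) rc;
        (void) pthread_join(t1, NULL);
        (void) pthread_join(t2, NULL);
        return (xs.count);
}
```

Because both workers go through xx_bump(), no increment is lost and xx_demo() returns 20000. If any code path updated count without holding mu, that guarantee would no longer hold.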

Readers/Writer Locks

A readers/writer lock regulates access to a set of data. The readers/writer lock is so called because many threads can hold the lock simultaneously for reading, but only one thread can hold it for writing.

Most device drivers do not use readers/writer locks. These locks are slower than mutexes and provide a performance gain only when protecting data that is not frequently written but is commonly read by many concurrent threads. In this case, contention for a mutex could become a bottleneck, so using a readers/writer lock might be more efficient. The readers/writer functions are summarized in the following table. See the rwlock(9F) man page for detailed information.

Table 3–2 Readers/Writer Locks

Name                  Description
rw_init(9F)           Initializes a readers/writer lock
rw_destroy(9F)        Destroys a readers/writer lock
rw_enter(9F)          Acquires a readers/writer lock
rw_tryenter(9F)       Attempts to acquire a readers/writer lock without waiting
rw_tryupgrade(9F)     Attempts to upgrade a readers/writer lock hold from reader to writer
rw_downgrade(9F)      Downgrades a readers/writer lock hold from writer to reader
rw_exit(9F)           Releases a readers/writer lock
rw_read_locked(9F)    Determines whether a readers/writer lock is held for read or write

Semaphores

Counting semaphores are available as an alternative primitive for managing threads within device drivers. See the semaphore(9F) man page for more information.

Table 3–3 Semaphores

Name                  Description
sema_init(9F)         Initializes a semaphore
sema_destroy(9F)      Destroys a semaphore
sema_p(9F)            Decrements the semaphore and possibly blocks
sema_tryp(9F)         Attempts to decrement the semaphore, but does not block
sema_p_sig(9F)        Decrements the semaphore, but does not block if a signal is pending
sema_v(9F)            Increments the semaphore and possibly unblocks a waiter
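The counting behavior of these routines can be illustrated in user space. The following is a minimal sketch that assumes a POSIX mutex and condition variable as building blocks. The ksema_t type and the sema_init(), sema_p(), sema_tryp(), and sema_v() functions defined here are local shims that imitate the call shape of the 9F routines and are not the kernel implementations. xx_sema_demo() and the idea of limiting a device to two outstanding commands are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* User-space stand-in (assumption) for a counting semaphore. */
typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  nonzero;
        unsigned int    count;
} ksema_t;

typedef enum { SEMA_DEFAULT, SEMA_DRIVER } ksema_type_t;

static void
sema_init(ksema_t *sp, unsigned int val, char *name, ksema_type_t type,
    void *arg)
{
        (void) name; (void) type; (void) arg;   /* unused in this shim */
        pthread_mutex_init(&sp->lock, NULL);
        pthread_cond_init(&sp->nonzero, NULL);
        sp->count = val;
}

/* Decrement; block while the count is zero. */
static void
sema_p(ksema_t *sp)
{
        pthread_mutex_lock(&sp->lock);
        while (sp->count == 0)
                pthread_cond_wait(&sp->nonzero, &sp->lock);
        sp->count--;
        pthread_mutex_unlock(&sp->lock);
}

/* Decrement if possible and return nonzero; never block. */
static int
sema_tryp(ksema_t *sp)
{
        int ok;

        pthread_mutex_lock(&sp->lock);
        ok = (sp->count > 0);
        if (ok)
                sp->count--;
        pthread_mutex_unlock(&sp->lock);
        return (ok);
}

/* Increment; wake one waiter if there is one. */
static void
sema_v(ksema_t *sp)
{
        pthread_mutex_lock(&sp->lock);
        sp->count++;
        pthread_cond_signal(&sp->nonzero);
        pthread_mutex_unlock(&sp->lock);
}

/* Hypothetical use: at most two commands outstanding on a device. */
static int
xx_sema_demo(void)
{
        ksema_t slots;
        int third, after_release;

        sema_init(&slots, 2, NULL, SEMA_DRIVER, NULL);
        sema_p(&slots);                         /* takes the first slot */
        sema_p(&slots);                         /* takes the second slot */
        third = sema_tryp(&slots);              /* no slot left: returns 0 */
        sema_v(&slots);                         /* a command completes */
        after_release = sema_tryp(&slots);      /* slot is available again */
        assert(third == 0);
        return (third == 0 && after_release != 0);
}
```

Unlike a mutex, the semaphore has no owner, so the increment can come from a different thread than the one that performed the decrement.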

Thread Synchronization

In addition to protecting shared data, drivers often need to synchronize execution among multiple threads.

Condition Variables in Thread Synchronization

Condition variables are a standard form of thread synchronization. They are designed to be used with mutexes. The associated mutex is used to ensure that a condition can be checked atomically, and that the thread can block on the associated condition variable without missing either a change to the condition or a signal that the condition has changed.

Table 3–4 lists the condvar(9F) interfaces.

Table 3–4 Condition Variable Routines

Name                   Description
cv_init(9F)            Initializes a condition variable
cv_destroy(9F)         Destroys a condition variable
cv_wait(9F)            Waits for condition
cv_timedwait(9F)       Waits for condition or timeout
cv_wait_sig(9F)        Waits for condition, or returns zero on receipt of a signal
cv_timedwait_sig(9F)   Waits for condition, timeout, or signal
cv_signal(9F)          Signals one thread waiting on the condition variable
cv_broadcast(9F)       Signals all threads waiting on the condition variable

Initializing Condition Variables

Declare a condition variable (type kcondvar_t) for each condition. Usually, this is done in the driver's soft-state structure. Use cv_init(9F) to initialize each one. Similar to mutexes, condition variables are usually initialized at attach(9E) time. For example:

cv_init(&xsp->cv, NULL, CV_DRIVER, NULL);

For a more complete example of condition variable initialization, see Chapter 5, Driver Autoconfiguration.

Waiting for the Condition

To use condition variables, follow these steps in the code path waiting for the condition:

  1. Acquire the mutex guarding the condition.

  2. Test the condition.

  3. If the test results do not allow the thread to continue, use cv_wait(9F) to block the current thread on the condition. cv_wait(9F) releases the mutex before blocking. Upon return from cv_wait(9F) (which will reacquire the mutex before returning), repeat the test.

  4. Once the test allows the thread to continue, set the condition to its new value. For example, set a device flag to busy.

  5. Release the mutex.

Signaling the Condition

Follow these steps in the code path signaling the condition:

  1. Acquire the mutex guarding the condition.

  2. Set the condition.

  3. Signal the blocked thread with cv_broadcast(9F).

  4. Release the mutex.

Example 3–1 uses a busy flag along with mutex and condition variables to force the read(9E) routine to wait until the device is no longer busy before starting a transfer.


Example 3–1 Using Mutexes and Condition Variables

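static int
xxread(dev_t dev, struct uio *uiop, cred_t *credp)
{
        struct xxstate *xsp;
        ...
        mutex_enter(&xsp->mu);
        /* Re-test the flag after every wakeup; cv_wait() reacquires mu. */
        while (xsp->busy)
                cv_wait(&xsp->cv, &xsp->mu);
        xsp->busy = 1;          /* claim the device */
        mutex_exit(&xsp->mu);
        /* perform the data access */
        return (0);
}

static uint_t
xxintr(caddr_t arg)
{
        struct xxstate *xsp = (struct xxstate *)arg;
        mutex_enter(&xsp->mu);
        xsp->busy = 0;          /* device is idle again */
        cv_broadcast(&xsp->cv); /* wake all threads waiting in xxread() */
        mutex_exit(&xsp->mu);
        return (DDI_INTR_CLAIMED);
}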

cv_wait() and cv_timedwait() Functions

If a thread blocks on a condition with cv_wait(9F) and that condition never occurs, the thread waits forever, because cv_wait(9F) depends entirely on another thread to perform the wakeup. For that reason, it is often preferable to use cv_timedwait(9F), which also returns when a timeout expires. cv_timedwait(9F) takes an absolute wait time as an argument and returns -1 if that time is reached before the event occurs. It returns a positive value if the thread was woken by cv_signal(9F) or cv_broadcast(9F).

cv_timedwait(9F) requires an absolute wait time expressed in clock ticks since the system was last rebooted. This can be determined by retrieving the current value with ddi_get_lbolt(9F). The driver usually has a maximum number of seconds or microseconds to wait, so this value is converted to clock ticks with drv_usectohz(9F) and added to the value from ddi_get_lbolt(9F).
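The deadline arithmetic can be worked through with concrete numbers. The following is a minimal sketch that assumes a tick rate of 100 ticks per second for the worked values; the real rate is system dependent. xx_usectohz() and xx_deadline() are hypothetical user-space helpers that only mirror the roles of drv_usectohz(9F) and of the addition to the ddi_get_lbolt(9F) value. Rounding up to a whole tick is a choice made in this sketch.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert microseconds to clock ticks for a given
 * tick rate, rounding up so that a short wait never becomes zero ticks.
 */
static long
xx_usectohz(long usec, long hz)
{
        assert(usec >= 0 && hz > 0);
        /* ticks = ceil(usec * hz / 1,000,000) */
        return ((long)(((long long)usec * hz + 999999LL) / 1000000LL));
}

/*
 * Hypothetical helper: absolute deadline in ticks, given the current
 * tick count (the role played by ddi_get_lbolt() in a driver).
 */
static long
xx_deadline(long now_ticks, long usec, long hz)
{
        return (now_ticks + xx_usectohz(usec, hz));
}
```

With these assumptions, a 5-second wait is xx_usectohz(5000000, 100) = 500 ticks, and if the current tick count is 123456, the absolute deadline passed to the wait is 123956.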

Example 3–2 shows how to use cv_timedwait(9F) to wait up to five seconds to access the device before returning EIO to the caller.


Example 3–2 Using cv_timedwait()

clock_t            to;

/*
 * Compute the absolute deadline once, so that repeated wakeups with the
 * device still busy cannot extend the total wait beyond 5 seconds.
 */
to = ddi_get_lbolt() + drv_usectohz(5000000); /* 5 seconds from now */
mutex_enter(&xsp->mu);
while (xsp->busy) {
        if (cv_timedwait(&xsp->cv, &xsp->mu, to) == -1) {
                /*
                 * The timeout time 'to' was reached without the
                 * condition being signalled.
                 */
                /* tidy up and exit */
                mutex_exit(&xsp->mu);
                return (EIO);
        }
}
xsp->busy = 1;
mutex_exit(&xsp->mu);
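The same pattern can be run in user space, because POSIX pthread_cond_timedwait() also takes an absolute deadline. The following is a minimal sketch under that assumption. It is an analog, not driver code: the deadline is wall-clock time rather than clock ticks, and xx_claim(), xx_claim_demo(), and the 50-millisecond timeout are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical device state: 'mu' protects 'busy'; 'cv' signals idle. */
struct xxstate {
        pthread_mutex_t mu;
        pthread_cond_t  cv;
        int             busy;
};

/* Wait until the device is idle or 'ms' milliseconds pass, then claim it. */
static int
xx_claim(struct xxstate *xsp, long ms)
{
        struct timespec to;

        assert(ms >= 0);

        /* Compute the absolute deadline once, before the loop. */
        clock_gettime(CLOCK_REALTIME, &to);
        to.tv_sec += ms / 1000;
        to.tv_nsec += (ms % 1000) * 1000000L;
        if (to.tv_nsec >= 1000000000L) {
                to.tv_sec++;
                to.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&xsp->mu);
        while (xsp->busy) {
                if (pthread_cond_timedwait(&xsp->cv, &xsp->mu, &to) ==
                    ETIMEDOUT) {
                        /* Deadline reached while the device is still busy. */
                        pthread_mutex_unlock(&xsp->mu);
                        return (EIO);
                }
        }
        xsp->busy = 1;
        pthread_mutex_unlock(&xsp->mu);
        return (0);
}

/* Returns 1 if an idle device is claimed and a busy one times out. */
static int
xx_claim_demo(void)
{
        struct xxstate xs = {
                PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
        };
        int first = xx_claim(&xs, 50);  /* idle: claimed at once */
        int second = xx_claim(&xs, 50); /* busy, nobody signals: times out */

        return (first == 0 && second == EIO);
}
```

Because the deadline is absolute and computed outside the loop, a wakeup that finds the device still busy resumes waiting against the same deadline rather than starting a new interval.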

Although device driver writers generally find it preferable to use cv_timedwait(9F) over cv_wait(9F), there are situations in which cv_wait(9F) is a better choice. For example, cv_wait(9F) would be better when a driver is waiting on:

cv_wait_sig() Function

There is always the possibility that either the driver accidentally waits for a condition that will never occur or that the condition will not happen for a long time. In either case, the user can abort the thread by sending it a signal. Whether the signal causes the driver to wake up depends upon the driver.

cv_wait_sig(9F) allows a signal to unblock the thread. This enables the user to break out of potentially long waits by sending a signal to the thread with kill(1) or by typing the interrupt character. cv_wait_sig(9F) returns zero if it is returning because of a signal, or nonzero if the condition occurred.

Example 3–3 shows how to use cv_wait_sig(9F) to allow a signal to unblock the thread.


Example 3–3 Using cv_wait_sig()

mutex_enter(&xsp->mu);
while (xsp->busy) {
        if (cv_wait_sig(&xsp->cv, &xsp->mu) == 0) {
                /* Signalled while waiting for the condition */
                /* tidy up and exit */
                mutex_exit(&xsp->mu);
                return (EINTR);
        }
}
xsp->busy = 1;
mutex_exit(&xsp->mu);

cv_timedwait_sig() Function

cv_timedwait_sig(9F) combines the behavior of cv_timedwait(9F) and cv_wait_sig(9F): it returns -1 if the timeout is reached before the condition is signaled, or 0 if a signal (for example, from kill(2)) is sent to the thread.

For both cv_timedwait(9F) and cv_timedwait_sig(9F), time is measured in absolute clock ticks since the last system reboot.
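The three kinds of return value can be handled in one place. The following is a minimal sketch of such a mapping, written as plain C so that it can be tried outside the kernel. xx_wait_errno() is a hypothetical helper, and the choice of EIO for a timeout and EINTR for a signal simply follows Example 3–2 and Example 3–3.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: translate the value returned by a timed,
 * signal-aware wait into an errno value for the caller.
 *   -1   timeout reached         -> EIO
 *    0   interrupted by a signal -> EINTR
 *   > 0  woken normally          -> 0 (caller re-tests the condition)
 */
static int
xx_wait_errno(long rv)
{
        assert(rv >= -1);       /* no other negative value is expected here */
        if (rv == -1)
                return (EIO);
        if (rv == 0)
                return (EINTR);
        return (0);
}
```

A result of 0 from this helper means only that the wait ended normally. The caller still holds the mutex and must test the condition again before proceeding.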

Choosing a Locking Scheme

The locking scheme for most device drivers should be kept straightforward. Using additional locks allows more concurrency but increases overhead. Using fewer locks is less time consuming but allows less concurrency. Generally, use one mutex per data structure, a condition variable for each event or condition the driver must wait for, and a mutex for each major set of data global to the driver. Avoid holding mutexes for long periods of time.

To look at lock usage, use lockstat(1M). lockstat(1M) monitors all kernel lock events, gathers frequency and timing data about the events, and displays the data.

See the Multithreaded Programming Guide for more details on multithreaded operations.
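One common way to avoid holding a mutex for long periods is to copy the shared data while holding the lock and do the slower work on the private copy. The following is a minimal user-space sketch of that idea, assuming a POSIX mutex as a stand-in for a kmutex_t. struct xxstats, xx_total(), and xx_total_demo() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define XX_NCOUNTERS 4

/* Hypothetical statistics block; 'mu' protects 'counters'. */
struct xxstats {
        pthread_mutex_t mu;             /* stand-in for a kmutex_t */
        unsigned long   counters[XX_NCOUNTERS];
};

/* Snapshot under the lock, then compute on the private copy. */
static unsigned long
xx_total(struct xxstats *sp)
{
        unsigned long snap[XX_NCOUNTERS];
        unsigned long total = 0;
        int i;

        assert(sp != NULL);

        pthread_mutex_lock(&sp->mu);
        memcpy(snap, sp->counters, sizeof (snap));      /* short hold */
        pthread_mutex_unlock(&sp->mu);

        /* The lock is no longer held while the result is computed. */
        for (i = 0; i < XX_NCOUNTERS; i++)
                total += snap[i];
        return (total);
}

static unsigned long
xx_total_demo(void)
{
        struct xxstats st = { PTHREAD_MUTEX_INITIALIZER, { 1, 2, 3, 4 } };

        return (xx_total(&st));
}
```

The trade-off is that the result reflects the data as it was when the snapshot was taken, which is acceptable for statistics but not for every kind of driver state.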

Potential Locking Pitfalls

Mutexes are not re-entrant by the same thread. If you already own the mutex, attempting to claim it again leads to this panic:

panic: recursive mutex_enter. mutex %x caller %x

Releasing a mutex that the current thread does not hold causes this panic:

panic: mutex_adaptive_exit: mutex not held by thread
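Both mistakes can be reproduced safely in user space. The following is a minimal sketch that assumes a POSIX error-checking mutex as an analog: where the kernel panics, PTHREAD_MUTEX_ERRORCHECK makes the offending call return an error code instead. The helper names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialize 'mp' as an error-checking mutex. */
static void
xx_errorcheck_init(pthread_mutex_t *mp)
{
        pthread_mutexattr_t attr;
        int rc;

        pthread_mutexattr_init(&attr);
        rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        assert(rc == 0);
        (void) rc;
        pthread_mutex_init(mp, &attr);
        pthread_mutexattr_destroy(&attr);
}

/* Analog of a recursive mutex_enter(): lock a mutex already owned. */
static int
xx_recursive_enter_error(void)
{
        pthread_mutex_t mu;
        int err;

        xx_errorcheck_init(&mu);
        pthread_mutex_lock(&mu);
        err = pthread_mutex_lock(&mu);  /* second entry by the owner */
        pthread_mutex_unlock(&mu);
        pthread_mutex_destroy(&mu);
        return (err);
}

/* Analog of mutex_exit() on a mutex the thread does not hold. */
static int
xx_unheld_exit_error(void)
{
        pthread_mutex_t mu;
        int err;

        xx_errorcheck_init(&mu);
        err = pthread_mutex_unlock(&mu);        /* never locked */
        pthread_mutex_destroy(&mu);
        return (err);
}
```

Here the second lock attempt reports EDEADLK and the unlock of an unheld mutex reports EPERM. In a driver, a common way to end up in the first situation is a helper routine that acquires a mutex its caller already holds.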

The following panic occurs only on uniprocessors:

panic: lock_set: lock held and only one CPU

It indicates that a thread is trying to acquire a spin mutex that is already held and would spin forever, because there is no other CPU on which the holder could run and release it. This can happen because the driver forgot to release the mutex on one code path or blocked while holding it.

A common cause of this panic is that the device's interrupt is high-level and is calling a routine that blocks the interrupt handler while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but might not be so if the driver is blocking while grabbing an adaptive mutex with mutex_enter(9F).