This section supplements the guidelines presented in Chapter 4, Multithreading, for writing an MT-safe driver, a driver that safely supports multiple threads.
Here are some issues to consider when deciding on how many locks to use in a driver:
The driver should admit as many threads as possible at the same time, which argues for fine-grained locking.
However, it should not spend too much time executing the locking primitives themselves, which argues for coarse-grained locking.
Driver code should be simple and maintainable.
Avoid lock contention for shared data.
Write re-entrant code wherever possible. This makes it possible for many threads to execute without grabbing any locks.
Use locks to protect the data and not the code path.
Keep in mind the level of concurrency provided by the device; if the controller can only handle one request at a time, there is no point in spending excessive time making the driver handle multiple threads.
A little thought about the ordering and types of locks placed around shared data can lead to considerable savings.
To avoid unnecessary locks, note the following:
Use the multithreading semantics of the entry points to your advantage.
If an element of a device's state structure is read-mostly (for example, initialized in attach(9E) and destroyed in detach(9E), but only read in other entry points), there is no need to acquire a mutex to read that element of the structure. Indiscriminately wrapping every access to such a variable in calls to mutex_enter(9F) and mutex_exit(9F) adds locking overhead for no benefit.
Make all entry points re-entrant and reduce the amount of shared data by changing static variables to automatic, or by adding them to your state structure.
Kernel-thread stacks are small (currently 8 Kbytes), so do not allocate large automatic variables, and avoid deep recursion.
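The points above about read-mostly fields and shared data can be sketched as follows. Kernel code cannot be run standalone, so this is a user-space analogue that uses POSIX threads in place of mutex_enter(9F) and mutex_exit(9F). The structure xx_state_t and the functions xx_attach(), xx_unit(), xx_open(), and xx_detach() are hypothetical names chosen for illustration; they are not part of any driver interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-instance state, loosely modelled on a driver state structure. */
typedef struct xx_state {
    int             unit;        /* written once at attach time, then only read */
    pthread_mutex_t lock;        /* protects open_count and nothing else */
    unsigned        open_count;  /* modified by several threads */
} xx_state_t;

/* Stand-in for attach(9E): runs before any other thread can reach the state. */
static int xx_attach(xx_state_t *sp, int unit)
{
    assert(sp != NULL);
    sp->unit = unit;
    sp->open_count = 0;
    return pthread_mutex_init(&sp->lock, NULL);
}

/* The unit number never changes after attach, so no lock is taken to read it. */
static int xx_unit(const xx_state_t *sp)
{
    return sp->unit;
}

/* open_count is genuinely shared and mutable, so every access holds the lock. */
static unsigned xx_open(xx_state_t *sp)
{
    unsigned n;

    pthread_mutex_lock(&sp->lock);
    n = ++sp->open_count;
    pthread_mutex_unlock(&sp->lock);
    return n;
}

/* Stand-in for detach(9E): no other thread may still be using the state. */
static int xx_detach(xx_state_t *sp)
{
    return pthread_mutex_destroy(&sp->lock);
}
```

The mutex here guards one field rather than a code path, and all other working data lives in automatic variables or in the per-instance structure, which keeps the routines re-entrant.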
When acquiring multiple mutexes, be sure to acquire them in the same order on each code path. For example, mutexes A and B are used to protect two resources in the following ways:
    Code Path 1                 Code Path 2
    mutex_enter(&A);            mutex_enter(&B);
    ...                         ...
    mutex_enter(&B);            mutex_enter(&A);
    ...                         ...
    mutex_exit(&B);             mutex_exit(&A);
    ...                         ...
    mutex_exit(&A);             mutex_exit(&B);
If thread 1 is executing code path 1, and thread 2 is executing code path 2, the following could occur:
Thread 1 acquires mutex A.
Thread 2 acquires mutex B.
Thread 1 needs mutex B, so it blocks holding mutex A.
Thread 2 needs mutex A, so it blocks holding mutex B.
These threads are now deadlocked. Deadlocks of this kind are hard to track down, particularly since real code paths are rarely so straightforward. They are also intermittent, because whether the deadlock occurs depends on the relative timing of threads 1 and 2.
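One way to remove the deadlock is to make both code paths take A before B. The following is a minimal user-space sketch of that fix, using POSIX threads as a stand-in for the kernel mutex routines. The counters res_a and res_b, the constant ITERS, and the function run_both_paths() are hypothetical and exist only to give the two paths something to protect.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERS 100000

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;
static long res_a;   /* protected by A */
static long res_b;   /* protected by B */

/* Code path 1: acquires A, then B. */
static void *code_path_1(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&A);
        pthread_mutex_lock(&B);
        res_a++;
        res_b++;
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }
    return NULL;
}

/* Code path 2: previously took B first; it now uses the same A-then-B order. */
static void *code_path_2(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&A);
        pthread_mutex_lock(&B);
        res_a--;
        res_b += 2;
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
    }
    return NULL;
}

/* Runs both paths concurrently; returns 0 once both threads have finished. */
static int run_both_paths(void)
{
    pthread_t t1, t2;

    res_a = res_b = 0;
    if (pthread_create(&t1, NULL, code_path_1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, code_path_2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    assert(pthread_join(t1, NULL) == 0);
    assert(pthread_join(t2, NULL) == 0);
    return 0;
}
```

Because neither thread can hold B while waiting for A, the circular wait described above cannot form, whatever the timing of the two threads.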
Experience has shown that it is easier to deal with locks that are either held throughout the execution of a routine, or locks that are both acquired and released in one routine. Avoid nesting like this:
    static void
    xxfoo(...)
    {
            mutex_enter(&softc->lock);
            ...
            xxbar();
    }

    static void
    xxbar(...)
    {
            ...
            mutex_exit(&softc->lock);
    }
This example works, but will almost certainly lead to maintenance problems.
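A layout that is usually easier to maintain keeps the acquire and the release in the same routine and lets the helper assume the lock is already held. The following user-space sketch uses POSIX threads in place of the kernel mutex routines. The names xx_softc_t, pending, and xxbar_locked() are hypothetical. In a real driver, the helper could state its requirement with ASSERT(mutex_owned(&softc->lock)), since mutex_owned(9F) is intended for use in assertions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical soft-state structure for illustration only. */
typedef struct xx_softc {
    pthread_mutex_t lock;
    int             pending;   /* protected by lock */
} xx_softc_t;

/*
 * Helper: the caller must already hold softc->lock.
 * It never acquires or releases the lock itself.
 */
static void xxbar_locked(xx_softc_t *softc)
{
    softc->pending++;
}

/* The lock is acquired and released in this one routine. */
static int xxfoo(xx_softc_t *softc)
{
    int n;

    pthread_mutex_lock(&softc->lock);
    xxbar_locked(softc);
    n = softc->pending;
    pthread_mutex_unlock(&softc->lock);
    return n;
}
```

A reader of xxfoo() can see the whole lifetime of the lock without following the call into the helper, and the helper can be reused by any other routine that already holds the lock.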
If contention is likely in a particular code path, try to hold locks for a short time. In particular, arrange to drop locks before calling kernel routines that might block. For example:
    mutex_enter(&softc->lock);
    ...
    softc->foo = bar;
    softc->thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
    ...
    mutex_exit(&softc->lock);
This is better coded as:
    thingp = kmem_alloc(sizeof(thing_t), KM_SLEEP);
    mutex_enter(&softc->lock);
    ...
    softc->foo = bar;
    softc->thingp = thingp;
    ...
    mutex_exit(&softc->lock);
The following panics are caused by misuse of mutexes:
panic: recursive mutex_enter. mutex %x caller %x
Mutexes are not recursive: a thread that already owns a mutex cannot acquire it again. Calling mutex_enter(9F) on a mutex that the calling thread already holds causes this panic.
panic: mutex_adaptive_exit: mutex not held by thread
Releasing a mutex that the current thread does not hold causes this panic.
panic: lock_set: lock held and only one CPU
This panic occurs only on a uniprocessor. It indicates that a thread tried to acquire a spin mutex that is already held, and that it would spin forever because there is no other CPU on which the holder could run and release the mutex. This could happen because the driver forgot to release the mutex on one code path, or blocked while holding it.
A common cause of this panic is that the device's interrupt is high-level (see ddi_intr_hilevel(9F) and Intro(9F)), and is calling a routine that blocks the interrupt handler while holding a spin mutex. This is obvious if the driver explicitly calls cv_wait(9F), but might not be so if it's blocking while grabbing an adaptive mutex with mutex_enter(9F).
In principle, this is only a problem for drivers that operate above lock level.
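The first two panics above have a rough user-space counterpart that can be useful when prototyping locking logic outside the kernel. This sketch assumes a POSIX threads implementation; it is an analogue, not a description of how the kernel detects these errors. An error-checking POSIX mutex reports the same two mistakes as error codes instead of panicking. The helper names make_checked_mutex(), try_recursive_enter(), and try_exit_not_held() are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* expose pthread_mutexattr_settype() */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initializes *mp as an error-checking mutex; returns 0 on success. */
static int make_checked_mutex(pthread_mutex_t *mp)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(mp, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Analogue of "recursive mutex_enter": lock a mutex this thread already holds. */
static int try_recursive_enter(pthread_mutex_t *mp)
{
    int rc;

    pthread_mutex_lock(mp);
    rc = pthread_mutex_lock(mp);   /* fails with EDEADLK instead of hanging */
    pthread_mutex_unlock(mp);
    return rc;
}

/* Analogue of "mutex not held by thread": unlock a mutex this thread does not hold. */
static int try_exit_not_held(pthread_mutex_t *mp)
{
    return pthread_mutex_unlock(mp);   /* fails with EPERM */
}
```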