Writing Device Drivers

Chapter 19 Recommended Coding Practices

This chapter describes how to write drivers that are robust. Drivers that are written in accordance with the guidelines discussed in this chapter:

This chapter provides information on the following subjects:

Debugging

Driver code is more difficult to debug than user programs because:

Be sure to build debugging support into your driver. Debugging support facilitates future development and maintenance work.

Use cmn_err() to Log Driver Activity

Use the cmn_err() function to print messages to the console from within the device driver. cmn_err() is similar to printf(3C), but provides additional format characters (such as %b) to print device register bits. See the cmn_err(9F) man page for more information.


Note –

To ensure that the driver is DDI-compliant, use cmn_err() instead of printf() and uprintf().


Use ASSERT() to Catch Invalid Assumptions

The syntax for ASSERT(9F) is as follows:

void ASSERT(EXPRESSION)

ASSERT() is a macro that is used to halt the execution of the kernel if a condition expected to be true is actually false. ASSERT provides a way for the programmer to validate the assumptions made by a piece of code.

The ASSERT() macro is defined only when the DEBUG compilation symbol is defined. However, when DEBUG is not defined, theASSERT() macro has no effect.

For example, if a driver pointer should be non-NULL and is not, the following assertion can be used to check the code:

ASSERT(ptr != NULL);

If the driver is compiled with DEBUG defined and the assertion fails, a message is printed to the console and the system panics:

panic: assertion failed: ptr != NULL, file: driver.c, line: 56

Note –

Because ASSERT(9F) uses the DEBUG compilation symbol, any conditional debugging code should also use DEBUG.


Assertions are an extremely valuable form of active documentation.

Use mutex_owned() to Validate and Document Locking Requirements

The syntax for mutex_owned(9F) is as follows:

int mutex_owned(kmutex_t *mp);

A significant portion of driver development involves properly handling multiple threads. Comments should always be used when a mutex is acquired; they are even more useful when an apparently necessary mutex is not acquired. To determine if a mutex is held by a thread, use mutex_owned() within ASSERT(9F):

void helper(void)
{
        /* this routine should always be called with xsp's mutex held */
        ASSERT(mutex_owned(&xsp->mu));
        ...
}

Caution – Caution –

mutex_owned() is only valid within ASSERT() macros. Under no circumstances should you use it to control the behavior of a driver.


Use Conditional Compilation to Toggle Costly Debugging Features

Debugging code can be placed in a driver by conditionally compiling code based on a preprocessor symbol such as DEBUG or by using a global variable. Conditional compilation has the advantage that unnecessary code can be removed in the production driver. Using a variable allows the amount of debugging output to be chosen at runtime. This can be accomplished by setting a debugging level at runtime with an ioctl or through a debugger. Commonly, these two methods are combined.

The following example relies on the compiler to remove unreachable code (the code following the always-false test of zero), and also provides a local variable that can be set in /etc/system or patched by a debugger.

#ifdef DEBUG
comments on values of xxdebug and what they do
static int xxdebug;
#define dcmn_err if (xxdebug) cmn_err
#else
#define dcmn_err if (0) cmn_err
#endif
...
    dcmn_err(CE_NOTE, "Error!\n");

This method handles the fact that cmn_err(9F) has a variable number of arguments. Another method relies on the macro having one argument, a parenthesized argument list for cmn_err(9F), which the macro removes. It also removes the reliance on the optimizer by expanding the macro to nothing if DEBUG is not defined.

#ifdef DEBUG
comments on values of xxdebug and what they do
static int xxdebug;
#define dcmn_err(X) if (xxdebug) cmn_err X
#else
#define dcmn_err(X) /* nothing */
#endif
    ...
/* Note:double parentheses are required when using dcmn_err. */
    dcmn_err((CE_NOTE, "Error!")); 

You can extend this in many ways, such as by having different messages from cmn_err(9F), depending on the value of xxdebug, but be careful not to obscure the code with too much debugging information.

Another common scheme is to write an xxlog() function, which uses vsprintf(9F) or vcmn_err(9F) to handle variable argument lists.

Defensive Programming

Use the following defensive programming techniques to prevent your code from causing system panics or hangs, draining system resources, or allowing the spread of corrupted data.

All Solaris drivers should abide by these coding practices:

Using Separate Device Driver Instances

The Solaris kernel allows multiple instances of a driver. Each instance has its own data space but shares the text and some global data with other instances. The device is managed on a per-instance basis. Drivers should use a separate instance for each piece of hardware unless the driver is designed to handle fail-over internally. There can be multiple instances of a driver per slot, for example, multi-function cards, which is standard behavior for Solaris device drivers.

Exclusive Use of DDI Access Handles

All PIO access by a driver must use Solaris DDI access functions from the ddi_getX, ddi_putX, ddi_rep_getX, and ddi_rep_putX families of routines. The driver should not directly access the mapped registers by the address returned from ddi_regs_map_setup(9F). (Avoid the ddi_peek(9F) and ddi_poke(9F) routines because they do not use access handles.)

The DDI access mechanism is important because it provides an opportunity to control how data is read into the kernel.

Detecting Corrupted Data

The following sections consider where data corruption can occur and the steps you can take to detect it.

Corruption of Device Management and Control Data

The driver should assume that any data obtained from the device, whether by PIO or DMA, could have been corrupted. In particular, extreme care should be taken with pointers, memory offsets, or array indexes read or calculated from data supplied by the device. Such values can be malignant, meaning they can cause a kernel panic if dereferenced. All such values should be checked for range and alignment (if required) before use.

Even if a pointer is not malignant, it can still be misleading. For example, it can point at a valid instance of an object, but not the correct one. Where possible, the driver should cross-check the pointer with the pointed-to object, or otherwise validate the data obtained through it.

Other types of data can also be misleading, such as packet lengths, status words, or channel IDs. Each should be checked to the extent possible: a packet length can be range-checked to ensure that it is not negative or larger than the containing buffer; a status word can be checked for ”impossible” bits; and a channel ID can be matched against a list of valid IDs.

Where a value is used to identify a Stream, the driver must ensure that the Stream still exists. The asynchronous nature of STREAMS processing means that a Stream can be dismantled while device interrupts are still outstanding.

The driver should not reread data from the device; the data should be read once, validated, and stored in the driver's local state. This avoids the hazard presented by data that, although correct when initially read and validated, is incorrect when reread later.

The driver should also ensure that all loops are bounded, so that a device returning a continuous BUSY status, or claiming that another buffer needs to be processed, does not lock up the entire system.

Corruption of Received Data

Device errors can result in corrupted data being placed in receive buffers. Such corruption is indistinguishable from corruption that occurs beyond the domain of the device—for example, within a network. Typically, existing software is already in place to handle such corruption; for example, through integrity checks at the transport layer of a protocol stack or within the application using the device.

If the received data will not be checked for integrity at a higher layer—as in the case of a disk driver, for example—it can be integrity-checked within the driver itself. Methods of detecting corruption in received data are typically device-specific (checksums, CRC, and so forth).

DMA Isolation

A defective device might initiate an improper DMA transfer over the bus. This data transfer could corrupt good data that was previously delivered. A device that fails might generate a corrupt address that can contaminate memory that does not even belong to its own driver.

In systems with an IOMMU, a device can write only to pages mapped as writable for DMA. Therefore, pages that are to be the target of DMA writes should be owned solely by one driver instance and not shared with any other kernel structure. While the page in question is mapped as writable for DMA, the driver should be suspicious of data in that page. The page must be unmapped from the IOMMU before it is passed beyond the driver, or before any validation of the data.

You can use ddi_umem_alloc(9F) to guarantee that a whole aligned page is allocated, or allocate multiple pages and ignore the memory below the first page boundary. You can find the size of an IOMMU page by using ddi_ptob(9F).

Alternatively, the driver can choose to copy the data into a safe part of memory before processing it. If this is done, the data must first be synchronized using ddi_dma_sync(9F).

Calls to ddi_dma_sync(9F) should specify SYNC_FOR_DEV before using DMA to transfer data to a device, and SYNC_FOR_CPU after using DMA to transfer data from the device to memory.

On some PCI-based systems with an IOMMU, devices may be able to use PCI dual address cycles (64-bit addresses) to bypass the IOMMU. This gives the device the potential to corrupt any region of main memory. Device drivers must not attempt to use such a mode and should disable it.

Handling Stuck Interrupts

The driver must identify stuck interrupts because a persistently asserted interrupt severely affects system performance, almost certainly stalling a single-processor machine.

Sometimes the driver might have difficulty in identifying a particular interrupt as bogus. For network drivers, if a receive interrupt is indicated but no new buffers have been made available, no work was needed. When this is an isolated occurrence, it is not a problem, as the actual work might already have been completed by another routine (read service, for example).

On the other hand, continuous interrupts with no work for the driver to process can indicate a stuck interrupt line. For this reason, all platforms allow a number of apparently bogus interrupts to occur before taking defensive action.

While appearing to have work to do, a hung device might be failing to update its buffer descriptors. The driver should defend against such repetitive requests.

In some cases, platform–specific bus drivers might be capable of identifying a persistently unclaimed interrupt and can disable the offending device. However, this relies on the driver's ability to identify the valid interrupts and return the appropriate value. The driver should therefore return a DDI_INTR_UNCLAIMED result unless it detects that the device legitimately asserted an interrupt (that is, the device actually requires the driver to do some useful work).

The legitimacy of other, more incidental, interrupts is much harder to certify. To this end, an interrupt-expected flag is a useful tool for evaluating whether an interrupt is valid. Consider an interrupt such as descriptor free, which can be generated if all the device's descriptors had been previously allocated. If the driver detects that it has taken the last descriptor from the card, it can set an interrupt-expected flag. If this flag is not set when the associated interrupt is delivered, it is suspicious.

Some informative interrupts might not be predictable, such as one indicating that a medium has become disconnected or frame sync has been lost. The easiest method of detecting whether such an interrupt is stuck is to mask this particular source on first occurrence until the next polling cycle.

If the interrupt occurs again while disabled, this should be considered a false interrupt. Some devices have interrupt status bits that can be read even if the mask register has disabled the associated source and might not be causing the interrupt. Driver designers can devise more appropriate algorithms specific to their devices.

Avoid looping on interrupt status bits indefinitely. Break such loops if none of the status bits set at the start of a pass requires any real work.

Additional Programming Considerations

In addition to the requirements discussed in the previous sections, the driver developer must consider a few other issues. These are:

Thread Interaction

Kernel panics in a device driver are often caused by unexpected interaction of kernel threads after a device failure. When a device fails, threads can interact in ways that the designer had not anticipated.

If processing routines terminate early, the condition variable waiters are blocked because an expected signal is never given. Attempting to inform other modules of the failure or handling unanticipated callbacks can result in undesirable thread interactions. Consider the sequence of mutex acquisition and relinquishing that can occur during device failures.

Threads that originate in an upstream STREAMS module can run into unfortunate paradoxes if used to return to that module unexpectedly. You might use alternative threads to handle exception messages. For instance, a wput procedure might use a read-side service routine to communicate an M_ERROR, rather than doing it directly with a read-side putnext.

A failing STREAMS device that cannot be quiesced during close (because of the fault) can generate an interrupt after the stream has been dismantled. The interrupt handler must not attempt to use a stale stream pointer to try to process the message.

Threats From Top-Down Requests

While protecting the system from defective hardware, the driver designer also needs to protect against driver misuse. Although the driver can assume that the kernel infrastructure is always correct (a trusted core), user requests passed to it can be potentially destructive.

For example, a user can request an action to be performed upon a user-supplied data block (M_IOCTL) that is smaller than that indicated in the control part of the message. The driver should never trust a user application.

The design should consider the construction of each type of ioctl that it can receive with a view to the potential harm that it could cause. The driver should make checks to be sure that it does not process malformed ioctls.

Adaptive Strategies

A driver can continue to provide service with faulty hardware, attempting to work around the identified problem by using an alternative strategy for accessing the device. Given that broken hardware is unpredictable and given the risk associated with additional design complexity, adaptive strategies are not always wise. At most, they should be limited to periodic interrupt polling and retry attempts. Periodically retrying the device lets the driver know when a device has recovered. Periodic polling can control the interrupt mechanism after a driver has been forced to disable interrupts.

Ideally, a system always has an alternative device to provide a vital system service. Service multiplexors in kernel or user space offer the best method of maintaining system services when a device fails. Such practices are beyond the scope of this chapter.

Declaring a Variable Volatile

volatile is a keyword that must be used when declaring any variable that will reference a device register. If this is not done, the compile-time optimizer might optimize important accesses away. This is important; neglecting to use volatile can result in bugs that are difficult to track down.

The correct use of volatile is necessary to prevent elusive bugs. It instructs the compiler to use exact semantics for the declared objects—in particular, to not optimize away or reorder accesses to the object. There are two instances where device drivers must use the volatile qualifier:

  1. When data refers to an external hardware device register (memory that has side effects other than just storage). Note, however, that if the DDI data access functions are used to access device registers, it is not necessary to use volatile.

  2. When data refers to global memory that is accessible by more than one thread, that is not protected by locks, and that relies on the sequencing of memory accesses. Using volatile is less expensive than using lock.

The following example uses volatile. A busy flag is used to prevent a thread from continuing while the device is busy and the flag is not protected by a lock:

    while (busy) {
          /* do something else */
      }

The testing thread will continue when another thread turns off the busy flag:

    busy = 0;

However, since busy is accessed frequently in the testing thread, the compiler may optimize the test by placing the value of busy in a register, then test the contents of the register without reading the value of busy in memory before every test. The testing thread would never see busy change and the other thread would only change the value of busy in memory, resulting in deadlock. Declaring the busy flag as volatile forces its value to be read before each test.


Note –

It would probably be preferable to use a condition variable, discussed under Condition Variables in Thread Synchronization rather than the busy flag in this example.


It is also recommended that the volatile qualifier be used in such a way as to avoid the risk of accidental omission. For example, this code

    struct device_reg {
         volatile uint8_t csr;
         volatile uint8_t data;
     };
     struct device_reg *regp;

is recommended over:

    struct device_reg {
         uint8_t csr;
         uint8_t data;
     };
     volatile struct device_reg *regp;

Although the two examples are functionally equivalent, the second one requires the writer to ensure that volatile is used in every declaration of type struct device_reg. The first example results in the data being treated as volatile in all declarations and is therefore preferred. Note as mentioned above, that the use of the DDI data access functions to access device registers makes it unnecessary to qualify variables as volatile.

Serviceability

To ensure serviceability, the driver must be enabled to do the following:

Periodic Health Checks

A latent fault is one that does not show itself until some other action occurs. For example, a hardware failure occurring in a device that is a cold stand-by could remain undetected until a fault occurs on the master device. At this point, the system now contains two defective devices and might be unable to continue operation.

As a general rule, latent faults that are allowed to remain undetected will eventually cause system failure. Without latent fault checking, the overall availability of a redundant system is jeopardized. To avoid this, a device driver must detect latent faults and report them in the same way as other faults.

The driver should ensure that it has a mechanism for making periodic health checks on the device. In a fault-tolerant situation where the device can be the secondary or fail-over device, early detection of a failed secondary device is essential to ensure that it can be repaired or replaced before any failure in the primary device occurs.

Periodic health checks can: