Writing Device Drivers

Chapter 23 Recommended Coding Practices

This chapter describes how to write drivers that are robust. Drivers that are written in accordance with the guidelines that are discussed in this chapter are easier to debug. The recommended practices also protect the system from hardware and software faults.

This chapter provides information on the following subjects:

Debugging Preparation Techniques

Driver code is more difficult to debug than user programs because:

The driver interacts directly with the hardware
The driver operates without the protection of the operating system that is available to user processes

Be sure to build debugging support into your driver. This support facilitates both maintenance work and future development.

Use a Unique Prefix to Avoid Kernel Symbol Collisions

The name of each function, data element, and driver preprocessor definition must be unique for each driver.

A driver module is linked into the kernel. The name of each symbol unique to a particular driver must not collide with other kernel symbols. To avoid such collisions, each function and data element for a particular driver must be named with a prefix common to that driver. The prefix must be sufficient to uniquely name each driver symbol. Typically, this prefix is the name of the driver or an abbreviation for the name of the driver. For example, xx_open() would be the name of the open(9E) routine of driver xx.

When building a driver, a driver must necessarily include a number of system header files. The globally-visible names within these header files cannot be predicted. To avoid collisions with these names, each driver preprocessor definition must be given a unique name by using an identifying prefix.

A distinguishing driver symbol prefix also is an aid to deciphering system logs and panics when troubleshooting. Instead of seeing an error related to an ambiguous attach() function, you see an error message about xx_attach().

Use `cmn_err()` to Log Driver Activity

Use the cmn_err(9F) function to print messages to a system log from within the device driver. The cmn_err(9F) function for kernel modules is similar to the printf(3C) function for applications. The cmn_err(9F) function provides additional format characters, such as the %b format to print device register bits. The cmn_err(9F) function writes messages to a system log. Use the tail(1) command to monitor these messages on /var/adm/messages.

% tail -f /var/adm/messages

Use `ASSERT()` to Catch Invalid Assumptions

Assertions are an extremely valuable form of active documentation. The syntax for ASSERT(9F) is as follows:

void ASSERT(EXPRESSION)

The ASSERT() macro halts the execution of the kernel if a condition that is expected to be true is actually false. ASSERT() provides a way for the programmer to validate the assumptions made by a piece of code.

The ASSERT() macro is defined only when the DEBUG compilation symbol is defined. When DEBUG is not defined, the ASSERT() macro has no effect.

The following example assertion tests the assumption that a particular pointer value is not NULL:

ASSERT(ptr != NULL);

If the driver has been compiled with DEBUG, and if the value of ptr is NULL at this point in execution, then the following panic message is printed to the console:

panic: assertion failed: ptr != NULL, file: driver.c, line: 56

Note –

Because ASSERT(9F) uses the DEBUG compilation symbol, any conditional debugging code should also use DEBUG.

Use `mutex_owned()` to Validate and Document Locking Requirements

The syntax for mutex_owned(9F) is as follows:

int mutex_owned(kmutex_t *mp);

A significant portion of driver development involves properly handling multiple threads. Comments should always be used when a mutex is acquired. Comments can be even more useful when an apparently necessary mutex is not acquired. To determine whether a mutex is held by a thread, use mutex_owned() within ASSERT(9F):

void helper(void)
{
    /* this routine should always be called with xsp's mutex held */
    ASSERT(mutex_owned(&xsp->mu));
    /* ... */
}

Note –

mutex_owned() is only valid within ASSERT() macros. You should use mutex_owned() to control the behavior of a driver.

Use Conditional Compilation to Toggle Costly Debugging Features

You can insert code for debugging into a driver through conditional compiles by using a preprocessor symbol such as DEBUG or by using a global variable. With conditional compilation, unnecessary code can be removed in the production driver. Use a variable to set the amount of debugging output at runtime. The output can be specified by setting a debugging level at runtime with an ioctl or through a debugger. Commonly, these two methods are combined.

The following example relies on the compiler to remove unreachable code, in this case, the code following the always-false test of zero. The example also provides a local variable that can be set in /etc/system or patched by a debugger.

#ifdef DEBUG
/* comments on values of xxdebug and what they do */
static int xxdebug;
#define dcmn_err if (xxdebug) cmn_err
#else
#define dcmn_err if (0) cmn_err
#endif
/* ... */
    dcmn_err(CE_NOTE, "Error!\n");

This method handles the fact that cmn_err(9F) has a variable number of arguments. Another method relies on the fact that the macro has one argument, a parenthesized argument list for cmn_err(9F). The macro removes this argument. This macro also removes the reliance on the optimizer by expanding the macro to nothing if DEBUG is not defined.

#ifdef DEBUG
/* comments on values of xxdebug and what they do */
static int xxdebug;
#define dcmn_err(X) if (xxdebug) cmn_err X
#else
#define dcmn_err(X) /* nothing */
#endif
/* ... */
/* Note:double parentheses are required when using dcmn_err. */
    dcmn_err((CE_NOTE, "Error!"));

You can extend this technique in many ways. One way is to specify different messages from cmn_err(9F), depending on the value of xxdebug. However, in such a case, you must be careful not to obscure the code with too much debugging information.

Another common scheme is to write an xxlog() function, which uses vsprintf(9F) or vcmn_err(9F) to handle variable argument lists.

Declaring a Variable Volatile

volatile is a keyword that must be applied when declaring any variable that will reference a device register. Without the use of volatile, the compile-time optimizer can inadvertently delete important accesses. Neglecting to use volatile might result in bugs that are difficult to track down.

The correct use of volatile is necessary to prevent elusive bugs. The volatile keyword instructs the compiler to use exact semantics for the declared objects, in particular, not to remove or reorder accesses to the object. Two instances where device drivers must use the volatile qualifier are:

When data refers to an external hardware device register, that is, memory that has side effects other than just storage. Note, however, that if the DDI data access functions are used to access device registers, you do not have to use volatile.
When data refers to global memory that is accessible by more than one thread, that is not protected by locks, and that relies on the sequencing of memory accesses. Using volatileconsumes fewer resources than using lock.

The following example uses volatile. A busy flag is used to prevent a thread from continuing while the device is busy and the flag is not protected by a lock:

    while (busy) {
      /* do something else */
      }

The testing thread will continue when another thread turns off the busy flag:

    busy = 0;

Because busy is accessed frequently in the testing thread, the compiler can potentially optimize the test by placing the value of busy in a register and test the contents of the register without reading the value of busy in memory before every test. The testing thread would never see busy change and the other thread would only change the value of busy in memory, resulting in deadlock. Declaring the busy flag as volatile forces its value to be read before each test.

Note –

An alternative to the busy flag is to use a condition variable. See Condition Variables in Thread Synchronization.

When using the volatile qualifier, avoid the risk of accidental omission. For example, the following code

    struct device_reg {
     volatile uint8_t csr;
     volatile uint8_t data;
     };
     struct device_reg *regp;

is preferable to the next example:

    struct device_reg {
     uint8_t csr;
     uint8_t data;
     };
     volatile struct device_reg *regp;

Although the two examples are functionally equivalent, the second one requires the writer to ensure that volatile is used in every declaration of type struct device_reg. The first example results in the data being treated as volatile in all declarations and is therefore preferred. As mentioned above, using the DDI data access functions to access device registers makes qualifying variables as volatile unnecessary.

Serviceability

To ensure serviceability, the driver must be enabled to take the following actions:

Detect faulty devices and report the fault
Remove a device as supported by the Solaris hot-plug model
Add a new device as supported by the Solaris hot-plug model
Perform periodic health checks to enable the detection of latent faults

Periodic Health Checks

A latent fault is one that does not show itself until some other action occurs. For example, a hardware failure occurring in a device that is a cold standby could remain undetected until a fault occurs on the master device. At this point, the system now contains two defective devices and might be unable to continue operation.

Latent faults that remain undetected typically cause system failure eventually. Without latent fault checking, the overall availability of a redundant system is jeopardized. To avoid this situation, a device driver must detect latent faults and report them in the same way as other faults.

You should provide the driver with a mechanism for making periodic health checks on the device. In a fault-tolerant situation where the device can be the secondary or failover device, early detection of a failed secondary device is essential to ensure that the secondary device can be repaired or replaced before any failure in the primary device occurs.

Periodic health checks can be used to perform the following activities:

Check any register or memory location on the device whose value might have been altered since the last poll.

Features of a device that typically exhibit deterministic behavior include heartbeat semaphores, device timers (for example, local lbolt used by download), and event counters. Reading an updated predictable value from the device gives a degree of confidence that things are proceeding satisfactorily.
Timestamp outgoing requests such as transmit blocks or commands that are issued by the driver.

The periodic health check can look for any suspect requests that have not completed.
Initiate an action on the device that should be completed before the next scheduled check.

If this action is an interrupt, this check is an ideal way to ensure that the device's circuitry can deliver an interrupt.

Chapter 23 Recommended Coding Practices

Debugging Preparation Techniques

Use a Unique Prefix to Avoid Kernel Symbol Collisions

Use cmn_err() to Log Driver Activity

Use ASSERT() to Catch Invalid Assumptions

Use mutex_owned() to Validate and Document Locking Requirements

Use Conditional Compilation to Toggle Costly Debugging Features

Declaring a Variable Volatile

Serviceability

Periodic Health Checks

Use `cmn_err()` to Log Driver Activity

Use `ASSERT()` to Catch Invalid Assumptions

Use `mutex_owned()` to Validate and Document Locking Requirements