In addition to the requirements discussed in the previous sections, consider the following issues:
Threats from top-down requests
If processing routines terminate early, threads waiting on a condition variable remain blocked because the expected signal is never sent. Attempting to inform other modules of the failure, or handling unanticipated callbacks, can result in undesirable thread interactions. Examine the sequence in which mutexes are acquired and released during device failures.
Threads that originate in an upstream STREAMS module can be caught in paradoxical situations if they are used to re-enter that module unexpectedly. Consider using alternative threads to handle exception messages. For instance, a driver might use a read-side service routine to communicate an M_ERROR, rather than handling the error directly with a read-side putnext(9F).
A failing STREAMS device that cannot be quiesced during close because of a fault can generate an interrupt after the stream has been dismantled. The interrupt handler must not attempt to use a stale stream pointer to try to process the message.
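A minimal sketch of one way to guard against a stale stream pointer follows. It is written as user-space C with POSIX threads so that it stands alone; the types xx_queue and xx_dev and the functions xx_dev_init(), xx_open(), xx_close(), and xx_intr() are hypothetical stand-ins, not DDI interfaces. The assumption is that close clears the pointer under the same lock that the interrupt handler takes before using it.

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a STREAMS read queue; illustrative only. */
struct xx_queue {
	int	msgs_delivered;
};

/* Hypothetical per-device state. */
struct xx_dev {
	pthread_mutex_t	lock;
	struct xx_queue	*rq;		/* NULL when no stream is attached */
	unsigned	dropped;	/* interrupts discarded after close */
};

void
xx_dev_init(struct xx_dev *dp)
{
	pthread_mutex_init(&dp->lock, NULL);
	dp->rq = NULL;
	dp->dropped = 0;
}

void
xx_open(struct xx_dev *dp, struct xx_queue *q)
{
	pthread_mutex_lock(&dp->lock);
	dp->rq = q;
	pthread_mutex_unlock(&dp->lock);
}

/* Clear the pointer before the stream is dismantled. */
void
xx_close(struct xx_dev *dp)
{
	pthread_mutex_lock(&dp->lock);
	dp->rq = NULL;
	pthread_mutex_unlock(&dp->lock);
}

/*
 * Interrupt handler sketch. Returns 1 if the message was delivered,
 * 0 if it was discarded because no stream is attached.
 */
int
xx_intr(struct xx_dev *dp)
{
	int delivered = 0;

	pthread_mutex_lock(&dp->lock);
	if (dp->rq != NULL) {
		/* Delivery is simulated here by a counter. */
		dp->rq->msgs_delivered++;
		delivered = 1;
	} else {
		dp->dropped++;	/* late interrupt: never touch the old queue */
	}
	pthread_mutex_unlock(&dp->lock);
	return (delivered);
}
```

In this sketch, delivery is only a counter increment made while the lock is held. A real driver would need its own scheme for passing the message upstream without holding a driver mutex across the call.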
While protecting the system from defective hardware, you also need to protect against driver misuse. Although the driver can assume that the kernel infrastructure is always correct (a trusted core), user requests passed to the driver can be destructive.
For example, a user can request an action on a user-supplied data block (M_IOCTL) that is smaller than the block size indicated in the control part of the message. The driver should never trust a user application.
Consider the construction of each type of ioctl that your driver can receive and the potential harm that the ioctl could cause. The driver should perform checks to ensure that it does not process a malformed ioctl.
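The kind of check described above can be sketched as follows. This is a simplified user-space model, not the STREAMS ioctl interface: the structure xx_req, the command XX_SETBUF, the limit XX_MAX_LEN, and the function xx_validate_req() are all hypothetical. The point is only that the length claimed in the control part is compared against the data actually supplied before any processing begins.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define	XX_SETBUF	1	/* hypothetical ioctl command */
#define	XX_MAX_LEN	4096	/* hypothetical upper bound on a request */

/* Simplified model of an ioctl request; illustrative only. */
struct xx_req {
	int		cmd;
	size_t		declared_len;	/* size claimed in the control part */
	const uint8_t	*data;		/* user-supplied data block */
	size_t		data_len;	/* bytes actually present */
};

/* Returns 0 if the request is well formed, or EINVAL otherwise. */
int
xx_validate_req(const struct xx_req *rp)
{
	if (rp == NULL)
		return (EINVAL);
	/* Reject commands that the driver does not implement. */
	if (rp->cmd != XX_SETBUF)
		return (EINVAL);
	/* Reject empty or oversized requests. */
	if (rp->declared_len == 0 || rp->declared_len > XX_MAX_LEN)
		return (EINVAL);
	/* Reject a data block smaller than the control part claims. */
	if (rp->data == NULL || rp->data_len < rp->declared_len)
		return (EINVAL);
	return (0);
}
```

Rejecting the request before any data is touched keeps a malformed ioctl from causing reads past the end of the supplied block.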
A driver can continue to provide service using faulty hardware. The driver can attempt to work around the identified problem by using an alternative strategy for accessing the device. Given that broken hardware is unpredictable and given the risk associated with additional design complexity, adaptive strategies are not always wise. At most, these strategies should be limited to periodic interrupt polling and retry attempts. Periodically retrying the device tells the driver when a device has recovered. Periodic polling can take the place of the interrupt mechanism after a driver has been forced to disable interrupts.
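A periodic retry can be sketched as a small state machine that is driven from a timer. The sketch below is user-space C and makes several assumptions: the structure xx_retry, the callback type xx_probe_fn, and the functions xx_retry_tick() and xx_probe_countdown() are hypothetical, and the cap on the number of attempts is a design choice of this example rather than a requirement stated above.

```c
#include <stdbool.h>

/* Hypothetical probe callback: returns true if the device responds. */
typedef bool (*xx_probe_fn)(void *arg);

/* Hypothetical retry bookkeeping. */
struct xx_retry {
	unsigned	attempts;	/* probes made so far */
	unsigned	max_attempts;	/* assumed cap on probes */
	bool		recovered;	/* set once a probe succeeds */
};

/*
 * Called once per timer tick. Returns true if another tick should be
 * scheduled, false if the device recovered or the cap was reached.
 */
bool
xx_retry_tick(struct xx_retry *rp, xx_probe_fn probe, void *arg)
{
	if (rp->recovered || rp->attempts >= rp->max_attempts)
		return (false);
	rp->attempts++;
	if (probe(arg)) {
		rp->recovered = true;
		return (false);
	}
	return (rp->attempts < rp->max_attempts);
}

/*
 * Example probe for demonstration: pretends that the device recovers
 * once the countdown pointed to by arg reaches zero.
 */
bool
xx_probe_countdown(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (false);
	}
	return (true);
}
```

Keeping the retry logic this small reflects the caution above: the driver learns when the device has recovered without taking on a more elaborate adaptive strategy.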
Ideally, a system always has an alternative device to provide a vital system service. Service multiplexors in kernel or user space offer the best method of maintaining system services when a device fails. Such practices are beyond the scope of this section.