Writing Device Drivers

Handling Stuck Interrupts

The driver must identify stuck interrupts because a persistently asserted interrupt severely affects system performance, almost certainly stalling a single-processor machine.

Sometimes the driver might have difficulty identifying a particular interrupt as invalid. For network drivers, if a receive interrupt is indicated but no new buffers have been made available, no work was needed. When this situation is an isolated occurrence, it is not a problem, since the actual work might already have been completed by another routine such as a read service.

On the other hand, continuous interrupts with no work for the driver to process can indicate a stuck interrupt line. For this reason, platforms allow a number of apparently invalid interrupts to occur before taking defensive action.
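A driver can apply a similar tolerance in its own handler. The following is a minimal sketch, not a DDI interface: the structure, the function name, and the limit of 8 are assumptions chosen for illustration. It forgives isolated idle interrupts and reports a problem only after a run of consecutive interrupts that found no work.

```c
#include <stdbool.h>

/* Hypothetical limit; a real driver would tune this for its device. */
#define MAX_IDLE_INTRS 8

/* Hypothetical per-instance bookkeeping, not a DDI structure. */
struct intr_watch {
    unsigned idle_count;    /* consecutive interrupts that found no work */
};

/*
 * Record the outcome of one interrupt. Returns true when more than
 * MAX_IDLE_INTRS consecutive interrupts have found no work, which
 * suggests a stuck interrupt line.
 */
static bool
intr_watch_update(struct intr_watch *w, bool did_work)
{
    if (did_work) {
        w->idle_count = 0;  /* real work resets the run */
        return false;
    }
    w->idle_count++;
    return w->idle_count > MAX_IDLE_INTRS;
}
```

What the driver does once the function returns true (for example, disabling the interrupt source or reporting a fault) is device specific and is left out of the sketch.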

A hung device might appear to have work to do while failing to update its buffer descriptors. The driver should defend against such repetitive requests.
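One possible defense is to check that the device's descriptor index actually advances between interrupts. The sketch below assumes a device that exposes a current descriptor index; the structure, the function name, and the limit of 4 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit on interrupts tolerated without ring progress. */
#define MAX_STALLED_INTRS 4

/* Hypothetical bookkeeping for one descriptor ring. */
struct ring_watch {
    uint32_t last_index;    /* descriptor index seen at the previous interrupt */
    unsigned stalled;       /* consecutive interrupts with no progress */
};

/*
 * hw_index is the descriptor index the handler has just read from the
 * device. Returns true once the index has failed to move for
 * MAX_STALLED_INTRS consecutive interrupts.
 */
static bool
ring_is_hung(struct ring_watch *r, uint32_t hw_index)
{
    if (hw_index != r->last_index) {
        r->last_index = hw_index;   /* the device made progress */
        r->stalled = 0;
        return false;
    }
    r->stalled++;
    return r->stalled >= MAX_STALLED_INTRS;
}
```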

In some cases, platform-specific bus drivers might be capable of identifying a persistently unclaimed interrupt and can disable the offending device. However, this relies on the driver's ability to identify the valid interrupts and return the appropriate value. The driver should return a DDI_INTR_UNCLAIMED result unless the driver detects that the device legitimately asserted an interrupt. The interrupt is legitimate only if the device actually requires the driver to do some useful work.
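The return-value rule can be sketched as follows. This is an illustration under stated assumptions, not a complete handler: the status bits, the xx_state structure, and the simplified one-argument signature are hypothetical, and the two macros are defined locally only so that the fragment compiles outside the kernel. A real driver takes DDI_INTR_CLAIMED and DDI_INTR_UNCLAIMED from the DDI headers.

```c
#include <stdint.h>

/* Local stand-ins; a real driver gets these from the DDI headers. */
#ifndef DDI_INTR_CLAIMED
#define DDI_INTR_CLAIMED    1
#define DDI_INTR_UNCLAIMED  0
#endif

/* Hypothetical status bits for an imaginary device. */
#define XX_STAT_RX  0x01u
#define XX_STAT_TX  0x02u

/* Hypothetical soft state; status stands in for a device register read. */
struct xx_state {
    uint32_t status;
    unsigned rx_pending;    /* received frames waiting to be processed */
    unsigned tx_done;       /* transmit completions waiting to be reaped */
};

/*
 * Claim the interrupt only if the device required useful work.
 * A status bit that is set with nothing to process does not count.
 */
static unsigned int
xx_intr(struct xx_state *sp)
{
    int did_work = 0;

    if ((sp->status & XX_STAT_RX) != 0 && sp->rx_pending > 0) {
        sp->rx_pending = 0;     /* stands in for processing the frames */
        did_work = 1;
    }
    if ((sp->status & XX_STAT_TX) != 0 && sp->tx_done > 0) {
        sp->tx_done = 0;        /* stands in for reaping completions */
        did_work = 1;
    }
    return (did_work ? DDI_INTR_CLAIMED : DDI_INTR_UNCLAIMED);
}
```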

The legitimacy of other, more incidental, interrupts is much harder to certify. An interrupt-expected flag is a useful tool for evaluating whether an interrupt is valid. Consider an interrupt such as descriptor free, which can be generated if all the device's descriptors had been previously allocated. If the driver detects that it has taken the last descriptor from the card, it can set an interrupt-expected flag. If this flag is not set when the associated interrupt is delivered, the interrupt is suspicious.
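The descriptor-free case might look like the following sketch. The desc_pool structure and both function names are hypothetical; the point is only that the flag is set when the last descriptor is taken and is consumed when the matching interrupt arrives.

```c
#include <stdbool.h>

/* Hypothetical view of the device's descriptor pool. */
struct desc_pool {
    unsigned free_count;            /* descriptors still available */
    bool     free_intr_expected;    /* set when the last one is taken */
};

/*
 * Take one descriptor. Taking the last one arms the
 * interrupt-expected flag.
 */
static bool
desc_take(struct desc_pool *p)
{
    if (p->free_count == 0)
        return false;
    p->free_count--;
    if (p->free_count == 0)
        p->free_intr_expected = true;
    return true;
}

/*
 * Called when a descriptor-free interrupt is delivered. Returns true
 * if the interrupt was expected, false if it is suspicious.
 */
static bool
desc_free_intr_expected(struct desc_pool *p)
{
    if (!p->free_intr_expected)
        return false;
    p->free_intr_expected = false;  /* consume the expectation */
    return true;
}
```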

Some informative interrupts might not be predictable, such as one that indicates that a medium has become disconnected or frame sync has been lost. The easiest method of detecting whether such an interrupt is stuck is to mask this particular source on first occurrence until the next polling cycle.

If the interrupt occurs again while it is masked, consider the interrupt false. On some devices, the interrupt status bits remain readable even when the mask register has disabled the associated source, so a set status bit is not necessarily the cause of the interrupt. You can devise a more appropriate algorithm specific to your devices.
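The mask-until-next-poll approach can be sketched as shown below. The bit name, the structure, and the functions are hypothetical, and enable_mask is only a software shadow: a real driver would also write the device's own mask register at the marked points.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enable bit for a link-status (informative) interrupt. */
#define XX_INT_LINK 0x10u

/* Hypothetical soft state. */
struct xx_mask_state {
    uint32_t enable_mask;   /* shadow of the device interrupt enable mask */
    unsigned false_intrs;   /* occurrences seen while the source was masked */
};

/*
 * Interrupt path for the informative source. Returns true if the
 * event is accepted, false if it arrived while masked.
 */
static bool
xx_link_intr(struct xx_mask_state *sp)
{
    if ((sp->enable_mask & XX_INT_LINK) == 0) {
        sp->false_intrs++;          /* masked source fired again */
        return false;
    }
    sp->enable_mask &= ~XX_INT_LINK;    /* mask on first occurrence */
    /* A real driver would write the device mask register here. */
    return true;
}

/* Polling cycle: re-enable the source for the next period. */
static void
xx_poll(struct xx_mask_state *sp)
{
    sp->enable_mask |= XX_INT_LINK;
    /* A real driver would write the device mask register here. */
}
```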

Avoid looping on interrupt status bits indefinitely. Break such loops if none of the status bits set at the start of a pass requires any real work.
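A bounded status loop might be structured as in the following sketch. The device model (xx_dev, xx_read_status, xx_service) is hypothetical and exists only to make the fragment self-contained. The pass limit is an additional safeguard assumed here; the rule stated above is only that the loop ends when the set bits require no real work.

```c
#include <stdint.h>

#define XX_WORK_BITS   0x03u    /* hypothetical bits that can carry work */
#define XX_MAX_PASSES  16       /* hypothetical cap, an extra safeguard */

/* Hypothetical device model used in place of real register access. */
struct xx_dev {
    uint32_t status;    /* stands in for the interrupt status register */
    unsigned pending;   /* units of real work outstanding */
    int      sticky;    /* nonzero models status bits that never clear */
};

static uint32_t
xx_read_status(struct xx_dev *d)
{
    return d->status;
}

/* Service the set bits. Returns nonzero only if real work was done. */
static int
xx_service(struct xx_dev *d, uint32_t bits)
{
    (void)bits;
    if (d->pending == 0)
        return 0;
    d->pending--;
    if (d->pending == 0 && !d->sticky)
        d->status = 0;  /* a healthy device clears its status */
    return 1;
}

/* Returns the number of passes that did real work. */
static unsigned
xx_drain(struct xx_dev *d)
{
    unsigned passes = 0;

    while (passes < XX_MAX_PASSES) {
        uint32_t bits = xx_read_status(d) & XX_WORK_BITS;

        if (bits == 0)
            break;      /* nothing asserted */
        if (!xx_service(d, bits))
            break;      /* bits set but no real work: stop looping */
        passes++;
    }
    return passes;
}
```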