Error handling activity might begin when the operating system detects the error through a trap or error interrupt. If the software responsible for handling the error (the error handler) cannot immediately isolate the device that was involved in the failed I/O operation, it must attempt to find a software module within the device tree that can perform the error isolation. The Solaris device tree provides a structural means to propagate nexus driver error handling activities to child drivers that might have a more detailed understanding of the error and can capture error state and isolate the problem device.
A driver can register an error handler callback with the I/O Fault Services Framework. The error handler should be specific to the type of error and subsystem where error detection has occurred. When the driver's error handler routine is invoked, the driver must check for any outstanding errors associated with device transactions and generate ereport events. The driver must also return error handler status in its ddi_fm_error(9S) structure. For example, if it has been determined that the system's integrity has been compromised, the most appropriate action might be for the error handler to panic the system.
The callback is invoked by a parent nexus driver when an error might be associated with a particular device instance. Device drivers that register error handlers must be DDI_FM_ERRCB_CAPABLE.
void ddi_fm_handler_register(dev_info_t *dip, ddi_err_func_t handler, void *impl_data)
The ddi_fm_handler_register(9F) routine registers an error handler callback with the I/O fault services framework. The ddi_fm_handler_register() function should be called in the driver's attach(9E) entry point for callback registration following driver fault management initialization (ddi_fm_init()).
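The registration order described above can be sketched as a user-space model. This is a minimal sketch, not kernel code: the types, constant values, and function bodies below are stand-ins written for illustration (the real definitions come from the kernel headers), and the xx_ driver names and mock_nexus_report_error() are hypothetical.

```c
#include <stddef.h>

/* Stand-ins for kernel types; simplified for this user-space model. */
typedef struct dev_info { int instance; } dev_info_t;
typedef struct ddi_fm_error { int fme_status; } ddi_fm_error_t;
typedef void *ddi_iblock_cookie_t;
typedef int (*ddi_err_func_t)(dev_info_t *, ddi_fm_error_t *, const void *);

/* Placeholder values; do not rely on these numbers matching the kernel. */
enum { DDI_FM_OK, DDI_FM_FATAL, DDI_FM_NONFATAL, DDI_FM_UNKNOWN };
#define DDI_FM_ERRCB_CAPABLE 0x8

/* Mock framework state: a single registered handler. */
static int mock_parent_caps = 0xf;
static ddi_err_func_t mock_handler;
static void *mock_impl_data;

static void ddi_fm_init(dev_info_t *dip, int *fm_capability,
    ddi_iblock_cookie_t *ibcp)
{
    (void)dip;
    *ibcp = NULL;
    *fm_capability &= mock_parent_caps; /* mock: parent may reduce the set */
}

static void ddi_fm_fini(dev_info_t *dip) { (void)dip; }

static void ddi_fm_handler_register(dev_info_t *dip, ddi_err_func_t handler,
    void *impl_data)
{
    (void)dip;
    mock_handler = handler;
    mock_impl_data = impl_data;
}

static void ddi_fm_handler_unregister(dev_info_t *dip)
{
    (void)dip;
    mock_handler = NULL;
    mock_impl_data = NULL;
}

/* Hypothetical: models a parent nexus invoking the child's callback. */
static int mock_nexus_report_error(dev_info_t *dip, int status)
{
    ddi_fm_error_t err = { status };
    if (mock_handler == NULL)
        return DDI_FM_UNKNOWN; /* mock choice when nothing is registered */
    return mock_handler(dip, &err, mock_impl_data);
}

/* Hypothetical driver soft state. */
typedef struct xx_state { dev_info_t *dip; int fm_capabilities; } xx_state_t;

/* Error callback: takes no locks and never sleeps. */
static int xx_fm_error_cb(dev_info_t *dip, ddi_fm_error_t *err,
    const void *impl_data)
{
    (void)dip;
    (void)impl_data;
    /* A real handler would post ereports here before returning status. */
    return err->fme_status;
}

/* attach(9E) path: initialize fault management, then register. */
static void xx_fm_attach(xx_state_t *sp)
{
    ddi_iblock_cookie_t ibc;
    ddi_fm_init(sp->dip, &sp->fm_capabilities, &ibc);
    if (sp->fm_capabilities & DDI_FM_ERRCB_CAPABLE)
        ddi_fm_handler_register(sp->dip, xx_fm_error_cb, sp);
}

/* detach(9E) path: unregister before tearing down fault management. */
static void xx_fm_detach(xx_state_t *sp)
{
    if (sp->fm_capabilities & DDI_FM_ERRCB_CAPABLE)
        ddi_fm_handler_unregister(sp->dip);
    ddi_fm_fini(sp->dip);
}
```

In this model the driver registers only when the negotiated capabilities still include DDI_FM_ERRCB_CAPABLE, and it passes its soft state as impl_data so that the callback can find per-instance data without global lookups.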
The error handler callback function must do the following:
Check for any outstanding hardware errors associated with device transactions, and generate ereport events for diagnosis. For a PCI, PCI-X, or PCI Express device, this can generally be done using pci_ereport_post() as described in Detecting and Reporting PCI-Related Errors.
Return error handler status in its ddi_fm_error structure:
DDI_FM_OK
DDI_FM_FATAL
DDI_FM_NONFATAL
DDI_FM_UNKNOWN
Driver error handlers receive the following:
A pointer to a device instance (dip) under the driver's control
A data structure (ddi_fm_error) that contains common fault management data and status for error handling
A pointer to any implementation specific data (impl_data) specified at the time of the handler's registration
The ddi_fm_handler_register() and ddi_fm_handler_unregister(9F) routines must be called from kernel context in a driver's attach(9E) or detach(9E) entry point. The registered error handler callback can be called from kernel, interrupt, or high-level interrupt context. Therefore, the error handler:
Must not hold locks
Must not sleep waiting for resources
A device driver is responsible for:
Isolating the device instance that might have caused errors
Recovering transactions associated with errors
Reporting the service impact of errors
Scheduling device shutdown for errors considered fatal
These actions can be carried out within the error handler function. However, the handler is subject to the restrictions on locking, and it does not always know what the driver was doing at the point where the fault occurred. It is therefore more usual to carry out these actions following inline calls to ddi_fm_acc_err_get(9F) and ddi_fm_dma_err_get(9F) within the normal paths of the driver, as described previously.
/*
 * The I/O fault service error handling callback function
 */
/*ARGSUSED*/
static int
bge_fm_error_cb(dev_info_t *dip, ddi_fm_error_t *err, const void *impl_data)
{
        /*
         * as the driver can always deal with an error
         * in any dma or access handle, we can just return
         * the fme_status value.
         */
        pci_ereport_post(dip, err, NULL);
        return (err->fme_status);
}
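The inline checking that the normal driver paths perform can be sketched in the same spirit. This is a user-space model under stated assumptions: the handle type, constant values, and function bodies are stand-ins for the kernel interfaces, the xx_ names are hypothetical, and reporting DDI_SERVICE_LOST for every error is an illustrative choice rather than a recommendation.

```c
#include <stdint.h>

/* Stand-ins for kernel types; simplified for this user-space model. */
typedef struct dev_info { int last_impact; } dev_info_t;
typedef struct ddi_fm_error { int fme_status; } ddi_fm_error_t;
typedef struct mock_acc { uint32_t reg; int err_status; } *ddi_acc_handle_t;

/* Placeholder values; do not rely on these numbers matching the kernel. */
enum { DDI_FM_OK, DDI_FM_FATAL, DDI_FM_NONFATAL, DDI_FM_UNKNOWN };
enum { DDI_SERVICE_UNAFFECTED, DDI_SERVICE_DEGRADED, DDI_SERVICE_LOST };
#define DDI_FME_VERSION 1

/* Mock: copy the error state recorded against the handle. */
static void ddi_fm_acc_err_get(ddi_acc_handle_t h, ddi_fm_error_t *de,
    int version)
{
    (void)version;
    de->fme_status = h->err_status;
}

/* Mock: reset the error state recorded against the handle. */
static void ddi_fm_acc_err_clear(ddi_acc_handle_t h, int version)
{
    (void)version;
    h->err_status = DDI_FM_OK;
}

/* Mock: remember the last service impact that was reported. */
static void ddi_fm_service_impact(dev_info_t *dip, int svc_impact)
{
    dip->last_impact = svc_impact;
}

/* Hypothetical driver soft state. */
typedef struct xx_state { dev_info_t *dip; ddi_acc_handle_t regs; } xx_state_t;

/* Fetch and clear the error status of an access handle. */
static int xx_check_acc_handle(ddi_acc_handle_t h)
{
    ddi_fm_error_t de;
    ddi_fm_acc_err_get(h, &de, DDI_FME_VERSION);
    ddi_fm_acc_err_clear(h, DDI_FME_VERSION);
    return de.fme_status;
}

/*
 * Normal-path register read. Here the driver knows which operation
 * was in flight and may hold its own locks, so it can report the
 * service impact directly. Returns 0 on success, -1 on error.
 */
static int xx_read_reg(xx_state_t *sp, uint32_t *valp)
{
    *valp = sp->regs->reg; /* stands in for a ddi_get32() access */
    if (xx_check_acc_handle(sp->regs) != DDI_FM_OK) {
        ddi_fm_service_impact(sp->dip, DDI_SERVICE_LOST);
        return -1;
    }
    return 0;
}
```

Because the check runs in the path that issued the access, the driver can fail the specific request and decide on recovery there, which is why a callback such as the one above can simply return the fme_status value.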