A device retirement mechanism isolates a device that the Fault Management Architecture (FMA) has flagged as faulty. This feature allows faulty devices to be safely and automatically inactivated to avoid data loss, data corruption, panics, and system downtime. The retirement process takes into account the stability of the system after the device has been retired.
Critical devices are never retired. If you need to manually replace a retired device, use the fmadm repair command after the device replacement so that the system knows that the device has been replaced.
For more information, see fmadm(1M).
When a device is retired, a message similar to the following is displayed on the console and written to the /var/adm/messages file.
Aug 9 18:14 starbug genunix: [ID 751201 kern.notice] \
NOTICE: One or more I/O devices have been retired
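To check after the fact whether a retirement notice was logged, you can search the log file for the notice text. The following is a minimal sketch, assuming a POSIX shell and that the notice wording matches the example message above. The function name count_retire_notices is hypothetical and is not part of the operating system.

```shell
# Hypothetical helper: count device-retirement notices in a log file.
# Assumes the notice wording matches the example message shown above.
count_retire_notices() {
    # $1 is the log file to search; defaults to /var/adm/messages
    grep -c 'I/O devices have been retired' "${1:-/var/adm/messages}"
}
```

Running count_retire_notices with no argument searches /var/adm/messages. A count of 0 means that no matching notice was found in that file.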
You can use the prtconf command to identify specific retired devices. For example:
# prtconf
.
.
.
pci, instance #2
        scsi, instance #0
                disk (driver not attached)
                tape (driver not attached)
                sd, instance #3
                sd, instance #0 (retired)
        scsi, instance #1 (retired)
                disk (retired)
                tape (retired)
pci, instance #3
        network, instance #2 (driver not attached)
        network, instance #3 (driver not attached)
os-io (driver not attached)
iscsi, instance #0
pseudo, instance #0
.
.
.
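On a system with a large device tree, it can help to filter the prtconf output down to the retired nodes. The following is a minimal sketch, assuming that retired nodes are marked with the string (retired) as in the example above. The function name list_retired is hypothetical.

```shell
# Hypothetical helper: print only the retired nodes from prtconf output
# read on standard input, with leading indentation removed.
list_retired() {
    grep '(retired)' | sed 's/^[[:space:]]*//'
}

# Typical use on a live system:
#   prtconf | list_retired
```

Because the indentation is removed, the filtered list no longer shows which parent node each retired device belongs to. Refer to the full prtconf output for that context.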
Use the steps that follow to resolve a faulty device or a device that has been retired.
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 20 16:30:52 55c82fff-b709-62f5-b66e-b4e1bbe9dcb1 ZFS-8000-LR    Major

Problem Status    : solved
Diag Engine       : zfs-diagnosis / 1.0
System
    Manufacturer  : unknown
    Name          : ORCL,SPARC-T3-4
    Part_Number   : unknown
    Serial_Number : 1120BDRCCD
    Host_ID       : 84a02d28

----------------------------------------
Suspect 1 of 1 :
   Fault class : fault.fs.zfs.open_failed
   Certainty   : 100%
   Affects     : zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/
                 pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
   Status      : faulted and taken out of service

   FRU
     Name      : "zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/
                 pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a"
     Status    : faulty

Description : ZFS device 'id1,sd@n5000c500335dc60f/a' in pool 'pond' failed to
              open.

Response    : An attempt will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Run 'zpool status -lx' for more information. Please refer to the
              associated reference document at
              http://support.oracle.com/msg/ZFS-8000-LR for the latest service
              procedures and policies regarding this diagnosis.
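When several faults are outstanding, a one-line summary per fault can be easier to scan than the full report. The following is a minimal sketch, assuming the summary row layout in the example above: three date and time fields followed by EVENT-ID, MSG-ID, and SEVERITY. The function name fault_summary is hypothetical.

```shell
# Hypothetical helper: print "EVENT-ID MSG-ID SEVERITY" for each fault
# summary row in 'fmadm faulty' output read on standard input.
# Assumes the column layout of the example output shown above.
fault_summary() {
    # A summary row is recognized by a UUID-shaped fourth field.
    awk '$4 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ {
        print $4, $5, $6
    }'
}

# Typical use on a live system:
#   fmadm faulty | fault_summary
```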
# zpool clear pond c0t5000C500335DC60Fd0
If an intermittent device error occurred but the device was not replaced, you can attempt to clear the previous error.
# fmadm repaired zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/\
pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
fmadm: recorded repair to of zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/
pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
# fmadm faulty
If the error is cleared, the fmadm faulty command returns nothing.
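In a script, the same check can be expressed as a test for empty output. The following is a minimal sketch, assuming that no other faults are outstanding on the system, because fmadm faulty lists every fault it knows about. The function name no_faults is hypothetical.

```shell
# Hypothetical helper: succeed only if the 'fmadm faulty' output read on
# standard input contains no non-blank lines (no outstanding faults).
no_faults() {
    ! grep -q '[^[:space:]]'
}

# Typical use on a live system:
#   fmadm faulty | no_faults && echo "no outstanding faults"
```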