设备弃用机制通过故障管理框架 (fault management framework, FMA) 隔离标记为有故障的设备。通过该功能,可以安全且自动地禁用故障设备,从而避免数据丢失、数据损坏或紧急情况和系统停机。弃用进程会考虑弃用设备后的系统稳定性。
永远不弃用关键设备。如果需要手动更换弃用的设备,请在更换设备后使用 fmadm repair 命令,以便系统知道设备已更换。
有关更多信息,请参见fmadm(1M)。
弃用某个设备后,类似于以下内容的消息会显示在控制台上并记录在 /var/adm/messages 文件中。
Aug 9 18:14 starbug genunix: [ID 751201 kern.notice] \ NOTICE: One or more I/O devices have been retired
可以使用 prtconf 命令来确定特定的弃用设备。例如:
# prtconf . . . pci, instance #2 scsi, instance #0 disk (driver not attached) tape (driver not attached) sd, instance #3 sd, instance #0 (retired) scsi, instance #1 (retired) disk (retired) tape (retired) pci, instance #3 network, instance #2 (driver not attached) network, instance #3 (driver not attached) os-io (driver not attached) iscsi, instance #0 pseudo, instance #0 . . .
使用下面的步骤解决有故障的设备或已弃用的设备。
# fmadm faulty --------------- ------------------------------------ -------------- --------- TIME EVENT-ID MSG-ID SEVERITY --------------- ------------------------------------ -------------- --------- Jun 20 16:30:52 55c82fff-b709-62f5-b66e-b4e1bbe9dcb1 ZFS-8000-LR Major Problem Status : solved Diag Engine : zfs-diagnosis / 1.0 System Manufacturer : unknown Name : ORCL,SPARC-T3-4 Part_Number : unknown Serial_Number : 1120BDRCCD Host_ID : 84a02d28 ---------------------------------------- Suspect 1 of 1 : Fault class : fault.fs.zfs.open_failed Certainty : 100% Affects : zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/ pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a Status : faulted and taken out of service FRU Name : "zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/ pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a" Status : faulty Description : ZFS device 'id1,sd@n5000c500335dc60f/a' in pool 'pond' failed to open. Response : An attempt will be made to activate a hot spare if available. Impact : Fault tolerance of the pool may be compromised. Action : Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-LR for the latest service procedures and policies regarding this diagnosis.
# zpool clear pond c0t5000C500335DC60Fd0
如果设备发生间歇错误,但没有更换该设备,则可以尝试清除先前的错误。
# fmadm repaired zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/ \ pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a fmadm: recorded repair to of zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/ pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
# fmadm faulty
如果错误已被清除,则 fmadm faulty 命令不会返回任何内容。