Managing Devices in Oracle® Solaris 11.2

Updated: July 2014
 
 

Resolving Faulty Devices

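The device retirement mechanism isolates devices that the fault management framework (FMA) has marked as faulty. This feature lets the system disable faulty devices safely and automatically, which helps avoid data loss, data corruption, panics, and system downtime. The retirement process takes into account the stability of the system after the device has been retired.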

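Critical devices are never retired. If you need to manually replace a retired device, run the fmadm repair command after the replacement so that the system knows the device has been replaced.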

For more information, see fmadm(1M).

After a device is retired, a message similar to the following is displayed on the console and written to the /var/adm/messages file.

Aug 9 18:14 starbug genunix: [ID 751201 kern.notice] \
     NOTICE: One or more I/O devices have been retired
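
A minimal sketch of how the log could be searched for this notice. The function name find_retire_notices is hypothetical, and the search string is taken from the sample message above, so the exact wording on a given system is an assumption.

```shell
# Print device-retirement notices from a log file.
# Defaults to /var/adm/messages when no file is given.
# Note: find_retire_notices is an illustrative helper, not a Solaris command.
find_retire_notices() {
    log_file="${1:-/var/adm/messages}"
    # -F treats the pattern as a fixed string rather than a regular expression.
    grep -F 'I/O devices have been retired' "$log_file"
}

# Usage on a live system:
#   find_retire_notices
#   find_retire_notices /var/adm/messages.0
```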

You can use the prtconf command to identify the specific retired devices. For example:

# prtconf
.
.
.
pci, instance #2
        scsi, instance #0
                disk (driver not attached)
                tape (driver not attached)
                sd, instance #3
                sd, instance #0 (retired)
        scsi, instance #1 (retired)
                disk (retired)
                tape (retired)
pci, instance #3
        network, instance #2 (driver not attached)
        network, instance #3 (driver not attached)
os-io (driver not attached)
iscsi, instance #0
pseudo, instance #0
.
.
.
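
On a system with many devices, the retired nodes can be hard to spot in the full listing. The following is a small sketch, assuming retired nodes are always marked with the literal string (retired) as in the sample output; list_retired_nodes is a hypothetical helper name.

```shell
# Read prtconf-style output on standard input and print only the nodes
# marked "(retired)", with leading indentation stripped.
# Note: list_retired_nodes is an illustrative helper, not a Solaris command.
list_retired_nodes() {
    grep -F '(retired)' | sed 's/^[[:space:]]*//'
}

# Usage on a live system:
#   prtconf | list_retired_nodes
```

Stripping the indentation discards the parent/child relationship, so use the unfiltered prtconf output when you need to know which bus a retired node sits on.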

How to Resolve a Faulty Device

Use the following steps to resolve a faulty or retired device.

  1. Identify the faulted device with the fmadm faulty command. For example:
    # fmadm faulty
    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID SEVERITY
    --------------- ------------------------------------  -------------- ---------
    Jun 20 16:30:52 55c82fff-b709-62f5-b66e-b4e1bbe9dcb1  ZFS-8000-LR Major
    
    Problem Status    : solved
    Diag Engine       : zfs-diagnosis / 1.0
    System
    Manufacturer  : unknown
    Name          : ORCL,SPARC-T3-4
    Part_Number   : unknown
    Serial_Number : 1120BDRCCD
    Host_ID       : 84a02d28
    
    ----------------------------------------
    Suspect 1 of 1 :
    Fault class : fault.fs.zfs.open_failed
    Certainty   : 100%
    Affects     : zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/
    pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
    Status      : faulted and taken out of service
    
    FRU
    Name             : "zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/
    pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a"
    Status        : faulty
    
    Description : ZFS device 'id1,sd@n5000c500335dc60f/a' in pool 'pond' failed to
    open.
    
    Response    : An attempt will be made to activate a hot spare if available.
    
    Impact      : Fault tolerance of the pool may be compromised.
    
    Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
    Run 'zpool status -lx' for more information. Please refer to the
    associated reference document at
    http://support.oracle.com/msg/ZFS-8000-LR for the latest service
    procedures and policies regarding this diagnosis.
  2. Replace the faulty or retired device, or clear the device error. For example:
    # zpool clear pond c0t5000C500335DC60Fd0

    If the device reports intermittent errors but has not been replaced, you can try clearing the previous errors.

  3. Clear the FMA fault. For example:
    # fmadm repaired zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/ \
    pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
    fmadm: recorded repair to of zfs://pool=86124fa573cad84e/vdev=25d36cd46e0a7f49/
    pool_name=pond/vdev_name=id1,sd@n5000c500335dc60f/a
  4. Confirm that the fault is cleared.
    # fmadm faulty

    If the fault has been cleared, the fmadm faulty command returns no output.
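
The final check in the procedure can be scripted. This is a minimal sketch that relies only on the behavior described above: no output from fmadm faulty means no outstanding faults. The function name faults_cleared is hypothetical.

```shell
# Succeed (exit status 0) when standard input is empty, which for
# `fmadm faulty` output means that no faults are outstanding.
# Note: faults_cleared is an illustrative helper, not a Solaris command.
faults_cleared() {
    [ -z "$(cat)" ]
}

# Usage on a live system (run with sufficient privileges):
#   if fmadm faulty | faults_cleared; then
#       echo "No outstanding faults"
#   else
#       echo "Faults remain; review the fmadm faulty output"
#   fi
```

The helper checks only whether any output exists. It does not parse the fault report, so it cannot tell which device is still faulted.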