Displaying Information About Faults or Defects

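The preferred way to display fault or defect information and to identify the FRUs involved is the fmadm faulty command. The fmdump command is also supported. Use fmdump to view the history of problems that have occurred on the system, and use fmadm faulty to view the problems that are currently active.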

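Note - Do not base administrative actions on the output of the fmdump command; base them on the output of fmadm faulty instead. The log files can contain error statements that should not be treated as faults or defects.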

How to Display Information About Faulty Components

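Become an administrator.
For more information, see "How to Use Your Assigned Administrative Rights" in Oracle Solaris 11.1 Administration: Security Services.
Display information about the components.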
```
# fmadm faulty
```
For an explanation of the generated text, see the following examples.

Example 3-1 fmadm Output With One Faulty CPU

1    # fmadm faulty
2    --------------- ------------------------------------  -------------- ---------
3    TIME            EVENT-ID                              MSG-ID         SEVERITY
4    --------------- ------------------------------------  -------------- ---------
5    Aug 24 17:56:03 7b83c87c-78f6-6a8e-fa2b-d0cf16834049  SUN4V-8001-8H  Minor
6    
7    Host        : bur419-61
8    Platform    : SUNW,T5440        Chassis_id  : BEL07524BN
9    Product_sn  : BEL07524BN
10
11   Fault class : fault.cpu.ultraSPARC-T2plus.ireg
12   Affects     : cpu:///cpuid=0/serial=1F95806CD1421929
13                     faulted and taken out of service
14   FRU         : "MB/CPU0" (hc://:product-id=SUNW,T5440:server-id=bur419-61:\
15                 serial=3529:part=541255304/motherboard=0/cpuboard=0)
16                     faulty
17   Serial ID.  : 3529
18                 1F95806CD1421929
19   
20   Description : The number of integer register errors associated with this thread
21                 has exceeded acceptable levels.
22   
23   Response    : The fault manager will attempt to remove the affected thread from
24                 service.
25   
26   Impact      : System performance may be affected.
27   
28   Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
29                 Please refer to the associated reference document at
30                 http://support.oracle.com/msg/SUN4V-8001-8H for the latest service
31                 procedures and policies regarding this diagnosis.

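The line of main interest is line 14 (continued on line 15), which shows data about the affected FRU. The human-readable location string, "MB/CPU0", appears in quotation marks; this is the value to match against the labels on the physical hardware. The FRU is also given in Fault Management Resource Identifier (FMRI) format, which includes descriptive properties of the system that contains the fault, such as its host name and chassis serial number. On platforms that support it, the FRU's FMRI also includes the FRU's part number and serial number.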
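Because the FRU entry on lines 14 and 15 combines the label and the FMRI, a script that needs them separately has to split them. The following Python sketch does that for the layout shown in Example 3-1. It assumes that the quoted label is followed by an hc:// FMRI in parentheses, that a trailing backslash marks a display line wrap, and that the authority section is the ':'-separated list of name=value pairs before the first '/'. The name parse_fru is hypothetical and not part of any Solaris interface.

```python
import re


def parse_fru(text: str) -> dict:
    """Split an fmadm faulty FRU entry into label, FMRI, authority, and path.

    Hypothetical helper; assumes the layout shown in Example 3-1.
    """
    # Undo the display wrapping: a backslash, a newline, then indentation.
    joined = re.sub(r"\\\n\s*", "", text)
    match = re.search(r'"([^"]+)"\s*\((hc://[^)]*)\)', joined)
    if match is None:
        raise ValueError("no quoted label followed by an hc:// FMRI")
    label, fmri = match.group(1), match.group(2)
    # hc://<authority>/<path>: the authority is ':'-separated name=value pairs.
    authority_text, _, path = fmri[len("hc://"):].partition("/")
    authority = dict(
        item.split("=", 1) for item in authority_text.split(":") if "=" in item
    )
    return {"label": label, "fmri": fmri, "authority": authority, "path": path}


# FRU entry from lines 14 and 15 of Example 3-1, without the line numbers.
FRU_ENTRY = (
    'FRU         : "MB/CPU0" (hc://:product-id=SUNW,T5440:server-id=bur419-61:\\\n'
    "                serial=3529:part=541255304/motherboard=0/cpuboard=0)"
)

fru = parse_fru(FRU_ENTRY)
print(fru["label"])                   # MB/CPU0
print(fru["authority"]["server-id"])  # bur419-61
print(fru["authority"]["part"])       # 541255304
```

The label is what you compare with the hardware; the authority fields are convenient when you have to quote the part number or serial number to a service provider.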

The Affects line (lines 12 and 13) indicates the component affected by the fault and its current state. In this example, a single CPU strand is affected, and it has been faulted and taken out of service.
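The resource on the Affects line is itself an FMRI in the cpu scheme, and its components are name=value pairs. The minimal Python sketch below extracts the CPU ID and serial number. It assumes the cpu:///name=value/... layout seen on line 12. It also assumes that cpuid is the same processor ID that psrinfo reports, which would let you cross-check the fault against the psrinfo output shown later in this section. The name parse_cpu_fmri is hypothetical.

```python
def parse_cpu_fmri(fmri: str) -> dict:
    """Return the name=value components of a cpu:/// FMRI as a dictionary.

    Hypothetical helper; assumes the layout shown on the Affects line.
    """
    prefix = "cpu:///"
    if not fmri.startswith(prefix):
        raise ValueError(f"not a cpu-scheme FMRI: {fmri!r}")
    fields = {}
    for component in fmri[len(prefix):].split("/"):
        name, sep, value = component.partition("=")
        if sep:  # skip any component that is not a name=value pair
            fields[name] = value
    return fields


cpu = parse_cpu_fmri("cpu:///cpuid=0/serial=1F95806CD1421929")
print(int(cpu["cpuid"]), cpu["serial"])  # 0 1F95806CD1421929
```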

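In the fmadm faulty output, the status of the FRU appears on line 16, after the FRU description; here the status is faulty. The Action section usually refers to the fmadm command, but it can include other specific actions instead of, or in addition to, that reference.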
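The summary row at the top of each report (line 5) carries the time, event ID, message ID, and severity. In every example in this section, the Action text points to a reference document at http://support.oracle.com/msg/ followed by the MSG-ID. The Python sketch below parses that row and rebuilds the URL. It assumes the timestamp is always three whitespace-separated tokens and that the URL pattern observed here holds for other messages; confirm the URL against the Action text itself. FaultSummary, parse_summary_row, and reference_url are hypothetical names.

```python
from typing import NamedTuple


class FaultSummary(NamedTuple):
    time: str
    event_id: str
    msg_id: str
    severity: str


def parse_summary_row(row: str) -> FaultSummary:
    """Parse one TIME / EVENT-ID / MSG-ID / SEVERITY row of fmadm faulty."""
    tokens = row.split()
    if len(tokens) != 6:
        raise ValueError(f"unexpected summary row: {row!r}")
    month, day, clock, event_id, msg_id, severity = tokens
    return FaultSummary(f"{month} {day} {clock}", event_id, msg_id, severity)


def reference_url(msg_id: str) -> str:
    # Pattern taken from the Action text of the examples in this section.
    return f"http://support.oracle.com/msg/{msg_id}"


row = "Aug 24 17:56:03 7b83c87c-78f6-6a8e-fa2b-d0cf16834049  SUN4V-8001-8H  Minor"
summary = parse_summary_row(row)
print(summary.severity)               # Minor
print(reference_url(summary.msg_id))  # http://support.oracle.com/msg/SUN4V-8001-8H
```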

Example 3-2 fmadm Output With Multiple Faults

1    # fmadm faulty
2    --------------- ------------------------------------  -------------- -------
3    TIME            EVENT-ID                              MSG-ID         SEVERITY
4    --------------- ------------------------------------  -------------- -------
5    Sep 21 10:01:36 d482f935-5c8f-e9ab-9f25-d0aaafec1e6c  PCIEX-8000-5Y  Major
6    
7    Fault class  : fault.io.pci.device-invreq
8    Affects      : dev:///pci@0,0/pci1022,7458@11/pci1000,3060@0
9                   dev:///pci@0,0/pci1022,7458@11/pci1000,3060@1
10                    ok and in service
11                  dev:///pci@0,0/pci1022,7458@11/pci1000,3060@2
12                  dev:///pci@0,0/pci1022,7458@11/pci1000,3060@3
13                    faulty and taken out of service
14   FRU          : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)
15                    repair attempted
16                  "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)
17                    acquitted
18                  "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)
19                    not present
20                  "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)
21                    faulty
22   
23   Description  : The transmitting device sent an invalid request.
24
25   Response     : One or more device instances may be disabled
26
27   Impact       : Possible loss of services provided by the device instances
28                  associated with this fault
29
30   Action       : Use 'fmadm faulty' to provide a more detailed view of this event.
31                  Please refer to the associated reference document at
32                  http://support.oracle.com/msg/PCIEX-8000-5Y for the latest service
33                  procedures and policies regarding this diagnosis.

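In this fmadm faulty output, the status of SLOT 5 appears on line 21, after its FRU description, and is faulty. Other status values that you might see include repair attempted (SLOT 2, line 15), acquitted (SLOT 3, line 17), and not present (SLOT 4, line 19).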
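Lines 8 to 13 show that one status line can follow several entries: lines 8 and 9 share the status on line 10, and lines 11 and 12 share the status on line 13. The Python sketch below maps each entry to its status. It assumes, as that layout suggests, that a status line applies to every entry listed since the previous status line. It recognizes entry lines by the '://' of their FMRI and keys each FRU by its quoted label. The helper map_statuses is hypothetical, and the elided hc://... paths are kept exactly as printed.

```python
import re


def map_statuses(block: str) -> dict[str, str]:
    """Map each entry of an fmadm faulty Affects or FRU block to its status.

    Hypothetical helper; assumes a status line applies to every entry
    listed since the previous status line, as in Example 3-2.
    """
    statuses: dict[str, str] = {}
    pending: list[str] = []
    for raw in block.splitlines():
        # Drop a leading "Affects :" or "FRU :" field name, if present.
        line = re.sub(r"^\s*(?:Affects|FRU)\s*:\s*", "", raw).strip()
        if not line:
            continue
        if "://" in line:  # an entry line carries an FMRI
            label = re.match(r'"([^"]+)"', line)
            pending.append(label.group(1) if label else line)
        else:  # a status line closes the current group of entries
            for entry in pending:
                statuses[entry] = line
            pending = []
    return statuses


FRU_BLOCK = "\n".join([
    'FRU          : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)',
    "                 repair attempted",
    '               "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)',
    "                 acquitted",
    '               "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)',
    "                 not present",
    '               "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)',
    "                 faulty",
])

for slot, status in map_statuses(FRU_BLOCK).items():
    print(f"{slot}: {status}")
```

The same function can be applied to the Affects block, where the key is the full dev:/// FMRI because those entries have no quoted label.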

Example 3-3 Using the fmdump Command to Display Faults

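Some console messages and knowledge articles might instruct you to use the older fmdump -v -u UUID command to display fault information. Although the fmadm faulty command is preferred, the fmdump command still works, as shown in the following example: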

1    % fmdump -v -u 7b83c87c-78f6-6a8e-fa2b-d0cf16834049
2    TIME                 UUID                                 SUNW-MSG-ID EVENT
3    Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed
4      100%  fault.cpu.ultraSPARC-T2plus.ireg
5
6            Problem in: -
7               Affects: cpu:///cpuid=0/serial=1F95806CD1421929
8                   FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\
9                   serial=9999:part=541255304/motherboard=0/cpuboard=0
10              Location: MB/CPU0

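Information about the affected FRU is still provided, but it is spread across three lines (lines 8 through 10). The Location string gives the human-readable FRU string, and the FRU line gives the formal FMRI. Note that the fmdump command does not display the severity, descriptive text, or action unless you specify the -m option. For more information, see the fmdump(1M) man page.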
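Because fmdump -v prints the FRU information as separate 'Name: value' lines, it needs slightly different parsing than the fmadm faulty output. The following Python sketch collects the Problem in, Affects, FRU, and Location fields from a record laid out as in Example 3-3. It assumes the same trailing-backslash line wrap and that these four field names are spelled exactly as shown. The name parse_fmdump_record is hypothetical.

```python
import re


def parse_fmdump_record(text: str) -> dict[str, str]:
    """Collect the FRU-related 'Name: value' fields of one fmdump -v record.

    Hypothetical helper; assumes the layout shown in Example 3-3.
    """
    # Undo the display wrapping: a backslash, a newline, then indentation.
    joined = re.sub(r"\\\n\s*", "", text)
    pattern = re.compile(r"^\s*(Problem in|Affects|FRU|Location):\s*(.*)$")
    fields: dict[str, str] = {}
    for line in joined.splitlines():
        match = pattern.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


# Record from Example 3-3, without the line numbers.
RECORD = "\n".join([
    "TIME                 UUID                                 SUNW-MSG-ID EVENT",
    "Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed",
    "  100%  fault.cpu.ultraSPARC-T2plus.ireg",
    "",
    "        Problem in: -",
    "           Affects: cpu:///cpuid=0/serial=1F95806CD1421929",
    "               FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\\",
    "               serial=9999:part=541255304/motherboard=0/cpuboard=0",
    "          Location: MB/CPU0",
])

record = parse_fmdump_record(RECORD)
print(record["Location"])  # MB/CPU0
print(record["FRU"])
```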

How to Identify Which CPUs Are Offline

Display information about the CPUs.

% /usr/sbin/psrinfo 
0       faulted   since 05/13/2011 12:55:26 
1       on-line   since 05/12/2011 11:47:26

The faulted state indicates that the CPU has been taken offline by a fault management response agent.
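To list only the faulted processors, you can filter the psrinfo output on its second column. The Python sketch below assumes the '<id> <state> since <date> <time>' layout shown in the preceding step. The helper faulted_cpus is hypothetical, and it parses captured text rather than calling psrinfo itself.

```python
def faulted_cpus(psrinfo_output: str) -> list[int]:
    """Return the processor IDs that psrinfo reports in the faulted state."""
    ids = []
    for line in psrinfo_output.splitlines():
        fields = line.split()
        # Assumed layout: <id> <state> since <date> <time>
        if len(fields) >= 2 and fields[1] == "faulted":
            ids.append(int(fields[0]))
    return ids


SAMPLE = """\
0       faulted   since 05/13/2011 12:55:26
1       on-line   since 05/12/2011 11:47:26
"""

print(faulted_cpus(SAMPLE))  # [0]
```

On an Oracle Solaris host, you could pass it live output instead, for example subprocess.run(["/usr/sbin/psrinfo"], capture_output=True, text=True, check=True).stdout.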

How to Display Information About Defective Services

Become an administrator.
For more information, see "How to Use Your Assigned Administrative Rights" in Oracle Solaris 11.1 Administration: Security Services.

Display information about the defect.

# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
May 12 22:52:47 915cb64b-e16b-4f49-efe6-de81ff96fce7  SMF-8000-YX    major

Host        : parity
Platform    : Sun-Fire-V40z     Chassis_id  : XG051535088
Product_sn  : XG051535088

Fault class : defect.sunos.smf.svc.maintenance
Affects     : svc:///system/intrd:default
                  faulted and taken out of service
Problem in  : svc:///system/intrd:default
                  faulted and taken out of service

Description : A service failed - it is restarting too quickly.

Response    : The service has been placed into the maintenance state.

Impact      : svc:/system/intrd:default is unavailable.

Action      : Run 'svcs -xv svc:/system/intrd:default' to determine the
              generic reason why the service failed, the location of any
              logfiles, and a list of other services impacted. Please refer to
              the associated reference document at
              http://support.oracle.com/msg/SMF-8000-YX for the latest service procedures
              and policies regarding this diagnosis.
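The Fault class line tells you which of the two kinds of problem covered in this section you are looking at. It starts with fault. in Examples 3-1 and 3-2 and with defect. in the output above. The Python sketch below extracts that line from captured fmadm faulty text and returns its first dotted component. It assumes the 'Fault class : <class>' spelling shown here, at the start of a line. The name fault_class_category is hypothetical.

```python
import re


def fault_class_category(fmadm_output: str) -> str:
    """Return the first dotted component of the Fault class, e.g. 'defect'."""
    match = re.search(r"^Fault class\s*:\s*(\S+)", fmadm_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Fault class' line found")
    return match.group(1).split(".", 1)[0]


REPORT = """\
Fault class : defect.sunos.smf.svc.maintenance
Affects     : svc:///system/intrd:default
                  faulted and taken out of service
"""

print(fault_class_category(REPORT))  # defect
```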

Display information about the defective service.

Follow the instructions given in the Action section of the fmadm output.

# svcs -xv svc:/system/intrd:default
svc:/system/intrd:default (interrupt balancer)
 State: maintenance since Wed May 12 22:52:47 2010
Reason: Restarting too quickly.
   See: http://support.oracle.com/msg/SMF-8000-YX
   See: man -M /usr/share/man -s 1M intrd
   See: /var/svc/log/system-intrd:default.log
Impact: This service is not running.

For further instructions on repairing this problem, see knowledge article SMF-8000-YX.
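The svcs -xv output uses a 'Name: value' layout. Its See: lines point to the knowledge article, the man page, and the service log file. The Python sketch below collects those fields so that, for example, the log file path can be picked out automatically. It assumes the layout shown in the preceding step, where the first line starts with the service FMRI. The helper parse_svcs_xv is hypothetical.

```python
def parse_svcs_xv(text: str) -> dict:
    """Collect the fields of one service entry printed by svcs -xv.

    Hypothetical helper; assumes the layout shown in this section.
    """
    lines = text.strip().splitlines()
    info: dict = {"fmri": lines[0].split()[0], "see": []}
    for line in lines[1:]:
        name, sep, value = line.strip().partition(": ")
        if not sep:
            continue
        if name == "See":  # See: can repeat, so keep every occurrence
            info["see"].append(value)
        else:
            info[name.lower()] = value
    return info


SVCS_OUTPUT = """\
svc:/system/intrd:default (interrupt balancer)
 State: maintenance since Wed May 12 22:52:47 2010
Reason: Restarting too quickly.
   See: http://support.oracle.com/msg/SMF-8000-YX
   See: man -M /usr/share/man -s 1M intrd
   See: /var/svc/log/system-intrd:default.log
Impact: This service is not running.
"""

info = parse_svcs_xv(SVCS_OUTPUT)
log_files = [ref for ref in info["see"] if ref.startswith("/")]
print(info["state"])  # maintenance since Wed May 12 22:52:47 2010
print(log_files)      # ['/var/svc/log/system-intrd:default.log']
```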

Managing Services and Faults in Oracle Solaris 11.1, Oracle Solaris 11.1 Information Library (Simplified Chinese)