Use the fmadm faulty command to display fault or defect information and determine which FRUs are involved. The fmadm faulty command displays active problems. The fmdump command displays the contents of log files associated with the Fault Manager daemon and is more useful as a historical log of problems on the system.
The fmadm faulty command displays status information for resources that the Fault Manager identifies as faulty. Its options select which information is displayed and in what format. See the fmadm(1M) man page for a description of all the fmadm faulty options.
Example 2-1 fmadm faulty Output Showing One Faulty CPU
     1  # fmadm faulty
     2  --------------- ------------------------------------ -------------- ---------
     3  TIME            EVENT-ID                             MSG-ID         SEVERITY
     4  --------------- ------------------------------------ -------------- ---------
     5  Aug 24 17:56:03 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H  Minor
     6
     7  Host        : bur419-61
     8  Platform    : SUNW,T5440        Chassis_id  : BEL07524BN
     9  Product_sn  : BEL07524BN
    10
    11  Fault class : fault.cpu.ultraSPARC-T2plus.ireg
    12  Affects     : cpu:///cpuid=0/serial=1F95806CD1421929
    13                  faulted and taken out of service
    14  FRU         : "MB/CPU0" (hc://:product-id=SUNW,T5440:server-id=bur419-61:\
    15                serial=3529:part=541255304/motherboard=0/cpuboard=0)
    16                  faulty
    17  Serial ID.  : 3529
    18                1F95806CD1421929
    19
    20  Description : The number of integer register errors associated with this thread
    21                has exceeded acceptable levels.
    22
    23  Response    : The fault manager will attempt to remove the affected thread from
    24                service.
    25
    26  Impact      : System performance may be affected.
    27
    28  Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
    29                Please refer to the associated reference document at
    30                http://support.oracle.com/msg/SUN4V-8001-8H for the latest service
    31                procedures and policies regarding this diagnosis.
Line 14 identifies the impacted FRU. The string shown in quotation marks, “MB/CPU0,” should match the label on the physical hardware. The string shown in parentheses is the Fault Management Resource Identifier (FMRI) for the FRU. The FMRI includes descriptive properties about the system that contains the fault, such as its host name and chassis serial number. On some platforms, the part number and serial number of the FRU are also included in the FMRI of the FRU.
The Affects lines (lines 12 and 13) indicate the components that are affected by the fault and their relative state. In this example, a single CPU strand is affected. That CPU strand is faulted and has been taken out of service by the Fault Manager.
Following the FRU description in the fmadm faulty command output, line 16 shows the state as faulty. The Action section might include specific actions in addition to references to documents on the support site.
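The FMRI shown in parentheses on lines 14 and 15 can be split into its properties when you need the host name, serial number, or part number in a script. The following Python sketch is only an illustration: the function name parse_hc_fmri is hypothetical, and the parser assumes the layout seen in this example (colon-separated name=value properties followed by slash-separated name=instance components). FMRIs on other platforms might not follow this layout.

```python
def parse_hc_fmri(fmri):
    """Split an hc-scheme FMRI into authority properties and path components.

    Assumed layout (taken from the example output, not from a specification):
    hc://:name=value:name=value/component=instance/component=instance
    """
    scheme, sep, rest = fmri.partition("://")
    if sep != "://" or scheme != "hc":
        raise ValueError("not an hc-scheme FMRI: %r" % fmri)

    # Everything before the first "/" is the authority; the rest is the path.
    authority, _, path = rest.partition("/")

    # Authority properties are colon-separated name=value pairs.
    properties = {}
    for item in authority.split(":"):
        if item:
            name, _, value = item.partition("=")
            properties[name] = value

    # Path components are slash-separated name=instance pairs.
    components = []
    for item in path.split("/"):
        if item:
            name, _, instance = item.partition("=")
            components.append((name, instance))

    return {"properties": properties, "components": components}


# FMRI from lines 14 and 15 of the example, with the continuation joined.
fmri = ("hc://:product-id=SUNW,T5440:server-id=bur419-61:"
        "serial=3529:part=541255304/motherboard=0/cpuboard=0")
parsed = parse_hc_fmri(fmri)
print(parsed["properties"]["server-id"])
print(parsed["components"])
```

In this sketch, the server-id property carries the host name, and the serial and part properties carry the FRU serial number and part number that some platforms include.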
Example 2-2 fmadm faulty Output Showing Multiple Faults
     1  # fmadm faulty
     2  --------------- ------------------------------------ -------------- -------
     3  TIME            EVENT-ID                             MSG-ID         SEVERITY
     4  --------------- ------------------------------------ -------------- -------
     5  Sep 21 10:01:36 d482f935-5c8f-e9ab-9f25-d0aaafec1e6c PCIEX-8000-5Y  Major
     6
     7  Fault class : fault.io.pci.device-invreq
     8  Affects     : dev:///pci@0,0/pci1022,7458@11/pci1000,3060@0
     9                dev:///pci@0,0/pci1022,7458@11/pci1000,3060@1
    10                  ok and in service
    11                dev:///pci@0,0/pci1022,7458@11/pci1000,3060@2
    12                dev:///pci@0,0/pci1022,7458@11/pci1000,3060@3
    13                  faulty and taken out of service
    14  FRU         : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)
    15                  repair attempted
    16                "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)
    17                  acquitted
    18                "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)
    19                  not present
    20                "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)
    21                  faulty
    22
    23  Description : The transmitting device sent an invalid request.
    24
    25  Response    : One or more device instances may be disabled
    26
    27  Impact      : Possible loss of services provided by the device instances
    28                associated with this fault
    29
    30  Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
    31                Please refer to the associated reference document at
    32                http://support.oracle.com/msg/PCIEX-8000-5Y for the latest service
    33                procedures and policies regarding this diagnosis.
In this output, device 1 in slot 3 is described as “ok and in service” on line 10, and line 17 shows its state as “acquitted.” Device 3 in slot 5 is described as “faulty and taken out of service” on line 13, and line 21 shows its state as “faulty.” The states shown for the other two devices, in slot 2 and slot 4, are “repair attempted” (line 15) and “not present” (line 19).
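When several FRUs are listed, a script can pair each FRU label with the state line that follows it. The following Python sketch shows one possible approach. The function name fru_states is hypothetical, the sample text is copied from the FRU section of the preceding example with assumed spacing, and the pattern assumes that each FRU appears as a quoted label followed by an FMRI in parentheses, with the state on the next line.

```python
import re

# FRU section copied from the example output. The spacing is assumed, and the
# elided FMRIs (hc://.../) are kept exactly as they appear in the example.
SAMPLE = '''\
FRU         : "SLOT 2" (hc://.../pciexrc=3/pciexbus=4/pciexdev=0)
                repair attempted
              "SLOT 3" (hc://.../pciexrc=3/pciexbus=4/pciexdev=1)
                acquitted
              "SLOT 4" (hc://.../pciexrc=3/pciexbus=4/pciexdev=2)
                not present
              "SLOT 5" (hc://.../pciexrc=3/pciexbus=4/pciexdev=3)
                faulty

Description : The transmitting device sent an invalid request.
'''

# A quoted label followed by an opening parenthesis, optionally after "FRU :".
FRU_LINE = re.compile(r'^(?:FRU\s*:)?\s*"(?P<label>[^"]+)"\s+\(')


def fru_states(text):
    """Return (label, state) pairs from the FRU section of fmadm faulty output."""
    results = []
    pending = None       # label whose state line has not been seen yet
    fmri_closed = True   # False while a wrapped FMRI is still open
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FRU_LINE.match(line)
        if match:
            pending = match.group("label")
            fmri_closed = line.endswith(")")
        elif pending is not None:
            if not fmri_closed:
                # Continuation line of an FMRI that wrapped with a backslash.
                fmri_closed = line.endswith(")")
            else:
                results.append((pending, line))
                pending = None
    return results


states = fru_states(SAMPLE)
for label, state in states:
    print(label, "->", state)

# Labels of the FRUs that are still in the faulty state.
faulty = [label for label, state in states if state == "faulty"]
```

Filtering on the faulty state, as in the last line, leaves only the FRUs that still need attention. In this example that is "SLOT 5".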
Example 2-3 Showing Faults With the fmdump Command
Some console messages and knowledge articles might instruct you to use the fmdump -v -u UUID command to display fault information, as shown in the following example:
     1  # fmdump -v -u 7b83c87c-78f6-6a8e-fa2b-d0cf16834049
     2  TIME                 UUID                                 SUNW-MSG-ID   EVENT
     3  Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed
     4    100%  fault.cpu.ultraSPARC-T2plus.ireg
     5
     6          Problem in: -
     7             Affects: cpu:///cpuid=0/serial=1F95806CD1421929
     8                 FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\
     9                      serial=9999:part=541255304/motherboard=0/cpuboard=0
    10            Location: MB/CPU0
The information about the affected FRUs is on lines 8 through 10. The Location string on line 10 presents the human-readable FRU string. Lines 8 and 9 show the FMRI of the FRU, which is continued across the line break. To see the severity, descriptive text, and action in the fmdump output, use the -m option. See the fmdump(1M) man page for more information.
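If you capture this output in a script, the labeled lines can be collected into a dictionary. The following Python sketch is one way to do that. The function name parse_fmdump_fields and the LABELS tuple are hypothetical, and the parser assumes that only the four labels shown in the example appear and that a wrapped value ends with a backslash, as on line 8.

```python
# Output copied from the example, without the listing line numbers.
# The spacing is assumed.
SAMPLE = r"""# fmdump -v -u 7b83c87c-78f6-6a8e-fa2b-d0cf16834049
TIME                 UUID                                 SUNW-MSG-ID   EVENT
Aug 24 17:56:03.4596 7b83c87c-78f6-6a8e-fa2b-d0cf16834049 SUN4V-8001-8H Diagnosed
  100%  fault.cpu.ultraSPARC-T2plus.ireg

        Problem in: -
           Affects: cpu:///cpuid=0/serial=1F95806CD1421929
               FRU: hc://:product-id=SUNW,T5440:server-id=bur419-61:\
                    serial=9999:part=541255304/motherboard=0/cpuboard=0
          Location: MB/CPU0
"""

# Labels seen in the example; other output might contain more (assumption).
LABELS = ("Problem in", "Affects", "FRU", "Location")


def parse_fmdump_fields(text):
    """Collect the labeled lines of fmdump -v output into a dictionary."""
    fields = {}
    continued_key = None  # label whose value wrapped onto the next line
    for raw in text.splitlines():
        line = raw.strip()
        if continued_key is not None:
            key, piece = continued_key, line
        else:
            key, sep, piece = line.partition(": ")
            if not sep or key not in LABELS:
                continue  # header, summary, or blank line
            fields[key] = ""
        if piece.endswith("\\"):
            # Drop the trailing backslash and keep reading the same value.
            fields[key] += piece[:-1]
            continued_key = key
        else:
            fields[key] += piece
            continued_key = None
    return fields


fields = parse_fmdump_fields(SAMPLE)
print(fields["Location"])
print(fields["FRU"])
```

The sketch joins the two halves of the wrapped FMRI, so the FRU value is returned as a single string.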
Example 2-4 Identifying Which CPUs Are Offline
Use the psrinfo command to display information about the CPUs:
    $ psrinfo
    0       faulted   since 05/13/2013 12:55:26
    1       on-line   since 05/12/2013 11:47:26
The faulted state in this example indicates that the CPU has been taken offline by a Fault Manager response agent.
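To pick out the faulted CPUs in a script, you could parse the psrinfo output line by line. The following Python sketch assumes the default output format shown in this example, where each line starts with the processor ID followed by a one-word state. The function name cpu_states is hypothetical.

```python
# Output copied from the example. The column spacing is assumed.
SAMPLE = """\
0       faulted   since 05/13/2013 12:55:26
1       on-line   since 05/12/2013 11:47:26
"""


def cpu_states(text):
    """Map each processor ID to the state reported on its psrinfo line."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect "<id> <state> since <date> <time>"; skip anything else.
        if len(parts) >= 2 and parts[0].isdigit():
            states[int(parts[0])] = parts[1]
    return states


states = cpu_states(SAMPLE)
faulted = [cpu for cpu, state in states.items() if state == "faulted"]
print(faulted)
```

For the example output, the faulted list contains only processor 0.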