When PSH detects a fault, a Solaris console message similar to the one in Example 1-8 is displayed.
Example 1-8 Console Message Showing Fault Detected by PSH
SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005
PLATFORM: SUNW,Sun-Netra-T5440, CSN: -, HOSTNAME: wgs48-37
SOURCE: cpumem-diagnosis, REV: 1.5
EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004
DESC: The number of errors associated with this memory module has exceeded acceptable levels. Refer to http://sun.com/msg/SUN4V-8000-DX for more information.
AUTO-RESPONSE: Pages of memory associated with this memory module are being removed from service as errors are reported.
IMPACT: Total system memory capacity will be reduced as pages are retired.
REC-ACTION: Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u <EVENT_ID> to identify the module.
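If console messages like this one are collected by a log-monitoring script, the key fields can be pulled out with a few regular expressions. The following is a minimal sketch only: the function name parse_psh_message and the choice of fields are illustrative assumptions, not part of Solaris or PSH, and the sample text is copied from Example 1-8.

```python
import re

def parse_psh_message(text):
    """Extract a few key fields from a PSH console message (illustrative helper)."""
    patterns = {
        "msg_id": r"SUNW-MSG-ID:\s*([A-Z0-9-]+)",
        "severity": r"SEVERITY:\s*(\w+)",
        "event_id": r"EVENT-ID:\s*([0-9a-f-]{36})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        # A field that is absent from the message is reported as None.
        fields[name] = match.group(1) if match else None
    return fields

# Sample lines taken from Example 1-8.
sample = (
    "SUNW-MSG-ID: SUN4V-8000-DX, TYPE: Fault, VER: 1, SEVERITY: Minor\n"
    "EVENT-TIME: Wed Sep 14 10:09:46 EDT 2005\n"
    "EVENT-ID: f92e9fbe-735e-c218-cf87-9e1720a28004\n"
)
print(parse_psh_message(sample))
```

The EVENT-ID value extracted this way is the UUID that fmdump -v -u expects.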
Faults detected by the Solaris PSH facility are also reported through service processor alerts. Example 1-9 shows an ALOM CMT CLI alert for the same fault that Solaris PSH reported in Example 1-8.
Example 1-9 ALOM CMT CLI Alert of PSH Diagnosed Fault
SC Alert: Host detected fault, MSGID: SUN4V-8000-DX
The ALOM CMT CLI showfaults command provides summary information about the fault. See Detecting Faults for more information about the showfaults command.
The fmdump command displays the list of faults detected by the Solaris PSH facility and identifies the faulty FRU for a particular EVENT_ID (UUID).
Do not use fmdump to verify a FRU replacement has cleared a fault because the output of fmdump is the same after the FRU has been replaced. Use the fmadm faulty command to verify the fault has cleared.
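The verification step can be scripted by checking whether the fault's UUID still appears in the text printed by fmadm faulty. The sketch below assumes that outstanding faults are listed together with their UUIDs; the helper names are hypothetical, and the strings used in the demonstration are placeholders, not real fmadm output.

```python
import subprocess

def capture_fmadm_faulty():
    """Run `fmadm faulty` on the Solaris host and return its text output."""
    result = subprocess.run(["fmadm", "faulty"], capture_output=True, text=True)
    return result.stdout

def fault_cleared(fmadm_faulty_output, uuid):
    """Return True if the UUID no longer appears in the captured output."""
    return uuid.lower() not in fmadm_faulty_output.lower()

# Demonstration with placeholder strings (not real fmadm output).
uuid = "fd940ac2-d21e-c94a-f258-f8a9bb69d05b"
placeholder_before = "(placeholder, not real fmadm output) " + uuid
placeholder_after = ""
print(fault_cleared(placeholder_before, uuid))  # False
print(fault_cleared(placeholder_after, uuid))   # True
```

On a live system, the first argument would be the string returned by capture_fmadm_faulty(), which typically requires root privileges.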
Example 1-10 shows fmdump -v output for one fault. The output indicates the date and time of the fault (Jul 31 12:47:42.2007), its UUID (fd940ac2-d21e-c94a-f258-f8a9bb69d05b), the message ID (SUN4V-8000-JA), and the faulted FRU (part number 541215101, serial number 101083, location MB).
Example 1-10 Output from the fmdump -v Command
# fmdump -v -u fd940ac2-d21e-c94a-f258-f8a9bb69d05b
TIME                 UUID                                 SUNW-MSG-ID
Jul 31 12:47:42.2007 fd940ac2-d21e-c94a-f258-f8a9bb69d05b SUN4V-8000-JA
  100%  fault.cpu.ultraSPARC-T2.misc_regs
        Problem in: cpu:///cpuid=16/serial=5D67334847
           Affects: cpu:///cpuid=16/serial=5D67334847
               FRU: hc://:serial=101083:part=541215101/motherboard=0
          Location: MB
In Example 1-11, the PSH message for message ID SUN4V-8000-JA provides information for corrective action.
Example 1-11 PSH Message Output
CPU errors exceeded acceptable levels
Type: Fault
Severity: Major
Description: The number of errors associated with this CPU has exceeded acceptable levels.
Automated Response: The fault manager will attempt to remove the affected CPU from service.
Impact: System performance may be affected.
Suggested Action for System Administrator: Schedule a repair procedure to replace the affected CPU, the identity of which can be determined using fmdump -v -u <EVENT_ID>.
Details: The Message ID: SUN4V-8000-JA indicates diagnosis has determined that a CPU is faulty. The Solaris fault manager arranged an automated attempt to disable this CPU. The recommended action for the system administrator is to contact Sun support so a Sun service technician can replace the affected component.
When the Solaris PSH facility detects faults, they are logged and displayed on the console. In most cases, after the fault is repaired, the system detects the corrected state and clears the fault condition automatically. However, you must verify this. If the fault condition is not cleared automatically, clear it manually.
PSH-detected faults are distinguished from other kinds of faults by the text: Host detected fault.
Example:
sc> showfaults -v
Last POST Run: Wed Jun 29 11:29:02 2007
Post Status: Passed all devices
ID Time            FRU                     Fault
0  Jun 30 22:13:02 /SYS/MB/CMP0/BR1/CH0/D0 Host detected fault, MSGID: SUN4V-8000-DX UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86
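When the showfaults -v output is captured to a file or a terminal log, the message ID and UUID of each PSH-detected fault can be extracted by matching on the Host detected fault text. This is a rough sketch under the assumption that each such entry has the form shown in the example above (MSGID followed by UUID); the function name host_detected_faults is hypothetical.

```python
import re

# Matches "Host detected fault, MSGID: <id> UUID: <uuid>"; the UUID may
# follow on the same line or on the next line.
FAULT_RE = re.compile(
    r"Host detected fault,\s*MSGID:\s*(\S+)\s+"
    r"UUID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)

def host_detected_faults(showfaults_output):
    """Return a list of (message_id, uuid) pairs for PSH-detected faults."""
    return FAULT_RE.findall(showfaults_output)

# Sample text copied from the showfaults -v example.
sample_output = (
    "Last POST Run: Wed Jun 29 11:29:02 2007\n"
    "Post Status: Passed all devices\n"
    "ID Time FRU Fault\n"
    "0 Jun 30 22:13:02 /SYS/MB/CMP0/BR1/CH0/D0 Host detected fault, "
    "MSGID: SUN4V-8000-DX UUID: 7ee0e46b-ea64-6565-e684-e996963f7b86\n"
)
for msg_id, fault_uuid in host_detected_faults(sample_output):
    print(msg_id, fault_uuid)
```

Each UUID returned this way is the value to pass to the clearfault and fmadm repair commands.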
Example:
sc> clearfault 7ee0e46b-ea64-6565-e684-e996963f7b86
Clearing fault from all indicted FRUs...
Fault cleared.
In some cases, even though the fault is cleared, some persistent fault information remains and results in erroneous fault messages at boot time. To ensure that these messages are not displayed, run the following Solaris command:
fmadm repair UUID
Example:
# fmadm repair 7ee0e46b-ea64-6565-e684-e996963f7b86