There are three ways to tell when a fault has occurred somewhere in the system:
The amber Service Action Required LEDs on the failed component and on the system chassis are illuminated (see Monitor Service Action Required LEDs below).
Component status information, available through the ILOM web interface and CLI, registers the component as being in a faulted state (see Monitor Faults via Management Interfaces below).
The occurrence of the fault is recorded in the system event log (see Monitor Event Log below).
When a component experiences a hardware failure (enters a faulted state), fault management illuminates the Service Action Required (amber) LED on that component. In addition, fault management illuminates the Service Action Required LEDs on the system chassis (both front and back) when any system component is in a faulted state.
Since a Service Action Required LED indicates a hardware failure, it remains illuminated until fault management detects that the failed hardware has been replaced or repaired. The chassis Service Action Required LEDs – which serve as summary indicators for all component faults – remain illuminated as long as any system component remains in a faulted state.
If the chassis Service Action Required LEDs are illuminated but no other system component displays its Service Action Required LED, then fault management has diagnosed an external fault: a problem outside the system that potentially affects the system as a whole. For example, if the external ambient air temperature exceeds 43°C (109.4°F), a fault is declared and the system will shut down although there is nothing physically wrong with any system hardware.
Refer to Chassis Faults for information about the external conditions that can cause these chassis faults.
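The LED rules described above can be summarized in a short Python sketch. This is only an illustration of the decision logic, not part of ILOM: the function name interpret_leds and its inputs are hypothetical, and the LED states are assumed to have been read by a person looking at the hardware.

```python
def interpret_leds(chassis_led_lit, component_leds):
    """Classify an observed Service Action Required LED pattern.

    chassis_led_lit: True if the chassis Service Action Required LEDs are lit.
    component_leds:  dict mapping a component name to True if that
                     component's own Service Action Required LED is lit.
    Both inputs are hypothetical observations, not values read from ILOM.
    """
    lit = sorted(name for name, is_lit in component_leds.items() if is_lit)
    if lit:
        # At least one component shows its own LED: a component fault.
        return "component fault: " + ", ".join(lit)
    if chassis_led_lit:
        # Chassis LEDs lit with no component LED lit: an external fault.
        return "external fault"
    return "no fault indicated"


print(interpret_leds(True, {"/CH/RFM0": True, "/CH/PS3": False}))
print(interpret_leds(True, {"/CH/RFM0": False, "/CH/PS3": False}))
```

The first call reports a component fault on /CH/RFM0; the second reports an external fault, because the chassis LEDs are lit while no component LED is.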
The power supply units (PSUs) are a special case: they monitor their own fault status and control their own Service Action Required LEDs. The fault management software cannot turn the PSU LEDs on or off. However, because fault management is monitoring sensors on the PSUs, it is notified when a PSU fault occurs. Fault management illuminates the chassis Service Action Required LEDs and notes the fault occurrence in the ILOM management interfaces and in the event log.
Note that it is possible for a PSU to extinguish its Service Action Required LED (declare that the fault is cleared), but for fault management to continue to assert that the PSU is still in a faulted state. If this happens, the ILOM management interfaces, the chassis Service Action Required LEDs, and the event log reflect that the faulted state is ongoing.
Refer to Power Supply Faults for more information.
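The PSU special case amounts to comparing two independent indicators. The following Python sketch flags the situation in which fault management still reports a fault after the PSU has extinguished its own LED. The function name psu_led_out_of_sync is hypothetical, and the value "OK" is used only as a placeholder for any fault_state other than Faulted.

```python
def psu_led_out_of_sync(psu_led_lit, fault_state):
    """Return True when fault management reports the PSU as Faulted
    but the PSU has turned off its own Service Action Required LED.

    psu_led_lit: True if the PSU's own LED is lit (observed by hand).
    fault_state: the fault_state value shown by the management interfaces.
    """
    return fault_state == "Faulted" and not psu_led_lit


# PSU cleared its LED, but fault management still asserts the fault.
print(psu_led_out_of_sync(False, "Faulted"))
# LED and fault management agree that the PSU is faulted.
print(psu_led_out_of_sync(True, "Faulted"))
# "OK" is a placeholder for any non-faulted state.
print(psu_led_out_of_sync(False, "OK"))
```

When the function returns True, the management interfaces, the chassis LEDs, and the event log are the indicators to trust, as described above.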
To check the fault status of system components from the web interface:
Log in to the ILOM web interface.
Select the System Information tab.
Select the Components tab.
The Components page appears. This page displays the Component Management Status table, which lists system components and displays their Fault Status.
Locate the faulted component.
Look for the component whose Fault Status is listed as Faulted. Note that if any component is faulted, then the system chassis itself (/CH) is also listed as Faulted.
Refer to CLI Overview to learn about the object namespace and how to identify the targets and properties that may pertain to faults.
Open an ILOM CLI window.
Issue the appropriate show command to display information about system components.
For example, when you are at the chassis level (/CH) and a component is in a faulted state, the fault_state of the chassis is listed as Faulted, as illustrated below.
-> show
/CH
Targets:
.
.
.
Properties:
type = Chassis
fault_state = Faulted
clear_fault_action = (none)
->
Drill down to the component that has failed, and issue the show command again to confirm that the component's fault_state = Faulted.
The following example shows that the fault_state of rear fan module 0 (/CH/RFM0) is Faulted.
-> cd /CH/RFM0
/CH/RFM0
-> show
/CH/RFM0
Targets:
SERVICE
ACT
FAN1_OK
FAN2_OK
FAN1_SPEED
FAN2_SPEED
Properties:
type = Rear Fan FRU
fault_state = Faulted
clear_fault_action = (none)
prepare_to_remove_status = NotReady
prepare_to_remove_action = (none)
return_to_service_action = (none)
->
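If the output of the show command is saved as text, the fault_state property can be extracted with a small script. The Python sketch below assumes the output has been captured with one property per line in the "name = value" form shown in the examples. The function name parse_show_properties is hypothetical, and the script does not connect to ILOM itself.

```python
def parse_show_properties(show_output):
    """Return the Properties section of captured show output as a dict.

    Assumes one "name = value" pair per line under a "Properties:" header.
    """
    props = {}
    in_props = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            # Section headers such as "Targets:" or "Properties:".
            in_props = (line == "Properties:")
        elif in_props and " = " in line:
            key, value = line.split(" = ", 1)
            props[key.strip()] = value.strip()
    return props


# Sample text copied from the rear fan module example.
sample = """/CH/RFM0
Targets:
SERVICE
ACT
Properties:
type = Rear Fan FRU
fault_state = Faulted
clear_fault_action = (none)
->"""

props = parse_show_properties(sample)
print(props["fault_state"])  # prints "Faulted"
```

Splitting on the first " = " only keeps values that contain spaces, such as "Rear Fan FRU", intact.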
Faults are recorded in the system event log, which can be viewed from both the ILOM web interface and the CLI.
Log in to the ILOM web interface.
Select the System Monitoring tab.
Select the Event Logs tab.
Faults are listed with a Class of Fault, a timestamp of when the fault occurred, and a description of the fault. Note that if you are looking for a fault that occurred recently, it is likely to be near the end of the log. A fault entry appears something like the example below, which lists a fault that occurred on power supply 3 (/CH/PS3).
4 Mon May 1 13:17:22 2006 FMA Fault critical Fault detected at time = Mon May 1 13:17:22 2006. The suspect component: /CH/PS3 has FAULT:powersupply_temperature_ps with probability=100
See View the Event Log for instructions on finding and interpreting the contents of the event log.
Open an ILOM CLI window.
Issue the following command to view the event log:
show /CMM/logs/event list
You can scroll through the log output to review its contents. A fault entry appears something like the example below, which lists a fault that occurred on power supply 3 (/CH/PS3).
4 Mon May 1 13:17:22 2006 FMA Fault critical Fault detected at time = Mon May 1 13:17:22 2006. The suspect component: /CH/PS3 has FAULT:powersupply_temperature_ps with probability=100
See View the Event Log for instructions on finding and interpreting the contents of the event log.
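When reviewing a saved copy of the event log, the suspect component, fault name, and probability can be pulled out of a fault entry with a regular expression. The Python sketch below is based only on the wording of the example entry shown above; other entries may be worded differently, and the function name parse_fault_entry is hypothetical.

```python
import re

# Pattern derived from the single example entry; other entries are
# assumed, not known, to follow the same wording.
FAULT_RE = re.compile(
    r"The suspect component:\s*(?P<component>\S+)\s+has\s+"
    r"FAULT:(?P<fault>\S+)\s+with\s+probability=(?P<probability>\d+)"
)


def parse_fault_entry(entry):
    """Return the fault details from one event log entry, or None."""
    match = FAULT_RE.search(entry)
    if match is None:
        return None
    return {
        "component": match["component"],
        "fault": match["fault"],
        "probability": int(match["probability"]),
    }


entry = (
    "4 Mon May 1 13:17:22 2006 FMA Fault critical Fault detected at "
    "time = Mon May 1 13:17:22 2006. The suspect component: /CH/PS3 has "
    "FAULT:powersupply_temperature_ps with probability=100"
)
print(parse_fault_entry(entry))
```

For the example entry, the result identifies /CH/PS3 as the suspect component, powersupply_temperature_ps as the fault, and 100 as the probability. Entries that do not match the pattern return None and can be skipped.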