When a fault indicates a hardware failure, the appropriate and recommended method for clearing the fault is to replace the failed component. This ensures that fault management will notice when the fault has been cleared, and will both extinguish the Service Action Required LEDs and update the ILOM management interfaces to reflect the cleared fault.
You can track and repair faults that span multiple components together by checking which faults share the same UUID. For example, DIMMs are faulted in pairs, so each DIMM in a faulted pair has a 50 percent probability of being the one that actually failed. If you clear the fault on one DIMM, the fault on the other affected DIMM is cleared as well. Clearing a fault on one component may therefore cause other faults within the system to be cleared automatically.
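As a rough illustration of matching faults by UUID, the following Python sketch groups fault records so that related components can be serviced together. The record layout, the UUID strings, and the DIMM names are hypothetical placeholders, not output from any ILOM interface.

```python
from collections import defaultdict

# Hypothetical fault records. Field names, UUIDs, and DIMM names are
# illustrative only; they are not taken from a real ILOM event log.
faults = [
    {"uuid": "uuid-a", "component": "DIMM_A"},
    {"uuid": "uuid-a", "component": "DIMM_B"},
    {"uuid": "uuid-b", "component": "/CH/RFM0"},
]


def group_by_uuid(records):
    """Map each fault UUID to the list of components that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["uuid"]].append(record["component"])
    return dict(groups)


def related_components(records, component):
    """Return the other components that share a fault UUID with `component`."""
    related = []
    for members in group_by_uuid(records).values():
        if component in members:
            related.extend(m for m in members if m != component)
    return related
```

With these sample records, the two DIMMs end up in the same group, which mirrors the paired DIMM fault described above.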
When a fault is cleared, the following message is logged:
Fault <class> on component <FRU NAC name> cleared
When all faults on a component are cleared, the following message is logged:
Component <FRU NAC name> repaired
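If you scan an exported event log for these two messages, a small parser along the following lines may help. This is a minimal sketch that assumes each message occupies its own line exactly in the formats shown above and that the fault class contains no spaces. The function name and the sample fault class are hypothetical.

```python
import re

# Patterns built from the two message formats shown above.
# Assumption: the fault class is a single token with no whitespace.
FAULT_CLEARED = re.compile(
    r"^Fault (?P<fault_class>\S+) on component (?P<component>.+) cleared$"
)
COMPONENT_REPAIRED = re.compile(r"^Component (?P<component>.+) repaired$")


def classify(line):
    """Return (event, component, fault_class), or None if the line does not match."""
    text = line.strip()
    match = FAULT_CLEARED.match(text)
    if match:
        return ("fault_cleared", match.group("component"), match.group("fault_class"))
    match = COMPONENT_REPAIRED.match(text)
    if match:
        return ("component_repaired", match.group("component"), None)
    return None
```

A "fault_cleared" event for a component without a later "component_repaired" event suggests that other faults on that component are still outstanding.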
For most faults, you can use the management interfaces to manually issue a command to clear a component's fault state. However, if you do so and the component sensors determine that the fault condition still exists, the fault will immediately be declared again.
Determine which system component has experienced a hardware failure.
Look at the Service Action Required LEDs, the component status (from the ILOM web interface or CLI), and/or the event log to get information about the component failure. (See Determine Whether Hardware Failed.)
Remove and replace the failed component.
If necessary, refer to the hot-plug instructions for the failed component to correctly and safely remove and replace it.
Monitor the component LEDs and/or the management interfaces to confirm that fault management has cleared the fault.
Manually clearing a fault condition from the ILOM web interface does not correct an underlying hardware failure.
Log in to the ILOM web interface.
Select the System Information tab.
Select the Components tab.
The Components page appears with the Components Management Status table displayed.
Locate the component whose Fault Status is listed as Faulted.
Note that when there is a faulted component, the chassis's Fault Status is also listed as Faulted.
Select the failed component.
Select Clear Fault from the Actions drop-down box.
The component's Fault Status will be updated to OK, and its Service Action Required LED will be extinguished. However, if the fault condition persists, the component will return to its faulted state almost immediately and its Service Action Required LED will be re-illuminated.
Open an ILOM CLI window.
Use the cd command to move to the component that has failed.
Use the show command to confirm the component's fault_state.
The following example checks the status of rear fan module 0 (/CH/RFM0) and shows that its fault_state = Faulted.
-> cd /CH/RFM0
/CH/RFM0 -> show
 /CH/RFM0
    Targets:
        SERVICE
        ACT
        FAN1_OK
        FAN2_OK
        FAN1_SPEED
        FAN2_SPEED
    Properties:
        type = Rear Fan FRU
        fault_state = Faulted
        clear_fault_action = (none)
        prepare_to_remove_status = NotReady
        prepare_to_remove_action = (none)
        return_to_service_action = (none)
->
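If you capture the show output in a script, the fault_state can be extracted with a short parser. The following Python sketch assumes the captured text lists one "name = value" property per line under a "Properties:" heading, as the ILOM CLI prints it. The function name and the embedded sample text are illustrative only.

```python
# Sample text modeled on the show output for /CH/RFM0 in this section.
SAMPLE_SHOW_OUTPUT = """\
 /CH/RFM0
    Targets:
        SERVICE
        ACT
    Properties:
        type = Rear Fan FRU
        fault_state = Faulted
        clear_fault_action = (none)
->
"""


def parse_properties(show_output):
    """Collect the 'name = value' pairs listed under the Properties: heading."""
    properties = {}
    in_properties = False
    for raw_line in show_output.splitlines():
        line = raw_line.strip()
        if line.endswith(":"):
            # A heading such as "Targets:" or "Properties:" starts a new section.
            in_properties = line == "Properties:"
            continue
        if in_properties and " = " in line:
            name, value = line.split(" = ", 1)
            properties[name.strip()] = value.strip()
    return properties
```

Calling parse_properties(SAMPLE_SHOW_OUTPUT)["fault_state"] on this sample gives the string "Faulted".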
Issue the set clear_fault_action=true command for the faulted component.
The following command clears the fault state for rear fan module 0 (/CH/RFM0):
set /CH/RFM0 clear_fault_action=true
You can use the show command again to confirm that the fault_state has changed to OK. However, if the fault condition persists, the component will return to its faulted state almost immediately.
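Because a persistent fault condition is declared again shortly after it is cleared, a single check of fault_state may not be conclusive. The following Python sketch rechecks the state several times before treating the fault as cleared. The read_fault_state callable is a hypothetical stand-in for however you obtain the current fault_state (for example, by running show and reading the property). The number of checks and the interval are arbitrary assumptions, not values from ILOM.

```python
import time


def fault_stays_cleared(read_fault_state, checks=3, interval_s=5.0, sleep=time.sleep):
    """Return True only if every check reports fault_state as "OK".

    read_fault_state is a caller-supplied function (hypothetical here) that
    returns the component's current fault_state as a string.
    """
    for attempt in range(checks):
        if read_fault_state() != "OK":
            # The fault was declared again, so the condition still exists.
            return False
        if attempt < checks - 1:
            sleep(interval_s)
    return True
```

If the function returns False, the underlying hardware problem is still present and the component should be replaced rather than cleared again.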