Clear Faults

When a fault indicates a hardware failure, the recommended method for clearing the fault is to replace the failed component. Fault management then detects that the fault has been cleared, extinguishes the Service Action Required LEDs, and updates the ILOM management interfaces to reflect the cleared fault.

Some faults span multiple components. You can track and repair these faults together by checking which faults have matching UUIDs. For example, DIMMs are faulted in pairs, and each DIMM in the pair has a 50 percent probability of being the faulty one. If you clear the fault on one DIMM, the fault on the other affected DIMM is also cleared. In general, clearing a fault on one component may cause other faults in the system to be cleared automatically.
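To make the UUID matching concrete, here is a minimal Python sketch that groups fault records by UUID and reports which components would be cleared together. The record format, the component names DIMM-A and DIMM-B, and the UUID strings are hypothetical; real data would come from the ILOM fault and event information, not a hard-coded list.

```python
from collections import defaultdict

# Hypothetical fault records: (component, fault UUID).
# DIMM-A/DIMM-B and the UUID strings are invented for illustration.
FAULTS = [
    ("DIMM-A", "uuid-1"),
    ("DIMM-B", "uuid-1"),
    ("/CH/RFM0", "uuid-2"),
]


def group_by_uuid(faults):
    """Map each fault UUID to the set of components it affects."""
    groups = defaultdict(set)
    for component, uuid in faults:
        groups[uuid].add(component)
    return dict(groups)


def cleared_together(faults, component):
    """Return every component whose fault shares a UUID with `component`."""
    affected = set()
    for members in group_by_uuid(faults).values():
        if component in members:
            affected |= members
    return affected


print(sorted(cleared_together(FAULTS, "DIMM-A")))  # ['DIMM-A', 'DIMM-B']
```

A component with no recorded fault yields an empty set, and a fault that affects only one component (such as the fan module here) yields just that component.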

When a fault is cleared, the following message is logged:

Fault <class> on component <FRU NAC name> cleared

When all faults on a component are cleared, the following message is logged:

Component <FRU NAC name> repaired
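If you collect these messages from the event log, a script can recognize them with regular expressions. The following Python sketch assumes each message appears on its own line in exactly the form shown above, and that the fault class and FRU NAC name contain no whitespace. The function names and the fault class fault.example.fan are hypothetical.

```python
import re

# Patterns for the two log messages. Assumption: the fault class and the
# FRU NAC name contain no whitespace.
FAULT_CLEARED = re.compile(r"^Fault (?P<fault_class>\S+) on component (?P<fru>\S+) cleared$")
COMPONENT_REPAIRED = re.compile(r"^Component (?P<fru>\S+) repaired$")


def classify(message):
    """Return (kind, fru, fault_class) for a clear/repair message, else None."""
    message = message.strip()
    match = FAULT_CLEARED.match(message)
    if match:
        return ("fault_cleared", match.group("fru"), match.group("fault_class"))
    match = COMPONENT_REPAIRED.match(message)
    if match:
        return ("component_repaired", match.group("fru"), None)
    return None


def repaired_components(messages):
    """Collect the FRUs that logged 'Component <FRU> repaired'."""
    results = (classify(m) for m in messages)
    return {r[1] for r in results if r and r[0] == "component_repaired"}


# Hypothetical log lines; "fault.example.fan" is an invented fault class.
log = [
    "Fault fault.example.fan on component /CH/RFM0 cleared",
    "Component /CH/RFM0 repaired",
]
print(repaired_components(log))  # {'/CH/RFM0'}
```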

For most faults, you can use the management interfaces to manually issue a command to clear a component's fault state. However, if you do so and the component sensors determine that the fault condition still exists, the fault will immediately be declared again.

Replace Faulted Component

  1. Determine which system component has experienced a hardware failure.

    Look at the Service Action Required LEDs, the component status (from the ILOM web interface or CLI), and/or the event log to get information about the component failure. (See Determine Whether Hardware Failed.)

  2. Remove and replace the failed component.

    If necessary, refer to the hot-plug instructions for the failed component to correctly and safely remove and replace it.

  3. Monitor the component LEDs and/or the management interfaces to confirm that fault management has cleared the fault.

Clear Fault From the Web Interface

Caution

Manually clearing a fault condition from the ILOM web interface does not correct an underlying hardware failure.

  1. Log in to the ILOM web interface.

  2. Select the System Information tab.

  3. Select the Components tab.

    The Components page appears with the Components Management Status table displayed.

  4. Locate the component whose Fault Status is listed as Faulted.

    Note that when there is a faulted component, the chassis's Fault Status is also listed as Faulted.

  5. Select the failed component.

  6. Select Clear Fault from the Actions drop-down box.

    The component's Fault Status will be updated to OK, and its Service Action Required LED will be extinguished. However, if the fault condition persists, the component will return to its faulted state almost immediately and its Service Action Required LED will be re-illuminated.

Clear Fault From the CLI

Caution

Clearing a fault condition from the CLI does not correct an underlying hardware failure.

  1. Open an ILOM CLI window.

  2. Use the cd command to move to the component that has failed.

  3. Use the show command to confirm the component's fault_state.

    The following example checks the status of rear fan module 0 (/CH/RFM0) and shows that its fault_state = Faulted.

    -> cd /CH/RFM0
    /CH/RFM0

    -> show

    /CH/RFM0
    Targets:
        SERVICE
        ACT
        FAN1_OK
        FAN2_OK
        FAN1_SPEED
        FAN2_SPEED

    Properties:
        type = Rear Fan FRU
        fault_state = Faulted
        clear_fault_action = (none)
        prepare_to_remove_status = NotReady
        prepare_to_remove_action = (none)
        return_to_service_action = (none)

    ->

  4. Issue the set clear_fault_action=true command for the faulted component.

    The following command clears the fault state for rear fan module 0 (/CH/RFM0):

    set /CH/RFM0 clear_fault_action=true

    You can use the show command again to confirm that the fault_state has changed to OK. However, if the fault condition persists, the component will return to its faulted state almost immediately.
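Steps 3 and 4 can be scripted if you capture the text of the CLI session yourself (for example, over SSH). The Python sketch below parses the Properties section of show output in the format of the rear fan module example and builds the set ... clear_fault_action=true command only when fault_state is Faulted. The function names are hypothetical, the sample output is abridged, and capturing and sending the CLI text is outside the sketch's scope.

```python
# Abridged `show` output for /CH/RFM0, based on the example in this section.
SAMPLE_SHOW = """\
/CH/RFM0
Targets:
    SERVICE
    ACT
Properties:
    type = Rear Fan FRU
    fault_state = Faulted
    clear_fault_action = (none)
->
"""


def parse_properties(show_output):
    """Return the name = value pairs from the Properties section of `show` output."""
    props = {}
    in_properties = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.endswith(":") and "=" not in line:
            # Section header such as "Targets:" or "Properties:".
            in_properties = line == "Properties:"
            continue
        if in_properties and "=" in line:
            name, _, value = line.partition("=")
            props[name.strip()] = value.strip()
    return props


def clear_fault_command(target, show_output):
    """Return the clear-fault command for `target`, or None if it is not faulted."""
    if parse_properties(show_output).get("fault_state") != "Faulted":
        return None
    return f"set {target} clear_fault_action=true"


print(clear_fault_command("/CH/RFM0", SAMPLE_SHOW))  # set /CH/RFM0 clear_fault_action=true
```

After issuing the command, run show again and pass the new output to parse_properties. If fault_state is Faulted again, the fault condition persists and the component should be replaced.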
