Oracle® Exadata Database Server X8-8 Service Manual

Updated: June 2020

Troubleshooting Server Hardware Faults

When a server hardware fault event occurs, the system lights the Fault-Service Required LED and captures the event in the Oracle ILOM event log. If you set up notifications through Oracle ILOM, you also receive an alert through the notification method you chose. When you become aware of a hardware fault, address it immediately.

To investigate a hardware fault, see the following:

Basic Troubleshooting Steps

When a server encounters a fault, the fault is recorded in a common fault database. The server then reports the fault in one of several ways, depending on the type and severity of the fault.

Use the following process to address a suspected hardware fault:

  1. Review the Oracle Server X8-8 Product Notes for late-breaking server information and hardware-related issues.

    Refer to Oracle Server X8-8 Product Notes at: https://www.oracle.com/goto/x8-8/docs

  2. Investigate the hardware fault to identify the hardware issue.

    Select one of the following methods to identify the failed component and the server subsystem that contains the fault.

    • Log in to Oracle ILOM. See Identify Hardware Faults (Oracle ILOM).

    • Log in to the Oracle ILOM service processor, start the Oracle ILOM Fault Management Shell, and issue the fmadm faulty command.

      For more information about how to use the Oracle ILOM Fault Management Shell and supported commands, see the Oracle ILOM User's Guide for System Monitoring and Diagnostics Firmware Release 4.0.x in the Oracle Integrated Lights Out Manager (ILOM) 4.0 Documentation Library at https://www.oracle.com/goto/ilom/docs.

    If you determine that the hardware fault requires service, continue to the next step.

  3. Prepare the server for service.

    See Preparing for Service. You can use Oracle ILOM to power off the server, activate the Locate Button/LED, and take the server offline.

    Obtain physical access to the server. Before servicing the server, prepare the workspace to ensure ESD protection for the server and its components.

  4. Service replaceable server components.

    For FRU and CRU removal, installation, and replacement procedures, see Servicing Components in this document.


    Note - A component designated as a FRU must be replaced by Oracle Service personnel. Contact Oracle Service.

  5. Return the server to service.

    See Returning the Server to Operation.

  6. Clear the fault in Oracle ILOM (optional).

    Most components include a FRU ID that allows the fault to clear automatically after the component is replaced. Depending on the component, you might need to clear the fault manually in Oracle ILOM.

    See Clear Hardware Fault Messages (Oracle ILOM).
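The decision points in the procedure above can be sketched as a small Python checklist generator. This is a hypothetical planning aid, not an Oracle tool: the Fault record, its fields, the service_plan function, and the component name PS0 are assumptions for illustration, and the code does not communicate with Oracle ILOM. It encodes three rules stated in the steps: stop when the fault does not require service, route FRUs to Oracle Service, and add the manual fault-clearing step only when the component does not clear its fault automatically.

```python
from dataclasses import dataclass
from enum import Enum


class PartType(Enum):
    CRU = "customer-replaceable unit"
    FRU = "field-replaceable unit"


@dataclass
class Fault:
    # Hypothetical record of a fault identified in step 2 (for example,
    # after reviewing fmadm faulty output); fields are illustrative only.
    component: str
    part_type: PartType
    requires_service: bool
    auto_clears: bool  # True if the FRU ID lets the fault clear on replacement


def service_plan(fault: Fault) -> list[str]:
    """Return the ordered troubleshooting steps that apply to one fault."""
    steps = [
        "1. Review the Oracle Server X8-8 Product Notes",
        f"2. Investigate the fault: {fault.component}",
    ]
    if not fault.requires_service:
        # Step 2 concluded that no service is needed, so the procedure ends.
        return steps
    steps.append("3. Prepare the server for service")
    if fault.part_type is PartType.FRU:
        # FRUs must be replaced by Oracle Service personnel.
        steps.append("4. Contact Oracle Service to replace the FRU")
    else:
        steps.append(f"4. Replace the CRU: {fault.component}")
    steps.append("5. Return the server to operation")
    if not fault.auto_clears:
        # Only components that do not clear automatically need step 6.
        steps.append("6. Clear the fault in Oracle ILOM")
    return steps


if __name__ == "__main__":
    # Hypothetical CRU fault whose FRU ID clears it automatically.
    example = Fault("PS0", PartType.CRU, requires_service=True, auto_clears=True)
    for step in service_plan(example):
        print(step)
```

Keeping the rules in one function makes the optional step 6 and the FRU escalation explicit; the actual identification and clearing of faults still happens in Oracle ILOM.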