System Diagnostics and Troubleshooting Scenarios

Run diagnostic tests to verify the operation of a server when it is newly installed, when it is upgraded or modified, and when it fails. The following sections list the common testing scenarios:

New System

  1. Before installing options into a new system, run these diagnostic tests in the following order:
    • HWdiag

    • UEFIdiag

    Tests failed: If the tests identify a server failure:

    • Check the server Product Notes or Release Notes for the product or option for any known conditions that might cause a diagnostic test to fail.

    • If the solution to the problem is not in the Product Notes or Release Notes, assume that the server was damaged in shipment. Terminate the installation process, and notify Oracle Service personnel. This ensures that the server is covered under warranty.

    If you experience a network connectivity problem when placing a server into service for the first time, ensure that the network access point for the server is activated.

    Tests passed: If the server passes the tests and has no optional components to install, you can place the server into service.

    If the server passes the tests and you have optional components to install, install the options and re-run the tests.

    • If the server passes the tests with the new components installed, you can place the server into service.

    • If the diagnostic tests reveal that a newly installed component is faulty, remove the component and return it for replacement.
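
The decision flow above can be sketched roughly as follows. This is an illustrative Python sketch, not an Oracle tool: the function name `new_system_action` and its parameters are hypothetical, and the pass/fail result is assumed to be entered by the operator after running HWdiag and UEFIdiag manually.

```python
def new_system_action(tests_passed: bool,
                      options_to_install: bool = False,
                      options_installed: bool = False,
                      known_issue_in_notes: bool = False) -> str:
    """Return the next step for a newly installed server (illustrative only)."""
    if tests_passed:
        if options_to_install and not options_installed:
            return "install options and re-run tests"
        return "place server into service"

    # Tests failed from here on.
    if options_installed:
        # Assumption: a failure that appears after adding options points
        # at the newly installed component.
        return "remove faulty component and return it for replacement"
    if known_issue_in_notes:
        # Assumption: when the notes describe the condition, follow them.
        return "follow Product Notes or Release Notes guidance"
    return "terminate installation and notify Oracle Service"


if __name__ == "__main__":
    print(new_system_action(tests_passed=True, options_to_install=True))
    print(new_system_action(tests_passed=False))
```

Returning a plain string keeps the sketch easy to read; a real checklist tool would more likely return a structured value.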

Upgraded System

  1. Before installing a server upgrade (memory, hard disk drives, I/O cards, or power supply) to an in-service server, take the server out of service and run these diagnostic tests in the following order:
    • HWdiag

    • UEFIdiag

  2. Install the server upgrade.
  3. In Oracle ILOM Health Status, view Open Problems to identify and fix any errors or faults.
  4. Run the HWdiag and UEFIdiag diagnostic tests again.
    • Tests failed: If the diagnostic tests fail, either one of the installed options is faulty or the server was damaged when you installed the option. In either case, remove and replace the faulty component, run the diagnostic tests again to confirm that the problem has been corrected, and then place the server into service.

    • Tests passed: Place the server into service.

    Note:

    If the failed component is a non-replaceable component on the server motherboard, return the motherboard to Oracle for repair, or order a replacement motherboard and have it installed in the field by authorized Oracle Service personnel.
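
As a rough illustration of the ordering in this procedure, the sketch below builds the upgrade checklist and maps the post-upgrade test result to a next step. It is a hypothetical Python helper, not part of any Oracle software: the names `upgrade_checklist` and `after_upgrade_action` are assumptions, and no diagnostic is actually invoked.

```python
# Diagnostic tests in the order given by the procedure.
DIAGNOSTICS = ("HWdiag", "UEFIdiag")


def upgrade_checklist(upgrade: str) -> list[str]:
    """Return the ordered steps for upgrading an in-service server."""
    steps = ["Take the server out of service"]
    steps += [f"Run {name}" for name in DIAGNOSTICS]
    steps.append(f"Install upgrade: {upgrade}")
    steps.append("View Open Problems in Oracle ILOM Health Status and fix any faults")
    steps += [f"Run {name} again" for name in DIAGNOSTICS]
    return steps


def after_upgrade_action(tests_passed: bool,
                         non_replaceable_motherboard_part: bool = False) -> str:
    """Map the post-upgrade test result to the next step (illustrative only)."""
    if tests_passed:
        return "place server into service"
    if non_replaceable_motherboard_part:
        return "return motherboard to Oracle or order a replacement"
    return "replace faulty component and re-run diagnostics"


if __name__ == "__main__":
    for number, step in enumerate(upgrade_checklist("memory"), start=1):
        print(f"{number}. {step}")
```

Keeping the test names in a single tuple means the pre-upgrade and post-upgrade runs always use the same order.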

Production System

If the server has been operating problem-free for a long time, and then the Fault-Service Required indicator LED on the server illuminates, do the following:

  1. Check Oracle ILOM for Open Problems. See Administering Open Problems.
  2. If you find an open problem, take the appropriate action to repair or replace the faulty component.

    Oracle ILOM typically clears open problems after you repair or replace the faulty component.

  3. If the problem is not resolved, disconnect the AC power cords from the server and press the Fault Remind button on the motherboard. The button illuminates any internal Fault LEDs, which indicate the faulty CRU or FRU.
  4. If the failed component is a customer-replaceable unit (CRU), replace it. For x86 servers, CRUs are defined in the server Service Manual and the Oracle System Handbook.

    You can access the Oracle System Handbook from My Oracle Support. You must have an account to access the handbook.

  5. If the failed component is a field-replaceable unit (FRU), initiate a service request with Oracle Service. FRUs are defined in the server Service Manual and the Oracle System Handbook.

    Note:

    If the failed component is a non-replaceable component on the server motherboard, return the motherboard to Oracle for repair, or order a replacement motherboard and have it installed in the field by authorized Oracle Service personnel.
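
The CRU, FRU, and motherboard cases above amount to a simple lookup from component class to service action. The following Python sketch shows one way to express that mapping; the `UnitType` enum, the `ACTIONS` table, and the `action_for` function are hypothetical names used for illustration, and the operator is assumed to classify the component from the Service Manual or the Oracle System Handbook.

```python
from enum import Enum


class UnitType(Enum):
    CRU = "customer-replaceable unit"
    FRU = "field-replaceable unit"
    MOTHERBOARD_PART = "non-replaceable motherboard component"


# Service action for each component class, paraphrased from the steps above.
ACTIONS = {
    UnitType.CRU: "replace the component yourself",
    UnitType.FRU: "initiate a service request with Oracle Service",
    UnitType.MOTHERBOARD_PART: (
        "return the motherboard to Oracle for repair, or order a "
        "replacement installed by authorized Oracle Service personnel"
    ),
}


def action_for(unit: UnitType) -> str:
    """Return the service action for a failed component class."""
    return ACTIONS[unit]


if __name__ == "__main__":
    for unit in UnitType:
        print(f"{unit.value}: {action_for(unit)}")
```

Using an enum rather than free-form strings makes an unclassified component fail loudly instead of silently mapping to no action.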