Preliminary Troubleshooting Procedures

This section describes preliminary troubleshooting actions that can help you identify problems quickly and prepare for more extensive troubleshooting procedures.

Check for Known Issues

Product Notes and Release Notes documents provide information about late-breaking issues or problems. They include a description of each issue or problem and methods to repair or work around it.

  1. Check the server Product Notes or software Release Notes for known issues related to the problem you are trying to fix.

    You can often find the problem and its solution in the Product Notes or the Release Notes.

    Product Notes and Release Notes sometimes contain information about the diagnostic tools themselves. For example, they might say that under certain circumstances, a specific diagnostic test failure can be ignored.

  2. If you find the problem described in the document, follow the instructions to repair it or work around it.

    Often, following the instructions in the Product Notes or the Release Notes is the first and last step in troubleshooting a problem with your server.

Gather Information for Service

Next, gather information from the service call or the on-site personnel.

  1. Collect information about the following items:
    • Events that occurred before the failure

    • Whether any hardware or software was modified or installed

    • Whether the server was recently installed or moved

    • How long the server exhibited symptoms

    • The duration or frequency of the problem

  2. Document the server settings before you make any changes.

    If possible, make one change at a time to isolate potential problems. In this way, you can maintain a controlled environment and reduce the scope of troubleshooting.

  3. Record the results of any change you make. Include any errors or informational messages.
  4. Check for potential device conflicts, especially if you added a new device.
  5. Check for version dependencies, especially with third-party software.

    For more information, see Information to Gather for Troubleshooting.
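
The items above can be kept in a simple structured record so that each change and its result are logged in order. The following is a minimal sketch in Python. The class, field, and method names (ServiceRecord, ChangeEntry, record_change, missing_items) are illustrative assumptions and are not part of any Oracle tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeEntry:
    """One change made during troubleshooting and its result (hypothetical)."""
    description: str
    result: str
    messages: List[str] = field(default_factory=list)


@dataclass
class ServiceRecord:
    """Information gathered before and during troubleshooting (hypothetical)."""
    events_before_failure: str = ""
    hardware_or_software_changed: str = ""
    recently_installed_or_moved: bool = False
    symptom_duration: str = ""
    problem_frequency: str = ""
    settings_before_changes: dict = field(default_factory=dict)
    changes: List[ChangeEntry] = field(default_factory=list)

    def record_change(self, description, result, messages=None):
        """Log one change at a time, with any error or informational messages."""
        entry = ChangeEntry(description, result, list(messages or []))
        self.changes.append(entry)
        return entry

    def missing_items(self):
        """Return the names of the basic items that are still empty."""
        required = ("events_before_failure", "symptom_duration", "problem_frequency")
        return [name for name in required if not getattr(self, name)]
```

Logging each change as a separate entry mirrors the advice to make one change at a time, so a new symptom can be traced to the change that preceded it.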

Troubleshoot Power Problems

If the server does not power on:

  1. Check that AC power cords are attached firmly to the server power supplies and to the AC sources.
  2. Check the power supply (PS) Fault LED on each power supply. If a PS Fault LED is lit, that power supply is in a faulted state.
  3. Check that the System OK LED on the server front panel is steady on, which indicates the server is in Main power mode. If it is blinking, the server is in Standby power mode.

    For instructions to bring the server to Main power mode, refer to the server Installation Guide.

  4. Check the system for faults using Oracle ILOM.
  5. Run the hwdiag cpld vr_check test and inspect the output for errors.

    This test checks the complex programmable logic device (CPLD). For information about the HWdiag utility, see Using the Oracle ILOM Diag Shell.
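
The order of the checks above can be summarized as a small decision function. This is only a sketch of the procedure as written, not a diagnostic tool: the names Led, power_mode, and next_power_check are hypothetical, and the observations are entered by hand rather than read from the server.

```python
from enum import Enum


class Led(Enum):
    OFF = "off"
    BLINKING = "blinking"
    STEADY_ON = "steady on"


def power_mode(system_ok_led):
    """Map the System OK LED state to the power mode described above."""
    if system_ok_led is Led.STEADY_ON:
        return "Main"
    if system_ok_led is Led.BLINKING:
        return "Standby"
    return "unknown"  # an unlit LED is not covered by the steps above


def next_power_check(cords_secure, ps_fault_led_lit, system_ok_led):
    """Return the next action, following the order of the steps above."""
    if not cords_secure:
        return "Reattach the AC power cords to the power supplies and AC sources."
    if ps_fault_led_lit:
        return "A power supply is in a faulted state."
    if power_mode(system_ok_led) == "Standby":
        return "Bring the server to Main power mode (see the Installation Guide)."
    # Remaining steps: Oracle ILOM fault check, then the HWdiag CPLD test.
    return "Check for faults using Oracle ILOM, then run the HWdiag CPLD test."
```

For example, next_power_check(True, False, Led.BLINKING) returns the instruction to bring the server to Main power mode.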

Inspect the External System

  1. Inspect the external status indicator LEDs, which can indicate component malfunction.

    For the LED locations and descriptions of their behavior, refer to the server Service Manual.

  2. Verify that nothing in the server environment is blocking airflow or making contact that could cause an electrical short.
  3. If the problem is not evident, continue with Inspect the Internal System.

Inspect the Internal System

  1. Choose a method for shutting down the server from Main power mode to Standby power mode.
    • Graceful shutdown: Press and release the On/Standby button on the front panel. This causes operating systems that support Advanced Configuration and Power Interface (ACPI) to perform an orderly shutdown. Servers that are not running an ACPI-enabled operating system shut down to Standby power mode immediately.

    • Emergency shutdown: Press and hold the On/Standby button for five seconds to force Main power off and enter Standby power mode.

      When the system is in Standby power mode, the System OK LED blinks.

      Caution:

      When the server is in Standby power mode, power is still directed to the service processor board and the power supply fans. To remove power completely, disconnect the AC power cords from the server back panel.

  2. Remove the chassis top cover to view the server internal components.

    Refer to the server Service Manual for details.

  3. Inspect the internal status indicator LEDs, as described in the Service Manual.
  4. Verify that there are no loose or improperly seated components.
  5. Verify that all cables inside the system are firmly and correctly attached to their appropriate connectors.
  6. Verify that any after-factory components are qualified and supported.

    For a list of supported PCIe cards and memory modules (DIMMs), refer to the server Service Manual and Product Notes.

  7. Check that the installed DIMMs comply with the supported DIMM population rules and configurations.

    Refer to the server Service Manual for information about DIMMs.

  8. Replace any faulty component.

    Refer to the server Service Manual for component removal and replacement procedures.

  9. To return the server to Main power mode (all components powered on), press and release the On/Standby button on the server front panel.

    When Main power is applied to the full server, the System OK LED next to the On/Standby button blinks intermittently until BIOS POST finishes. The LED is then steady on.

  10. If the problem with the server is not evident, view the BIOS event logs during system startup.
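
The two shutdown methods in step 1 differ only in how long the On/Standby button is held and whether the operating system is ACPI-enabled. The sketch below models that behavior for a server in Main power mode. The function name and return strings are hypothetical, and treating any press of five seconds or longer as an emergency shutdown is an assumption based on the wording of step 1.

```python
EMERGENCY_HOLD_SECONDS = 5  # hold time given in the emergency shutdown step


def shutdown_outcome(held_seconds, acpi_enabled_os):
    """Describe the result of pressing On/Standby while in Main power mode."""
    if held_seconds >= EMERGENCY_HOLD_SECONDS:
        # Assumption: a hold of five seconds or more forces Main power off.
        return "emergency: Main power forced off, server enters Standby power mode"
    if acpi_enabled_os:
        return "graceful: orderly operating system shutdown, then Standby power mode"
    return "immediate: server shuts down to Standby power mode immediately"
```

In every case the server ends in Standby power mode, so the service processor board and power supply fans remain powered until the AC power cords are disconnected.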