4 Oracle ILOM Issues

This section describes important operating notes and known Oracle ILOM issues for Oracle Server X8-8.

For updated information about Oracle ILOM, refer to Oracle Integrated Lights Out Manager (ILOM) documentation at Servers Documentation - Systems Management.

DIMM Fault SPX86A-800A-95 - Memtest Single Symbol Test Failed - ILOM 5.1.0.21

Bug ID: 34325538, 34445460

Issue: The following DIMM fault message is seen:

SPX86A-800A-95 - Memtest Single Symbol Test Failed (Doc ID 2317012.1)

SPX86A-800A-95 indicates that the Oracle ILOM fault manager received an error report stating that a memory DIMM produced correctable errors (CE) during both passes of the memory test.

If the server logs multiple runtime memory fault events, the increase in runtime error messages might be related to DIMM memory testing conditions. The Adaptive Double DRAM Device Correction (ADDDC) and Post Package Repair (PPR) features are enabled in the server firmware. ADDDC Sparing is a RAS feature to test memory reliability. The Advanced Memory Test (AMT) in the Memory Reference Code (MRC) can fail a DIMM on a single symbol error, after which PPR attempts to repair the defect.

When enabled, PPR might be able to repair affected DRAM areas on a DIMM. PPR runs when ADDDC was activated before the reboot or when memory tests failed during MRC initialization. That is, if the server encounters a memory fault event during MRC initialization, or a correctable memory event at runtime that triggers ADDDC for the first time, PPR is activated after the next system initialization or reboot and attempts to repair the DIMM.
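The trigger conditions described above can be summarized as a small decision function. This is only an illustrative sketch: the function and parameter names are hypothetical and do not correspond to any Oracle ILOM or firmware interface.

```python
def ppr_attempted_on_next_boot(ppr_enabled: bool,
                               adddc_activated_before_reboot: bool,
                               mrc_memory_test_failed: bool) -> bool:
    """Return True if, per the description above, PPR would attempt a repair
    during the next system initialization.

    All three inputs are hypothetical flags used only for illustration.
    """
    if not ppr_enabled:
        # PPR can only act when the feature is enabled in firmware.
        return False
    # Either condition is sufficient: ADDDC was activated at runtime before
    # the reboot, or memory tests failed during MRC initialization.
    return adddc_activated_before_reboot or mrc_memory_test_failed
```

For example, a correctable runtime event that activated ADDDC, followed by a reboot, corresponds to `ppr_attempted_on_next_boot(True, True, False)`.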

Note:

DIMMs from certain manufacturers might exhibit different memory failure patterns, and might not support soft PPR configuration (which enables a temporary repair attempt).

Affected Hardware: Oracle Server X9-2, Oracle Server X9-2L, Oracle Server X8-8, Oracle Server X8-2, Oracle Server X8-2L, Oracle Server X7-8, Oracle Server X7-2, Oracle Server X7-2L

Note:

Not all server systems enable ADDDC.

Affected Software:

The following x86 server software and Oracle ILOM releases, or later releases, support Post Package Repair (PPR).

  • Oracle Server X9-2 SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X9-2L SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X8-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X8-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X8-2L SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X7-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-2L SW3.2.2 ILOM 5.0.2.24
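To check whether a given server is already at a release that supports PPR, the list above can be encoded as a lookup table. The following is a minimal sketch. It assumes that Oracle ILOM version strings can be compared numerically field by field, and the names `MIN_PPR_ILOM` and `supports_ppr` are hypothetical.

```python
# Minimum Oracle ILOM release that supports PPR, copied from the list above.
MIN_PPR_ILOM = {
    "X9-2": "5.0.2.21",
    "X9-2L": "5.0.2.21",
    "X8-8": "5.0.2.22",
    "X8-2": "5.0.2.24",
    "X8-2L": "5.0.2.24",
    "X7-8": "5.0.2.22",
    "X7-2": "5.0.2.24",
    "X7-2L": "5.0.2.24",
}


def parse_version(version: str) -> tuple:
    """Turn '5.0.2.22' into (5, 0, 2, 22) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supports_ppr(model: str, ilom_version: str) -> bool:
    """Return True if ilom_version is at or above the listed minimum.

    Assumption: dotted ILOM versions order field by field.
    """
    minimum = MIN_PPR_ILOM[model]  # raises KeyError for unlisted models
    return parse_version(ilom_version) >= parse_version(minimum)
```

For example, `supports_ppr("X8-8", "5.1.0.21")` evaluates to `True`, while `supports_ppr("X8-2", "5.0.2.22")` evaluates to `False`.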

Workaround: Some DIMM faults are recoverable errors if PPR is enabled on the server. If multiple DIMM memory errors are detected on the server:

  1. Log in to the Oracle ILOM command-line interface (CLI) using an account with admin (a) role privileges.
  2. From the Oracle ILOM CLI, launch the Oracle ILOM Fault Management Shell.
    -> start /SP/faultmgmt/shell 
    Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
  3. Display information about faulted server components using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm faulty
  4. Manually clear server faults using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm repair <FRU>
  5. Exit the Oracle ILOM Fault Management Shell and return to the Oracle ILOM CLI command prompt.
    faultmgmtsp> exit
  6. Upgrade the server to the latest ILOM/UEFI firmware release that supports PPR. The system resets during the firmware upgrade and runs memory tests again.
  7. If memory-related fault events continue to be logged, replace the faulted DIMMs in the server. Log an Oracle Support case through the support portal for further assistance.

Oracle Service personnel can find more information about the diagnosis and triage of DIMM fault failures on x86 servers at My Oracle Support, in Knowledge Article Doc ID 2698328.1. If multiple DIMM fault messages occur simultaneously on a server, Oracle Service personnel can refer to Knowledge Article Doc IDs 1603015.1 (KA single symbol error) and 2317012.1 (KA multiple symbol errors).

Note:

Adaptive Double DRAM Device Correction (ADDDC) is also referred to as Adaptive Device Correction (ADC) in some Oracle documents.

System Might Not Power On with POST After Oracle ILOM Software Upgrade or Downgrade

Important Operating Note

Issue: On rare occasions, after an Oracle ILOM software upgrade or downgrade, Oracle Server X8-8 might not power on, or the Service Processor POST process might stop. You can identify the issue when powering on the server with a chassis power button fails, the Oracle ILOM power-on user interfaces fail, or the HOST_AUTO_POWER_ON policy fails.

Note:

Update your system to the latest Software Release before you use the system. Downgrade software only when required for testing.

Affected Software: Software Release 1.0.0

Affected Hardware: Oracle Server X8-8

Workaround: There are multiple ways to avoid or work around power cycle issues. Choose from the following methods.

  • To avoid power cycle issues when upgrading or downgrading firmware, do not preserve the BIOS configuration.

  • You must update your server firmware and software as soon as possible after a new Software Release becomes available. Software releases often include bug fixes, and updating your server ensures that your server has the latest firmware and software. These updates will increase your system performance, security, and stability.

  • In general, to minimize the chances of encountering power cycle issues, use the latest software update.

  • Apply software updates in stepwise increments rather than skipping releases. For example: from Software Release 1.0 to Software Release 1.1, from Software Release 1.1 to Software Release 1.2, and from Software Release 1.2 to Software Release 1.3.

  • If, on rare occasions, Oracle Server X8-8 does not power on, first check for faults, then view and clear them using the Oracle ILOM FMA (Fault Management Architecture) interfaces. For information about Oracle ILOM FMA, refer to Oracle Integrated Lights Out Manager (ILOM) 4.0 Information Library at https://www.oracle.com/goto/ilom/docs .

  • When a power-off condition occurs, press the System A or System B server power button on the host system, or use Oracle ILOM CLI commands to power on the Oracle Server X8-8 host system from a local or remote location. Refer to Power On the Server in Oracle Server X8-8 Service Manual.
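The stepwise-upgrade method in the list above can be planned ahead of time. The following sketch computes the ordered sequence of releases to install between a current and a target release. The function name and the list of available releases are hypothetical. Consult the actual Software Release list for your server.

```python
def stepwise_path(current: str, target: str, available: list) -> list:
    """Return the releases to install, in order, to move from current to
    target one step at a time without skipping an available release."""

    def key(release: str) -> tuple:
        # '1.2' -> (1, 2) so that releases sort numerically.
        return tuple(int(part) for part in release.split("."))

    ordered = sorted(available, key=key)
    return [r for r in ordered if key(current) < key(r) <= key(target)]


# Hypothetical release list matching the example in the text.
print(stepwise_path("1.0", "1.3", ["1.0", "1.1", "1.2", "1.3"]))
# prints ['1.1', '1.2', '1.3']
```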

Recovery: If a system encounters this issue during a firmware upgrade or downgrade, do the following to recover:

From Oracle ILOM CLI:

  1. Power off the host.

    -> stop -script -force /System

  2. Wait 30 seconds.

  3. Power on the host.

    -> start -script /System
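If this CLI recovery is scripted, the sequence above could be wrapped as follows. This is a sketch only: `run` is a hypothetical callable that sends one command line to an already authenticated Oracle ILOM CLI session (for example, over SSH). It is not part of any Oracle tool. Only the two command strings and the 30-second wait come from the procedure above.

```python
import time
from typing import Callable


def power_cycle_host(run: Callable[[str], None],
                     wait_seconds: float = 30,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Force the host off, wait, then power it back on.

    run   -- hypothetical function that submits one command to the ILOM CLI
    sleep -- injectable so the wait can be skipped in tests
    """
    run("stop -script -force /System")   # step 1: power off the host
    sleep(wait_seconds)                  # step 2: wait 30 seconds
    run("start -script /System")         # step 3: power on the host
```

Passing `print` as `run` prints the commands in order after the wait, which is a convenient dry run before connecting a real session.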

From Oracle ILOM web interface Summary screen:

  1. Power off the host.

  2. Wait 30 seconds.

  3. Power on the host.

  4. The Administrator can safely clear the fault after recovery.

    For details, refer to "Managing Server Hardware Faults Using the Oracle ILOM Fault Management Shell" in the Oracle Server X8-8 Service Manual.

If a system encounters this issue during a firmware upgrade or downgrade, and still does not power on after resolving and clearing hardware faults, or does not complete the Service Processor POST process, perform the following steps to power off the host and clear CMOS from the Diag Shell.

  1. Power off the host.

    Refer to the Oracle Server X8-8 Installation Guide.

  2. Clear system CMOS.

    1. If powered on, force host power off: -> stop -script -force /System

    2. Start the Diag Shell: -> start -script /SP/diag/shell

    3. From the Diag Shell, clear the system CMOS: diag> hwdiag system clear_cmos

    4. Exit the Diag Shell: diag> exit

    5. Start the host: -> start -script /System

  3. In the rare case that the Oracle Server X8-8 still fails to power on, perform an AC power cycle, by removing and reinstalling the server AC power cables.

    Refer to the Oracle Server X8-8 Service Manual.

  4. If the AC power cycle attempt fails, contact an Oracle Service representative for assistance.

System Might Not Set BIOS to Defaults After Power Cycle

Bug ID: 26785322

Issue: On rare occasions, typically after an AC power cycle, Oracle Server X8-8 might not set BIOS to defaults. You can identify this issue when the reset_to_defaults property does not change from factory to none after 15 minutes.

To verify the BIOS settings, enter the following command:

-> show -d properties /System/BIOS reset_to_defaults
/System/BIOS
Properties:
reset_to_defaults = factory

Affected Hardware: Oracle Server X8-8

Workaround: If you encounter this issue, log in to Oracle ILOM and reset the service processor (SP) with the following command:

-> reset /SP

After you reset the SP, re-enter the command to verify the BIOS default settings.
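When the check is automated, the captured command output can be parsed for the property value. The sketch below assumes the output looks like the example shown earlier in this section. The function names are hypothetical, and the 15-minute threshold is taken from the issue description.

```python
import re

# Example output copied from the verification step above.
SAMPLE_OUTPUT = """/System/BIOS
Properties:
reset_to_defaults = factory
"""


def reset_to_defaults_value(show_output: str):
    """Extract the reset_to_defaults value, or None if it is not present."""
    match = re.search(r"^\s*reset_to_defaults\s*=\s*(\S+)\s*$",
                      show_output, re.MULTILINE)
    return match.group(1) if match else None


def needs_sp_reset(show_output: str, minutes_since_power_cycle: float) -> bool:
    """True if the property is still 'factory' 15 or more minutes later."""
    still_factory = reset_to_defaults_value(show_output) == "factory"
    return still_factory and minutes_since_power_cycle >= 15
```

With the sample output, `needs_sp_reset(SAMPLE_OUTPUT, 20)` evaluates to `True`, which indicates that the `reset /SP` workaround applies.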

SSL Must Be Turned On When Booting a Redirected ISO Image

Important Operating Note

When booting a redirected ISO image for an operating system installation, SSL (Secure Sockets Layer) must be turned on. This is the default setting. If SSL is not turned on, the installation might stall or fail. This affects all supported operating systems.

Reset Takes a Long Time and Causes the Server to Power Cycle

Important Operating Note

If you have a pending BIOS upgrade, a routine reset takes longer than expected, and causes the server to power cycle and reboot several times. This is expected behavior, as it is necessary to power cycle the server to upgrade the BIOS firmware. If the upgrade includes an FPGA update, the upgrade can take as long as 30 minutes to complete, and the service processor (SP) resets during the process. After the update, re-establish any active connection to the SP.

A pending BIOS upgrade exists when the following conditions are true:

  • You update the BIOS and service processor firmware using Oracle Integrated Lights Out Manager (ILOM).

  • You select the Oracle ILOM option to Delay BIOS Upgrade.

  • The host is powered on.

If you reboot the server expecting a routine server reset and instead initiate a (delayed) BIOS upgrade, wait until the upgrade finishes. Do not interrupt the process, as this can result in corrupted firmware and server downtime.

Caution:

Firmware corruption and system downtime. Interrupting the firmware upgrade process can corrupt the firmware and render the server inoperable. Do not interrupt the upgrade. Allow the process to finish.

Note:

Oracle ILOM and BIOS updates are designed to work together. When you have a pending BIOS upgrade, it is recommended that you install the upgrade by resetting or power cycling your server as soon as possible.

For details, refer to "Update the BIOS and Service Processor Firmware (Oracle ILOM)" in the Oracle X8 Series Servers Administration Guide at https://www.oracle.com/goto/x86admindiag/docs .

Oracle VTS Network Test Floods Serial Console Triggering Soft Lockups

Bug ID: 27240264

Issue: The Network test in Oracle VTS floods the Oracle ILOM SP serial console, which triggers soft lockup events.

Affected Software: Oracle VTS 8.2.0-patch 1.4

Workaround: There are multiple ways to avoid or work around this issue. Choose one of the following methods.

Disable the Network test in the profile, as shown in the following Test_Groups example, and rerun Oracle VTS to resolve the soft lockups:

******************************Test_Groups*****************************
[*]Processor             Options  idle(Pass=0/Error=0)
[*]System_Interconnect   Options  idle(Pass=0/Error=0)
[*]Memory                Options  idle(Pass=0/Error=0)
[ ]Network               Options  idle(Pass=0/Error=0)
[*]Disk                  Options  idle(Pass=0/Error=0)
[*]Removable_Disk        Options  idle(Pass=0/Error=0)

Alternatively, run the Oracle VTS Network test for a shorter time to reduce soft lockup events:

  1. Select Options:

    [*]Network Options idle(Pass=0/Error=0)

  2. Enter 10 into Test Time. Select Apply.

    Test Time:[0-99999] 10

  3. Save the profile.

  4. Restart the test.
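To confirm that a saved profile has the Network test disabled before rerunning Oracle VTS, the Test_Groups listing can be checked with a short script. This sketch assumes the listing has the same text layout as the example in this section. The function name and regular expression are hypothetical and are not part of Oracle VTS.

```python
import re

# Test_Groups listing copied from the example in this section.
PROFILE = """******************************Test_Groups*****************************
[*]Processor             Options  idle(Pass=0/Error=0)
[*]System_Interconnect   Options  idle(Pass=0/Error=0)
[*]Memory                Options  idle(Pass=0/Error=0)
[ ]Network               Options  idle(Pass=0/Error=0)
[*]Disk                  Options  idle(Pass=0/Error=0)
[*]Removable_Disk        Options  idle(Pass=0/Error=0)
"""

# '[*]' marks an enabled group and '[ ]' a disabled one.
GROUP_LINE = re.compile(r"^\[(?P<mark>[* ])\](?P<name>\S+)\s+Options\b")


def enabled_groups(profile_text: str) -> dict:
    """Map each test group name to True (enabled) or False (disabled)."""
    groups = {}
    for line in profile_text.splitlines():
        match = GROUP_LINE.match(line.strip())
        if match:
            groups[match.group("name")] = match.group("mark") == "*"
    return groups


print(enabled_groups(PROFILE)["Network"])  # prints False
```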