DIMM Fault SPX86A-800A-95 - Memtest Single Symbol Test Failed - ILOM 5.1.0.21

Bug ID: 34325538, 34445460

Issue: The following DIMM Fault message is seen: SPX86A-800A-95 - Memtest Single Symbol Test Failed (Doc ID 2317012.1). SPX86A-800A-95 indicates that the ILOM fault manager has received an error report showing that a memory DIMM produced correctable errors (CE) during both passes of the memory test.
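As a minimal illustration of the condition that raises SPX86A-800A-95, the following Python sketch models the memory-test results for one DIMM. MemtestPass and raises_spx86a_800a_95 are hypothetical names, not an Oracle API, and the sketch assumes the test runs exactly two passes.

```python
from dataclasses import dataclass


@dataclass
class MemtestPass:
    """Result of one memory-test pass for a single DIMM (hypothetical model)."""
    correctable_errors: int  # number of correctable errors (CE) seen in this pass


def raises_spx86a_800a_95(passes: list[MemtestPass]) -> bool:
    """Return True when the DIMM reported CEs in both passes of the memory test.

    Assumption: the test runs exactly two passes. A DIMM that is clean in
    either pass does not meet the condition described for this fault.
    """
    if len(passes) != 2:
        raise ValueError("expected results for exactly two memtest passes")
    return all(p.correctable_errors > 0 for p in passes)
```

For example, passes with 3 and 1 correctable errors meet the condition, while passes with 3 and 0 do not.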

If the server encounters multiple runtime memory fault events, the increase in runtime error messages may be related to DIMM memory testing conditions. The Adaptive Double DRAM Device Correction (ADDDC) and Post Package Repair (PPR) features are enabled in the server firmware. ADDDC Sparing is a RAS feature that improves memory reliability. The Advanced Memory Test (AMT) in the Memory Reference Code (MRC) can fail a DIMM on a single symbol error, after which PPR attempts to repair the defect.

When enabled, PPR may be able to repair affected DRAM areas on a DIMM. PPR is activated after the next system initialization or reboot, and attempts to repair the DIMM, in either of these cases: a memory fault event occurred during MRC initialization (the MRC memory tests failed), or certain correctable errors at runtime triggered ADDDC on their first occurrence before the reboot.
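The PPR trigger rule described above can be sketched as a small decision function. This is a hypothetical model of the documented behavior, not firmware code; BootState and should_run_ppr are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """Hypothetical snapshot of what the firmware knows at system initialization."""
    ppr_enabled: bool
    adddc_activated_before_reboot: bool  # runtime CE triggered ADDDC before the reboot
    mrc_memory_test_failed: bool         # memory fault during MRC initialization


def should_run_ppr(state: BootState) -> bool:
    """Model of the documented rule: when PPR is enabled, it runs after the next
    initialization/reboot if ADDDC was activated or the MRC memory tests failed."""
    if not state.ppr_enabled:
        return False
    return state.adddc_activated_before_reboot or state.mrc_memory_test_failed
```

On platforms that do not enable ADDDC (for example, Oracle Server X9-2), adddc_activated_before_reboot would always be False in this model, so only MRC memory-test failures would schedule PPR.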

Note:

DIMMs from certain manufacturers may exhibit different memory failure patterns and may not support soft PPR, which attempts a temporary (non-persistent) repair.

Affected Hardware: Oracle Communications Server E6-2L, Oracle Server X8-8, Oracle Server X8-2, Oracle Server X8-2L, Oracle Server X7-8, Oracle Server X7-2, Oracle Server X7-2L

Note:

Not all server systems enable ADDDC.

Affected Software:

The following x86 server software releases and Oracle ILOM versions, or later, support PPR.

  • Oracle Server X9-2 SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X9-2L SW1.1.0 ILOM 5.0.2.21 (Does not enable ADDDC.)
  • Oracle Server X8-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X8-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X8-2L SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-8 SW3.2.2.1 ILOM 5.0.2.22 (Does not enable ADDDC.)
  • Oracle Server X7-2 SW3.2.2 ILOM 5.0.2.24
  • Oracle Server X7-2L SW3.2.2 ILOM 5.0.2.24
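To check a server against the list above, a minimal Python sketch might compare the installed ILOM version with the minimum listed for that model. MIN_ILOM_FOR_PPR, parse_version, and ilom_supports_ppr are hypothetical names. The sketch assumes purely numeric dotted version strings (strip any build suffix first) and checks only the ILOM version, not the software release (SW) number.

```python
# Minimum ILOM versions that support PPR, transcribed from the list above.
MIN_ILOM_FOR_PPR = {
    "Oracle Server X9-2": "5.0.2.21",
    "Oracle Server X9-2L": "5.0.2.21",
    "Oracle Server X8-8": "5.0.2.22",
    "Oracle Server X8-2": "5.0.2.24",
    "Oracle Server X8-2L": "5.0.2.24",
    "Oracle Server X7-8": "5.0.2.22",
    "Oracle Server X7-2": "5.0.2.24",
    "Oracle Server X7-2L": "5.0.2.24",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '5.0.2.24' into integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def ilom_supports_ppr(model: str, ilom_version: str) -> bool:
    """Return True if ilom_version is at or above the listed minimum for model."""
    minimum = MIN_ILOM_FOR_PPR.get(model)
    if minimum is None:
        raise KeyError(f"{model!r} is not in the PPR support list")
    # Tuples compare element by element, so 5.1.0.21 > 5.0.2.24.
    return parse_version(ilom_version) >= parse_version(minimum)
```

For example, ilom_supports_ppr("Oracle Server X7-2", "5.1.0.21") returns True, while "5.0.2.21" on the same model returns False because it predates 5.0.2.24.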

Workaround: Some DIMM faults are recoverable if PPR is enabled on the server. If multiple DIMM memory errors are detected on the server, do the following:

  1. Log in to the Oracle ILOM command-line interface (CLI) using an account with admin (a) role privileges.
  2. From the Oracle ILOM CLI, launch the Oracle ILOM Fault Management Shell.
    -> start /SP/faultmgmt/shell 
    Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
                
  3. Display information about faulted server components by using the Oracle ILOM FMA CLI command.
    faultmgmtsp> fmadm faulty
                
  4. Manually clear each server fault by using the Oracle ILOM FMA CLI command, replacing <FRU> with a faulted FRU reported in step 3.
    faultmgmtsp> fmadm repair <FRU>
                
  5. Exit the Oracle ILOM Fault Management Shell and return to the Oracle ILOM CLI command prompt.
    faultmgmtsp> exit
                
  6. Upgrade the server to the latest ILOM/UEFI firmware release that supports PPR. The system resets during the firmware upgrade and runs memory tests again.
  7. If memory-related faults continue to be logged, replace the faulted DIMMs in the server. Open an Oracle Support case through the support portal for further assistance.
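Steps 2 through 5 can be summarized as a hypothetical Python helper that assembles the command sequence for a set of faulted FRUs. The helper only builds text for an administrator to enter; it does not connect to Oracle ILOM, and it assumes the FRU names were copied manually from the fmadm faulty output.

```python
def fault_clear_session(faulted_frus: list[str]) -> list[str]:
    """Return the Oracle ILOM commands from steps 2-5, one per entry.

    faulted_frus is assumed to be copied by hand from `fmadm faulty`;
    this sketch does not parse that output or talk to the service processor.
    """
    commands = [
        "start /SP/faultmgmt/shell",  # step 2: answer "y" at the confirmation prompt
        "fmadm faulty",               # step 3: list faulted components
    ]
    # Step 4: one repair command per faulted FRU.
    commands += [f"fmadm repair {fru}" for fru in faulted_frus]
    commands.append("exit")           # step 5: return to the Oracle ILOM CLI
    return commands


if __name__ == "__main__":
    # "FRU_NAME" is a placeholder; substitute the FRU names reported on your server.
    for line in fault_clear_session(["FRU_NAME"]):
        print(line)
```

Steps 6 and 7 (firmware upgrade and DIMM replacement) are deliberately left out, since they happen outside the Fault Management Shell.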

For updated information about Oracle ILOM, refer to the Oracle Integrated Lights Out Manager (ILOM) documentation at Servers Documentation - Systems Management.

Oracle Service personnel can find more information about diagnosing and triaging DIMM Fault failures on x86 servers in My Oracle Support Knowledge Article Doc ID 2698328.1. If multiple DIMM Fault messages occur simultaneously on a server, Oracle Service personnel can refer to Knowledge Articles Doc ID 1603015.1 (single symbol errors) and Doc ID 2317012.1 (multiple symbol errors).

Note:

Adaptive Double DRAM Device Correction (ADDDC) is also referred to as Adaptive Device Correction (ADC) in some Oracle documents.