Sun Server X4-4 Service Manual


Updated: October 2015

Troubleshooting a Multi-DIMM Failure State

A multi-DIMM failure state occurs when a single DIMM failure causes other DIMMs in the same channel (or in a second channel) on a memory riser card to become disabled or to appear as if they have failed.

When a DIMM failure occurs, check the Oracle ILOM system event log (SEL) to:

  • Identify the first DIMM that failed.

  • Note any additional DIMM failures occurring closely after the initial DIMM failure.

  • Identify the memory riser card that contains the failed DIMM.

  • Note the channel(s) in which any additional DIMM failures have occurred.

If one or more additional DIMMs failed after the initial DIMM failure, and those failures are on the same memory riser card, the server might be in a multi-DIMM failure state.
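
The check described above can be sketched in Python. This is only an illustration, not an Oracle tool: the DimmFault record, the multi_dimm_candidates function, and the 60-second window used to interpret "closely after" are all assumptions made for the example. This manual does not define a specific time window.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DimmFault:
    """One DIMM fault taken from the SEL (hypothetical record type)."""
    time: datetime
    location: str  # for example "P0/MR0/D6"


def riser_of(location: str) -> str:
    """Return the riser part of a location, e.g. 'P0/MR0' for 'P0/MR0/D6'."""
    return location.rsplit("/", 1)[0]


def multi_dimm_candidates(faults, window=timedelta(seconds=60)):
    """Map each riser to (first_fault, followers) when the riser shows an
    initial DIMM failure followed closely by failures of other DIMMs.

    `window` is an assumed interpretation of "closely after".
    """
    by_riser = defaultdict(list)
    for fault in faults:
        by_riser[riser_of(fault.location)].append(fault)

    result = {}
    for riser, group in by_riser.items():
        group.sort(key=lambda f: f.time)
        first, rest = group[0], group[1:]
        followers = [
            f for f in rest
            if f.location != first.location and f.time - first.time <= window
        ]
        if followers:
            result[riser] = (first, followers)
    return result
```

A riser that appears in the result is only a candidate for a multi-DIMM failure state. The first fault returned for that riser identifies the DIMM that failed first.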

For example, you might see the following in the system event log:

135 Sun May 21 00:53:57 2000 DIMM Service Required Memory P0/MR0/D9 (CPU
Memory Riser 0 DIMM 9)
  A failure has occurred during Memory Reference Code (MRC) DIMM module
training. (Probability:100, UUID:2a182715-983f-c4fb-e94f-b5a5b50d3650, Part
Number:001-0003-01,HMT42GR7AFR4A-PB, Serial Number:00AD011321345849FF,
Reference Document:http://support.oracle.com/msg/SPX86A-8004-67)
 
126 Sun May 21 00:53:56 2000 DIMM Service Required Memory P0/MR0/D6 (CPU
Memory Riser 0 DIMM 6)
  A failure has occurred during Memory Reference Code (MRC) DIMM module
training. (Probability:33, UUID:9014a82c-7bf9-ee96-b61b-9c7ccedc9aed, Part
Number:001-0003-01,HMT42GR7AFR4A-PB, Serial Number:00AD01132129B11E9E,
Reference Document:http://support.oracle.com/msg/SPX86A-8004-67)

In this scenario, the DIMM in D6 is reported as failed at 00:53:56, followed one second later, at 00:53:57, by a reported failure of the DIMM in D9. Both DIMMs are on the same memory riser card (P0/MR0). Each DIMM is on a separate channel, but both channels are linked to the same memory buffer ASIC. Additionally, the system might have disabled all of the DIMMs in both channels. This scenario could be an instance of a multi-DIMM failure state.
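
If the log has been saved as text, the initial failure can also be picked out with a short script. The following Python sketch is illustrative only. The regular expression is inferred from the two sample entries in this section and might not match entries from other Oracle ILOM versions. The name parse_dimm_faults is hypothetical, and the timestamp parsing assumes English day and month abbreviations.

```python
import re
from datetime import datetime

# Matches the first line of an entry as it appears in the sample above:
# event ID, timestamp, then a DIMM location such as P0/MR0/D6.
# Assumed from that sample only.
ENTRY_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+"
    r"(?P<ts>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+"
    r".*?(?P<loc>P\d+/MR\d+/D\d+)",
    re.MULTILINE,
)


def parse_dimm_faults(log_text):
    """Return (time, location, event_id) tuples sorted oldest first."""
    faults = []
    for m in ENTRY_RE.finditer(log_text):
        stamp = " ".join(m.group("ts").split())  # normalize spacing
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        faults.append((when, m.group("loc"), int(m.group("id"))))
    return sorted(faults)


SAMPLE = """\
135 Sun May 21 00:53:57 2000 DIMM Service Required Memory P0/MR0/D9 (CPU
Memory Riser 0 DIMM 9)
126 Sun May 21 00:53:56 2000 DIMM Service Required Memory P0/MR0/D6 (CPU
Memory Riser 0 DIMM 6)
"""

faults = parse_dimm_faults(SAMPLE)
first = faults[0]
print("Initial failure:", first[1], "at", first[0].time())
# prints: Initial failure: P0/MR0/D6 at 00:53:56
```

Sorting by timestamp matters because the sample lists the later event (ID 135) before the earlier one (ID 126).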

How to Troubleshoot a Multi-DIMM Failure State

To troubleshoot this issue, replace only the DIMM that logged the initial failure, and then return the server to operation to see whether the multi-DIMM failure state persists. If the server was in a multi-DIMM failure state, replacing only the DIMM that failed first rectifies the fault state of both the initial DIMM and the subsequent DIMMs. If the failures persist, the issue could be with the DIMMs or with the memory riser card.