A variety of features play a role in how the memory subsystem is configured and how memory faults are handled. Understanding the underlying features helps you identify and repair memory problems. This topic describes how the server module deals with memory faults.
The server module uses advanced ECC technology that corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. On some DIMMs, if a DRAM fails, the DIMM continues to function.
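The correctability rule above can be illustrated with a small sketch. It is a simplified model, not the server module's actual ECC code: it assumes that each aligned 4-bit nibble of a data word comes from a single DRAM, and the function names and the `error_mask` representation (an integer whose set bits mark the flipped bit positions) are hypothetical.

```python
def error_nibbles(error_mask: int) -> set:
    """Return the indices of the aligned 4-bit nibbles that contain a flipped bit."""
    nibbles = set()
    index = 0
    while error_mask:
        if error_mask & 0xF:      # any flipped bit in the current nibble?
            nibbles.add(index)
        error_mask >>= 4          # move on to the next aligned nibble
        index += 1
    return nibbles


def is_correctable(error_mask: int) -> bool:
    """True if all flipped bits lie in one aligned nibble (or nothing is flipped).

    Assumption: one aligned nibble corresponds to one DRAM.
    """
    return len(error_nibbles(error_mask)) <= 1
```

Under this model, a mask such as `0b1111` (four bad bits in one nibble) is treated as correctable, while `0b11000` (two bad bits that straddle a nibble boundary) is not.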
The following server module features independently manage memory faults:
POST – Based on Oracle ILOM configuration variables, POST runs when the server module is powered on.
For correctable memory errors (sometimes called CEs), POST forwards the error to the Oracle Solaris PSH daemon for error handling.
If an uncorrectable memory fault is detected, POST displays the fault with the device name of the faulty DIMMs and logs the fault. POST then disables the faulty DIMMs. Depending on the memory configuration and the location of the faulty DIMM, POST disables either half of the physical memory in the system, or half of the physical memory and half of the processor threads. When this offlining occurs during normal operation, replace the faulty DIMMs identified in the fault message, and then enable the disabled DIMMs. See Clear the Fault and Verify the Functionality of the Replacement DIMM.
Oracle Solaris Predictive Self-Healing (PSH) technology – A feature of the Oracle Solaris OS, PSH uses the fault manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a UUID and logged. PSH reports the fault and suggests replacing the DIMMs associated with the fault.
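The way POST treats the two classes of memory error, as described above, can be summarized in a short sketch. This is only an illustration of the decision flow, not firmware code: the class and function names, the `DIMM_A`-style labels, and the log message format are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    CORRECTABLE = "correctable"
    UNCORRECTABLE = "uncorrectable"


@dataclass
class PostOutcome:
    """Hypothetical record of what POST did with each reported error."""
    forwarded_to_psh: list = field(default_factory=list)
    disabled_dimms: list = field(default_factory=list)
    fault_log: list = field(default_factory=list)


def handle_post_error(dimm: str, severity: Severity, outcome: PostOutcome) -> None:
    """Apply the handling described in the text to one memory error."""
    if severity is Severity.CORRECTABLE:
        # Correctable errors (CEs) are handed to the PSH daemon.
        outcome.forwarded_to_psh.append(dimm)
        return
    # Uncorrectable faults are logged, then the DIMM is disabled.
    outcome.fault_log.append(f"uncorrectable memory fault: {dimm}")
    outcome.disabled_dimms.append(dimm)
```

The sketch leaves out the wider offlining step (half of the physical memory, and possibly half of the processor threads), because that depends on the memory configuration and the location of the faulty DIMM.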
If you suspect that the server module has a memory problem, follow the Diagnostics Process. The flowchart helps you determine if the memory problem was detected by POST or by the PSH technology.
Once you identify which DIMMs you want to replace, see Locate a Faulty DIMM (LEDs). After replacing a faulty DIMM, you must follow the instructions in Clear the Fault and Verify the Functionality of the Replacement DIMM.