Several features play a role in how the memory subsystem is configured and how memory faults are handled. Understanding these features helps you identify and repair memory problems. This section describes how the server deals with memory faults.
Note - For memory configuration information, see FB-DIMM Configuration.
The server uses advanced ECC technology that corrects up to 4 bits in error on nibble boundaries, as long as the bits are all in the same DRAM. On 4 GB FB-DIMMs, if a DRAM fails, the DIMM continues to function.
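The nibble-boundary condition can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each DRAM contributes one aligned 4-bit group (bits 0-3, 4-7, and so on) of the ECC word. The actual bit-to-DRAM mapping is not given in this section, and the function name is hypothetical.

```python
def is_correctable(error_bits):
    """Return True if all failing bit positions fall in one aligned nibble.

    Assumption (not from the server documentation): nibble k covers bit
    positions 4*k .. 4*k+3 and maps to a single DRAM.
    """
    bits = set(error_bits)
    if not bits:
        return True  # no bits in error, nothing to correct
    # Integer division by 4 gives the index of the aligned nibble.
    nibbles = {bit // 4 for bit in bits}
    return len(nibbles) == 1


# Four bad bits inside one aligned nibble: within the stated ECC limit.
print(is_correctable([8, 9, 10, 11]))
# Two adjacent bad bits that straddle a nibble boundary: outside the limit.
print(is_correctable([7, 8]))
```

Under this assumption, adjacency alone is not enough: bits 7 and 8 are neighbors but belong to different nibbles, so they would not count as a single-DRAM error.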
The following server features independently manage memory faults:
POST – Based on Oracle ILOM configuration variables, POST runs when the server is powered on.
For correctable memory errors (CEs), POST forwards the error to the Predictive Self-Healing (PSH) daemon for error handling. If POST detects an uncorrectable memory fault, it displays the fault with the device names of the faulty FB-DIMMs, logs the fault, and then disables the faulty FB-DIMMs. Depending on the memory configuration and the location of the faulty FB-DIMM, POST disables either half of the physical memory in the system, or half of the physical memory and half of the processor threads. When this offlining occurs during normal operation, you must replace the faulty FB-DIMMs identified in the fault message, and then enable the disabled FB-DIMMs with the Oracle ILOM command set device component_state=enabled, where device is the name of the FB-DIMM being enabled (for example, set /SYS/MB/CPU0/CMP0/BR0/CH0/D0 component_state=enabled).
Predictive Self-Healing (PSH) technology – A feature of the Oracle Solaris OS, PSH uses the Fault Manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, it is assigned a unique fault ID (UUID) and logged. PSH reports the fault and identifies the locations of the faulty FB-DIMMs.
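If several FB-DIMMs must be re-enabled after replacement, it can help to generate the Oracle ILOM command lines rather than type them by hand. The following Python sketch only builds the command text described under POST above; it does not connect to the service processor. The function name and the path check are illustrative assumptions, not part of Oracle ILOM.

```python
import re

# Assumption: device names are slash-separated paths under /SYS, as in the
# example /SYS/MB/CPU0/CMP0/BR0/CH0/D0. Other path shapes are rejected.
_DEVICE_PATH = re.compile(r"^/SYS(/[A-Za-z0-9_]+)+$")


def enable_command(device):
    """Build the ILOM command that re-enables a disabled FB-DIMM."""
    if not _DEVICE_PATH.match(device):
        raise ValueError("unexpected device name: %r" % (device,))
    return "set %s component_state=enabled" % device


# Print one command per replaced FB-DIMM, ready to paste at the ILOM prompt.
for dimm in ["/SYS/MB/CPU0/CMP0/BR0/CH0/D0"]:
    print(enable_command(dimm))
```

The strict path check is a design choice: a mistyped device name fails here with a clear error instead of producing a command that the service processor rejects.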
If you suspect that the server has a memory problem, follow the flowchart (see Diagnostic Flowchart). Run the Oracle ILOM show faulty command. The command lists memory faults and the specific FB-DIMMs that are associated with each fault.
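When the show faulty output is captured to a file or a log, the affected FB-DIMMs can be extracted with a short script. The Python sketch below assumes that the output is a pipe-separated table with Target, Property, and Value columns and that the faulty component appears in rows whose property is fru. Both the sample text and the function name are illustrative, so check them against the output of your own system.

```python
def faulty_frus(show_faulty_output):
    """Return the distinct FRU names found in captured show faulty output."""
    frus = []
    for line in show_faulty_output.splitlines():
        # Split "Target | Property | Value" rows; other lines are ignored.
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == 3 and cells[1] == "fru" and cells[2] not in frus:
            frus.append(cells[2])
    return frus


# Illustrative sample only (assumed layout, not captured from a real server).
SAMPLE = """\
Target              | Property    | Value
--------------------+-------------+------------------------------
/SP/faultmgmt/0     | fru         | /SYS/MB/CPU0/CMP0/BR0/CH0/D0
/SP/faultmgmt/0     | timestamp   | Dec 14 22:43:59
"""

print(faulty_frus(SAMPLE))
```

Each name returned this way is a candidate for the device argument of the set device component_state=enabled command after the FB-DIMM has been replaced.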
Note - You can use the FB-DIMM DIAG buttons on the CMP module and memory module to identify faulty FB-DIMMs. See FB-DIMM Fault Button Locations.
After you identify which FB-DIMMs to replace, see Servicing FB-DIMMs for removal and replacement instructions. You must follow the instructions in that section to clear the faults and enable the replaced FB-DIMMs.