A variety of features play a role in how the memory subsystem is configured and how memory faults are handled. Understanding the underlying features helps you identify and repair memory problems.
The following server features manage memory faults:
POST—By default, POST runs when the server is powered on.
For correctable memory errors (CEs), POST forwards the error to the Oracle Solaris Predictive Self-Healing (PSH) daemon for error handling. If POST detects an uncorrectable memory fault, it displays the fault with the device names of the faulty DIMMs, logs the fault, and then disables the faulty DIMMs. Depending on the memory configuration and the location of the faulty DIMM, POST disables either half of the physical memory in the system, or half of the physical memory and half of the processor threads. If this offlining occurs during normal operation, replace the faulty DIMMs identified in the fault message, and then re-enable the disabled DIMMs with the following Oracle ILOM command:
-> set device component_state=enabled
where device is the name of the DIMM being enabled. For example:
-> set /SYS/MB/CMP1/MR0/BOB0/CH0/D0 component_state=enabled
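When POST has disabled more than one DIMM, the set command is entered once for each DIMM. As a minimal sketch, assuming you have collected the DIMM device names from the fault messages, the following Python helper only formats the command lines in the form shown above. It does not connect to Oracle ILOM or run anything. The function name enable_commands and the check that a device name begins with /SYS/ are assumptions made for illustration, not part of any Oracle tool.

```python
def enable_commands(dimm_paths):
    """Return one 'set <device> component_state=enabled' line per DIMM.

    Hypothetical helper: it only builds text to type at the ILOM prompt.
    """
    commands = []
    for path in dimm_paths:
        path = path.strip()
        # Assumption: DIMM device names are absolute paths under /SYS/,
        # as in the example above.
        if not path.startswith("/SYS/"):
            raise ValueError(f"not an ILOM device path: {path!r}")
        commands.append(f"set {path} component_state=enabled")
    return commands


if __name__ == "__main__":
    # Device name taken from the example above.
    replaced = ["/SYS/MB/CMP1/MR0/BOB0/CH0/D0"]
    for line in enable_commands(replaced):
        print("->", line)
```

Generating the lines from the device names reported in the fault message, rather than retyping each path, reduces the chance of enabling the wrong DIMM.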
Oracle Solaris PSH technology—PSH uses the Fault Manager daemon (fmd) to watch for various kinds of faults. When a fault occurs, the fault is assigned a unique fault ID (UUID) and logged. PSH reports the fault and suggests a replacement for the DIMMs associated with the fault.
If you suspect that the server has a memory problem, run the Oracle ILOM show faulty command. This command lists memory faults and identifies the DIMMs associated with the faults.