3.5.1 Replacing a PMEM Device Due to Device Failure

If a PMEM device has a status of failed, replace the device on the Oracle Exadata Storage Server.

A PMEM fault can cause the server to reboot. Replace the failed device with a new PMEM device at the earliest opportunity. Until the PMEM device is replaced, the corresponding cache size is reduced. If the PMEM device is used for commit acceleration (XRMEMLOG or PMEMLOG), then the size of the corresponding commit accelerator is also reduced.

An alert is generated when a PMEM device failure is detected. The alert message includes the slot number and cell disk name. If you have configured the system for alert notification, then the alert is also sent by e-mail to the designated address.

To identify a failed PMEM device, you can also use the following command:

CellCLI> LIST PHYSICALDISK WHERE disktype=PMEM AND status=failed DETAIL

    name:                          PMEM_0_1
    diskType:                      PMEM
    luns:                          P0_D1
    makeModel:                     "Intel NMA1XBD128GQS"
    physicalFirmware:              1.02.00.5365
    physicalInsertTime:            2019-09-28T11:29:13-07:00
    physicalSerial:                8089-A2-1838-00001234
    physicalSize:                  126.375G
    slotNumber:                    "CPU: 0; DIMM: 1"
    status:                        failed

In the preceding output, the slotNumber attribute shows the CPU socket number and the DIMM slot number of the failed device (here, socket 0 and DIMM slot 1).
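If you check many storage servers from a script, you can extract the socket and DIMM slot numbers from the slotNumber value. The following Python code is a minimal sketch, not an Oracle-supplied tool. It assumes that the DETAIL output uses the "attribute: value" layout shown above. The function names parse_detail and parse_slot are hypothetical and chosen for illustration only.

```python
import re

# Excerpt of the DETAIL output shown above, used as sample input.
SAMPLE = '''
    name:                          PMEM_0_1
    diskType:                      PMEM
    physicalInsertTime:            2019-09-28T11:29:13-07:00
    slotNumber:                    "CPU: 0; DIMM: 1"
    status:                        failed
'''

def parse_detail(text):
    """Turn 'attribute: value' lines into a dict (layout is an assumption)."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (timestamps, the slotNumber string) stay intact.
        key, sep, value = line.strip().partition(":")
        if sep:
            record[key.strip()] = value.strip().strip('"')
    return record

def parse_slot(slot):
    """Split a value such as 'CPU: 0; DIMM: 1' into (socket, dimm) integers."""
    match = re.fullmatch(r"CPU:\s*(\d+);\s*DIMM:\s*(\d+)", slot)
    if match is None:
        raise ValueError(f"unexpected slotNumber format: {slot!r}")
    return int(match.group(1)), int(match.group(2))

record = parse_detail(SAMPLE)
socket, dimm = parse_slot(record["slotNumber"])
print(f"{record['name']}: socket {socket}, DIMM slot {dimm}")
```

The sketch raises an error rather than guessing when the slotNumber value does not match the expected pattern, so a change in output format is noticed immediately.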

  1. Locate the storage server that contains the failed PMEM device.
    A white Locator LED is lit to help locate the affected storage server. When you have located the server, you can use the Fault Remind button to locate the failed DIMM.

    Caution:

    Do not attempt to remove a faulty DCPMM DIMM when the Do Not Service LED indicator is illuminated.
  2. Power down the storage server with the failed PMEM device and unplug the power cable for the server.
  3. Replace the failed PMEM device.
  4. Restart the storage server.

    Note:

    During the restart, the storage server will shut down a second time to complete the initialization of the new PMEM device.

The new PMEM device is automatically used by the system. If the PMEM device is used for caching, then the effective cache size increases. If the PMEM device is used for commit acceleration, then commit acceleration is enabled on the device.
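After the restart, you may want to confirm that no PMEM device still reports a problem. The following Python code is a hedged sketch of such a check. It assumes that you have captured DETAIL output for all PMEM devices as text, that each record begins with a name line, and that a healthy device reports a status of normal. The sample data and the function names split_records and not_normal are hypothetical.

```python
# Made-up sample input for illustration; not output from a real system.
SAMPLE = '''
    name:                          PMEM_0_1
    diskType:                      PMEM
    slotNumber:                    "CPU: 0; DIMM: 1"
    status:                        normal

    name:                          PMEM_0_3
    diskType:                      PMEM
    slotNumber:                    "CPU: 0; DIMM: 3"
    status:                        failed
'''

def split_records(text):
    """Group 'attribute: value' lines into one dict per device.

    Assumption: every device record starts with a 'name:' line.
    """
    records = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines and lines without an attribute
        key = key.strip()
        if key == "name":
            records.append({})  # a new device record begins here
        if records:
            records[-1][key] = value.strip().strip('"')
    return records

def not_normal(records, expected="normal"):
    """Return the names of devices whose status differs from 'expected'.

    The value 'normal' for a healthy device is an assumption.
    """
    return [r["name"] for r in records if r.get("status") != expected]

print(not_normal(split_records(SAMPLE)))
```

An empty list indicates that every listed device reports the expected status.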