2.6.2 Proactively Replacing a Storage Device

You may need to proactively replace a storage device that has not failed.

For example, you may choose to replace a device after Exadata detects a temporary performance anomaly (temporary confinement), or you may need to replace a functioning device for any other reason.

The following steps outline the procedure to proactively replace a storage device:

  1. Prepare the storage device for replacement.

    Use the CellCLI ALTER PHYSICALDISK command to prepare the system for replacement of the specified storage device. You can choose whether to maintain redundancy throughout the device replacement process.

    • If you maintain redundancy throughout the device replacement process, Exascale moves data off any affected pool disk and rebalances the associated storage pool. This option is generally recommended because it provides the best protection against an additional failure that could occur while you replace the device. However, it incurs the cost of moving the affected data and rebalancing the storage pool.

      To choose this option, add the MAINTAIN REDUNDANCY clause to the ALTER PHYSICALDISK ... DROP FOR REPLACEMENT command. For example, you can use the following command to prepare storage device 0:5 for replacement while maintaining redundancy:

      CellCLI> alter physicaldisk 0:5 drop for replacement maintain redundancy

    • If you choose not to maintain redundancy throughout the device replacement process, Exascale performs only a quick redundancy check before the device is taken offline. The redundancy check ensures that another copy of the data is available for each affected pool disk. This option allows quicker device replacement, but the affected data has reduced redundancy while the device is replaced, which increases the risk of data loss or a storage pool outage if another failure occurs during that time. Consequently, this option is recommended only where data loss can be tolerated (for example, a non-critical test system) or where the increased risk associated with reduced redundancy is mitigated by some other means (for example, the data is replicated using Oracle Data Guard).

      To choose this option, use the ALTER PHYSICALDISK ... DROP FOR REPLACEMENT command without the MAINTAIN REDUNDANCY clause. For example, you can use the following command to prepare storage device 0:5 for replacement without maintaining redundancy:

      CellCLI> alter physicaldisk 0:5 drop for replacement
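
    If you script this step, the choice between the two options reduces to whether the MAINTAIN REDUNDANCY clause is appended to the command. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the cellcli utility is on the PATH of the storage server and accepts a single command through its -e option.

```python
import subprocess


def drop_for_replacement_command(disk: str, maintain_redundancy: bool = True) -> str:
    """Build the CellCLI command that prepares a physical disk for replacement."""
    command = f"alter physicaldisk {disk} drop for replacement"
    if maintain_redundancy:
        # Recommended option: Exascale moves data off affected pool disks
        # and rebalances the storage pool.
        command += " maintain redundancy"
    return command


def prepare_for_replacement(disk: str, maintain_redundancy: bool = True) -> str:
    """Run the command through `cellcli -e` (assumed to be available on PATH)."""
    result = subprocess.run(
        ["cellcli", "-e", drop_for_replacement_command(disk, maintain_redundancy)],
        capture_output=True,
        text=True,
        check=True,  # Raise CalledProcessError if CellCLI reports a failure.
    )
    return result.stdout
```

    Defaulting maintain_redundancy to True mirrors the recommendation above; passing False should be a deliberate decision for systems where reduced redundancy is acceptable.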
  2. Ensure that the storage server Do Not Service LED is not lit.
  3. Ensure that the device is ready for removal.

    • If the device is a hard disk drive (HDD) or flash drive located in one of the hot-swappable drive bays in the front of the server, ensure that the blue OK to Remove LED on the device is lit before removing the device.

    • If the device is a hot-swappable flash card contained inside the server, ensure that the power LED on the flash card is not lit before removing the device. Starting with Exadata Storage Server X7-2, all storage server models contain hot-swappable flash cards.

  4. Remove the storage device and install the replacement.

    See the associated server hardware guide for additional details about physical hardware replacement.

  5. Wait for the server to recognize the replaced device.

    When you physically replace a hot-swappable storage device, it may take a few minutes for the server to recognize the new device.
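
    Rather than checking repeatedly by hand, you can poll the device status until it reports normal, which also satisfies the check in the next step. The following Python sketch assumes that `cellcli -e "list physicaldisk 0:5 attributes status"` prints only the status value; verify the output format on your system. The function names, timeout, and polling interval are hypothetical choices, and read_status can be replaced with any function that returns the current status string.

```python
import subprocess
import time
from typing import Callable


def read_status_with_cellcli(disk: str) -> str:
    """Return the status reported by CellCLI (single-value output is assumed)."""
    result = subprocess.run(
        ["cellcli", "-e", f"list physicaldisk {disk} attributes status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_for_status(
    disk: str,
    expected: str = "normal",
    read_status: Callable[[str], str] = read_status_with_cellcli,
    timeout_s: float = 900.0,
    interval_s: float = 30.0,
) -> bool:
    """Poll until the disk reports the expected status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if read_status(disk) == expected:
                return True
        except subprocess.CalledProcessError:
            # The new device may not be recognized yet; keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```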

  6. Confirm the status of the replacement device.

    Use the CellCLI LIST PHYSICALDISK command to confirm that the status of the replacement device is normal.

    For example:

    CellCLI> list physicaldisk 0:5 detail
             name:                   0:5
             deviceName:             /dev/sdi
             diskType:               HardDisk
             enclosureDeviceId:      0
             luns:                   0_5
             makeModel:              "WDC W7222A520ORA022T"
             physicalFirmware:       A7B0
             physicalInsertTime:     2023-09-01T12:00:25-07:00
             physicalInterface:      sas
             physicalSerial:         75X8RD
             physicalSize:           20.009765625T
             slotNumber:             5
             status:                 normal
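
    If you collect this output in a script, you can parse the attribute: value lines into a dictionary. The following Python sketch parses output in the format shown above; the function names are hypothetical. The optional comparison with a serial number recorded before the replacement is an extra safeguard, not part of the documented procedure, that confirms the slot now holds a different physical device.

```python
from typing import Dict, Optional


def parse_physicaldisk_detail(output: str) -> Dict[str, str]:
    """Parse 'attribute: value' lines from LIST PHYSICALDISK ... DETAIL output."""
    attributes: Dict[str, str] = {}
    for raw_line in output.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the echoed CellCLI prompt, if present.
        if not line or line.startswith("CellCLI>"):
            continue
        # Split on the first colon only, so values such as "0:5" or
        # timestamps that contain colons stay intact.
        name, separator, value = line.partition(":")
        if separator:
            attributes[name.strip()] = value.strip().strip('"')
    return attributes


def replacement_looks_ready(attributes: Dict[str, str], old_serial: Optional[str] = None) -> bool:
    """Return True if the status is normal and, optionally, the serial changed."""
    if attributes.get("status") != "normal":
        return False
    if old_serial is not None and attributes.get("physicalSerial") == old_serial:
        return False
    return True
```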
    
  7. Monitor the storage pool rebalance operation.

    After the storage device is replaced, the Exadata cell disks and grid disks that existed on the original device are re-created on the new device. If any grid disk is designated as an Exascale pool disk, it is added to the storage pool and the storage pool is automatically rebalanced.

    Use the ESCLI lsstoragepooloperation command to monitor the storage pool rebalance operation.

  8. Confirm that Exascale is using the replacement device.

    Use the ESCLI lspooldisk command and examine the status attribute.

    Initially, as the replacement device comes online, the pool disk status is briefly set to BEING ADDED. The status then transitions to ONLINE as Exascale reintegrates the replacement device.
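
    When you review the lspooldisk status values for the pool disks on the replaced device, they map onto three outcomes. The following Python sketch classifies a mapping of pool disk names to status values. How you build that mapping from ESCLI output is not shown and is an assumption, as are the function name and the outcome labels.

```python
from typing import Mapping


def reintegration_state(pool_disk_status: Mapping[str, str]) -> str:
    """Summarize pool disk statuses after a device replacement.

    Returns "complete" when every pool disk is ONLINE, "in progress" when some
    are still BEING ADDED, and "needs attention" for any other status.
    """
    if not pool_disk_status:
        raise ValueError("no pool disk statuses supplied")
    # Normalize case and surrounding whitespace before comparing.
    statuses = {status.strip().upper() for status in pool_disk_status.values()}
    if statuses == {"ONLINE"}:
        return "complete"
    if statuses <= {"ONLINE", "BEING ADDED"}:
        return "in progress"
    return "needs attention"
```

    Treating any status other than ONLINE or BEING ADDED as needing attention is a conservative choice: it surfaces unexpected states instead of waiting on them indefinitely.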