2.6.1 Replacing a Failed Storage Device
Failure of a storage device can affect performance and data redundancy. Consequently, a failed storage device should be replaced as soon as possible.
A storage device is considered to have failed in the following circumstances:
- A hardware or firmware fault causes the device to stop functioning.
- The device enters a predictive failure state.
  In this case, the device is still usable, but there is an indication that the device may soon stop functioning. For example, a hard disk drive (HDD) could be short of spare sectors, or a flash device could be approaching wear limits.
- Exadata software confines the device, and the device fails the post-confinement checks.
  Exadata automatically confines a storage device after detecting a significant performance problem or functional anomaly. After confinement, Exadata attempts to resolve the issue and recheck the device. However, if the post-confinement checks fail, the device is considered to have failed.
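The three circumstances above can be read as a simple decision rule. The following Python sketch is illustrative only: `DeviceState`, its fields, and `is_failed` are hypothetical names introduced here to model the rule, and they do not correspond to any Exadata API or data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceState:
    """Hypothetical snapshot of a storage device (not an Exadata structure)."""
    functioning: bool = True            # False after a hardware or firmware fault
    predictive_failure: bool = False    # still usable, but expected to fail soon
    confined: bool = False              # Exadata has confined the device
    post_confinement_ok: Optional[bool] = None  # None until the recheck has run


def is_failed(state: DeviceState) -> bool:
    """Return True if any of the three failure circumstances applies."""
    if not state.functioning:
        return True
    if state.predictive_failure:
        return True
    # Confinement alone is not a failure; only a failed recheck counts.
    if state.confined and state.post_confinement_ok is False:
        return True
    return False
```

Note that in this model a confined device whose recheck is still pending, or whose recheck passed, is not treated as failed, which matches the description of confinement above.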
If a storage device fails, Exadata automatically drops all of the grid disks contained on the storage device. If any grid disk is used as an Exascale pool disk, Exascale automatically removes it and rebalances the storage pool to restore data redundancy.
Exadata also generates an alert when a storage device fails. The alert message includes specific instructions for replacing the device. If alert notifications are configured on the storage server, the alert is also sent automatically using the configured notification methods (email, SNMP, or both).
After the failed storage device is replaced, Exadata automatically creates the cell and grid disks on the new device. If any grid disk is configured as an Exascale pool disk, it is automatically added to the storage pool, and the storage pool is rebalanced.
The following steps outline the procedure for replacing a failed storage device: