3.4.1 Replacing a Flash Disk Due to Flash Disk Failure
Each Oracle Exadata Storage Server is equipped with flash devices.
Starting with Oracle Exadata Database Machine X7, the flash devices are hot-pluggable on the Oracle Exadata Storage Servers. When performing a hot-pluggable replacement of a flash device on Oracle Exadata Storage Servers for X7 or later, the disk status should be Dropped for replacement, and the power LED on the flash card should be off, which indicates the flash disk is ready for online replacement.
Caution:
Removing a card while its power LED is on could result in a system crash. If a failed disk has a status of Failed – dropped for replacement but the power LED is still on, contact Oracle Support Services.
For Oracle Exadata Database Machine X6 and earlier, the flash devices are hot-pluggable on Extreme Flash (EF) storage servers, but not on High Capacity (HC) storage servers. On HC storage servers, you must power down the storage server before replacing its flash devices.
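The rules above can be summarized as a small decision helper. The following Python sketch is illustrative only: the function name, the integer generation argument, the "EF"/"HC" labels, and the returned strings are assumptions made for this example and are not part of any Oracle tool. It also assumes the status text ends with "dropped for replacement", regardless of case.

```python
def replacement_mode(generation: int, server_type: str,
                     status: str, power_led_on: bool) -> str:
    """Suggest how a failed flash device may be replaced.

    generation   -- hypothetical encoding of the model, e.g. 7 for X7
    server_type  -- "EF" (Extreme Flash) or "HC" (High Capacity)
    status       -- disk status text as reported for the physical disk
    power_led_on -- True if the power LED on the flash card is lit
    """
    if generation >= 7:
        # X7 and later: hot-pluggable, but only once the disk has been
        # dropped for replacement and the card's power LED is off.
        dropped = status.strip().lower().endswith("dropped for replacement")
        if not dropped:
            return "not-ready"
        if power_led_on:
            # Dropped, yet the LED is still on: do not pull the card.
            return "contact-support"
        return "online"
    # X6 and earlier: only Extreme Flash servers are hot-pluggable.
    if server_type.upper() == "EF":
        return "online"
    return "power-down"


print(replacement_mode(7, "HC", "failed - dropped for replacement", False))
print(replacement_mode(6, "HC", "failed", False))
```

The helper only mirrors the text of this section; it does not replace checking the actual disk status and LED on the server.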
To identify a failed flash disk, use the following command:
CellCLI> LIST PHYSICALDISK WHERE disktype=flashdisk AND status=failed DETAIL
The following is an example of the output from an Extreme Flash storage server:
name: NVME_10
deviceName: /dev/nvme7n1
diskType: FlashDisk
luns: 0_10
makeModel: "Oracle NVMe SSD"
physicalFirmware: 8DV1RA13
physicalInsertTime: 2016-09-28T11:29:13-07:00
physicalSerial: CVMD426500E21P6LGN
physicalSize: 1.4554837569594383T
slotNumber: 10
status: failed
The following is an example of the output from an Oracle Flash Accelerator F160 PCIe Card:
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=failed DETAIL
name: FLASH_5_1
deviceName: /dev/nvme1n1
diskType: FlashDisk
luns: 5_1
makeModel: "Oracle Flash Accelerator F160 PCIe Card"
physicalFirmware: 8DV1RA13
physicalInsertTime: 2016-11-30T21:24:45-08:00
physicalSerial: 1030M03UYM
physicalSize: 1.4554837569594383T
slotNumber: "PCI Slot: 5; FDOM: 1"
status: failed
The following is an example of the output from a Sun Flash Accelerator F40 PCIe card:
name: FLASH_5_3
diskType: FlashDisk
luns: 5_3
makeModel: "Sun Flash Accelerator F40 PCIe Card"
physicalFirmware: TI35
physicalInsertTime: 2012-07-13T15:40:59-07:00
physicalSerial: 5L002X4P
physicalSize: 93.13225793838501G
slotNumber: "PCI Slot: 5; FDOM: 3"
status: failed
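If the DETAIL output is captured as text, for example for a monitoring script, it can be split into one record per disk. The following Python sketch assumes the output has the "attribute: value" layout shown in the examples above and that each record begins with the name attribute. The function name parse_detail and the SAMPLE text (abbreviated from the examples above) are introduced here for illustration only.

```python
SAMPLE = """\
CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=failed DETAIL
name: FLASH_5_1
deviceName: /dev/nvme1n1
diskType: FlashDisk
physicalInsertTime: 2016-11-30T21:24:45-08:00
slotNumber: "PCI Slot: 5; FDOM: 1"
status: failed
name: NVME_10
deviceName: /dev/nvme7n1
diskType: FlashDisk
slotNumber: 10
status: failed
"""


def parse_detail(text: str) -> list:
    """Split DETAIL-style output into a list of attribute dictionaries."""
    records = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the echoed command prompt.
        if not line or line.startswith("CellCLI>"):
            continue
        # Split on the first colon only: values such as timestamps and
        # slotNumber contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        # Assumption: a "name" attribute starts a new disk record.
        if key == "name" and current:
            records.append(current)
            current = {}
        current[key] = value.strip().strip('"')
    if current:
        records.append(current)
    return records


for disk in parse_detail(SAMPLE):
    print(disk["name"], disk["slotNumber"], disk["status"])
```

Splitting on the first colon keeps values such as physicalInsertTime and slotNumber intact.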
For the PCIe cards, the name and slotNumber attributes show the PCI slot and the FDOM number. For Extreme Flash storage servers, the slotNumber attribute shows the NVMe slot on the front panel.
On Oracle Exadata Database Machine X7 and later systems, all flash disks are in the form of an Add-in-Card (AIC), which is inserted into a PCIe slot on the motherboard. The slotNumber attribute shows the PCI slot number and FDOM number, regardless of whether it is an EF or HC storage server.
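The two slotNumber forms described above can be told apart programmatically. The following Python sketch assumes the attribute value is either the string form "PCI Slot: n; FDOM: m" or a bare number, as in the example outputs in this section. The function name locate_fru and the dictionary keys are hypothetical.

```python
import re

# Matches the PCIe card form, for example: PCI Slot: 5; FDOM: 1
_PCI_FDOM = re.compile(r"PCI Slot:\s*(\d+);\s*FDOM:\s*(\d+)")


def locate_fru(slot_number: str) -> dict:
    """Interpret a slotNumber value as a PCI slot/FDOM pair or an NVMe slot."""
    value = slot_number.strip().strip('"')
    match = _PCI_FDOM.fullmatch(value)
    if match:
        return {"pci_slot": int(match.group(1)), "fdom": int(match.group(2))}
    if value.isdigit():
        # Bare number: NVMe slot on the front panel (Extreme Flash, X6 and earlier).
        return {"nvme_slot": int(value)}
    raise ValueError(f"unrecognized slotNumber value: {slot_number!r}")


print(locate_fru('"PCI Slot: 5; FDOM: 3"'))
print(locate_fru("10"))
```

Any other value raises ValueError rather than guessing at a slot.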
If a flash disk is detected to have failed, then an alert is generated indicating that the flash disk, as well as the LUN on it, has failed. The alert message includes either the PCI slot number and FDOM number or the NVMe slot number. These numbers uniquely identify the field replaceable unit (FRU). If you have configured the system for alert notification, then the alert is sent by email to the designated address.
A flash disk outage can reduce performance and data redundancy. The failed disk should be replaced with a new flash disk at the earliest opportunity. If the flash disk is used for flash cache, then the effective cache size for the storage server is reduced. If the flash disk is used for flash log, then flash log is disabled on the disk, which reduces the effective flash log size. If the flash disk is used for grid disks, then the Oracle Automatic Storage Management (Oracle ASM) disks associated with these grid disks are automatically dropped with the FORCE option from the Oracle ASM disk group, and a rebalance operation starts to restore the data redundancy.
The following procedure describes how to replace an FDOM due to disk failure on High Capacity storage servers that do not support online flash replacement. Replacing an NVMe drive on Extreme Flash storage servers is the same as replacing a physical disk: you can just remove the NVMe drive from the front panel and insert a new one. You do not need to shut down the storage server.
The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size increases. If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them based on the disk group redundancy and the ASM_POWER_LIMIT parameter.
See Also:
- Performing a Hot Pluggable Replacement of a Flash Disk
- V$ASM_OPERATION in Oracle Database Reference
- ASM_POWER_LIMIT in Oracle Automatic Storage Management Administrator's Guide
- The appropriate PCIe Card User's Guide for your system, which is listed in "Related Documentation"