About the Flash Disks

Recovery Appliance mirrors data across storage servers and sends each write operation to at least two storage servers. If a flash card in one storage server has problems, then Recovery Appliance services the read and write operations by using the mirrored data in another storage server, so service is not interrupted.
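To make the mirroring behavior concrete, here is a minimal toy model, not Oracle code. The `StorageServer` class and the `write_block` and `read_block` functions are hypothetical names. The model treats a failed flash card as making that server's copy unreadable and ignores caching and disk tiers entirely; it only shows why a second copy lets reads continue.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical stand-in for a storage server; not an Oracle API."""
    name: str
    flash_ok: bool = True
    blocks: dict = field(default_factory=dict)


def write_block(servers, key, value, copies=2):
    """Write the same block to at least `copies` healthy servers."""
    healthy = [s for s in servers if s.flash_ok]
    if len(healthy) < copies:
        raise RuntimeError("not enough healthy servers to mirror the write")
    for server in healthy[:copies]:
        server.blocks[key] = value
    return [s.name for s in healthy[:copies]]


def read_block(servers, key):
    """Return the block and its source from the first healthy server holding a copy."""
    for server in servers:
        if server.flash_ok and key in server.blocks:
            return server.blocks[key], server.name
    raise KeyError(key)


servers = [StorageServer("cell01"), StorageServer("cell02"), StorageServer("cell03")]
write_block(servers, "blk7", b"data")
servers[0].flash_ok = False          # simulate a flash card problem in cell01
print(read_block(servers, "blk7"))   # the read is served from the mirror in cell02
```

The write fans out to two servers up front, so the read path never has to wait for a repair before it can find a good copy.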

If a flash card fails, then the storage server software saves the location of the data that was lost in the failed flash cache at the time of the failure. Resilvering then replaces the lost data with the mirrored copy: the software reads the data from the surviving mirror and writes it to the storage server that has the failed flash card. During resilvering, the grid disk status is ACTIVE -- RESILVERING WORKING.
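The two phases described above (record the lost locations at failure time, then copy from the surviving mirror) can be sketched as follows. This is a hypothetical illustration: `record_lost_locations` and `resilver` are invented names, and the final "ACTIVE" status after the copy completes is an assumption, not a value taken from this section.

```python
def record_lost_locations(failed_cache):
    """At failure time, keep only the locations; the failed copy is not read."""
    return sorted(failed_cache)


def resilver(lost_locations, surviving_mirror, target_cache):
    """Copy each lost location from the surviving mirror into target_cache.

    Returns the sequence of grid disk statuses observed; the trailing
    "ACTIVE" is an assumed post-resilvering status.
    """
    statuses = []
    for location in lost_locations:
        statuses.append("ACTIVE -- RESILVERING WORKING")
        target_cache[location] = surviving_mirror[location]
    statuses.append("ACTIVE")
    return statuses


mirror = {10: b"a", 11: b"b", 12: b"c"}
lost = record_lost_locations({11: None, 12: None})  # contents of the failed cache are unusable
repaired = {}
print(resilver(lost, mirror, repaired), repaired)
```

Recording only locations keeps the failure-time work small; all data comes from the mirror, which matches the order of operations in the paragraph above.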

Each storage server has four PCIe cards. Each card has four flash disks (FDOMs), for a total of 16 flash disks per storage server. The four PCIe cards are located in PCI slot numbers 1, 2, 4, and 5.
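The slot layout above determines the full set of flash disk names on a storage server. The sketch below enumerates them, assuming the `FLASH_<slot>_<fdom>` naming seen in the `FLASH_5_3` example and assuming FDOMs are numbered 0 through 3 on each card. The function name `flash_disk_names` is hypothetical.

```python
PCI_SLOTS = (1, 2, 4, 5)  # slots holding the four PCIe flash cards
FDOMS_PER_CARD = 4        # flash disks (FDOMs) on each card


def flash_disk_names(slots=PCI_SLOTS, fdoms=FDOMS_PER_CARD):
    """List FLASH_<slot>_<fdom> names; FDOM numbering from 0 is an assumption."""
    return [f"FLASH_{slot}_{fdom}" for slot in slots for fdom in range(fdoms)]


names = flash_disk_names()
print(len(names), names[:4])
```

Slot 3 is skipped because no flash card occupies it, which is why the total is 4 x 4 = 16 rather than anything derived from a contiguous slot range.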

To identify a failed flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=failed DETAIL

         name:                   FLASH_5_3
         diskType:               FlashDisk
         luns:                   5_3
         makeModel:              "Sun Flash Accelerator F40 PCIe Card"
         physicalFirmware:       TI35
         physicalInsertTime:     2012-07-13T15:40:59-07:00
         physicalSerial:         5L002X4P
         physicalSize:           93.13225793838501G
         slotNumber:             "PCI Slot: 5; FDOM: 3"
         status:                 failed

The name and slotNumber attributes both identify the PCI slot and the FDOM number. In this example, FLASH_5_3 is FDOM 3 on the card in PCI slot 5.
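If you capture the DETAIL output as text, a short script can pull out the slot and FDOM and check that the two attributes agree. This is a sketch that assumes the attribute layout shown in the example output; it is not an Oracle utility, and `parse_detail`, `fru_from_slot_number`, and `fru_from_name` are hypothetical helper names.

```python
import re

# Matches the slotNumber value format shown in the example output.
SLOT_PATTERN = re.compile(r"PCI Slot:\s*(\d+);\s*FDOM:\s*(\d+)")


def parse_detail(text):
    """Collect 'attribute: value' pairs from LIST PHYSICALDISK ... DETAIL output."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Keep single-word attribute names only; skips blank and prompt lines.
        if sep and key and " " not in key:
            attrs[key] = value.strip().strip('"')
    return attrs


def fru_from_slot_number(slot_number):
    """Return (pci_slot, fdom) parsed from a slotNumber value."""
    match = SLOT_PATTERN.search(slot_number)
    if match is None:
        raise ValueError(f"unexpected slotNumber format: {slot_number!r}")
    return int(match.group(1)), int(match.group(2))


def fru_from_name(name):
    """Return (pci_slot, fdom) parsed from a name such as FLASH_5_3."""
    prefix, slot, fdom = name.split("_")
    if prefix != "FLASH":
        raise ValueError(f"not a flash disk name: {name!r}")
    return int(slot), int(fdom)


SAMPLE = """
         name:                   FLASH_5_3
         diskType:               FlashDisk
         slotNumber:             "PCI Slot: 5; FDOM: 3"
         status:                 failed
"""

disk = parse_detail(SAMPLE)
location = fru_from_slot_number(disk["slotNumber"])
# Both attributes should point at the same field replaceable unit.
assert location == fru_from_name(disk["name"])
print(f"Replace FDOM {location[1]} on the card in PCI slot {location[0]}")
```

Splitting each line on the first colon only keeps values such as timestamps and the quoted slotNumber string intact.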

When the server software detects a failure, it generates an alert indicating that the flash disk and its LUN have failed. The alert message includes the PCI slot number of the flash card and the exact FDOM number. These numbers uniquely identify the field replaceable unit (FRU). If you configured the system for alert notification, then the alert is sent in an email message to the designated address.

A flash disk outage can reduce performance and data redundancy, so replace the failed disk at the earliest opportunity. The effect depends on how the flash disk is used. If it is used for flash cache, then the effective cache size for the server is reduced. If it is used for flash log, then the flash log is disabled on the disk, which reduces the effective flash log size. If it is used for grid disks, then the associated Oracle ASM disks are automatically dropped from the Oracle ASM disk group with the FORCE option, and an Oracle ASM rebalance starts to restore data redundancy.
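The paragraph above is essentially a lookup from disk usage to consequence, plus a capacity reduction. The sketch below encodes that lookup and the fraction of flash disks that remain after a failure. The usage labels (`flashcache`, `flashlog`, `griddisk`) and both function names are hypothetical, and the fraction is only a rough proxy for cache loss, since it assumes every flash disk contributes equally.

```python
EFFECTS = {
    "flashcache": "effective flash cache size for the server is reduced",
    "flashlog": "flash log disabled on the disk; effective flash log size is reduced",
    "griddisk": "associated Oracle ASM disks dropped with FORCE; ASM rebalance starts",
}


def failure_impact(uses):
    """Return the consequence described in this section for each use of the failed disk."""
    return [EFFECTS[use] for use in uses]


def remaining_fraction(total_disks, failed_disks):
    """Fraction of flash disks still in service, assuming equal-sized disks."""
    return (total_disks - failed_disks) / total_disks


for effect in failure_impact(["flashcache", "flashlog"]):
    print(effect)
print(remaining_fraction(16, 1))  # one of the 16 flash disks in a storage server has failed
```

With 16 equal flash disks per storage server, a single failure leaves 15/16 of the flash capacity in service on that server until the disk is replaced.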

See Also: