Replacing a Faulty Flash Disk

Caution:

The PCIe cards are not hot pluggable; you must power down a storage server before replacing the flash disks or cards.

Before you perform the following procedure, review the "When Is It Safe to Replace a Faulty Flash Disk?" topic.

To replace a faulty flash disk:

  1. Use the following command to check the cachedBy attribute of all grid disks.
    CellCLI> LIST GRIDDISK ATTRIBUTES name, cachedBy
    

    The cell disk on the flash disk should not appear in the cachedBy attribute of any grid disk. If the flash disk is used for both grid disks and flash cache, then wait until you receive the alert and the cell disk no longer appears in the cachedBy attribute of any grid disk.

  2. Stop all services:
    CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
    

    The preceding command checks whether any disks are offline, in predictive failure status, or must be copied to a mirror. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM, and then stops the services.

    The following error indicates that it might be unsafe to stop the services, because stopping them might force a disk group to dismount:

    Stopping the RS, CELLSRV, and MS services...
    The SHUTDOWN of ALL services was not successful.
    CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be
    forced to dismount due to reduced redundancy.
    Getting the state of CELLSRV services... running
    Getting the state of MS services... running
    Getting the state of RS services... running
    

    If this error occurs, then restore Oracle ASM disk group redundancy, and retry the command when the disk status is normal for all disks.

  3. Shut down the server.
  4. Replace the failed flash disk. Use the PCI number and FDOM number to locate the failed disk. A white cell LED is lit to help you locate the affected server.
  5. Power up the server. The services start automatically. As part of the server startup, all grid disks are automatically brought online in Oracle ASM.
  6. Verify that all grid disks are online:
    CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
    

    Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.
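
The checks in steps 1 and 6 can be scripted once the CellCLI output has been captured as text. The following Python sketch is illustrative only: it assumes each output line holds a grid disk name followed by whitespace and the attribute value, and that cachedBy is printed as an optionally quoted, comma-separated list. The function names and the sample disk names in the usage note are hypothetical; verify the actual output format on your system before relying on it.

```python
def parse_attributes(output: str) -> dict[str, str]:
    """Map grid disk name -> raw attribute text.

    Assumes two-column output: name, whitespace, then the value.
    """
    result = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        # An absent value (single-column line) is treated as empty.
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        result[name] = value
    return result


def disks_cached_by(output: str, cell_disk: str) -> list[str]:
    """Step 1: grid disks whose cachedBy attribute still lists cell_disk."""
    hits = []
    for name, value in parse_attributes(output).items():
        cached_by = [item.strip() for item in value.split(",") if item.strip()]
        if cell_disk in cached_by:
            hits.append(name)
    return hits


def disks_not_ready(output: str) -> list[str]:
    """Step 6: grid disks whose asmmodestatus is neither ONLINE nor UNUSED."""
    ready = {"ONLINE", "UNUSED"}
    return [
        name
        for name, status in parse_attributes(output).items()
        if status.upper() not in ready
    ]
```

For example, with hypothetical captured output in which DATA_CD_00_cell01 lists FD_00_cell01 in cachedBy, disks_cached_by(output, "FD_00_cell01") returns that grid disk name, and an empty list means the flash disk is no longer caching any grid disk. Likewise, an empty list from disks_not_ready means every grid disk reports ONLINE or UNUSED.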

The system automatically uses the new flash disk, as follows:

  • If the flash disk is used for flash cache, then the effective cache size increases.

  • If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk.

  • If the grid disks were part of an Oracle ASM disk group, then they are added back to the disk group. The data is rebalanced on them, based on the disk group redundancy and the ASM_POWER_LIMIT parameter.