Removing an Underperforming Flash Disk

An underperforming flash disk can degrade the performance of the other, healthy flash disks, so you should remove it. To find such disks, see "Identifying Underperforming Flash Disks".

To remove an underperforming flash disk:

  1. If the flash disk is used for flash cache:

    1. Ensure that any data not yet synchronized with the grid disks (dirty data) is flushed from flash cache to the grid disks:

      CellCLI> ALTER FLASHCACHE ... FLUSH
      
    2. Drop the flash cache and create a new one. Do not include the bad flash disk when creating the flash cache.

      CellCLI> DROP FLASHCACHE
      CellCLI> CREATE FLASHCACHE CELLDISK='fd1,fd2,fd3,fd4, ...'
      
  2. If the flash disk is used for grid disks, then direct Oracle ASM to stop using the bad disk immediately:

    SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE;
    

    If partner disks of the bad disk are offline, then the DROP command with the FORCE option might fail. If the previous command fails, then do one of the following:

    • Restore Oracle ASM data redundancy by correcting the other server or disk failures. Then retry the DROP...FORCE command.

    • Direct Oracle ASM to rebalance the data off the bad disk:

      SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name NOFORCE;
      
  3. Wait until the Oracle ASM disks associated with the bad flash disk are dropped successfully. The storage server software automatically sends an alert when it is safe to replace the flash disk.

  4. Stop the services:

    CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
    

    The preceding command checks whether any disks are offline, are in predictive failure status, or must be copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM and stops the services.

    The following error indicates that stopping the services might cause redundancy problems and force a disk group to dismount:

    Stopping the RS, CELLSRV, and MS services...
    The SHUTDOWN of ALL services was not successful.
    CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be
    forced to dismount due to reduced redundancy.
    Getting the state of CELLSRV services... running
    Getting the state of MS services... running
    Getting the state of RS services... running
    

    If this error occurs, then restore Oracle ASM disk group redundancy. Retry the command when the status is normal for all disks.

  5. Shut down the server. See "Shutting Down a Storage Server".

  6. Remove the bad flash disk and replace it with a new flash disk.

  7. Power up the server. The services start automatically. As part of the server startup, all grid disks are automatically brought online in Oracle ASM.

  8. Add the new flash disk to flash cache:

    CellCLI> DROP FLASHCACHE
    CellCLI> CREATE FLASHCACHE ALL
    
  9. Verify that all grid disks are online:

    CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
    

    Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.
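The CELLDISK list in step 1 must name every flash cell disk except the bad one. As a hedged illustration only, the following Python sketch assembles that CREATE FLASHCACHE command from a list of flash cell disk names. The function name build_create_flashcache and the sample names are hypothetical; take the real names from your cell (for example, from LIST CELLDISK output) and review the generated command before running it in CellCLI.

```python
def build_create_flashcache(flash_celldisks, bad_celldisk):
    """Return a CellCLI CREATE FLASHCACHE command that omits the bad flash disk.

    flash_celldisks: names of all flash cell disks on the cell (hypothetical input).
    bad_celldisk: name of the underperforming flash cell disk to exclude.
    """
    if bad_celldisk not in flash_celldisks:
        raise ValueError(f"{bad_celldisk!r} is not in the flash cell disk list")
    # Keep the original order and leave out only the bad disk.
    remaining = [d for d in flash_celldisks if d != bad_celldisk]
    if not remaining:
        raise ValueError("no healthy flash cell disks left for flash cache")
    return "CREATE FLASHCACHE CELLDISK='" + ",".join(remaining) + "'"


if __name__ == "__main__":
    disks = ["fd1", "fd2", "fd3", "fd4", "fd5"]  # placeholder names
    print(build_create_flashcache(disks, "fd3"))
    # prints: CREATE FLASHCACHE CELLDISK='fd1,fd2,fd4,fd5'
```

The helper only builds text; it does not run CellCLI, so you still issue DROP FLASHCACHE and the generated command yourself.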
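If you script step 4, you need to detect the CELL-01548 error and learn which disk groups lack redundancy before retrying. The following Python sketch is a hypothetical helper, not part of the storage server software: it assumes the error text has the form shown in the sample output of step 4 (a comma-separated disk group list followed by "may be") and extracts the disk group names from the captured output.

```python
import re

# Matches the CELL-01548 message from step 4; DOTALL lets the text wrap across lines.
_CELL_01548 = re.compile(
    r"CELL-01548: Unable to shut down CELLSRV because disk groups?\s+"
    r"(?P<groups>.+?)\s+may\s+be",
    re.DOTALL,
)


def blocked_disk_groups(shutdown_output):
    """Return the disk groups named in a CELL-01548 error, or [] if there is none."""
    match = _CELL_01548.search(shutdown_output)
    if match is None:
        return []
    # The group list is comma-separated, for example "DATA, RECO".
    return [g.strip() for g in match.group("groups").split(",") if g.strip()]
```

For the sample output in step 4, the function returns ["DATA", "RECO"]; an empty list means the message was not found, not that the shutdown succeeded, so check the rest of the output as well.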
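The check in step 9 can be automated by parsing the LIST GRIDDISK output. The following Python sketch assumes each non-empty output line is whitespace-separated with asmmodestatus as the last column (for example, the output of LIST GRIDDISK ATTRIBUTES name, asmmodestatus); the function name and the grid disk names in the example are hypothetical.

```python
READY_STATES = {"ONLINE", "UNUSED"}


def griddisks_not_ready(listing):
    """Return (name, status) pairs whose asmmodestatus is not ONLINE or UNUSED.

    Assumes asmmodestatus is the last whitespace-separated column on each line.
    For a single-column listing (asmmodestatus only), the name is reported as None.
    """
    pending = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        status = fields[-1].upper()
        name = fields[0] if len(fields) > 1 else None
        if status not in READY_STATES:
            pending.append((name, status))
    return pending


if __name__ == "__main__":
    sample = """
DATA_CD_00_cel01   ONLINE
RECO_CD_00_cel01   SYNCING
DBFS_CD_02_cel01   UNUSED
"""
    print(griddisks_not_ready(sample))  # [('RECO_CD_00_cel01', 'SYNCING')]
```

To wait as step 9 describes, rerun the LIST command and repeat the check until the function returns an empty list.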

The new flash disk is incorporated as follows:

  • If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk.

  • If these grid disks were part of an Oracle ASM disk group and DROP...FORCE was used in Step 2, then they are added back to the disk group and the data is rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.

  • If DROP...NOFORCE was used in Step 2, then you must manually add the grid disks back to the Oracle ASM disk group.
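The choice between automatic and manual re-addition depends only on the DROP option used in Step 2. The following Python sketch encodes that rule and, for the NOFORCE case, drafts ALTER DISKGROUP ... ADD DISK statements for you to review. It rests on two assumptions you should verify: the mapping from grid disk to disk group is supplied by you (hypothetical input), and the 'o/*/grid_disk_name' path pattern matches your ASM discovery string (ASM_DISKSTRING).

```python
from collections import defaultdict


def manual_readd_statements(drop_option, griddisk_to_diskgroup):
    """Return SQL to add re-created grid disks back to Oracle ASM, if needed.

    drop_option: "FORCE" or "NOFORCE", the option used with DROP DISK in Step 2.
    griddisk_to_diskgroup: hypothetical mapping of grid disk name -> disk group name.
    After DROP...FORCE the disks are added back automatically, so nothing is returned.
    """
    option = drop_option.strip().upper()
    if option == "FORCE":
        return []
    if option != "NOFORCE":
        raise ValueError("drop_option must be FORCE or NOFORCE")
    by_group = defaultdict(list)
    for griddisk, diskgroup in sorted(griddisk_to_diskgroup.items()):
        # Assumed discovery path pattern; confirm it against ASM_DISKSTRING.
        by_group[diskgroup].append(f"'o/*/{griddisk}'")
    # One statement per disk group, listing all of that group's grid disks.
    return [
        f"ALTER DISKGROUP {group} ADD DISK {', '.join(paths)};"
        for group, paths in sorted(by_group.items())
    ]
```

Generating one statement per disk group lets Oracle ASM perform a single rebalance for all disks added to that group, rather than one rebalance per disk.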