Removing an Underperforming Flash Disk
A bad flash disk can degrade the performance of the other, healthy flash disks, so you should remove it. See "Identifying Underperforming Flash Disks".
To remove an underperforming flash disk:
-
If the flash disk is used for flash cache:
-
Ensure that data not synchronized with the disk (dirty data) is flushed from flash cache to the grid disks:
CellCLI> ALTER FLASHCACHE ... FLUSH
-
Disable the flash cache and create a new one. Do not include the bad flash disk when creating the flash cache.
CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE CELLDISK='fd1,fd2,fd3,fd4, ...'
-
If the flash disk is used for grid disks, then direct Oracle ASM to stop using the bad disk immediately:
SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name FORCE
Offline partners might cause the DROP command with the FORCE option to fail. If the previous command fails, do one of the following:
-
Restore Oracle ASM data redundancy by correcting the other server or disk failures. Then retry the DROP...FORCE command.
-
Direct Oracle ASM to rebalance the data off the bad disk:
SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name NOFORCE
-
Wait until the Oracle ASM disks associated with the bad flash disk are dropped successfully. The storage server software automatically sends an alert when it is safe to replace the flash disk.
-
Stop the services:
CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
The preceding command checks if any disks are offline, in predictive failure status, or need to be copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM and stops the services.
The following error indicates that stopping the services might cause redundancy problems and force a disk group to dismount:
Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of ALL services was not successful.
CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be forced to dismount due to reduced redundancy.
Getting the state of CELLSRV services... running
Getting the state of MS services... running
Getting the state of RS services... running
If this error occurs, then restore Oracle ASM disk group redundancy. Retry the command when the status is normal for all disks.
-
Shut down the server. See "Shutting Down a Storage Server".
-
Remove the bad flash disk, and replace it with a new flash disk.
-
Power up the server. The services start automatically. As part of the server startup, all grid disks are automatically brought online in Oracle ASM.
-
Add the new flash disk to flash cache:
CellCLI> DROP FLASHCACHE
CellCLI> CREATE FLASHCACHE ALL
-
Verify that all grid disks are online:
CellCLI> LIST GRIDDISK ATTRIBUTES asmmodestatus
Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.
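The final verification step can be scripted. The following is a minimal sketch, not part of the storage server software: it assumes you captured the output of LIST GRIDDISK ATTRIBUTES name, asmmodestatus (the name attribute is added here so that each status can be matched to a disk) and that each output line holds a grid disk name followed by its status. The helper name pending_grid_disks and the sample disk names are hypothetical.

```python
# Sketch: report grid disks that are not yet ONLINE or UNUSED.
# Assumed input: text output of
#   CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
# with one "name status" pair per line.

READY_STATES = {"ONLINE", "UNUSED"}


def pending_grid_disks(cellcli_output):
    """Return {grid_disk_name: asmmodestatus} for disks that are not ready."""
    pending = {}
    for line in cellcli_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and anything that is not "name status"
        name, status = fields
        if status.upper() not in READY_STATES:
            pending[name] = status
    return pending


# Illustrative sample only; the disk names are hypothetical.
sample = """
     DATA_FD_00_cell01    ONLINE
     DATA_FD_01_cell01    SYNCING
     RECO_FD_00_cell01    UNUSED
"""

print(pending_grid_disks(sample))  # {'DATA_FD_01_cell01': 'SYNCING'}
```

An empty result means that every listed grid disk reports ONLINE or UNUSED. Otherwise, rerun the LIST GRIDDISK command after a short wait and check again.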
The flash disks are added as follows:
-
If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk.
-
If these grid disks were part of an Oracle ASM disk group and DROP...FORCE was used in Step 2, then they are added back to the disk group and the data is rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.
-
If DROP...NOFORCE was used in Step 2, then you must manually add the grid disks back to the Oracle ASM disk group.
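For the NOFORCE case, the manual add can be prepared ahead of time. The sketch below only builds the SQL text and does not connect to Oracle ASM. It assumes the usual Exadata disk path form o/cell_IP_address/grid_disk_name. The function name add_disk_statements, the disk group name, the IP address, and the grid disk names are all hypothetical, so substitute the values from your own system and review each statement before running it.

```python
# Sketch: build ALTER DISKGROUP ... ADD DISK statements for re-created grid disks.
# Assumption: grid disks are addressed as 'o/<cell IP address>/<grid disk name>'.


def add_disk_statements(diskgroup, cell_ip, grid_disks):
    """Return one ADD DISK statement per grid disk, in the order given."""
    statements = []
    for disk in grid_disks:
        path = f"o/{cell_ip}/{disk}"
        statements.append(f"ALTER DISKGROUP {diskgroup} ADD DISK '{path}'")
    return statements


# Hypothetical values for illustration only.
for stmt in add_disk_statements(
    "DATA", "192.168.10.9", ["DATA_FD_00_cell01", "DATA_FD_01_cell01"]
):
    print("SQL>", stmt)
```

Generating the statements as text keeps a reviewable record of exactly which grid disks go back into which disk group.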