3.3.3 Replacing a Hard Disk Due to Disk Problems

You may need to replace a hard disk because the disk is in warning - predictive failure status.

The predictive failure status indicates that the hard disk will soon fail, and should be replaced at the earliest opportunity. The Oracle ASM disks associated with the grid disks on the hard drive are automatically dropped, and an Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.

If the drop does not complete before the hard disk fails, then refer to Replacing a Hard Disk Due to Disk Failure.

An alert is sent when the disk is removed. After replacing the hard disk, the grid disks and cell disks that existed on the previous disk in the slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.

Note:

On Oracle Exadata Storage Servers running Oracle Exadata System Software release 12.1.2.0 with Oracle Database release 12.1.0.2 with BP4, Oracle ASM sends an e-mail about the status of a rebalance operation. In earlier releases, the administrator had to check the status of the operation.

For earlier releases, check the rebalance operation status as described in Checking the Status of an ASM Rebalance Operation.
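
When the rebalance status must be checked manually, the rows read from the V$ASM_OPERATION view can be classified with a small helper. The following Python is a minimal sketch, not an Oracle-supplied tool: the function name `rebalance_state` is hypothetical, the rows are assumed to have been fetched already (by any client) as dictionaries keyed by the column names OPERATION, STATE, EST_MINUTES, and ERROR_CODE, and the three return values are labels chosen for this example.

```python
def rebalance_state(rows):
    """Classify rows read from V$ASM_OPERATION for one disk group.

    rows: list of dicts keyed by column name. How the rows were fetched
    is outside the scope of this sketch.
    """
    # Keep only rebalance rows; other operation types are ignored here.
    rebal = [r for r in rows if r.get("OPERATION") == "REBAL"]
    if not rebal:
        # No active rebalance for this disk group. This alone does not
        # prove that an earlier rebalance succeeded.
        return "idle"
    if any(r.get("ERROR_CODE") for r in rebal):
        return "error"
    return "running"


# Example rows, invented for illustration only.
running = [{"OPERATION": "REBAL", "STATE": "RUN",
            "EST_MINUTES": 12, "ERROR_CODE": None}]
print(rebalance_state(running))
```

An "idle" result only means that no rebalance is currently listed. Follow the referenced procedure to confirm that the operation completed successfully.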

  1. Determine which disk is the failing disk.
    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
            "warning - predictive failure" DETAIL
    

    The following is an example of the output. The slot number shows the location of the disk, and the status shows the disk is expected to fail.

    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
             "warning - predictive failure" DETAIL
             name:                   28:3
             deviceId:               19
             diskType:               HardDisk
             enclosureDeviceId:      28
             errMediaCount:          0
             errOtherCount:          0
             foreignState:           false
             luns:                   0_3
             makeModel:              "SEAGATE ST360057SSUN600G"
             physicalFirmware:       0705
             physicalInterface:      sas
             physicalSerial:         E07L8E
             physicalSize:           558.9109999993816G
             slotNumber:             3
             status:                 warning - predictive failure
    
  2. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  3. Wait until the Oracle ASM disks associated with the grid disks on the hard disk have been successfully dropped. To determine whether the Oracle ASM disks have been dropped, query the V$ASM_DISK_STAT view on the Oracle ASM instance.

    Caution:

    On all systems prior to Oracle Exadata Database Machine X7, the disks in the first two slots are system disks, which store the operating system and Oracle Exadata System Software. At least one system disk must be in working condition to keep the server running.

    Wait until ALTER CELL VALIDATE CONFIGURATION shows no mdadm errors, which indicates the system disk resynchronization has completed, before replacing the other system disk.

  4. Replace the hard disk on Oracle Exadata Storage Server and wait for three minutes. The hard disk is hot-pluggable, and can be replaced when the power is on.
  5. Confirm the disk is online.

    When you replace a hard disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long. Use the LIST PHYSICALDISK command to confirm that the status is normal.

    CellCLI> LIST PHYSICALDISK WHERE name=28:3 ATTRIBUTES status
    
  6. Verify the firmware is correct using the ALTER CELL VALIDATE CONFIGURATION command.
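
Steps 1 and 5 both depend on reading LIST PHYSICALDISK output. As a minimal sketch, assuming the DETAIL output has already been captured as text, the following Python parses it into dictionaries and reports the disks in predictive failure. The names `parse_detail` and `predictive_failures` are hypothetical helpers, not part of Oracle Exadata System Software, and the parser assumes one `attribute: value` pair per line, with each disk starting at its `name` line, as in the sample output in step 1.

```python
PREDICTIVE = "warning - predictive failure"

# Attribute lines copied from the sample output in step 1 (abridged).
SAMPLE = """
         name:                   28:3
         deviceId:               19
         diskType:               HardDisk
         makeModel:              "SEAGATE ST360057SSUN600G"
         slotNumber:             3
         status:                 warning - predictive failure
"""


def parse_detail(output):
    """Parse DETAIL-style text into one dict per listed object."""
    disks = []
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blanks, echoed prompts, and lines without an attribute.
        if not line or line.startswith("CellCLI>") or ":" not in line:
            continue
        # Split on the first colon only, so a value such as 28:3 survives.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip().strip('"')
        if key == "name" or not disks:
            disks.append({})  # a name line starts a new object
        disks[-1][key] = value
    return disks


def predictive_failures(disks):
    """Return (name, slotNumber) for disks in predictive failure."""
    return [(d["name"], d.get("slotNumber"))
            for d in disks if d.get("status") == PREDICTIVE]


disks = parse_detail(SAMPLE)
print(predictive_failures(disks))  # prints [('28:3', '3')]
```

The same parser could be applied to output captured after the replacement, to check that the `status` attribute of the new disk no longer matches the predictive failure value.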

See Also: