Replacing a Faulty Physical Disk

A physical disk with the status "warning - predictive failure" is expected to fail soon. Replace it at the earliest opportunity.

If the disk fails before you replace it, see "Replacing a Failed Physical Disk".

To replace a disk before it fails:

  1. Identify the faulty disk:
    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
            "warning - predictive failure" DETAIL
    
             name:                   28:3
             deviceId:               19
             diskType:               HardDisk
             enclosureDeviceId:      28
             errMediaCount:          0
             errOtherCount:          0
             foreignState:           false
             luns:                   0_3
             makeModel:              "SEAGATE ST360057SSUN600G"
             physicalFirmware:       0705
             physicalInterface:      sas
             physicalSerial:         E07L8E
             physicalSize:           558.9109999993816G
             slotNumber:             3
             status:                 warning - predictive failure
    

    In the sample output from the previous command, the slot number shows the location of the disk, and the status shows that the disk is expected to fail.

  2. Ensure that the blue "OK to Remove" LED on the disk is lit before you remove the disk.
  3. Wait while the affected Oracle ASM disks are dropped. To check the status, query the V$ASM_DISK_STAT view on the Oracle ASM instance.

    Caution:

    The disks in the first two slots are system disks, which store the operating system and the Recovery Appliance storage server software. One system disk must be in working condition for the server to operate.

    Before replacing the other system disk, wait until ALTER CELL VALIDATE CONFIGURATION shows no RAID mdadm errors. This output indicates that the system disk resynchronization is complete.

    See Also:

    Oracle Database Reference for information about querying the V$ASM_DISK_STAT view

  4. Replace the physical disk on the storage server and wait three minutes. The physical disk is hot pluggable, and you can replace it when the power is on.
  5. Confirm that the disk is online and its status is NORMAL:
    CellCLI> LIST PHYSICALDISK WHERE name=28:3 ATTRIBUTES status
    

    When you replace a physical disk, the RAID controller must acknowledge the replacement disk before you can use it. Acknowledgment is quick.

  6. Verify that the firmware is correct:
    CellCLI> ALTER CELL VALIDATE CONFIGURATION
    
  7. Re-create the grid disks and cell disks that existed on the previous disk in that slot. See "About Rebalancing the Data".
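
The checks in steps 1 and 3 can be scripted once the DETAIL output has been captured as text. The following Python is a minimal sketch, not part of the storage server software. The function names (parse_detail, needs_replacement, is_system_disk) are hypothetical, the sample record is an abridged copy of the output in step 1, and the sketch assumes that slots are numbered from 0, so that the "first two slots" named in the Caution are slots 0 and 1.

```python
# Abridged copy of the LIST PHYSICALDISK ... DETAIL output shown in step 1.
SAMPLE = """\
         name:                   28:3
         diskType:               HardDisk
         makeModel:              "SEAGATE ST360057SSUN600G"
         slotNumber:             3
         status:                 warning - predictive failure
"""

PREDICTIVE_FAILURE = "warning - predictive failure"


def parse_detail(text):
    """Turn the 'attribute: value' lines of one DETAIL record into a dict."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so a value such as 28:3 stays intact.
        key, sep, value = line.strip().partition(":")
        if sep and key:
            record[key.strip()] = value.strip().strip('"')
    return record


def needs_replacement(record):
    """True when the disk reports the predictive failure status."""
    return record.get("status") == PREDICTIVE_FAILURE


def is_system_disk(record, system_slots=(0, 1)):
    """True when the disk is in a system disk slot.

    Assumption: slots are numbered from 0, so the first two are 0 and 1.
    """
    return int(record["slotNumber"]) in system_slots


record = parse_detail(SAMPLE)
if needs_replacement(record):
    kind = "system" if is_system_disk(record) else "data"
    print(f"Replace {kind} disk {record['name']} in slot {record['slotNumber']}")
```

If is_system_disk returns True, follow the Caution in step 3 and confirm that the other system disk has finished resynchronizing before you remove the disk.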

See Also: