Solaris Volume Manager Administration Guide

Replacing Disks

This section describes how to replace disks in a Solaris Volume Manager environment.


Caution –

If you have soft partitions on a failed disk or on volumes built on a failed disk, you must put the new disk in the same physical location, with the same c*t*d* number as the disk it replaces.


How to Replace a Failed Disk

  1. Identify the failed disk to be replaced by examining the /var/adm/messages file and the metastat command output.

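    A volume with a failed component reports a state of "Needs maintenance" in the metastat output, and the failing device usually logs errors in /var/adm/messages. As a rough sketch (the device name c0t1d0 and the exact messages depend on your configuration), commands similar to the following can help locate the failing disk.


    # grep -i c0t1d0 /var/adm/messages
    # metastat | grep -i maintenance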
  2. Locate any state database replicas that might have been placed on the failed disk.

    Use the metadb command to find the replicas.

    The metadb command might report errors for the state database replicas located on the failed disk. In this example, c0t1d0 is the problem device.


    # metadb
       flags       first blk        block count
      a m     u        16               1034            /dev/dsk/c0t0d0s4
      a       u        1050             1034            /dev/dsk/c0t0d0s4
      a       u        2084             1034            /dev/dsk/c0t0d0s4
      W   pc luo       16               1034            /dev/dsk/c0t1d0s4
      W   pc luo       1050             1034            /dev/dsk/c0t1d0s4
      W   pc luo       2084             1034            /dev/dsk/c0t1d0s4

    The output shows three state database replicas on slice 4 of each of the local disks, c0t0d0 and c0t1d0. The W in the flags field of the c0t1d0s4 slice indicates that the device has write errors. The three replicas on the c0t0d0s4 slice are still good.

  3. Record the slice name where the state database replicas reside and the number of state database replicas, then delete the state database replicas.

    The number of state database replicas is obtained by counting the number of appearances of a slice in the metadb command output. In this example, the three state database replicas that exist on c0t1d0s4 are deleted.


    # metadb -d c0t1d0s4
    

    Caution –

    If, after deleting the bad state database replicas, you are left with three or fewer, add more state database replicas before continuing. This will help ensure that configuration information remains intact.


  4. Locate and delete any hot spares on the failed disk. Use the metastat command to find hot spares. In this example, hot spare pool hsp000 included c0t1d0s6, which is then deleted from the pool.


    # metahs -d hsp000 c0t1d0s6
    hsp000: Hotspare is deleted
  5. Physically replace the failed disk.

  6. Logically replace the failed disk using the devfsadm command, cfgadm command, luxadm command, or other commands as appropriate for your hardware and environment.

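    For example, on a system where the disk is managed through the cfgadm command, a sequence similar to the following might be used after the physical replacement. The attachment point c0::dsk/c0t1d0 is illustrative only; the exact procedure depends on your hardware and environment.


    # cfgadm -al
    # cfgadm -c configure c0::dsk/c0t1d0
    # devfsadm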
  7. Update the Solaris Volume Manager state database with the device ID for the new disk using the metadevadm -u cntndn command.

    In this example, the new disk is c0t1d0.


    # metadevadm -u c0t1d0
    
  8. Repartition the new disk.

    Use the format command or the fmthard command to partition the disk with the same slice information as the failed disk. If you have the prtvtoc output from the failed disk, you can format the replacement disk with fmthard -s /tmp/failed-disk-prtvtoc-output, as shown in the sketch below.
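
    For example, assuming the prtvtoc output of the failed disk was saved to /tmp/failed-disk-prtvtoc-output before the failure (or can be captured from a surviving disk with an identical layout, such as c0t0d0 here), the replacement disk can be labeled as follows. Note that fmthard takes the raw device of the replacement disk as its final argument.


    # prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/failed-disk-prtvtoc-output
    # fmthard -s /tmp/failed-disk-prtvtoc-output /dev/rdsk/c0t1d0s2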

  9. If you deleted state database replicas, add the same number back to the appropriate slice.

    In this example, /dev/dsk/c0t1d0s4 is used.


    # metadb -a -c 3 c0t1d0s4
    
  10. If any slices on the disk are components of RAID 5 volumes or are components of RAID 0 volumes that are in turn submirrors of RAID 1 volumes, run the metareplace -e command for each slice.

    In this example, /dev/dsk/c0t1d0s4 and mirror d10 are used.


    # metareplace -e d10 c0t1d0s4
    
  11. If any soft partitions are built directly on slices on the replaced disk, run the metarecover -d -p command on each slice containing soft partitions to regenerate the extent headers on disk.

    In this example, /dev/dsk/c0t1d0s4 needs to have the soft partition markings on disk regenerated. The slice is scanned and the markings are reapplied, based on the information in the state database replicas.


    # metarecover c0t1d0s4 -d -p 
    
  12. If any soft partitions on the disk are components of RAID 5 volumes or are components of RAID 0 volumes that are submirrors of RAID 1 volumes, run the metareplace -e command for each slice.

    In this example, /dev/dsk/c0t1d0s4 and mirror d10 are used.


    # metareplace -e d10 c0t1d0s4
    
  13. If any RAID 0 volumes have soft partitions built on them, run the metarecover command for each such RAID 0 volume.

    In this example, RAID 0 volume d17 has soft partitions built on it.


    # metarecover d17 -m -p
    
  14. Replace hot spares that were deleted, and add them to the appropriate hot spare pool or pools.


    # metahs -a hsp000 c0t1d0s6
    hsp000: Hotspare is added
  15. If soft partitions or non-redundant volumes were affected by the failure, restore data from backups. If only redundant volumes were affected, then validate your data.

    Check the user/application data on all volumes. You might have to run an application-level consistency checker or use some other method to check the data.
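
    For example, if a redundant volume such as d10 contains a UFS file system, a read-only file system check is one way to verify its structural consistency. This is only an illustration; it does not validate application-level data, and the file system should ideally be unmounted while it is checked.


    # fsck -n /dev/md/rdsk/d10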

Example—Replacing a Failed Disk

In the following example, a disk (/dev/dsk/c0t1d0) has failed and needs to be replaced.


panic[cpu0]/thread=70a41e00: md: state database problem


ok boot -s
Resetting ... 


Jun  7 08:57:25 su: 'su root' succeeded for root on /dev/console
Sun Microsystems Inc.   SunOS 5.9       s81_39  May 2002
# metadb
        flags           first blk       block count
     a m  p  lu         16              8192            /dev/dsk/c0t0d0s7
     a    p  l          8208            8192            /dev/dsk/c0t0d0s7
     a    p  l          16400           8192            /dev/dsk/c0t0d0s7
#  
