Sun Cluster 2.2 System Administration Guide

11.6.6 How to Replace a SPARCstorage Array Disk (SSVM or CVM)

In an SSVM or CVM configuration, you can replace a SPARCstorage Array disk without halting the system, as long as the volumes on the affected disk are mirrored.


Note -

If you need to replace a disk in a bootable SPARCstorage Array, do not remove the SSA tray containing a host's boot disk. Instead, shut down the host whose boot disk is in that tray, and let the cluster software reconfigure the surviving nodes so that failover takes effect before you service the faulty disk. Refer to the SPARCstorage Array User's Guide for more information.


These are the detailed steps to replace a multihost disk in an SSVM environment using SPARCstorage Array 100 series disks.

  1. If the disk to be replaced is currently the quorum device, use the scconf -q command to change the quorum device to a different disk.
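
    For example, the change might look like the following. The cluster name and host names shown here are placeholders, and the exact scconf arguments depend on your configuration; see the scconf(1M) man page for your release.

    # scconf sc-cluster -q phys-hahost1 phys-hahost2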

  2. Identify all the volumes and corresponding plexes on the disks in the tray which contains the faulty disk.

    1. From the physical device address cNtNdN, obtain the controller number and the target number.

      For example, if the device address is c3t2d0, the controller number is 3 and the target is 2.

    2. Identify devices from a vxdisk list output.

      If the target is 0 or 1, identify all devices with physical addresses beginning with cNt0 and cNt1, where N is the controller number. If the target is 2 or 3, identify all devices with physical addresses beginning with cNt2 and cNt3. If the target is 4 or 5, identify all devices with physical addresses beginning with cNt4 and cNt5. Here is an example of how vxdisk can be used to obtain the information.

      # vxdisk -g diskgroup -q list | egrep c3t2\|c3t3 | nawk '{print $3}'
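
      With disk media names that happen to match the device names, as in the vxprint example below, the output might look like the following; your disk media names may differ.

      c3t2d0
      c3t3d0
      c3t3d1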
      
    3. Record the disk media name for the faulty disk from the output of this command.

      You will need this name in Step 10.

    4. Identify all plexes on the above devices by using the appropriate version (csh, ksh, or Bourne shell) of the following command.

      PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in
       ("c3t2d0","c3t3d0","c3t3d1")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`
      

      For csh, the syntax is set PLLIST = `...`. For ksh, the syntax is export PLLIST=`...`. The Bourne shell requires the separate command export PLLIST after the variable is set. Examples for each shell follow.
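
      For reference, here is what the complete assignment might look like in each shell, using the same example disk group and disk media names as above.

      Bourne shell (assign, then export):

      PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c3t2d0","c3t3d0","c3t3d1")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`
      export PLLIST

      Korn shell (assign and export in one command):

      export PLLIST=`vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c3t2d0","c3t3d0","c3t3d1")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`

      C shell (use set):

      set PLLIST = `vxprint -ptq -g diskgroup -e '(aslist.sd_dm_name in ("c3t2d0","c3t3d0","c3t3d1")) && (pl_kstate=ENABLED)' | nawk '{print $2}'`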

  3. After you have set the variable, stop I/O to the volumes whose components (subdisks) are on the tray.

    Make sure all volumes associated with that tray are detached (mirrored or RAID5 configurations) or stopped (simple plexes). Issue the following command to detach a mirrored plex.

    # vxplex det ${PLLIST}
    

    An alternate command for detaching each plex in a tray is:

    # vxplex -g diskgroup -v volume det plex
    

    To stop I/O to simple plexes, unmount any file systems or stop database access.
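
    For example, if a simple (unmirrored) volume on this tray contains a file system, you might unmount the file system and then stop the volume. The mount point and volume name shown here are placeholders only.

    # umount /mountpoint
    # vxvol -g diskgroup stop volumename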


    Note -

    Mirrored volumes will still be active because the other half of the mirror is still available.


  4. Remove the disk from the disk group.

    # vxdg -g diskgroup rmdisk diskname
    
  5. Spin down the disks in the tray.

    # luxadm stop -t tray controller
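
    For example, if the faulty disk is c3t2d0, targets 2 and 3 correspond to the second tray of a SPARCstorage Array 100 series unit, so the command might be the following; the same tray and controller arguments apply to the luxadm start command in Step 7.

    # luxadm stop -t 2 c3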
    
  6. Replace the faulty disk.

  7. Spin up the drives.

    # luxadm start -t tray controller
    
  8. Initialize the replacement disk.

    # vxdisksetup -i devicename
    
  9. Scan the current disk configuration again.

    Enter the following commands on all nodes in the cluster.

    # vxdctl enable
    # vxdisk -a online
    
  10. Add the new disk to the disk group.

    The device-media-name is the disk media name recorded in Step 2.

    # vxdg -g diskgroup -k adddisk device-media-name=device-name
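
    For example, if the disk media name recorded in Step 2 is c3t2d0 and the replacement disk has the same physical address, the command might be:

    # vxdg -g diskgroup -k adddisk c3t2d0=c3t2d0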
    
  11. Resynchronize the volumes.

    # vxrecover -g diskgroup -b
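
    Because the -b option runs the resynchronization in the background, you may want to verify later that the affected plexes have returned to the ACTIVE state, for example:

    # vxprint -g diskgroup -ht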