Sun Cluster 2.2 Cluster Volume Manager Guide

2.4.1 Disk/Controller/Storage Device Failure

Failure of a disk, controller, or other storage device may make one or more devices inaccessible from one or more nodes. If a device was being accessed at the time of failure, that device is detached from the disk group. Mirrored devices should be laid out so that no single failure can make all mirrors unavailable.

Recovery begins with making the failed device(s) accessible again and then proceeds as follows:

  1. Rectify the fault condition (hardware and/or software) and make sure the devices are accessible again.

  2. Run vxdctl enable on all nodes of the cluster.

  3. Run vxreattach on the master node.

  4. Run vxreattach on the other nodes that have non-shared disk groups.

  5. Verify (by running vxprint) that the devices have been reattached. Under certain circumstances, vxreattach may not reattach disks that were removed and/or replaced. These disks must be manually reattached using vxdg, vxdiskadm, or vxva.

  6. Run vxrecover -sb on the master node.

  7. Run vxrecover -g <dg> -sb on each other node that has a non-shared disk group.
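
As a rough illustration of how steps 2 through 7 divide between the master and the other nodes, the sketch below prints the commands a given node would run, in order. It only prints them; nothing is executed. The function name recovery_plan, its role argument (master or slave), and the disk group names passed to it are hypothetical and are not part of the product. Only the vx* command lines themselves are taken from the procedure above. Step 1 (fixing the fault) is assumed to be complete.

```shell
#!/bin/sh
# Print, in order, the recovery commands that one node should run.
# Usage: recovery_plan master|slave [non-shared-disk-group ...]
# NOTE: the function name, the role argument, and the disk group names
# are illustrative assumptions, not part of the product.
recovery_plan() {
    if [ "$#" -lt 1 ]; then
        echo "usage: recovery_plan master|slave [disk-group ...]" >&2
        return 1
    fi
    role=$1
    shift

    # Step 2: run on every node.
    echo "vxdctl enable"

    # Steps 3-4: the master always reattaches; another node does so
    # only if it has non-shared disk groups.
    if [ "$role" = "master" ] || [ "$#" -gt 0 ]; then
        echo "vxreattach"
    fi

    # Step 5: inspect the configuration to confirm the reattach.
    echo "vxprint"

    # Steps 6-7: the master recovers without naming a disk group; other
    # nodes recover each non-shared disk group by name.
    if [ "$role" = "master" ]; then
        echo "vxrecover -sb"
    else
        for dg in "$@"; do
            echo "vxrecover -g $dg -sb"
        done
    fi
}
```

For example, recovery_plan slave localdg (where localdg is a made-up disk group name) prints vxdctl enable, vxreattach, vxprint, and vxrecover -g localdg -sb, one per line. Review the vxprint output by hand before running the recovery commands, because disks that vxreattach could not reattach must first be reattached manually.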