The following examples show some typical recovery situations. You start the recovery process by checking the operating mode of the cluster nodes; recovery must be performed on the master node (for a non-shared disk group, recovery must be performed on the node where the disk group was imported).
    Root@capri:/# vxdctl -c mode
    mode: enabled: cluster active - SLAVE
    Root@palermo:/# vxdctl -c mode
    mode: enabled: cluster active - MASTER
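Because recovery has to be run from the master node, a script might check the node's role before doing anything else. The following is a minimal sketch, assuming the `vxdctl -c mode` output format shown in the listing above; the function name cluster_role is hypothetical and is not part of VxVM.

```shell
# cluster_role: hypothetical helper, not part of VxVM.
# Reads the output of `vxdctl -c mode` on stdin and prints MASTER or SLAVE.
# Prints UNKNOWN if neither word appears on a "mode:" line.
cluster_role() {
    awk '
        /^mode:/ && /MASTER/ { role = "MASTER" }
        /^mode:/ && /SLAVE/  { role = "SLAVE" }
        END { if (role == "") role = "UNKNOWN"; print role }
    '
}

# Possible use on a live node (assumes vxdctl is in PATH):
#   [ "$(vxdctl -c mode | cluster_role)" = "MASTER" ] || echo "run recovery on the master node"
```

Reading from standard input keeps the helper testable against saved output as well as against a live node.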
To check the available disk groups on both nodes, you can use vxdg list:
    Root@capri:/# vxdg list
    NAME         STATE           ID
    rootdg       enabled         885258939.1025.capri
    test         enabled,shared  885331519.1233.palermo
    Root@palermo:/# vxdg list
    NAME         STATE           ID
    rootdg       enabled         885258917.1025.palermo
    test         enabled,shared  885331519.1233.palermo
In this case there is one non-shared disk group (rootdg) and one shared disk group (test). The disk group ID of rootdg differs between the two hosts, even though the name is the same. Next, examine the state of the Volume Manager objects: the KSTATE and STATE columns of each object tell you whether recovery is warranted.
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
If device c4t0d6s2 or all devices under controller c4 become unavailable, the device is detached from the disk group. The following example shows how the vxprint output looks after fault injection.
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        -         -        -        -      NODEVICE -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     DISABLED 132867   -      NODEVICE -      -
    sd c4t0d6s2-01  test1-01  DISABLED 132867   0      NODEVICE -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     DISABLED 132867   -      NODEVICE -      -
    sd c4t0d6s2-02  test2-01  DISABLED 132867   0      NODEVICE -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
Notice that the state of the DM entry disk1, and of the subdisks and plexes that use this disk, is NODEVICE. In this case, the device should be reattached when it becomes accessible again. The vxdisk list output shows the state of the disk. If the device state has changed, run vxdctl enable before you run vxdisk list.
    vxdctl enable
    vxdisk list | grep c[45]t0d6s2
    c4t0d6s2     sliced    -         -         error shared
    c5t0d6s2     sliced    disk2     test      online shared
    -            -         disk1     test      failed was:c4t0d6s2
Notice that c5t0d6s2 is online, but c4t0d6s2 is in an error state. If the device was accessible to some nodes but not others, the vxdisk list output might differ between nodes (nodes that can still access the device will show it online). Now we can rectify the fault condition (in this case palermo lost connectivity to one of the SSAs; connection was restored later). At this point, running vxreattach is enough to reattach the devices.
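The failed disk media records can also be picked out of the vxdisk list output mechanically. The following is a minimal sketch, assuming the five-column layout and the "failed was:" status text shown in the listing above; the function name failed_disks is hypothetical and is not part of VxVM.

```shell
# failed_disks: hypothetical helper, not part of VxVM.
# Reads `vxdisk list` output on stdin. For each disk media record whose
# status is "failed was:<device>", prints "<dm-name> <group> <device>".
failed_disks() {
    awk '
        $1 == "-" && $5 == "failed" && $6 ~ /^was:/ {
            sub(/^was:/, "", $6)      # keep only the former access name
            print $3, $4, $6
        }
    '
}

# Possible use on a live node (assumes vxdisk is in PATH):
#   vxdisk list | failed_disks
```

Each output line names the DM record, its disk group, and the device it was last associated with, which are the three values the manual reattach command needs.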
Next, you can run vxdctl enable and verify that the devices are now accessible (device state is online). The following examples show the use of the vxdisk and vxdg commands.
    vxdctl enable
    vxdisk -a online
    vxdisk list | grep c[45]t0d6s2
    c4t0d6s2     sliced    -         -         error shared
    c5t0d6s2     sliced    disk2     test      online shared
    -            -         disk1     test      failed was:c4t0d6s2
The preceding listing shows that c4t0d6s2, which is no longer associated with any disk group, was associated with disk group test as DM disk1. You can reattach it with the command vxdg -g test -k adddisk disk1=c4t0d6s2, after you verify that disk1 is still disassociated, and c4t0d6s2 is the right disk (that is, it has not been swapped).
    vxprint -d -g test -F "%name %nodarec %diskid"
    disk1 on 882484157.1163.palermo
    disk2 off 884294145.1517.palermo
The preceding listing shows the DM name to disk ID association. Since the nodarec attribute of disk1 is on, it is still disassociated. Disk 882484157.1163.palermo used to be associated with it. If you did not physically replace or move the disk, this disk ID should correspond to c4t0d6s2. If it was replaced by a new initialized disk, you may not find a matching disk ID. To verify the disk ID, you can run the command vxdisk -s list.
    vxdisk -s list c4t0d6s2 c5t0d6s2
    Disk:      c4t0d6s2
    type:      sliced
    flags:     online ready private autoconfig shared autoimport
    diskid:    882484157.1163.palermo
    dgname:    test
    dgid:      885331519.1233.palermo
    clusterid: italia
    Disk:      c5t0d6s2
    type:      sliced
    flags:     online ready private autoconfig shared autoimport imported
    diskid:    884294145.1517.palermo
    dgname:    test
    dgid:      885331519.1233.palermo
    clusterid: italia
The preceding listing shows that the disk ID of c4t0d6s2 is 882484157.1163.palermo. Verifying the association this way is rather tedious. Fortunately, vxreattach (with the -c option) can show you the disk group and DM with which a disk should be reattached:
    vxreattach -c c4t0d6s2
    test disk1
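When the manual check is still needed, the disk ID comparison described above can be scripted. The following is a minimal sketch, assuming the "Disk:" and "diskid:" field layout of the vxdisk -s list output shown earlier; the function name device_for_diskid is hypothetical and is not part of VxVM.

```shell
# device_for_diskid: hypothetical helper, not part of VxVM.
# Usage: vxdisk -s list | device_for_diskid <diskid>
# Prints the access name of each disk whose "diskid:" field equals <diskid>.
device_for_diskid() {
    awk -v want="$1" '
        $1 == "Disk:" { dev = $2 }
        # Appending "" forces a string comparison of the two IDs.
        $1 == "diskid:" && ($2 "") == (want "") { print dev }
    '
}

# Possible use with the disk ID reported by vxprint for disk1:
#   vxdisk -s list | device_for_diskid 882484157.1163.palermo
```

No output means that no visible disk carries the recorded ID, which is what you would expect if the disk had been swapped for a newly initialized one.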
You can now associate the disk using the command vxdg -g test -k adddisk disk1=c4t0d6s2. Under most circumstances, running vxreattach takes care of all the preceding steps (running vxdctl enable and reattaching devices to their respective disk groups). However, if a disk was removed administratively using the vxdiskadm command, or was physically replaced, it must be replaced using the vxdiskadm command (option 5) or the VxVM GUI.
    vxreattach -bv
    ! vxdg -g 885331519.1233.palermo -k adddisk disk1=c4t0d6s2
You can verify the state change of the DM and plexes using the vxprint command.
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     DISABLED 132867   -      IOFAIL   -      -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     DISABLED 132867   -      RECOVER  -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
You can now recover the volumes and plexes by running vxrecover. (Specifying the -rb options of vxreattach would instead have started vxrecover for you, in the background.)
    vxrecover -g test -vb
    job 026404 dg test volume test1: reattach plex test1-01
    ps -ef | grep plex
    root 26404 26403 1 13:58:04 ? 0:01 /usr/lib/vxvm/type/fsgen/vxplex -U fsgen -g 885331519.1233.palermo -- att test1
    root 26406 916 0 13:58:10 console 0:00 grep plex
Running in the background, vxrecover started vxplex to attach the plex to the volume (note the STALE state and ATT in tutil0).
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   ATT1   -
    pl test1-01     test1     ENABLED  132867   -      STALE    ATT    -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     DISABLED 132867   -      RECOVER  -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
    Root@palermo:/ # job 026404 done status=0
    job 026408 dg test volume test2: reattach plex test2-01
    job 026408 done status=0
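When the verbose output of vxrecover is captured to a file, the job lines can be checked for completion. The following is a minimal sketch, assuming the "job <id> ..." and "job <id> done status=<n>" line formats shown in the listing above; the function name failed_jobs is hypothetical and is not part of VxVM. Because background output can land after a shell prompt on the same line, as it does above, the sketch looks for the word job anywhere on the line.

```shell
# failed_jobs: hypothetical helper, not part of VxVM.
# Reads `vxrecover -v` output on stdin. Prints the ID of every job that was
# started but has no matching "done status=0" line. Returns 1 if any ID was
# printed, 0 otherwise.
failed_jobs() {
    awk '
        {
            # Locate a "job <id>" pair anywhere on the line.
            for (i = 1; i < NF; i++) if ($i == "job") break
            if (i >= NF) next
            id = $(i + 1)
            if ($(i + 2) == "done") {
                if ($(i + 3) == "status=0") ok[id] = 1
            } else if (!(id in seen)) {
                seen[id] = 1
                order[++n] = id
            }
        }
        END {
            for (j = 1; j <= n; j++)
                if (!(order[j] in ok)) { print order[j]; bad = 1 }
            exit (bad ? 1 : 0)
        }
    '
}

# Possible use (recover.log is a hypothetical file holding saved output):
#   failed_jobs < recover.log || echo "some recovery jobs did not finish cleanly"
```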
After the volumes have been recovered, check the state of the devices again (KSTATE should be ENABLED and STATE should be ACTIVE):
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
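The final check can be automated by filtering the vxprint output for records that are not yet healthy. The following is a minimal sketch, assuming the nine-column layout shown in the listing above (KSTATE in column 4, STATE in column 7); the function name unhealthy_objects is hypothetical and is not part of VxVM. Only volume and plex records are tested, because healthy subdisk records show a dash in the STATE column.

```shell
# unhealthy_objects: hypothetical helper, not part of VxVM.
# Reads `vxprint -g <group>` output on stdin. For each volume (v) or plex (pl)
# record, prints "<type> <name> <kstate> <state>" unless KSTATE is ENABLED and
# STATE is ACTIVE. Returns 1 if anything was printed, 0 otherwise.
unhealthy_objects() {
    awk '
        ($1 == "v" || $1 == "pl") && ($4 != "ENABLED" || $7 != "ACTIVE") {
            print $1, $2, $4, $7
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Possible use on a live node (assumes vxprint is in PATH):
#   vxprint -g test | unhealthy_objects && echo "recovery complete"
```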
Now the recovery is complete. If the fault had occurred on the slave node rather than the master node, the behavior might vary slightly. Following fault injection the vxprint output is similar to the following listing:
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     DETACHED 132867   -      IOFAIL   -      -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
Since the devices are not detached, running vxrecover on the master after the slave can access the disk again will be enough. However, if the disk is removed administratively, it must be added using vxdg/vxdiskadm/vxva (vxreattach does not work) and then recovered by using vxrecover.
    vxdg -k rmdisk disk1
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        -         -        -        -      REMOVED  -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     DISABLED 132867   -      REMOVED  -      -
    sd c4t0d6s2-01  test1-01  DISABLED 132867   0      REMOVED  -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     DISABLED 132867   -      REMOVED  -      -
    sd c4t0d6s2-02  test2-01  DISABLED 132867   0      REMOVED  -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
Note that in this case vxreattach does not report any disk that it can reattach. However, you can reattach the disk manually as follows:
    vxdg -g test -k adddisk disk1=c4t0d6s2
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test1-01     test1     DISABLED 132867   -      RECOVER  -      -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      ACTIVE   -      -
    pl test2-01     test2     DISABLED 132867   -      RECOVER  -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
    vxrecover -v -g test
    job 026416 dg test volume test1: reattach plex test1-01
    waiting...
    job 026416 done status=0
    job 026417 dg test volume test2: reattach plex test2-01
    waiting...
    job 026417 done status=0
The following example shows how you can reattach and recover:
    # ps -ef | grep vx
    root 21935 1 1 20:10:31 ? 5:36 vxconfigd
    root 29295 1 0 14:29:11 ? 0:00 /usr/sbin/vxrecover -c -v -s
    root 29349 1 0 14:29:13 ? 0:00 /usr/sbin/vxrecover -c -v -s
    root 29399 29295 0 14:29:14 ? 0:00 /usr/lib/vxvm/type/fsgen/vxvol -U fsgen -g 885331519.1233.palermo -- resync tes
    root 29507 29399 0 14:29:16 ? 0:00 /usr/lib/vxvm/type/fsgen/vxvol -U fsgen -g 885331519.1233.palermo -- resync tes
    root 29508 29349 0 14:29:17 ? 0:00 /usr/lib/vxvm/type/fsgen/vxvol -U fsgen -g 885331519.1233.palermo -- resync tes
    root 29509 29508 0 14:29:17 ? 0:00 /usr/lib/vxvm/type/fsgen/vxvol -U fsgen -g 885331519.1233.palermo -- resync tes
    root 29511 916 0 14:29:21 console 0:00 grep vx
    vxprint -g test
    TY NAME         ASSOC     KSTATE   LENGTH   PLOFFS STATE    TUTIL0 PUTIL0
    dg test         test      -        -        -      -        -      -
    dm disk1        c4t0d6s2  -        8379057  -      -        -      -
    dm disk2        c5t0d6s2  -        8379057  -      -        -      -
    v  test1        fsgen     ENABLED  131072   -      SYNC     -      -
    pl test1-01     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-01  test1-01  ENABLED  132867   0      -        -      -
    pl test1-02     test1     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-01  test1-02  ENABLED  132867   0      -        -      -
    v  test2        fsgen     ENABLED  131072   -      SYNC     -      -
    pl test2-01     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c4t0d6s2-02  test2-01  ENABLED  132867   0      -        -      -
    pl test2-02     test2     ENABLED  132867   -      ACTIVE   -      -
    sd c5t0d6s2-02  test2-02  ENABLED  132867   0      -        -      -
Notice that the state of the volumes is now SYNC. Their state will be ACTIVE after the vxvol resync operations shown in the process listing complete.
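To see which volumes are still resynchronizing, the vxprint output can be filtered on the STATE column. The following is a minimal sketch, assuming the nine-column layout of the listings in this section (STATE in column 7); the function name syncing_volumes is hypothetical and is not part of VxVM.

```shell
# syncing_volumes: hypothetical helper, not part of VxVM.
# Reads `vxprint -g <group>` output on stdin and prints the name of every
# volume record (TY column "v") whose STATE column is SYNC.
syncing_volumes() {
    awk '$1 == "v" && $7 == "SYNC" { print $2 }'
}

# Possible polling loop on a live node (assumes vxprint is in PATH and the
# disk group is named test, as in the listings):
#   while [ -n "$(vxprint -g test | syncing_volumes)" ]; do sleep 10; done
```

The ten-second interval in the commented loop is an arbitrary choice for illustration.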