
Administering an Oracle® Solaris Cluster 4.4 Configuration


Updated: November 2019

Restoring a Corrupted Disk Set

Use this procedure if a disk set is corrupted or is in a state such that the nodes in the cluster cannot take ownership of it. If your attempts to clear the state have failed, use this procedure as a last resort to fix the disk set.

These procedures work for Solaris Volume Manager metasets and multi-owner Solaris Volume Manager metasets.

How to Save the Solaris Volume Manager Software Configuration

Restoring a disk set from scratch can be time-consuming and error prone. A better alternative is to back up the configuration regularly with the metastat command or with Oracle Explorer (SUNWexplo). Save the current configuration into files with the prtvtoc and metastat commands, then use the saved configuration to recreate the disk set and its components. See How to Recreate the Solaris Volume Manager Software Configuration.

  1. Save the partition table for each disk in the disk set.
    # /usr/sbin/prtvtoc /dev/global/rdsk/disk-name > /etc/lvm/disk-name.vtoc
  2. Save the Solaris Volume Manager software configuration.
    # /bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL
    # /usr/sbin/metastat -p -s set-name >> /etc/lvm/md.tab

    Note -  Other configuration files, such as the /etc/vfstab file, might reference the Solaris Volume Manager software. This procedure assumes that an identical Solaris Volume Manager configuration is rebuilt, and therefore that the mount information is unchanged. If Oracle Explorer (SUNWexplo) is run on a node that owns the set, it retrieves the prtvtoc and metastat -p information.
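
The two save steps above can be combined into a small script. The following is a minimal sketch that only prints the commands it would run; the set name (oraset) and disk names (d4, d8) are hypothetical placeholders, so substitute your own before using it.

```shell
#!/bin/sh
# Sketch: print the backup commands for every disk in a disk set, plus the
# commands that save the metastat configuration. SET and DISKS are
# assumptions -- substitute your own disk set name and its disk names.
SET=oraset
DISKS="d4 d8"

backup_cmds() {
  for d in $DISKS; do
    echo "/usr/sbin/prtvtoc /dev/global/rdsk/$d > /etc/lvm/$d.vtoc"
  done
  echo "/bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL"
  echo "/usr/sbin/metastat -p -s $SET >> /etc/lvm/md.tab"
}
backup_cmds
```

After reviewing the printed commands, you could pipe the output through sh on a node that owns the set to execute them.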

How to Purge the Corrupted Disk Set

Purging a set from a node or all nodes removes the configuration. To purge a disk set from a node, the node must not have ownership of the disk set.

  1. Run the purge command on all nodes.
    # /usr/sbin/metaset -s set-name -P

    Running this command removes the disk set information from the database replicas, as well as the Oracle Solaris Cluster repository. The –P and –C options allow a disk set to be purged without the need to completely rebuild the Solaris Volume Manager environment.

    Note -  If a multi-owner disk set is purged while the nodes are booted in noncluster mode, you might also need to remove the information from the DCS configuration files.
    # /usr/cluster/lib/sc/dcs_config -c remove -s set-name

    For more information, see the dcs_config(8) man page.

  2. If you want to remove only the disk set information from the database replicas, use the following command.
    # /usr/sbin/metaset -s set-name -C purge

    You should generally use the –P option rather than the –C option. The –C option can cause problems when you recreate the disk set, because the Oracle Solaris Cluster software still recognizes the disk set.

    1. If you used the –C option with the metaset command, first create the disk set to see whether a problem occurs.
    2. If a problem exists, remove the information from the DCS configuration files.
      # /usr/cluster/lib/sc/dcs_config -c remove -s set-name

      If the purge options fail, verify that you installed the latest kernel and metadevice updates and contact My Oracle Support.
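
The purge sequence above can be summarized as a command preview. The following sketch prints one metaset –P command per node, plus the DCS cleanup command that is only needed when a multi-owner set was purged in noncluster mode; the set name and node names are hypothetical, so substitute your own.

```shell
#!/bin/sh
# Sketch: print the purge commands for each node. SET and NODES are
# hypothetical names -- replace them with your disk set and cluster nodes.
SET=oraset
NODES="phys-schost-1 phys-schost-2"

purge_cmds() {
  for n in $NODES; do
    # Run on each node; the node must not own the disk set.
    echo "$n# /usr/sbin/metaset -s $SET -P"
  done
  # Only needed if a multi-owner set was purged in noncluster mode:
  echo "any-node# /usr/cluster/lib/sc/dcs_config -c remove -s $SET"
}
purge_cmds
```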

How to Recreate the Solaris Volume Manager Software Configuration

Use this procedure only if you experience a complete loss of your Solaris Volume Manager software configuration. The steps assume that you have saved your current Solaris Volume Manager configuration and its components and purged the corrupted disk set.

Note -  Use mediators only on two-node clusters.
  1. Create a new disk set.
    # /usr/sbin/metaset -s set-name -a -h node1 node2

    If this is a multi-owner disk set, use the following command to create a new disk set.

    # /usr/sbin/metaset -s set-name -aM -h node1 node2
  2. On the same host where the set was created, add mediator hosts if required (two nodes only).
    # /usr/sbin/metaset -s set-name -a -m node1 node2
  3. Add the same disks back into the disk set from this same host.
    # /usr/sbin/metaset -s set-name -a /dev/did/rdsk/disk-name /dev/did/rdsk/disk-name
  4. If you purged the disk set and are recreating it, the Volume Table of Contents (VTOC) should remain on the disks, so you can skip this step.

    However, if you are recreating a set to recover, you should format the disks according to a saved configuration in the /etc/lvm/disk-name.vtoc file. For example:

    # /usr/sbin/fmthard -s /etc/lvm/d4.vtoc /dev/global/rdsk/d4s2
    # /usr/sbin/fmthard -s /etc/lvm/d8.vtoc /dev/global/rdsk/d8s2

    You can run this command on any node.

  5. Check the syntax in the existing /etc/lvm/md.tab file for each metadevice.
    # /usr/sbin/metainit -s set-name -n -a metadevice
  6. Create each metadevice from a saved configuration.
    # /usr/sbin/metainit -s set-name -a metadevice
  7. If a file system exists on the metadevice, run the fsck command.
    # /usr/sbin/fsck -n /dev/md/set-name/rdsk/metadevice

    If the fsck command displays only a few errors, such as superblock count, then the device was probably reconstructed correctly. You can then run the fsck command without the –n option. If multiple errors appear, verify that you reconstructed the metadevice correctly. If you have, review the fsck errors to determine if the file system can be recovered. If it cannot, you should restore the data from a backup.

  8. Concatenate all other metasets on all cluster nodes to the /etc/lvm/md.tab file and then concatenate the local disk set.
    # /usr/sbin/metastat -p >> /etc/lvm/md.tab
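
The fsck decision in Step 7 can be scripted: run a read-only pass first, and repair only if that pass comes back clean. The following is a sketch with assumed names (oraset, d100); RUN=echo keeps it a dry run that prints the commands instead of executing them, so clear RUN only on a cluster node after verifying the names.

```shell
#!/bin/sh
# Sketch: read-only fsck first, then a repair pass only if the read-only
# pass exits cleanly. SET and MD are assumed names; RUN=echo keeps this a
# dry run -- clear RUN to execute for real on a cluster node.
SET=oraset
MD=d100
RUN=echo

check_fs() {
  dev=/dev/md/$SET/rdsk/$MD
  if $RUN /usr/sbin/fsck -n "$dev"; then
    # Read-only pass reported no problems; repair for real.
    $RUN /usr/sbin/fsck "$dev"
  else
    echo "fsck -n reported errors; verify the metadevice before repairing" >&2
  fi
}
check_fs
```

If the read-only pass reports multiple errors, verify that the metadevice was reconstructed correctly before repairing, as described in Step 7.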