Oracle® Solaris Cluster System Administration Guide

Updated: October 2015
 
 

Restoring a Corrupted Diskset

Use this procedure if a diskset is corrupted or is in a state in which the cluster nodes cannot take ownership of it. If your attempts to clear the state have failed, use this procedure as a last resort to fix the diskset.

These procedures work for Solaris Volume Manager metasets and multi-owner Solaris Volume Manager metasets.

How to Save the Solaris Volume Manager Software Configuration

Restoring a diskset from scratch can be time-consuming and error-prone. A better alternative is to use the metastat command to regularly back up replicas, or to use Oracle Explorer (SUNWexplo) to create a backup. You can then use the saved configuration to recreate the diskset. Save the current configuration into files (using the prtvtoc and metastat commands), and then recreate the diskset and its components. See How to Recreate the Solaris Volume Manager Software Configuration.

  1. Save the partition table for each disk in the diskset.
    # /usr/sbin/prtvtoc /dev/global/rdsk/diskname > /etc/lvm/diskname.vtoc
  2. Save the Solaris Volume Manager software configuration.
    # /bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL
    # /usr/sbin/metastat -p -s setname >> /etc/lvm/md.tab

    Note - Other configuration files, such as the /etc/vfstab file, might reference the Solaris Volume Manager software. This procedure assumes that an identical Solaris Volume Manager software configuration is rebuilt and that the mount information is therefore the same. If Oracle Explorer (SUNWexplo) is run on a node that owns the set, it retrieves the prtvtoc and metastat -p information.
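The two steps above can be collected into a small script. The following is a minimal sketch, not part of the Oracle procedure: SETNAME, DISKS, and DRY_RUN are hypothetical variable names, and it assumes (as in the fmthard example later in this guide) that slice 2 of each DID device is read and that each file is named after its device. By default it only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Sketch only: print (or run) the backup commands for one diskset.
# SETNAME, DISKS and DRY_RUN are illustrative names, not Oracle-defined.
SETNAME=${SETNAME:-setname}      # diskset to back up
DISKS=${DISKS:-"d4 d8"}          # DID devices in the set (assumed names)
DRY_RUN=${DRY_RUN:-1}            # 1 = print the commands, 0 = run them

backup_commands() {
    # Step 1: save the partition table of each disk (slice 2 assumed).
    for disk in $DISKS; do
        echo "/usr/sbin/prtvtoc /dev/global/rdsk/${disk}s2 > /etc/lvm/${disk}.vtoc"
    done
    # Step 2: keep a copy of md.tab, then append the set's configuration.
    echo "/bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL"
    echo "/usr/sbin/metastat -p -s $SETNAME >> /etc/lvm/md.tab"
}

if [ "$DRY_RUN" = 1 ]; then
    backup_commands
else
    backup_commands | /bin/sh -e   # stop at the first failing command
fi
```

Run it once with the defaults to inspect the generated commands, then set DRY_RUN=0 on a node that owns the set. Printing first is a deliberate choice: the second metastat command appends to md.tab, so an accidental repeat run would duplicate entries.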

How to Purge the Corrupted Diskset

Purging a set from a node or all nodes removes the configuration. To purge a diskset from a node, the node must not have ownership of the diskset.

  1. Run the purge command on all nodes.
    # /usr/sbin/metaset -s setname -P

    Running this command removes the diskset information from the database replicas, as well as the Oracle Solaris Cluster repository. The -P and -C options allow a diskset to be purged without the need to completely rebuild the Solaris Volume Manager environment.


    Note - If a multi-owner diskset is purged while the nodes are booted in noncluster mode, you might need to remove the information from the dcs configuration files.
    # /usr/cluster/lib/sc/dcs_config -c remove -s setname

    For more information, see the dcs_config(1M) man page.


  2. If you want to remove only the diskset information from the database replicas, use the following command.
    # /usr/sbin/metaset -s setname -C purge

    You should generally use the -P option, rather than the -C option. Using the -C option can cause a problem recreating the diskset because the Oracle Solaris Cluster software still recognizes the diskset.

    1. If you used the -C option with the metaset command, first create the diskset to see if a problem occurs.
    2. If a problem exists, remove the information from the dcs configuration files.
      # /usr/cluster/lib/sc/dcs_config -c remove -s setname

      If the purge options fail, verify that you installed the latest kernel and metadevice updates and contact My Oracle Support.
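The choice between the two purge options can be written down as a short helper. This is a hedged sketch only: the function name purge_plan and its mode names (full, replicas) are invented for illustration. It prints the commands from this procedure rather than running them, because each must be run on a node that does not own the diskset.

```shell
#!/bin/sh
# Sketch only: print the purge commands for one diskset.
# purge_plan and its mode names (full, replicas) are illustrative.
purge_plan() {   # usage: purge_plan setname [full|replicas]
    set_name=$1
    mode=${2:-full}
    case "$mode" in
        full)
            # Preferred: clears the replicas and the cluster repository.
            echo "/usr/sbin/metaset -s $set_name -P"
            ;;
        replicas)
            # Clears only the replicas; the cluster may still know the set.
            echo "/usr/sbin/metaset -s $set_name -C purge"
            echo "# Try to recreate the diskset. Only if that fails, run:"
            echo "/usr/cluster/lib/sc/dcs_config -c remove -s $set_name"
            ;;
        *)
            echo "purge_plan: unknown mode '$mode'" >&2
            return 1
            ;;
    esac
}

purge_plan "${SETNAME:-setname}" "${MODE:-full}"
```

With no arguments beyond the set name, the helper prints the -P form, which matches the recommendation above to prefer -P over -C.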

How to Recreate the Solaris Volume Manager Software Configuration

Use this procedure only if you experience a complete loss of your Solaris Volume Manager software configuration. The steps assume that you have saved your current Solaris Volume Manager configuration and its components and purged the corrupted diskset.


Note - Mediators should be used only on two-node clusters.

  1. Create a new diskset.
    # /usr/sbin/metaset -s setname -a -h nodename1 nodename2

    If this is a multi-owner diskset, use the following command to create a new diskset.

    # /usr/sbin/metaset -s setname -aM -h nodename1 nodename2
  2. On the same host where the set was created, add mediator hosts if required (two nodes only).
    # /usr/sbin/metaset -s setname -a -m nodename1 nodename2
  3. Add the same disks back into the diskset from this same host.
    # /usr/sbin/metaset -s setname -a /dev/did/rdsk/diskname /dev/did/rdsk/diskname
  4. If you purged the diskset and are recreating it, the Volume Table of Contents (VTOC) should remain on the disks, so you can skip this step.

    However, if you are recreating the set as part of a recovery, you should format the disks according to the saved configuration in the /etc/lvm/diskname.vtoc file. For example:

    # /usr/sbin/fmthard -s /etc/lvm/d4.vtoc /dev/global/rdsk/d4s2
    # /usr/sbin/fmthard -s /etc/lvm/d8.vtoc /dev/global/rdsk/d8s2

    You can run this command on any node.

  5. Check the syntax in the existing /etc/lvm/md.tab file for each metadevice.
    # /usr/sbin/metainit -s setname -n -a metadevice
  6. Create each metadevice from a saved configuration.
    # /usr/sbin/metainit -s setname -a metadevice
  7. If a file system exists on the metadevice, run the fsck command.
    # /usr/sbin/fsck -n /dev/md/setname/rdsk/metadevice

    If the fsck command displays only a few errors, such as superblock count, then the device was probably reconstructed correctly. You can then run the fsck command without the -n option. If multiple errors appear, verify that you reconstructed the metadevice correctly. If you have, review the fsck errors to determine if the file system can be recovered. If it cannot, you should restore the data from a backup.

  8. Concatenate all other metasets on all cluster nodes to the /etc/lvm/md.tab file and then concatenate the local diskset.
    # /usr/sbin/metastat -p >> /etc/lvm/md.tab
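Steps 4 through 7 repeat the same command for each disk or metadevice, so they lend themselves to a generated command list. The sketch below is illustrative only: SETNAME, METADEVICES, and VTOC_DIR are hypothetical names, the metadevice names d10 and d11 are placeholders, and it assumes the saved partition tables follow the d4.vtoc to d4s2 naming used in the fmthard example. It prints the commands without running them. Skip the fmthard lines if the VTOC is still on the disks, and list the metadevices in the order in which you want them created.

```shell
#!/bin/sh
# Sketch only: print the recreate commands for steps 4 through 7.
# SETNAME, METADEVICES and VTOC_DIR are illustrative names.
SETNAME=${SETNAME:-setname}
METADEVICES=${METADEVICES:-"d10 d11"}   # assumed names; use those in md.tab
VTOC_DIR=${VTOC_DIR:-/etc/lvm}          # where the *.vtoc files were saved

recreate_commands() {
    # Step 4: one fmthard per saved table (d4.vtoc -> d4s2, as in the example).
    for vtoc in "$VTOC_DIR"/*.vtoc; do
        [ -f "$vtoc" ] || continue       # no saved tables: skip quietly
        disk=$(basename "$vtoc" .vtoc)
        echo "/usr/sbin/fmthard -s $vtoc /dev/global/rdsk/${disk}s2"
    done
    # Step 5: syntax check (-n) for every metadevice before creating any.
    for md in $METADEVICES; do
        echo "/usr/sbin/metainit -s $SETNAME -n -a $md"
    done
    # Step 6: create the metadevices.
    for md in $METADEVICES; do
        echo "/usr/sbin/metainit -s $SETNAME -a $md"
    done
    # Step 7: read-only file system check; only meaningful if one exists.
    for md in $METADEVICES; do
        echo "/usr/sbin/fsck -n /dev/md/$SETNAME/rdsk/$md"
    done
}

recreate_commands
```

Review the printed list and run the commands by hand, stopping if any metainit -n check reports a problem. The fsck lines keep the -n option so that nothing is modified until you have judged the reported errors.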