Oracle Solaris Cluster System Administration Guide (Oracle Solaris Cluster 4.1)
Running an Application Outside the Global Cluster
This section contains troubleshooting procedures that you can use for testing purposes.
How to Take a Solaris Volume Manager Metaset From Nodes Booted in Noncluster Mode
Use this procedure to run an application outside the global cluster for testing purposes.
Determine whether the quorum device is used in the Solaris Volume Manager metaset, and whether the quorum device uses SCSI2 or SCSI3 reservations.
phys-schost# clquorum show
If the quorum device is in the Solaris Volume Manager metaset, add a new quorum device that is not part of the metaset to be taken later in noncluster mode.
phys-schost# clquorum add did
Remove the old quorum device.
phys-schost# clquorum remove did
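For illustration only, assuming the old quorum device is DID device d4 and an unused shared disk d10 is available outside the metaset (both names are hypothetical), the swap might look like this:
phys-schost# clquorum add d10
phys-schost# clquorum remove d4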
If the old quorum device used a SCSI2 reservation, scrub the SCSI2 reservation from that disk and verify that no SCSI2 reservations remain. The following command finds the Persistent Group Reservation Emulation (PGRE) keys. If there are no keys on the disk, an errno=22 message is displayed.
# /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/dids2
After you locate the keys, scrub the PGRE keys.
# /usr/cluster/lib/sc/pgre -c pgre_scrub -d /dev/did/rdsk/dids2
Caution - If you scrub the active quorum device keys from the disk, the cluster will panic on the next reconfiguration with a Lost operational quorum message.
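For example, if the old quorum device is DID device d4 (a hypothetical name), the dids2 placeholder in the preceding commands resolves to d4s2:
# /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
# /usr/cluster/lib/sc/pgre -c pgre_scrub -d /dev/did/rdsk/d4s2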
Evacuate the node that you want to boot in noncluster mode.
phys-schost# clresourcegroup evacuate -n targetnode
Take offline any resource groups that contain HAStorage or HAStoragePlus resources and that contain devices or file systems affected by the metaset that you will later take in noncluster mode.
phys-schost# clresourcegroup offline resourcegroupname
Disable all the resources in the resource groups that you took offline.
phys-schost# clresource disable resourcename
Unmanage the resource groups.
phys-schost# clresourcegroup unmanage resourcegroupname
Take offline the corresponding device groups.
phys-schost# cldevicegroup offline devicegroupname
Disable the device groups.
phys-schost# cldevicegroup disable devicegroupname
Boot the evacuated node out of the cluster.
phys-schost# reboot -x
Verify that the boot process has completed on that node before proceeding.
phys-schost# svcs -x
Determine whether any SCSI3 reservations exist on the disks in the metasets. Run the following command on all disks in the metasets.
phys-schost# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/dids2
If any SCSI3 reservations exist on the disks, scrub them.
phys-schost# /usr/cluster/lib/sc/scsi -c scrub -d /dev/did/rdsk/dids2
Take the metaset on the node that is booted in noncluster mode. See the example after this procedure.
phys-schost# metaset -s name -C take -f
Mount the file system or file systems that contain the device defined in the metaset.
phys-schost# mount device mountpoint
Start the application and perform the desired test. After you finish testing, stop the application, reboot the node, and wait until the boot process has ended.
phys-schost# reboot
Bring the device groups back online.
phys-schost# cldevicegroup online -e devicegroupname
Bring the resource groups back online.
phys-schost# clresourcegroup online -eM resourcegroupname
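As a sketch only, with hypothetical names (diskset oraset, metadevice d0, mount point /mnt/test), the take-and-mount steps might look like this:
phys-schost# metaset -s oraset -C take -f
phys-schost# mount /dev/md/oraset/dsk/d0 /mnt/test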
Use the following procedures if a diskset is corrupted or is in a state in which the nodes in the cluster cannot take ownership of it. If your attempts to clear the state have failed, use these procedures as a last attempt to fix the diskset. These procedures work for Solaris Volume Manager metasets and multi-owner Solaris Volume Manager metasets.
How to Save the Solaris Volume Manager Software Configuration
Restoring a diskset from scratch can be time-consuming and error-prone. A better alternative is to use the metastat command to regularly back up replicas, or to use Oracle Explorer (SUNWexplo) to create a backup. You can then use the saved configuration to recreate the diskset. Save the current configuration into files by using the prtvtoc and metastat commands, and then recreate the diskset and its components. See How to Recreate the Solaris Volume Manager Software Configuration.
Save the partition table (VTOC) of each disk in the diskset.
# /usr/sbin/prtvtoc /dev/global/rdsk/diskname > /etc/lvm/diskname.vtoc
Make a backup copy of the existing /etc/lvm/md.tab file.
# /bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL
Append the configuration of the diskset to the /etc/lvm/md.tab file.
# /usr/sbin/metastat -p -s setname >> /etc/lvm/md.tab
Note - Other configuration files, such as the /etc/vfstab file, might reference the Solaris Volume Manager software. This procedure assumes that an identical Solaris Volume Manager software configuration is rebuilt and that the mount information is therefore the same. If Oracle Explorer (SUNWexplo) is run on a node that owns the set, it retrieves the prtvtoc and metastat -p information.
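For example, assuming a hypothetical diskset named oraset that contains DID disks d4 and d8 (the same names used in the fmthard example later in this section), the saved files might be created as follows:
# /usr/sbin/prtvtoc /dev/global/rdsk/d4s2 > /etc/lvm/d4.vtoc
# /usr/sbin/prtvtoc /dev/global/rdsk/d8s2 > /etc/lvm/d8.vtoc
# /bin/cp /etc/lvm/md.tab /etc/lvm/md.tab_ORIGINAL
# /usr/sbin/metastat -p -s oraset >> /etc/lvm/md.tab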
How to Purge the Corrupted Diskset
Purging a set from a node or all nodes removes the configuration. To purge a diskset from a node, the node must not have ownership of the diskset.
Purge the diskset.
# /usr/sbin/metaset -s setname -P
Running this command removes the diskset information from the database replicas, as well as the Oracle Solaris Cluster repository. The -P and -C options allow a diskset to be purged without the need to completely rebuild the Solaris Volume Manager environment.
Note - If a multi-owner diskset is purged while the nodes are booted out of cluster mode, you might need to remove the information from the dcs configuration files.
# /usr/cluster/lib/sc/dcs_config -c remove -s setname
For more information, see the dcs_config(1M) man page.
If you want to remove only the diskset information from the database replicas, use the following command.
# /usr/sbin/metaset -s setname -C purge
You should generally use the -P option rather than the -C option. Using the -C option can cause a problem recreating the diskset because the Oracle Solaris Cluster software still recognizes the diskset. If you used the -C option, also remove the diskset from the Oracle Solaris Cluster repository.
# /usr/cluster/lib/sc/dcs_config -c remove -s setname
If the purge options fail, verify that you installed the latest kernel and metadevice updates and contact My Oracle Support.
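A minimal sketch, assuming the hypothetical diskset oraset and a node that does not own the set:
# /usr/sbin/metaset -s oraset -P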
How to Recreate the Solaris Volume Manager Software Configuration
Use this procedure only if you experience a complete loss of your Solaris Volume Manager software configuration. The steps assume that you have saved your current Solaris Volume Manager configuration and its components and have purged the corrupted diskset.
Note - Mediators should be used only on two-node clusters.
Create a new diskset.
# /usr/sbin/metaset -s setname -a -h nodename1 nodename2
If this is a multi-owner diskset, use the following command to create the new diskset.
# /usr/sbin/metaset -s setname -aM -h nodename1 nodename2
On the same host where the set was created, add mediator hosts if required (two nodes only).
# /usr/sbin/metaset -s setname -a -m nodename1 nodename2
Add the same disks back into the set from the same host where the set was created. See the example that follows this step.
# /usr/sbin/metaset -s setname -a /dev/did/rdsk/diskname /dev/did/rdsk/diskname
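For illustration, assuming the hypothetical diskset oraset, cluster nodes phys-schost-1 and phys-schost-2, and DID disks d4 and d8, the preceding steps might look like this:
# /usr/sbin/metaset -s oraset -a -h phys-schost-1 phys-schost-2
# /usr/sbin/metaset -s oraset -a -m phys-schost-1 phys-schost-2
# /usr/sbin/metaset -s oraset -a /dev/did/rdsk/d4 /dev/did/rdsk/d8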
If you purged the diskset and are recreating it, the Volume Table of Contents (VTOC) should remain on the disks and you can skip this step. However, if you are recreating a set to recover, format the disks according to the saved configuration in the /etc/lvm/diskname.vtoc files. For example:
# /usr/sbin/fmthard -s /etc/lvm/d4.vtoc /dev/global/rdsk/d4s2
# /usr/sbin/fmthard -s /etc/lvm/d8.vtoc /dev/global/rdsk/d8s2
You can run the fmthard command on any node.
Check the syntax in the existing /etc/lvm/md.tab file for each metadevice.
# /usr/sbin/metainit -s setname -n -a metadevice
Create each metadevice from the saved configuration.
# /usr/sbin/metainit -s setname -a metadevice
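As a sketch, assuming the hypothetical set oraset, you could instead check and then activate every metadevice defined in /etc/lvm/md.tab for the set (the -n option performs the syntax check without creating anything):
# /usr/sbin/metainit -s oraset -n -a
# /usr/sbin/metainit -s oraset -a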
If a file system exists on the metadevice, run the fsck command.
# /usr/sbin/fsck -n /dev/md/setname/rdsk/metadevice
If the fsck command displays only a few errors, such as a superblock count, the device was probably reconstructed correctly. You can then run the fsck command without the -n option. If multiple errors appear, verify that you reconstructed the metadevice correctly. If you have, review the fsck errors to determine whether the file system can be recovered. If it cannot, restore the data from a backup.
Append all other metasets on all cluster nodes to the /etc/lvm/md.tab file, and then merge the local diskset.
# /usr/sbin/metastat -p >> /etc/lvm/md.tab