The following caveats and usage issues are known for this release of CVM (Release 2.2.1):
If CVM has deported a disk group because the disk group lost access to one or more of its disks (due to a node leaving the cluster), the only way to try to regain access to the disks that are still attached to nodes in the cluster is to force-import the deported disk group. However, forcing an import in this situation is dangerous because it can leave mirrors unsynchronized in such a way that it cannot be determined which mirror holds the correct data.
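As a hedged sketch only (the disk group name datadg is hypothetical, and the flag usage reflects standard VxVM syntax rather than anything specific to this release), a force import of a shared disk group would be run from the master node:

    # On the master node: force-import the shared disk group.
    # Use only after verifying which disks are actually accessible.
    vxdg -s -f import datadg

Afterward, check the plex states (for example, with vxprint -g datadg) before trusting mirrored data, because the mirrors may no longer be synchronized.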
It is possible to have private (non-shared) disk groups on physically shared disks. If these disks are on controllers that have been designated for fencing (for example, reserved by Sun Cluster), the node that owns the private disk group may be unable to access the disk group while that node is not a member of the cluster.
CVM does not currently support RAID5 volumes.
Only volumes of the gen usage type are supported in shared disk groups. Using fsgen volumes can cause system deadlocks.
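For illustration (the disk group and volume names below are hypothetical; -U is the standard vxassist option for selecting a usage type), a gen volume can be created explicitly rather than accepting the fsgen default:

    # Create a 100 MB volume with the gen usage type in a shared disk group:
    vxassist -g shareddg -U gen make vol01 100m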
When a node leaves the cluster through a clean shutdown or an abort, the surviving node performs a cluster reconfiguration. If the leaving node attempts to rejoin before the reconfiguration is complete, the outcome depends on whether the leaving node is a slave or the master.
If the leaving node is a slave, the attempt will fail with one of the following pairs of error messages:
    Resource temporarily unavailable
    [vxclust] return from cluster_establish is configuration daemon error -1

    Resource temporarily unavailable
    master has disconnected
A retry at a later time should succeed.
If the leaving node is the master, the attempt will generate disk-related error messages on both nodes, and the remaining node will abort. The joining node will eventually join and may become the master.
If vxconfigd is stopped on both the master and slave nodes and then restarted on the slave first, the information it displays will not be reliable until vxconfigd has started on the master and the slave has reconnected (which may take about 30 seconds). In particular, shared disk groups will be marked "disabled" and no information about them will be available. vxconfigd should therefore be started on the master first.
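A minimal sketch of the recommended order, assuming the daemon has already been stopped on both nodes (these are standard vxconfigd invocations, not options specific to this release):

    # Start vxconfigd on the master node first:
    vxconfigd
    # Then, once the master's daemon is running, start it on the slave node:
    vxconfigd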
When a node aborts from the cluster, open volume devices in shared disk groups on which I/O is not active are not removed until the volumes are closed. If this node later joins the cluster as the master while these volumes are still open, the presence of these volumes does not cause a problem. However, if the node tries to rejoin the cluster as a slave, this may fail with the error message:
    cannot assign minor #
This is accompanied by the console message:
    WARNING: minor number ### disk group group in use
The current disk hot-sparing mechanism does not handle partial disk failures well. The model presumes that disks fail totally rather than partially, and that partial errors can usually be fixed by writing back the failing sector. This is usually a good assumption, but some users have encountered situations where only a few sectors failed and hot-sparing did not occur.
When vxconfigd is stopped and restarted, it may disable large disk groups (for example, disk groups containing hundreds of volumes).
Workaround: Restart vxconfigd with the cleartempdir option. If needed, deport and reimport the disk groups and start all volumes.
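A sketch of this workaround, assuming a shared disk group named shareddg (a hypothetical name); the -x syntax for passing cleartempdir and the use of vxrecover to start volumes reflect standard VxVM usage rather than anything confirmed for this release:

    # Kill and restart vxconfigd, clearing its temporary database area:
    vxconfigd -k -x cleartempdir
    # If a disk group is still disabled, deport and reimport it
    # (from the master node in the case of a shared disk group):
    vxdg deport shareddg
    vxdg -s import shareddg
    # Start all volumes in the disk group:
    vxrecover -g shareddg -sb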
Under some circumstances, a node abort may lead to a panic. This is relatively rare, but it can occur if I/O cannot be quiesced in a timely manner and the node needs to be brought down to ensure data integrity.