Sun Cluster 2.2 Software Installation Guide

1.3.2 CCD Quorum

The Cluster Configuration Database (CCD) must obtain a quorum to elect a valid and consistent copy of the CCD. Refer to "1.5.6 Cluster Configuration Database" for an overview of the CCD.

Sun Cluster does not have a storage topology that guarantees all cluster nodes direct access to the underlying storage devices in every configuration. This precludes storing the CCD database on a single logical volume, which would otherwise guarantee that updates are propagated correctly across restarts of the cluster framework. The CCD communicates with its peers through the cluster interconnect, and this logical link is unavailable on nodes that are not cluster members. The following example illustrates the CCD quorum requirement.

Assume a three-node cluster consisting of nodes A, B, and C. Node A exits the cluster, leaving B and C as the surviving cluster members. The CCD is updated and the updates are propagated to nodes B and C. Now nodes B and C leave the cluster, and node A is subsequently restarted. However, A does not have the most recent copy of the CCD database, because it has no means of knowing about the updates that happened on nodes B and C after it last left the cluster membership. In fact, irrespective of which node is started first, it is impossible to determine unambiguously which node has the most recent copy of the CCD database. Only when all three nodes are restarted is there sufficient information to determine the most recent copy of the CCD. If a valid CCD could not be elected, all query or update operations on the CCD would fail with an invalid CCD error. In practice, starting all cluster nodes before determining a valid copy of the CCD is too restrictive a condition.
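
To make the ambiguity concrete, the following sketch tracks a hypothetical per-copy generation number. The node names and numbers are illustrative only; this is not the CCD implementation.

    # Sketch of the three-node scenario above, assuming each node tags its
    # local CCD copy with a monotonically increasing generation number.
    ccd_generation = {"A": 4, "B": 4, "C": 4}   # all copies consistent

    # Node A leaves; an update is applied and propagated to B and C only.
    members = {"B", "C"}
    for node in members:
        ccd_generation[node] += 1               # B and C now hold generation 5

    # B and C leave; A restarts alone and sees only its own copy.
    members = {"A"}
    visible = {node: ccd_generation[node] for node in members}
    print(visible)   # {'A': 4} -- nothing tells A that generation 5 exists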

This condition can be relaxed by imposing a restriction on the update operation. If N is the number of currently configured nodes in the cluster, at least floor(N/2)+1 nodes (a strict majority, where floor(x) is the largest integer less than or equal to x) must be up for updates to be propagated. In that case, it is sufficient for ceiling(N/2) identical copies (where ceiling(x) is the smallest integer greater than or equal to x) to be present to elect a valid database on a cluster restart. The two thresholds work together: because any update reaches at least floor(N/2)+1 nodes, at most ceiling(N/2)-1 nodes can hold a stale copy, so ceiling(N/2) identical copies are guaranteed to be current. The valid CCD is then propagated to all cluster nodes that do not already have it.
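
The two thresholds can be computed directly. The sketch below is illustrative only, using Python integer arithmetic:

    import math

    def update_quorum(n):
        """Nodes that must be up for a CCD update: floor(N/2) + 1."""
        return n // 2 + 1

    def restart_copies(n):
        """Identical copies needed to elect a valid CCD on restart: ceiling(N/2)."""
        return math.ceil(n / 2)

    for n in (2, 3, 4):
        print(n, update_quorum(n), restart_copies(n))
    # N=2: updates need 2 nodes up, restart needs 1 copy
    # N=3: updates need 2 nodes up, restart needs 2 copies
    # N=4: updates need 3 nodes up, restart needs 2 copies

Note that for N=2 the update rule already requires both nodes to be up, which motivates the special handling described in "1.3.2.1 CCD Quorum in Two-Node Clusters."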

Note that a node is allowed to join the cluster even if the CCD is invalid. However, the CCD can be neither updated nor queried in this state. This implies that all components of the cluster framework that rely on the CCD remain dysfunctional. In particular, logical hosts cannot be mastered and data services cannot be activated in this state. The CCD is enabled only after a sufficient number of nodes join the cluster for quorum to be reached. Alternatively, an administrator can restore the CCD database from the copy with the maximum CCD generation number.
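
The election step at cluster restart can be sketched as follows. This is an illustrative reconstruction under the generation-number assumption above, not the actual framework code:

    import math

    def elect_ccd(copies, n_configured):
        """copies maps each joined node to the generation of its local CCD.
        Returns the elected generation, or None while the CCD stays disabled."""
        needed = math.ceil(n_configured / 2)    # ceiling(N/2) identical copies
        newest = max(copies.values())
        holders = [node for node, gen in copies.items() if gen == newest]
        if len(holders) >= needed:
            return newest    # quorum reached: enable the CCD and propagate
        return None          # disabled: queries and updates fail until quorum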

CCD quorum problems can be avoided if at least one node stays up during a reconfiguration; in this case, the valid copy on any of the surviving nodes is propagated to the newly joining nodes. Another alternative is to ensure that the cluster is started on the node that has the most recent copy of the CCD database. Nevertheless, it is quite possible that, after a system crash while a database update was in progress, the recovery algorithm finds inconsistent CCD copies. In such cases, it is the responsibility of the administrator to restore the database using the ccdadm(1M) restore option. The CCD also provides a checkpoint facility to back up the current contents of the database. It is good practice to make a backup copy of the CCD database after any change to the system configuration; the backup copy can then be used to restore the database later. The CCD is quite small compared to conventional relational databases, and the backup and restore operations take no more than a few seconds to complete.

1.3.2.1 CCD Quorum in Two-Node Clusters

In the case of two-node clusters, the previously discussed majority quorum rule would require both nodes to be cluster members for updates to succeed, which is too restrictive. On the other hand, if updates are allowed while only one node is up, the database must be made consistent manually before the cluster is restarted. This can be accomplished either by restarting first the node that has the most recent copy, or by restoring the database with the ccdadm(1M) restore operation after both nodes have joined. In the latter case, even though both nodes can join the cluster membership, the CCD remains in an invalid state until the restore operation is complete.

This problem is solved by configuring persistent storage for the database on a shared disk device. The shared copy is used only when a single node is active. When the second node joins, the shared CCD copy is copied into the local copy on each node.

Whenever one of the nodes leaves, the shared copy is reactivated by copying the local CCD into the shared copy. This enables updates while only a single node is in the cluster membership, and also ensures reliable propagation of updates across cluster restarts.
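
The two transitions can be summarized in a short sketch. The function name and data layout are hypothetical, for illustration only:

    # Illustrative sketch of the shared CCD transitions in a two-node
    # cluster; not the actual framework code.
    def on_membership_change(members, shared_ccd, local_ccd):
        if len(members) == 1:
            # Down to one node: reactivate the shared copy so updates made
            # by the last node survive a cluster restart.
            (node,) = members
            shared_ccd["data"] = local_ccd[node]
        elif len(members) == 2:
            # Second node joined: seed both local copies from the shared
            # copy, which is then no longer used while both nodes are up.
            for node in members:
                local_ccd[node] = shared_ccd["data"]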

The downside of using a shared storage device for the shared copy of the CCD is that two disks must be allocated exclusively for this purpose, because the volume manager precludes these disks from being used for anything else. Use of the two disks can be avoided if the application downtime caused by the procedural limitations described above is understood and can be tolerated in a production environment.

As with the Sun Cluster 2.2 integration issues affecting CMM quorum, a shared CCD is not supported in all Sun Cluster configurations; in particular, it is not supported when Solstice DiskSuite is the volume manager. Because the shared CCD is used only when a single node is active, the failure it addresses is not common.