Sun Cluster 2.2 Software Installation Guide

CCD Quorum

The Cluster Configuration Database (CCD) must obtain a quorum to elect a valid and consistent copy of the database. Refer to "Cluster Configuration Database" for an overview of the CCD.

Sun Cluster does not have a storage topology that guarantees all cluster nodes direct access to the underlying storage devices in all configurations. This precludes storing the CCD database on a single logical volume, which would guarantee that updates are propagated correctly across restarts of the cluster framework. The CCD communicates with its peers through the cluster interconnect, and this logical link is unavailable on nodes that are not cluster members. A simple example illustrates the CCD quorum requirement.

Assume a three-node cluster consisting of nodes A, B, and C. Node A exits the cluster, leaving B and C as the surviving cluster members. The CCD is updated and the updates are propagated to nodes B and C. Now nodes B and C leave the cluster, and subsequently node A is restarted. Node A does not have the most recent copy of the CCD database, because it has no way of knowing about the updates that happened on nodes B and C. In fact, regardless of which node is started first, it is impossible to determine with certainty which node has the most recent copy of the CCD database. Only when all three nodes are restarted is there sufficient information to determine the most recent copy of the CCD. If a valid CCD cannot be elected, all query and update operations on the CCD fail with an invalid CCD error.

In practice, starting all cluster nodes before determining a valid copy of the CCD is too restrictive a condition. This condition can be relaxed by imposing a restriction on the update operation.

If n is the number of nodes currently configured in the cluster, at least floor(n/2) + 1 nodes (a majority) must be up for updates to be propagated. It is then sufficient for ceil(n/2) identical copies to be present to elect a valid database on a cluster restart. The valid CCD is propagated to all cluster nodes that do not already have it.
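For example, in a four-node cluster (n = 4), an update succeeds only if at least floor(4/2) + 1 = 3 nodes are cluster members, and ceil(4/2) = 2 identical copies are enough to elect a valid CCD on the next cluster restart. In the three-node example above, the update on nodes B and C satisfied the floor(3/2) + 1 = 2 node requirement, so the ceil(3/2) = 2 identical copies held by B and C can elect a valid CCD as soon as both rejoin, without waiting for node A.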

Note that a node is allowed to join the cluster even if the CCD is invalid. However, the CCD can be neither updated nor queried in this state, which means that all components of the cluster framework that rely on the CCD remain nonfunctional. In particular, logical hosts cannot be mastered and data services cannot be activated. The CCD is enabled only after enough nodes join the cluster for quorum to be reached. Alternatively, an administrator can restore the database from the CCD copy with the highest generation number.

CCD quorum problems can be avoided if at least one node stays up during a reconfiguration; the valid copy on that node is then propagated to the newly joining nodes. Another alternative is to ensure that the cluster is started on the node that has the most recent copy of the CCD database. Nevertheless, if the system crashes while a database update is in progress, the recovery algorithm can still find inconsistent CCD copies. In such cases, it is the responsibility of the administrator to restore the database using the restore option of the ccdadm(1M) command (see the man page for details).

The CCD also provides a checkpoint facility to back up the current contents of the database. It is good practice to make a backup copy of the CCD database after any change to the system configuration; the backup copy can subsequently be used to restore the database. The CCD is very small compared to conventional relational databases, and the backup and restore operations take no more than a few seconds to complete.
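As an illustration, the following sequence takes a checkpoint of the current database and later restores it. The cluster name and file name are placeholders, and the options shown should be verified against the ccdadm(1M) man page:

    # ccdadm clustername -c /var/ccd.checkpoint    (back up the current CCD)
    # ccdadm clustername -r /var/ccd.checkpoint    (restore the CCD from the checkpoint)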

CCD Quorum in Two-Node Clusters

In a two-node cluster, the quorum majority rule discussed previously would require both nodes to be cluster members for updates to succeed, which is too restrictive. On the other hand, if updates are allowed while only one node is up, the database must be made consistent manually before the cluster is restarted. This can be accomplished either by restarting the node with the most recent copy first, or by restoring the database with the ccdadm(1M) restore operation after both nodes have joined the cluster. In the latter case, even though both nodes can join the cluster, the CCD remains in an invalid state until the restore operation is complete.

This problem is solved by configuring persistent storage for the database on a shared disk device. The shared copy is used only when a single node is active. When the second node joins, the shared CCD is copied to each node.
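On Sun Cluster 2.2 with Sun StorEdge Volume Manager, the shared CCD is typically configured by designating a shared CCD volume with scconf(1M) and then selecting the two dedicated disks with confccdssa(1M). The sequence below is only a sketch; the cluster name is a placeholder, and the exact commands and options should be verified against the scconf(1M) and confccdssa(1M) man pages:

    # scconf clustername -S ccdvol    (designate the shared CCD volume)
    # confccdssa                      (select the two disks that hold the shared CCD)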

Whenever a node leaves the cluster, the shared CCD is reactivated by copying the local CCD onto the shared copy. In this way, updates remain enabled even when only a single node is in the cluster membership. This also ensures reliable propagation of updates across cluster restarts.

The downside of using a shared storage device for the shared copy of the CCD is that two disks must be allocated exclusively to it; the volume manager precludes using these disks for anything else. The two dedicated disks can be dispensed with if the application downtime described above can be tolerated in a production environment.

As with the Sun Cluster 2.2 integration issues affecting CMM quorum, a shared CCD is not supported in all Sun Cluster configurations; in particular, it is not supported when Solstice DiskSuite is the volume manager. Because the shared CCD is used only when a single node is active, the failure it addresses is not common.