Sun Cluster 3.0 Concepts

Quorum and Quorum Devices

Because cluster nodes share data and resources, the cluster must take steps to maintain data and resource integrity. When a node does not meet the cluster rules for membership, the cluster must prevent the node from participating in the cluster.

In Sun Cluster, the mechanism that determines node participation in the cluster is known as a quorum. Sun Cluster uses a majority voting algorithm to implement quorum. Both cluster nodes and quorum devices, which are disks that are shared between two or more nodes, vote to form quorum. A quorum device can contain user data.

The quorum algorithm operates dynamically: as cluster events trigger its calculations, the results can change over the lifetime of a cluster. Quorum protects against two potential cluster problems, split brain and amnesia, both of which can cause inconsistent data to be made available to clients. The following table describes these two problems and how quorum solves them.

Table 3-3 Cluster Quorum, and Split-Brain and Amnesia Problems

Split brain

Description: Occurs when the cluster interconnect between nodes is lost and the cluster becomes partitioned into sub-clusters, each of which believes that it is the only partition.

Quorum's solution: Allows only the partition (sub-cluster) with a majority of votes to run as the cluster; at most one partition can hold such a majority.

Amnesia

Description: Occurs when the cluster restarts after a shutdown with cluster data older than the data at the time of the shutdown.

Quorum's solution: Guarantees that when a cluster is booted, it has at least one node that was a member of the most recent cluster membership and thus has the latest configuration data.
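To make the majority rule concrete, the following sketch (ordinary Python, not Sun Cluster code) shows how at most one partition can hold a strict majority of the configured votes after a split brain. The vote values are hypothetical and follow the counting rules described later in this section.

    def partition_survives(partition_votes, total_votes):
        # A partition can continue as the cluster only if it holds a
        # strict majority of all configured quorum votes.
        return partition_votes > total_votes // 2

    # Hypothetical example: four nodes (one vote each) plus one quorum
    # device ported to all four nodes (4 - 1 = 3 votes) = 7 total votes.
    total = 4 + 3

    # After an interconnect failure, the cluster splits three nodes / one
    # node, and the larger partition also claims the quorum device votes.
    print(partition_survives(3 + 3, total))   # True: runs as the cluster
    print(partition_survives(1, total))       # False: cannot form a cluster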

Quorum Vote Counts

Both cluster nodes and quorum devices (disks that are shared between two or more nodes) vote to form quorum. By default, cluster nodes acquire a quorum vote count of one when they boot and become cluster members. Nodes can also have a vote count of zero, for example, when the node is being installed, or when an administrator has placed a node into maintenance state.

Quorum devices acquire quorum vote counts based on the number of node connections to the device. When a quorum device is set up, it acquires a maximum vote count of N-1, where N is the number of nodes with nonzero vote counts that have ports to the quorum device. For example, a quorum device connected to two nodes with nonzero vote counts has a quorum vote count of one (two minus one).
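As an illustration of these rules, the following Python sketch (not part of Sun Cluster) computes the configured vote counts for a hypothetical two-node cluster with one shared quorum device; the function names are illustrative only.

    def node_votes(is_member):
        # A booted cluster member has one vote; a node that is being
        # installed or is in maintenance state has zero.
        return 1 if is_member else 0

    def quorum_device_votes(ported_nodes_with_votes):
        # A quorum device acquires N - 1 votes, where N is the number of
        # nodes with nonzero vote counts that have ports to the device.
        return max(ported_nodes_with_votes - 1, 0)

    total = 2 * node_votes(True) + quorum_device_votes(2)
    print(total)           # 3 configured votes
    print(total // 2 + 1)  # 2 votes form a majority (quorum)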

You configure quorum devices during the cluster installation, or later by using the procedures described in the Sun Cluster 3.0 System Administration Guide.


Note -

A quorum device contributes to the vote count only if at least one of the nodes to which it is currently attached is a cluster member. Also, during cluster boot, a quorum device contributes to the count only if at least one of the nodes to which it is currently attached is booting and was a member of the most recently booted cluster when it was shut down.
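The conditions in this note can be expressed as a simple predicate. The following Python sketch is illustrative only; the data structures and names are hypothetical, not Sun Cluster interfaces.

    def device_contributes(attached_nodes, current_members, booting_nodes,
                           last_membership):
        # Normal operation: at least one attached node is a cluster member.
        if any(node in current_members for node in attached_nodes):
            return True
        # Cluster boot: at least one attached node is booting and was a
        # member of the most recently booted cluster when it was shut down.
        return any(node in booting_nodes and node in last_membership
                   for node in attached_nodes)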


Quorum Configurations

Quorum configurations depend on the number of nodes in the cluster. The following figure shows examples of quorum device configurations.

Figure 3-3 Quorum Device Configuration Examples


Quorum Guidelines

Use the following guidelines when setting up quorum devices:


Tip -

Configure more than one quorum device between sets of nodes. Use disks from different enclosures, and configure an odd number of quorum devices between each set of nodes. This protects against individual quorum device failures.


Failure Fencing

A major issue for clusters is a failure that causes the cluster to become partitioned (called split brain). When this happens, not all nodes can communicate, so individual nodes or subsets of nodes might try to form individual or subset clusters. Each subset or partition might believe it has sole access to and ownership of the multihost disks. Multiple nodes attempting to write to the disks can result in data corruption.

Failure fencing limits node access to multihost disks by physically preventing access to the disks. When a node leaves the cluster (it either fails or becomes partitioned), failure fencing ensures that the node can no longer access the disks. Only current member nodes have access to the disks, which preserves data integrity.

Disk device services provide failover capability for services that make use of multihost disks. When a cluster member currently serving as the primary (owner) of the disk device group fails or becomes unreachable, a new primary is chosen, enabling access to the disk device group to continue with only minor interruption. During this process, the old primary must give up access to the devices before the new primary can be started. However, when a member drops out of the cluster and becomes unreachable, the cluster cannot inform that node to release the devices for which it was the primary. Thus, you need a means to enable surviving members to take control of and access global devices from failed members.
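The ordering matters: the unreachable primary is fenced away from the devices before a new primary is started. The following Python sketch only illustrates that sequence; the names and structures are hypothetical and are not Sun Cluster interfaces.

    def fail_over_device_group(group, failed_primary, surviving_nodes, fence):
        # 1. Fence the unreachable primary away from the shared disks,
        #    because it cannot be asked to release them.
        fence(failed_primary, group.disks)
        # 2. Only then is a surviving node promoted to primary.
        group.primary = surviving_nodes[0]
        # 3. Access to the disk device group resumes with minor interruption.
        return group.primary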

Sun Cluster uses SCSI disk reservations to implement failure fencing. Using SCSI reservations, failed nodes are "fenced" away from the multihost disks, preventing them from accessing those disks.

SCSI-2 disk reservations support a form of reservation that either grants access to all nodes attached to the disk (when no reservation is in place) or restricts access to a single node (the node that holds the reservation).

When a cluster member detects that another node is no longer communicating over the cluster interconnect, it initiates a failure fencing procedure to prevent the other node from accessing shared disks. When this failure fencing occurs, it is normal for the fenced node to panic with a "reservation conflict" message on its console.

The reservation conflict occurs because, after a node is detected to no longer be a cluster member, a SCSI reservation is placed on all of the disks that are shared between that node and the other nodes. The fenced node might not be aware that it is being fenced; if it tries to access one of the shared disks, it detects the reservation and panics.
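The following Python sketch models this behavior conceptually (it is not Sun Cluster or SCSI driver code): with no reservation in place, any attached node can access the disk; once a surviving member reserves the disk, an access attempt by the fenced node is treated as a reservation conflict and the node panics.

    class SharedDisk:
        def __init__(self):
            self.reserved_by = None   # no reservation: all attached nodes may access

        def reserve(self, node):
            self.reserved_by = node   # SCSI-2 style: access restricted to one node

        def access(self, node):
            if self.reserved_by is not None and self.reserved_by != node:
                # The fenced node detects the reservation and panics.
                raise SystemExit("panic: reservation conflict (%s)" % node)
            return "ok"

    disk = SharedDisk()
    disk.reserve("surviving-node")    # surviving member fences the failed node
    disk.access("surviving-node")     # allowed
    # disk.access("fenced-node")      # would panic with a reservation conflict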