Failure Fencing (Sun Cluster Overview for Solaris OS)

Sun Cluster Overview for Solaris OS

Failure Fencing

A major issue for clusters is a failure that causes the cluster to become partitioned (called split brain). When this situation occurs, not all nodes can communicate, so individual nodes or subsets of nodes might try to form individual or subset clusters. Each subset or partition might “believe” it has sole access and ownership to the multihost disks. Attempts by multiple nodes to write to the disks can result in data corruption.

Failure fencing limits node access to multihost disks by preventing access to the disks. When a node leaves the cluster (it either fails or becomes partitioned), failure fencing ensures that the node can no longer access the disks. Only current member nodes have access to the disks, ensuring data integrity.

The Sun Cluster system uses SCSI disk reservations to implement failure fencing. Using SCSI reservations, failed nodes are “fenced” away from the multihost disks, preventing them from accessing those disks.

When a cluster member detects that another node is no longer communicating over the cluster interconnect, it initiates a failure-fencing procedure to prevent the failed node from accessing shared disks. When this failure fencing occurs, the fenced node panics and a “reservation conflict” message is displayed on its console.