Cluster Synchronization Services (CSS) is the service that determines which nodes in the cluster are available and provides cluster group membership and simple locking services to other processes. CSS typically determines node availability by communicating over the private network, with a voting file used as a secondary communication mechanism. It does this by sending network heartbeat messages through the private network and writing a disk heartbeat to the voting file, as illustrated by the top graphic in the slide. The voting file resides on a clustered file system that is accessible to all nodes in the cluster. Its primary purpose is to help in situations where private network communication fails: the voting file is then used to communicate node state information to the other nodes in the cluster. Without the voting file, it can be difficult for an isolated node to determine whether it is experiencing a network failure or whether the other nodes are no longer available. The cluster could then enter a state where multiple sub-clusters of nodes have unsynchronized access to the same database files. The bottom graphic illustrates what happens when Node3 can no longer send network heartbeats to the other members of the cluster. When the other nodes can no longer see Node3's heartbeats, they decide to evict that node by using the voting file. When Node3 reads the removal message, or "kill block," it generally reboots itself to ensure that all outstanding write I/Os are lost rather than written to the shared files.
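The eviction sequence described above can be sketched as a small simulation. This is only an illustrative toy model, not Oracle's implementation: the VotingFile class, the nodes_to_evict function, and the 30-second HEARTBEAT_TIMEOUT_SECONDS value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical timeout chosen for illustration; not taken from the source text.
HEARTBEAT_TIMEOUT_SECONDS = 30.0


@dataclass
class VotingFile:
    """Toy stand-in for the shared voting file (hypothetical structure)."""
    disk_heartbeats: dict = field(default_factory=dict)  # node -> last write time
    kill_blocks: set = field(default_factory=set)        # nodes marked for eviction

    def write_heartbeat(self, node, now):
        self.disk_heartbeats[node] = now

    def write_kill_block(self, node):
        self.kill_blocks.add(node)

    def has_kill_block(self, node):
        return node in self.kill_blocks


def nodes_to_evict(last_network_heartbeat, now, timeout=HEARTBEAT_TIMEOUT_SECONDS):
    """Return peers whose last network heartbeat is older than the timeout."""
    return sorted(
        node for node, seen in last_network_heartbeat.items() if now - seen > timeout
    )


def run_demo():
    voting_file = VotingFile()
    now = 100.0
    # Network heartbeats as last observed by Node1: Node3 has gone quiet.
    last_seen = {"Node2": 95.0, "Node3": 60.0}
    # All three nodes can still write their disk heartbeat.
    for node in ("Node1", "Node2", "Node3"):
        voting_file.write_heartbeat(node, now)
    # The surviving nodes evict the silent peer through the voting file.
    evicted = nodes_to_evict(last_seen, now)
    for node in evicted:
        voting_file.write_kill_block(node)
    # Node3 later reads the voting file, finds its kill block, and must reboot.
    node3_must_reboot = voting_file.has_kill_block("Node3")
    return evicted, node3_must_reboot


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, Node3 keeps writing disk heartbeats but has missed network heartbeats for longer than the timeout, so the other nodes mark it in the voting file and Node3 discovers the kill block on its next read.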

A "split brain" condition generally arises when a node cannot communicate with the other nodes over the network but can still access the voting file. The primary cause is that a cluster node is unable to respond to heartbeat requests from another node. This can result from network failures or interruptions, hardware failures, software failures, or resource starvation (probably the most common cause). Whatever the cause, Oracle Clusterware has a very conservative design intended to guarantee the integrity of the cluster and the data.
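The definition above reduces to a simple check from a single node's point of view. The following is a minimal sketch under that definition only; the function name is_split_brain_risk and its parameters are hypothetical and do not correspond to any Oracle Clusterware interface.

```python
def is_split_brain_risk(peer_reachable_by_network, voting_file_accessible):
    """Return True when a node matches the split-brain description:
    no peer answers over the network, yet the voting file is still accessible.

    peer_reachable_by_network: dict mapping peer name -> bool (hypothetical input)
    voting_file_accessible: bool
    """
    any_peer_reachable = any(peer_reachable_by_network.values())
    return voting_file_accessible and not any_peer_reachable


if __name__ == "__main__":
    # Node3's view: it cannot reach Node1 or Node2, but it can still read the voting file.
    print(is_split_brain_risk({"Node1": False, "Node2": False}, True))
```

A node in this state cannot tell from the network alone whether its peers are down or merely unreachable, which is why the voting file is consulted before any node continues to write to shared storage.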