5.6.2 Cluster Heartbeat Channel

The Cluster Heartbeat channel is used to verify whether the Oracle VM Servers in a clustered server pool are up and running. The heartbeat function has a network component, in which a TCP/IP communication channel is created with each Oracle VM Server. Each Oracle VM Server sends regular keep-alive packets over these channels, and these packets are used to determine whether each Oracle VM Server is alive.
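The keep-alive mechanism described above amounts to a timeout check: a server that has not been heard from for too long is treated as not alive. The following Python sketch shows that idea only. `HeartbeatMonitor`, `record_keepalive`, and the 30-second `IDLE_TIMEOUT_S` are hypothetical names and values chosen for illustration. They are not Oracle VM or OCFS2 interfaces or defaults.

```python
from dataclasses import dataclass, field

# Hypothetical timeout for illustration only; not an Oracle VM or OCFS2 default.
IDLE_TIMEOUT_S = 30.0


@dataclass
class HeartbeatMonitor:
    """Tracks the last keep-alive seen from each peer server (hypothetical helper)."""

    idle_timeout: float = IDLE_TIMEOUT_S
    last_seen: dict = field(default_factory=dict)  # server name -> time of last keep-alive

    def record_keepalive(self, server: str, now: float) -> None:
        # Called whenever a keep-alive packet arrives from `server`.
        self.last_seen[server] = now

    def is_alive(self, server: str, now: float) -> bool:
        # A peer counts as alive only if it was heard from within the timeout.
        seen = self.last_seen.get(server)
        return seen is not None and now - seen <= self.idle_timeout

    def unreachable(self, now: float) -> list:
        # Peers that have been silent for longer than the timeout.
        return sorted(s for s in self.last_seen if not self.is_alive(s, now))


mon = HeartbeatMonitor()
mon.record_keepalive("ovs1", now=0.0)
mon.record_keepalive("ovs2", now=0.0)
mon.record_keepalive("ovs1", now=28.0)  # ovs2 goes silent after t=0
print(mon.unreachable(now=35.0))  # ['ovs2']
```

Time is passed in explicitly rather than read from the clock so that the example is deterministic. It also shows why a congested link matters: if keep-alive packets are delayed past the timeout, a healthy server looks the same as a failed one.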

Important

It is recommended to keep the Cluster Heartbeat function separate from networks with high load, such as the Storage and Live Migrate networks. If the available bandwidth drops too low, heartbeat connectivity might be interrupted, which could cause virtual machines and Oracle VM Servers to be rebooted.

Oracle VM uses OCFS2 as its underlying clustering file system to manage its storage repositories and provide access to shared storage.

A cluster heartbeat is an essential component of any OCFS2 cluster. It is responsible for accurately identifying each node (in this case, each Oracle VM Server) as dead or alive. OCFS2 uses two types of heartbeat:

  • The disk heartbeat, in which all the Oracle VM Servers in the cluster write a timestamp to the server pool file system device. See Section 3.8, “How is Storage Used for Server Pool Clustering?” for more information on this part of the clustering technology.

  • The network heartbeat, in which the Oracle VM Servers communicate over the network to signal to each other that every cluster member is alive.
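The disk heartbeat can be pictured as a shared area with one slot per server. Each live server keeps updating its own slot, and a server whose slot stops changing for several consecutive reads is considered dead. The Python sketch below models only that idea. The class names, the `MISSED_BEATS_THRESHOLD` value, and the use of a simple counter in place of a real timestamp are all assumptions. It does not reproduce the actual OCFS2 on-disk heartbeat format or its timeouts.

```python
from dataclasses import dataclass, field

# Hypothetical threshold for illustration; not a documented OCFS2 default.
MISSED_BEATS_THRESHOLD = 3


@dataclass
class DiskHeartbeatRegion:
    """Simplified shared heartbeat area: one slot per server (hypothetical model)."""

    slots: dict = field(default_factory=dict)  # server name -> latest written value

    def write_beat(self, server: str) -> None:
        # Each live server periodically advances the value in its own slot.
        # An increasing counter stands in for the time stamp here.
        self.slots[server] = self.slots.get(server, 0) + 1


@dataclass
class DiskHeartbeatReader:
    """One server's view: counts how many reads each peer's slot stayed unchanged."""

    threshold: int = MISSED_BEATS_THRESHOLD
    last_value: dict = field(default_factory=dict)
    missed: dict = field(default_factory=dict)

    def poll(self, region: DiskHeartbeatRegion) -> set:
        """Read every slot once and return the servers considered dead."""
        dead = set()
        for server, value in region.slots.items():
            if self.last_value.get(server) == value:
                self.missed[server] = self.missed.get(server, 0) + 1
            else:
                self.missed[server] = 0
                self.last_value[server] = value
            if self.missed[server] >= self.threshold:
                dead.add(server)
        return dead


region = DiskHeartbeatRegion()
reader = DiskHeartbeatReader()
for step in range(4):
    region.write_beat("ovs1")  # ovs1 keeps writing its heartbeat
    if step == 0:
        region.write_beat("ovs2")  # ovs2 writes once, then stops
    dead = reader.poll(region)
print(dead)  # {'ovs2'}
```

Because the disk and network heartbeats are independent, a server can keep writing to shared storage while being unreachable over the network. The quorum logic described next is what resolves that kind of disagreement.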

The quorum is the group of Oracle VM Servers in a cluster that is allowed to operate on the shared storage. When there is a failure in the cluster, Oracle VM Servers may be split into groups that can communicate within their own group and with the shared storage, but not with other groups. In this case, OCFS2 determines which group is allowed to continue and initiates fencing of the other group or groups.

Fencing is the act of forcefully removing an Oracle VM Server from a cluster. An Oracle VM Server with OCFS2 mounted fences itself when it detects that it does not have quorum in a degraded cluster. It does this so that the other Oracle VM Servers are not stuck trying to access the cluster's resources. A fenced Oracle VM Server is rebooted and then rejoins the cluster. If the virtual machines that were running on the fenced Oracle VM Server are HA enabled, they are migrated and restarted on other Oracle VM Servers; virtual machines that are not HA enabled are not migrated.
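This section does not state the exact rule OCFS2 uses to decide which group keeps quorum. The Python sketch below assumes a majority rule with a tie-breaker: a group holding a strict majority of the heartbeating servers continues, and in an exact half split of an even-sized cluster the half that contains the lowest-numbered node continues. This matches how OCFS2 quorum is commonly described, but it should be checked against the OCFS2 documentation before being relied on. `has_quorum` and `action` are hypothetical helpers, and node numbers are assumed to be small integers.

```python
def has_quorum(reachable: set, heartbeating: set) -> bool:
    """Simplified quorum test from one node's point of view (assumed rule).

    reachable:    node numbers this node can talk to over the network, itself included
    heartbeating: node numbers currently seen writing the disk heartbeat
    """
    if not heartbeating:
        return False
    live = reachable & heartbeating
    if len(live) > len(heartbeating) / 2:
        return True  # strict majority
    # Exact half of an even-sized cluster: assumed tie-breaker is the lowest node number.
    return (
        len(heartbeating) % 2 == 0
        and len(live) == len(heartbeating) // 2
        and min(heartbeating) in live
    )


def action(reachable: set, heartbeating: set) -> str:
    # A node without quorum fences itself (it is rebooted) instead of using shared storage.
    return "continue" if has_quorum(reachable, heartbeating) else "fence"


nodes = {0, 1, 2, 3}  # four Oracle VM Servers, numbered 0-3
print(action({0, 1}, nodes))  # continue: exactly half, includes lowest node 0
print(action({2, 3}, nodes))  # fence: exactly half, without node 0
print(action({1, 2, 3}, nodes))  # continue: strict majority
```

The tie-breaker matters only for even-sized clusters. Without it, a clean two-way split would leave both halves either fencing themselves or both continuing on the same shared storage.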

The cluster heartbeat is sensitive to network interruptions, so the Cluster Heartbeat network should be given special attention and be kept separate from other traffic to make sure that:

  • It does not share links with high-traffic networks or with networks that may experience sudden traffic spikes, such as the Storage or Live Migrate networks.

  • It is made redundant with a bond, which ensures continued operation if one network path fails. See Section 5.4, “How is Network Bonding Used in Oracle VM?” for more information on configuring bonding.
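To see why a bond keeps the heartbeat running when one path fails, the Python sketch below models a bond in an active-backup style: one port carries traffic and another takes over when the active port's link goes down. The bonding mode, the port names `eth0` and `eth1`, and the `ActiveBackupBond` class are all assumptions made for illustration. See Section 5.4 for how bonding is actually configured in Oracle VM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActiveBackupBond:
    """Toy active-backup bond: one active port, the others on standby (hypothetical)."""

    ports: list
    link_up: dict = field(default_factory=dict)  # port name -> link state
    active: Optional[str] = None

    def select_active(self) -> Optional[str]:
        # Keep the current port while its link is up; otherwise fail over to the
        # first port whose link is still up. None means every path has failed.
        if self.active is not None and self.link_up.get(self.active, False):
            return self.active
        self.active = next((p for p in self.ports if self.link_up.get(p, False)), None)
        return self.active


bond = ActiveBackupBond(ports=["eth0", "eth1"], link_up={"eth0": True, "eth1": True})
print(bond.select_active())  # eth0
bond.link_up["eth0"] = False  # first network path fails
print(bond.select_active())  # eth1: heartbeat traffic continues on the second path
bond.link_up["eth1"] = False  # second path fails as well
print(bond.select_active())  # None: no path left, the heartbeat is interrupted
```

The last case is the situation the bond is meant to avoid: with a single link, the first failure would already interrupt the heartbeat and could lead to fencing.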