Appendix A. Managing a Failure of the Majority

Normal operation of JE HA requires that at least a simple majority of electable nodes be available to form a quorum for election of a new Master, or when committing a transaction with default durability requirements. The number of electable nodes (the Electable Group Size) is obtained from persistent internal metadata that is stored in the environment and replicated across all members. See Replication Group Life Cycle for details.
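
The quorum arithmetic described above can be sketched in a few lines of Java. This is an illustrative sketch only: the class QuorumSketch and its methods are hypothetical names, not part of the JE API, and the code models nothing beyond the simple-majority rule.

```java
// Hypothetical helper for illustration; not part of the JE API.
final class QuorumSketch {

    /** Smallest number of nodes that is more than half of groupSize. */
    static int simpleMajority(int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("group size must be positive");
        }
        return groupSize / 2 + 1;
    }

    /** True if the available electable nodes can form a quorum. */
    static boolean hasQuorum(int electableGroupSize, int available) {
        return available >= simpleMajority(electableGroupSize);
    }

    public static void main(String[] args) {
        System.out.println(simpleMajority(5)); // prints 3
        System.out.println(hasQuorum(5, 2));   // prints false
    }
}
```

With five electable nodes, three must be available; two are not enough.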

Under exceptional circumstances, a simple majority of electable nodes may become unavailable for some period of time. With only a minority of electable nodes available, the overall availability of the group can be adversely affected. For example, the group may be unavailable for writes because a Master cannot be elected. Also, the Master may be unable to satisfy the durability requirements for a transaction commit. The group may also be unavailable for reads, because the absence of a Master might leave a Replica unable to meet its consistency requirements.

To deal with this exceptional circumstance, especially if the situation is likely to persist for an unacceptably long period of time, JE HA provides an escape mechanism that modifies how the number of electable nodes, and consequently the quorum requirements for elections and commit acknowledgments, is calculated. The mechanism overrides the normal computation of the Electable Group Size. You apply the override by specifying the size with the mutable replication configuration parameter ELECTABLE_GROUP_SIZE_OVERRIDE.

Note

You should use this parameter sparingly, if at all. Overriding your Electable Group Size can allow your replication group's election participants to elect two Masters simultaneously. This is especially likely to occur if a majority of the nodes are unavailable because of a network partition, in which case all nodes are running but are simply not communicating with one another.

Be very cautious when using this configuration option.
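
To see why the override can lead to two Masters, consider this small model of a partitioned group. It is a sketch under simplifying assumptions: SplitBrainSketch is a hypothetical class, each side of the partition is treated as able to elect a Master whenever it can assemble a simple majority of the group size it believes is in effect, and no real election protocol is modeled.

```java
// Hypothetical model for illustration; not part of the JE API.
final class SplitBrainSketch {

    /** True if one side of a partition can assemble an election quorum. */
    static boolean canElect(int nodesOnSide, int storedGroupSize, int override) {
        // A non-zero override replaces the stored group size on that side.
        int groupSize = (override != 0) ? override : storedGroupSize;
        return nodesOnSide >= groupSize / 2 + 1;
    }

    /** Number of partition sides (0, 1, or 2) that could elect a Master. */
    static int sidesAbleToElect(int storedGroupSize,
                                int sideA, int overrideA,
                                int sideB, int overrideB) {
        int count = 0;
        if (canElect(sideA, storedGroupSize, overrideA)) count++;
        if (canElect(sideB, storedGroupSize, overrideB)) count++;
        return count;
    }

    public static void main(String[] args) {
        // Five electable nodes partitioned 2 / 3, no override anywhere.
        System.out.println(sidesAbleToElect(5, 2, 0, 3, 0)); // prints 1
        // Override of 2 on the minority side while the other side still runs.
        System.out.println(sidesAbleToElect(5, 2, 2, 3, 0)); // prints 2
    }
}
```

In the second call both sides can reach quorum, which is the two-Master situation the note warns about.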

Overriding the Electable Group Size

When you set ELECTABLE_GROUP_SIZE_OVERRIDE to a non-zero value, the node uses the number that you provide as the Electable Group Size when it computes quorum requirements. The internally stored Electable Group Size value is ignored (but not changed) while this option is non-zero. By setting ELECTABLE_GROUP_SIZE_OVERRIDE to the number of electable nodes known to be available, you allow the remaining replication group participants to make forward progress, both in electing a new Master (if this is required) and in meeting durability and consistency requirements.

When this option is zero (0), then the node will behave normally, and the internal Electable Group Size is honored by the node. This is the default value and behavior.
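
As a rough model of the behavior just described, the following sketch picks the group size that is in effect and derives a quorum size from it. OverrideSketch and its method names are hypothetical, and the sketch assumes that quorum is a simple majority of whichever group size is in effect.

```java
// Hypothetical model for illustration; not part of the JE API.
final class OverrideSketch {

    /** Group size used for quorum calculations: the override if non-zero. */
    static int effectiveGroupSize(int storedGroupSize, int override) {
        if (override < 0) {
            throw new IllegalArgumentException("override must not be negative");
        }
        return (override != 0) ? override : storedGroupSize;
    }

    /** Simple majority of the effective group size (an assumption of this sketch). */
    static int quorumSize(int storedGroupSize, int override) {
        return effectiveGroupSize(storedGroupSize, override) / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorumSize(5, 0)); // prints 3: stored size is honored
        System.out.println(quorumSize(5, 2)); // prints 2: override is in effect
    }
}
```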

Setting the Override

To override the internal Electable Group Size value:

  1. Verify that a simple majority of electable nodes are in fact down and cannot elect their own independent Master.

  2. Set ELECTABLE_GROUP_SIZE_OVERRIDE to the number of electable nodes known to be available. For best results, set this override on all available electable nodes.

    It might be sufficient to set ELECTABLE_GROUP_SIZE_OVERRIDE on just one electable node in order to hold an election, because the proposer at that one node can conclude the election. However, if the election results in a Master that is not configured with this override, commits at that Master might fail with InsufficientAcksException. So, again, set the override on all available electable nodes.

Having set the override, the available electable members of the replication group can now meet quorum requirements.

Restoring the Default State

Having restored the group to a functioning state by use of the ELECTABLE_GROUP_SIZE_OVERRIDE override, it is desirable to return the group to its normal state as soon as possible. The normal operating state is one where the Electable Group Size is maintained by JE HA, and the override is no longer used.

To restore the group to its normal operational state, do one of the following:

  • Remove from the group any electable nodes that you know will be down for an extended period of time. Remove the nodes using the ReplicationGroupAdmin.removeMember() API.

  • Bring electable nodes back up as they become available again, so that they can join the functioning group. Do this carefully, one node at a time, to avoid the small possibility that a majority of the downed nodes hold an election amongst themselves and elect a second Master.

  • Perform some combination of node removal and bringing up nodes which were previously down.

As soon as enough electable nodes are up and running to meet election quorum requirements without the override, you can remove the override and resume normal HA operations.
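
The condition for removing the override can be sketched as follows. RestoreSketch is a hypothetical class, and the sketch assumes that each removed node reduces the stored Electable Group Size by one and that quorum is a simple majority of the remaining size.

```java
// Hypothetical model for illustration; not part of the JE API.
final class RestoreSketch {

    /**
     * True if the available nodes form a simple majority of the group
     * that remains after the given number of nodes has been removed.
     */
    static boolean overrideCanBeRemoved(int storedGroupSize,
                                        int nodesRemoved,
                                        int nodesAvailable) {
        int remainingGroupSize = storedGroupSize - nodesRemoved;
        if (remainingGroupSize < 1) {
            throw new IllegalArgumentException("group must keep at least one node");
        }
        return nodesAvailable >= remainingGroupSize / 2 + 1;
    }

    public static void main(String[] args) {
        // Five nodes, two available, nothing removed: override still needed.
        System.out.println(overrideCanBeRemoved(5, 0, 2)); // prints false
        // One downed node comes back: three of five is a majority.
        System.out.println(overrideCanBeRemoved(5, 0, 3)); // prints true
        // Two downed nodes removed instead: two of three is a majority.
        System.out.println(overrideCanBeRemoved(5, 2, 2)); // prints true
    }
}
```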

Override Example

Consider a group consisting of 5 electable nodes: n1-n5. Suppose a simple majority of the nodes (n3-n5) have become unavailable.

If one of the nodes in n3-n5 was the Master, then nodes n1 and n2 will try to hold an election, and fail due to the lack of a quorum. We now carry out the steps described above:

  1. Verify that n3-n5 are down.

  2. Set ELECTABLE_GROUP_SIZE_OVERRIDE to 2. Do this at both n1 and n2. You can do this dynamically using JConsole, or by setting the property in the je.properties file and restarting the node.

  3. n1 and n2 will choose a new Master, say, n1. n1 can now process write operations, and n2 can acknowledge transaction commits.

  4. Suppose that n3 is now repaired. You can bring it back online and it will automatically locate the new Master and join the group. As is normal, it will catch up to n1 and n2 in the replication stream, and then begin acknowledging commits as requested by n1.

  5. We now have three electable nodes that are operational. Because we have a true simple majority of electable nodes available, we can now reset ELECTABLE_GROUP_SIZE_OVERRIDE to 0 (do this on n1 and n2), which causes the replication group to resume normal operations. Note that n1 remains the Master.

If n2 was the Master at the time of the failure, then the situation is similar, except that an election is not held. In this case, n2 remains the Master throughout the entire process described above. However, n2 might not be able to meet quorum requirements for transaction commits until step 2 (above) is performed.
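
The example can be traced with a small model that reports, at each stage, whether the available nodes meet quorum. OverrideExampleTrace is a hypothetical class; it assumes a stored Electable Group Size of 5 and a simple-majority quorum over the group size in effect, and it does not model elections or the replication stream.

```java
// Hypothetical trace for illustration; not part of the JE API.
final class OverrideExampleTrace {

    static final int STORED_GROUP_SIZE = 5; // n1-n5

    /** True if the available nodes are a simple majority of the size in effect. */
    static boolean quorumMet(int available, int override) {
        int groupSize = (override != 0) ? override : STORED_GROUP_SIZE;
        return available >= groupSize / 2 + 1;
    }

    /** One line per stage of the example, with the quorum result. */
    static String trace() {
        StringBuilder sb = new StringBuilder();
        sb.append("n3-n5 down, no override: ").append(quorumMet(2, 0)).append('\n');
        sb.append("override=2 on n1 and n2: ").append(quorumMet(2, 2)).append('\n');
        sb.append("n3 rejoins, override=2: ").append(quorumMet(3, 2)).append('\n');
        sb.append("override reset to 0: ").append(quorumMet(3, 0)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace());
    }
}
```

Only the first stage fails to meet quorum; once n3 has rejoined, quorum holds with or without the override.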