The Cluster Coordinator ensures that the nodes (represented by the Dgraph processes) in the cluster provide query processing that is stable in the face of individual follower node failures. This topic discusses cluster behavior in various scenarios, such as cluster startup, updates to the data files, and response to a node failure.
The leader is the one node that you do not designate as a follower; you identify every other node as a follower node.
Follower nodes refuse all updating web service and HTTP requests (such as admin?op=updateaspell) with a 403 HTTP status code (forbidden).
In a cluster, updates to the records in the data files (or any other updates) are sent only to the Oracle Endeca Server that is hosting the leader node Dgraph process.
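The routing rule above can be sketched from the client side. This is a minimal illustration, not part of the Endeca Server API: the Node type, pick_update_target, and interpret_update_status are hypothetical names, and the only behavior taken from this topic is that exactly one node is the leader and that follower nodes answer updating requests with a 403 status.

```python
from dataclasses import dataclass

FORBIDDEN = 403  # status that follower nodes return for updating requests


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one cluster node."""
    name: str
    is_leader: bool


def pick_update_target(nodes):
    """Return the single leader node; updates must be sent only to it."""
    leaders = [n for n in nodes if n.is_leader]
    if len(leaders) != 1:
        raise ValueError(f"expected exactly one leader, found {len(leaders)}")
    return leaders[0]


def interpret_update_status(status_code):
    """Map the HTTP status of an updating request to a short diagnosis."""
    if status_code == FORBIDDEN:
        # Assumption: in this setup a 403 most likely means a follower was hit.
        return "rejected: request may have reached a follower; send it to the leader"
    if 200 <= status_code < 300:
        return "accepted"
    return f"failed with HTTP {status_code}"
```

A caller would choose the target once with pick_update_target and treat a 403 on an update as a sign that the request was sent to the wrong node rather than retrying it unchanged.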
The leader node processes the update and commits it to the on-disk data files. The Cluster Coordinator informs all follower nodes that a new version of data files is available. The leader node and all follower nodes can continue to use files from the previous version of the data files to finish query processing that had started against that version.
As each node finishes processing queries on the previous version, it releases references to it. Once the follower nodes are notified of the new version, they acquire read-only access to it and start using it.
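The version handoff described above can be modeled with per-version reference counts. The following is a toy sketch under stated assumptions, not the Dgraph implementation: the class VersionedDataFiles and its methods are hypothetical, and integer version numbers stand in for the on-disk data files.

```python
class VersionedDataFiles:
    """Toy model: queries pin the version they started on until they finish."""

    def __init__(self):
        self.current = 1
        self.refs = {1: 0}  # version -> number of in-flight queries using it

    def begin_query(self):
        """Start a query against the current version and take a reference."""
        version = self.current
        self.refs[version] += 1
        return version

    def end_query(self, version):
        """Finish a query and release its reference to the version it used."""
        self.refs[version] -= 1
        self._drop_unused()

    def publish(self):
        """Make a new version current; older versions stay while referenced."""
        self.current += 1
        self.refs[self.current] = 0
        self._drop_unused()
        return self.current

    def _drop_unused(self):
        # A version can go once it is not current and no query references it.
        stale = [v for v, n in self.refs.items() if n == 0 and v != self.current]
        for v in stale:
            del self.refs[v]
```

In this model a query that began before publish() keeps its old version alive until end_query() is called, while queries that begin afterward see only the new version.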
Wrapping updates in the cluster in an outer transaction operation is recommended but optional.
If the Dgraph process on the leader node stops, it must be restarted before updates can resume. If the Dgraph process is running as a service, you can set it to restart automatically. Once restarted, the leader node joins the cluster and becomes operational, and updates start to propagate from this node to all other nodes.
If the network connection between nodes in the cluster and the Cluster Coordinator service fails, the Dgraph process on those nodes shuts down.
A node will rejoin the cluster once the Dgraph process on the leader node is restarted (this will happen automatically if it is run as a service) and is able to establish a connection with the Cluster Coordinator service.
The Endeca Cluster Coordinator service cannot be configured to run as a Windows service. This means that if you are using the Endeca Clustering feature and run the Endeca Server as a Windows service, you should closely monitor the state of the Cluster Coordinator service.
The reason is that if the Endeca Server service crashes, the Windows Service Controller will automatically restart it. However, if the Cluster Coordinator service crashes, then it is not automatically restarted. This can lead to a situation where the Dgraph processes are running but the Cluster Coordinator service is not.
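One way to approach the monitoring recommended above is a periodic reachability probe. The sketch below assumes that the Cluster Coordinator service accepts TCP connections on a host and port known to you; the function name coordinator_reachable and both parameters are hypothetical and must be supplied from your own configuration. A successful connection shows only that something is listening on that port, not that the service is healthy.

```python
import socket


def coordinator_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A scheduled job could call this function and raise an alert whenever it returns False while the Dgraph processes are still running.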