Data Synchronization Scenarios

Use this chapter to perform cluster maintenance on SDM servers. It describes how the automatic cluster recovery process works and the methods that you can use to maintain SDM cluster operation.

A cluster can operate as a two-node or multi-node cluster. The master node contains the database for configuration data, and no replication of large configurations is required. The failure of a node does not affect on-demand retrieval of configuration from devices serviced by other nodes.

A two-node cluster has one master node and one replica node. The following recovery scenarios can occur when one node fails:

If the master node fails in a two-node cluster, the remaining replica node becomes the master. The non-operational node tries to recover and rejoin the cluster. If the node is non-operational for more than 24 hours, the cluster must be manually restored.
If network connectivity is lost between the two active nodes in a two-node cluster, a network partition occurs that causes both members to become masters. Once network connectivity is re-established and the network partition is resolved, the Berkeley database elects one node as master, and the other node shuts down and restarts as the replica node.
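
The two recovery scenarios above can be traced as simple role transitions. The following Python sketch is illustrative only and is not SDM code: the Node class, the function names, and the idea of passing in the elected node are assumptions made for the example, because the text does not say how the database chooses the winner of the election.

```python
from dataclasses import dataclass

MANUAL_RESTORE_AFTER_HOURS = 24  # threshold stated in the text


@dataclass
class Node:
    name: str
    role: str  # "master", "replica", or "down"


def fail_master(master: Node, replica: Node) -> None:
    """Master fails: the surviving replica takes over as master."""
    master.role = "down"
    replica.role = "master"


def needs_manual_restore(hours_down: float) -> bool:
    """A node that is down for more than 24 hours requires a manual restore."""
    return hours_down > MANUAL_RESTORE_AFTER_HOURS


def partition(a: Node, b: Node) -> None:
    """Connectivity lost: neither node sees its peer, so both act as master."""
    a.role = "master"
    b.role = "master"


def resolve_partition(a: Node, b: Node, elected: Node) -> None:
    """Connectivity restored: one node is elected master (the choice is an
    input here, not modeled); the other restarts as the replica."""
    loser = b if elected is a else a
    elected.role = "master"
    loser.role = "replica"
```

For example, calling partition(a, b) followed by resolve_partition(a, b, elected=b) leaves b as the master and a as the replica, which matches the outcome described for a resolved network partition.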

A multi-node cluster has one master node and multiple replica nodes. If a master database failure occurs in a cluster with multiple replicas, re-election among the replicated databases occurs and a new master database is elected. On a cluster of three or more nodes, a transaction succeeds only if a quorum of replies from the replicas is achieved, which guarantees that the data exists on more than the master database after the transaction completes. If a quorum is not met, the transaction fails.
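
The quorum check can be illustrated with a few lines of Python. This is a sketch under a stated assumption, not the SDM implementation: the text does not define the quorum size, so the example assumes a simple majority of all nodes with the master counted as one copy. The function names are hypothetical.

```python
def quorum_size(cluster_size: int) -> int:
    """Assumed rule: a simple majority of all nodes, master included."""
    return cluster_size // 2 + 1


def transaction_succeeds(cluster_size: int, replica_acks: int) -> bool:
    """Return True if enough replicas acknowledged the write.

    The master holds one copy; each replica acknowledgement adds one more.
    """
    if cluster_size < 3:
        raise ValueError("the quorum rule is described for three or more nodes")
    copies = 1 + replica_acks
    return copies >= quorum_size(cluster_size)
```

Under this assumption, a three-node cluster needs one replica acknowledgement and a five-node cluster needs two. With no acknowledgements the data exists only on the master, so the transaction fails.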

Message events and data are distributed in the cluster through the MOM, which is based on a store-and-forward process. The MOM guarantees message delivery by storing each message in a local database before declaring that the message was properly processed. In a MOM cluster, there is no master node because all MOM brokers that participate in the cluster ensure that messages are synchronized in the cluster. Durable subscribers ensure that even if a node leaves the cluster and re-enters within a 24-hour period, missed messages are redelivered. Tasks entered in a queue are processed even if the host where the task was originally submitted goes down.
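
The store-and-forward behavior and durable subscribers can be pictured with a toy broker. The Python sketch below is a simplified illustration and is not the MOM API: the Broker class and its methods are hypothetical, a list stands in for the local database, and the 24-hour retention window is not modeled.

```python
from collections import defaultdict, deque


class Broker:
    """Toy store-and-forward broker with durable subscribers."""

    def __init__(self):
        self.store = []                    # stands in for the local database
        self.pending = defaultdict(deque)  # undelivered messages per subscriber
        self.online = {}                   # subscriber name -> delivery callback

    def subscribe(self, name, callback):
        """Register or reconnect a durable subscriber, then replay its backlog."""
        self.online[name] = callback
        self.pending[name]  # create the backlog queue on first subscription
        self._drain(name)

    def disconnect(self, name):
        """Subscriber leaves; its backlog queue is kept."""
        self.online.pop(name, None)

    def publish(self, message):
        """Store the message first, then forward it to every known subscriber."""
        self.store.append(message)
        for name in list(self.pending):
            self.pending[name].append(message)
            self._drain(name)

    def _drain(self, name):
        """Deliver queued messages in order while the subscriber is online."""
        callback = self.online.get(name)
        while callback and self.pending[name]:
            callback(self.pending[name].popleft())
```

A subscriber that disconnects, misses two published messages, and then subscribes again under the same name receives both missed messages in their original order.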