Disaster Recovery Administration Example

This appendix provides an example of a disaster recovery scenario and the actions an administrator might perform.

Example Company has two geographically separated clusters: cluster-paris in Paris and cluster-newyork in New York. These clusters are configured as partner clusters. The cluster in Paris is configured as the primary cluster, and the cluster in New York is configured as the secondary cluster.

The cluster-paris cluster fails temporarily as a result of power outages during a windstorm. An administrator can expect the following events:

The heartbeat communication is lost between cluster-paris and cluster-newyork. Because heartbeat notification was configured during the creation of the partnership, a heartbeat-loss notification email is sent to the administrator.

For information about configuring partnerships and heartbeat notification, see Modifying Partnership Properties.
The administrator receives the notification email and follows the company procedure to verify that the disconnect occurred because of a situation that requires a takeover by the secondary cluster. Because a takeover might take a long time, depending on the requirements of the applications being protected, Example Company does not allow takeovers unless the primary cluster cannot be repaired within two hours.

For information about verifying a disconnect on a system, see Detecting Cluster Failure.
Because the cluster-paris cluster cannot be brought online again for at least another day, the administrator runs the geopg takeover command on a node of cluster-newyork. This command starts the protection group on the secondary cluster, cluster-newyork.

For information about performing a takeover on a system, see Forcing a Takeover of a Protection Group.
After the takeover, the secondary cluster cluster-newyork becomes the new primary cluster. However, the failed cluster in Paris is still configured as the primary cluster. Therefore, when the cluster-paris cluster restarts, it detects that it was down and lost contact with the partner cluster. The cluster-paris cluster then enters an error state that requires administrative action to clear. You might also be required to recover and resynchronize data on the cluster.

For information about recovering data after a takeover, see the disaster recovery framework guide for your data replication product.
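
The takeover step in this scenario can be sketched as a small shell wrapper to be run on a node of cluster-newyork. This is an illustration under stated assumptions, not a procedure from this guide: the protection group name example-pg, the takeover_cmd helper, and the PG_NAME and RUN_TAKEOVER variables are hypothetical, and the -f option is assumed to force the operation without a confirmation prompt. Check the geopg man page for your release before relying on it. By default the sketch only prints the command.

```shell
#!/bin/sh
# Sketch only: intended for a node of the secondary cluster (cluster-newyork).
# "example-pg" is a hypothetical protection group name; substitute your own.

# Print the takeover command for the given protection group.
# Assumption: -f forces the operation without asking for confirmation.
takeover_cmd() {
    printf 'geopg takeover -f %s\n' "$1"
}

PG_NAME="${PG_NAME:-example-pg}"

if [ "${RUN_TAKEOVER:-no}" = "yes" ]; then
    # Start the protection group on this cluster, making it the new primary.
    geopg takeover -f "$PG_NAME"
else
    # Dry run (default): show the command that would be executed.
    takeover_cmd "$PG_NAME"
fi
```

Keeping the dry run as the default mirrors the company procedure described above: the command is executed only after the administrator has verified that a takeover is warranted and explicitly sets RUN_TAKEOVER=yes.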