Sun Cluster Geographic Edition System Administration Guide

Overview of Disaster Recovery Administration

This section provides an example of a disaster recovery scenario and the actions that an administrator might perform.

Company X has two geographically separated clusters: cluster-paris in Paris and cluster-newyork in New York. These clusters are configured as partner clusters. The cluster in Paris is configured as the primary cluster, and the cluster in New York is configured as the secondary cluster.

The cluster-paris cluster fails temporarily because of power outages during a windstorm. From the administrator's perspective, the following events occur:

The heartbeat communication is lost between cluster-paris and cluster-newyork. Because heartbeat notification was configured during the creation of the partnership, a heartbeat-loss notification email is sent to the administrator.

For information about configuring partnerships and heartbeat notification, see Creating and Modifying a Partnership.

The administrator receives the notification email and follows the company procedure to verify that the disconnect was caused by a situation that requires a takeover by the secondary cluster. Because a takeover is expensive, Company X does not allow takeovers unless the primary cluster cannot be repaired within two hours.

For information about verifying a disconnect on a system that uses Sun StorEdge Availability Suite 3.2.1, see Detecting Cluster Failure on a System That Uses Sun StorEdge Availability Suite 3.2.1 Data Replication.

For information about verifying a disconnect on a system that uses Hitachi TrueCopy, see Detecting Cluster Failure on a System That Uses Hitachi TrueCopy Data Replication.

Because the cluster-paris cluster cannot be brought online again for at least another day, the administrator runs the geopg takeover command on a node of cluster-newyork. This command starts the protection group on the secondary cluster, cluster-newyork, in New York.

For information about performing a takeover on a system that uses Sun StorEdge Availability Suite 3.2.1 data replication, see Forcing a Takeover on Systems That Use Sun StorEdge Availability Suite 3.2.1. For information about performing a takeover on a system that uses Hitachi TrueCopy data replication, see Forcing a Takeover on a System That Uses Hitachi TrueCopy Data Replication.

After the takeover, the secondary cluster, cluster-newyork, becomes the new primary cluster. The failed cluster in Paris is still configured as the primary cluster, so when cluster-paris restarts, it detects that it was down and that it lost contact with its partner cluster. The cluster-paris cluster then enters an error state that requires administrative action to repair. The cluster might also need to recover and resynchronize data.

For information about recovering data after a takeover on a system that uses Sun StorEdge Availability Suite 3.2.1 data replication, see Recovering Sun StorEdge Availability Suite 3.2.1 Data After a Takeover. For information about failing back services to the original primary cluster after a takeover on a system that uses Hitachi TrueCopy data replication, see Failback of Services to the Original Primary Cluster on a System That Uses Hitachi TrueCopy Replication.
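Company X's takeover rule in this scenario reduces to one comparison between the estimated repair time of the primary cluster and a two-hour window. The following Python sketch is illustrative only: should_take_over and TAKEOVER_THRESHOLD are hypothetical names that are not part of Sun Cluster Geographic Edition, and the sketch assumes that the administrator already has a repair-time estimate. The takeover itself is still performed with the geopg takeover command.

```python
from datetime import timedelta

# Hypothetical constant: Company X's two-hour repair window from the scenario.
TAKEOVER_THRESHOLD = timedelta(hours=2)


def should_take_over(estimated_repair_time: timedelta,
                     threshold: timedelta = TAKEOVER_THRESHOLD) -> bool:
    """Hypothetical helper, not a Sun Cluster Geographic Edition API.

    Returns True only if the primary cluster cannot be repaired within the
    threshold, which is when Company X allows an (expensive) takeover.
    """
    return estimated_repair_time > threshold


if __name__ == "__main__":
    # cluster-paris is expected to stay down for at least another day.
    print(should_take_over(timedelta(days=1)))      # True
    # A 90-minute outage is repaired within the window, so no takeover.
    print(should_take_over(timedelta(minutes=90)))  # False
```

The sketch uses a strict greater-than comparison because a primary cluster that can be repaired in exactly two hours is still repaired within the allowed window, so no takeover is triggered.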