Oracle® Solaris Cluster Geographic Edition Data Replication Guide for Hitachi TrueCopy and Universal Replicator

Updated: July 2016
 
 

Recovering From a Switchover Failure on a System That Uses Hitachi TrueCopy or Universal Replicator Replication

This section describes the initial conditions that lead to a switchover failure and how to recover from such a failure.

Overview of Recovering Services After a Switchover

When you run the geopg switchover command, the horctakeover command runs at the Hitachi TrueCopy or Universal Replicator data replication level. If the horctakeover command returns a value of 1, the switchover is successful.

In Hitachi TrueCopy and Universal Replicator terminology, a switchover is called a swap-takeover. In some cases, the horctakeover command might not be able to perform a swap-takeover. In such cases, the command returns a value other than 1, which the Geographic Edition framework treats as a switchover failure.


Note -  When a switchover fails, the horctakeover command usually returns a value of 5, which indicates an SVOL-SSUS-takeover.

One reason the horctakeover command might fail to perform a swap-takeover is that the data replication link, ESCON/FC, is down.
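The return value is reported as the exit status of the horctakeover command. If you run the takeover manually while troubleshooting, you can inspect that status directly. The following is a minimal sketch only; the devgroup1 device group name is taken from the examples later in this section, and normally the Geographic Edition framework invokes the command for you:

phys-newyork-1# horctakeover -g devgroup1
phys-newyork-1# echo $?

A printed value of 1 indicates a successful swap-takeover; a value of 5 indicates the SVOL-SSUS-takeover described in the preceding note.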

Any result other than a swap-takeover implies that the secondary volumes might not be fully synchronized with the primary volumes. In a switchover failure scenario, the Geographic Edition framework does not start the applications on the intended new primary cluster.
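You can confirm this from the intended new primary cluster by checking that the application resource groups remain offline and by reviewing the overall Geographic Edition state. The following is a sketch only; apprg1 is an illustrative resource group name, not one defined elsewhere in this guide:

phys-newyork-1# geoadm status
phys-newyork-1# clresourcegroup status apprg1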

Switchover Failure Conditions

This section describes a switchover failure scenario. In this scenario, cluster-paris is the original primary cluster and cluster-newyork is the original secondary cluster.

A switchover of services from cluster-paris to cluster-newyork is initiated as follows:

phys-newyork-1# geopg switchover -f -m cluster-newyork hdspg

While processing the geopg switchover command, the horctakeover command performs an SVOL-SSUS-takeover and returns a value of 5 for the Hitachi TrueCopy or Universal Replicator data replication component, devgroup1. As a result, the geopg switchover command returns with the following failure message:

Processing operation.... this may take a while ....
"Switchover" failed for the following reason:
   Switchover failed for Truecopy DG devgroup1

After this failure message has been issued, the two clusters are in the following states:

cluster-paris:
  hdspg role: Secondary
cluster-newyork:
  hdspg role: Secondary

phys-newyork-1# pairdisplay -g devgroup1 -fc
Group  PairVol(L/R) (Port#,TID,LU),Seq#,LDEV#.P/S, Status,Fence,%, P-LDEV# M 
devgroup1 pair1(L) (CL1-C , 0, 20)12345 609..S-VOL SSWS  ASYNC,100   1    -
devgroup1 pair1(R) (CL1-A , 0, 1) 54321   1..P-VOL PSUS  ASYNC,100  609   -
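In this output, the local volume on cluster-newyork is still an S-VOL but reports the SSWS status, while the remote volume on cluster-paris remains a suspended P-VOL (PSUS), which is the pattern an SVOL-SSUS-takeover leaves behind. For a one-line summary of the pair status rather than the full table, you can also use the pairvolchk command; this is a sketch only, and the exact message text depends on your Command Control Interface version:

phys-newyork-1# pairvolchk -g devgroup1 -ss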

Recovering From Switchover Failure

This section describes procedures to recover from the failure scenario described in the previous section. These procedures bring the application online on the appropriate cluster.

  1. Place the Hitachi TrueCopy or Universal Replicator data replication component, devgroup1, in the SMPL state.

    Use the pairsplit commands to place the data replication components that are in the protection group on both cluster-paris and cluster-newyork in the SMPL state. For the pair states that are shown in the previous section, run the following pairsplit commands. A sketch for verifying the resulting state follows this procedure.

    phys-newyork-1# pairsplit -R -g devgroup1
    phys-newyork-1# pairsplit -S -g devgroup1
  2. Designate one of the clusters Primary for the protection group. Follow the procedures in Migrating Replication Services by Switching Over Protection Groups in Oracle Solaris Cluster 4.3 Geographic Edition System Administration Guide.

    • Designate the original primary cluster, cluster-paris, Primary for the protection group if you intend to start the application on the original primary cluster. The application uses the current data on the original primary cluster.

    • Designate the original secondary cluster, cluster-newyork, Primary for the protection group if you intend to start the application on the original secondary cluster. The application uses the current data on the original secondary cluster.


    Caution  -  Because the horctakeover command did not perform a swap-takeover, the data volumes on cluster-newyork might not be synchronized with the data volumes on cluster-paris. If you intend to start the application with the same data that appears on the original primary cluster, you must not make the original secondary cluster Primary.
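
After the pairsplit commands in Step 1 complete, you can verify that devgroup1 is in the SMPL state on both clusters before you designate a new Primary. A minimal check, assuming the same device group as in the earlier examples, might look like the following:

phys-newyork-1# pairdisplay -g devgroup1 -fc

Both lines of output should show SMPL instead of P-VOL or S-VOL before you proceed with Step 2.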