Recovering From a Switchover Failure on a System That Uses Hitachi TrueCopy Replication

When you run the geopg switchover command, the horctakeover command runs at the Hitachi TrueCopy data replication level. If the horctakeover command returns a value of 1, the switchover is successful.

In Hitachi TrueCopy terminology, a switchover is called a swap-takeover. In some cases, the horctakeover command might not be able to perform a swap-takeover. In these cases, the command returns a value other than 1, which is treated as a switchover failure.

Note –

When a switchover fails, the horctakeover command usually returns a value of 5, which indicates an SVOL-SSUS-takeover.

One reason the horctakeover command might fail to perform a swap-takeover is that the data replication link, ESCON/FC, is down.

Any result other than a swap-takeover implies that the secondary volumes might not be fully synchronized with the primary volumes. Sun Cluster Geographic Edition software does not start the applications on the new intended primary cluster in a switchover failure scenario.
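
The return-value rule described above can be sketched as a small shell function. This is only an illustration: the function name classify_takeover is hypothetical, and only the values 1 and 5 come from this section. Every other value is reported as a generic failure.

```shell
# Map a horctakeover return value to a short outcome label.
# Only 1 (swap-takeover) and 5 (SVOL-SSUS-takeover) are taken from the text;
# all other values are treated as an unspecified switchover failure.
classify_takeover() {
  case "$1" in
    1) echo "swap-takeover: switchover succeeded" ;;
    5) echo "SVOL-SSUS-takeover: switchover failed" ;;
    *) echo "return value $1: switchover failed" ;;
  esac
}
```

For example, classify_takeover 5 reports the SVOL-SSUS-takeover case that the rest of this section deals with.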

The remainder of this section describes the initial conditions that lead to a switchover failure and how to recover from a switchover failure.

Switchover Failure Conditions

This section describes a switchover failure scenario. In this scenario, cluster-paris is the original primary cluster and cluster-newyork is the original secondary cluster.

A switchover switches the services from cluster-paris to cluster-newyork as follows:

phys-newyork-1# geopg switchover -f -m cluster-newyork tcpg

While processing the geopg switchover command, the horctakeover command performs an SVOL-SSUS-takeover and returns a value of 5 for the Hitachi TrueCopy device group, devgroup1. As a result, the geopg switchover command returns with the following failure message:

Processing operation.... this may take a while ....
"Switchover" failed for the following reason:
			Switchover failed for Truecopy DG devgroup1

After this failure message has been issued, the two clusters are in the following states:

cluster-paris:
		tcpg role: Secondary
cluster-newyork:
		tcpg role: Secondary

phys-newyork-1# pairdisplay -g devgroup1 -fc 
Group  PairVol(L/R) (Port#,TID,LU),Seq#,LDEV#.P/S, Status,Fence,%, P-LDEV# M 
devgroup1 pair1(L) (CL1-C , 0, 20)12345 609..S-VOL SSWS  ASYNC,100   1    -
devgroup1 pair1(R) (CL1-A , 0, 1) 54321   1..P-VOL PSUS  ASYNC,100  609   -
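
The pairdisplay output above can be reduced to one line per volume with a short awk filter. This is a minimal sketch that assumes the column layout shown in this section, where the P/S value is attached to the LDEV number by dots and the status follows as the next field. The function name pair_states is hypothetical.

```shell
# Read "pairdisplay -g <group> -fc" output on stdin and print, per volume:
#   <group> <local|remote> <P-VOL|S-VOL|SMPL> <status>
# Assumes the layout shown in this section (for example "609..S-VOL SSWS").
pair_states() {
  awk '
    $1 == "Group" { next }                    # skip the header line
    {
      side = "?"
      if ($2 ~ /\(L\)$/) side = "local"       # pair1(L) is the local volume
      else if ($2 ~ /\(R\)$/) side = "remote" # pair1(R) is the remote volume
      for (i = 3; i < NF; i++) {
        if ($i ~ /\.(P-VOL|S-VOL|SMPL)$/) {
          role = $i
          sub(/^.*\./, "", role)              # drop the leading "609.." part
          print $1, side, role, $(i + 1)
        }
      }
    }
  '
}
```

A typical use would be pairdisplay -g devgroup1 -fc | pair_states, which for the output above shows the local volume as S-VOL in the SSWS state and the remote volume as P-VOL in the PSUS state.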

Recovering From Switchover Failure

This section describes procedures to recover from the failure scenario described in the previous section. These procedures bring the application online on the appropriate cluster.

Place the Hitachi TrueCopy device group, devgroup1, in the SMPL state.

Use the pairsplit commands to place the device groups that are in the protection group on both cluster-paris and cluster-newyork in the SMPL state. For the pair states that are shown in the previous section, run the following pairsplit commands:
phys-newyork-1# pairsplit -R -g devgroup1
phys-newyork-1# pairsplit -S -g devgroup1

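
Before continuing, it can help to confirm that the pairsplit commands left every volume in the SMPL state. The following sketch checks pairdisplay output for that condition. It assumes that a simplex volume is reported with SMPL in the P/S column (for example "609..SMPL"), which is not shown in this section, and the function name all_simplex is hypothetical.

```shell
# Read pairdisplay output on stdin and return 0 only if every volume line
# reports SMPL in the P/S column.
# Assumption: simplex volumes appear as "<LDEV#>..SMPL" in that column.
all_simplex() {
  awk '
    $1 == "Group" { next }      # skip the header line
    NF == 0 { next }            # skip blank lines
    {
      seen++
      simplex = 0
      for (i = 1; i <= NF; i++)
        if ($i ~ /\.SMPL$/) simplex = 1
      if (!simplex) bad++
    }
    END {
      if (seen > 0 && bad == 0) exit 0
      exit 1
    }
  '
}
```

For example, pairdisplay -g devgroup1 -fc | all_simplex && echo "devgroup1 is in the SMPL state" prints the message only when no volume still shows a P-VOL or S-VOL role.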
Designate one of the clusters Primary for the protection group.

Designate the original primary cluster, cluster-paris, Primary for the protection group if you intend to start the application on the original primary cluster. The application uses the current data on the original primary cluster.

Designate the original secondary cluster, cluster-newyork, Primary for the protection group if you intend to start the application on the original secondary cluster. The application uses the current data on the original secondary cluster.

Caution –
Because the horctakeover command did not perform a swap-takeover, the data volumes on cluster-newyork might not be synchronized with the data volumes on cluster-paris. If you intend to start the application with the same data that appears on the original primary cluster, you must not make the original secondary cluster Primary.

How to Make the Original Primary Cluster Primary for a Hitachi TrueCopy Protection Group

Deactivate the protection group on the original primary cluster.
phys-paris-1# geopg stop -e Local tcpg

Resynchronize the configuration of the protection group.

This command updates the configuration of the protection group on cluster-paris with the configuration information of the protection group on cluster-newyork.
phys-paris-1# geopg update tcpg
After the geopg update command completes successfully, tcpg has the following role on each cluster:
cluster-paris:
		tcpg role: Primary
cluster-newyork:
		tcpg role: Secondary

Activate the protection group on both clusters in the partnership.
phys-paris-1# geopg start -e Global tcpg
This command starts the application on cluster-paris. Data replication starts from cluster-paris to cluster-newyork.

How to Make the Original Secondary Cluster Primary for a Hitachi TrueCopy Protection Group

Resynchronize the configuration of the protection group.

This command updates the configuration of the protection group on cluster-newyork with the configuration information of the protection group on cluster-paris.
phys-newyork-1# geopg update tcpg
After the geopg update command completes successfully, tcpg has the following role on each cluster:
cluster-paris:
		tcpg role: Secondary
cluster-newyork:
		tcpg role: Primary

Activate the protection group on both clusters in the partnership.
phys-newyork-1# geopg start -e Global tcpg
This command starts the application on cluster-newyork. Data replication starts from cluster-newyork to cluster-paris.

Caution –
This command overwrites the data on cluster-paris.
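
The two recovery procedures above differ only in which node runs the commands and whether the protection group is stopped first. The following sketch prints the command sequence for either choice without running anything. The function name recovery_plan and its primary/secondary argument are hypothetical. The commands and node names are the ones used in this section.

```shell
# Print (do not run) the recovery commands for the chosen cluster.
#   $1: "primary"   to make cluster-paris (original primary) Primary
#       "secondary" to make cluster-newyork (original secondary) Primary
#   $2: protection group name, for example tcpg
recovery_plan() {
  choice=$1
  pg=$2
  case "$choice" in
    primary)
      echo "phys-paris-1# geopg stop -e Local $pg"
      echo "phys-paris-1# geopg update $pg"
      echo "phys-paris-1# geopg start -e Global $pg"
      ;;
    secondary)
      # Starting from cluster-newyork overwrites the data on cluster-paris.
      echo "phys-newyork-1# geopg update $pg"
      echo "phys-newyork-1# geopg start -e Global $pg"
      ;;
    *)
      echo "usage: recovery_plan primary|secondary <protection-group>" >&2
      return 1
      ;;
  esac
}
```

Running recovery_plan primary tcpg lists the three commands for cluster-paris, and recovery_plan secondary tcpg lists the two commands for cluster-newyork. Printing the plan first gives an operator a chance to review the sequence before typing the commands on the named node.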