Sun Cluster Geographic Edition System Administration Guide

Recovering From a Switchover Failure on a System That Uses Hitachi TrueCopy Replication

When the geopg switchover command is executed, a horctakeover command is executed at the Hitachi TrueCopy data replication level. If the horctakeover command returns a value of 1, the switchover is successful.

In Hitachi TrueCopy terminology, a switchover is called a swap-takeover. In some cases, the horctakeover command might not be able to perform a swap-takeover. In these cases, the command returns a value other than 1, which is considered a switchover failure.


Note –

In the case of failure, the horctakeover command usually returns a value of 5, which indicates a SVOL-SSUS-takeover.


One reason the horctakeover command might fail to perform a swap-takeover is that the data replication link, ESCON/FC, is down.

Any result other than a swap-takeover implies that the secondary volumes might not be fully synchronized with the primary volumes. Sun Cluster Geographic Edition software does not bring up the applications on the new intended primary cluster in a switchover failure scenario.
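The return-value rule above can be sketched as a small shell helper. This is an illustrative sketch only: classify_takeover_rc is a hypothetical function name, and it labels only the two values named in this section (1 and 5). Every other value is reported generically as a failure, as this section describes.

```shell
#!/bin/sh
# Classify a horctakeover return value using only the values named in this section.
# classify_takeover_rc is a hypothetical helper, not part of any product.
classify_takeover_rc() {
    case "$1" in
        1) echo "swap-takeover: switchover succeeded" ;;
        5) echo "SVOL-SSUS-takeover: switchover failed" ;;
        *) echo "other result ($1): switchover failed" ;;
    esac
}

# Example: the value that the Note above describes as the usual failure value.
classify_takeover_rc 5
```

A wrapper script could pass the exit status of horctakeover to this function to log why a switchover was treated as failed.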

The remainder of this section describes the initial conditions that lead up to a switchover failure and how to recover from a switchover failure.

Switchover Failure Conditions

This section describes an example switchover failure scenario. In this scenario, cluster-paris is the original primary cluster and cluster-newyork is the original secondary cluster.

A switchover is executed to switch the services from cluster-paris to cluster-newyork as follows:


phys-newyork-1# geopg switchover -f -m cluster-newyork tcpg

While processing the geopg switchover command, the horctakeover command performs a SVOL-SSUS-takeover and returns a value of 5 for the Hitachi TrueCopy device group, devgroup1. As a result, the geopg switchover command returns with the following failure message:


Processing operation.... this may take a while ....
"Switchover" failed for the following reason:
			Switchover failed for Truecopy DG devgroup1

After this failure message has been issued, the two clusters are in the following states:


cluster-paris:
		tcpg role: Secondary
cluster-newyork:
		tcpg role: Secondary

phys-newyork-1# pairdisplay -g devgroup1 -fc 
Group  PairVol(L/R) (Port#,TID,LU),Seq#,LDEV#.P/S, Status,Fence,%, P-LDEV# M 
devgroup1 pair1(L) (CL1-C , 0, 20)12345 609..S-VOL SSWS  ASYNC,100   1    -
devgroup1 pair1(R) (CL1-A , 0, 1) 54321   1..P-VOL PSUS  ASYNC,100  609   -
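When checking pair states across many device groups, it can help to reduce the pairdisplay output to the side, volume role, and status of each pair. The following is a minimal sketch, assuming the column layout shown above. summarize_pairs is a hypothetical helper, and its pattern recognizes only lines that contain P-VOL or S-VOL, so other layouts or roles would need a different pattern.

```shell
#!/bin/sh
# Print "<side> <role> <status>" for each pair line of pairdisplay -fc output on stdin.
# Assumes the column layout shown in this section; header lines do not match and are skipped.
summarize_pairs() {
    sed -n 's/.*(\([LR]\)).*\.\([PS]-VOL\) *\([A-Z]*\).*/\1 \2 \3/p'
}

# Sample input copied from the pairdisplay output in this section.
summarize_pairs <<'EOF'
Group  PairVol(L/R) (Port#,TID,LU),Seq#,LDEV#.P/S, Status,Fence,%, P-LDEV# M 
devgroup1 pair1(L) (CL1-C , 0, 20)12345 609..S-VOL SSWS  ASYNC,100   1    -
devgroup1 pair1(R) (CL1-A , 0, 1) 54321   1..P-VOL PSUS  ASYNC,100  609   -
EOF
```

For the sample input, the helper prints one line for the local S-VOL in the SSWS state and one line for the remote P-VOL in the PSUS state.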

Recovering From Switchover Failure

This section describes procedures to recover from the failure scenario described in the previous section. These procedures will bring the application online on the appropriate cluster.

  1. Put the Hitachi TrueCopy device group, devgroup1, in the SMPL state.

    Use the pairsplit commands to put the device groups that are in the protection group on both cluster-paris and cluster-newyork in the SMPL state. For the pair states shown in the previous section, issue the following pairsplit commands:


    phys-newyork-1# pairsplit -R -g devgroup1
    phys-newyork-1# pairsplit -S -g devgroup1
  2. Make one of the clusters Primary for the protection group.

    Make the original primary cluster, cluster-paris, Primary for the protection group if you intend to bring up the application on the original primary cluster. The application will use the current data on the original primary cluster.

    Make the original secondary cluster, cluster-newyork, Primary for the protection group if you intend to bring up the application on the original secondary cluster. The application will use the current data on the original secondary cluster.


    Caution –

    Because the horctakeover command did not perform a swap-takeover, the data volumes on cluster-newyork might not be synchronized with the data volumes on cluster-paris. If you intend to bring up the application with the same data as appears on the original primary cluster, you must not make the original secondary cluster Primary.


How to Make the Original Primary Cluster Primary for a Hitachi TrueCopy Protection Group

Steps
  1. Deactivate the protection group on the original primary cluster.


    phys-paris-1# geopg stop -e Local tcpg
  2. Resynchronize the configuration of the protection group.

    The following command updates the configuration of the protection group on cluster-paris with the configuration information of the protection group on cluster-newyork.


    phys-paris-1# geopg update tcpg

    After the geopg update command has successfully executed, tcpg has the following role on each cluster:


    cluster-paris:
    		tcpg role: Primary
    cluster-newyork:
    		tcpg role: Secondary
  3. Activate the protection group on both clusters in the partnership.


    phys-paris-1# geopg start -e Global tcpg

    This command brings up the application on cluster-paris. Data replication starts from cluster-paris to cluster-newyork.
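The three steps above can be collected into one script so that they always run in the documented order. This is a sketch under stated assumptions, not a supported tool: the run helper, the DRY_RUN and PG variables, and the function name are hypothetical. Only the three geopg command lines come from this procedure. By default the script only prints the commands.

```shell
#!/bin/sh
# Run the geopg commands from the procedure above, in order, on a node of cluster-paris.
# PG and DRY_RUN are hypothetical variables for this sketch; geopg does not read them.
PG=${PG:-tcpg}

# Print the command unless DRY_RUN=0; when executing, stop at the first failure.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@" || exit 1
    else
        echo "would run: $*"
    fi
}

make_original_primary_primary() {
    run geopg stop -e Local "$PG"     # step 1: deactivate on the original primary
    run geopg update "$PG"            # step 2: resynchronize the configuration
    run geopg start -e Global "$PG"   # step 3: activate on both clusters
}

make_original_primary_primary
```

Setting DRY_RUN=0 executes the commands instead of printing them. Review the printed commands first, because step 3 starts data replication from cluster-paris to cluster-newyork.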

How to Make the Original Secondary Cluster Primary for a Hitachi TrueCopy Protection Group

Steps
  1. Resynchronize the configuration of the protection group.

    The following command updates the configuration of the protection group on cluster-newyork with the configuration information of the protection group on cluster-paris.


    phys-newyork-1# geopg update tcpg

    After the geopg update command has successfully executed, tcpg has the following role on each cluster:


    cluster-paris:
    		tcpg role: Secondary
    cluster-newyork:
    		tcpg role: Primary
  2. Activate the protection group on both clusters in the partnership.


    phys-newyork-1# geopg start -e Global tcpg

    This command brings up the application on cluster-newyork. Data replication starts from cluster-newyork to cluster-paris.


    Caution –

    This command will overwrite the data on cluster-paris.
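Because activating the protection group from cluster-newyork overwrites the data on cluster-paris, a site script might require an explicit confirmation before it runs this procedure. The following is a minimal sketch of such a guard. confirm_overwrite and the CONFIRM_OVERWRITE variable are hypothetical names for illustration, and the geopg commands are left commented out so that the sketch has no side effects.

```shell
#!/bin/sh
# Guard for the procedure above. CONFIRM_OVERWRITE is a hypothetical variable
# used only by this sketch; geopg itself does not read it.
confirm_overwrite() {
    if [ "${CONFIRM_OVERWRITE:-no}" = "yes" ]; then
        return 0
    fi
    echo "Not confirmed: activating from cluster-newyork overwrites the data on cluster-paris." >&2
    return 1
}

# Intended use on phys-newyork-1 (commented out so that nothing is executed here):
# confirm_overwrite && geopg update tcpg && geopg start -e Global tcpg
```

The guard fails closed: unless CONFIRM_OVERWRITE is set to yes, the function returns a nonzero status and the chained geopg commands do not run.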