Sun Cluster Geographic Edition Data Replication Guide for Hitachi TrueCopy and Universal Replicator

Recovering From a Hitachi TrueCopy or Universal Replicator Data Replication Error

When an error occurs at the data replication level, the error is reflected in the status of the resource in the replication resource group of the relevant device group.

This section describes how to detect data replication errors and how to recover from them.

How to Detect Data Replication Errors

For information about how resource status values map to actual replication pair states, see Table 2–6.

You can check the status of the replication resources by using the clresource command as follows:


phys-paris-1# clresource status -v

Running the clresource status command might return the following:


=== Cluster Resources ===

Resource Name          Node Name      State      Status Message
-------------          ---------      -----      --------------
r-tc-tcpg1-devgroup1   phys-paris-2   Offline    Offline
                       phys-paris-1   Online     Faulted - P-VOL:PSUE

hasp4nfs               phys-paris-2   Offline    Offline
                       phys-paris-1   Offline    Offline
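
On a cluster with many resources, it can help to filter this output for faulted rows. The following is a minimal sketch, not part of the product: the function name faulted_resources is hypothetical, and the sketch assumes the column layout shown in the preceding example (resource name only on the first row of each group, status message as the last column).

```shell
# Hypothetical helper: reads `clresource status -v` style text on stdin and
# prints "resource node status-message" for each row whose status message
# starts with "Faulted". Assumes the column layout of the example above.
faulted_resources() {
  awk '
    # Skip section titles, the header row, separator rows, and blank lines.
    /^===/ || /^Resource Name/ || /^-+/ || NF == 0 { next }
    # A row that starts in column 1 names a new resource.
    /^[^ \t]/ { res = $1; node = $2; first = 4 }
    # An indented row is another node of the same resource.
    /^[ \t]/  { node = $1; first = 3 }
    {
      # Rejoin the status message, which can contain spaces.
      msg = ""
      for (i = first; i <= NF; i++) msg = msg (i > first ? " " : "") $i
      if (msg ~ /^Faulted/) print res, node, msg
    }
  '
}
```

One way to use the sketch is to pipe the command output into it, for example clresource status -v | faulted_resources. For the preceding output, it would print the r-tc-tcpg1-devgroup1 row for phys-paris-1.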

The geoadm status command provides the aggregate resource status for all device groups in the protection group. For example, the output of the clresource status command in the preceding example indicates that the Hitachi TrueCopy or Universal Replicator device group, devgroup1, is in the PSUE state on cluster-paris. Table 2–6 indicates that the PSUE state corresponds to a resource status of FAULTED. So, the data replication state of the protection group is also FAULTED. This state is reflected in the output of the geoadm status command, which displays the state of the protection group as Error.


phys-paris-1# geoadm status
Cluster: cluster-paris

Partnership "paris-newyork-ps"  : OK
   Partner clusters             : cluster-newyork
   Synchronization              : OK      
   ICRM Connection              : OK

   Heartbeat "paris-to-newyork" monitoring "cluster-newyork": OK 
      Heartbeat plug-in "ping_plugin"             : Inactive
      Heartbeat plug-in "tcp_udp_plugin"          : OK

Protection group "tcpg"   : Error
      Partnership         : paris-newyork-ps
      Synchronization     : OK

      Cluster cluster-paris    : Error
         Role                  : Primary
         PG activation state   : Activated
         Configuration         : OK
         Data replication      : Error
         Resource groups       : OK 
   
      Cluster cluster-newyork  : Error
         Role                  : Secondary
         PG activation state   : Activated
         Configuration         : OK
         Data replication      : Error
         Resource groups       : OK

Pending Operations
      Protection Group         : "tcpg"
      Operations               : start
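
To pull only the data replication state of one protection group out of this output, a small filter can be used. The following is a rough sketch under stated assumptions: the function name pg_replication_state is hypothetical, and the parsing relies on the layout shown in the preceding example, where the protection group block starts with an unindented Protection group line and ends at the next unindented line.

```shell
# Hypothetical helper: reads `geoadm status` style text on stdin and prints
# "cluster state" for each Data replication line of one protection group.
# $1 is the protection group name, for example tcpg.
pg_replication_state() {
  awk -v pg="$1" '
    # Enter the block only if the quoted group name matches.
    /^Protection group/ { in_pg = (index($0, "\"" pg "\"") > 0); next }
    # Any other unindented line ends the block.
    /^[^ \t]/ { in_pg = 0 }
    # Remember which cluster the following lines describe.
    in_pg && $1 == "Cluster" { cluster = $2 }
    # The state is the last field on the Data replication line.
    in_pg && $1 == "Data" && $2 == "replication" { print cluster, $NF }
  '
}
```

For example, geoadm status | pg_replication_state tcpg would report Error for both cluster-paris and cluster-newyork when applied to the preceding output.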

How to Recover From a Hitachi TrueCopy or Universal Replicator Data Replication Error

To recover from an error state, you might perform some or all of the steps in the following procedure.

  1. Use the procedures in the Hitachi TrueCopy or Universal Replicator documentation to determine the cause of the FAULTED status. In the resource status message, this condition appears as the PSUE pair state.

  2. Recover from the faulted state by using the Hitachi TrueCopy or Universal Replicator procedures.

    If the recovery procedures change the state of the device group, this state is automatically detected by the resource and is reported as a new protection group state.

  3. Revalidate the protection group configuration.


    phys-paris-1# geopg validate protectiongroupname 
    
    protectiongroupname

    Specifies the name of the Hitachi TrueCopy or Universal Replicator protection group.

  4. Review the status of the protection group configuration.


    phys-paris-1# geopg list protectiongroupname 
    
    protectiongroupname

    Specifies the name of the Hitachi TrueCopy or Universal Replicator protection group.

  5. Review the runtime status of the protection group.


    phys-paris-1# geoadm status
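
Steps 3 through 5 can be chained so that the sequence stops at the first command that fails. The following is a sketch only, not a product feature: the function name recheck_protection_group and the GEO_RUN variable are hypothetical, and the sketch assumes that each command returns a nonzero exit status on failure.

```shell
# Hypothetical wrapper around steps 3 to 5 of the procedure above.
# $1 is the protection group name. Set GEO_RUN=echo to print the commands
# instead of running them (a dry run).
recheck_protection_group() {
  pg=$1
  if [ -z "$pg" ]; then
    echo "usage: recheck_protection_group protectiongroupname" >&2
    return 2
  fi
  # Each step runs only if the previous one succeeded (assumed exit codes).
  ${GEO_RUN:-} geopg validate "$pg" &&
  ${GEO_RUN:-} geopg list "$pg" &&
  ${GEO_RUN:-} geoadm status
}
```

With GEO_RUN=echo set, calling recheck_protection_group tcpg only prints the three commands, which is a way to check the sequence before running it on a cluster node.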