Oracle Solaris Cluster Geographic Edition Data Replication Guide for EMC Symmetrix Remote Data Facility (Oracle Solaris Cluster 4.1)

Recovering From an SRDF Data Replication Error

When an error occurs at the data replication level, the error is reflected in the status of the resource in the replication resource group of the relevant device group. This changed status appears in the Data Replication status field in the output of the geoadm status command for that protection group.

This section contains information about the following topics: how to detect data replication errors, and how to recover from an SRDF data replication error.

How to Detect Data Replication Errors

  1. Check the status of the replication resources by using the clresource status command.
    # clresource status -v sc_geo_dr-SRDF-pgname-dgname

    For information about how resource status values map to SRDF replication pair states, see Table 2-4.

    Running the clresource status command might return the following:

    …
    -- Resources --
    
                Resource Name       Node Name           State     Status Message
                -------------       ---------           -----     --------------
      Resource: sc_geo_dr-SRDF-srdfpg-devgroup1 pemc1  Online    Online - Partitioned
      Resource: sc_geo_dr-SRDF-srdfpg-devgroup1 pemc2  Offline   Offline
    …
  2. Display the aggregate resource status for all device groups in the protection group by using the geoadm status command.

    For example, the output of the clresource status command in the preceding example indicates that the SRDF device group, devgroup1, is in the Suspended state on cluster-paris. Table 2-4 indicates that the Suspended state corresponds to a resource status of FAULTED. So, the data replication state of the protection group is also FAULTED. This state is reflected in the output of the geoadm status command, which displays the state of the protection group as Error.

    phys-paris-1# geoadm status
    Cluster: cluster-paris
    
    Partnership "paris-newyork-ps"  : OK
       Partner clusters             : cluster-newyork
       Synchronization              : OK      
       ICRM Connection              : OK
    
       Heartbeat "paris-to-newyork" monitoring "cluster-newyork": OK 
          Heartbeat plug-in "ping_plugin"             : Inactive
          Heartbeat plug-in "tcp_udp_plugin"          : OK
    
    Protection group "srdfpg"   : Error
          Partnership         : paris-newyork-ps
          Synchronization     : OK
    
          Cluster cluster-paris    : Error
             Role                  : Primary
             PG activation state   : Activated
             Configuration         : OK
             Data replication      : Error
             Resource groups       : OK 
       
          Cluster cluster-newyork  : Error
             Role                  : Secondary
             PG activation state   : Activated
             Configuration         : OK
             Data replication      : Error
             Resource groups       : OK
     
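If you check for replication errors regularly, you can filter the output of the two commands in this procedure with a short script. The following is a minimal sketch and is not a Geographic Edition utility. The names resource_status and replication_state are hypothetical shell functions. The parsing assumes the column layout shown in the example output above, which might differ in other releases.

```shell
#!/bin/bash
# Sketch only: resource_status and replication_state are hypothetical helpers.
# Both read saved or piped command output on stdin and assume the column
# layout shown in the examples in this procedure.

# For each "Resource:" line of "clresource status -v" output, print the
# node name followed by the full status message.
resource_status() {
  awk '$1 == "Resource:" {
    node = $3
    # Fields 1-4 are "Resource:", resource name, node, and state.
    # Everything after field 4 is the status message.
    msg = ""
    for (i = 5; i <= NF; i++) msg = msg (msg == "" ? "" : " ") $i
    print node, msg
  }'
}

# For each per-cluster block of "geoadm status" output, print the cluster
# name followed by the value of its "Data replication" field.
replication_state() {
  awk '
    # Matches "Cluster <name> : <state>" but not the "Cluster: <name>" header.
    $1 == "Cluster" && NF >= 4 { cluster = $2 }
    /Data replication/ {
      split($0, parts, ":")
      gsub(/^ +| +$/, "", parts[2])   # trim spaces around the value
      print cluster, parts[2]
    }
  '
}
```

For example, piping the geoadm status output shown above through replication_state prints cluster-paris Error and cluster-newyork Error, one per line. The sketch does not track protection group names, so run it against output that covers a single protection group.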

How to Recover From an SRDF Data Replication Error

To recover from an error state, you might perform some or all of the steps in the following procedure.

  1. Use the procedures in the SRDF documentation to determine the causes of the FAULTED state.
  2. Recover from the FAULTED state by using the SRDF procedures.

    If the recovery procedures change the state of the device group, this state is automatically detected by the resource and is reported as a new protection group state.

  3. Revalidate the protection group configuration.
    phys-paris-1# geopg validate protectiongroupname 
    protectiongroupname

    Specifies the name of the SRDF protection group.

    If the geopg validate command determines that the configuration is valid, the state of the protection group changes to reflect that fact. If the configuration is not valid, geopg validate returns a failure message.

  4. Review the status of the protection group configuration.
    phys-paris-1# geopg list protectiongroupname 
    protectiongroupname

    Specifies the name of the SRDF protection group.

  5. Review the runtime status of the protection group.
    phys-paris-1# geoadm status
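You can combine Steps 3 through 5 into a small shell function so that the configuration and runtime status are reviewed only after the protection group validates. This is a minimal sketch. The name revalidate_pg is hypothetical. The sketch also assumes that geopg validate exits with a non-zero status when it reports a failure, so confirm that behavior on your release before relying on it.

```shell
#!/bin/bash
# Sketch only: revalidate_pg is a hypothetical helper, not a Geographic Edition
# command. Assumes geopg validate returns a non-zero exit status on failure.

revalidate_pg() {
  local pg="${1:-}"
  if [ -z "$pg" ]; then
    echo "usage: revalidate_pg protectiongroupname" >&2
    return 2
  fi

  # Step 3: revalidate the protection group configuration.
  if ! geopg validate "$pg"; then
    echo "geopg validate failed for $pg; resolve the reported problem first" >&2
    return 1
  fi

  # Step 4: review the configuration of the protection group.
  geopg list "$pg" || return 1

  # Step 5: review the runtime status (covers all protection groups).
  geoadm status
}
```

For example, after you source this function in a root shell on phys-paris-1, running revalidate_pg srdfpg performs the three steps for the srdfpg protection group from the earlier example.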