Sun Cluster System Administration Guide for Solaris OS

Procedure: How to Recover EMC SRDF Data after a Primary Room's Complete Failure

This procedure performs data recovery when a campus cluster's primary room fails completely, the primary room fails over to a secondary room, and then the primary room comes back online. The campus cluster's primary room is the primary node and storage site. The complete failure of a room includes the failure of both the host and the storage in that room. If the primary room fails, Sun Cluster automatically fails over to the secondary room, makes the secondary room's storage device readable and writable, and enables the failover of the corresponding device groups and resource groups.

When the primary room returns online, you can manually recover the data that was written to the SRDF device group in the secondary room and resynchronize that data to the primary room. This procedure recovers the SRDF device group by synchronizing the data from the original secondary room (this procedure uses phys-campus-2 for the secondary room) to the original primary room (phys-campus-1). The procedure also changes the SRDF device group type to RDF1 on phys-campus-2 and to RDF2 on phys-campus-1.

Before You Begin

Before you can perform a manual failover, you must configure the EMC replication group and DID devices and register the EMC replication group. For information about creating a Solaris Volume Manager device group, see How to Add and Register a Device Group (Solaris Volume Manager). For information about creating a Veritas Volume Manager device group, see How to Create a New Disk Group When Encapsulating Disks (Veritas Volume Manager).


Note –

These instructions demonstrate one method you can use to manually recover SRDF data after the primary room fails completely and then comes back online. Check the EMC documentation for additional methods.


Log in to the campus cluster's primary room to perform these steps. In the procedure below, dg1 is the SRDF device group name. At the time of the failure, the primary room in this procedure is phys-campus-1 and the secondary room is phys-campus-2.

  1. Log in to the campus cluster's primary room and become superuser or assume a role that provides solaris.cluster.modify RBAC authorization.

  2. From the primary room, use the symrdf command to query the replication status of the RDF devices and view information about those devices.


    phys-campus-1# symrdf -g dg1 query
    

    Tip –

    A device group that is in the split state is not synchronized.


  3. If the RDF pair state is split and the device group type is RDF1, then force a failover of the SRDF device group.


    phys-campus-1# symrdf -g dg1 -force failover
    
  4. View the status of the RDF devices.


    phys-campus-1# symrdf -g dg1 query
    
  5. After the failover, swap the RDF1 and RDF2 roles of the RDF devices that failed over.


    phys-campus-1# symrdf -g dg1 swap
    
  6. Verify the status and other information about the RDF devices.


    phys-campus-1# symrdf -g dg1 query
    
  7. Establish the SRDF device group in the primary room.


    phys-campus-1# symrdf -g dg1 establish
    
  8. Confirm that the device group is in a synchronized state and that the device group type is RDF2.


    phys-campus-1# symrdf -g dg1 query
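
The condition in step 3 can also be checked from a script before a failover is forced. The following is a minimal sketch, not part of the documented procedure. It assumes a POSIX-compliant sh, grep, and awk, and it assumes that the symrdf query and symdg list output has the column layout shown in Example 5–20. The DG variable and the three function names are hypothetical and are not part of SYMCLI or Sun Cluster.

```shell
#!/bin/sh
# Sketch only. DG and the function names below are hypothetical helpers.
# The symrdf and symdg invocations are the ones used in this procedure.
DG=${DG:-dg1}   # SRDF device group name; dg1 is the name used in this procedure

# Succeeds when a DEV line of the query output ends with the given pair state.
# Assumes the pair state is the last column, as in Example 5-20.
pair_state_is() {
    symrdf -g "$DG" query | grep DEV | grep -Eq "$1[[:space:]]*\$"
}

# Prints the device group type (for example RDF1 or RDF2) of $DG.
# Assumes the name is column 1 and the type is column 2 of "symdg list".
group_type() {
    symdg list | awk -v dg="$DG" '$1 == dg { print $2 }'
}

# Step 3 condition: the pair state is Split and the group type is RDF1.
needs_forced_failover() {
    pair_state_is "Split" && [ "$(group_type)" = "RDF1" ]
}
```

If needs_forced_failover succeeds on phys-campus-1, the forced failover in step 3 applies. Otherwise, review the query output by hand before you continue, because the sketch recognizes only the one combination that step 3 describes.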
    

Example 5–20 Manually Recovering EMC SRDF Data after a Primary Site Failover

This example provides the Sun Cluster-specific steps necessary to manually recover EMC SRDF data after a campus cluster's primary room fails over, a secondary room takes over and records data, and then the primary room comes back online. In the example, the SRDF device group is called dg1 and the standard logical device is DEV001. The primary room is phys-campus-1 at the time of the failure, and the secondary room is phys-campus-2. Perform the steps from the campus cluster's primary room, phys-campus-1.


phys-campus-1# symrdf -g dg1 query | grep DEV
DEV001  0012 RW  0  0 NR 0012 RW  2031  0 S..  Split

phys-campus-1# symdg list | grep RDF
dg1  RDF1  Yes  000187990182  1  0  0  0  0

phys-campus-1# symrdf -g dg1 -force failover
...

phys-campus-1# symrdf -g dg1 query | grep DEV
DEV001  0012 WD  0  0 NR 0012 RW  2031  0 S..  Failed Over

phys-campus-1# symdg list | grep RDF
dg1  RDF1  Yes  000187990182  1  0  0  0  0

phys-campus-1# symrdf -g dg1 swap
...

phys-campus-1# symrdf -g dg1 query | grep DEV
DEV001  0012 WD  0  0 NR 0012 RW  0  2031 S.. Suspended

phys-campus-1# symdg list | grep RDF
dg1  RDF2  Yes  000187990182  1  0  0  0  0

phys-campus-1# symrdf -g dg1 establish
...

phys-campus-1# symrdf -g dg1 query | grep DEV
DEV001  0012 WD  0  0 RW 0012 RW  0  0 S.. Synchronized

phys-campus-1# symdg list | grep RDF
dg1  RDF2  Yes  000187990182  1  0  0  0  0
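
The state transitions in this example can be summarized as a lookup from the current RDF pair state and device group type to the next command in the procedure. The following sketch is illustrative only. The next_step function is a hypothetical helper, not a SYMCLI or Sun Cluster command, and it covers only the four combinations that appear in the example output.

```shell
#!/bin/sh
# Sketch only: next_step is a hypothetical helper that prints a command
# and does not run it.
# Arguments: $1 = RDF pair state, $2 = device group type, $3 = device group name.
next_step() {
    case "$1/$2" in
        "Split/RDF1")        echo "symrdf -g $3 -force failover" ;;
        "Failed Over/RDF1")  echo "symrdf -g $3 swap" ;;
        "Suspended/RDF2")    echo "symrdf -g $3 establish" ;;
        "Synchronized/RDF2") echo "done" ;;
        # Any combination that the example does not show is left to the operator.
        *)                   echo "unknown"; return 1 ;;
    esac
}

# Usage: print the command that follows the swap step in the example.
next_step "Suspended" "RDF2" "dg1"
```

The function only prints the command so that an administrator can review it first. Any pair state that the example does not show returns "unknown" with a nonzero exit status.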