Sun Cluster Geographic Edition Data Replication Guide for Sun StorageTek Availability Suite

Chapter 3 Migrating Services That Use Sun StorageTek Availability Suite Data Replication

This chapter provides information about migrating services for maintenance or as a result of cluster failure. The chapter contains information about the following:

Detecting Cluster Failure on a System That Uses Sun StorageTek Availability Suite Data Replication

This section describes the internal processes that occur when failure is detected on a primary or a secondary cluster.

Detecting Primary Cluster Failure

When the primary cluster for a given protection group fails, the secondary cluster in the partnership detects the failure. The cluster that fails might be a member of more than one partnership, resulting in multiple failure detections.

The following actions occur when the overall state of a protection group changes to the Unknown state:

Detecting Secondary Cluster Failure

When a secondary cluster for a given protection group fails, a cluster in the same partnership detects the failure. The cluster that failed might be a member of more than one partnership, resulting in multiple failure detections.

During failure detection, the following actions occur:

Migrating Services That Use Sun StorageTek Availability Suite With a Switchover

You perform a switchover of a Sun StorageTek Availability Suite protection group when you want to migrate services to the partner cluster in an orderly fashion. A switchover consists of the following:

This section provides the following information:

Procedure: How to Switch Over a Sun StorageTek Availability Suite Protection Group From Primary to Secondary

Before You Begin

For a switchover to occur, data replication must be active between the primary cluster and the secondary cluster. Additionally, the data volumes on the two clusters must be in a synchronized state.

Before you switch over a protection group from the primary cluster to the secondary cluster, ensure that the following conditions are met:

  1. Log in to a cluster node.

    You must be assigned the Geo Management RBAC rights profile to complete this procedure. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC in Sun Cluster Geographic Edition System Administration Guide.

  2. Initiate the switchover.

    The application resource groups that are a part of the protection group are stopped and started during the switchover.


    # geopg switchover [-f] -m newprimarycluster protectiongroupname
    
    -f

    Forces the command to perform the operation without asking you for confirmation

    -m newprimarycluster

    Specifies the name of the cluster that is to be the primary cluster for the protection group

    protectiongroupname

    Specifies the name of the protection group


Example 3–1 Forcing a Switchover From Primary to Secondary

This example performs a switchover to the secondary cluster.


# geopg switchover -f -m cluster-newyork avspg

Actions Performed by the Sun Cluster Geographic Edition Software During a Switchover

When you run the geopg switchover command, the software confirms that the volume sets that are associated with the device groups are in the replicating state. Then, the software performs the following actions on the original primary cluster:

On the original secondary cluster, the command takes the following actions:

If the command completes successfully, the secondary cluster, cluster-newyork, becomes the new primary cluster for the protection group. The original primary cluster, cluster-paris, becomes the new secondary cluster. Volume sets associated with a device group of the protection group have their role reversed according to the role of the protection group on the local cluster. The application resource group is online on the new primary cluster. Data replication from the new primary cluster to the new secondary cluster begins.

This command returns an error if any of the previous operations fails. Run the geoadm status command to view the status of each component. For example, the Configuration status of the protection group might be set to Error, depending on the cause of the failure. The protection group might be activated or deactivated.

If the Configuration status of the protection group is set to Error, revalidate the protection group by using the procedures described in How to Validate a Sun StorageTek Availability Suite Protection Group.

If the configuration of the protection group is not the same on each partner cluster, you need to resynchronize the configuration by using the procedures described in How to Resynchronize a Sun StorageTek Availability Suite Protection Group.
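
The error handling described above can be wrapped in a small helper. The following is a minimal sketch and is not part of the product. It assumes that a failed command reports a nonzero exit status. The function name run_then_status and the STATUS_CMD variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command and, if it fails, run a status
# command so that the operator can inspect each component.
# Assumption: the wrapped command reports failure with a nonzero exit status.
STATUS_CMD="${STATUS_CMD:-geoadm status}"

run_then_status() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "command failed with exit status $rc: $*" >&2
        # Intentionally unquoted so that "geoadm status" splits into words.
        $STATUS_CMD || true
    fi
    # Preserve the exit status of the wrapped command.
    return "$rc"
}

# Example call (not executed here):
#   run_then_status geopg switchover -f -m cluster-newyork avspg
```

The helper returns the exit status of the wrapped command, so a calling script can still decide whether to revalidate or resynchronize the protection group.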

Forcing a Takeover on Systems That Use Sun StorageTek Availability Suite

You perform a takeover when applications need to be brought online on the secondary cluster regardless of whether the data is completely consistent between the primary volume and the secondary volume. The information in this section assumes that the protection group has been started.

The following steps occur after a takeover is initiated:

For details about the possible conditions of the primary and secondary cluster before and after takeover, see Appendix C, Takeover Postconditions, in Sun Cluster Geographic Edition System Administration Guide.

The following procedures describe the steps you must perform to force a takeover by a secondary cluster, and how to recover data afterward.

Procedure: How to Force Immediate Takeover of Sun StorageTek Availability Suite Services by a Secondary Cluster

Before You Begin

Before you force the secondary cluster to assume the activity of the primary cluster, ensure that the following conditions are met:

  1. Log in to a node in the secondary cluster.

    You must be assigned the Geo Management RBAC rights profile to complete this procedure. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC in Sun Cluster Geographic Edition System Administration Guide.

  2. Initiate the takeover.


    # geopg takeover [-f] protectiongroupname
    
    -f

    Forces the command to perform the operation without your confirmation

    protectiongroupname

    Specifies the name of the protection group


Example 3–2 Forcing a Takeover by a Secondary Cluster

This example forces the takeover of avspg by the secondary cluster, cluster-newyork.

phys-newyork-1 is the first node of the secondary cluster. For a reminder of which node is phys-newyork-1, see Example Sun Cluster Geographic Edition Cluster Configuration in Sun Cluster Geographic Edition System Administration Guide.


phys-newyork-1# geopg takeover -f avspg

Next Steps

For information about the state of the primary and secondary clusters after a takeover, see Appendix C, Takeover Postconditions, in Sun Cluster Geographic Edition System Administration Guide.
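
Because the -f option suppresses the confirmation prompt, a site might choose to add its own guard in front of a forced takeover. The following sketch is one possible approach and is not part of the product. The function name takeover_cmd is hypothetical, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical guard around a forced takeover. The operator must retype
# the protection group name before the command is printed.
# This sketch prints the command only; it does not run geopg.
takeover_cmd() {
    pg=${1:-}      # protection group name
    typed=${2:-}   # what the operator typed to confirm
    if [ -z "$pg" ] || [ "$pg" != "$typed" ]; then
        echo "confirmation does not match; takeover not issued" >&2
        return 1
    fi
    printf 'geopg takeover -f %s\n' "$pg"
}

# Prints: geopg takeover -f avspg
takeover_cmd avspg avspg
```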

Actions Performed by the Sun Cluster Geographic Edition Software During a Takeover

When you run the geopg takeover command, the software confirms that the volume sets are in a Replicating or Logging state on the secondary cluster.

If the original primary cluster, cluster-paris, can be reached, the software performs the following actions:

On the original secondary cluster, cluster-newyork, the software performs the following actions:

If the command completes successfully, the secondary cluster, cluster-newyork, becomes the new primary cluster for the protection group. Volume sets associated with a device group in the protection group have their role reversed according to the role of the protection group on the local cluster. If the protection group was active on the original secondary cluster before the takeover, the application resource groups are brought online on the new primary cluster. If the original primary cluster can be reached, it becomes the new secondary cluster of the protection group. Replication of all volume sets that are associated with the device groups of the protection group is stopped.


Caution –

After a successful takeover, data replication is stopped. If you want to continue to suspend replication, specify the -n option when you use the geopg start command. This option prevents the start of data replication from the new primary cluster to the new secondary cluster.


This command returns an error if any of the previous operations fails. Use the geoadm status command to view the status of each component. For example, the Configuration status of the protection group might be set to Error, depending on the cause of the failure. The protection group might be activated or deactivated.

If the Configuration status of the protection group is set to Error, revalidate the protection group by using the procedures described in How to Validate a Sun StorageTek Availability Suite Protection Group.

If the configuration of the protection group is not the same on each partner cluster, you need to resynchronize the configuration by using the procedures described in How to Resynchronize a Sun StorageTek Availability Suite Protection Group.
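
The choice described in the caution earlier in this section, whether to keep replication suspended when the protection group is started after a takeover, can be sketched as a small command builder. This is an illustration only. The function name start_cmd and the suspend keyword are hypothetical. The -e and -n options are the ones shown in this chapter.

```shell
#!/bin/sh
# Print the geopg start command for a protection group.
# Arguments: scope, protection group name, and optionally the word
# "suspend" to keep data replication stopped by adding -n.
start_cmd() {
    scope=${1:-}
    pg=${2:-}
    mode=${3:-replicate}
    case $scope in
        local|Local|global|Global) ;;
        *) echo "scope must be local or global" >&2; return 2 ;;
    esac
    [ -n "$pg" ] || { echo "protection group name required" >&2; return 2; }
    if [ "$mode" = "suspend" ]; then
        printf 'geopg start -e %s -n %s\n' "$scope" "$pg"
    else
        printf 'geopg start -e %s %s\n' "$scope" "$pg"
    fi
}

# Prints: geopg start -e local -n avspg
start_cmd local avspg suspend
```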

Recovering Sun StorageTek Availability Suite Data After a Takeover

After a successful takeover operation, the secondary cluster, cluster-newyork, becomes the primary for the protection group and the services are online on the secondary cluster. After the recovery of the original primary cluster, the services can be brought online again on the original primary by using a process called failback.

Sun Cluster Geographic Edition software supports the following two kinds of failback:

If you want to leave the new primary, cluster-newyork, as the primary cluster and the original primary cluster, cluster-paris, as the secondary after the original primary starts again, you can resynchronize and revalidate the protection group configuration without performing a switchover or takeover.

This section provides the following information:

Procedure: How to Resynchronize and Revalidate the Protection Group Configuration

Use this procedure to resynchronize and revalidate data on the original primary cluster, cluster-paris, with the data on the current primary cluster, cluster-newyork.

Before You Begin

This procedure assumes that a takeover has already occurred on cluster-newyork. The clusters now have the following roles:

  1. Resynchronize the original primary cluster, cluster-paris, with the current primary cluster, cluster-newyork.

    The cluster cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.

    1. On cluster-paris, deactivate the protection group on the local cluster.


      # geopg stop -e Local protectiongroupname
      
      -e Local

      Specifies the scope of the command.

      By specifying a local scope, the command operates on the local cluster only.

      protectiongroupname

      Specifies the name of the protection group.

      If the protection group is already deactivated, the state of the resource group in the protection group is probably Error. The state is Error because the application resource groups are managed and offline.

      Deactivating the protection group results in the application resource groups no longer being managed, clearing the Error state.

    2. On cluster-paris, resynchronize the partnership.


      # geops update partnershipname
      
      partnershipname

      Specifies the name of the partnership


      Note –

      You need to perform this step only once, even if you are resynchronizing multiple protection groups.


      For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.

    3. On cluster-paris, resynchronize each protection group.

      Because the role of the protection group on cluster-newyork is primary, this step ensures that the role of the protection group on cluster-paris is secondary.


      # geopg update protectiongroupname 
      
      protectiongroupname

      Specifies the name of the protection group

      For more information about synchronizing protection groups, see Resynchronizing a Sun StorageTek Availability Suite Protection Group.

  2. On cluster-paris, validate the configuration for each protection group.


    # geopg validate protectiongroupname 
    
    protectiongroupname

    Specifies a unique name that identifies a single protection group

    For more information, see How to Validate a Sun StorageTek Availability Suite Protection Group.

  3. On cluster-paris, activate each protection group.

    When you activate a protection group, its application resource groups are also brought online.


    # geopg start -e Global protectiongroupname
    
    -e Global

    Specifies the scope of the command.

    By specifying a Global scope, the command operates on both clusters where the protection group is deployed.

    protectiongroupname

    Specifies the name of the protection group.


    Caution –

    Do not use the -n option because the data needs to be synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.

    Because the protection group has a role of secondary, the data is synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.

    For more information about the geopg start command, see How to Activate a Sun StorageTek Availability Suite Protection Group.


  4. Confirm that the data is completely synchronized.

    First, confirm that the state of the protection group on cluster-newyork is OK.


    phys-newyork-1# geoadm status
    

    Refer to the Protection Group section of the output.

    Next, confirm that all resources in the replication resource group, AVSprotectiongroupname-rep-rg, report a status of OK.


    phys-newyork-1# clresource status -v AVSdevicegroupname-rep-rs
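
When several protection groups share one partnership, the partnership is resynchronized only once while the other steps repeat for each protection group. The following sketch prints the commands for steps 1 through 3 in that order so that the sequence can be reviewed before anything is run. It is an illustration only. The function name resync_plan is hypothetical, and the function does not run any command.

```shell
#!/bin/sh
# Print the commands for steps 1 through 3 of the procedure above.
# Usage: resync_plan PARTNERSHIP PROTECTION_GROUP [PROTECTION_GROUP ...]
# All printed commands are meant to be run on cluster-paris.
resync_plan() {
    if [ "$#" -lt 2 ]; then
        echo "usage: resync_plan partnershipname protectiongroupname..." >&2
        return 2
    fi
    ps=$1
    shift
    # Step 1.1: deactivate each protection group locally.
    for pg in "$@"; do echo "geopg stop -e Local $pg"; done
    # Step 1.2: resynchronize the partnership, once only.
    echo "geops update $ps"
    # Step 1.3: resynchronize each protection group.
    for pg in "$@"; do echo "geopg update $pg"; done
    # Step 2: validate each protection group.
    for pg in "$@"; do echo "geopg validate $pg"; done
    # Step 3: activate each protection group without -n.
    for pg in "$@"; do echo "geopg start -e Global $pg"; done
}
```

For example, resync_plan partnershipname avspg prints five commands, beginning with geopg stop -e Local avspg. Step 4, confirming synchronization, still relies on reading the geoadm status and clresource status output.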
    

Procedure: How to Perform a Failback-Switchover on a System That Uses Sun StorageTek Availability Suite Replication

Use this procedure to restart an application on the original primary cluster, cluster-paris, after the data on the cluster has been resynchronized with the data on the current primary cluster, cluster-newyork.

The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once per partnership.

Before You Begin

This procedure assumes that a takeover has already occurred on cluster-newyork. The clusters now have the following roles:

  1. Resynchronize the original primary cluster, cluster-paris, with the current primary cluster, cluster-newyork.

    The cluster cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.

    1. On cluster-paris, resynchronize the partnership.


      phys-paris-1# geops update partnershipname
      
      partnershipname

      Specifies the name of the partnership


      Note –

      You need to perform this step only once per partnership, even if you are performing a failback-switchover for multiple protection groups in the partnership.


      For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.

    2. Determine whether the protection group on the original primary cluster, cluster-paris, is active.


      phys-paris-1# geoadm status
      
    3. If the protection group on the original primary cluster is active, stop it.


      phys-paris-1# geopg stop -e local protectiongroupname
      
      -e local

      Specifies the scope of the command.

      By specifying a local scope, the command operates on the local cluster only.

      protectiongroupname

      Specifies the name of the protection group.

      If the protection group is already deactivated, the state of the resource group in the protection group is probably Error. The state is Error because the application resource groups are managed and offline.

      Deactivating the protection group results in the application resource groups no longer being managed, clearing the Error state.

    4. Verify that the protection group is stopped.


      phys-paris-1# geoadm status
      
    5. On cluster-paris, resynchronize each protection group.

      Because the local role of the protection group on cluster-newyork is now primary, this step ensures that the role of the protection group on cluster-paris becomes secondary.


      phys-paris-1# geopg update protectiongroupname 
      
      protectiongroupname

      Specifies the name of the protection group

      For more information about synchronizing protection groups, see Resynchronizing a Sun StorageTek Availability Suite Protection Group.

  2. On cluster-paris, validate the configuration for each protection group.

    A protection group cannot be started when it is in an error state. Ensure that the protection group is not in an error state.


    phys-paris-1# geopg validate protectiongroupname 
    
    protectiongroupname

    Specifies a unique name that identifies a single protection group

    For more information, see How to Validate a Sun StorageTek Availability Suite Protection Group.

  3. On cluster-paris, activate each protection group.

    When you activate a protection group, its application resource groups are also brought online.


    phys-paris-1# geopg start -e Global protectiongroupname
    
    -e Global

    Specifies the scope of the command.

    By specifying a Global scope, the command operates on both clusters where the protection group is deployed.

    protectiongroupname

    Specifies the name of the protection group.


    Caution –

    Do not use the -n option when performing a failback-switchover because the data needs to be synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.

    Because the protection group has a role of secondary, the data is synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.

    For more information about the geopg start command, see How to Activate a Sun StorageTek Availability Suite Protection Group.


  4. Confirm that the data is completely synchronized.

    First, confirm that the state of the protection group on cluster-newyork is OK.


    phys-newyork-1# geoadm status
    

    Refer to the Protection Group section of the output.

    Next, confirm that all resources in the replication resource group, AVSprotectiongroupname-rep-rg, report a status of OK.


    phys-newyork-1# clresource status -v AVSdevicegroupname-rep-rs
    
  5. On both partner clusters, ensure that the protection group is activated.


    # geoadm status
    
  6. On either cluster, perform a switchover from cluster-newyork to cluster-paris for each protection group.


    # geopg switchover [-f] -m cluster-paris protectiongroupname
    

    For more information, see How to Switch Over a Sun StorageTek Availability Suite Protection Group From Primary to Secondary.

    cluster-paris resumes its original role as primary cluster for the protection group.

  7. Ensure that the switchover was performed successfully.

    Verify that the protection group is now primary on cluster-paris and secondary on cluster-newyork and that the state for “Data replication” and “Resource groups” is OK on both clusters.


    # geoadm status
    

    Check the runtime status of the application resource group and data replication for each Sun StorageTek Availability Suite protection group.


    # clresourcegroup status -v resourcegroupname
    # clresource status -v AVSdevicegroupname-rep-rs
    

    Refer to the Status and Status Message fields that are presented for the data replication device group you want to check. For more information about these fields, see Table 2–1.

    For more information about the runtime status of data replication, see Checking the Runtime Status of Sun StorageTek Availability Suite Data Replication.
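
The switchover in step 6 depends on the checks in steps 4 and 5. The following sketch makes that dependency explicit. It is an illustration under stated assumptions and is not part of the product. The operator reads the values from the geoadm status and clresource status output and passes them in by hand, because this sketch does not parse command output. The function name failback_switchover_cmd and the yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical gate for step 6 of the procedure above.
# Prints the switchover command only when the earlier checks passed.
failback_switchover_cmd() {
    pg=${1:-}            # protection group name
    pg_state=${2:-}      # protection group state seen on cluster-newyork
    rep_status=${3:-}    # status reported by the replication resources
    paris_active=${4:-}  # "yes" if the group is activated on cluster-paris
    ny_active=${5:-}     # "yes" if the group is activated on cluster-newyork

    if [ "$pg_state" != "OK" ] || [ "$rep_status" != "OK" ]; then
        echo "data is not confirmed as synchronized; do not switch over" >&2
        return 1
    fi
    if [ "$paris_active" != "yes" ] || [ "$ny_active" != "yes" ]; then
        echo "protection group must be activated on both clusters" >&2
        return 1
    fi
    printf 'geopg switchover -m cluster-paris %s\n' "$pg"
}

# Prints: geopg switchover -m cluster-paris avspg
failback_switchover_cmd avspg OK OK yes yes
```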

Procedure: How to Perform a Failback-Takeover on a System That Uses Sun StorageTek Availability Suite Replication

Use this procedure to restart an application on the original primary cluster, cluster-paris, and use the current data on the original primary cluster. Any updates that occurred on the secondary cluster, cluster-newyork, while it was acting as primary are discarded.

The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once per partnership.


Note –

You can resume using the data on the original primary, cluster-paris, only if you have not replicated data from the new primary, cluster-newyork, to the original primary cluster, cluster-paris, at any point after the takeover operation on cluster-newyork.


Before You Begin

Before you begin the failback-takeover operation, the clusters have the following roles:

  1. Resynchronize the original primary cluster, cluster-paris, with the original secondary cluster, cluster-newyork.

    cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally.

    1. On cluster-paris, resynchronize the partnership.


      phys-paris-1# geops update partnershipname
      
      partnershipname

      Specifies the name of the partnership


      Note –

      You need to perform this step only once per partnership, even if you are performing a failback-takeover for multiple protection groups in the partnership.


      For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.

    2. Determine whether the protection group on the original primary cluster, cluster-paris, is active.


      phys-paris-1# geoadm status
      
    3. If the protection group on the original primary cluster is active, stop it.


      phys-paris-1# geopg stop -e local protectiongroupname
      
    4. Verify that the protection group is stopped.


      phys-paris-1# geoadm status
      
    5. On cluster-paris, resynchronize each protection group.

      If the protection group has been activated, deactivate the protection group by using the geopg stop command. For more information about deactivating a protection group, see How to Deactivate a Sun StorageTek Availability Suite Protection Group.


      phys-paris-1# geopg update protectiongroupname
      
      protectiongroupname

      Specifies the name of the protection group

      For more information about synchronizing protection groups, see How to Resynchronize a Sun StorageTek Availability Suite Protection Group.

  2. On cluster-paris, validate the configuration for each protection group.

    Ensure that the protection group is not in an error state. A protection group cannot be started when it is in an error state.


    phys-paris-1# geopg validate protectiongroupname 
    
    protectiongroupname

    Specifies a unique name that identifies a single protection group

    For more information, see How to Validate a Sun StorageTek Availability Suite Protection Group.

  3. On cluster-paris, activate each protection group in the secondary role without data replication.

    Because the protection group on cluster-paris has a role of secondary, the geopg start command does not restart the application on cluster-paris.


    phys-paris-1# geopg start -e local -n protectiongroupname
    
    -e local

    Specifies the scope of the command.

    By specifying a local scope, the command operates on the local cluster only.

    -n

    Prevents the start of data replication at protection group startup.


    Note –

    You must use the -n option.


    protectiongroupname

    Specifies the name of the protection group.

    For more information, see How to Activate a Sun StorageTek Availability Suite Protection Group.

    Replication from cluster-newyork to cluster-paris is not started because the -n option is used on cluster-paris.

  4. On cluster-paris, initiate a takeover for each protection group.


    phys-paris-1# geopg takeover [-f] protectiongroupname
    
    -f

    Forces the command to perform the operation without your confirmation

    protectiongroupname

    Specifies the name of the protection group

    For more information about the geopg takeover command, see How to Force Immediate Takeover of Sun StorageTek Availability Suite Services by a Secondary Cluster.

    The protection group on cluster-paris now has the primary role, and the protection group on cluster-newyork has the secondary role.

  5. On cluster-newyork, activate each protection group.

    Because the protection group on cluster-newyork has a role of secondary, the geopg start command does not restart the application on cluster-newyork.


    phys-newyork-1# geopg start -e local [-n] protectiongroupname
    
    -e local

    Specifies the scope of the command.

    By specifying a local scope, the command operates on the local cluster only.

    -n

    Prevents the start of data replication at protection group startup.

    If you omit this option, the data replication subsystem starts at the same time as the protection group.

    protectiongroupname

    Specifies the name of the protection group.

    For more information about the geopg start command, see How to Activate a Sun StorageTek Availability Suite Protection Group.

  6. Start data replication.

    To start data replication, activate the protection group on the primary cluster, cluster-paris.


    phys-paris-1# geopg start -e local protectiongroupname
    

    For more information about the geopg start command, see How to Activate a Sun StorageTek Availability Suite Protection Group.

  7. For each cluster, verify that the protection groups are set correctly and that the application resource group status and the data replication status are OK.

    1. Verify that the protection group is now primary on cluster-paris and secondary on cluster-newyork. Run the following command from one node on each cluster:


      # geoadm status
      
    2. Check the runtime status of the application resource group and data replication for each Sun StorageTek Availability Suite protection group. Run the following commands from one node on each cluster:


      # clresourcegroup status -v resourcegroupname
      # clresource status -v AVSdevicegroupname-rep-rs
      

      Refer to the Status and Status Message fields that are presented for the data replication device group you want to check. For more information about these fields, see Table 2–1.

      For more information about the runtime status of data replication, see Checking the Runtime Status of Sun StorageTek Availability Suite Data Replication.
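
The order of steps 1 through 6 matters, in particular the use of -n in step 3 before the takeover in step 4. The following sketch prints the command sequence for one protection group, with the node prompt used in this procedure in front of each command. It is an illustration only and runs nothing. The function name failback_takeover_plan is hypothetical. The geopg stop line applies only if the protection group is active on cluster-paris, and using -n in step 5 is a choice made for this sketch, because the option is optional in that step.

```shell
#!/bin/sh
# Print the commands for steps 1 through 6 of the failback-takeover
# procedure above, for one partnership and one protection group.
# Usage: failback_takeover_plan PARTNERSHIP PROTECTION_GROUP
failback_takeover_plan() {
    if [ "$#" -ne 2 ]; then
        echo "usage: failback_takeover_plan partnershipname protectiongroupname" >&2
        return 2
    fi
    ps=$1
    pg=$2
    cat <<EOF
phys-paris-1# geops update $ps
phys-paris-1# geopg stop -e local $pg
phys-paris-1# geopg update $pg
phys-paris-1# geopg validate $pg
phys-paris-1# geopg start -e local -n $pg
phys-paris-1# geopg takeover -f $pg
phys-newyork-1# geopg start -e local -n $pg
phys-paris-1# geopg start -e local $pg
EOF
}
```

Step 7, the verification with geoadm status, clresourcegroup status, and clresource status, is left to the operator because it requires reading the command output on each cluster.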

Recovering From a Sun StorageTek Availability Suite Data Replication Error

When an error occurs at the data replication level, the error is reflected in the status of the resource in the replication resource group of the relevant device group.

For example, suppose a device group called avsdg, which is controlled by Sun StorageTek Availability Suite, changes to the Volume failed state, VF. This state is reflected in the following resource status:


Resource Status = "FAULTED"
Resource status message = "FAULTED : Volume failed"

Note –

The Resource State remains Online because the probe is still running correctly.


Because the resource status has changed, the protection group status also changes. In this case, the local Data Replication state, the Protection Group state on the local cluster, and the overall Protection Group state become Error.

To recover from an error state, complete the relevant steps in the following procedure.

Procedure: How to Recover From a Data Replication Error

  1. Use the procedures in the Sun StorageTek Availability Suite documentation to determine the causes of the FAULTED state. This state is indicated as VF.

  2. Recover from the faulted state by using the Sun StorageTek Availability Suite procedures.

    If the recovery procedures change the state of the device group, this state is automatically detected by the resource and is reported as a new protection group state.

  3. Revalidate the protection group configuration.


    phys-paris-1# geopg validate protectiongroupname 
    
    protectiongroupname

    Specifies the name of the Sun StorageTek Availability Suite protection group

  4. Review the status of the protection group configuration.


    phys-paris-1# geopg list protectiongroupname 
    
    protectiongroupname

    Specifies the name of the Sun StorageTek Availability Suite protection group
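
Steps 3 and 4 are useful only after the fault itself has been cleared in steps 1 and 2. The following sketch expresses that ordering. It is an illustration only. The function name recovery_followup is hypothetical, the Resource Status value is supplied by the operator, and the function prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: given the Resource Status value of a replication
# resource and a protection group name, print the commands for steps 3
# and 4. This sketch does not query the cluster.
recovery_followup() {
    status=${1:-}
    pg=${2:-}
    if [ -z "$status" ] || [ -z "$pg" ]; then
        echo "usage: recovery_followup resourcestatus protectiongroupname" >&2
        return 2
    fi
    if [ "$status" = "FAULTED" ]; then
        echo "resource is still FAULTED; complete steps 1 and 2 first" >&2
        return 1
    fi
    printf 'geopg validate %s\n' "$pg"
    printf 'geopg list %s\n' "$pg"
}

# Prints the two follow-up commands for avspg.
recovery_followup OK avspg
```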