Sun Cluster Geographic Edition Data Replication Guide for Oracle Data Guard

Chapter 3 Migrating Services That Use Oracle Data Guard Data Replication

This chapter provides information about migrating services for maintenance or as a result of cluster failure.

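This chapter covers the following topics:

  • Detecting Cluster Failure on a System That Uses Oracle Data Guard Data Replication
  • Migrating Services That Use Oracle Data Guard With a Switchover
  • Forcing a Takeover on Systems That Use Oracle Data Guard
  • Recovering Oracle Data Guard Data After a Takeover
  • Recovering From an Oracle Data Guard Data Replication Error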

Detecting Cluster Failure on a System That Uses Oracle Data Guard Data Replication

This section describes the internal processes that occur when failure is detected on a primary or a standby cluster.

Detecting Primary Cluster Failure

When the primary cluster for a given protection group fails, the standby cluster in the partnership detects the failure. If the cluster that fails is a member of more than one partnership, multiple failure detections might occur.

The following actions occur when the overall state of a protection group changes to the Unknown state:

Detecting Failure of the Standby Cluster

When a standby cluster for a given protection group fails, a cluster in the same partnership detects the failure. If the cluster that failed is a member of more than one partnership, multiple failure detections might occur.

During failure detection, the following actions occur:

Migrating Services That Use Oracle Data Guard With a Switchover

You perform a switchover of an Oracle Data Guard protection group when you want to migrate services to the partner cluster in an orderly fashion. A switchover includes the following operations:

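This section provides the following information:

  • How to Switch Over an Oracle Data Guard Protection Group From the Primary to the Standby Cluster
  • Actions Performed by the Sun Cluster Geographic Edition Software During a Switchover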

How to Switch Over an Oracle Data Guard Protection Group From the Primary to the Standby Cluster

Before You Begin

For a switchover to occur, data replication must be active between the primary cluster and the standby cluster; that is, the Oracle Data Guard Broker configuration must be enabled. Additionally, the Oracle Data Guard Broker show configuration command must report a SUCCESS state. This state is reflected in the state of the Sun Cluster Geographic Edition replication resource for this Oracle Data Guard Broker configuration, which should show the online state.
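As a rough illustration of this precondition, the following shell sketch checks text captured from the show configuration command for a SUCCESS status line. The function name odg_config_healthy is hypothetical, and the assumption that the status appears at the start of its own line might not hold for every Oracle release. The sketch also assumes a POSIX-style shell such as ksh or bash.

```shell
# Hypothetical helper: reads the output of the DGMGRL "show configuration"
# command on standard input.
# Returns 0 if some line starts with SUCCESS (assumed layout), 1 otherwise.
odg_config_healthy() {
    grep -E '^[[:space:]]*SUCCESS' >/dev/null
}
```

One possible use is to pipe the command output into the function, for example dgmgrl sys/sysdba_password@sales-svc "show configuration" | odg_config_healthy, and to start the switchover only if the function returns 0.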

Before you switch over a protection group from the primary cluster to the standby cluster, ensure that the following conditions are met:

  1. Log in to a cluster node.

    To complete this step, you need to be assigned the Geo Management RBAC rights profile. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC in Sun Cluster Geographic Edition System Administration Guide.

  2. Initiate the switchover.

    The application resource groups that are a part of the protection group are stopped and started during the switchover.


    phys-node-n# geopg switchover [-f] -m newprimarycluster protectiongroupname
    
    -f

    Forces the command to perform the operation without asking you for confirmation.

    -m newprimarycluster

    Specifies the name of the cluster that is to be the primary cluster for the protection group.

    protectiongroupname

    Specifies the name of the protection group.


Example 3–1 Forcing a Switchover From the Primary to the Standby Cluster

This example shows how to perform a switchover to the standby cluster.


phys-paris-1# geopg switchover -f -m cluster-newyork sales-pg

Actions Performed by the Sun Cluster Geographic Edition Software During a Switchover

When you run the geopg switchover command, the software confirms that the primary cluster holds the primary database. The command checks that the remote database is in an enabled state in the Oracle Data Guard Broker configuration. The command also confirms that the configuration is healthy by issuing the show configuration command through the Oracle Data Guard command-line interface (dgmgrl) and checking that the command returns a SUCCESS state. If the output indicates that Oracle Data Guard Broker is busy performing its own health check, the show configuration command is retried until it returns a SUCCESS response or until two minutes have passed. If no SUCCESS response is received, the geopg switchover command fails. If the configuration is healthy, the software performs the following actions on the original primary cluster:

On the original standby cluster, the command takes the following actions:

If the command completes successfully, the standby cluster, cluster-newyork, becomes the new primary cluster for the protection group. The original primary cluster, cluster-paris, becomes the new standby cluster. Databases that are associated with the Oracle Data Guard Broker configurations of the protection group have their role reversed according to the role of the protection group on the local cluster. The Oracle shadow RAC server proxy resource group and any other application resource groups are online on the new primary cluster. Data replication from the new primary cluster to the new standby cluster begins.

This command returns an error if any of the previous operations fails. Run the geoadm status command to view the status of each component. For example, the Configuration status of the protection group might be set to Error, depending on the cause of the failure. The protection group might be activated or deactivated.

If the Configuration status of the protection group is set to Error, revalidate the protection group by using the procedures that are described in How to Validate an Oracle Data Guard Protection Group.

If the configuration of the protection group is not the same on each partner cluster, you need to resynchronize the configuration by using the procedures that are described in How to Resynchronize an Oracle Data Guard Protection Group.
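The health check that precedes a switchover retries while Oracle Data Guard Broker is busy and gives up after about two minutes. The following shell sketch mimics that behavior for manual checks; it is not the product's implementation. The function name retry_show_configuration, the attempt count, and the interval are assumptions, and ORA-16610 is used as the busy indicator.

```shell
# Hypothetical helper, not part of the product.
# Usage: retry_show_configuration ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until its output contains SUCCESS, retrying only while the
# output reports the busy code ORA-16610.
retry_show_configuration() {
    attempts=$1
    interval=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        output=$("$@" 2>&1)
        case $output in
            *SUCCESS*)   return 0 ;;   # configuration is healthy
            *ORA-16610*) ;;            # Broker is busy: retry below
            *)           return 1 ;;   # any other result is a hard failure
        esac
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1                           # still busy after all attempts
}
```

For example, retry_show_configuration 12 10 dgmgrl sys/sysdba_password@sales-svc "show configuration" makes up to 12 attempts, 10 seconds apart, which approximates the two-minute limit.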

Forcing a Takeover on Systems That Use Oracle Data Guard

You perform a takeover when applications need to be brought online on the standby cluster, regardless of whether the data is completely consistent between the primary database and the standby database. In this section, it is assumed that the protection group has been started.

The following operations occur after you initiate a takeover:

For details about the possible conditions of the primary and standby clusters before and after a takeover, see Appendix C, Takeover Postconditions, in Sun Cluster Geographic Edition System Administration Guide.

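This section provides the following information:

  • How to Force Immediate Takeover of Oracle Data Guard Services by a Standby Cluster
  • Actions Performed by the Sun Cluster Geographic Edition Software During a Takeover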

How to Force Immediate Takeover of Oracle Data Guard Services by a Standby Cluster

Before You Begin

Before you force the standby cluster to assume the activity of the primary cluster, ensure that the following conditions are met:

  1. Log in to a node in the standby cluster.

    To complete this step, you need to be assigned the Geo Management RBAC rights profile. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC in Sun Cluster Geographic Edition System Administration Guide.

  2. Initiate the takeover.


    phys-node-n# geopg takeover [-f] protectiongroupname
    
    -f

    Forces the command to perform the operation without your confirmation.

    protectiongroupname

    Specifies the name of the protection group.


Example 3–2 Forcing a Takeover by a Standby Cluster

This example shows how to force the takeover of sales-pg by the standby cluster cluster-newyork.

The node phys-newyork-1 is the first node of the standby cluster. For a reminder of which node is phys-newyork-1, see Example Sun Cluster Geographic Edition Cluster Configuration in Sun Cluster Geographic Edition System Administration Guide.


phys-newyork-1# geopg takeover -f sales-pg

Next Steps

For information about the state of the primary and the standby clusters after a takeover, see Appendix C, Takeover Postconditions, in Sun Cluster Geographic Edition System Administration Guide.

Actions Performed by the Sun Cluster Geographic Edition Software During a Takeover

When you run the geopg takeover command, the software confirms that databases in the Oracle Data Guard Broker configuration on the standby cluster, that is, the future primary, are enabled, because you cannot perform a takeover to a disabled database. The software also confirms that the Oracle Data Guard command-line interface show configuration command either shows a SUCCESS state or reports that the Broker is busy performing a health check (ORA-16610). If the show configuration command returns any other Oracle error code, the takeover fails.
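The acceptance rule in the preceding paragraph can be sketched as a small shell function for manual checks. This is an illustration under stated assumptions, not the product's code: the function name takeover_status_acceptable is hypothetical, and the sketch assumes that Oracle errors appear in the output as ORA- followed by digits.

```shell
# Hypothetical helper: reads "show configuration" output on standard input.
# Returns 0 if the output shows SUCCESS or the busy code ORA-16610,
# and 1 if any other ORA- error code is present or neither marker is found.
takeover_status_acceptable() {
    output=$(cat)
    # Remove the tolerated busy code, then look for any remaining ORA- error.
    if printf '%s\n' "$output" | sed 's/ORA-16610//g' | grep 'ORA-[0-9]' >/dev/null; then
        return 1
    fi
    printf '%s\n' "$output" | grep -E 'SUCCESS|ORA-16610' >/dev/null
}
```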

If the original primary cluster, cluster-paris, can be reached, the software takes the application resource groups offline and places them in an Unmanaged state.

On the original standby cluster, cluster-newyork, the software performs the following operations:

If the command completes successfully, the standby cluster, cluster-newyork, becomes the new primary cluster for the protection group. Databases that are associated with the Oracle Data Guard Broker configurations of the protection group have their role reversed according to the role of the protection group on the local cluster. The Oracle shadow RAC server proxy resource group and any other application resource group are online on the new primary cluster. If the original primary cluster can be reached, it becomes the new standby cluster of the protection group. Replication of all databases that are associated with the Oracle Data Guard Broker configurations of the protection group is stopped.


Caution –

After a successful takeover, data replication is stopped. If you want to continue to suspend replication, specify the -n option when you use the geopg start command. This option prevents the start of data replication from the new primary cluster to the new standby cluster.


If a previous operation fails, this command returns an error. Use the geoadm status command to view the status of each component. For example, the Configuration status of the protection group might be set to an Error state, depending on the cause of the failure. The protection group might be activated or deactivated.

If the Configuration status of the protection group is set to the Error state, revalidate the protection group by using the procedures that are described in How to Validate an Oracle Data Guard Protection Group.

If the configuration of the protection group is not the same on each partner cluster, you need to resynchronize the configuration by using the procedures described in How to Resynchronize an Oracle Data Guard Protection Group.

Recovering Oracle Data Guard Data After a Takeover

After a successful takeover operation, the standby cluster, cluster-newyork, becomes the primary for the protection group, and the services are online on the standby cluster. After the recovery of the original primary cluster, the services can be brought online again on the original primary cluster by using a process called failback.

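Sun Cluster Geographic Edition software supports the following two kinds of failback:

  • Failback switchover. See How to Perform a Failback Switchover on a System That Uses Oracle Data Guard Replication.
  • Failback takeover. See How to Perform a Failback Takeover on a System That Uses Oracle Data Guard Replication.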

If you want to leave the new primary, cluster-newyork, as the primary cluster and the original primary cluster, cluster-paris, as the standby cluster after the original primary cluster starts again, you can resynchronize and revalidate the protection group configuration. You can resynchronize and revalidate the protection group without performing a switchover or takeover.

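This section describes how to perform the following procedures:

  • How to Resynchronize and Revalidate the Protection Group Configuration
  • How to Perform a Failback Switchover on a System That Uses Oracle Data Guard Replication
  • How to Perform a Failback Takeover on a System That Uses Oracle Data Guard Replication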

How to Resynchronize and Revalidate the Protection Group Configuration

Follow this procedure to resynchronize and revalidate data on the original primary cluster, cluster-paris, with the data on the current primary cluster, cluster-newyork.

Before You Begin

Before you resynchronize and revalidate the protection group configuration, a takeover has occurred on cluster-newyork. The clusters now have the following roles:

  1. If the original primary cluster, cluster-paris, has been down, confirm that the cluster is booted and that the Sun Cluster Geographic Edition infrastructure is enabled on the cluster.

    For more information about booting a cluster, see Booting a Cluster in Sun Cluster Geographic Edition System Administration Guide.

  2. Resynchronize the original primary cluster, cluster-paris, with the current primary cluster, cluster-newyork.

    The cluster cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.

    1. On cluster-paris, deactivate the protection group on the local cluster.


      phys-paris-1# geopg stop -e local protectiongroupname
      
      -e local

      Specifies the scope of the command.

      By specifying a local scope, the command operates on the local cluster only.


      Note –

      The property values, such as global and local, are not case sensitive.


      protectiongroupname

      Specifies the name of the protection group.

      If the protection group is already deactivated, the state of the resource group in the protection group is probably Error because the application resource groups are managed and offline.

      If you deactivate the protection group, the application resource groups are no longer managed, clearing the Error state.

    2. On cluster-paris, resynchronize the partnership.


      phys-paris-1# geops update partnershipname
      

      Note –

      You need to perform this step only once, even if you are resynchronizing multiple protection groups.


      For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.

    3. On cluster-paris, resynchronize each protection group.

      Because the role of the protection group on cluster-newyork is primary, this step ensures that the role of the protection group on cluster-paris is secondary.


      phys-paris-1# geopg update protectiongroupname
      

      For more information about synchronizing protection groups, see Resynchronizing an Oracle Data Guard Protection Group.

  3. On cluster-paris, validate the configuration for each protection group.


    phys-paris-1# geopg validate protectiongroupname
    

    For more information, see How to Validate an Oracle Data Guard Protection Group.

  4. On cluster-paris, activate each protection group.

    When you activate a protection group, the protection group's application resource groups are also brought online.


    phys-paris-1# geopg start -e global protectiongroupname
    
    -e global

    Specifies the scope of the command.

    By specifying a global scope, the command operates on both clusters where the protection group is located.


    Note –

    The property values, such as global and local, are not case sensitive.


    protectiongroupname

    Specifies the name of the protection group.


    Caution –

    Do not use the -n option because the data needs to be synchronized from the current primary cluster, cluster-newyork, to the current standby cluster, cluster-paris.

    Because the protection group has a role of secondary, the data is synchronized from the current primary cluster, cluster-newyork, to the current standby cluster, cluster-paris.

    For more information about the geopg start command, see How to Activate an Oracle Data Guard Protection Group.


  5. Confirm that all data is synchronized.

    1. Confirm that the state of the protection group on cluster-newyork is OK.


      phys-newyork-1# geoadm status
      

      Refer to the Protection Group section of the output.

    2. Confirm that all resources in the replication resource group, ODGprotectiongroupname-odg-rep-rg, report a status of OK.


      phys-newyork-1# clresource status -v ODGprotectiongroupname-odg-rep-rs
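The command sequence that this procedure runs on cluster-paris can be collected in a short shell sketch. The function names run and resync_and_revalidate and the DRY_RUN variable are hypothetical conveniences, not part of the product. The sketch stops at the first command that fails and does not replace the status checks on cluster-newyork.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: resync_and_revalidate PARTNERSHIP_NAME PROTECTION_GROUP_NAME
# Intended for a node of the original primary cluster (cluster-paris here).
resync_and_revalidate() {
    partnership=$1
    pg=$2
    run geopg stop -e local "$pg" &&       # deactivate locally
    run geops update "$partnership" &&     # resynchronize the partnership
    run geopg update "$pg" &&              # resynchronize the protection group
    run geopg validate "$pg" &&            # validate the configuration
    run geopg start -e global "$pg"        # activate on both clusters
}
```

Setting DRY_RUN=1 before calling the function prints the five commands without running them, which is a reasonable way to review the sequence first. With several protection groups, the geops update step needs to run only once, so the sketch would need adjusting.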
      

How to Perform a Failback Switchover on a System That Uses Oracle Data Guard Replication

Follow this procedure to restart an application on the original primary cluster, cluster-paris, after the data on the cluster has been resynchronized with the data on the current primary cluster, cluster-newyork.

The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once for each partnership.

Before You Begin

Before you perform a failback switchover, a takeover has occurred on cluster-newyork. The clusters now have the following roles:

  1. If the original primary cluster, cluster-paris, failed, confirm that the cluster is restarted and that the Sun Cluster Geographic Edition infrastructure is enabled on the cluster.

    For more information about restarting a cluster, see Booting a Cluster in Sun Cluster Geographic Edition System Administration Guide.

  2. Recover and restore the failed Oracle Data Guard primary database as the new standby.

    Refer to the Oracle documentation, which describes how to perform this step.

  3. Ensure that the original primary cluster, cluster-paris, is working correctly as part of the Oracle Data Guard configuration.


    oracle (phys-paris-1)$ dgmgrl sys/sysdba_password@sales-svc
    DGMGRL> show configuration;
    

    If the original primary cluster, cluster-paris, is working correctly, the show configuration command displays the SUCCESS state.

    If the original primary cluster was down at the point of failure, it is marked as a deactivated primary cluster. If the original primary cluster was up at the point of failure, it is marked as a deactivated secondary.

  4. Resynchronize the original primary cluster, cluster-paris, with the current primary cluster, cluster-newyork.

    The cluster cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.

    1. On cluster-paris, resynchronize the partnership.


      phys-paris-1# geops update partnershipname
      

      Note –

      You need to perform this step only once for each partnership, even if you are performing a failback switchover for multiple protection groups in the partnership.


      For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.

    2. Determine whether the protection group on the original primary cluster, cluster-paris, is active.


      phys-paris-1# geoadm status
      
    3. If the protection group on the original primary cluster is active, stop the protection group.


      phys-paris-1# geopg stop -e local protectiongroupname
      
      -e local

      Specifies the scope of the command.

      By specifying a local scope, the command operates on the local cluster only.


      Note –

      The property values, such as global and local, are not case sensitive.


      protectiongroupname

      Specifies the name of the protection group.

      If the protection group is already deactivated, the state of the resource group in the protection group is probably Error because the application resource groups are managed and offline.

      If you deactivate the protection group, the application resource groups are no longer managed, clearing the Error state.

    4. Verify that the protection group is stopped.


      phys-paris-1# geoadm status
      
    5. On cluster-paris, resynchronize each protection group.

      Because the local role of the protection group on the cluster-newyork cluster is now primary, this step ensures that the role of the protection group on the cluster-paris cluster becomes secondary.


      phys-paris-1# geopg update protectiongroupname
      

      For more information about synchronizing protection groups, see Resynchronizing an Oracle Data Guard Protection Group.

  5. On cluster-paris, validate the configuration for each protection group.

    A protection group cannot be started when it is in an Error state. Ensure that the protection group is not in an Error state.


    phys-paris-1# geopg validate protectiongroupname
    

    For more information, see How to Validate an Oracle Data Guard Protection Group.

  6. On cluster-paris, activate each protection group.

    When you activate a protection group, its application resource groups are also brought online.


    phys-paris-1# geopg start -e global protectiongroupname
    
    -e global

    Specifies the scope of the command.

    By specifying a global scope, the command operates on both clusters where the protection group is located.


    Note –

    The property values, such as global and local, are not case sensitive.


    protectiongroupname

    Specifies the name of the protection group.

  7. Confirm that the data is completely synchronized.

    1. Confirm that the state of the protection group on cluster-newyork is OK.


      phys-newyork-1# geoadm status
      

      Refer to the Protection Group section of the output.

    2. Confirm that all resources in the replication resource group, ODGprotectiongroupname-odg-rep-rg, report a status of OK.


      phys-newyork-1# clresource status -v ODGprotectiongroupname-odg-rep-rs
      
  8. On both partner clusters, ensure that the protection group is activated.


    phys-paris-1# geoadm status
    …
    phys-newyork-1# geoadm status
  9. For each protection group, on either cluster, perform a switchover from cluster-newyork to cluster-paris.


    phys-node-n# geopg switchover [-f] -m cluster-paris protectiongroupname
    

    For more information, see How to Switch Over an Oracle Data Guard Protection Group From the Primary to the Standby Cluster.

    The cluster-paris cluster resumes its original role as primary cluster for the protection group.

  10. Ensure that the switchover was performed successfully.


    phys-node-n# geoadm status
    

    Verify that the protection group is now primary on cluster-paris and secondary on cluster-newyork and that the states that are shown for the Data replication and the Resource groups properties are OK on both clusters.

  11. Check the runtime status of the application resource group and data replication for each Oracle Data Guard protection group.


    phys-node-n# clresourcegroup status -v resourcegroupname
    # clresource status -v ODGConfigurationName-odg-rep-rs
    

    Refer to the Status and Status Message fields that are presented for the Oracle Data Guard Broker configuration that you want to check. For more information about these fields, see Table 2–1.

    For more information about the runtime status of data replication, see Checking the Runtime Status of Oracle Data Guard Data Replication.
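When several protection groups must be switched back to cluster-paris, the switchover step can be repeated in a loop. The following shell sketch shows one way to do that. The names run and switchover_all and the DRY_RUN variable are hypothetical. The sketch passes -f, so it skips the confirmation prompt; remove that option if you want to confirm each switchover.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: switchover_all NEW_PRIMARY_CLUSTER PROTECTION_GROUP_NAME...
# Switches each protection group in turn and stops at the first failure.
switchover_all() {
    new_primary=$1
    shift
    for pg in "$@"; do
        run geopg switchover -f -m "$new_primary" "$pg" || {
            echo "switchover failed for $pg; check geoadm status" >&2
            return 1
        }
    done
}
```

For example, switchover_all cluster-paris sales-pg switches the single protection group that is used in this chapter's examples.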

How to Perform a Failback Takeover on a System That Uses Oracle Data Guard Replication

Follow this procedure to restart an application on the original primary cluster, cluster-paris, and to use the current data on the original primary cluster.


Note –

Any updates that occurred on the standby cluster, cluster-newyork, while it was acting as primary are discarded.


The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once for each partnership.


Note –

You can resume using the data on the original primary cluster, cluster-paris, only under one condition: you must not have replicated data from the new primary, cluster-newyork, to the original primary cluster, cluster-paris, at any point after the takeover operation on cluster-newyork.


Before You Begin

Before you begin the failback takeover procedure, the clusters must have the following roles:

  1. If the original primary cluster, cluster-paris, failed, confirm that the cluster is restarted and that the Sun Cluster Geographic Edition infrastructure is enabled on the cluster.

    For more information about restarting a cluster, see Booting a Cluster in Sun Cluster Geographic Edition System Administration Guide.

  2. Recover the new Oracle Data Guard primary database and revert it to a standby of the original primary, at a point in time before the original primary failed.

    Refer to the Oracle documentation, which describes how to perform this step.


    Note –

    You might need to use the dgmgrl command to remove and recreate the Oracle Data Guard Broker configuration.


  3. Ensure that the original primary cluster, cluster-paris, is again working correctly as the primary as part of the Oracle Data Guard configuration.


    oracle (phys-paris-1)$ dgmgrl sys/sysdba_password@sales-svc
    DGMGRL> show configuration;
    

    If the original primary cluster, cluster-paris, is working correctly, the show configuration command displays the SUCCESS state.

    If the original primary cluster was up at the point of failure, it is marked as a deactivated secondary cluster. Also, the original standby cluster is marked as an activated primary.

    If the original primary cluster was down at the point of failure, it is marked as a deactivated primary cluster. Also, the original standby cluster is marked as an activated primary.

  4. Was the original primary cluster, cluster-paris, up or down at the point of failure?

    • If the original primary cluster, cluster-paris, was down at the point of failure, update the original standby cluster, cluster-newyork, to a secondary.

      1. On the original standby cluster, that is, the cluster that has become the new primary cluster, stop the protection group.


        phys-newyork-1# geopg stop -e local protectiongroupname
        
      2. On the original standby cluster, that is, the cluster that has become the new primary cluster, update the protection group.


        phys-newyork-1# geopg update protectiongroupname
        

        The roles are now correct, but both clusters are marked as deactivated.

        For more information about synchronizing protection groups, see How to Resynchronize an Oracle Data Guard Protection Group.

      3. On cluster-paris and on cluster-newyork, locally validate the configuration for each protection group.

        Ensure that the protection group is not in an Error state. You cannot start a protection group when the protection group is in an Error state.


        phys-paris-1# geopg validate protectiongroupname
        phys-newyork-1# geopg validate protectiongroupname
        

        For more information, see How to Validate an Oracle Data Guard Protection Group.

      4. From any node in either cluster, globally activate the protection group on both clusters.


        # geopg start -e global protectiongroupname
        

        Once the protection groups are activated on both clusters, you have successfully performed the failback takeover.

    • If the original primary cluster, cluster-paris, was up at the point of failure, determine the status of the secondary (that is, the original primary) configuration.


      phys-newyork-1# geoadm status
      
      • If the Configuration status is set to OK, synchronize the configurations.

        1. Initiate a takeover for each protection group on the original primary cluster, cluster-paris.


          phys-paris-1# geopg takeover [-f] protectiongroupname
          
        2. If the configuration for the original standby cluster, cluster-newyork, is marked as Error, validate the configuration for each protection group.


          phys-newyork-1# geopg validate protectiongroupname
          

          For more information, see How to Validate an Oracle Data Guard Protection Group.

        3. Globally activate the protection groups on both clusters.


          phys-newyork-1# geopg start -e global protectiongroupname
          

          Once the protection groups are activated on both clusters, you have successfully performed the failback takeover.

      • If the Configuration status is set to Error, resolve the problem.

        1. Deactivate the secondary (that is, the original primary) configuration that is in the Error state.


          phys-newyork-1# geopg stop -e local protectiongroupname
          
        2. Force a takeover to make the secondary configuration a primary configuration again and to match the underlying Oracle dgmgrl configuration.


          phys-newyork-1# geopg takeover -f protectiongroupname
          
        3. On both the cluster-paris and cluster-newyork clusters, locally validate the configuration for each protection group.


          phys-paris-1# geopg validate protectiongroupname
          phys-newyork-1# geopg validate protectiongroupname
          

          For more information, see How to Validate an Oracle Data Guard Protection Group.

        4. From any node in either cluster, globally activate the protection group on both clusters.


          # geopg start -e global protectiongroupname
          

          Once the protection groups are activated on both clusters, you have successfully performed the failback takeover.
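The three branches of step 4 can be summarized with a shell sketch that only prints the commands for a given case and runs nothing. The function name failback_takeover_plan and the case labels down, up-ok, and up-error are hypothetical. The node names are copied from the prompts in the steps above, with any-node standing for a node in either cluster.

```shell
# Hypothetical helper: prints, but does not run, the commands for one case.
# Usage: failback_takeover_plan STATE PROTECTION_GROUP_NAME
#   down      original primary was down at the point of failure
#   up-ok     original primary was up; Configuration status is OK
#   up-error  original primary was up; Configuration status is Error
failback_takeover_plan() {
    state=$1
    pg=$2
    case $state in
        down)
            echo "phys-newyork-1# geopg stop -e local $pg"
            echo "phys-newyork-1# geopg update $pg"
            echo "phys-paris-1# geopg validate $pg"
            echo "phys-newyork-1# geopg validate $pg"
            echo "any-node# geopg start -e global $pg"
            ;;
        up-ok)
            echo "phys-paris-1# geopg takeover $pg"
            echo "phys-newyork-1# geopg validate $pg   (only if marked Error)"
            echo "phys-newyork-1# geopg start -e global $pg"
            ;;
        up-error)
            echo "phys-newyork-1# geopg stop -e local $pg"
            echo "phys-newyork-1# geopg takeover -f $pg"
            echo "phys-paris-1# geopg validate $pg"
            echo "phys-newyork-1# geopg validate $pg"
            echo "any-node# geopg start -e global $pg"
            ;;
        *)
            echo "usage: failback_takeover_plan down|up-ok|up-error PG" >&2
            return 2
            ;;
    esac
}
```

Printing the plan first, for example with failback_takeover_plan down sales-pg, gives a checklist to compare against the geoadm status output before any command is run.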

Recovering From an Oracle Data Guard Data Replication Error

When an error occurs at the data replication level, the error is reflected in the status of the resource in the replication resource group of the relevant Oracle Data Guard Broker configuration.

For example, suppose that Oracle Data Guard Broker configuration sales-pg, which contains the replicated database sales, is changed from protection mode MaxAvailability to MaxPerformance. The resulting FAULTED state is reflected in the following resource status:


Resource Status = "FAULTED"
Resource status message = "FAULTED - Protection mode "MaxAvailability" given 
for local database sales does not match configured value "MaxPerformance""

Note –

The Resource State remains Online because the probe is still running correctly.


Because the resource status has changed, the protection group status also changes. In this case, the local Data Replication state, the Protection Group state on the local cluster, and the overall Protection Group state all become Error.
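When you investigate a mismatch of this kind, it can help to extract the two protection modes from the status message. The following shell sketch does that for a message in the format shown above. The function name parse_mode_mismatch is hypothetical, and the sketch assumes that other messages follow the same wording, which the example alone does not confirm.

```shell
# Hypothetical helper: reads a resource status message on standard input and
# prints "GIVEN_MODE CONFIGURED_MODE" if the message reports a protection
# mode mismatch. Prints nothing for any other message.
parse_mode_mismatch() {
    # Join wrapped lines first, then capture the two quoted mode names.
    tr '\n' ' ' |
    sed -n 's/.*Protection mode "\([^"]*\)" given.*configured value "\([^"]*\)".*/\1 \2/p'
}
```

For the message in this example, the function prints MaxAvailability MaxPerformance.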

To recover from an error state, perform the following procedure.

How to Recover From a Data Replication Error

  1. Use the procedures in the Oracle Data Guard documentation to determine the causes of the FAULTED state.

  2. Recover from the faulted state by following the Oracle Data Guard procedures.

    If the recovery procedures change the state of the Oracle Data Guard Broker configuration, this state is automatically detected by the resource and is reported as a new protection group state. If the replication mode does not match the Sun Cluster Geographic Edition settings, type:


    phys-paris-1# geopg modify-replication-component -p replication_mode=New-protection-mode \
    ODGConfigurationName protectiongroupname
    
  3. Revalidate the protection group configuration.


    phys-paris-1# geopg validate protectiongroupname
    

    where protectiongroupname specifies the name of the Oracle Data Guard protection group.

  4. Review the status of the protection group configuration.


    phys-paris-1# geopg list protectiongroupname
    

    where protectiongroupname specifies the name of the Oracle Data Guard protection group.