This chapter provides information about migrating services for maintenance or as a result of cluster failure. This chapter contains the following sections:
Detecting Cluster Failure on a System That Uses EMC Symmetrix Remote Data Facility Data Replication
Migrating Services That Use EMC Symmetrix Remote Data Facility Data Replication With a Switchover
Forcing a Takeover on a System That Uses EMC Symmetrix Remote Data Facility Data Replication
Recovering From an EMC Symmetrix Remote Data Facility Data Replication Error
This section describes the internal processes that occur when failure is detected on a primary or a secondary cluster.
When the primary cluster for a protection group fails, the secondary cluster in the partnership detects the failure. The cluster that fails might be a member of more than one partnership, resulting in multiple failure detections.
The following actions take place when a primary cluster failure occurs. During a failure, the appropriate protection groups are in the Unknown state on the cluster that failed.
Heartbeat failure is detected by a partner cluster.
The heartbeat is activated in emergency mode to verify that the heartbeat loss is not transient and that the primary cluster has failed. The heartbeat remains in the Online state during this default time-out interval, while the heartbeat mechanism continues to retry the primary cluster.
This query interval is set by using the Query_interval heartbeat property. If the heartbeat still fails after the interval you configured, a heartbeat-lost event is generated and logged in the system log. When you use the default interval, the emergency-mode retry behavior might delay heartbeat-loss notification for about nine minutes. Messages are displayed in the graphical user interface (GUI) and in the output of the geoadm status command.
For more information about logging, see Viewing the Sun Cluster Geographic Edition Log Messages in Sun Cluster Geographic Edition System Administration Guide.
If the partnership is configured for heartbeat-loss notification, then one or both of the following actions occurs:
An email is sent to the address that is configured by the Notification_emailaddrs property.
The script defined in Notification_actioncmd is executed.
For more information about configuring heartbeat-loss notification, see Configuring Heartbeat-Loss Notification in Sun Cluster Geographic Edition System Administration Guide.
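The notification behavior described above can be pictured with a small Python model. This is only a sketch: the class and function names are hypothetical, and the two fields simply mirror the Notification_emailaddrs and Notification_actioncmd properties named in this section. It sends no email and runs no script.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NotificationConfig:
    """Hypothetical stand-in for a partnership's notification properties."""
    emailaddrs: List[str] = field(default_factory=list)  # Notification_emailaddrs
    actioncmd: Optional[str] = None                      # Notification_actioncmd


def heartbeat_lost_actions(cfg: NotificationConfig) -> List[str]:
    """List the actions triggered once a heartbeat-lost event is generated."""
    actions = []
    if cfg.emailaddrs:
        actions.append("email: " + ", ".join(cfg.emailaddrs))
    if cfg.actioncmd:
        actions.append("run: " + cfg.actioncmd)
    return actions


# Example values are made up for illustration.
cfg = NotificationConfig(emailaddrs=["admin@example.com"],
                         actioncmd="/opt/hb/notify.sh")
print(heartbeat_lost_actions(cfg))
```

If neither property is set, the function returns an empty list, which corresponds to a partnership that is not configured for heartbeat-loss notification.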
When a secondary cluster for a protection group fails, a cluster in the same partnership detects the failure. The cluster that failed might be a member of more than one partnership, resulting in multiple failure detections.
During failure detection, the following actions take place:
Heartbeat failure is detected by a partner cluster.
The heartbeat is activated in emergency mode to verify that the secondary cluster is dead.
When a failure is confirmed by the Sun Cluster Geographic Edition product, the cluster notifies the administrator. The system detects all protection groups for which the cluster that failed was acting as secondary. The state of the appropriate protection groups is marked Unknown.
Perform a switchover of an EMC Symmetrix Remote Data Facility protection group when you want to migrate services to the partner cluster in an orderly fashion. Basic Sun Cluster Geographic Edition operations, such as geopg switchover, perform a symrdf swap operation. The symrdf swap operation requires significantly more time for static RDF than for dynamic RDF. Therefore, you might need to increase the value of the timeout property of the protection group when using static RDF.
A switchover consists of the following:
Application services are brought offline on the former primary cluster, cluster-paris.
For a reminder of which cluster is cluster-paris, see Example Sun Cluster Geographic Edition Cluster Configuration in Sun Cluster Geographic Edition System Administration Guide.
The data replication role is reversed and now continues to run from the new primary, cluster-newyork, to the former primary, cluster-paris.
Application services are brought online on the new primary cluster, cluster-newyork.
You cannot perform personality swaps if you are running EMC Symmetrix Remote Data Facility/Asynchronous data replication.
When a switchover is initiated by using the geopg switchover command, the data replication subsystem runs several validations on both clusters. The switchover is performed only if the validation step succeeds on both clusters.
First, the replication subsystem checks that the EMC Symmetrix Remote Data Facility device group is in a valid aggregate RDF pair state. Then, it checks that the local device group type on the target primary cluster, cluster-newyork, is RDF2. The symrdf -g device-group-name -query command returns the local device group's state. These values correspond to an RDF1 or RDF2 state. The following table describes the EMC Symmetrix Remote Data Facility command that is run on the new primary cluster, cluster-newyork.
Table 3–1 EMC Symmetrix Remote Data Facility Switchover Validations on the New Primary Cluster
RDF Pair State | EMC Symmetrix Remote Data Facility Switchover Command That Is Run on cluster-newyork |
---|---|
Synchronized | Suspends the RDF link. |
R1Updated, Failedover, Suspended | The symrdf swap command switches the role. |
Other RDF pair states | No command is run. |
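The validation and the table above can be read as a small decision function. The following Python sketch is illustrative only: the function name is hypothetical, the state strings are copied from Table 3–1, and raising an exception when the local device group is not RDF2 is an assumption about how to model a failed validation, not a description of the product's internals.

```python
# RDF pair states for which Table 3-1 lists the symrdf swap command.
SWAP_STATES = {"R1Updated", "Failedover", "Suspended"}


def switchover_action(pair_state: str, local_dg_type: str) -> str:
    """Return the action listed in Table 3-1 for the new primary cluster."""
    if local_dg_type != "RDF2":
        # Assumption: a failed validation is modeled as an exception.
        raise ValueError("local device group on the target primary must be RDF2")
    if pair_state == "Synchronized":
        return "suspend RDF link"
    if pair_state in SWAP_STATES:
        return "symrdf swap"
    return "no command"


print(switchover_action("Synchronized", "RDF2"))
print(switchover_action("Suspended", "RDF2"))
```

Any pair state that the table does not name falls through to "no command", matching the last row of the table.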
After a successful switchover, at the data replication level the roles of the primary and secondary volumes have been switched. The pre-switchover RDF1 volumes become the RDF2 volumes. The pre-switchover RDF2 volumes become the RDF1 volumes. Data replication continues from the new RDF1 volumes to the new RDF2 volumes.
The Local-role property of the protection group is also switched regardless of whether the application could be brought online on the new primary cluster as part of the switchover operation. On the cluster on which the protection group had a Local role of Secondary, the Local-role property of the protection group becomes Primary. On the cluster on which the protection group had a Local-role of Primary, the Local-role property of the protection group becomes Secondary.
For a successful switchover, data replication must be active between the primary and the secondary clusters and data volumes on the two clusters must be synchronized.
Before you switch over a protection group from the primary cluster to the secondary cluster, ensure that the following conditions are met:
The Sun Cluster Geographic Edition software is up and running on both clusters.
The secondary cluster is a member of a partnership.
Both cluster partners can be reached.
The protection group is in the OK state.
If you have configured the Cluster_dgs property, only applications that belong to the protection group can write to the device groups specified in the Cluster_dgs property.
Log in to a cluster node.
You must be assigned the Geo Management RBAC rights profile to complete this procedure. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC in Sun Cluster Geographic Edition System Administration Guide.
Initiate the switchover.
The application resource groups that are a part of the protection group are stopped and started during the switchover.
# geopg switchover [-f] -m newprimarycluster protectiongroupname
-f
Forces the command to perform the operation without asking you for confirmation
-m newprimarycluster
Specifies the name of the cluster that is to be the new primary cluster for the protection group
protectiongroupname
Specifies the name of the protection group
This example performs a switchover to the secondary cluster.
# geopg switchover -f -m cluster-newyork srdfpg
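When the switchover is driven from a script, it can help to assemble the command line in one place. The Python sketch below only builds the argument list and does not execute anything. The function name is hypothetical; the -f and -m options are the ones documented in this procedure.

```python
from typing import List


def geopg_switchover_argv(new_primary: str, protection_group: str,
                          force: bool = False) -> List[str]:
    """Build the argument list for a geopg switchover invocation."""
    argv = ["geopg", "switchover"]
    if force:
        argv.append("-f")  # skip the confirmation prompt
    argv += ["-m", new_primary, protection_group]
    return argv


print(" ".join(geopg_switchover_argv("cluster-newyork", "srdfpg", force=True)))
```

On a cluster node, the resulting list could be handed to a process launcher such as subprocess.run. That step is left out here because it requires the Geographic Edition software and the Geo Management rights profile.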
Perform a takeover when applications need to be brought online on the secondary cluster regardless of whether the data is completely consistent between the primary volume and the secondary volume. The information in this section assumes that the protection group has been started.
The following steps occur after a takeover is initiated:
If the former primary cluster, cluster-paris, can be reached and the protection group is not locked for notification handling or some other reason, the application services are taken offline on the former primary cluster.
For a reminder of which cluster is cluster-paris, see Example Sun Cluster Geographic Edition Cluster Configuration in Sun Cluster Geographic Edition System Administration Guide.
Data volumes of the former primary cluster, cluster-paris, are taken over by the new primary cluster, cluster-newyork.
This data might be inconsistent with the original primary volumes. After the takeover, data replication from the new primary cluster, cluster-newyork, to the former primary cluster, cluster-paris, is stopped.
Application services are brought online on the new primary cluster, cluster-newyork.
For more details about takeover and the effects of the geopg takeover command, see Overview of Disaster Recovery Administration in Sun Cluster Geographic Edition System Administration Guide.
For details about the possible conditions of the primary and secondary cluster before and after takeover, see Appendix C, Takeover Postconditions, in Sun Cluster Geographic Edition System Administration Guide.
The following sections describe the steps you must perform to force a takeover by a secondary cluster.
When a takeover is initiated by using the geopg takeover command, the data replication subsystem runs several validations on both clusters. These steps are conducted on the original primary cluster only if the primary cluster can be reached. If validation on the original primary cluster fails, the takeover still occurs.
First, the replication subsystem checks that the EMC Symmetrix Remote Data Facility device group is in a valid aggregate RDF pair state. The EMC Symmetrix Remote Data Facility commands that are used for the takeover are described in the following table.
Table 3–2 EMC Symmetrix Remote Data Facility Takeover Validations on the New Primary Cluster
Aggregate RDF Pair State | Protection Group Local Role | EMC Symmetrix Remote Data Facility Takeover Commands That Are Run on cluster-newyork |
---|---|---|
FailedOver | Primary | symrdf $option $dg write_disable r2; symrdf -g dg suspend; symrdf $option $dg rw_enable r1 |
FailedOver | Secondary | No command is run. |
Synchronized, Suspended, R1 Updated, Partitioned | All | symrdf -g dg failover |
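The rows of Table 3–2 can be expressed as a lookup. The Python sketch below is a reading aid, not the product's implementation: the function name is hypothetical, the command strings are copied verbatim from the table (including the $option and $dg placeholders, which are left unexpanded), and rejecting a state that the table does not list is an assumption.

```python
from typing import List

# Commands listed for the FailedOver state, keyed by protection group local role.
FAILEDOVER_COMMANDS = {
    "Primary": [
        "symrdf $option $dg write_disable r2",
        "symrdf -g dg suspend",
        "symrdf $option $dg rw_enable r1",
    ],
    "Secondary": [],  # "No command is run."
}

# States for which the table lists a failover for every local role.
FAILOVER_STATES = {"synchronized", "suspended", "r1updated", "partitioned"}


def takeover_commands(pair_state: str, local_role: str) -> List[str]:
    """Return the commands Table 3-2 lists for the new primary cluster."""
    if local_role not in FAILEDOVER_COMMANDS:
        raise ValueError("local role must be Primary or Secondary")
    key = pair_state.replace(" ", "").lower()  # tolerate "R1 Updated" vs "R1Updated"
    if key == "failedover":
        return list(FAILEDOVER_COMMANDS[local_role])
    if key in FAILOVER_STATES:
        return ["symrdf -g dg failover"]
    # Assumption: states absent from the table are treated as invalid.
    raise ValueError("pair state not listed in Table 3-2: " + pair_state)


print(takeover_commands("Partitioned", "Secondary"))
```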
From a replication perspective, after a successful takeover, the Local-role property of the protection group is changed to reflect the new role, regardless of whether the application could be brought online on the new primary cluster as part of the takeover operation. On cluster-newyork, where the protection group had a Local-role of Secondary, the Local-role property of the protection group becomes Primary. On cluster-paris, where the protection group had a Local-role of Primary, the following might occur:
If the cluster can be reached, the Local-role property of the protection group becomes Secondary.
If the cluster cannot be reached, the Local-role property of the protection group remains Primary.
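The effect of a takeover on the Local-role property can be summarized in a few lines of Python. This is a minimal sketch with a hypothetical function name. It encodes only the rules stated above, and it shows why both clusters can end up reporting Primary when cluster-paris is unreachable.

```python
from typing import Dict


def local_roles_after_takeover(former_primary_reachable: bool) -> Dict[str, str]:
    """Return the Local-role of the protection group on each cluster."""
    return {
        # The cluster that takes over always becomes Primary.
        "cluster-newyork": "Primary",
        # The former primary is demoted only if it can be reached.
        "cluster-paris": "Secondary" if former_primary_reachable else "Primary",
    }


print(local_roles_after_takeover(former_primary_reachable=False))
```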
If the takeover is successful, the applications are brought online. You do not need to run a separate geopg start command.
After a successful takeover, data replication between the new primary cluster, cluster-newyork, and the old primary cluster, cluster-paris, is stopped. If you want to run a geopg start command, you must use the -n option to prevent replication from resuming.
Before you force the secondary cluster to assume the activity of the primary cluster, ensure that the following conditions are met:
Sun Cluster Geographic Edition software is up and running on the cluster.
The cluster is a member of a partnership.
The Configuration status of the protection group is OK on the secondary cluster.
Log in to a node in the secondary cluster.
You must be assigned the Geo Management RBAC rights profile to complete this procedure. For more information about RBAC, see Sun Cluster Geographic Edition Software and RBAC in Sun Cluster Geographic Edition System Administration Guide.
Initiate the takeover.
# geopg takeover [-f] protectiongroupname
-f
Forces the command to perform the operation without your confirmation
protectiongroupname
Specifies the name of the protection group
This example forces the takeover of srdfpg by the secondary cluster cluster-newyork.
The phys-newyork-1 node is the first node of the secondary cluster. For a reminder of which node is phys-newyork-1, see Example Sun Cluster Geographic Edition Cluster Configuration in Sun Cluster Geographic Edition System Administration Guide.
phys-newyork-1# geopg takeover -f srdfpg
For information about the state of the primary and secondary clusters after a takeover, see Appendix C, Takeover Postconditions, in Sun Cluster Geographic Edition System Administration Guide.
After a successful takeover operation, the secondary cluster, cluster-newyork, becomes the primary for the protection group and the services are online on the secondary cluster. After the recovery of the original primary cluster, cluster-paris, the services can be brought online again on the original primary by using a process called failback.
Sun Cluster Geographic Edition software supports the following two kinds of failback:
Failback-switchover. During a failback-switchover, applications are brought online again on the original primary cluster, cluster-paris, after the data of the original primary cluster was resynchronized with the data on the secondary cluster, cluster-newyork.
For a reminder of which clusters are cluster-paris and cluster-newyork, see Example Sun Cluster Geographic Edition Cluster Configuration in Sun Cluster Geographic Edition System Administration Guide.
Failback-takeover. During a failback-takeover, applications are brought online again on the original primary cluster, cluster-paris, and use the current data on the original primary cluster. Any updates that occurred on the secondary cluster, cluster-newyork, while it was acting as primary, are discarded.
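The difference between the two kinds of failback comes down to which copy of the data the application uses once it is back on cluster-paris. The Python sketch below illustrates this with plain strings standing in for volume contents. The function name and the sample values are hypothetical.

```python
def data_after_failback(kind: str, paris_data: str, newyork_data: str) -> str:
    """Return the data the application uses on cluster-paris after failback."""
    if kind == "failback-switchover":
        # cluster-paris is resynchronized from cluster-newyork first.
        return newyork_data
    if kind == "failback-takeover":
        # Updates made on cluster-newyork are discarded.
        return paris_data
    raise ValueError("unknown failback kind: " + kind)


print(data_after_failback("failback-switchover", "pre-takeover", "post-takeover"))
print(data_after_failback("failback-takeover", "pre-takeover", "post-takeover"))
```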
If you want to leave the new primary, cluster-newyork, as the primary cluster and the original primary cluster, cluster-paris, as the secondary after the original primary restarts, you can resynchronize and revalidate the protection group configuration without performing a switchover or takeover.
Use this procedure to resynchronize and revalidate data on the original primary cluster, cluster-paris, with the data on the current primary cluster, cluster-newyork.
Before you resynchronize and revalidate the protection group configuration, a takeover has occurred on cluster-newyork. The clusters now have the following roles:
If the original primary cluster, cluster-paris, has been down, confirm that the cluster is booted and that the Sun Cluster Geographic Edition infrastructure is enabled on the cluster. For more information about booting a cluster, see Booting a Cluster in Sun Cluster Geographic Edition System Administration Guide.
The protection group on cluster-newyork has the primary role.
The protection group on cluster-paris has either the primary role or secondary role, depending on whether the protection group could be reached during the takeover.
Resynchronize the original primary cluster, cluster-paris, with the current primary cluster, cluster-newyork.
cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.
On cluster-paris, resynchronize the partnership.
phys-paris-1# geops update partnershipname
partnershipname
Specifies the name of the partnership
You need to perform this step only once, even if you are resynchronizing multiple protection groups.
For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.
On cluster-paris, resynchronize each protection group.
Because the role of the protection group on cluster-newyork is primary, this step ensures that the role of the protection group on cluster-paris is secondary.
phys-paris-1# geopg update protectiongroupname
protectiongroupname
Specifies the name of the protection group
For more information about synchronizing protection groups, see Resynchronizing an EMC Symmetrix Remote Data Facility Protection Group.
On cluster-paris, validate the cluster configuration for each protection group.
phys-paris-1# geopg validate protectiongroupname
protectiongroupname
Specifies a unique name that identifies a single protection group
For more information, see How to Validate an EMC Symmetrix Remote Data Facility Protection Group.
On cluster-paris, activate each protection group.
Because the protection group on cluster-paris has a role of secondary, the geopg start command does not restart the application on cluster-paris.
phys-paris-1# geopg start -n -e local protectiongroupname
-e local
Specifies the scope of the command.
By specifying a local scope, the command operates on the local cluster only.
-n
Specifies that data replication should not be used for this protection group. If this option is omitted, data replication starts at the same time as the protection group.
protectiongroupname
Specifies the name of the protection group.
Because the protection group has a role of secondary, the data is synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.
For more information about the geopg start command, see How to Activate an EMC Symmetrix Remote Data Facility Protection Group.
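Because the presence or absence of the -n option decides whether replication starts, a script that activates protection groups benefits from making that choice explicit. The Python sketch below builds the argument list only and runs nothing. The function and parameter names are hypothetical; -e and -n are the options described in this step, and local is the only scope value taken from this document.

```python
from typing import List


def geopg_start_argv(protection_group: str, scope: str = "local",
                     replicate: bool = True) -> List[str]:
    """Build the argument list for a geopg start invocation."""
    argv = ["geopg", "start", "-e", scope]
    if not replicate:
        argv.append("-n")  # do not start data replication
    argv.append(protection_group)
    return argv


print(" ".join(geopg_start_argv("srdfpg", replicate=False)))
print(" ".join(geopg_start_argv("srdfpg")))
```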
Confirm that the protection group configuration is OK.
First, confirm that the state of the protection group on cluster-newyork is OK. The protection group has a local state of OK when the EMC Symmetrix Remote Data Facility device groups on cluster-newyork have a Synchronized EMC Symmetrix Remote Data Facility pair state.
phys-newyork-1# geoadm status
Refer to the Protection Group section of the output.
Next, confirm that all resources in the replication resource group, protectiongroupname-rep-rg, report a status of OK.
phys-newyork-1# clresource status -g protectiongroupname-rep-rg
Use this procedure to restart an application on the original primary cluster, cluster-paris, after the data on this cluster has been resynchronized with the data on the current primary cluster, cluster-newyork.
The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once per partnership.
Before you perform a failback-switchover, a takeover has occurred on cluster-newyork. The clusters have the following roles:
If the original primary cluster, cluster-paris, has been down, confirm that the cluster is booted and that the Sun Cluster Geographic Edition infrastructure is enabled on the cluster. For more information about booting a cluster, see Booting a Cluster in Sun Cluster Geographic Edition System Administration Guide.
The protection group on cluster-newyork has the primary role.
The protection group on cluster-paris has either the primary role or secondary role, depending on whether cluster-paris can be reached during the takeover from cluster-newyork.
Resynchronize the original primary cluster, cluster-paris, with the current primary cluster, cluster-newyork.
cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.
On cluster-paris, resynchronize the partnership.
phys-paris-1# geops update partnershipname
partnershipname
Specifies the name of the partnership
You need to perform this step only once per partnership, even if you are performing a failback-switchover for multiple protection groups in the partnership.
For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.
Determine whether the protection group on the original primary cluster, cluster-paris, is active.
phys-paris-1# geoadm status
If the protection group on the original primary cluster is active, stop it.
phys-paris-1# geopg stop -e local protectiongroupname
Verify that the protection group is stopped.
phys-paris-1# geoadm status
On cluster-paris, resynchronize each protection group.
Because the local role of the protection group on cluster-newyork is now primary, this step ensures that the role of the protection group on cluster-paris becomes secondary.
phys-paris-1# geopg update protectiongroupname
protectiongroupname
Specifies the name of the protection group
For more information about synchronizing protection groups, see Resynchronizing an EMC Symmetrix Remote Data Facility Protection Group.
On cluster-paris, validate the cluster configuration for each protection group.
Ensure that the protection group is not in an error state. A protection group cannot be started when it is in an error state.
phys-paris-1# geopg validate protectiongroupname
protectiongroupname
Specifies a unique name that identifies a single protection group
For more information, see How to Validate an EMC Symmetrix Remote Data Facility Protection Group.
On cluster-paris, activate each protection group.
Because the protection group on cluster-paris has a role of secondary, the geopg start command does not restart the application on cluster-paris.
phys-paris-1# geopg start -e local protectiongroupname
-e local
Specifies the scope of the command.
By specifying a local scope, the command operates on the local cluster only.
protectiongroupname
Specifies the name of the protection group.
Do not use the -n option when performing a failback-switchover because the data needs to be synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.
Because the protection group has a role of secondary, the data is synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.
For more information about the geopg start command, see How to Activate an EMC Symmetrix Remote Data Facility Protection Group.
Confirm that the data is completely synchronized.
The data is completely synchronized when the state of the protection group on cluster-newyork is OK. The protection group has a local state of OK when the EMC Symmetrix Remote Data Facility device groups on cluster-newyork have a Synchronized RDF pair state.
To confirm that the state of the protection group on cluster-newyork is OK, use the following command:
phys-newyork-1# geoadm status
Refer to the Protection Group section of the output.
On both partner clusters, ensure that the protection group is activated.
# geoadm status
On either cluster, perform a switchover from cluster-newyork to cluster-paris for each protection group.
# geopg switchover [-f] -m cluster-paris protectiongroupname
For more information, see How to Switch Over an EMC Symmetrix Remote Data Facility Protection Group From Primary to Secondary.
cluster-paris resumes its original role as primary cluster for the protection group.
Ensure that the switchover was performed successfully.
Verify that the protection group is now primary on cluster-paris and secondary on cluster-newyork and that the state for “Data replication” and “Resource groups” is OK on both clusters.
# geoadm status
Check the runtime status of the application resource group and data replication for each EMC Symmetrix Remote Data Facility protection group.
# clresourcegroup status -v protectiongroupname
Refer to the Status and Status Message fields that are presented for the data replication device group you want to check. For more information about these fields, see Table 2–1.
For more information about the runtime status of data replication, see Checking the Runtime Status of EMC Symmetrix Remote Data Facility Data Replication.
Use this procedure to restart an application on the original primary cluster, cluster-paris, and use the current data on the original primary cluster. Any updates that occurred on the secondary cluster, cluster-newyork, while it was acting as primary are discarded.
The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once per partnership.
To resume using the data on the original primary, cluster-paris, you must not have replicated data from the new primary, cluster-newyork, to the original primary cluster, cluster-paris, at any point after the takeover operation on cluster-newyork. To prevent data replication between the new primary and the original primary, you must have used the -n option whenever you used the geopg start command.
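If the geopg start invocations issued since the takeover have been recorded, for example in an operations log, the precondition above can be checked mechanically. The Python sketch below assumes such a record exists as a list of argument lists. The function name and the record format are both hypothetical.

```python
from typing import List


def replication_was_kept_off(start_invocations: List[List[str]]) -> bool:
    """Return True if every recorded geopg start invocation included -n."""
    return all("-n" in argv for argv in start_invocations)


# Hypothetical record of the geopg start commands run after the takeover.
history = [
    ["geopg", "start", "-e", "local", "-n", "srdfpg"],
    ["geopg", "start", "-n", "-e", "local", "srdfpg"],
]
print(replication_was_kept_off(history))
```

A result of False means that replication from cluster-newyork to cluster-paris might have started at some point, in which case the data on cluster-paris can no longer be assumed to be the pre-takeover data.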
Ensure that the clusters have the following roles:
If the original primary cluster, cluster-paris, has been down, confirm that the cluster is booted and that the Sun Cluster Geographic Edition infrastructure is enabled on the cluster. For more information about booting a cluster, see Booting a Cluster in Sun Cluster Geographic Edition System Administration Guide.
The protection group on cluster-newyork has the primary role.
The protection group on cluster-paris has either the primary role or secondary role, depending on whether cluster-paris can be reached during the takeover from cluster-newyork.
Resynchronize the original primary cluster, cluster-paris, with the original secondary cluster, cluster-newyork.
cluster-paris forfeits its own configuration and replicates the cluster-newyork configuration locally.
On cluster-paris, resynchronize the partnership.
phys-paris-1# geops update partnershipname
partnershipname
Specifies the name of the partnership
You need to perform this step only once per partnership, even if you are performing a failback-takeover for multiple protection groups in the partnership.
For more information about synchronizing partnerships, see Resynchronizing a Partnership in Sun Cluster Geographic Edition System Administration Guide.
Determine whether the protection group on the original primary cluster, cluster-paris, is active.
phys-paris-1# geoadm status
If the protection group on the original primary cluster is active, stop it.
phys-paris-1# geopg stop -e local protectiongroupname
Verify that the protection group is stopped.
phys-paris-1# geoadm status
On cluster-paris, resynchronize each protection group.
Because the local role of the protection group on cluster-newyork is now primary, this step ensures that the role of the protection group on cluster-paris becomes secondary.
phys-paris-1# geopg update protectiongroupname
protectiongroupname
Specifies the name of the protection group
For more information about resynchronizing protection groups, see How to Resynchronize a Protection Group.
On cluster-paris, validate the configuration for each protection group.
Ensure that the protection group is not in an error state. A protection group cannot be started when it is in an error state.
phys-paris-1# geopg validate protectiongroupname
protectiongroupname
Specifies a unique name that identifies a single protection group
For more information, see How to Validate an EMC Symmetrix Remote Data Facility Protection Group.
On cluster-paris, activate each protection group in the secondary role without data replication.
Because the protection group on cluster-paris has a role of secondary, the geopg start command does not restart the application on cluster-paris.
You must use the -n option which specifies that data replication should not be used for this protection group. If this option is omitted, data replication starts at the same time as the protection group.
phys-paris-1# geopg start -e local -n protectiongroupname
-e local
Specifies the scope of the command.
By specifying a local scope, the command operates on the local cluster only.
-n
Specifies that data replication should not be used for this protection group. If this option is omitted, data replication starts at the same time as the protection group.
protectiongroupname
Specifies the name of the protection group
For more information, see How to Activate an EMC Symmetrix Remote Data Facility Protection Group.
Replication from cluster-newyork to cluster-paris is not started because the -n option is used on cluster-paris.
On cluster-paris, initiate a takeover for each protection group.
phys-paris-1# geopg takeover [-f] protectiongroupname
-f
Forces the command to perform the operation without your confirmation
protectiongroupname
Specifies the name of the protection group
For more information about the geopg takeover command, see How to Force Immediate Takeover of EMC Symmetrix Remote Data Facility Services by a Secondary Cluster.
The protection group on cluster-paris now has the primary role, and the protection group on cluster-newyork has the role of secondary. The application services are now online on cluster-paris.
On cluster-newyork, activate each protection group.
At the end of step 4, the local state of the protection group on cluster-newyork is Offline. To start monitoring the local state of the protection group, you must activate the protection group on cluster-newyork.
Because the protection group on cluster-newyork has a role of secondary, the geopg start command does not restart the application on cluster-newyork.
phys-newyork-1# geopg start -e local [-n] protectiongroupname
-e local
Specifies the scope of the command.
By specifying a local scope, the command operates on the local cluster only.
-n
Prevents the start of data replication at protection group startup.
If you omit this option, the data replication subsystem starts at the same time as the protection group.
protectiongroupname
Specifies the name of the protection group.
For more information about the geopg start command, see How to Activate an EMC Symmetrix Remote Data Facility Protection Group.
Ensure that the takeover was performed successfully.
Verify that the protection group is now primary on cluster-paris and secondary on cluster-newyork and that the state for “Data replication” and “Resource groups” is OK on both clusters.
# geoadm status
If you used the -n option in step 5 to prevent data replication from starting, the “Data replication” status will not be in the OK state.
Check the runtime status of the application resource group and data replication for each EMC Symmetrix Remote Data Facility protection group.
# clresourcegroup status -v protectiongroupname
Refer to the Status and Status Message fields that are presented for the data replication device group you want to check. For more information about these fields, see Table 2–1.
For more information about the runtime status of data replication, see Checking the Runtime Status of EMC Symmetrix Remote Data Facility Data Replication.
Basic Sun Cluster Geographic Edition operations, such as geopg switchover, perform a symrdf swap operation at the EMC Symmetrix Remote Data Facility data replication level. In EMC Symmetrix Remote Data Facility terminology, a switchover is called a swap. The symrdf swap operation requires significantly more time for static RDF than for dynamic RDF. Therefore, you might need to increase the value of the timeout property of the protection group when using static RDF.
If all of the EMC Symmetrix Remote Data Facility commands return a value of 0, the switchover is successful. In some cases, a command might return an error code (a value other than 0). These cases are considered switchover failures.
If a switchover failure occurs, the secondary volumes might not be fully synchronized with the primary volumes. Sun Cluster Geographic Edition software does not start the applications on the new intended primary cluster in a switchover failure scenario.
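The success rule stated above is simple enough to write down directly. In this Python sketch the function name is hypothetical and the return codes are sample values. It shows only the rule that every command must return 0, and that the applications are not started otherwise.

```python
from typing import Iterable


def switchover_succeeded(return_codes: Iterable[int]) -> bool:
    """A switchover succeeds only if every SRDF command returned 0."""
    return all(rc == 0 for rc in return_codes)


# Sample return codes, made up for illustration.
for codes in ([0, 0, 0], [0, 1, 0]):
    if switchover_succeeded(codes):
        print(codes, "-> switchover successful")
    else:
        print(codes, "-> switchover failure, applications are not started")
```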
The remainder of this section describes the initial conditions that lead to a switchover failure and how to recover from a switchover failure.
This section describes a switchover failure scenario. In this scenario, cluster-paris is the original primary cluster and cluster-newyork is the original secondary cluster.
A switchover switches the services from cluster-paris to cluster-newyork as follows:
phys-newyork-1# geopg switchover -f -m cluster-newyork srdfpg
While processing the geopg switchover command, the symrdf swap command runs and returns errors for the EMC Symmetrix Remote Data Facility device group, devgroup1. As a result, the geopg switchover command returns the following failure message:
Processing operation.... this may take a while ....
"Switchover" failed for the following reason:
        Switchover failed for SRDF DG devgroup1
After this failure message has been issued, the two clusters are in the following states:
cluster-paris:
        srdfpg role: Secondary
cluster-newyork:
        srdfpg role: Secondary
phys-newyork-1# symdg list
                  D E V I C E      G R O U P S
                                                  Number of
Name         Type   Valid   Symmetrix ID    Devs   GKs   BCVs   VDEVs
devgroup1    RDF1   Yes     000187401215       2     0      0       0
devgroup2    RDF2   Yes     000187401215       6     0      0       0
This section describes procedures to recover from the failure scenario described in the previous section. These procedures bring the application online on the appropriate cluster.
Place the EMC Symmetrix Remote Data Facility device group, devgroup1, in the Split state.
Use the symrdf split commands to place the device groups that are in the protection group on both cluster-paris and cluster-newyork in the Split state.
phys-newyork-1# symrdf -g devgroup1 split
Make one of the clusters Primary for the protection group.
Make the original primary cluster, cluster-paris, Primary for the protection group if you intend to start the application on the original primary cluster. The application uses the current data on the original primary cluster.
Make the original secondary cluster, cluster-newyork, Primary for the protection group if you intend to start the application on the original secondary cluster. The application uses the current data on the original secondary cluster.
Because the symrdf swap command did not perform a swap, the data volumes on cluster-newyork might not be synchronized with the data volumes on cluster-paris. If you intend to start the application with the same data as appears on the original primary cluster, you must not make the original secondary cluster Primary.
Deactivate the protection group on the original primary cluster.
phys-paris-1# geopg stop -e Local srdfpg
Resynchronize the configuration of the protection group.
This command updates the configuration of the protection group on cluster-paris with the configuration information of the protection group on cluster-newyork.
phys-paris-1# geopg update srdfpg
After the geopg update command runs successfully, srdfpg has the following role on each cluster:
cluster-paris:
        srdfpg role: Primary
cluster-newyork:
        srdfpg role: Secondary
Determine whether the device group has the RDF1 role on the original primary cluster.
phys-paris-1# symdg list | grep devgroup1
If the device group does not have the RDF1 role on the original primary cluster, run the symrdf swap command so that the device group, devgroup1, resumes the RDF1 role.
phys-paris-1# symrdf -g devgroup1 failover
phys-paris-1# symrdf -g devgroup1 swap
Confirm that the swap was successful by using the symdg list command to view the device group information.
phys-paris-1# symdg list
                  D E V I C E      G R O U P S
                                                  Number of
Name         Type   Valid   Symmetrix ID    Devs   GKs   BCVs   VDEVs
devgroup1    RDF1   Yes     000187401215       6     0      0       0
devgroup2    RDF1   Yes     000187401215       2     0      0       0
Activate the protection group on both clusters in the partnership.
phys-paris-1# geopg start -e Global srdfpg
This command starts the application on cluster-paris. Data replication starts from cluster-paris to cluster-newyork.
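The role check in the preceding procedure (symdg list filtered for a device group) can be scripted. The following sketch assumes that symdg list prints one row per device group with the group name in the first column and the RDF type in the second column, as in the sample output in this section. The function names dg_role and needs_swap are hypothetical.

```shell
#!/bin/sh
# dg_role NAME: read `symdg list` output on standard input and print the
# Type column (for example RDF1 or RDF2) of the row whose first column is NAME.
# Returns nonzero if NAME is not listed. Hypothetical helper.
dg_role() {
    awk -v dg="$1" '$1 == dg { print $2; found = 1 } END { exit !found }'
}

# needs_swap NAME: return 0 if NAME is listed and its type is not RDF1,
# 1 if the type is already RDF1, and 2 if NAME is not listed.
needs_swap() {
    role=$(dg_role "$1") || return 2
    [ "$role" != "RDF1" ]
}
```

A possible use on the original primary cluster is symdg list | needs_swap devgroup1 && echo "devgroup1 does not have the RDF1 role". The sketch only reports the role and does not run symrdf failover or symrdf swap.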
Resynchronize the configuration of the protection group.
This command updates the configuration of the protection group on cluster-newyork with the configuration information of the protection group on cluster-paris.
phys-newyork-1# geopg update srdfpg
After the geopg update command runs successfully, srdfpg has the following role on each cluster:
cluster-paris:
        srdfpg role: Secondary
cluster-newyork:
        srdfpg role: Primary
Run the symrdf swap command so that the device group, devgroup2, has the RDF2 role.
phys-paris-1# symrdf -g devgroup2 failover
phys-paris-1# symrdf -g devgroup2 swap
Confirm that the swap was successful by using the symdg list command to view the device group information.
phys-paris-1# symdg list
                  D E V I C E      G R O U P S
                                                  Number of
Name         Type   Valid   Symmetrix ID    Devs   GKs   BCVs   VDEVs
devgroup1    RDF2   Yes     000187401215       6     0      0
devgroup2    RDF2   Yes     000187401215       2     0      0       0
Activate the protection group on both clusters in the partnership.
phys-newyork-1# geopg start -e Global srdfpg
This command starts the application on cluster-newyork. Data replication starts from cluster-newyork to cluster-paris.
This command overwrites the data on cluster-paris.
When an error occurs at the data replication level, the error is reflected in the status of the resource in the replication resource group of the relevant device group. This changed status appears in the Data Replication status field in the output of the geoadm status command for that protection group.
This section contains information about the following topics:
Check the status of the replication resources by using the clresource status command.
# clresource status -v sc_geo_dr-SRDF-protectiongroupname-srdfdgname
For information about how different Resource status values map to actual replication pair states, see Table 2–4.
Running the clresource status command might return the following:
…
-- Resources --
    Resource Name                            Node Name   State     Status Message
    -------------                            ---------   -----     --------------
Resource: sc_geo_dr-SRDF-srdfpg-devgroup1    pemc1       Online    Online - Partitioned
Resource: sc_geo_dr-SRDF-srdfpg-devgroup1    pemc2       Offline   Offline
…
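When many replication resources are configured, the per-node status can be pulled out of this output with a short filter. The following sketch assumes the layout shown in the preceding sample, in which each data row begins with the literal word Resource: followed by the resource name, node name, state, and status message. Other releases might print a different layout. The function name repl_status is hypothetical.

```shell
#!/bin/sh
# repl_status: read resource status output on standard input and print
# "node<TAB>state<TAB>status message" for every row that starts with "Resource:".
# Assumes the row layout of the sample output in this section. Hypothetical helper.
repl_status() {
    awk '$1 == "Resource:" {
        # Fields 5..NF form the status message, which can contain spaces.
        msg = $5
        for (i = 6; i <= NF; i++) msg = msg " " $i
        print $3 "\t" $4 "\t" msg
    }'
}
```

The filter could be used as clresource status -v resourcename | repl_status, where resourcename is a placeholder for the replication resource name.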
Display the aggregate resource status for all device groups in the protection group by using the geoadm status command.
For example, the output of the clresource status command in the preceding example indicates that the EMC Symmetrix Remote Data Facility device group, devgroup1, is in the Suspended state on cluster-paris. Table 2–4 indicates that the Suspended state corresponds to a resource status of FAULTED. So, the data replication state of the protection group is also FAULTED. This state is reflected in the output of the geoadm status command, which displays the state of the protection group as Error.
phys-paris-1# geoadm status
Cluster: cluster-paris
Partnership "paris-newyork-ps" : OK
  Partner clusters    : cluster-newyork
  Synchronization     : OK
  ICRM Connection     : OK
  Heartbeat "paris-to-newyork" monitoring "cluster-newyork": OK
    Heartbeat plug-in "ping_plugin"    : Inactive
    Heartbeat plug-in "tcp_udp_plugin" : OK
Protection group "srdfpg" : Error
  Partnership         : paris-newyork-ps
  Synchronization     : OK
  Cluster cluster-paris : Error
    Role                : Primary
    PG activation state : Activated
    Configuration       : OK
    Data replication    : Error
    Resource groups     : OK
  Cluster cluster-newyork : Error
    Role                : Secondary
    PG activation state : Activated
    Configuration       : OK
    Data replication    : Error
    Resource groups     : OK
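To spot protection groups in the Error state without reading the full report, the geoadm status output can be filtered. The following sketch assumes that each protection group is reported on a line of the form Protection group "name" : state, as in the preceding sample. The function name pg_in_error is hypothetical.

```shell
#!/bin/sh
# pg_in_error: read `geoadm status` output on standard input and print the
# name of every protection group whose summary line reports Error.
# Assumes summary lines of the form: Protection group "name" : state
# Hypothetical helper.
pg_in_error() {
    awk -F'"' '/^[ \t]*Protection group "/ {
        # With a double quote as the field separator, $2 is the group name
        # and $3 is the remainder of the line, for example " : Error".
        state = $3
        sub(/^[ \t]*:[ \t]*/, "", state)
        sub(/[ \t\r]+$/, "", state)
        if (state == "Error") print $2
    }'
}
```

For example, geoadm status | pg_in_error would print srdfpg for the sample output in this section.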
To recover from an error state, you might perform some or all of the steps in the following procedure.
Use the procedures in the EMC Symmetrix Remote Data Facility documentation to determine the causes of the FAULTED state.
Recover from the faulted state by using the EMC Symmetrix Remote Data Facility procedures.
If the recovery procedures change the state of the device group, this state is automatically detected by the resource and is reported as a new protection group state.
Revalidate the protection group configuration.
phys-paris-1# geopg validate protectiongroupname
protectiongroupname    Specifies the name of the EMC Symmetrix Remote Data Facility protection group
If the geopg validate command determines that the configuration is valid, the state of the protection group changes to reflect that fact. If the configuration is not valid, geopg validate returns a failure message.
Review the status of the protection group configuration.
phys-paris-1# geopg list protectiongroupname
protectiongroupname    Specifies the name of the EMC Symmetrix Remote Data Facility protection group
Review the runtime status of the protection group.
phys-paris-1# geoadm status