Oracle Solaris Cluster Geographic Edition Remote Replication Guide for Sun ZFS Storage Appliance, Oracle Solaris Cluster 4.1
Recovering Services to a Cluster on a System That Uses Sun ZFS Storage Appliance Replication
This section contains the following information:
How to Resynchronize and Revalidate the Protection Group Configuration
How to Perform a Failback-Switchover on a System That Uses Sun ZFS Storage Appliance Replication
How to Perform a Failback-Takeover on a System That Uses Sun ZFS Storage Appliance Replication
Overview of Recovering Services
After a successful takeover operation, the secondary cluster becomes the primary for the protection group and the services are online on the secondary cluster. After the original primary cluster recovers, the services can be brought online again on the original primary by using a process called failback.
The Geographic Edition software supports the following kinds of failback:
Failback-switchover. During a failback-switchover, applications are brought online again on the original primary cluster after the data on the original primary cluster has been resynchronized with the data on the secondary cluster.
Failback-takeover. During a failback-takeover, applications are brought online again on the original primary cluster and use the current data on the original primary cluster. Any updates that occurred on the secondary cluster while it was acting as primary are discarded.
If you want to leave the new primary as the primary cluster and the original primary cluster as the secondary after the original primary restarts, you can resynchronize and revalidate the protection group configuration without performing a switchover or takeover.
How to Resynchronize and Revalidate the Protection Group Configuration
Use this procedure to resynchronize and revalidate data on the original primary cluster with the data on the current primary cluster.
Before You Begin
Before you resynchronize and revalidate the protection group configuration, a takeover must already have occurred, making the original secondary cluster the current primary cluster. Ensure that the clusters now have the following roles:
If the original primary cluster had been down, the cluster has been booted and the Geographic Edition infrastructure is enabled on the cluster. For more information about booting a cluster, see Booting a Cluster in Oracle Solaris Cluster Geographic Edition System Administration Guide.
The protection group on the current primary cluster has the primary role.
The protection group on the original primary cluster has either the primary role or secondary role, depending on whether the protection group could be reached during the takeover.
This procedure uses the example names cluster-paris for the original primary cluster and cluster-newyork for the current primary cluster.
The cluster-paris cluster forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.
phys-paris-1# geops update ps-name
ps-name
Specifies the name of the partnership.
Note - You need to perform this step only once, even if you are resynchronizing multiple protection groups.
For more information about synchronizing partnerships, see Resynchronizing a Partnership in Oracle Solaris Cluster Geographic Edition System Administration Guide.
Because the role of the protection group on cluster-newyork is primary, this step ensures that the role of the protection group on cluster-paris is secondary.
phys-paris-1# geopg update pg-name
pg-name
Specifies the name of the protection group.
For more information about synchronizing protection groups, see Resynchronizing a Sun ZFS Storage Appliance Protection Group.
phys-paris-1# geopg validate pg-name
pg-name
Specifies a unique name that identifies a single protection group.
For more information, see How to Validate a Sun ZFS Storage Appliance Protection Group.
When you activate a protection group, the protection group's application resource groups are also brought online.
phys-paris-1# geopg start -e global pg-name
-e global
Specifies the scope of the command.
By specifying a global scope, the command operates on both clusters where the protection group is located.
Note - The property values, such as global and local, are not case sensitive.
pg-name
Specifies the name of the protection group.
Caution - Do not use the -n option because the data needs to be synchronized from the current primary cluster, cluster-newyork, to the current secondary cluster, cluster-paris.
Because the protection group has a role of secondary, the data is synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.
For more information about the geopg start command, see How to Activate a Sun ZFS Storage Appliance Protection Group.
phys-newyork-1# geoadm status
The protection group has a local state of OK when the Sun ZFS Storage Appliance components on cluster-newyork have a Synchronized pair state.
Refer to the Protection Group section of the output.
phys-newyork-1# clresource status -g zfssa-rep-rg
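The commands that this procedure runs on cluster-paris can be collected into a small wrapper. The following is a minimal sketch, not a supported tool. It assumes that it is run on a node of cluster-paris. The run helper, the resync_and_revalidate function, and the DRY_RUN, PS_NAME, and PG_NAME variables are hypothetical names introduced here for illustration. By default the sketch only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: wraps the commands shown in this procedure.
# PS_NAME, PG_NAME, DRY_RUN, run, and resync_and_revalidate are
# illustrative names, not part of the Geographic Edition software.
PS_NAME="${PS_NAME:-ps-name}"   # partnership name (placeholder)
PG_NAME="${PG_NAME:-pg-name}"   # protection group name (placeholder)
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run on a node of the original primary cluster (cluster-paris).
# The && chain stops at the first command that fails.
# The -n option is deliberately not passed to geopg start.
resync_and_revalidate() {
    run geops update "$PS_NAME" &&
    run geopg update "$PG_NAME" &&
    run geopg validate "$PG_NAME" &&
    run geopg start -e global "$PG_NAME"
}

resync_and_revalidate
```

Set DRY_RUN=0 only after you have reviewed the printed commands. Confirming synchronization with geoadm status on cluster-newyork remains a manual step in this sketch.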
How to Perform a Failback-Switchover on a System That Uses Sun ZFS Storage Appliance Replication
Use this procedure to restart an application on the original primary cluster, cluster-paris, after the data on this cluster has been resynchronized with the data on the current primary cluster, cluster-newyork.
Note - The failback procedures apply only to clusters in a partnership. You need to perform the following procedure only once per partnership.
Before You Begin
Before you perform a failback-switchover, a takeover must already have occurred on cluster-newyork. Ensure that the clusters have the following roles:
If the original primary cluster had been down, the cluster has been booted and the Geographic Edition infrastructure is enabled on the cluster. For more information about booting a cluster, see Booting a Cluster in Oracle Solaris Cluster Geographic Edition System Administration Guide.
The protection group on the current primary cluster has the primary role.
The protection group on the original primary cluster has either the primary role or secondary role, depending on whether the original primary cluster could be reached during the takeover from the current primary cluster.
This procedure uses the example names cluster-paris for the original primary cluster and cluster-newyork for the current primary cluster.
This task is necessary to finish recovery if the cluster had experienced a complete site failure or a takeover. Data stores at cluster-newyork will have changed and will need to be replicated back to cluster-paris when it is put back in service.
Perform these steps for each project that is replicated.
This executes a manual replication to synchronize the two sites.
phys-paris-1# geoadm status
phys-paris-1# geopg stop -e local pg-name
pg-name
Specifies the name of the protection group.
phys-paris-1# geoadm status
The cluster-paris cluster forfeits its own configuration and replicates the cluster-newyork configuration locally. Resynchronize both the partnership and protection group configurations.
phys-paris-1# geops update ps-name
ps-name
Specifies the name of the partnership.
Note - Perform this step only once per partnership, even if you are performing a failback-switchover for multiple protection groups in the partnership.
For more information about synchronizing partnerships, see Resynchronizing a Partnership in Oracle Solaris Cluster Geographic Edition System Administration Guide.
Because the local role of the protection group on cluster-newyork is now primary, this step ensures that the role of the protection group on cluster-paris becomes secondary.
phys-paris-1# geopg update pg-name
For more information about synchronizing protection groups, see Resynchronizing a Sun ZFS Storage Appliance Protection Group.
Ensure that the protection group is not in an error state. A protection group cannot be started when it is in an error state.
phys-paris-1# geopg validate pg-name
pg-name
Specifies a unique name that identifies a single protection group.
For more information, see How to Validate a Sun ZFS Storage Appliance Protection Group.
Because the protection group on cluster-paris has a role of secondary, the geopg start command does not restart the application on cluster-paris.
phys-paris-1# geopg start -e global pg-name
-e global
Specifies the scope of the command. By specifying a global scope, the command operates on both clusters.
pg-name
Specifies the name of the protection group.
Note - Do not use the -n option when performing a failback-switchover. The data must be synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.
Because the protection group has a role of secondary, the data is synchronized from the current primary, cluster-newyork, to the current secondary, cluster-paris.
For more information about the geopg start command, see How to Activate a Sun ZFS Storage Appliance Protection Group.
The data is completely synchronized when the state of the protection group on cluster-newyork is OK. The protection group has a local state of OK when the appliance data store on cluster-newyork is being replicated to cluster-paris.
To confirm that the state of the protection group on cluster-newyork is OK, use the following command:
phys-newyork-1# geoadm status
Refer to the Protection Group section of the output.
# geoadm status
# geopg switchover [-f] -m cluster-paris pg-name
For more information, see How to Switch Over Sun ZFS Storage Appliance Remote Replication From the Primary Cluster to the Secondary Cluster.
cluster-paris resumes its original role as primary cluster for the protection group.
Verify that the protection group is now primary on cluster-paris and secondary on cluster-newyork and that the state for “Data replication” and “Resource groups” is OK on both clusters.
# geoadm status
Check the runtime status of the application resource group and replication for each protection group.
# clresourcegroup status -v pg-name
Refer to the Status and Status Message fields that are presented for the replication component you want to check.
For more information about the runtime status of replication, see Checking the Runtime Status of Sun ZFS Storage Appliance Remote Replication.
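The switchover in this procedure should be issued only after geoadm status shows that the protection group on cluster-newyork is OK. The following sketch illustrates that ordering as a simple guard. It does not parse geoadm status output, because the output format is not shown in this section. Instead, the operator records the result of the manual check in a hypothetical SYNC_OK variable. The SYNC_OK and PG_NAME variables and the switchover_to_paris function are illustrative names, and the sketch prints the command for review rather than executing it.

```shell
#!/bin/sh
# Sketch only: gate the switchover on an operator confirmation.
# SYNC_OK, PG_NAME, and switchover_to_paris are illustrative names.
PG_NAME="${PG_NAME:-pg-name}"   # protection group name (placeholder)
SYNC_OK="${SYNC_OK:-no}"        # set to "yes" after checking geoadm status

switchover_to_paris() {
    if [ "$SYNC_OK" != "yes" ]; then
        # The operator has not confirmed the OK state on cluster-newyork.
        echo "refusing: confirm the OK state with 'geoadm status' on cluster-newyork first" >&2
        return 1
    fi
    # Print the command for review instead of executing it.
    echo "geopg switchover -m cluster-paris $PG_NAME"
}

switchover_to_paris || echo "switchover not issued"
```

The guard keeps the manual verification explicit: with SYNC_OK left at its default, the function returns a nonzero status and no switchover command is produced.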
How to Perform a Failback-Takeover on a System That Uses Sun ZFS Storage Appliance Replication
Use this procedure to restart an application on the original primary cluster and use the current data on the original primary cluster. Any updates that occurred on the secondary cluster while it was acting as primary are discarded.
The failback procedures apply only to clusters in a partnership. Perform the following procedure only once per partnership.
Note - To resume using the data on the original primary, you must not have replicated data from the new primary to the original primary cluster, cluster-paris, at any point after the takeover operation on the current primary cluster. To prevent replication between the current primary and the original primary, you must have used the -n option whenever you used the geopg start command.
Before You Begin
Ensure that the clusters have the following roles:
If the original primary cluster had been down, the cluster is booted and the Geographic Edition infrastructure is enabled on the cluster. For more information about booting a cluster, see Booting a Cluster in Oracle Solaris Cluster Geographic Edition System Administration Guide.
The protection group on the current primary cluster has the primary role.
The protection group on the original primary cluster has either the primary role or secondary role, depending on whether the original primary could be reached during the takeover from the current primary.
This procedure uses the example names cluster-paris for the original primary cluster and cluster-newyork for the current primary cluster.
The package is created on the original primary appliance, paris. The corresponding package is created on the original secondary appliance, newyork.
phys-newyork-1# geopg stop -e local pg-name
-e local
Specifies the scope of the command. By specifying a local scope, the command operates on the local cluster only.
pg-name
Specifies the name of the protection group.
Note - Wait for the replica package to appear on cluster-newyork before you continue to the next step.
phys-paris-1# geopg takeover pg-name
phys-newyork-1# geopg update pg-name
Ensure that the protection group is not in an error state. A protection group cannot be started when it is in an error state.
phys-paris-1# geopg validate pg-name
For more information, see How to Validate a Sun ZFS Storage Appliance Protection Group.
phys-paris-1# geopg start -e global pg-name
The protection group on cluster-paris now has the primary role, and the protection group on cluster-newyork has the role of secondary. The application services are now online on cluster-paris.
For more information, see How to Activate a Sun ZFS Storage Appliance Protection Group.
# geoadm status
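Because the failback-takeover commands alternate between nodes of the two clusters, it can help to print them as an ordered checklist before starting. The following sketch lists the commands in the order they appear in this procedure, each prefixed with the node shown in the corresponding prompt. The PG_NAME variable and the step and failback_takeover_plan functions are hypothetical names introduced for illustration. The sketch only prints text and runs no Geographic Edition command.

```shell
#!/bin/sh
# Sketch only: print the failback-takeover commands with the node
# on which this procedure shows each one being run.
# PG_NAME, step, and failback_takeover_plan are illustrative names.
PG_NAME="${PG_NAME:-pg-name}"   # protection group name (placeholder)

# step NODE COMMAND...: print one line in the prompt style used above.
step() {
    node=$1
    shift
    echo "$node# $*"
}

failback_takeover_plan() {
    step phys-newyork-1 geopg stop -e local "$PG_NAME"
    echo "(manual) wait for the replica package to appear on cluster-newyork"
    step phys-paris-1 geopg takeover "$PG_NAME"
    step phys-newyork-1 geopg update "$PG_NAME"
    step phys-paris-1 geopg validate "$PG_NAME"
    step phys-paris-1 geopg start -e global "$PG_NAME"
}

failback_takeover_plan
```

The final geoadm status check is left out of the checklist because the procedure does not tie it to a specific node.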