Recovery Strategy After a Takeover of a MySQL Protection Group

When the old primary cluster restarts for the first time after a successful takeover, the MySQL database does not detect that the cluster should no longer act as a master. The disaster recovery framework still assigns the primary role to that cluster, but leaves the role deactivated. The goal of the recovery is to configure the old master to run as a slave and to update the disaster recovery framework configuration to reflect this role change.

You can check the status with the following command:

# geoadm status

The recovery strategy after a takeover involves the following actions:

Configuring the old master to run as a slave
Manually starting the slave threads on the old master
Resynchronizing the protection group to switch the role

How to Recover After a Takeover

Note - You can also accomplish Step 8 and Step 9 by using the Oracle Solaris Cluster Manager browser interface. Click Partnerships, click the partnership name, and highlight the protection group name. Click Update Protection Group and when the update is completed click Start Protection Group. For more information about Oracle Solaris Cluster Manager, see Chapter 12, Using the Oracle Solaris Cluster Manager Browser Interface in Administering an Oracle Solaris Cluster 4.4 Configuration.

Log in to a node of the old primary cluster.
You must be assigned the Geo Management rights profile to complete this procedure. For more information, see Securing Disaster Recovery Framework Software in Installing and Configuring the Disaster Recovery Framework for Oracle Solaris Cluster 4.4.
Allow the MySQL slave threads to be started if the database resource performs a restart or similar action.
1. Remove the skip-slave-start keyword from the appropriate my.cnf file.
2. If, when the protection group was created, the mysql_geo_config registration file contained READONLY=true, remove the read-only=true entry from the appropriate my.cnf file.
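The two option-file edits above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `enable_slave_start`, the `.bak` backup suffix, and the assumption that each option sits alone on its own line are not taken from the product documentation. Review the resulting file before restarting the database resource.

```shell
#!/bin/sh
# Sketch: remove the options that keep the slave threads from starting.
# enable_slave_start and the .bak suffix are illustrative names (assumptions).

enable_slave_start() {
    cnf="$1"                      # path to the my.cnf used by the old master
    remove_readonly="${2:-no}"    # pass "yes" only if READONLY=true was registered

    [ -f "$cnf" ] || { echo "no such file: $cnf" >&2; return 1; }

    cp -p "$cnf" "$cnf.bak" || return 1   # keep a copy before editing

    if [ "$remove_readonly" = "yes" ]; then
        # Delete both the skip-slave-start line and the read-only=true line.
        sed -e '/^[[:space:]]*skip-slave-start[[:space:]]*$/d' \
            -e '/^[[:space:]]*read-only[[:space:]]*=[[:space:]]*true[[:space:]]*$/d' \
            "$cnf.bak" > "$cnf"
    else
        # Delete only the skip-slave-start line.
        sed -e '/^[[:space:]]*skip-slave-start[[:space:]]*$/d' "$cnf.bak" > "$cnf"
    fi
}
```

For example, `enable_slave_start /path/to/my.cnf yes` would drop both entries, where the path is a placeholder for the option file of your MySQL instance.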
Log in to MySQL as the root role, then start the slave.
```
mysql> start slave;
```
Verify that the slave is running, and wait until it is synchronized with the master.
```
mysql> show slave status\G
```
If the slave status shows that at least one slave thread is not running, fix the root cause, and retry the operation. As a last resort, you could take a backup from the current master and perform a fresh slave setup.
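As a rough aid for reading the status output, the following shell sketch parses the text of `show slave status\G` from standard input and looks at the `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master` fields. The helper names are hypothetical, and treating `Seconds_Behind_Master: 0` as "synchronized" is a simplifying assumption; confirm the replication positions yourself before proceeding.

```shell
#!/bin/sh
# Sketch: interpret the output of "show slave status\G" read from stdin.
# slave_field and check_slave_status are illustrative names (assumptions).

# Print the value that follows "<field>:" in the status text.
slave_field() {
    printf '%s\n' "$1" | sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" | head -n 1
}

check_slave_status() {
    status=$(cat)
    io=$(slave_field "$status" Slave_IO_Running)
    sql=$(slave_field "$status" Slave_SQL_Running)
    lag=$(slave_field "$status" Seconds_Behind_Master)

    # Both slave threads must report "Yes".
    if [ "$io" != "Yes" ] || [ "$sql" != "Yes" ]; then
        echo "not running: IO=$io SQL=$sql"
        return 1
    fi
    # Assumption: a lag of 0 seconds is treated as caught up.
    if [ "$lag" != "0" ]; then
        echo "running, lag=$lag"
        return 2
    fi
    echo "synchronized"
    return 0
}
```

One possible use is `mysql -u root -p -e 'show slave status\G' | check_slave_status`, repeated until the helper reports `synchronized`.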
From a node of the old primary cluster, update the protection group to change the role from a deactivated primary cluster to a secondary cluster.
```
# geopg update protection-group
```
Prepare the protection group on the original primary cluster for recovery.
1. View the status of the protection group.
```
phys-paris-1# geopg status protection-group
```
2. If the protection group is still active, stop it.
```
phys-paris-1# geopg stop -e local protection-group
```
3. If the protection group is in an Error state, validate it.
```
phys-paris-1# geopg validate protection-group
```
Ensure that the protection group is valid.

Resynchronize the protection group.
```
# geopg update protection-group
```

Start the protection group locally.
```
# geopg start -e local protection-group
```
For more information, see Resynchronizing a Protection Group in Administering the Disaster Recovery Framework for Oracle Solaris Cluster 4.4 and Activating and Deactivating a Protection Group in Administering the Disaster Recovery Framework for Oracle Solaris Cluster 4.4.
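The final resynchronize and start commands can be wrapped in a small helper that prints them by default, so the sequence can be reviewed before it is run. This is a minimal sketch: the function name `resync_and_start` and the `RUN` variable are hypothetical, and only the two `geopg` commands shown in the procedure are used.

```shell
#!/bin/sh
# Sketch: print (default) or run the last two commands of the procedure.
# resync_and_start and RUN are illustrative names (assumptions).

resync_and_start() {
    pg="$1"   # name of the protection group
    [ -n "$pg" ] || { echo "usage: resync_and_start protection-group" >&2; return 1; }

    for cmd in "geopg update $pg" "geopg start -e local $pg"; do
        if [ "${RUN:-no}" = "yes" ]; then
            $cmd || return 1      # stop at the first failing command
        else
            echo "$cmd"           # dry run: show what would be executed
        fi
    done
}
```

Calling `resync_and_start protection-group` prints the two commands in order; setting `RUN=yes` in the environment executes them on a node of the old primary cluster instead.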