Recovering After a Dual Failure of Both Active and Standby Databases

If both the active and standby databases fail at around the same time and you can reconnect to both of them almost immediately, you can recover by restarting the replication agents (and the cache agents, if applicable) as follows:

  1. Connect to the failed active database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.
  2. Verify that the replication agent for the failed active database has restarted. If it has not restarted, then start the replication agent. See Starting and Stopping the Replication Agents.
  3. Call ttRepStateSet('ACTIVE') on the newly recovered database. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.
  4. Verify that the cache agent for the failed active database has restarted. If it has not restarted, then start the cache agent.
  5. Connect to the failed standby master database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.
  6. Verify that the replication agent for the failed standby database has restarted. If it has not restarted, then start the replication agent. See Starting and Stopping the Replication Agents.
  7. Verify that the cache agent for the failed standby database has restarted. If it has not restarted, then start the cache agent.
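The steps above can be sketched as an ordered list of command lines. This is an illustration only, not a supported script: it builds the commands and does not run them. The DSN names master1 and master2 are hypothetical, and using ttIsql -e "exit;" as a connect-only invocation is an assumption about a convenient way to open and close a connection. The start commands for steps 2, 4, 6 and 7 apply only if the agent in question did not restart on its own.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    doc_step: int    # step number in the procedure above
    argv: List[str]  # command line to run; nothing is executed here
    note: str


def quick_recovery_plan(active_dsn: str, standby_dsn: str,
                        uses_cache: bool = True) -> List[Step]:
    """Build the command sequence for recovering both masters in place."""
    only_if = "run only if the agent did not restart by itself"
    plan = [
        Step(1, ["ttIsql", "-e", "exit;", active_dsn],
             "connecting triggers recovery from the transaction logs"),
        Step(2, ["ttAdmin", "-repStart", active_dsn], only_if),
        Step(3, ["ttIsql", "-e", "call ttRepStateSet('ACTIVE'); exit;", active_dsn],
             "give the recovered database the ACTIVE role"),
    ]
    if uses_cache:
        plan.append(Step(4, ["ttAdmin", "-cacheStart", active_dsn], only_if))
    plan.append(Step(5, ["ttIsql", "-e", "exit;", standby_dsn],
                     "connecting triggers recovery from the transaction logs"))
    plan.append(Step(6, ["ttAdmin", "-repStart", standby_dsn], only_if))
    if uses_cache:
        plan.append(Step(7, ["ttAdmin", "-cacheStart", standby_dsn], only_if))
    return plan


if __name__ == "__main__":
    for step in quick_recovery_plan("master1", "master2"):
        print(step.doc_step, " ".join(step.argv), "#", step.note)
```

Keeping the plan as data makes it easy to review the order before anything is run, and to skip the conditional steps after checking the agent status by hand.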

Alternatively, consider the following scenarios where both the active and standby master databases fail:

  • The standby database fails. The active database fails before the standby comes back up or before the standby has been synchronized with the active database.

  • The active database fails. The standby database becomes ACTIVE, and the rest of the recovery process begins. (See Recovering From a Failure of the Active Database.) The new active database fails before the new standby database is fully synchronized with it.

In these scenarios, the subscribers may have had more changes applied than the standby database. If so, choose one of the following recovery options:

Recover the Active Database and Duplicate a New Standby Database

You can recover an active database and then duplicate it to a new standby database.

  1. Connect to the failed active database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

    Note:

    If this fails, perform the steps listed in Restore the Active Master From a Backup.

  2. Verify that the replication agent for the failed active database has restarted. If it has not restarted, then start the replication agent. See Starting and Stopping the Replication Agents.
  3. Call ttRepStateSet('ACTIVE') on the newly recovered database. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.
  4. Verify that the cache agent for the failed active database has restarted. If it has not restarted, then start the cache agent.
  5. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See Duplicating a Database.
  6. Set up the replication agent policy on the standby database and start the replication agent. See Starting and Stopping the Replication Agents.
  7. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet built-in procedure to check the state.
  8. Start the cache agent on the standby database using the ttCacheStart built-in procedure or the ttAdmin -cacheStart utility.
  9. Duplicate all of the subscribers from the standby database. See Duplicating a Master Database to a Subscriber. Use the -noKeepCG command line option with ttRepAdmin to convert the cache group to regular TimesTen tables on the subscribers.
  10. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See Starting and Stopping the Replication Agents.
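As a rough sketch of the duplicate and wait steps above, the following helpers build a ttRepAdmin -duplicate command line and poll for the STANDBY state. The names duplicate_argv and wait_for_state are hypothetical, as are the store, host and user names in the usage lines. How the state is read (for example, by calling ttRepStateGet through a database connection) is left to a caller-supplied function, and passwords are deliberately omitted from the command line on the assumption that the utility prompts for them.

```python
import time
from typing import Callable, List, Optional


def duplicate_argv(src_store: str, src_host: str, dest_dsn: str, uid: str,
                   keep_cache_groups: bool,
                   cache_uid: Optional[str] = None) -> List[str]:
    """Build (not run) a ttRepAdmin -duplicate command for the destination host."""
    argv = ["ttRepAdmin", "-duplicate", "-from", src_store,
            "-host", src_host, "-uid", uid]
    if keep_cache_groups:
        # Standby master: preserve the cache group definitions.
        argv.append("-keepCG")
        if cache_uid is not None:
            argv += ["-cacheUid", cache_uid]
    else:
        # Subscriber: cache groups become regular TimesTen tables.
        argv.append("-noKeepCG")
    argv.append(dest_dsn)
    return argv


def wait_for_state(get_state: Callable[[], str], wanted: str = "STANDBY",
                   timeout_s: float = 300.0, poll_s: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns `wanted`; False if the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Standby is duplicated from the active master, subscribers from the standby.
    print(" ".join(duplicate_argv("master1", "host1", "master2", "ttadmin",
                                  keep_cache_groups=True, cache_uid="cacheadmin")))
    print(" ".join(duplicate_argv("master2", "host2", "sub1", "ttadmin",
                                  keep_cache_groups=False)))
```

Passing the state reader and the sleep function in as parameters keeps the polling loop independent of any particular driver and easy to exercise without a running database.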

Recover the Standby Database to Be the New Active Master

  1. Connect to the failed standby master database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

    Note:

    If this fails, perform the steps listed in Restore the Active Master From a Backup.

  2. If the replication agent for the failed standby master has automatically restarted, stop the replication agent. See Starting and Stopping the Replication Agents.
  3. If the cache agent has automatically restarted, stop the cache agent.
  4. Drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.
  5. Drop and re-create all cache groups using the DROP CACHE GROUP and CREATE CACHE GROUP statements.
  6. Re-create the replication configuration using the CREATE ACTIVE STANDBY PAIR statement.
  7. Call ttRepStateSet('ACTIVE') on the master database, giving it the ACTIVE role. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.
  8. Set up the replication agent policy and start the replication agent on the new active database. See Starting and Stopping the Replication Agents.
  9. Start the cache agent on the new active database.
  10. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See Duplicating a Database.
  11. Set up the replication agent policy on the standby database and start the replication agent on the new standby database. See Starting and Stopping the Replication Agents.
  12. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet built-in procedure to check the state.
  13. Start the cache agent for the standby database using the ttCacheStart built-in procedure or the ttAdmin -cacheStart utility.
  14. Duplicate all of the subscribers from the standby database. See Duplicating a Master Database to a Subscriber. Use the -noKeepCG command line option with ttRepAdmin to convert the cache group to regular TimesTen tables on the subscribers.
  15. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See Starting and Stopping the Replication Agents.
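The ordering of steps 4 through 7 above matters: the active standby pair is dropped before the cache groups, and re-created only after them. A minimal sketch of that ordering follows. The function name rebuild_statements is hypothetical, and the CREATE CACHE GROUP and CREATE ACTIVE STANDBY PAIR text must come from your own saved definitions; nothing here generates or validates that DDL, and it assumes the agents were already stopped in steps 2 and 3.

```python
from typing import Dict, List


def rebuild_statements(cache_group_ddl: Dict[str, str], pair_ddl: str) -> List[str]:
    """Return the SQL for steps 4-7, in order, each terminated with ';'.

    cache_group_ddl maps a cache group name to its saved CREATE CACHE GROUP
    statement; pair_ddl is the saved CREATE ACTIVE STANDBY PAIR statement.
    """
    stmts = ["DROP ACTIVE STANDBY PAIR"]            # step 4
    for name, create_ddl in cache_group_ddl.items():
        stmts.append(f"DROP CACHE GROUP {name}")    # step 5: drop ...
        stmts.append(create_ddl)                    # ... and re-create
    stmts.append(pair_ddl)                          # step 6
    stmts.append("call ttRepStateSet('ACTIVE')")    # step 7
    # Normalize terminators so the list can be written out as a script.
    return [s.strip().rstrip(";") + ";" for s in stmts]


if __name__ == "__main__":
    saved_groups = {
        "readcache": ("CREATE READONLY CACHE GROUP readcache "
                      "AUTOREFRESH INTERVAL 5 SECONDS "
                      "FROM oratt.readtab (keyval NUMBER NOT NULL PRIMARY KEY, "
                      "str VARCHAR2(32))"),
    }
    for line in rebuild_statements(saved_groups,
                                   "CREATE ACTIVE STANDBY PAIR master1, master2"):
        print(line)
```

The resulting lines could be saved to a file and reviewed before being run in an interactive SQL session on the recovered database.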

Restore the Active Master From a Backup

If both the active and standby masters fail and neither can be brought back up, you can restore the active master from a backup, if one exists.

  1. Restore the active master from a backup, as described in Backing Up and Restoring a TimesTen Classic Database With Cache Groups in the Oracle TimesTen In-Memory Database Cache Guide.
  2. Drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.
  3. Drop and re-create all AWT cache groups using the DROP CACHE GROUP and CREATE CACHE GROUP statements.
  4. Start the replication agent and the cache agent. The cache agent must be running to refresh any read-only cache groups, and both agents must be running to load the AWT cache groups.
  5. Refresh all read-only cache groups using the REFRESH CACHE GROUP statement to load the most recently committed data from the cached Oracle database tables. Use the REFRESH CACHE GROUP ... PARALLEL n clause to refresh these cache groups concurrently over multiple threads.
  6. Load all AWT cache groups using the LOAD CACHE GROUP statement to begin the autorefresh process. Use the LOAD CACHE GROUP ... PARALLEL n clause to concurrently load these cache groups over multiple threads.
  7. Stop both the replication agent and the cache agent in preparation for re-creating the active standby pair.
  8. Re-create the replication configuration using the CREATE ACTIVE STANDBY PAIR statement.
  9. Call ttRepStateSet('ACTIVE') on the active master database, giving it the ACTIVE role. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.
  10. Set up the replication agent policy and start the replication agent on the active database. See Starting and Stopping the Replication Agents.
  11. Start the cache agent on the active database.
  12. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See Duplicating a Database.
  13. Set up the replication agent policy on the standby database and start the replication agent on the new standby database. See Starting and Stopping the Replication Agents.
  14. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet built-in procedure to check the state.
  15. Start the cache agent for the standby database using the ttCacheStart built-in procedure or the ttAdmin -cacheStart utility.
  16. Duplicate all of the subscribers from the standby database. See Duplicating a Master Database to a Subscriber. Use the -noKeepCG command line option with ttRepAdmin to convert the cache group to regular TimesTen tables on the subscribers.
  17. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See Starting and Stopping the Replication Agents.
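For the refresh and load steps (5 and 6) above, the statements with the PARALLEL clause might be generated as follows. This is a sketch under stated assumptions: the helper name cache_group_statement, the cache group names, and the default commit interval of 256 rows are all illustrative, and the requirement that PARALLEL be combined with a nonzero COMMIT EVERY n ROWS value reflects our understanding of the statement syntax rather than anything stated in this section. Check the SQL reference for the permitted thread counts on your release.

```python
def cache_group_statement(verb: str, group: str,
                          commit_every: int = 256, threads: int = 1) -> str:
    """Build a LOAD or REFRESH CACHE GROUP statement, optionally parallel."""
    verb = verb.upper()
    if verb not in ("LOAD", "REFRESH"):
        raise ValueError("verb must be LOAD or REFRESH")
    if commit_every < 1 or threads < 1:
        # Assumption: PARALLEL needs a positive commit interval.
        raise ValueError("commit_every and threads must be at least 1")
    stmt = f"{verb} CACHE GROUP {group} COMMIT EVERY {commit_every} ROWS"
    if threads > 1:
        stmt += f" PARALLEL {threads}"
    return stmt + ";"


if __name__ == "__main__":
    # Step 5: read-only cache groups are refreshed.
    print(cache_group_statement("REFRESH", "readcache", threads=4))
    # Step 6: AWT cache groups are loaded.
    print(cache_group_statement("LOAD", "awtcache", threads=4))
```

Both statements require the agents started in step 4 to be running, and the agents are stopped again in step 7 before the active standby pair is re-created.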