5 Administering an Active Standby Pair without Cache Groups

This chapter describes how to administer an active standby pair that does not replicate cache groups. For information about administering active standby pairs that replicate cache groups, see Chapter 6, "Administering an Active Standby Pair with Cache Groups".

For information about managing failover and recovery automatically, see Chapter 8, "Using Oracle Clusterware to Manage Active Standby Pairs".

This chapter includes the following topics:

Setting up an active standby pair with no cache groups

To set up an active standby pair, complete the tasks in this section. See "Configuring an active standby pair with one subscriber" for an example.

Note:

If you intend to replicate read-only cache groups or asynchronous writethrough (AWT) cache groups, see Chapter 6, "Administering an Active Standby Pair with Cache Groups".

Before you create a database, see the information in these sections:

  1. Create a database. See "Managing TimesTen Databases" in Oracle TimesTen In-Memory Database Operations Guide.

  2. Create the replication scheme using the CREATE ACTIVE STANDBY PAIR statement. See Chapter 3, "Defining an Active Standby Pair Replication Scheme".

  3. Call ttRepStateSet('ACTIVE') on the active database.

  4. Start the replication agent. See "Starting and stopping the replication agents".

  5. Create a user on the active database and grant the ADMIN privilege to the user.

  6. Duplicate the active database to the standby database.

  7. Start the replication agent on the standby database. See "Starting and stopping the replication agents".

  8. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet built-in procedure to check the state of the standby database.

  9. Duplicate all of the subscribers from the standby database. See "Duplicating a master database to a subscriber".

  10. Set up the replication agent policy and start the replication agent on each of the subscriber databases. See "Starting and stopping the replication agents".

Recovering from a failure of the active database

This section includes the following topics:

Recovering when the standby database is ready

This section describes how to recover the active database when the standby database is available and synchronized with the active database. It includes the following topics:

When replication is return receipt or asynchronous

Complete the following tasks:

  1. Stop the replication agent on the failed database if it has not already been stopped.

  2. On the standby database, call ttRepStateSet('ACTIVE'). This changes the role of the database from STANDBY to ACTIVE.

  3. On the new active database, call ttRepStateSave('FAILED', 'failed_database','host_name'), where failed_database is the former active database that failed. This step is necessary for the new active database to replicate directly to the subscriber databases. During normal operation, only the standby database replicates to the subscribers.

  4. Destroy the failed database.

  5. Duplicate the new active database to the new standby database.

  6. Set up the replication agent policy and start the replication agent on the new standby database. See "Starting and stopping the replication agents".

The standby database contacts the active database. The active database stops sending updates to the subscribers. When the standby database is fully synchronized with the active database, then the standby database enters the STANDBY state and starts sending updates to the subscribers.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet built-in procedure.

When replication is return twosafe

Complete the following tasks:

  1. On the standby database, call ttRepStateSet('ACTIVE'). This changes the role of the database from STANDBY to ACTIVE.

  2. On the new active database, call ttRepStateSave('FAILED', 'failed_database','host_name'), where failed_database is the former active database that failed. This step is necessary for the new active database to replicate directly to the subscriber databases. During normal operation, only the standby database replicates to the subscribers.

  3. Connect to the failed database. This triggers recovery from the local transaction logs. If database recovery fails, you must continue from Step 5 of the procedure for recovering when replication is return receipt or asynchronous. See "When replication is return receipt or asynchronous".

  4. Verify that the replication agent for the failed database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

When the active database determines that it is fully synchronized with the standby database, then the standby database enters the STANDBY state and starts sending updates to the subscribers.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateSet built-in procedure.

Failing back to the original nodes

After a successful failover, you may want to fail back so that the active database and the standby database are on their original nodes. See "Reversing the roles of the active and standby databases" for instructions.

Recovering from a failure of the standby database

To recover from a failure of the standby database, complete the following tasks:

  1. Detect the standby database failure.

  2. If return twosafe service is enabled, the failure of the standby database may prevent a transaction in progress from being committed on the active database, resulting in error 8170, "Receipt or commit acknowledgement not returned in the specified timeout interval". If so, then call the ttRepSyncSet built-in procedure with a localAction parameter of 2 (COMMIT) and commit the transaction again. For example:

    Command> call ttRepSyncSet( null, null, 2);
    Command> commit;
    
  3. Call ttRepStateSave('FAILED','standby_database','host_name') on the active database. Afterwards, as long as the standby database is unavailable, updates to the active database are replicated directly to the subscriber databases. Subscriber databases may also be duplicated directly from the active.

  4. If the replication agent for the standby database has automatically restarted, stop the replication agent. See "Starting and stopping the replication agents".

  5. Recover the standby database in one of the following ways:

    • Connect to the standby database. This triggers recovery from the local transaction logs.

    • Duplicate the standby database from the active database.

    The amount of time that the standby database has been down and the amount of transaction logs that need to be applied from the active database determine the method of recovery that you should use.

  6. Set up the replication agent policy and start the replication agent on the new standby database. See "Starting and stopping the replication agents".

The standby database enters the STANDBY state and starts sending updates to the subscribers after the active database determines that the two master databases have been synchronized and stops sending updates to the subscribers.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet built-in procedure.

Recovering after a dual failure of both active and standby databases

Consider the following scenarios:

  • The standby database fails. Then, the active database fails before the standby comes back up or before the standby has been synchronized with the active database.

  • The active database fails. The standby database becomes ACTIVE, and the rest of the recovery process begins. (See "Recovering from a failure of the active database".) The new active database fails before the new standby database is fully synchronized with it.

In both scenarios, the subscribers may have had more changes applied than the standby database.

When the active database fails and the standby database has not applied all of the changes that were last sent from the active database, there are two choices for recovery:

  • Recover the active database from the local transaction logs.

  • Recover the standby database from the local transaction logs.

The choice depends on which database is available and which is more up to date.

Recover the active database

  1. Connect to the failed active database. This triggers recovery from the local transaction logs.

  2. Verify that the replication agent for the failed active database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

  3. Call ttRepStateSet('ACTIVE') on the newly recovered database.

  4. Continue with Step 6 in "Setting up an active standby pair with no cache groups".

Recover the standby database

  1. Connect to the failed standby database. This triggers recovery from the local transaction logs.

  2. If the replication agent for the standby database has automatically restarted, you must stop the replication agent. See "Starting and stopping the replication agents".

  3. Drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.

  4. Re-create the replication configuration using the CREATE ACTIVE STANDBY PAIR statement.

  5. Call ttRepStateSet('ACTIVE') on the master database, giving it the ACTIVE role.

  6. Set up the replication agent policy and start the replication agent on the new standby database. See "Starting and stopping the replication agents".

  7. Continue from Step 6 in "Setting up an active standby pair with no cache groups".

Recovering from the failure of a subscriber database

If a subscriber database fails, then you can recover it by one of the following methods:

  • Connect to the failed subscriber. This triggers recovery from the local transaction logs. Start the replication agent and let the subscriber catch up.

  • Duplicate the subscriber from the standby database.

If the standby database is down or in recovery, then duplicate the subscriber from the active database.

After the subscriber database has been recovered, then set up the replication agent policy and start the replication agent. See "Starting and stopping the replication agents".

Reversing the roles of the active and standby databases

To change the role of the active database to standby and vice versa:

  1. Pause any applications that are generating updates on the current active database.

  2. Call ttRepSubscriberWait on the active database, with the DSN and host of the current standby database as input parameters. It must return success (<00>). This ensures that all updates have been transmitted to the current standby database.

  3. Stop the replication agent on the current active database. See "Starting and stopping the replication agents".

  4. Call ttRepDeactivate on the current active database. This puts the database in the IDLE state.

  5. Call ttRepStateSet('ACTIVE') on the current standby database. This database now acts as the active database in the active standby pair.

  6. Set up the replication agent policy and start the replication agent on the old active database.

  7. Use the ttRepStateGet built-in procedure to determine when the database's state has changed from IDLE to STANDBY. The database now acts as the standby database in the active standby pair.

  8. Resume any applications that were paused in Step 1.

Detection of dual active databases

Ordinarily, the designation of the active and standby databases in an active standby pair is explicitly controlled by the user. However, in some circumstances the user may not have the ability to modify both the active and standby databases when changing the role of the standby database to active.

For example, if network communication to the site of an active database is interrupted, the user may need the standby database at a different site to take over the role of the active, but cannot stop replication on the current active or change its role manually. Changing the standby database to active without first stopping replication on the active leads to a situation where both masters are in the ACTIVE state and accepting transactions. In such a scenario, TimesTen can automatically negotiate the active/standby role of the master databases when network communication between the databases is restored.

If, during the initial handshake between the databases, TimesTen determines that the master databases in an active standby pair replication scheme are both in the ACTIVE state, TimesTen performs the following operations automatically:

  • The database which was set to the ACTIVE state most recently is left in the ACTIVE state and may continue to be connected to and updated by applications.

  • The database which was set to the ACTIVE state least recently is invalidated. All applications are disconnected.

  • When the invalidated database comes back up, TimesTen determines whether any transactions have occurred on the database that have not yet been replicated to the other master database. If such transactions have occurred, they are now trapped, and the database is left in the IDLE state. The database needs to be duplicated from the active in order to become a standby. If there are no trapped transactions, the database is safe to use as a standby database and is automatically set to the STANDBY state.