6 Administering an Active Standby Pair with Cache Groups

You can replicate tables within either a read-only cache group or an asynchronous writethrough (AWT) cache group as long as they are configured within an active standby pair.

Note:

For information about managing failover and recovery automatically, see Chapter 8, "Using Oracle Clusterware to Manage Active Standby Pairs".

The following sections describe how to administer an active standby pair that replicates cache groups.

Active standby pairs with cache groups

An active standby pair that replicates a read-only cache group or an asynchronous writethrough (AWT) cache group can change the role of the cache group automatically as part of failover and recovery. This helps ensure high availability of cache instances with minimal data loss. See "Replicating an AWT cache group" and "Replicating a read-only cache group".

Note:

TimesTen does not support replication of a user managed cache group or a synchronous writethrough (SWT) cache group in an active standby pair.

You can also create a special disaster recovery read-only subscriber when you set up active standby replication of an AWT cache group. This special subscriber, located at a remote disaster recovery site, can propagate updates to a second Oracle database, also located at the disaster recovery site. See "Using a disaster recovery subscriber in an active standby pair".

Setting up an active standby pair with a read-only cache group

This section describes how to set up an active standby pair that replicates cache tables in a read-only cache group. The active standby pair used as an example in this section is not a cache grid member.

Before you create a database, see the information in these sections:

To set up an active standby pair that replicates a local read-only cache group, complete the following tasks:

  1. Create a cache administration user in the Oracle database. See "Create users in the Oracle database" in Oracle TimesTen Application-Tier Database Cache User's Guide.

  2. Create a database. See "Create a DSN for the TimesTen database" in Oracle TimesTen Application-Tier Database Cache User's Guide.

  3. Set the cache administration user ID and password by calling the ttCacheUidPwdSet built-in procedure. See "Set the cache administration user name and password in the TimesTen database" in Oracle TimesTen Application-Tier Database Cache User's Guide. For example:

    Command> call ttCacheUidPwdSet('orauser','orapwd');
    
  4. Start the cache agent on the database. Use the ttCacheStart built-in procedure or the ttAdmin -cacheStart utility.

    Command> call ttCacheStart;
    
  5. Use the CREATE CACHE GROUP statement to create the read-only cache group. For example:

    Command> CREATE READONLY CACHE GROUP readcache
           > AUTOREFRESH INTERVAL 5 SECONDS
           > FROM oratt.readtab
           > (keyval NUMBER NOT NULL PRIMARY KEY, str VARCHAR2(32));
    
  6. Ensure that the autorefresh state is set to PAUSED. The autorefresh state is PAUSED by default after cache group creation. You can verify the autorefresh state by executing the ttIsql cachegroups command:

    Command> cachegroups;
    
  7. Create the replication scheme using the CREATE ACTIVE STANDBY PAIR statement.

    For example, suppose master1 and master2 are defined as the master databases, and sub1 and sub2 are defined as the subscriber databases. The databases reside on node1, node2, node3, and node4, respectively. The return service is RETURN RECEIPT. The replication scheme can be specified as follows:

    Command> CREATE ACTIVE STANDBY PAIR master1 ON "node1", master2 ON "node2"
           > RETURN RECEIPT
           > SUBSCRIBER sub1 ON "node3", sub2 ON "node4"
           > STORE master1 ON "node1" PORT 21000 TIMEOUT 30
           > STORE master2 ON "node2" PORT 20000 TIMEOUT 30;
    
  8. Set the replication state to ACTIVE by calling the ttRepStateSet built-in procedure on the active database (master1). For example:

    Command> call ttRepStateSet('ACTIVE');
    
  9. Set up the replication agent policy for master1 and start the replication agent. See "Starting and stopping the replication agents".

  10. Load the cache group by using the LOAD CACHE GROUP statement. This starts the autorefresh process. For example:

    Command> LOAD CACHE GROUP readcache COMMIT EVERY 256 ROWS;
    
  11. As the instance administrator, duplicate the active database (master1) to the standby database (master2). Use the ttRepAdmin -duplicate utility with the -keepCG option to preserve the cache group. Alternatively, you can use the ttRepDuplicateEx C function to duplicate the database. See "Duplicating a database". ttRepAdmin prompts for the values of -uid, -pwd, -cacheuid and -cachepwd.

    ttRepAdmin -duplicate -from master1 -host node1 -keepCG 
     -connStr "DSN=master2;UID=;PWD="
    
  12. Set up the replication agent policy on master2 and start the replication agent. See "Starting and stopping the replication agents".

  13. The standby database enters the STANDBY state automatically. Wait for master2 to enter the STANDBY state. Call the ttRepStateGet built-in procedure to check the state of master2. For example:

    Command> call ttRepStateGet;
    
  14. Start the cache agent for master2 using the ttCacheStart built-in procedure or the ttAdmin -cacheStart utility. For example:

    Command> call ttCacheStart;
    
  15. As the instance administrator, duplicate the subscribers (sub1 and sub2) from the standby database (master2). Use the -noKeepCG command line option with ttRepAdmin -duplicate to convert the cache tables to normal TimesTen tables on the subscribers. ttRepAdmin prompts for the values of -uid and -pwd. See "Duplicating a database". For example:

    ttRepAdmin -duplicate -from master2 -host node2 -noKeepCG
     -connStr "DSN=sub1;UID=;PWD="
    
  16. Set up the replication agent policy on the subscribers and start the replication agent on each of the subscriber databases. See "Starting and stopping the replication agents".
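
Steps 9, 12, and 16 set the replication agent policy and start the agent without showing the calls. The following ttIsql session is a minimal sketch of those steps, entered while connected to the database in question. The policy value manual is an assumption chosen for illustration; the procedure above does not prescribe a policy.

```sql
Command> -- Assumption: 'manual' is illustrative; 'always' and 'norestart' are the other policy values.
Command> call ttRepPolicySet('manual');
Command> -- Start the replication agent for the connected database.
Command> call ttRepStart;
Command> -- On master2 (step 13), repeat this call until it reports STANDBY.
Command> call ttRepStateGet;
```

The ttAdmin -repPolicy and ttAdmin -repStart utility options are the command-line counterparts of these two built-in procedures.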

Setting up an active standby pair with an AWT cache group

For detailed instructions for setting up an active standby pair with a global AWT cache group, see "Replicating cache tables" in Oracle TimesTen Application-Tier Database Cache User's Guide. The active standby pair in that section is a cache grid member.

Changing user names or passwords used by replication

In the active standby pair, you can modify either the TimesTen user name or password or (if there are cache groups in the active standby pair) the user names and passwords for the TimesTen cache manager user, its companion Oracle user, or the cache administration user.

When the DDLReplicationLevel connection attribute is 2 or larger, changes to the user names or passwords executed on the active master are automatically replicated to the standby master and any subscribers. When the DDLReplicationLevel connection attribute is 1, changes to the user names or passwords executed on the active master are not automatically replicated to the standby master and any subscribers. In this case, you must manually execute each SQL statement on the active master, standby master, and any subscribers.

Note:

For more information on what DDL statements are automatically replicated for the different values of the DDLReplicationLevel connection attribute, see "Making DDL changes in an active standby pair".

Perform the following to change any of the user names or passwords for the TimesTen user or, if there are cache groups in the active standby pair, for the TimesTen cache manager user, its companion Oracle user, or the cache administration user:

  1. If you want to modify a password of a TimesTen user, use the ALTER USER statement on the active master database. If you want to change the TimesTen user name, you must first drop all objects that the TimesTen user owns before dropping the user name and creating a new user.

    To modify the password of the oratt user:

    Command> ALTER USER oratt IDENTIFIED BY newpwd;

    Note:

    See "Creating or identifying users to the database" in Oracle TimesTen In-Memory Database Operations Guide.
    
  2. If you want to modify any of the user names or passwords used for cache operations (such as the cache administration user, the cache manager user or its companion Oracle user), perform the instructions provided in "Changing cache user names or passwords" in the Oracle TimesTen Application-Tier Database Cache User's Guide.

Recovering from a failure of the active database

The following sections describe how to recover the active standby pair when the active master has failed and the standby master either did not fail or has already recovered from a failure. You recover by making the standby master the new active master. Afterward, you can swap the active and standby roles again so that the databases run on their original nodes.

Note:

If both the active and standby masters fail, see "Recovering after a dual failure of both active and standby databases" for instructions on how to recover.

Recovering when the standby database is ready

The first two sections describe how to recover the active database when the standby database is available and synchronized with the active database. The last section describes what to do if following the instructions from either of the first two sections fails; the standby database is available, but the data is not fully synchronized.

When replication is return receipt or asynchronous

Complete the following tasks:

  1. On the standby database, stop the replication agent if it has not already been stopped.

  2. On the standby database, call ttRepStateSet('ACTIVE'). This changes the role of the database from STANDBY to ACTIVE. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.

  3. On the new active database, call ttRepStateSave('FAILED', 'failed_database','host_name'), where failed_database is the former active database that failed. This step is necessary for the new active database to replicate directly to the subscriber databases. During normal operation, only the standby database replicates to the subscribers.

  4. On the new active database, start the replication agent and the cache agent.

  5. Destroy the failed database (the old active) with the ttDestroy utility.

  6. Duplicate the new active database to the new standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG -recoveringNode options with ttRepAdmin to recover and to preserve the cache group after the active master failure. See "Duplicating a database".

  7. Set up the replication agent policy on the new standby database and start the replication agent. See "Starting and stopping the replication agents".

  8. Start the cache agent on the new standby database.

Note:

If any of these steps failed, follow the directions in "When there is unsynchronized data in the cache groups".

The standby database contacts the active database. The active database stops sending updates to the subscribers. When the standby database is fully synchronized with the active database, the standby database enters the STANDBY state and starts sending updates to the subscribers. The new standby database, including one that replicates an AWT cache group, takes over processing of the cache group automatically when it enters the STANDBY state.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet built-in procedure.
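
Steps 1 through 4 of the preceding procedure can be sketched as the following ttIsql session on the surviving standby database. The sketch assumes the example names used earlier in this chapter: master2 is the surviving standby and master1 on node1 is the failed active database. A stop or start call may report an error if the agent is already in the requested state.

```sql
Command> -- Connected to master2. Step 1: stop the replication agent if it is still running.
Command> call ttRepStop;
Command> -- Step 2: change the role of this database from STANDBY to ACTIVE.
Command> call ttRepStateSet('ACTIVE');
Command> -- Step 3: record master1 on node1 as failed so that this database
Command> -- replicates directly to the subscribers.
Command> call ttRepStateSave('FAILED', 'master1', 'node1');
Command> -- Step 4: start the replication agent and the cache agent.
Command> call ttRepStart;
Command> call ttCacheStart;
```

Steps 5 through 8 (ttDestroy, ttRepAdmin -duplicate, and starting the agents on the new standby) are then carried out for the failed database and are not shown here.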

When replication is return twosafe

Complete the following tasks:

  1. Stop the replication agent on the standby database if it has not already been stopped.

  2. On the standby database, call ttRepStateSet('ACTIVE'). This changes the role of the database from STANDBY to ACTIVE. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.

  3. On the new active database, call ttRepStateSave('FAILED', 'failed_database','host_name'), where failed_database is the former active database that failed. This step is necessary for the new active database to replicate directly to the subscriber databases. During normal operation, only the standby database replicates to the subscribers.

  4. On the new active database, start the replication agent and the cache agent.

  5. Connect to the failed database. This triggers recovery from the local transaction logs. If database recovery fails, you must continue from Step 5 of the procedure for recovering when replication is return receipt or asynchronous. See "When replication is return receipt or asynchronous". If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

  6. Verify that the replication agent for the failed database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

  7. Verify that the cache agent for the failed database has restarted. If it has not restarted, then start the cache agent.

Note:

If any of these steps failed, follow the directions in "When there is unsynchronized data in the cache groups".

When the active database determines that it is fully synchronized with the standby database, the standby database enters the STANDBY state and starts sending updates to the subscribers. The new standby database, including one that replicates an AWT cache group, takes over processing of the cache group automatically when it enters the STANDBY state.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet built-in procedure.

When there is unsynchronized data in the cache groups

If the steps in either "When replication is return receipt or asynchronous" or "When replication is return twosafe" fail, then there could be unsynchronized data in the AWT cache groups that has not been propagated to the Oracle database. In addition, there could be unsynchronized data on the Oracle database that has not been uploaded to any read-only cache groups that are included in the active standby pair replication scheme.

If there is data in any AWT cache groups on the standby master that has not been propagated when the active database failed, then simply recovering the standby database as the new active database is not an option. In this case, perform the following:

  1. On the standby database, stop the replication agent and drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.

  2. Stop the cache agent to ensure that no more updates are applied to the AWT cache groups while performing this recovery operation and to ensure that you control when any read-only cache groups that were included in the replication scheme are refreshed.

  3. For any read-only cache groups that are included in the replication scheme, set the autorefresh state to PAUSED with the ALTER CACHE GROUP ... SET AUTOREFRESH STATE PAUSED statement.

  4. On the standby database, flush any unpropagated committed inserts or updates on TimesTen cache tables for any AWT cache groups to the cached Oracle Database tables, as follows:

    1. Set autocommit to off.

    2. Call the ttCacheAllowFlushAwtSet built-in procedure with the parameter set to 1. This built-in procedure allows you to execute a FLUSH CACHE GROUP statement against an AWT cache group and should only be used in this recovery scenario.

      Command> call ttCacheAllowFlushAwtSet(1);
      
    3. Execute the FLUSH CACHE GROUP SQL statement against each AWT cache group to ensure that all data is propagated to the Oracle database.

      Note:

      Executing the FLUSH CACHE GROUP statement under these conditions on the AWT cache group only flushes the contents of the tables in the AWT cache group; that is, the data that was either inserted or updated. It does not take into account any delete operations. So, you may have rows that exist on the Oracle database that were deleted from the AWT cache group. It is up to the user to recover any delete operations.

    4. Call the ttCacheAllowFlushAwtSet built-in procedure with the parameter set to 0 to disallow any future execution of the FLUSH CACHE GROUP statement on an AWT cache group.

      Command> call ttCacheAllowFlushAwtSet(0);
      
    5. Commit after calling the ttCacheAllowFlushAwtSet built-in procedure with the parameter set to 0. You can also choose to reset autocommit to on, as it only needed to be off for the ttCacheAllowFlushAwtSet built-in procedure.

  5. Drop and re-create all AWT cache groups using the DROP CACHE GROUP and CREATE CACHE GROUP statements.

  6. Start the replication agent and the cache agent, since the cache agent needs to be active to refresh any read-only cache groups and both must be active in order to load the AWT cache groups.

  7. Refresh all read-only cache groups using the REFRESH CACHE GROUP statement to load the most current committed data from the cached Oracle database tables. Use the REFRESH CACHE GROUP ... PARALLEL n clause to refresh these cache groups concurrently over multiple threads.

  8. Load all AWT cache groups using the LOAD CACHE GROUP statement to repopulate them from the cached Oracle database tables. Use the LOAD CACHE GROUP ... PARALLEL n clause to load these cache groups concurrently over multiple threads.

  9. Stop both the replication agent and the cache agent in preparation to re-create the active standby pair.

  10. Re-create the replication configuration on the standby database using the CREATE ACTIVE STANDBY PAIR statement.

  11. Set the old standby database as the new active database, destroy the failed old active database, perform a duplicate of the active to create a new standby database, and start the cache and replication agents on the standby as described in the steps listed in "When replication is return receipt or asynchronous".
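
The ttIsql portions of this procedure can be sketched as follows. This is an illustration under stated assumptions, not a complete recovery script: readcache is the read-only cache group created earlier in this chapter, awtcache is a hypothetical AWT cache group name, and the commit interval (256) and thread count (2) are arbitrary example values.

```sql
Command> -- Steps 1 and 2, on the standby database: stop replication, drop the scheme, stop the cache agent.
Command> call ttRepStop;
Command> DROP ACTIVE STANDBY PAIR;
Command> call ttCacheStop;
Command> -- Step 3: pause autorefresh for each read-only cache group.
Command> ALTER CACHE GROUP readcache SET AUTOREFRESH STATE PAUSED;
Command> -- Step 4: with autocommit off, flush each AWT cache group (awtcache is hypothetical).
Command> autocommit 0;
Command> call ttCacheAllowFlushAwtSet(1);
Command> FLUSH CACHE GROUP awtcache;
Command> call ttCacheAllowFlushAwtSet(0);
Command> commit;
Command> autocommit 1;
Command> -- Steps 7 and 8, after the AWT cache groups are re-created and both agents are running.
Command> REFRESH CACHE GROUP readcache COMMIT EVERY 256 ROWS PARALLEL 2;
Command> LOAD CACHE GROUP awtcache COMMIT EVERY 256 ROWS PARALLEL 2;
```

Repeat the ALTER, FLUSH, REFRESH, and LOAD statements for each cache group of the corresponding type. Because the flush does not propagate delete operations, rows deleted from the AWT cache group still have to be reconciled on the Oracle database separately.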

Failing back to the original nodes

After a successful failover, you may want to fail back so that the active database and the standby database are on their original nodes. See "Reversing the roles of the active and standby databases" for instructions.

Recovering from a failure of the standby database

To recover from a failure of the standby database, complete the following tasks:

  1. If return twosafe service is enabled, the failure of the standby database may prevent a transaction in progress from being committed on the active database, resulting in error 8170, "Receipt or commit acknowledgement not returned in the specified timeout interval". If so, then call the ttRepSyncSet procedure with a localAction parameter of 2 (COMMIT) and commit the transaction again. For example:

    call ttRepSyncSet( null, null, 2);
    commit;
    
  2. Call ttRepStateSave('FAILED','standby_database','host_name') on the active database. Then, as long as the standby database is unavailable, updates to the active database are replicated directly to the subscriber databases. Additional subscriber databases may also be duplicated directly from the active.

  3. Recover the standby database in one of the following ways:

    1. Connect to the standby database. This triggers recovery from the local transaction logs. If the standby database recovers, go to Step 4; otherwise, continue with the next substep.

    2. Destroy the current version of the standby database with the ttDestroy utility.

    3. Duplicate a new standby database from the active database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG -recoveringNode options with ttRepAdmin to recover and to preserve the cache group after the standby master failure. See "Duplicating a database".

  4. Set up the replication agent policy and start the replication agent on the standby database. See "Starting and stopping the replication agents".

  5. Start the cache agent on the standby database.

After the active database determines that the two master databases are synchronized, it stops sending updates to the subscribers. The standby database then enters the STANDBY state and starts sending updates to the subscribers.

Note:

You can verify that the standby database has entered the STANDBY state by using the ttRepStateGet procedure.
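
As a brief sketch of the ttIsql calls in this procedure, assume master1 is the active database and master2 on node2 is the failed standby (the example names from earlier in this chapter). The policy value manual is an illustrative assumption.

```sql
Command> -- On master1. Step 2: mark master2 on node2 as failed so that master1
Command> -- replicates directly to the subscribers.
Command> call ttRepStateSave('FAILED', 'master2', 'node2');
Command> -- On master2, after it has been recovered or duplicated. Steps 4 and 5:
Command> call ttRepPolicySet('manual');
Command> call ttRepStart;
Command> call ttCacheStart;
Command> -- Repeat until the reported state is STANDBY.
Command> call ttRepStateGet;
```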

Recovering after a dual failure of both active and standby databases

If both the active and standby databases fail at around the same time and you can reconnect to both of them almost immediately, then recover them and restart the replication agents (and cache agents, if applicable) as follows:

  1. Connect to the failed active database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

  2. Verify that the replication agent for the failed active database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

  3. Call ttRepStateSet('ACTIVE') on the newly recovered database. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.

  4. Verify that the cache agent for the failed database has restarted. If it has not restarted, then start the cache agent.

  5. Connect to the failed standby master database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

  6. Verify that the replication agent for the failed standby database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

  7. Verify that the cache agent for the failed standby database has restarted. If it has not restarted, then start the cache agent.

Alternatively, consider the following scenarios where both the active and standby master databases fail:

  • The standby database fails. The active database fails before the standby comes back up or before the standby has been synchronized with the active database.

  • The active database fails. The standby database becomes ACTIVE, and the rest of the recovery process begins. (See "Recovering from a failure of the active database".) The new active database fails before the new standby database is fully synchronized with it.

In these scenarios, the subscribers may have had more changes applied than the standby database.

In this case, you could potentially perform one of the following options:

Recover the active database and duplicate a new standby database

  1. Connect to the failed active database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

    Note:

    If this fails, perform the steps listed in "Restore the active master from a backup".

  2. Verify that the replication agent for the failed active database has restarted. If it has not restarted, then start the replication agent. See "Starting and stopping the replication agents".

  3. Call ttRepStateSet('ACTIVE') on the newly recovered database. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.

  4. Verify that the cache agent for the failed database has restarted. If it has not restarted, then start the cache agent.

  5. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See "Duplicating a database".

  6. Set up the replication agent policy on the standby database and start the replication agent. See "Starting and stopping the replication agents".

  7. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet procedure to check the state.

  8. Start the cache agent for the standby database using the ttCacheStart procedure or the ttAdmin -cacheStart utility.

  9. Duplicate all of the subscribers from the standby database. See "Duplicating a master database to a subscriber". Use the -noKeepCG command line option with ttRepAdmin in order to convert the cache group to regular TimesTen tables on the subscribers.

  10. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See "Starting and stopping the replication agents".

Recover the standby database to be the new active master

  1. Connect to the failed standby master database. This triggers recovery from the local transaction logs. If you are replicating a read-only cache group, the autorefresh state is automatically set to PAUSED.

    Note:

    If this fails, perform the steps listed in "Restore the active master from a backup".

  2. If the replication agent for the failed standby master has automatically restarted, stop the replication agent. See "Starting and stopping the replication agents".

  3. If the cache agent has automatically restarted, stop the cache agent.

  4. Drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.

  5. Drop and re-create all cache groups using the DROP CACHE GROUP and CREATE CACHE GROUP statements.

  6. Re-create the replication configuration using the CREATE ACTIVE STANDBY PAIR statement.

  7. Call ttRepStateSet('ACTIVE') on the master database, giving it the ACTIVE role. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.

  8. Set up the replication agent policy and start the replication agent on the new active database. See "Starting and stopping the replication agents".

  9. Start the cache agent on the new active database.

  10. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See "Duplicating a database".

  11. Set up the replication agent policy on the standby database and start the replication agent on the new standby database. See "Starting and stopping the replication agents".

  12. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet procedure to check the state.

  13. Start the cache agent for the standby database using the ttCacheStart procedure or the ttAdmin -cacheStart utility.

  14. Duplicate all of the subscribers from the standby database. See "Duplicating a master database to a subscriber". Use the -noKeepCG command line option with ttRepAdmin in order to convert the cache group to regular TimesTen tables on the subscribers.

  15. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See "Starting and stopping the replication agents".

Restore the active master from a backup

If both the active and standby masters fail and neither comes up and you have a backup, then perform the following:

  1. Restore the active master from a backup, as described in "Backing up and restoring a database with cache groups" in the Oracle TimesTen Application-Tier Database Cache User's Guide.

  2. Drop the replication configuration using the DROP ACTIVE STANDBY PAIR statement.

  3. Drop and re-create all AWT cache groups using the DROP CACHE GROUP and CREATE CACHE GROUP statements.

  4. Start the replication agent and the cache agent, since the cache agent needs to be active to refresh any read-only cache groups and both must be active in order to load the AWT cache groups.

  5. Refresh all read-only cache groups using the REFRESH CACHE GROUP statement to load the most current committed data from the cached Oracle database tables. Use the REFRESH CACHE GROUP ... PARALLEL n clause to refresh these cache groups concurrently over multiple threads.

  6. Load all AWT cache groups using the LOAD CACHE GROUP statement to repopulate them from the cached Oracle database tables. Use the LOAD CACHE GROUP ... PARALLEL n clause to load these cache groups concurrently over multiple threads.

  7. Stop both the replication agent and the cache agent in preparation to re-create the active standby pair.

  8. Re-create the replication configuration using the CREATE ACTIVE STANDBY PAIR statement.

  9. Call ttRepStateSet('ACTIVE') on the active master database, giving it the ACTIVE role. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from PAUSED to ON for this database.

  10. Set up the replication agent policy and start the replication agent on the active database. See "Starting and stopping the replication agents".

  11. Start the cache agent on the active database.

  12. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See "Duplicating a database".

  13. Set up the replication agent policy on the standby database and start the replication agent on the new standby database. See "Starting and stopping the replication agents".

  14. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet procedure to check the state.

  15. Start the cache agent for the standby database using the ttCacheStart procedure or the ttAdmin -cacheStart utility.

  16. Duplicate all of the subscribers from the standby database. See "Duplicating a master database to a subscriber". Use the -noKeepCG command line option with ttRepAdmin in order to convert the cache group to regular TimesTen tables on the subscribers.

  17. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See "Starting and stopping the replication agents".

Recovering from the failure of a subscriber database

If a subscriber database fails, then you can recover it by one of the following methods:

  • Connect to the failed subscriber. This triggers recovery from the local transaction logs. Start the replication agent and let the subscriber catch up.

  • Duplicate the subscriber from the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -noKeepCG command line option with ttRepAdmin in order to convert the cache group to normal TimesTen tables on the subscriber.

If the standby database is down or in recovery, then duplicate the subscriber from the active database.

After the subscriber database has been recovered, then set up the replication agent policy and start the replication agent. See "Starting and stopping the replication agents".

Reversing the roles of the active and standby databases

To change the role of the active database to standby and vice versa:

  1. Pause any applications that are generating updates on the current active database.

  2. Call ttRepSubscriberWait on the active database, with the DSN and host of the current standby database as input parameters. It must return success (<00>). This ensures that all updates have been transmitted to the current standby database.

  3. Stop the replication agent on the current active database. See "Starting and stopping the replication agents".

  4. If global cache groups are not present, stop the cache agent on the current active database. When global cache groups are present, set the autorefresh state to PAUSED.

  5. Call ttRepDeactivate on the current active database. This puts the database in the IDLE state. If you are replicating a read-only cache group, this action automatically causes the autorefresh state to change from ON to PAUSED for this database.

  6. Call ttRepStateSet('ACTIVE') on the current standby database. This database now acts as the active database in the active standby pair. If you are replicating a read-only cache group, this automatically causes the autorefresh state to change from PAUSED to ON for this database.

  7. Start the replication agent on the former master database.

  8. Configure the replication agent policy as needed and start the replication agent on the former active database. Use the ttRepStateGet procedure to determine when the database's state has changed from IDLE to STANDBY. The database now acts as the standby database in the active standby pair.

  9. Start the cache agent on the former active database if it is not already running.

  10. Resume any applications that were paused in Step 1.
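
The order of operations in the steps above can be sketched as a shell function that prints the commands without running them. The function name reverse_roles_plan, the DSNs mast1 and mast2, the host name host2, and the 120-second timeout are hypothetical. The sketch also assumes that no global cache groups are present, that both databases can be reached from the system where the commands are issued, and that built-in procedures are invoked through the ttIsql -e option.

```shell
#!/bin/sh
# Sketch only: prints the role-reversal commands in order; nothing is run.
# DSNs, host name, timeout, and the function name are hypothetical.
# Assumes no global cache groups, so the cache agent is stopped in Step 4.

# Usage: reverse_roles_plan ACTIVE_DSN STANDBY_DSN STANDBY_HOST [TIMEOUT_SECS]
reverse_roles_plan() {
    rrp_act=$1; rrp_sby=$2; rrp_host=$3; rrp_wait=${4:-120}
    cat <<EOF
# Step 2: confirm the standby has received every update (expect <00>)
ttIsql -connStr "DSN=$rrp_act" -e "call ttRepSubscriberWait(null, null, '$rrp_sby', '$rrp_host', $rrp_wait); exit;"
# Steps 3-4: stop the replication agent and cache agent on the active
ttAdmin -repStop $rrp_act
ttAdmin -cacheStop $rrp_act
# Step 5: put the current active database in the IDLE state
ttIsql -connStr "DSN=$rrp_act" -e "call ttRepDeactivate; exit;"
# Step 6: promote the current standby
ttIsql -connStr "DSN=$rrp_sby" -e "call ttRepStateSet('ACTIVE'); exit;"
# Step 8: restart replication on the former active, then check its state
ttAdmin -repStart $rrp_act
ttIsql -connStr "DSN=$rrp_act" -e "call ttRepStateGet; exit;"
# Step 9: restart the cache agent on the former active
ttAdmin -cacheStart $rrp_act
EOF
}

reverse_roles_plan mast1 mast2 host2
```

Printing the plan first makes it easy to review the sequence before any agent is stopped. Pausing and resuming applications (Steps 1 and 10) is outside the scope of the sketch.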

Detecting dual active databases

See "Detection of dual active databases". There is no difference for active standby pairs that replicate cache groups.

Using a disaster recovery subscriber in an active standby pair

TimesTen active standby pair replication provides high availability by allowing for fast switching between databases within a data center. This includes the ability to automatically change which database propagates changes to an Oracle database using AWT cache groups. However, for additional high availability across data centers, you may require the ability to recover from a failure of an entire site, which can include a failure of both TimesTen master databases in the active standby pair as well as the Oracle database used for the cache groups.

You can recover from a complete site failure by creating a special disaster recovery read-only subscriber as part of the active standby pair replication scheme. The standby database sends updates to cache group tables on the read-only subscriber. This special subscriber is located at a remote disaster recovery site and can propagate updates to a second Oracle database, also located at the disaster recovery site. The disaster recovery subscriber can take over as the active in a new active standby pair at the disaster recovery site if the primary site suffers a complete failure. Any applications may then connect to the disaster recovery site and continue operating, with minimal interruption of service.

Requirements for using a disaster recovery subscriber with an active standby pair

To use a disaster recovery subscriber, you must:

  • Use an active standby pair configuration with AWT cache groups at the primary site. The active standby pair can also include read-only cache groups in the replication scheme. The read-only cache groups are converted to regular tables on the disaster recovery subscriber. The AWT cache group tables remain AWT cache group tables on the disaster recovery subscriber.

  • Have a continuous WAN connection from the primary site to the disaster recovery site. This connection should have at least enough bandwidth to guarantee that the normal volume of transactions can be replicated to the disaster recovery subscriber at a reasonable pace.

  • Configure an Oracle database at the disaster recovery site to include tables with the same schema as the database at the primary site. Note that this database is intended only for capturing the replicated updates from the primary site, and if any data exists in tables written to by the cache groups when the disaster recovery subscriber is created, that data is deleted.

  • Have the same cache group administrator user ID and password at both the primary and the disaster recovery site.

Though it is not absolutely required, you should have a second TimesTen database configured at the disaster recovery site. This database can take on the role of a standby database, in the event that the disaster recovery subscriber is promoted to an active database after the primary site fails.

Rolling out a disaster recovery subscriber

To create a disaster recovery subscriber, follow these steps:

  1. Create an active standby pair with AWT cache groups at the primary site. The active standby pair can also include read-only cache groups. The read-only cache groups are converted to regular tables when the disaster recovery subscriber is rolled out.

  2. Create the disaster recovery subscriber at the disaster recovery site using the ttRepAdmin utility with the -duplicate and -initCacheDR options. You must also specify the cache group administrator and password for the Oracle database at the disaster recovery site using the -cacheUid and -cachePwd options.

    If your database includes multiple cache groups, you may improve the efficiency of the duplicate operation by using the -nThreads option to specify the number of threads that are spawned to flush the cache groups in parallel. Each thread flushes an entire cache group to the Oracle database and then moves on to the next cache group, if any remain to be flushed. If a value is not specified for -nThreads, only one flushing thread is spawned.

    The following example duplicates the standby database mast2, on the system with the host name primary, to the disaster recovery subscriber drsub, using two cache group flushing threads. The cache user ID is system and its password is manager. Because no credentials are given on the command line, ttRepAdmin prompts for the values of -uid, -pwd, -cacheUid and -cachePwd.

    ttRepAdmin -duplicate -from mast2 -host primary -initCacheDR -nThreads 2 \
     -connStr "DSN=drsub;UID=;PWD=;"
    

    If you use the ttRepDuplicateEx function in C, you must set the TT_REPDUP_INITCACHEDR flag in ttRepDuplicateExArg.flags and may optionally specify a value for ttRepDuplicateExArg.nThreads4InitDR:

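    /* Requires <string.h> for memset and the TimesTen utility library header. */
    int                 rc;
    ttUtilHandle        utilHandle;  /* must be allocated before this call */
    ttRepDuplicateExArg arg;

    /* Zero the structure, then set only the fields that are needed. */
    memset( &arg, 0, sizeof( arg ) );
    arg.size = sizeof( ttRepDuplicateExArg );
    arg.flags = TT_REPDUP_INITCACHEDR;  /* initialize the DR subscriber */
    arg.nThreads4InitDR = 2;            /* two cache group flushing threads */
    arg.uid = "ttuser";
    arg.pwd = "ttuser";
    arg.cacheuid = "system";
    arg.cachepwd = "manager";
    arg.localHost = "disaster";
    rc = ttRepDuplicateEx( utilHandle, "DSN=drsub",
                           "mast2", "primary", &arg );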
    

    After the subscriber is duplicated, TimesTen automatically configures the replication scheme that propagates updates from the AWT cache groups to the Oracle database, truncates the tables in the Oracle database that correspond to the cache groups in TimesTen, and then flushes all of the data in the cache groups to the Oracle database.

  3. If you want to set the failure threshold for the disaster recovery subscriber, call the ttCacheAWTThresholdSet built-in procedure and specify the number of transaction log files that can accumulate before the disaster recovery subscriber is considered either dead or too far behind to catch up.

    If one or both master databases had a failure threshold configured before the disaster recovery subscriber was created, then the disaster recovery subscriber inherits the failure threshold value when it is created with the ttRepAdmin -duplicate -initCacheDR command. If the master databases have different failure thresholds, then the higher value is used for the disaster recovery subscriber.

    For more information about the failure threshold, see "Setting the transaction log failure threshold".

  4. Start the replication agent for the disaster recovery subscriber using the ttRepStart procedure or the ttAdmin utility with the -repstart option. For example:

    ttAdmin -repstart drsub
    

    Updates are now replicated from the standby database to the disaster recovery subscriber, which then propagates the updates to the Oracle database at the disaster recovery site.

Switching over to the disaster recovery site

When the primary site has failed, you can switch over to the disaster recovery site in one of two ways. If your goal is to minimize risk of data loss at the disaster recovery site, you may roll out a new active standby pair using the disaster recovery subscriber as the active database. If the goal is to absolutely minimize the downtime of your applications, at the risk of data loss if the disaster recovery database later fails, you may instead choose to drop the replication scheme from the disaster recovery subscriber and use it as a single non-replicating database. You may deploy an active standby pair at the disaster recovery site later.

Creating a new active standby pair after switching to the disaster recovery site

  1. Any read-only applications may be redirected to the disaster recovery subscriber immediately. Redirecting applications that make updates to the database must wait until Step 9.

  2. Ensure that all of the recent updates to the cache groups have been propagated to the Oracle database using the ttRepSubscriberWait procedure or the ttRepAdmin command with the -wait option.

    ttRepSubscriberWait( null, null, '_ORACLE', null, 600 );
    

    It must return success (<00>). If ttRepSubscriberWait returns 0x01, indicating a timeout, investigate to determine why the cache groups are not finished propagating before continuing to Step 3.

  3. Stop the replication agent on the disaster recovery subscriber using the ttRepStop procedure or the ttAdmin command with the -repstop option. For example, to stop the replication agent for the subscriber drsub, use:

    call ttRepStop;
    
  4. Drop the active standby pair replication scheme on the subscriber using the DROP ACTIVE STANDBY PAIR statement. For example:

    DROP ACTIVE STANDBY PAIR;
    
  5. If there are tables on the disaster recovery subscriber that were converted from read-only cache group tables on the active database, drop the tables on the disaster recovery subscriber.

  6. Create the read-only cache groups on the disaster recovery subscriber. Ensure that the autorefresh state is set to PAUSED.

  7. Create a new active standby pair replication scheme using the CREATE ACTIVE STANDBY PAIR statement, specifying the disaster recovery subscriber as the active database. For example, to create a new active standby pair with the former subscriber drsub as the active and the new database drstandby as the standby, and using the return twosafe return service, use:

    CREATE ACTIVE STANDBY PAIR drsub, drstandby RETURN TWOSAFE;
    
  8. Set the new active database to the ACTIVE state using the ttRepStateSet procedure. For example, on the database drsub, call:

    call ttRepStateSet( 'ACTIVE' );
    
  9. Any applications which must write to the TimesTen database may now be redirected to the new active database.

  10. If you are replicating a read-only cache group, load the cache group using the LOAD CACHE GROUP statement to begin the autorefresh process. You may also load the cache group if you are replicating an AWT cache group, although it is not required.

  11. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See "Duplicating a database".

  12. Set up the replication agent policy on the standby database and start the replication agent. See "Starting and stopping the replication agents".

  13. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet procedure to check the state.

  14. Start the cache agent for the standby database using the ttCacheStart procedure or the ttAdmin -cacheStart utility.

  15. Duplicate all of the subscribers from the standby database. See "Duplicating a master database to a subscriber". Use the -noKeepCG command line option with ttRepAdmin in order to convert the cache group to regular TimesTen tables on the subscribers.

  16. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See "Starting and stopping the replication agents".
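
Step 2 of the procedure above requires a check of the ttRepSubscriberWait result before the replication agent is stopped. The following is a minimal sketch of such a check. The function name wait_status is hypothetical, and the sketch assumes that the result appears in the captured text as <00> or <01>, with or without spaces inside the angle brackets.

```shell
#!/bin/sh
# Sketch only: classify the text printed for ttRepSubscriberWait.
# Assumes the result row looks like "<00>" or "< 00 >"; the function name
# is hypothetical.

# Usage: wait_status "TEXT"
# Prints "ok" for 00, "timeout" for 01, and "unknown" otherwise. Returns 0
# only for "ok", so a calling script can stop before Step 3.
wait_status() {
    # Remove whitespace so "< 00 >" and "<00>" are treated the same way.
    ws_text=$(printf '%s' "$1" | tr -d ' \t\n')
    case $ws_text in
        *'<01>'*) echo timeout; return 1 ;;
        *'<00>'*) echo ok ;;
        *)        echo unknown; return 2 ;;
    esac
}

wait_status "< 00 >"   # prints "ok"
```

A script could capture the output of the ttRepSubscriberWait call, pass it to wait_status, and continue to Step 3 only when the function returns 0. The timeout case is tested first so that ambiguous text is never reported as success.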

Switching over to a single database

  1. Any read-only applications may be redirected to the disaster recovery subscriber immediately. Redirecting applications that make updates to the database must wait until Step 7.

  2. Stop the replication agent on the disaster recovery subscriber using the ttRepStop procedure or the ttAdmin command with the -repstop option. For example, to stop the replication agent for the subscriber drsub, use:

    call ttRepStop;
    
  3. Drop the active standby pair replication scheme on the subscriber using the DROP ACTIVE STANDBY PAIR statement. For example:

    DROP ACTIVE STANDBY PAIR;
    
  4. If there are tables on the disaster recovery subscriber that were converted from read-only cache group tables on the active database, drop the tables on the disaster recovery subscriber.

  5. Create the read-only cache groups on the disaster recovery subscriber.

  6. Although there is no longer an active standby pair configured, AWT cache groups require the replication agent to be started. Start the replication agent on the database using the ttRepStart procedure or the ttAdmin command with the -repstart option. For example, to start the replication agent for the database drsub, use:

    call ttRepStart;
    
  7. Any applications that must write to a TimesTen database may now be redirected to this database.

    Note:

    You may choose to roll out an active standby pair at the disaster recovery site at a later time. You may do this by following the steps in "Creating a new active standby pair after switching to the disaster recovery site", starting at Step 2 and skipping Step 4.
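
The command sequence for switching over to a single database can be sketched as follows. The function name single_db_plan and the DSN drsub are hypothetical, the commands are printed and not run, and the use of the ttIsql -e option to issue the SQL statement is an assumption. Steps 4 and 5 depend on the schema, so they appear only as reminders.

```shell
#!/bin/sh
# Sketch only: prints the commands for Steps 2, 3, and 6; nothing is run.
# The DSN and function name are hypothetical. Steps 4 and 5 (dropping the
# converted tables and creating the read-only cache groups) depend on the
# schema, so they appear only as reminders.
single_db_plan() {
    sdb_dsn=$1
    cat <<EOF
ttAdmin -repStop $sdb_dsn
ttIsql -connStr "DSN=$sdb_dsn" -e "DROP ACTIVE STANDBY PAIR; exit;"
# manual: drop tables converted from read-only cache groups (Step 4)
# manual: create the read-only cache groups (Step 5)
ttAdmin -repStart $sdb_dsn
EOF
}

single_db_plan drsub
```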

Returning to the original configuration at the primary site

When the primary site is usable again, you may want to move the working active standby pair from the disaster recovery site back to the primary site. You can do this with a minimal interruption of service by reversing the process that was used to create and switch over to the original disaster recovery site. Follow these steps:

  1. Destroy the original active database at the primary site, if necessary, using the ttDestroy utility. For example, to destroy a database called mast1, use:

    ttDestroy mast1
    
  2. Create a disaster recovery subscriber at the primary site, following the steps detailed in "Rolling out a disaster recovery subscriber". Use the original active database for the new disaster recovery subscriber.

  3. Switch over to the new disaster recovery subscriber at the primary site, as detailed in "Switching over to the disaster recovery site". Roll out the standby database as well.

  4. Roll out a new disaster recovery subscriber at the disaster recovery site, as detailed in "Rolling out a disaster recovery subscriber".