Using a Disaster Recovery Subscriber in an Active Standby Pair

TimesTen active standby pair replication provides high availability by allowing for fast switching between databases within a data center.

This includes the ability to automatically change which database propagates changes to an Oracle database using AWT cache groups. However, for additional high availability across data centers, you may require the ability to recover from a failure of an entire site, which can include a failure of both TimesTen master databases in the active standby pair as well as the Oracle database used for the cache groups.

You can recover from a complete site failure by creating a special disaster recovery read-only subscriber as part of the active standby pair replication scheme. The standby database sends updates to cache group tables on the read-only subscriber. This special subscriber is located at a remote disaster recovery site and can propagate updates to a second Oracle database, also located at the disaster recovery site. The disaster recovery subscriber can take over as the active in a new active standby pair at the disaster recovery site if the primary site suffers a complete failure. Any applications may then connect to the disaster recovery site and continue operating, with minimal interruption of service.

Requirements for Using a Disaster Recovery Subscriber With an Active Standby Pair

To use a disaster recovery subscriber, you must:

  • Use an active standby pair configuration with AWT cache groups at the primary site. The active standby pair can also include read-only cache groups in the replication scheme. The read-only cache groups are converted to regular tables on the disaster recovery subscriber. The AWT cache group tables remain AWT cache group tables on the disaster recovery subscriber.

  • Have a continuous WAN connection from the primary site to the disaster recovery site. This connection should have at least enough bandwidth to guarantee that the normal volume of transactions can be replicated to the disaster recovery subscriber at a reasonable pace.

  • Configure an Oracle database at the disaster recovery site to include tables with the same schema as the database at the primary site. Note that this database is intended only for capturing the replicated updates from the primary site, and if any data exists in tables written to by the cache groups when the disaster recovery subscriber is created, that data is deleted.

  • Have the same cache group administrator user ID and password at both the primary and the disaster recovery site.

Though it is not absolutely required, you should have a second TimesTen database configured at the disaster recovery site. This database can take on the role of a standby database, in the event that the disaster recovery subscriber is promoted to an active database after the primary site fails.

Rolling Out a Disaster Recovery Subscriber

To create a disaster recovery subscriber, follow these steps:

  1. Create an active standby pair with AWT cache groups at the primary site. The active standby pair can also include read-only cache groups. The read-only cache groups are converted to regular tables when the disaster recovery subscriber is rolled out.

  2. Create the disaster recovery subscriber at the disaster recovery site using the ttRepAdmin utility with the -duplicate and -initCacheDR options. You must also specify the cache group administrator and password for the Oracle database at the disaster recovery site using the -cacheUid and -cachePwd options.

    If your database includes multiple cache groups, you may improve the efficiency of the duplicate operation by using the -nThreads option to specify the number of threads that are spawned to flush the cache groups in parallel. Each thread flushes an entire cache group to the Oracle database and then moves on to the next cache group, if any remain to be flushed. If a value is not specified for -nThreads, only one flushing thread is spawned.

    For example, the following command duplicates the standby database mast2, on the system with the host name primary, to the disaster recovery subscriber drsub, using two cache group flushing threads. In this example the cache user ID is system and its password is manager. Because the command line does not supply them, ttRepAdmin prompts for the values of -uid, -pwd, -cacheUid and -cachePwd.

    ttRepAdmin -duplicate -from mast2 -host primary -initCacheDR -nThreads 2 \
     -connStr "DSN=drsub;UID=;PWD=;"

    If you use the ttRepDuplicateEx function in C, you must set the TT_REPDUP_INITCACHEDR flag in ttRepDuplicateExArg.flags and may optionally specify a value for ttRepDuplicateExArg.nThreads4InitDR:

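    int                 rc;
    ttUtilHandle        utilHandle;  /* must be allocated before use, for example with ttUtilAllocEnv */
    ttRepDuplicateExArg arg;

    /* Zero the argument structure, then record its size. */
    memset( &arg, 0, sizeof( arg ) );
    arg.size = sizeof( ttRepDuplicateExArg );
    /* Initialize the disaster recovery cache groups, using two flushing threads. */
    arg.flags = TT_REPDUP_INITCACHEDR;
    arg.nThreads4InitDR = 2;
    /* TimesTen user name and password for the duplicate operation. */
    arg.uid = "ttuser";
    arg.pwd = "ttuser";
    /* Cache group administrator and password for the Oracle database. */
    arg.cacheuid = "system";
    arg.cachepwd = "manager";
    /* Host name of the local (disaster recovery) system. */
    arg.localHost = "disaster";
    /* Duplicate mast2 on the host primary into the local database drsub. */
    rc = ttRepDuplicateEx( utilHandle, "DSN=drsub",
                           "mast2", "primary", &arg );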

    After the subscriber is duplicated, TimesTen automatically configures the replication scheme that propagates updates from the AWT cache groups to the Oracle database, truncates the tables in the Oracle database that correspond to the cache groups in TimesTen, and then flushes all of the data in the cache groups to the Oracle database.

  3. If you want to set the failure threshold for the disaster recovery subscriber, call the ttCacheAWTThresholdSet built-in procedure and specify the number of transaction log files that can accumulate before the disaster recovery subscriber is considered either dead or too far behind to catch up.

    If one or both master databases had a failure threshold configured before the disaster recovery subscriber was created, then the disaster recovery subscriber inherits the failure threshold value when it is created with the ttRepAdmin -duplicate -initCacheDR command. If the master databases have different failure thresholds, then the higher value is used for the disaster recovery subscriber.

    See Setting the Transaction Log Failure Threshold.

  4. Start the replication agent for the disaster recovery subscriber using the ttRepStart built-in procedure or the ttAdmin utility with the -repstart option. For example:

    ttAdmin -repstart drsub

    Updates are now replicated from the standby database to the disaster recovery subscriber, which then propagates the updates to the Oracle database at the disaster recovery site.

    See Starting and Stopping the Replication Agents.
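
Steps 2 through 4 above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not a supported procedure: the run wrapper, the rollout_dr_subscriber function, and the DRY_RUN variable are hypothetical helpers introduced here, the threshold value is illustrative, and calling a built-in procedure through ttIsql -e is an assumption that this section does not itself show. With DRY_RUN=1 (the default) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper covering steps 2 to 4 of the rollout.
# Arguments: source database, source host, subscriber DSN,
#            flushing threads, optional failure threshold (log files).
rollout_dr_subscriber() {
    src_db="$1"; src_host="$2"; dr_dsn="$3"; threads="$4"
    threshold="${5:-}"

    # Step 2: duplicate the standby and initialize the DR cache groups.
    # Credentials are not passed, so ttRepAdmin prompts for them.
    run ttRepAdmin -duplicate -from "$src_db" -host "$src_host" \
        -initCacheDR -nThreads "$threads" \
        -connStr "DSN=$dr_dsn" || return 1

    # Step 3 (optional): set the failure threshold.
    # Assumption: the built-in procedure is invoked through ttIsql -e.
    if [ -n "$threshold" ]; then
        run ttIsql -e "call ttCacheAWTThresholdSet($threshold); quit;" \
            -connStr "DSN=$dr_dsn" || return 1
    fi

    # Step 4: start the replication agent on the subscriber.
    run ttAdmin -repstart "$dr_dsn"
}

# Example from this section: mast2 on host primary, two flushing threads,
# no explicit threshold (any inherited value is kept).
rollout_dr_subscriber mast2 primary drsub 2 ""
```

Keeping the default in dry-run mode lets you review the exact command lines before anything touches the Oracle database at the disaster recovery site, which matters because the duplicate operation truncates the tables that correspond to the cache groups.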

Switching Over to the Disaster Recovery Site

When the primary site has failed, you can switch over to the disaster recovery site.

There are two ways to switch over to the disaster recovery site:

  • Creating a New Active Standby Pair After Switching to the Disaster Recovery Site: If your goal is to minimize risk of data loss at the disaster recovery site, you may roll out a new active standby pair using the disaster recovery subscriber as the active database.

  • Switching Over to a Single Database: If the goal is to absolutely minimize the downtime of your applications, at the risk of data loss if the disaster recovery database later fails, you may instead choose to drop the replication scheme from the disaster recovery subscriber and use it as a single non-replicating database. You may deploy an active standby pair at the disaster recovery site later.

Creating a New Active Standby Pair After Switching to the Disaster Recovery Site

  1. Any read-only applications may be redirected to the disaster recovery subscriber immediately. Applications that update the database must not be redirected until Step 9.
  2. Ensure that all of the recent updates to the cache groups have been propagated to the Oracle database using the ttRepSubscriberWait built-in procedure or the ttRepAdmin command with the -wait option.
    Command> call ttRepSubscriberWait( null, null, '_ORACLE', null, 600 );

    The call must return success (<00>). If ttRepSubscriberWait returns 0x01, indicating a timeout, investigate why the cache groups have not finished propagating before continuing to Step 3.

  3. Stop the replication agent on the disaster recovery subscriber using the ttRepStop built-in procedure or the ttAdmin command with the -repstop option. For example, to stop the replication agent for the subscriber drsub, use:
    call ttRepStop;
  4. Drop the active standby pair replication scheme on the subscriber using the DROP ACTIVE STANDBY PAIR statement. For example:
    DROP ACTIVE STANDBY PAIR;
  5. If there are tables on the disaster recovery subscriber that were converted from read-only cache group tables on the active database, drop the tables on the disaster recovery subscriber.
  6. Create the read-only cache groups on the disaster recovery subscriber. Ensure that the autorefresh state is set to PAUSED.
  7. Create a new active standby pair replication scheme using the CREATE ACTIVE STANDBY PAIR statement, specifying the disaster recovery subscriber as the active database. For example, to create a new active standby pair with the former subscriber drsub as the active and the new database drstandby as the standby, and using the return twosafe return service, use:
    CREATE ACTIVE STANDBY PAIR drsub, drstandby RETURN TWOSAFE;
  8. Set the new active database to the ACTIVE state using the ttRepStateSet built-in procedure. For example, on the database drsub, call:
    call ttRepStateSet( 'ACTIVE' );
  9. Any applications which must write to the TimesTen database may now be redirected to the new active database.
  10. If you are replicating a read-only cache group, load the cache group using the LOAD CACHE GROUP statement to begin the autorefresh process. You may also load the cache group if you are replicating an AWT cache group, although it is not required.
  11. Duplicate the active database to the standby database. You can use either the ttRepAdmin -duplicate utility or the ttRepDuplicateEx C function to duplicate a database. Use the -keepCG command line option with ttRepAdmin to preserve the cache group. See Duplicating a Database.
  12. Set up the replication agent policy on the standby database and start the replication agent. See Starting and Stopping the Replication Agents.
  13. Wait for the standby database to enter the STANDBY state. Use the ttRepStateGet built-in procedure to check the state.
  14. Start the cache agent for the standby database using the ttCacheStart built-in procedure or the ttAdmin -cacheStart utility.
  15. Duplicate all of the subscribers from the standby database. See Duplicating a Master Database to a Subscriber. Use the -noKeepCG command line option with ttRepAdmin in order to convert the cache group to regular TimesTen tables on the subscribers.
  16. Set up the replication agent policy on the subscribers and start the agent on each of the subscriber databases. See Starting and Stopping the Replication Agents.
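
The commands that the steps above run on the disaster recovery subscriber can be sketched as a shell script. This is a minimal sketch, not a supported procedure: run, tt_sql, detach_dr_subscriber, create_dr_pair, query_state, and wait_for_state are hypothetical helpers, and invoking statements through ttIsql -e is an assumption. The sketch does not parse the result of ttRepSubscriberWait, so Step 2 must still be checked by hand, and wait_for_state assumes that the ttIsql output of ttRepStateGet contains the state name. Steps 5 and 6 depend on your schema and are not shown.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Hypothetical helper: run one statement ($2) in ttIsql against a DSN ($1).
tt_sql() {
    run ttIsql -e "$2; quit;" -connStr "DSN=$1"
}

# Step 2: wait up to 600 seconds for AWT updates to reach Oracle.
# Confirm by hand that the call reports success (<00>) before continuing.
wait_for_oracle() {
    tt_sql "$1" "call ttRepSubscriberWait(null, null, '_ORACLE', null, 600)"
}

# Steps 3 and 4: stop the replication agent and drop the old scheme.
detach_dr_subscriber() {
    tt_sql "$1" "call ttRepStop" &&
    tt_sql "$1" "DROP ACTIVE STANDBY PAIR"
}

# Steps 7 and 8: create the new pair and set this database to ACTIVE.
# Run only after steps 5 and 6 have been completed by hand.
create_dr_pair() {
    tt_sql "$1" "CREATE ACTIVE STANDBY PAIR $1, $2 RETURN TWOSAFE" &&
    tt_sql "$1" "call ttRepStateSet('ACTIVE')"
}

# Step 13: report the replication state of a database.
query_state() {
    ttIsql -e "call ttRepStateGet; quit;" -connStr "DSN=$1"
}

# Step 13: poll until the reported state contains the wanted name.
# Arguments: DSN, state name, attempts (default 30), delay seconds (default 10).
wait_for_state() {
    dsn="$1"; want="$2"; tries="${3:-30}"; delay="${4:-10}"
    while [ "$tries" -gt 0 ]; do
        case "$(query_state "$dsn")" in
            *"$want"*) return 0 ;;
        esac
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Dry-run walk-through using the names from this section.
wait_for_oracle drsub
detach_dr_subscriber drsub
create_dr_pair drsub drstandby
```

After the standby has been duplicated and its replication agent started (Steps 11 and 12), a call such as wait_for_state drstandby STANDBY returns success once the database reports the STANDBY state, or failure when the attempts run out.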

Switching Over to a Single Database

  1. Any read-only applications may be redirected to the disaster recovery subscriber immediately. Applications that update the database must not be redirected until Step 7.
  2. Stop the replication agent on the disaster recovery subscriber using the ttRepStop built-in procedure or the ttAdmin command with the -repstop option. For example, to stop the replication agent for the subscriber drsub, use:
    call ttRepStop;
  3. Drop the active standby pair replication scheme on the subscriber using the DROP ACTIVE STANDBY PAIR statement. For example:
    DROP ACTIVE STANDBY PAIR;
  4. If there are tables on the disaster recovery subscriber that were converted from read-only cache group tables on the active database, drop the tables on the disaster recovery subscriber.
  5. Create the read-only cache groups on the disaster recovery subscriber.
  6. Although there is no longer an active standby pair configured, AWT cache groups require the replication agent to be started. Start the replication agent on the database using the ttRepStart built-in procedure or the ttAdmin command with the -repstart option. For example, to start the replication agent for the database drsub, use:
    call ttRepStart;
  7. Any applications which must write to a TimesTen database may now be redirected to this database.

    Note:

    You may choose to roll out an active standby pair at the disaster recovery site at a later time. You may do this by following the steps in Creating a New Active Standby Pair After Switching to the Disaster Recovery Site, starting at Step 2 and skipping Step 4.
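
The single-database switchover is shorter, and its command-line portion can be sketched as follows. This is a minimal sketch under assumptions: run, drop_dr_scheme, and restart_agent_for_awt are hypothetical helpers, and issuing the DROP statement through ttIsql -e is an assumption. Steps 4 and 5 depend on your schema and are left to be done by hand between the two calls.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Steps 2 and 3: stop the replication agent, then drop the scheme.
drop_dr_scheme() {
    run ttAdmin -repstop "$1" &&
    run ttIsql -e "DROP ACTIVE STANDBY PAIR; quit;" -connStr "DSN=$1"
}

# Step 6: restart the replication agent, which the AWT cache groups
# still require even though no replication scheme remains.
restart_agent_for_awt() {
    run ttAdmin -repstart "$1"
}

drop_dr_scheme drsub
# Steps 4 and 5 (read-only cache groups) are performed by hand here.
restart_agent_for_awt drsub
```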

Returning to the Original Configuration at the Primary Site

When the primary site is usable again, you may want to move the working active standby pair from the disaster recovery site back to the primary site.

You can do this with a minimal interruption of service by reversing the process that was used to create and switch over to the original disaster recovery site. Follow these steps:

  1. Destroy the original active database at the primary site, if necessary, using the ttDestroy utility. For example, to destroy a database called mast1, use:

    ttDestroy mast1
    
  2. Create a disaster recovery subscriber at the primary site, following the steps detailed in Rolling Out a Disaster Recovery Subscriber. Use the original active database for the new disaster recovery subscriber.

  3. Switch over to the new disaster recovery subscriber at the primary site, as detailed in Switching Over to the Disaster Recovery Site. Roll out the standby database as well.

  4. Roll out a new disaster recovery subscriber at the disaster recovery site, as detailed in Rolling Out a Disaster Recovery Subscriber.
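
Steps 1 and 2 of the return procedure can be sketched as a shell script run at the primary site. This is a minimal sketch under assumptions: run and rebuild_primary_as_dr_subscriber are hypothetical helpers, and the host name drhost for the disaster recovery site is a placeholder, not a name from this section. The database drstandby is the standby created at the disaster recovery site during switchover.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (default) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Steps 1 and 2: destroy the old active database, re-create it as a
# disaster recovery subscriber of the pair now running at the disaster
# recovery site, and start its replication agent.
# Warning: -initCacheDR truncates the Oracle tables at the primary site
# that correspond to the cache groups.
# Arguments: old active DSN, standby at the DR site, host of that standby.
rebuild_primary_as_dr_subscriber() {
    old_active="$1"; dr_standby="$2"; dr_host="$3"
    run ttDestroy "$old_active" &&
    run ttRepAdmin -duplicate -from "$dr_standby" -host "$dr_host" \
        -initCacheDR -connStr "DSN=$old_active" &&
    run ttAdmin -repstart "$old_active"
}

# drhost is a placeholder host name for the disaster recovery site.
rebuild_primary_as_dr_subscriber mast1 drstandby drhost
```

Once the new subscriber at the primary site has caught up, Steps 3 and 4 reuse the switchover and rollout procedures described earlier, with the roles of the two sites exchanged.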