|Oracle® TimesTen In-Memory Database TimesTen to TimesTen Replication Guide
Part Number E13072-06
This chapter provides an overview of TimesTen replication.
Replication is the process of maintaining copies of data in multiple data stores. The purpose of replication is to make data highly available to applications with minimal performance impact. TimesTen recommends the active standby pair configuration for highest availability. In an active standby pair replication scheme, the data is copied from the active data store to the standby data store before being copied to read-only subscribers.
In addition to providing recovery from failures, replication schemes can also distribute application workloads across multiple databases for maximum performance and facilitate online upgrades and maintenance.
Replication is the process of copying data from a master data store to a subscriber data store. Replication is controlled by replication agents for each data store. The replication agent on the master data store reads the records from the transaction log for the master data store. It forwards changes to replicated elements to the replication agent on the subscriber data store. The replication agent on the subscriber data store then applies the updates to its data store. If the subscriber replication agent is not running when the updates are forwarded by the master, the master retains the updates in its transaction log until they can be applied at the subscriber data store.
An entity that is replicated between data stores is called a replication element. TimesTen supports data stores, cache groups, tables and sequences as replication elements. TimesTen also replicates XLA bookmarks. An active standby pair is the only supported replication scheme for data stores with cache groups.
The master and subscriber data stores must reside on machines that have the same operating system, CPU type, and word size. Although you can replicate between data stores that reside on the same machine, replication is generally used for copying updates into a data store that resides on another machine. This helps prevent data loss from node failure.
The data stores must have DSNs with identical DatabaseCharacterSet and TypeMode data store attributes.
Note: If replication is configured between a data store from the current release of TimesTen and a data store from a TimesTen release previous to 7.0, then there are additional restrictions for replication compatibility. A data store may only replicate to a TimesTen release previous to 7.0 if it is configured with a DatabaseCharacterSet attribute of TIMESTEN8, and it may only replicate tables with columns that use the original TimesTen data types (data types with the prefix TT_ or the data types BINARY_FLOAT and BINARY_DOUBLE). See "Types supported for backward compatibility in Oracle type mode" in Oracle TimesTen In-Memory Database SQL Reference for more information.
Each data store in a replication scheme is identified by:
A data store name derived from the file system's path name for the data store
A host name
The replication agents communicate through TCP/IP stream sockets. The replication agents obtain the TCP/IP address, host name, and other configuration information from the replication tables described in Oracle TimesTen In-Memory Database System Tables and Limits Reference.
Updates are copied between data stores asynchronously by default. Asynchronous replication provides the best performance, but it does not provide the application with confirmation that the replicated updates have been committed on the subscriber data stores. For applications that need higher levels of confidence that the replicated data is consistent between the master and subscriber data stores, you can enable either return receipt or return twosafe service.
The return receipt service loosely synchronizes the application with the replication mechanism by blocking the application until replication confirms that the update has been received by the subscriber. The return twosafe service provides a fully synchronous option by blocking the application until replication confirms that the update has been both received and committed on the subscriber.
Return receipt replication has less performance impact than return twosafe at the expense of less synchronization. The operational details for asynchronous, return receipt, and return twosafe replication are discussed in these sections:
When using default TimesTen replication, an application updates a master data store and continues working without waiting for the updates to be received and applied by the subscribers. The master and subscriber data stores have internal mechanisms to confirm that the updates have been successfully received and committed by the subscriber. These mechanisms ensure that updates are applied at a subscriber only once, but they are completely independent of the application.
Default TimesTen replication provides maximum performance, but the application is completely decoupled from the receipt process of the replicated elements on the subscriber.
Figure 1-1 Basic asynchronous replication cycle
The default TimesTen replication cycle is:
1. The application commits a local transaction to the master data store and is free to continue with other transactions.
2. During the commit, the TimesTen Data Manager writes the transaction update records to the transaction log buffer.
3. The replication agent on the master data store directs the Data Manager to flush a batch of update records for the committed transactions from the log buffer to a transaction log file. This step ensures that, if the master fails and you need to recover the data store from the checkpoint and transaction log files, the recovered master contains all the data it replicated to the subscriber.
4. The master replication agent forwards the batch of transaction update records to the subscriber replication agent, which applies them to the subscriber data store. Update records are flushed to disk and forwarded to the subscriber in batches of 256K or less, depending on the master data store's transaction load. A batch is created when there is no more log data in the transaction log buffer or when the current batch is roughly 256K bytes.
5. The subscriber replication agent sends an acknowledgement back to the master replication agent that the batch of update records was received. The acknowledgement includes information on which batch of records the subscriber last flushed to disk. The master replication agent is now free to purge from the transaction log the update records that have been received, applied, and flushed to disk by all subscribers and to forward another batch of update records, while the subscriber replication agent asynchronously continues on to Step 6.
6. The replication agent at the subscriber updates the data store and directs its Data Manager to write the transaction update records to the transaction log buffer.
7. The replication agent at the subscriber data store uses a separate thread to direct the Data Manager to flush the update records to a transaction log file.
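The asynchronous cycle above is the default behavior for any replication scheme declared without a return service. As a rough sketch (the scheme, data store, and host names here are hypothetical, not taken from this guide), a whole-data-store asynchronous scheme looks like this:

```sql
-- Hypothetical names: masterds and subds are data stores on host1 and host2.
-- With no return service clause, replication is asynchronous by default.
CREATE REPLICATION repowner.basicscheme
  ELEMENT e DATASTORE
    MASTER masterds ON "host1"
    SUBSCRIBER subds ON "host2";
```

The same scheme definition is executed on each participating data store before the replication agents are started.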
The return receipt service provides a level of synchronization between the master and a subscriber data store by blocking the application after commit on the master until the updates of the committed transaction have been received by the subscriber.
An application requesting return receipt updates the master data store in the same manner as in the basic asynchronous case. However, when the application commits a transaction that updates a replicated element, the master data store blocks the application until it receives confirmation that the updates for the completed transaction have been received by the subscriber.
Return receipt replication trades some performance in order to provide applications with the ability to ensure higher levels of data integrity and consistency between the master and subscriber data stores. In the event of a master failure, the application has a high degree of confidence that a transaction committed at the master persists in the subscribing data store.
Figure 1-2 Return receipt replication
Figure 1-2 shows that the return receipt replication cycle is the same as shown for the basic asynchronous cycle in Figure 1-1, only the master replication agent blocks the application thread after it commits a transaction (Step 1) and retains control of the thread until the subscriber acknowledges receipt of the update batch (Step 5). Upon receiving the return receipt acknowledgement from the subscriber, the master replication agent returns control of the thread to the application (Step 6), freeing it to continue executing transactions.
If the subscriber is unable to acknowledge receipt of the transaction within a configurable timeout period (default is 10 seconds), the master replication agent returns a warning stating that it did not receive acknowledgement of the update from the subscriber and returns control of the thread to the application. The application is then free to commit another transaction to the master, which continues replication to the subscriber as before. Return receipt transactions may time out for many reasons. The most likely causes are a network failure, a failed replication agent, or a master replication agent that is so far behind with respect to the transaction load that it cannot replicate the return receipt transaction before the timeout expires. For information on how to manage return receipt timeouts, see "Managing return service timeout errors and replication state changes".
See "RETURN RECEIPT" for information on how to configure replication for return receipt.
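As a sketch of what such a configuration can look like (all names hypothetical), the RETURN RECEIPT clause is attached to the subscriber in the scheme definition:

```sql
-- Hypothetical names. RETURN RECEIPT blocks the committing application
-- until the subscriber acknowledges receipt of the transaction.
CREATE REPLICATION repowner.receiptscheme
  ELEMENT e DATASTORE
    MASTER masterds ON "host1"
    SUBSCRIBER subds ON "host2" RETURN RECEIPT;
```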
The return twosafe service provides fully synchronous replication between the master and subscriber. Unlike the previously described replication modes, where transactions are transmitted to the subscriber after being committed on the master, transactions in twosafe mode are first committed on the subscriber before they are committed on the master.
Note: The return twosafe service can be used only in a bidirectional replication scheme where there is a single master and subscriber and the replication element is the entire data store.
Figure 1-3 Return twosafe replication
The following describes the replication behavior between a master and subscriber configured for return twosafe replication:
1. The application commits the transaction on the master data store.
2. The master replication agent writes the transaction records to the log and inserts a special precommit log record before the commit record. This precommit record acts as a placeholder in the log until the master replication agent receives an acknowledgement that indicates the status of the commit on the subscriber.

Note: Transmission of return twosafe transactions is nondurable, so the master replication agent does not flush the log records to disk before sending them to the subscriber, as it does by default when replication is configured for asynchronous or return receipt replication.

3. The master replication agent transmits the batch of update records to the subscriber.
4. The subscriber replication agent commits the transaction on the subscriber data store.
5. The subscriber replication agent returns an acknowledgement to the master replication agent with notification of whether the transaction was committed on the subscriber and whether the commit was successful.
6. If the commit on the subscriber was successful, the master replication agent commits the transaction on the master data store.
7. The master replication agent returns control to the application.
If the subscriber is unable to acknowledge commit of the transaction within a configurable timeout period (default is 10 seconds), or if the acknowledgement from the subscriber indicates the commit was unsuccessful, the replication agent returns control to the application without committing the transaction on the master data store. The application can then decide whether to unconditionally commit or retry the commit. You can optionally configure your replication scheme to direct the master replication agent to commit all transactions that time out.
See "RETURN TWOSAFE" for information on how to configure replication for return twosafe.
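As a hedged sketch (all data store, host, and scheme names hypothetical), a return twosafe configuration attaches RETURN TWOSAFE to the subscriber of each element in the bidirectional scheme; STORE clauses can adjust the timeout and the action the master takes when a transaction times out:

```sql
-- Hypothetical names. RETURN TWOSAFE commits on the subscriber first.
-- RETURN WAIT TIME changes the default 10-second timeout, and
-- LOCAL COMMIT ACTION COMMIT directs the master to commit transactions
-- that time out.
CREATE REPLICATION repowner.twosafescheme
  ELEMENT e1 DATASTORE
    MASTER dsA ON "host1"
    SUBSCRIBER dsB ON "host2" RETURN TWOSAFE
  ELEMENT e2 DATASTORE
    MASTER dsB ON "host2"
    SUBSCRIBER dsA ON "host1" RETURN TWOSAFE
  STORE dsA ON "host1" RETURN WAIT TIME 30 LOCAL COMMIT ACTION COMMIT
  STORE dsB ON "host2" RETURN WAIT TIME 30 LOCAL COMMIT ACTION COMMIT;
```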
You create a replication scheme to define a specific configuration of master and subscriber data stores. This section describes the possible relationships you can define between master and subscriber data stores when creating a replication scheme.
When defining a relationship between a master and subscriber, consider these replication schemes:
Figure 1-4 shows an active standby pair replication scheme with an active data store, a standby data store, and four read-only subscriber data stores.
Figure 1-4 Active standby pair
The active standby pair can replicate a whole data store or selected elements such as tables and cache groups.
In an active standby pair, two data stores are defined as masters. One is an active data store, and the other is a standby data store. The application updates the active data store directly. The standby data store cannot be updated directly. It receives the updates from the active data store and propagates the changes to as many as 127 read-only subscriber data stores. This arrangement ensures that the standby data store is always ahead of the subscriber data stores and enables rapid failover to the standby data store if the active data store fails.
Only one of the master data stores can function as an active data store at a specific time. You can manage failover and recovery of an active standby pair with Oracle Clusterware. See Chapter 6, "Using Oracle Clusterware to Manage Active Standby Pairs". You can also manage failover and recovery manually. See Chapter 4, "Administering an Active Standby Pair Without Cache Groups".
If the standby data store fails, the active data store can replicate changes directly to the read-only subscribers. After the standby data store has been recovered, it contacts the active data store to receive any updates that have been sent to the subscribers while the standby was down or was recovering. When the active and the standby data stores have been synchronized, then the standby resumes propagating changes to the subscribers.
For details about setting up an active standby pair, see "Setting up an active standby pair with no cache groups".
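As a sketch of the DDL involved (data store and host names hypothetical), the pair is declared by naming both master data stores and any read-only subscribers:

```sql
-- Hypothetical names: master1/master2 become the active and standby data
-- stores; sub1 and sub2 are read-only subscribers (up to 127 are allowed).
CREATE ACTIVE STANDBY PAIR
  master1 ON "host1", master2 ON "host2"
  SUBSCRIBER sub1 ON "host3", sub2 ON "host4";
```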
Figure 1-5 illustrates a full replication scheme in which the entire master data store is replicated to the subscriber.
Figure 1-5 Replicating the entire master data store
You can also configure your master and subscriber data stores to selectively replicate some elements in a master data store to subscribers. Figure 1-6 shows examples of selective replication. The left side of the figure shows a master data store that replicates the same selected elements to multiple subscribers, while the right side shows a master that replicates different elements to each subscriber.
Figure 1-6 Replicating selected elements to multiple subscribers
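A selective scheme of the kind shown on the left of Figure 1-6 can be sketched as follows (table, data store, and host names hypothetical); each ELEMENT clause names one table together with its master and subscriber list:

```sql
-- Hypothetical names. Only tab1 and tab2 are replicated; other tables
-- in masterds are not part of the scheme.
CREATE REPLICATION repowner.selectivescheme
  ELEMENT e1 TABLE tab1
    MASTER masterds ON "host1"
    SUBSCRIBER subds1 ON "host2", subds2 ON "host3"
  ELEMENT e2 TABLE tab2
    MASTER masterds ON "host1"
    SUBSCRIBER subds1 ON "host2", subds2 ON "host3";
```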
So far in this chapter, we have described unidirectional replication, where a master data store sends updates to one or more subscriber data stores. However, you can also configure data stores to operate bidirectionally, where each data store is both a master and a subscriber.
These are basic ways to use bidirectional replication:
Consider the example shown in Figure 1-7, where the accounts for Chicago are processed on data store A while the accounts for New York are processed on data store B.
Figure 1-7 Split workload bidirectional replication
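A split workload of this shape can be sketched as a single scheme whose elements are mastered at different data stores (all names hypothetical):

```sql
-- Hypothetical names: dsA masters the Chicago accounts and dsB the
-- New York accounts; each store subscribes to the table it does not master.
CREATE REPLICATION repowner.splitscheme
  ELEMENT e_chicago TABLE accounts_chicago
    MASTER dsA ON "hostA"
    SUBSCRIBER dsB ON "hostB"
  ELEMENT e_newyork TABLE accounts_newyork
    MASTER dsB ON "hostB"
    SUBSCRIBER dsA ON "hostA";
```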
In a distributed workload replication scheme, user access is distributed across duplicate application/data store combinations that replicate any update on any element to each other. In the event of a failure, the affected users can be quickly shifted to any application/data store combination. The distributed workload configuration is shown in Figure 1-8. Users access duplicate applications on each data store, which serves as both master and subscriber for the other data store.
Figure 1-8 Distributed workload configuration
When data stores are replicated in a distributed workload configuration, it is possible for separate users to concurrently update the same rows and replicate the updates to one another. Your application should ensure that such conflicts cannot occur, that they are acceptable if they do occur, or that they can be successfully resolved using the conflict resolution mechanism described in Chapter 13, "Resolving Replication Conflicts".
Note: Do not use a distributed workload configuration with the return twosafe return service.
Propagators are useful for optimizing replication performance over lower-bandwidth network connections, such as those between servers in an intranet. For example, consider the direct replication configuration illustrated in Figure 1-9, where a master directly replicates to four subscribers over an intranet connection. Replicating to each subscriber over a network connection in this manner is an inefficient use of network bandwidth.
Figure 1-9 Master replicating directly to multiple subscribers over a network
For optimum performance, consider the configuration shown in Figure 1-10, where the master replicates to a single propagator over the network connection. The propagator in turn forwards the updates to each subscriber on its local area network.
Figure 1-10 Master replicating to a single propagator over a network
Propagators are also useful for distributing replication loads in configurations that involve a master data store that must replicate to a large number of subscribers. For example, it is more efficient for the master to replicate to three propagators, rather than directly to the 12 subscribers as shown in Figure 1-11.
Figure 1-11 Using propagators to replicate to many subscribers
Note: Each propagator is one-hop, which means that you can forward an update only once. You cannot have a hierarchy of propagators where propagators forward updates to other propagators.
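The hub-and-spoke arrangement in Figure 1-10 can be sketched with a PROPAGATOR clause (all names hypothetical): one element carries updates from the master to the propagator, and a second element forwards them from the propagator to the local subscribers:

```sql
-- Hypothetical names. propds receives updates from masterds over the
-- network and forwards them to subscribers on its own LAN; propagators
-- are one-hop, so propds may not forward to another propagator.
CREATE REPLICATION repowner.propscheme
  ELEMENT e1 TABLE tab
    MASTER masterds ON "central"
    SUBSCRIBER propds ON "hub"
  ELEMENT e2 TABLE tab
    PROPAGATOR propds ON "hub"
    SUBSCRIBER subds1 ON "lan1", subds2 ON "lan2";
```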
As described in Oracle In-Memory Database Cache User's Guide, a cache group is a group of tables stored in a central Oracle database that are cached in a local Oracle In-Memory Database Cache (IMDB Cache). This section describes how cache groups can be replicated between TimesTen data stores. You can achieve high availability by using an active standby pair to replicate asynchronous writethrough cache groups or read-only cache groups.
This section describes the following ways to replicate cache groups:
See Chapter 5, "Administering an Active Standby Pair with Cache Groups" for details about configuring replication of cache groups.
An ASYNCHRONOUS WRITETHROUGH (AWT) cache group can be configured as part of an active standby pair with optional read-only subscribers to ensure high availability and to distribute the application workload. Figure 1-12 shows this configuration.
Figure 1-12 AWT cache group replicated by an active standby pair
Application updates are made to the active data store, the updates are replicated to the standby data store, and then the updates are asynchronously written to the Oracle database by the standby. At the same time, the updates are also replicated from the standby to the read-only subscribers, which may be used to distribute the load from reading applications. The tables on the read-only subscribers are not in cache groups.
When there is no standby data store, the active data store accepts application updates, writes the updates asynchronously to the Oracle database, and replicates them directly to the read-only subscribers. This situation can occur when the standby has not yet been created, or when the active fails and the standby becomes the new active. TimesTen reconfigures the AWT cache group when the standby becomes the new active.
If a failure occurs on the node where the active data store resides, the standby node becomes the new active node. TimesTen automatically reconfigures the AWT cache group so that it can be updated directly by the application and continue to propagate the updates to Oracle asynchronously.
You can recover from a complete failure of a site by creating a special disaster recovery read-only subscriber on a remote site as part of the active standby pair replication configuration. Figure 1-13 shows this configuration.
Figure 1-13 Disaster recovery configuration with active standby pair
The standby data store sends updates to cache group tables on the read-only subscriber. This special subscriber is located at a remote disaster recovery site and can propagate updates to a second Oracle database, also located at the disaster recovery site. You can set up more than one disaster recovery site with read-only subscribers and Oracle databases. See "Using a disaster recovery subscriber in an active standby pair".
A read-only cache group enforces caching behavior in which committed updates on the Oracle tables are automatically refreshed to the corresponding TimesTen cache tables. Figure 1-14 shows a read-only cache group replicated by an active standby pair.
Figure 1-14 Read-only cache group replicated by an active standby pair
When the read-only cache group is replicated by an active standby pair, the cache group on the active data store is autorefreshed from the Oracle database and replicates the updates to the standby, where AUTOREFRESH is also configured on the cache group but is in the PAUSED state. In the event of a failure of the active, TimesTen automatically reconfigures the standby to be autorefreshed when it takes over for the failed master data store by setting the AUTOREFRESH STATE to ON.

TimesTen also tracks whether updates that have been autorefreshed from the Oracle database to the active data store have been replicated to the standby. This ensures that the autorefresh process picks up from the correct point after the active fails, and no autorefreshed updates are lost.

This configuration may also include read-only subscriber data stores, which allows the read workload to be distributed across many data stores. The cache groups on the standby data store replicate to regular (non-cache) tables on the subscribers.
In some replication configurations, you may find a need to keep sequences synchronized between two or more data stores. For example, you may have a master data store containing a replicated table that uses a sequence to fill in the primary key value for each row. The subscriber data store is used as a hot backup for the master data store. If updates to the sequence's current value are not replicated, insertions of new rows on the subscriber after the master has failed could conflict with rows that were originally inserted on the master.
TimesTen replication allows the incremented sequence value to be replicated to subscriber data stores, ensuring that rows inserted on either data store in this configuration do not conflict. See "Replicating sequences" for details on writing a replication scheme to replicate sequences.
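A sequence is named in its own element, much like a table (all names hypothetical):

```sql
-- Hypothetical names. Updates to the current value of rowid_seq on the
-- master are replicated to the subscriber, so rows inserted on the
-- subscriber after a failover do not reuse key values already assigned
-- on the master.
CREATE REPLICATION repowner.seqscheme
  ELEMENT e_seq SEQUENCE rowid_seq
    MASTER masterds ON "host1"
    SUBSCRIBER subds ON "host2";
```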
If a table with a foreign key configured with ON DELETE CASCADE is replicated, then the matching foreign key on the subscriber must also be configured with ON DELETE CASCADE. In addition, you must replicate any other table with a foreign key relationship to that table. This requirement prevents foreign key conflicts from occurring on subscriber tables when a cascade deletion occurs on the master data store.
TimesTen replicates a cascade deletion as a single operation, rather than replicating each individual row deletion that occurs on the child table when a row is deleted on the parent. As a result, any row in the child table on the subscriber data store that contains the foreign key value deleted from the parent table is also deleted, even if that row did not exist in the child table on the master data store.
The aging configuration on replicated tables and cache groups must be identical on every peer data store.
If the replication scheme is an active standby pair, then aging is performed only on the active data store. Deletes that result from aging are then replicated to the standby data store. The aging configuration must be set to ON on both the active and standby data stores. TimesTen automatically determines which data store is actually performing the aging based on its current role as active or standby.
In a replication scheme that is not an active standby pair, aging is performed individually in each data store. Deletes performed by aging are not replicated to other data stores.
When an asynchronous writethrough cache group is in a data store that is replicated by an active standby pair, delete operations that result from aging are not propagated to the Oracle database.