Recovering from a Down Replica Set

If all elements in a single replica set are down or have failed, the data stored in that replica set is unavailable. To guard against this, distribute your elements in a way that reduces the chance that every element of a replica set fails at the same time.

See Assigning Hosts to Data Space Groups.

As described in Table 13-3, if you have a down or failed replica set, whether your data is preserved may depend on how you set the Durability connection attribute. See Durability Settings.

Table 13-3 Potential for Transaction Recovery Based on Durability Value

Durability Value Effect on Transactions When a Replica Set Fails

1

Participants synchronously write a prepare-to-commit or commit log record to the transaction log for distributed transactions. This ensures that committed transactions have the best possible chance of being preserved. If a replica set goes down, all transaction log records have been durably committed to the file system and can be recovered by TimesTen Scaleout.

0

Participants asynchronously write prepare-to-commit and commit log records for distributed transactions. If an entire replica set goes down, transaction log records are not guaranteed to be durably committed to the file system. There is a chance for data loss, depending on how the elements within the replica set fail or go down.

The following sections describe what happens to new transactions after a replica set goes down and how recovery of the replica set depends on the value of the Durability connection attribute.

Transaction Behavior with a Down Replica Set

The following list describes what happens to your transactions when there is a down replica set.

  • Transactions with queries that access rows only within active replica sets (and no rows within a down replica set) succeed. Queries that try to access data within a down replica set fail. Your application should retry the transaction when the replica set has recovered.

    A global read with a partial results hint that does not require data from the down replica set succeeds.

    For example, if all elements in replica set 1 failed and the queries within the transaction require data from replica set 1, then the transaction fails. Your application should perform the transaction again.

  • Transactions with any DDL statement fail when there is a down replica set as DDL statements require all replica sets to be available. Your application should roll back the transaction.

  • Transactions with any DML statements fail if the transaction tries to update at least one row on elements in a down replica set. Your application should roll back the transaction. When Durability=0, this scenario may encounter data loss. See Recovering a Failed Replica Set When Durability=0.

  • When Durability=1, transactions with DML statements that do not require data from the down replica set succeed. For example, if all elements in replica set 1 failed, then the transaction succeeds only if none of its SELECT, INSERT, INSERT...SELECT, UPDATE or DELETE statements depend on data that was stored in replica set 1.
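The rules in this list can be summarized as a small decision function. The following Python sketch is an illustration only, not part of TimesTen Scaleout: the Txn class, the outcome function, and the result strings are hypothetical names, and the sketch assumes Durability=1 and a database that is still in read-write mode.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Txn:
    """Hypothetical description of a transaction for this sketch."""
    kind: str                     # "query", "ddl", or "dml"
    replica_sets: FrozenSet[int]  # replica sets whose rows the transaction touches


def outcome(txn: Txn, down: FrozenSet[int]) -> str:
    """Return 'succeeds', 'retry', or 'rollback' following the list above.

    Assumes Durability=1 and a database that is still read-write.
    """
    if txn.kind == "ddl":
        # DDL requires all replica sets to be available.
        return "rollback" if down else "succeeds"
    if not (txn.replica_sets & down):
        # No row of the transaction lives in a down replica set.
        return "succeeds"
    # The transaction needs data from a down replica set.
    # Queries are retried after recovery; DML is rolled back.
    return "retry" if txn.kind == "query" else "rollback"
```

For instance, with replica set 1 down, a query that touches only replica set 2 succeeds, while a DML transaction that updates a row in replica set 1 must be rolled back.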

Durably Recovering a Failed Replica Set When Durability=1

The following sections describe the process for recovery of a failed replica set when Durability=1.

If all elements in the replica set go down, even temporarily, TimesTen Scaleout might be able to automatically recover the full replica set (if the initial issue is resolved) by:

  1. Determining and recovering the seed element. The element that failed with the latest changes, known as the seed element, is recovered first. The seed element is recovered to the latest transaction in the checkpoint and transaction log files.

  2. After recovery of the element is complete, TimesTen Scaleout checks for in-doubt transactions.

    When an element is loaded from the file system (from checkpoint and transaction log files) to recover after a transient failure or unexpected termination, any two-phase commit transactions that were prepared, but not committed, are left pending. Such a transaction is referred to as an in-doubt transaction: because the transaction was interrupted, it is uncertain whether the entire transaction was committed under the two-phase commit protocol.

    • If there are no in-doubt transactions, operation proceeds as usual.

    • If there are in-doubt transactions, standard processing that includes this replica set does not continue until all in-doubt transactions are resolved. If there are any in-doubt transactions, TimesTen Scaleout checks the transaction log to determine whether the transaction committed or was prepared to commit on any of the participants. The transaction log records contain information about other participants in the transaction. See Table 13-4 for how TimesTen Scaleout resolves in-doubt transactions.

      If an element fails during this process and then comes back up after the transaction commits or rolls back, the element recovers itself by requesting the result from the other participating elements.

  3. After the seed element is recovered, the other elements in the replica set are recovered from the seed element using the duplicate and log-based catch up methods. See Recovering a Replica Set After an Element Goes Down for details on the duplicate and log-based catch up methods.
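The ordering in steps 1 and 3 can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not the product's implementation: the Element class, the last_log_position field, and the plan_recovery function are hypothetical, and how TimesTen Scaleout actually determines which element holds the latest changes is not described here. The in-doubt transaction check of step 2 is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    name: str               # hypothetical element identifier
    last_log_position: int  # hypothetical marker of the element's latest change


def plan_recovery(elements: List[Element]) -> Tuple[Element, List[Element]]:
    """Pick the seed element and list the elements to rebuild from it."""
    if not elements:
        raise ValueError("a replica set has at least one element")
    # Step 1: the element that failed with the latest changes is the seed.
    seed = max(elements, key=lambda e: e.last_log_position)
    # Step 3: every other element is recovered from the seed.
    others = [e for e in elements if e is not seed]
    return seed, others
```

With two elements whose hypothetical log positions are 120 and 135, the second one becomes the seed and the first is recovered from it.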

Table 13-4 How TimesTen Scaleout Resolves an In-Doubt Transaction

Failure Action

At least one participant received the commit log record; all other participants received at least the prepare-to-commit log record.

The transaction commits on all participants.

All participants in the transaction received the prepare-to-commit log record.

The transaction commits on all participants.

At least one participant did not receive the prepare-to-commit log record.

The transaction manager notifies all participants to undo the prepare-to-commit, which is a prelude to a roll back of the transaction.

  • If the transaction was processed with autocommit 1, then the transaction manager rolls back the transaction.

  • If the transaction was processed with autocommit 0, then the transaction manager throws an error informing the application that it must roll back the transaction.

However, if you cannot recover the elements in a down replica set, then you may need to either remove and replace one of the elements or evict the entire replica set. See Recovering When the Replica Set Has a Permanently Failed Element.
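The resolution rules in Table 13-4 can be expressed as a decision over the log records that each participant holds. The following Python sketch is illustrative only: LogState, resolve_in_doubt, and the returned strings are hypothetical names, not a TimesTen API. The table does not cover a mix of a commit record on one participant and no prepare-to-commit record on another; the sketch treats that combination as an error, which is an assumption.

```python
from enum import Enum
from typing import Iterable


class LogState(Enum):
    NONE = 0       # no prepare-to-commit log record received
    PREPARED = 1   # prepare-to-commit log record received
    COMMITTED = 2  # commit log record received


def resolve_in_doubt(states: Iterable[LogState], autocommit: bool) -> str:
    """Return the action of Table 13-4 for the participants' log states."""
    states = list(states)
    if not states:
        raise ValueError("a transaction needs at least one participant")
    if any(s is LogState.NONE for s in states):
        if any(s is LogState.COMMITTED for s in states):
            # Assumption: not covered by the table, so flag it.
            raise ValueError("commit record without prepare on all participants")
        # Undo the prepare-to-commit; who rolls back depends on autocommit.
        if autocommit:
            return "transaction manager rolls back"
        return "application must roll back"
    # Every participant is at least prepared: the transaction commits.
    return "commit on all participants"
```

For example, resolve_in_doubt([LogState.COMMITTED, LogState.PREPARED], autocommit=False) yields the commit action, while a participant in the NONE state leads to a rollback.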

Recovering a Failed Replica Set When Durability=0

The following describes the process for recovery of a failed replica set when Durability=0.

If you set Durability=0, you are acknowledging that there is a chance of data loss when a replica set fails. However, TimesTen Scaleout attempts to avoid data loss if the elements fail at separate times.

  • If all but one element of the replica set fail, then TimesTen Scaleout attempts to switch the last remaining element in the replica set (when k >= 2) into durable mode. That is, in order to limit data loss (which would occur if the last remaining element fails when Durability=0), TimesTen Scaleout changes the durability behavior of the element as if it were configured with Durability=1.

    If TimesTen Scaleout can switch the last remaining element in the replica set into durable mode, then the participating element synchronously writes prepare-to-commit log records to the file system for distributed transactions. Then, if this element also fails so that the entire replica set is down, TimesTen Scaleout recovers the replica set from the transaction log records. Thus, no transaction is lost in this scenario and TimesTen Scaleout automatically recovers the replica set as when you have set Durability=1. See Durably Recovering a Failed Replica Set When Durability=1 for details on recovering after the single element is recovered.

  • If TimesTen Scaleout cannot switch the replica set into durable mode before the last remaining element fails, then you may encounter data loss depending on whether the replica set encounters a temporary or permanent failure.

    • Temporary replica set failure when elements are non-durable: Since no elements in the replica set synchronously wrote prepare-to-commit log records for distributed transactions that the replica set was involved in before going down, any transactions that committed after the last successful epoch transaction are lost.

      If all elements show the waiting for seed status, then there was no switch into durable mode before the replica set went down. If this is the case, epoch recovery is necessary and any transactions committed after the latest successful epoch transaction are lost. When the elements in this replica set recover, they may remain in the waiting for seed status, since none of the elements are able to recover with the transaction logs. Instead, you must perform epoch recovery by either recovering or evicting the replica set, followed by unloading and reloading the database. See Process When Replica Set Fails When in a Non-Durable State.

    • Permanent replica set failure: If you cannot recover any of the elements in the replica set, you may have to evict all elements. This results in a loss of the data on that replica set. See Recovering When the Replica Set Has a Permanently Failed Element.

Process When Replica Set Fails When in a Non-Durable State

When a replica set goes down and the state is non-durable, transactions may continue to commit into the database until TimesTen Scaleout realizes that the replica set is down. Once TimesTen Scaleout realizes that a replica set is down (after a failed epoch transaction execution), then the database is switched to read-only to minimize the number of lost transactions. During epoch recovery, the database is reloaded to the last successful epoch transaction, effectively losing any transactions that committed after that last successful epoch transaction. In this scenario, the value of the EpochInterval connection attribute not only determines the amount of time between the epoch transactions, but also determines the approximate amount of time during which you can lose committed transactions.

Note:

The database is set to read-only when the epoch transaction fails due to a down replica set; TimesTen Scaleout does not set the database to read-only if the epoch transaction fails for other reasons.
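The effect of epoch recovery on committed transactions can be illustrated with a little arithmetic. The following Python sketch is a simplified model, not a TimesTen API: split_by_epoch is a hypothetical helper, times are plain numbers in seconds, and a commit that coincides exactly with the epoch is counted as preserved, which is an assumption.

```python
from typing import List, Tuple


def split_by_epoch(commit_times: List[float],
                   last_epoch: float) -> Tuple[List[float], List[float]]:
    """Split commit times into (preserved, lost) around the last successful epoch."""
    # Commits up to the last successful epoch survive the reload.
    preserved = [t for t in commit_times if t <= last_epoch]
    # Commits after it are discarded by epoch recovery.
    lost = [t for t in commit_times if t > last_epoch]
    return preserved, lost
```

With a last successful epoch at 10.0 seconds, commits at 9.5 and 10.0 are preserved while commits at 10.4 and 10.9 are lost. Because the failure is detected only when the next epoch transaction fails, the loss window in this model is roughly one EpochInterval long.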

Figure 13-3 shows the actions across a time span of eight intervals.

Figure 13-3 Durability=0 and a Replica Set Fails

  1. An epoch transaction commits successfully.

  2. Transactions may continue after the successful epoch transaction. Any committed transactions after the last successful epoch transaction are lost after epoch recovery as neither element in the down replica set was able to durably flush the transaction logs.

  3. Replica set 1 goes down without either element switching to durable mode.

    Note:

    Sequences may be incremented while the replica set is down.

  4. Transactions may continue after the replica set goes down if the database has not yet been set to read-only. Any transactions that commit after the last successful epoch transaction are lost after epoch recovery as neither element in the down replica set was able to durably flush the transaction logs.

    Note:

    The behavior of transactions after a replica set goes down depends on the type of statements within the transactions, as described in Transaction Behavior with a Down Replica Set.

  5. The next epoch transaction fails since not all replica sets are up. TimesTen Scaleout informs all data instances that the database is now read-only. Applications fail when executing DML, DDL, or commit statements within open transactions. You must roll back each transaction.

    Note:

    The ttGridAdmin dbStatus command shows the state of the database, including if it is in read-only or read-write mode.

  6. The replica set must be recovered or evicted.

    • Recover the down replica set. If multiple replica sets are down, the database cannot enter read-write mode until all replica sets are recovered or replaced.

    • If you cannot recover any of the elements in the replica set, you may have to evict the replica set, which results in a loss of the data on that replica set. See Recovering When the Replica Set Has a Permanently Failed Element.

  7. You perform an epoch recovery by unloading and reloading the database to the last successful epoch transaction to recover the database consistently with only a partial data loss. Any transactions that commit after the last successful epoch are lost when the database is unloaded and reloaded to the last successful epoch transaction. See Load a Database into Memory (dbLoad) in Oracle TimesTen In-Memory Database Reference for information on the ttGridAdmin dbLoad command and Unload a Database (dbUnload) in Oracle TimesTen In-Memory Database Reference for information on the ttGridAdmin dbUnload command.

  8. A new epoch transaction is successful. The database is set to read-write. Usual transaction behavior resumes.

Note:

If you want to ensure that the data for a transaction is always recovered, you can promote a transaction to be an epoch transaction. See Epoch Transactions.