Managing Data Guarantees

All replicated applications are first and foremost transactional applications. This means that you have the standard data guarantee issues to consider, all of which have to do with how durable and consistent you want your data to be. Of course, considerations of this nature also play a role in your application's performance. These issues are even more important for replicated applications, because replication adds additional dimensions to them.

Notably, in a replicated application you must decide how durable your data will be, by deciding how carefully the Master ensures that a transaction's writes have reached stable storage on its various Replica nodes before the commit completes.

Consistency likewise takes on an additional dimension in a replicated application, because you must now decide how consistent the various nodes in the replication group will be relative to the Master at any given time. If no writes are being performed on the Master, all Replicas will eventually catch up to the Master and be completely consistent with it. But for most HA applications, writes are occurring on the Master, and so it is possible for some number of your Replicas to lag behind the Master. What you have to decide, then, is how sensitive your application is to this kind of temporary inconsistency.

Note that your consistency requirements can be gated by your durability requirements. Durability, in turn, can be gated by any concerns you might have about write throughput. At the same time, your consistency requirements can have an effect on the read performance of your Replicas. It is therefore a mistake to think about any one of these requirements in isolation from the others.

Durability

One of the reasons you might be writing a replicated application is to achieve a higher durability guarantee than you can get with a traditional transactional application. In a traditional application, your data's durability is a function of how you perform your transactional commits, and how frequently you perform your backups. For this class of application, the strongest durability guarantee you can have is to use synchronous commits (the commit does not complete until the data is written to disk), coupled with very frequent backups of your environment.

The problem with a stand-alone application in which you are seeking a very high durability guarantee is that your write throughput will suffer. Synchronous commits require disk writes, and disk I/O is one of the most expensive operations you can ask a database to perform.

In order to increase write throughput in your transactional application, you may decide to use asynchronous commits that do not require the disk I/O to complete before the transaction commit completes. The problem with this is that your application can potentially crash before a transaction has been completely written to disk. This represents a loss of data, which is to say the data is not durable.
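
To make this trade-off concrete, here is a minimal sketch using JE's Durability class, which predefines constants for the common local synchronization policies. (The environment home path is illustrative, and the directory must already exist.)

    import java.io.File;
    import com.sleepycat.je.Durability;
    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentConfig;

    EnvironmentConfig envConfig = new EnvironmentConfig();
    envConfig.setAllowCreate(true);
    envConfig.setTransactional(true);

    // Strongest local guarantee: commit() does not return until the
    // transaction's log records are flushed (fsync'd) to disk.
    envConfig.setDurability(Durability.COMMIT_SYNC);

    // Higher write throughput, weaker guarantee: commit() returns once
    // the records are buffered in memory; an application crash can lose
    // the most recently committed transactions.
    // envConfig.setDurability(Durability.COMMIT_NO_SYNC);

    // Middle ground: records are written to the file system but not
    // fsync'd, so they survive an application crash but not an
    // operating system or hardware failure.
    // envConfig.setDurability(Durability.COMMIT_WRITE_NO_SYNC);

    Environment env = new Environment(new File("/path/to/envHome"), envConfig);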

Replication can help with your data durability in a couple of ways. Most importantly, replication allows you to commit to the network. This means that when your Master commits a transaction, the results of that commit are sent to one or more nodes available over the network. Consequently, multiple disks, disk controllers, power supplies, and CPUs are used to ensure the data modification makes it to stable storage.

By default, JE makes the commit operation on the Master wait until it receives acknowledgements from some number of electable nodes before returning from the operation. However, if you want to increase write throughput, you can configure your Master to proceed without acknowledgements, and so return immediately from the commit operation (once the commit operation has met the local durability requirement). The price you pay for this is a reduced durability guarantee. How much the guarantee is reduced is a function of the number of electable nodes in your replication group (the more you have, the higher your durability guarantee) and the quality and stability of your network.
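
As a sketch, using the standard com.sleepycat.je API, a no-acknowledgement policy looks like the following; whether you set it environment-wide or per-transaction is an application choice:

    import com.sleepycat.je.Durability;
    import com.sleepycat.je.Durability.ReplicaAckPolicy;
    import com.sleepycat.je.Durability.SyncPolicy;
    import com.sleepycat.je.TransactionConfig;

    // Return from commit() as soon as the Master's local durability
    // requirement is met; do not wait for Replica acknowledgements.
    Durability noAcks = new Durability(SyncPolicy.SYNC,    // Master
                                       SyncPolicy.NO_SYNC, // Replicas
                                       ReplicaAckPolicy.NONE);

    TransactionConfig txnConfig = new TransactionConfig();
    txnConfig.setDurability(noAcks);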

Alternatively, you can obtain an extremely high durability guarantee by configuring the Master to wait for all electable nodes to acknowledge a commit operation before returning from the operation. The price you pay for this very high guarantee is greatly reduced write throughput.
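
At the other end of the spectrum, a sketch of the all-acknowledgements configuration looks like this. (JE also offers a simple-majority acknowledgement policy as a middle ground; the replica sync policy shown here is the most conservative choice, not a requirement.)

    import com.sleepycat.je.Durability;
    import com.sleepycat.je.Durability.ReplicaAckPolicy;
    import com.sleepycat.je.Durability.SyncPolicy;
    import com.sleepycat.je.EnvironmentConfig;

    // Strongest replicated guarantee: commit() does not return on the
    // Master until every electable node has acknowledged the commit.
    Durability allAcks = new Durability(SyncPolicy.SYNC, // Master
                                        SyncPolicy.SYNC, // Replicas
                                        ReplicaAckPolicy.ALL);

    EnvironmentConfig envConfig = new EnvironmentConfig();
    envConfig.setDurability(allAcks);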

For information on configuring and managing durability guarantees for your replicated application, see Managing Durability.

Managing Data Consistency

Data consistency means that the data you thought you wrote to your environment is in fact written to your environment. It also means that you will never find partial records written to your environment.

In a replicated application, consistency also means that data which is available on the Master is also available on the Replicas.

A simple transactional application offers consistency guarantees that are enforced when you commit a transaction. Your replicated application also offers this consistency guarantee (because it is also a transactional application). For this reason, the environment on the Master is always absolutely consistent. But beyond that, you need to manage consistency for data across all the nodes in your replication group.

When you commit a transaction on the Master, your Replica nodes may or may not have that transaction's changes by the time the commit completes. Whether they do depends on how high a durability guarantee you implemented for your Master (see the previous section). If, for example, you configured your Master to require acknowledgements from all electable nodes before returning from the commit, then the data will be consistently available on all of those nodes, although not necessarily on any secondary nodes. However, if you configured the Master such that no acknowledgements are necessary, then your data is probably not consistent across the replication group.

To ensure that read transactions on the Replicas see a sufficiently consistent view of the environment, you can set a consistency policy for each transaction. This policy describes how current the Replica must be before a transaction can be initiated on it. If the Replica is not current enough, the start of the transaction is delayed until the Replica has caught up.

There are two possible consistency policies. The first is a time-based policy that describes how far back in time the Replica is allowed to lag behind the Master. The second is a commit-based policy, which ensures the Replica is at least current enough to have the changes made by a specified transaction, and by all transactions committed prior to it. In either case, the start of a transaction on the Replica is delayed until the Replica can meet the consistency policy defined for that transaction.
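
Here is a minimal sketch of both policies, assuming the standard com.sleepycat.je.rep API. The masterTxn variable is a hypothetical transaction previously committed on the Master; how its commit token reaches the Replica is application-specific.

    import java.util.concurrent.TimeUnit;
    import com.sleepycat.je.CommitToken;
    import com.sleepycat.je.TransactionConfig;
    import com.sleepycat.je.rep.CommitPointConsistencyPolicy;
    import com.sleepycat.je.rep.TimeConsistencyPolicy;

    // Time-based: the Replica may lag the Master by at most 2 seconds;
    // wait up to 10 seconds for it to catch up before giving up.
    TimeConsistencyPolicy timePolicy =
        new TimeConsistencyPolicy(2, TimeUnit.SECONDS,
                                  10, TimeUnit.SECONDS);

    // Commit-based: the Replica must have replayed the transaction
    // identified by the commit token (and all transactions before it).
    CommitToken token = masterTxn.getCommitToken(); // hypothetical txn
    CommitPointConsistencyPolicy commitPolicy =
        new CommitPointConsistencyPolicy(token, 10, TimeUnit.SECONDS);

    // Apply one of the policies to transactions that read on the Replica.
    TransactionConfig txnConfig = new TransactionConfig();
    txnConfig.setConsistencyPolicy(timePolicy); // or commitPolicy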

This means that a stringent consistency policy can affect your Replica's read throughput, because transactions, even read-only transactions, cannot begin until the Replica is consistent enough. If a Replica has lagged far behind the Master and is having trouble catching up due to network latency or other issues, read requests on it may stall and perhaps even time out. This hurts the latency of the Replica's reads, and can even compromise its overall availability for reads. For this reason, give careful consideration to how well you want your Replica to perform on reads versus how consistent you want it to be with the other nodes in the replication group.
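
When a Replica cannot satisfy the policy within its timeout, JE reports the failure when the transaction begins. The following sketch shows one way an application might respond; replicaEnv is a hypothetical handle to the Replica's ReplicatedEnvironment, and txnConfig continues the example above.

    import com.sleepycat.je.Transaction;
    import com.sleepycat.je.rep.ReplicaConsistencyException;

    try {
        // beginTransaction() blocks until the Replica satisfies the
        // consistency policy, or throws once the timeout expires.
        Transaction txn = replicaEnv.beginTransaction(null, txnConfig);
        // ... perform reads ...
        txn.commit();
    } catch (ReplicaConsistencyException e) {
        // The Replica could not catch up in time. Retry later, fall
        // back to a weaker policy, or report the node as unavailable.
    }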

For more information on managing consistency in your replicated application, see Managing Consistency.