Chapter 12. Durability Guarantees

Table of Contents

Setting Acknowledgment-Based Durability Policies
Setting Synchronization-Based Durability Policies
Setting Durability Guarantees

Writes are performed in the KV store by performing the write operation (be it a creation, update, or delete operation) on a master node. As a part of performing the write operation, the master node will usually make sure that the operation has made it to stable storage before considering the operation complete.

The master node will also transmit the write operation to the replica nodes in its replication group. It is possible to ask the master node to wait for acknowledgments from its replicas before considering the operation complete.

The replicas, in turn, will not acknowledge the write operation until they have applied the operation to their own database.

The sum total of all this write activity is called the durability guarantee. That is, a durability guarantee is a policy which describes how strongly persistent your data is in the event of some kind of catastrophic failure within the store. (Examples of a catastrophic failure are power outages, disk crashes, physical memory corruption, or even fatal application programming errors.)

A high durability guarantee means that there is a very high probability that the write operation will be retained in the event of a catastrophic failure within the store. A low durability guarantee means that the write is very unlikely to be retained in the event of a catastrophic failure.

The higher your durability guarantee, the slower your write-throughput will be in the store. This is because a high durability guarantee requires a great deal of disk and network activity.

Usually you want some kind of a durability guarantee, although if you have highly transient data that changes from run-time to run-time, you might want to set the durability guarantee for that data to the lowest possible level.

Durability guarantees include two types of information: acknowledgment guarantees and synchronization guarantees. These two types of guarantees are described in the next sections. We then show how to set a durability guarantee.

Setting Acknowledgment-Based Durability Policies

Whenever a master node performs a write operation (create, update or delete), it must send that operation to its various replica nodes. The replica nodes then apply the write operation(s) to their local databases so that the replicas are consistent relative to the master node.

Upon successfully applying write operations to their local databases, replicas send an acknowledgment message back to the master node. This message simply says that the write operation was received and successfully applied to the replica's local database.

An acknowledgment-based durability policy describes whether the master node will wait for these acknowledgments before considering the write operation to have completed successfully. You can require the master node to wait for no acknowledgments, acknowledgments from a simple majority of replica nodes, or acknowledgments from all replica nodes.

The more acknowledgments the master requires, the slower its write performance will be. Waiting for acknowledgments means waiting for a write message to travel from the master to the replicas, then for the write operation to be performed at the replica (this may mean disk I/O), then for an acknowledgment message to travel from the replica back to the master. From a computer application's point of view, this can all take a long time.

When setting an acknowledgment-based durability policy, you can require acknowledgment from:

  • All replicas. That is, all of the replica nodes in the replication group. Remember that your store has more than one replication group, so the master node is not waiting for acknowledgments from every machine in the store.

  • No replicas. In this case, the master returns with normal status from the write operation as soon as it has met its synchronization-based durability policy. These are described in the next section.

  • A simple majority of replicas. That is, if the replication group has 5 replica nodes, then the master will wait for acknowledgments from 3 nodes.