Linux Page Cache Tuning

Tune the page cache so that the OS can write to disk asynchronously whenever possible. Background writes reduce the latency of serial write operations such as fsync, because less dirty data remains to be flushed when they are issued. They also reduce write stalls, which occur when the file system cache is full and must be flushed to disk to make room for new writes. We have observed significant speedups (15-20%) on insert-intensive benchmarks when these parameters are tuned as described below.

Place the following settings in /etc/sysctl.conf. Then run

sysctl -p

to load the new settings so that they take effect without a reboot.

# Set vm.dirty_background_bytes to 10MB to ensure that
# on a 40MB/sec hard disk an fsync never takes more than 250ms and takes
# just 125ms on average. The value of vm.dirty_background_bytes
# should be increased on faster SSDs or I/O subsystems with higher
# throughput. You should increase this setting by the same proportion
# as the relative increase in throughput. For example, for a typical SSD
# with a throughput of 160MB/sec, vm.dirty_background_bytes should be set
# to 40MB so fsync takes ~250ms. In this case, the value was increased by
# a factor of 4.
vm.dirty_background_bytes=10485760

# I/O calls effectively become synchronous (waiting for the underlying
# device to complete them). This setting helps minimize the
# possibility of a write request stalling in JE while holding the
# write log latch.

# Ensures that data does not hang around in memory longer than
# necessary. Given JE's append-only style of writing, there is
# typically little benefit from having an intermediate dirty page
# hanging around, because it is never going to be modified. By
# evicting the dirty page earlier, its associated memory is readily
# available for reading or writing new pages, should that become
# necessary.
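
The sizing rule for vm.dirty_background_bytes described in the comments above is simple arithmetic: sustained device throughput multiplied by the longest fsync you are willing to tolerate. The following shell sketch illustrates it. The function name dirty_background_bytes is hypothetical, MB is assumed to mean 2^20 bytes, and the 250ms default is taken from the example above. Measure the real throughput of your device before relying on the result.

```shell
# Hypothetical helper, not part of any tool.
# Usage: dirty_background_bytes <throughput_MB_per_sec> [max_fsync_ms]
# Prints the number of dirty bytes that the device can flush within
# max_fsync_ms (default 250) at the given sustained throughput.
dirty_background_bytes() {
    local throughput_mb max_fsync_ms
    throughput_mb=$1
    max_fsync_ms=${2:-250}
    # bytes = MB/sec * 2^20 bytes/MB * (ms / 1000)
    echo $(( throughput_mb * 1024 * 1024 * max_fsync_ms / 1000 ))
}

# Examples matching the figures in the comments above:
#   dirty_background_bytes 40     (40MB/sec hard disk, result is 10MB)
#   dirty_background_bytes 160    (160MB/sec SSD, result is 40MB)
```

The printed value can be used directly as the right-hand side of the vm.dirty_background_bytes line in /etc/sysctl.conf.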

Earlier versions of the Linux kernel may not support vm.dirty_background_bytes. On these older kernels, use vm.dirty_background_ratio instead, and pick the ratio that comes closest to 10MB. Because the ratio is a whole-number percentage of memory, this may not be possible on systems with a lot of memory: the smallest step can already be far larger than 10MB. A further impediment is that a ratio of 5 is the effective minimum in some kernels.
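
To see how coarse the ratio is on a given machine, the calculation can be sketched in shell as follows. The function names are hypothetical. The sketch treats the memory size you pass in as the base for the percentage, whereas the kernel applies the ratio to available (free plus reclaimable) memory, so the result is an approximation. Clamping the result to 1 is also an assumption; it does not account for kernels whose effective minimum is 5.

```shell
# Hypothetical helpers, not part of any tool.

# Usage: bytes_per_percent <mem_kB>
# Prints how many bytes one step of vm.dirty_background_ratio represents.
bytes_per_percent() {
    local mem_kb
    mem_kb=$1
    echo $(( mem_kb * 1024 / 100 ))
}

# Usage: closest_background_ratio <mem_kB> [target_bytes]
# Prints the whole-number percentage of mem_kB closest to target_bytes
# (default 10MB = 10485760 bytes), never lower than 1.
closest_background_ratio() {
    local mem_kb target_bytes mem_bytes ratio
    mem_kb=$1
    target_bytes=${2:-10485760}
    mem_bytes=$(( mem_kb * 1024 ))
    # Integer division rounded to the nearest whole percent.
    ratio=$(( (target_bytes * 100 + mem_bytes / 2) / mem_bytes ))
    if [ "$ratio" -lt 1 ]; then
        ratio=1
    fi
    echo "$ratio"
}

# Example for the current machine, using MemTotal as the memory size:
#   closest_background_ratio "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

With 1GB of memory (1048576 kB), one percent is already slightly more than 10MB, so a ratio of 1 is a close match. With 64GB (67108864 kB), one percent is roughly 687MB, which illustrates why the 10MB target cannot be reached on large-memory systems.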


Use sysctl -a to verify that the parameters described here are set as expected.
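
Scanning the full sysctl -a listing by eye is error-prone, so the check can be scripted. The following is a minimal sketch, assuming the usual "name = value" output format of sysctl -a. The function name check_sysctl is hypothetical.

```shell
# Hypothetical helper, not part of any tool.
# Usage: sysctl -a 2>/dev/null | check_sysctl <name> <expected_value>
# Reads "name = value" lines on standard input and reports whether the
# named parameter has the expected value. Returns non-zero on mismatch
# or when the parameter is absent from the input.
check_sysctl() {
    local name expected actual
    name=$1
    expected=$2
    # Split each line on " = " and print the value of the matching key.
    actual=$(awk -F ' *= *' -v key="$name" '$1 == key { print $2 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK   $name = $actual"
    else
        echo "DIFF $name = ${actual:-<unset>} (expected $expected)"
        return 1
    fi
}

# Example:
#   sysctl -a 2>/dev/null | check_sysctl vm.dirty_background_bytes 10485760
```

Repeat the call for each parameter you placed in /etc/sysctl.conf.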