You can use the deduplication (dedup) property to remove redundant data from your ZFS file systems. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored, and common components are shared between files. For example:
# zfs set dedup=on tank/home
Do not enable the dedup property on file systems that reside on production systems until you perform the following steps to determine if your system can support data deduplication.
Determine whether your data would benefit from deduplication space savings by simulating deduplication with the zdb -S command. If your data is not dedup-able, there is no point in enabling dedup. Note that running this command is very memory intensive:
# zdb -S tank
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
     2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
     4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
     8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
    16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
    32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
    64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
   128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
   256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
   512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
    2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
    8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
 Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50
If the estimated dedup ratio is greater than 2, then you might see dedup space savings.
In this example, the dedup ratio (dedup = 1.20) is less than 2, so enabling dedup is discouraged.
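The ratio check described above can be scripted. The following is a minimal sketch, not part of ZFS: dedup_worthwhile is a hypothetical helper that reads saved zdb -S output on standard input, extracts the value from the "dedup = ..." summary line, and compares it against a threshold (default 2, taken from the guideline above). It assumes the summary line has the same layout as in the example output.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS command): read `zdb -S` output on stdin
# and print the simulated dedup ratio followed by "consider" or "skip".
# The default threshold of 2 is the guideline from the text above.
dedup_worthwhile() {
    threshold="${1:-2}"
    awk -v t="$threshold" '
        # Summary line: dedup = 1.20, compress = 1.28, copies = 1.03, ...
        $1 == "dedup" && $2 == "=" {
            ratio = $3
            sub(/,$/, "", ratio)          # drop the trailing comma
            found = 1
            verdict = (ratio + 0 > t + 0) ? "consider" : "skip"
            printf "%s %s\n", ratio, verdict
        }
        # Exit status 2 signals that no summary line was seen.
        END { if (!found) exit 2 }
    '
}

# Example using the summary line from the output above.
printf 'dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50\n' |
    dedup_worthwhile
# prints: 1.20 skip
```

Because zdb -S is memory intensive, it is probably best to save its output to a file once and pass that file to the helper, rather than rerunning the simulation.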
Make sure your system has enough memory to support dedup as follows:
Each in-core dedup table entry is approximately 320 bytes.
Multiply the total number of allocated blocks by 320 bytes. For example:
in-core DDT size = 2.63M x 320 bytes = 841.60 Mbytes
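The same arithmetic can be done with a small script. This is a sketch under stated assumptions: ddt_core_size is a hypothetical helper, the 320-byte entry size is the approximation given above, and the unit suffix (K, M, and so on) printed by zdb is carried through unchanged rather than expanded, so the result is in bytes scaled by that same suffix.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS command): estimate the in-core DDT size.
#   $1  allocated block count from the "Total" row, e.g. 2.63M
#   $2  bytes per in-core DDT entry (default 320, the approximation above)
# The unit suffix of the block count is preserved in the result.
ddt_core_size() {
    blocks="$1"
    entry_bytes="${2:-320}"
    awk -v b="$blocks" -v e="$entry_bytes" 'BEGIN {
        n = b + 0                          # leading numeric part, e.g. 2.63
        suffix = substr(b, length(b), 1)   # last character of the count
        if (suffix !~ /[A-Za-z]/) suffix = ""
        printf "%.2f%s\n", n * e, suffix
    }'
}

# Example using the Total row from the output above.
ddt_core_size 2.63M
# prints: 841.60M
```

Compare the result with the physical memory on the system to judge whether the deduplication table is likely to fit in memory.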
Dedup performance is best when the deduplication table fits into memory. If the dedup table has to be written to disk, then performance will decrease. If you enable deduplication on your file systems without sufficient memory resources, system performance might degrade during file system-related operations. For example, removing a large dedup-enabled file system without sufficient memory resources might degrade system performance.