The dedup Property
The dedup property controls whether duplicate data is removed from a file system. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored and common components are shared between files.
Do not enable the dedup property on file systems that reside on production systems until you review the following considerations:
- Determine if your data would benefit from deduplication space savings. You can run the zdb -S command to simulate the potential space savings of enabling dedup on your pool. This command must be run on a quiet pool. If your data is not dedup-able, then there's no point in enabling dedup. For example:

  $ zdb -S tank
  Simulated DDT histogram:

  bucket              allocated                       referenced
  ______   ______________________________   ______________________________
  refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
  ------   ------   -----   -----   -----   ------   -----   -----   -----
       1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
       2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
       4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
       8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
      16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
      32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
      64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
     128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
     256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
     512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
      2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
      8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
   Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

  dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50

  If the estimated dedup ratio is greater than 2, then you might see dedup space savings. In the above example, the dedup ratio (1.20) is less than 2, so enabling dedup is not recommended.
- Make sure your system has enough memory to support dedup.
  - Each in-core dedup table entry is approximately 320 bytes.
  - Multiply the number of allocated blocks by 320. For example:
    in-core DDT size = 2.63M x 320 = 841.60M
- Dedup performance is best when the deduplication table fits into memory. If the dedup table has to be written to disk, then performance will decrease. For example, removing a large file system with dedup enabled will severely decrease system performance if the system does not meet the memory requirements described above.
- You cannot use deduplication in the case of datasets with encryption. For example, a file system and a volume are two different datasets and deduplication cannot match the two together.
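The first two considerations above reduce to simple arithmetic on the zdb -S summary. The following is a minimal sketch, not part of the ZFS tooling. The function names dedup_ratio, dedup_worthwhile, and ddt_size_bytes are hypothetical helpers. The threshold of 2 and the 320-byte entry size come from the guidance above, and the block count is assumed to be passed as a plain number (2.63M is treated as 2,630,000 for illustration).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not ZFS commands.

# Print the simulated dedup ratio from `zdb -S <pool>` output read on stdin.
# It looks for the summary line that starts with "dedup = <ratio>,".
dedup_ratio() {
    awk '$1 == "dedup" && $2 == "=" { sub(/,$/, "", $3); print $3 }'
}

# Exit 0 when the ratio ($1) is greater than the threshold ($2, default 2).
dedup_worthwhile() {
    awk -v r="$1" -v t="${2:-2}" 'BEGIN { exit !(r + 0 > t + 0) }'
}

# Estimate the in-core DDT size in bytes:
# allocated blocks ($1) x bytes per entry ($2, default 320).
ddt_size_bytes() {
    awk -v n="$1" -v b="${2:-320}" 'BEGIN { printf "%.0f\n", n * b }'
}

# Typical use on a system with ZFS (not run here):
#   ratio=$(zdb -S tank | dedup_ratio)
#   dedup_worthwhile "$ratio" || echo "dedup ratio $ratio is too low"
```

With the figures from the example pool, dedup_ratio yields 1.20, dedup_worthwhile 1.20 fails, and ddt_size_bytes 2630000 prints 841600000, which matches the 841.60M estimate.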
When dedup is enabled, the dedup checksum algorithm overrides the checksum property. Setting the property value to verify is equivalent to specifying sha256,verify. If the property is set to verify and two blocks have the same signature, ZFS does a byte-by-byte comparison with the existing block to ensure that the contents are identical.
This property can be enabled per file system. For example:
$ zfs set dedup=on tank/home
You can use the zfs get command to determine if the dedup property is set.
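A short sketch of checking the property from a script follows. The -H and -o value options make zfs get print only the property value, without headers. The helper name dedup_enabled is hypothetical, and treating every value other than off as enabled is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a dedup property value means "enabled".
# Assumption: any value other than "off" (for example on, verify, or
# sha256,verify) turns deduplication on.
dedup_enabled() {
    case "$1" in
        ""|off) return 1 ;;
        *)      return 0 ;;
    esac
}

# Typical use on a system with ZFS (not run here):
#   value=$(zfs get -H -o value dedup tank/home)
#   if dedup_enabled "$value"; then echo "dedup is $value"; fi
```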
Although deduplication is set as a file system property, its scope is pool-wide. For example, you can identify the deduplication ratio for the whole pool:
$ zpool list rpool
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool   136G  55.2G  80.8G  40%  2.30x  ONLINE  -
The DEDUP column indicates how much deduplication has occurred. If the dedup property is not enabled on any file system or if the dedup property was just enabled on the file system, the DEDUP ratio is 1.00x.
You can use the zpool get command to determine the value of the dedupratio property. For example:
$ zpool get dedupratio rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  dedupratio  3.00x  -
This pool property illustrates how much data deduplication this pool has achieved.
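To put the ratio in more familiar terms, it can be converted into an approximate percentage of space saved. This is a sketch under the assumption that the ratio expresses referenced data divided by allocated data, so the saving is 1 - 1/ratio. The helper name dedup_savings_pct is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a dedupratio value such as "3.00x" into the
# approximate percentage of space saved, computed as (1 - 1/ratio) * 100.
# Assumption: the ratio is referenced size divided by allocated size.
dedup_savings_pct() {
    awk -v v="$1" 'BEGIN {
        sub(/x$/, "", v)        # drop the trailing "x"
        r = v + 0
        if (r < 1) r = 1        # guard against empty or malformed input
        printf "%.1f\n", (1 - 1 / r) * 100
    }'
}

# Typical use on a system with ZFS (not run here):
#   dedup_savings_pct "$(zpool get -H -o value dedupratio rpool)"
```

For the 3.00x value shown above, this prints 66.7. A ratio of 1.00x prints 0.0, meaning no space has been saved.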