The dedup Property

The dedup property controls whether duplicate data is removed from a file system. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored and common components are shared between files.

Do not enable the dedup property on file systems that reside on production systems until you review the following considerations:

  1. Determine if your data would benefit from deduplication space savings. You can run the zdb -S command to simulate the potential space savings of enabling dedup on your pool. This command must be run on a quiet pool. If your data cannot be deduplicated, then there is no point in enabling dedup. For example:

    $ zdb -S tank
    Simulated DDT histogram:
    bucket              allocated                       referenced
    ______   ______________________________   ______________________________
    refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
    ------   ------   -----   -----   -----   ------   -----   -----   -----
    1         2.27M    239G    188G    194G    2.27M    239G    188G    194G
    2          327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
    4         30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
    8         7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
    16          673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
    32          197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
    64           47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
    128          22    908K    250K    251K    3.71K    150M   40.3M   40.3M
    256           7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
    512           4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
    2K            1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
    8K            1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
    Total     2.63M    277G    218G    225G    3.22M    337G    263G    270G
    
    dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50

    If the estimated dedup ratio is greater than 2, then you might see dedup space savings.

    In the above example, the estimated dedup ratio is 1.20, which is less than 2, so enabling dedup is not recommended.

  2. Make sure your system has enough memory to support dedup.

    • Each in-core dedup table entry is approximately 320 bytes.

    • Multiply the number of allocated blocks by 320 bytes. For example:

      in-core DDT size = 2.63M x 320 = 841.60M

  3. Dedup performance is best when the deduplication table fits into memory. If the dedup table has to be written to disk, then performance will decrease. For example, removing a large file system with dedup enabled will severely decrease system performance if the system does not meet the memory requirements described above.

  4. You cannot use deduplication in the case of datasets with encryption. For example, a file system and a volume are two different datasets and deduplication cannot match the two together.
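The memory estimate in step 2 can be sketched as a small shell function. This is only an illustration of the arithmetic: ddt_mem_estimate is a hypothetical helper name, not a ZFS command, and the 320-byte entry size is the approximation given above. The result keeps the same unit suffix as the block count, as in the worked example.

```shell
# ddt_mem_estimate is a hypothetical helper, not part of ZFS.
# $1 = allocated block count from the "Total" row of zdb -S (for example 2.63M)
# $2 = approximate bytes per in-core DDT entry (defaults to 320)
ddt_mem_estimate() {
    awk -v blocks="$1" -v entry_bytes="${2:-320}" 'BEGIN {
        suffix = ""
        last = substr(blocks, length(blocks), 1)
        # Keep a trailing unit suffix (K, M, G, T) and reattach it to the result.
        if (last ~ /[KMGT]/) {
            suffix = last
            blocks = substr(blocks, 1, length(blocks) - 1)
        }
        printf "%.2f%s\n", blocks * entry_bytes, suffix
    }'
}

ddt_mem_estimate 2.63M
# prints 841.60M
```

Compare the result with the memory available on the system before enabling dedup.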

When dedup is enabled, the dedup checksum algorithm overrides the checksum property. Setting the property value to verify is equivalent to specifying sha256,verify. If the property is set to verify and two blocks have the same signature, ZFS does a byte-by-byte comparison with the existing block to ensure that the contents are identical.

This property can be enabled per file system. For example:

$ zfs set dedup=on tank/home

You can use the zfs get command to determine if the dedup property is set.
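A minimal sketch of that check, assuming the tank/home file system from the example above. The wrapper name show_dedup is hypothetical. The -H option omits the header line and -o value prints only the property value, which is convenient in scripts.

```shell
# show_dedup is a hypothetical helper name, not a ZFS command.
# Prints only the value of the dedup property for the dataset given in $1.
show_dedup() {
    zfs get -H -o value dedup "$1"
}

# Usage (requires a system with ZFS and an existing dataset):
#   show_dedup tank/home
```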

Although deduplication is set as a file system property, the scope is pool-wide. For example, you can identify the deduplication ratio for a pool as follows:

$ zpool list rpool
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
rpool   136G  55.2G  80.8G    40%  2.30x  ONLINE  -

The DEDUP column indicates how much deduplication has occurred. If the dedup property is not enabled on any file system or if the dedup property was just enabled on the file system, the DEDUP ratio is 1.00x.

You can use the zpool get command to determine the value of the dedupratio property. For example:

$ zpool get dedupratio rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  dedupratio  3.00x  -

This pool property shows how much data deduplication the pool has achieved.
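If a script needs the ratio as a plain number, the output can be filtered with awk. The following is a sketch only: dedup_ratio_value is a hypothetical helper that reads zpool get dedupratio output on standard input and strips the trailing x. The sample input is copied from the output shown above.

```shell
# dedup_ratio_value is a hypothetical helper, not a ZFS command.
# Reads `zpool get dedupratio` output on stdin and prints the ratio
# without the trailing "x". The header line is skipped because its
# second column is PROPERTY, not dedupratio.
dedup_ratio_value() {
    awk '$2 == "dedupratio" { sub(/x$/, "", $3); print $3 }'
}

# Sample input copied from the example output.
printf '%s\n' \
    'NAME   PROPERTY    VALUE  SOURCE' \
    'rpool  dedupratio  3.00x  -' | dedup_ratio_value
# prints 3.00
```

On a live system, the same filter could be fed directly, for example: zpool get dedupratio rpool | dedup_ratio_value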