Deduplicated Replication

Deduplicated replication reduces the amount of data that replication jobs send over the network. This feature is useful for lowering the bandwidth requirements of replication, especially over a high-latency, low-bandwidth, or high-cost network.

Note:

This feature adds pre-processing time and increases memory overhead. Its effectiveness is highly data dependent: the more duplicate data a replication stream contains, the greater the savings. It is strongly recommended to verify the deduplication savings with representative datasets before using this feature in a production environment.
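As a rough offline check of how much duplicate data a sample dataset contains, the following Python sketch hashes fixed-size blocks and reports the fraction that repeat. It is an illustration only: the function name, the 128 KB default block size, and the use of SHA-256 are assumptions, not a description of how the appliance deduplicates a replication stream. The replication statistics remain the authoritative measurement.

```python
import hashlib
import io
from typing import BinaryIO


def duplicate_block_fraction(stream: BinaryIO, block_size: int = 128 * 1024) -> float:
    """Return the fraction of fixed-size blocks that repeat an earlier block.

    Hypothetical helper: block_size and the hash are illustrative choices.
    """
    seen = set()   # digests of the distinct blocks read so far
    total = 0      # number of blocks read
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += 1
        seen.add(hashlib.sha256(block).digest())
    if total == 0:
        return 0.0
    return 1.0 - len(seen) / total


# Synthetic sample: three identical 4 KB blocks followed by one different block.
sample = io.BytesIO(b"A" * 4096 * 3 + b"B" * 4096)
print(duplicate_block_fraction(sample, block_size=4096))  # 0.5
```

To try this on real data, open a representative file in binary mode and pass the file object in place of the synthetic sample.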

Deduplicated replication is disabled by default. To enable deduplicated replication for individual replication actions, click Enable deduplication in the Add Replication Action dialog box in the BUI, or set the dedup property to true in the CLI.

Deduplicated Replication Statistics

In the CLI, each replication update action has a stats node. The stats node records information about the most recent replication update, as well as the accumulated statistics over the lifetime of the replication action. To view statistics for a specific update that was not the most recent update, see the finish alert for the update as described in Start and Finish Alerts.

The stats node properties of a replication action quantify:

  • On-disk compression benefits

  • Deduplication benefits

  • Replication data stream compression benefits

  • Replication update duration

  • Deduplication table construction time (before data is sent)

  • Maximum memory consumption of the deduplication tables

Table "Replication Action stats Node Properties (CLI Read-Only)" in Replication Action Properties describes the action stats node properties of a deduplicated replication stream. See especially properties with dedup and dd_ in their names.

Measuring Deduplicated Replication Statistics

When deduplication is enabled for a replication stream, the data is transformed through several layers of deduplication and compression. Data rates are measured and recorded as the data is transformed.

To determine whether deduplication was effective for the replication action, examine the replication statistics in the stats node of a replication action in the CLI, or in finish alerts in the BUI or the CLI.

Single Deduplicated Replication Update Benefits Comparison

  • In the BUI, use the replication finish alerts to compare the phys_bytes and after_dedup statistics to evaluate the benefit of deduplicated replication. For information about replication finish alerts, see Start and Finish Alerts.

  • In the CLI, either compare the phys_bytes and after_dedup statistics in the replication finish alerts, or compare the last_phys_bytes and last_after_dedup statistics in the replication action stats node, to evaluate the benefit of deduplicated replication. For information about statistics in the stats node, see table "Replication Action stats Node Properties (CLI Read-Only)" in Replication Action Properties.
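The comparison of the two statistics can be expressed as a savings fraction. The following Python sketch assumes that both values are byte counts copied by hand from a finish alert or from the stats node; the function name and the sample numbers are hypothetical.

```python
def dedup_savings(phys_bytes: int, after_dedup: int) -> float:
    """Return the fraction of physical bytes removed by deduplication."""
    if phys_bytes <= 0:
        # Nothing was sent, so there is nothing to save.
        return 0.0
    return 1.0 - after_dedup / phys_bytes


# Hypothetical values, for example last_phys_bytes and last_after_dedup.
last_phys_bytes = 1_000_000_000
last_after_dedup = 750_000_000

print(f"dedup savings: {dedup_savings(last_phys_bytes, last_after_dedup):.1%}")
# prints "dedup savings: 25.0%"
```

A result near zero suggests that the update contained little duplicate data, in which case the pre-processing and memory cost of deduplication might outweigh the bandwidth benefit.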

Averaged Deduplicated Replication Updates Benefits Comparison

To determine the average benefit of all deduplicated replication updates performed by a replication action, compare the dd_total_phys_bytes and dd_total_after_dedup statistics in the stats node of that replication action. For information about statistics in the stats node, see table "Replication Action stats Node Properties (CLI Read-Only)" in Replication Action Properties.
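Assuming that the dd_total_ statistics are plain sums of the per-update byte counts (an assumption based on their description as accumulated statistics), their ratio is a byte-weighted average, so large updates influence it more than small ones. The following Python sketch uses hypothetical per-update values to show how this can differ from a simple mean of per-update savings.

```python
def savings(phys_bytes: int, after_dedup: int) -> float:
    """Fraction of physical bytes removed by deduplication."""
    return 1.0 - after_dedup / phys_bytes if phys_bytes > 0 else 0.0


# Hypothetical (phys_bytes, after_dedup) pairs for two updates.
updates = [(1000, 500), (100, 100)]

# Accumulated totals, standing in for dd_total_phys_bytes and dd_total_after_dedup.
dd_total_phys_bytes = sum(phys for phys, _ in updates)
dd_total_after_dedup = sum(after for _, after in updates)

weighted = savings(dd_total_phys_bytes, dd_total_after_dedup)
unweighted = sum(savings(p, a) for p, a in updates) / len(updates)

print(f"from accumulated totals: {weighted:.1%}")    # 45.5%
print(f"mean of per-update savings: {unweighted:.1%}")  # 25.0%
```

The figure derived from the accumulated totals reflects the overall reduction in bytes sent, which is usually the quantity of interest when sizing network bandwidth.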