Oracle® ZFS Storage Appliance Administration Guide, Release OS8.7.0

Updated: July 2017
 
 

Deduplicated Replication

Deduplicated Replication reduces the amount of data that replication jobs send over the network. This lowers the network bandwidth that replication requires, which is especially useful over a high-latency, low-bandwidth, high-cost network.


Note -  This feature imposes a cost in pre-processing time and increased memory overhead. The effectiveness of deduplication is highly data-dependent, so it is strongly recommended that you verify the deduplication savings with representative datasets before using this feature in a production environment. The more duplicate data the replicated datasets contain, the more efficient Deduplicated Replication is.

Deduplicated Replication is disabled by default. It can be enabled for individual replication actions, as shown in the following BUI figure.

[Figure: Dedupe property in a replication action]

Deduplicated Replication Statistics

Each replication action has a stats node, which records information about the most recent replication update, as well as the accumulated statistics over the lifetime of the replication action.

These stats fields quantify:

  • On-disk compression benefits

  • Deduplication benefits

  • Replication data stream compression benefits

  • Replication update duration

  • Deduplication tables construction time (before sending data)

  • Deduplication tables maximum memory consumption

The stats node of a deduplicated replication stream has the following read-only properties:

Table 135  Replication Action: stats Node Properties

Property Name     Description
logical_bytes     Number of bytes that the replication update data stream would have contained if the data on disk had not been compressed and without any subsequent compression or deduplication.
phys_bytes        Number of bytes in the internal replication data stream prior to replication deduplication or replication data stream compression.
after_dedup       Number of bytes in the internal replication data stream after any deduplication of the replication data stream.
to_network        Number of bytes that the replication data stream compression pipeline delivered to the network. This shows the consequence of replication data stream compression, if enabled.
duration          Total time required to perform the replication update.
dd_table_build    Time spent building the deduplication tables prior to the actual transmission of the replication update.
dd_table_mem      Maximum amount of memory that was consumed by the deduplication tables.

To list the stats node fields, navigate to the replication action, enter the stats node, and then enter get:

hostname:shares testproj action-001> stats
hostname:shares testproj action-001 stats>
hostname:shares testproj action-001 stats> get
Properties:
          replica_data_timestamp = Thu Apr 21 2016 06:14:58 GMT+0000 (UTC)
                       last_sync = Thu Apr 21 2016 17:50:18 GMT+0000 (UTC)
                        last_try = Thu Apr 21 2016 17:50:18 GMT+0000 (UTC)
                     last_result = success
              last_logical_bytes = 5.80401479T
                 last_phys_bytes = 3.57996902T
                last_after_dedup = 953.489698G
                 last_to_network = 943.954802G
                   last_duration = 11:35:26
             last_dd_table_build = 02:57:10
               last_dd_table_mem = 3.5273976G
                   total_updates = 40
             total_logical_bytes = 232.16591T
                total_phys_bytes = 143.198761T
               total_after_dedup = 90.2222261T
                total_to_network = 90.0359976T
                  total_duration = 404:34:00
                dd_total_updates = 20
          dd_total_logical_bytes = 116.080296T
             dd_total_phys_bytes = 71.5993804T
            dd_total_after_dedup = 18.6228456T
             dd_total_to_network = 18.4366172T
               dd_total_duration = 231:48:40
            dd_total_table_build = 59:03:20
              dd_total_table_mem = 70.547952G
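
The get output above can be post-processed outside the appliance. The following Python sketch is one way to do that, not part of the appliance software: the helper names parse_stats, to_bytes, and to_seconds are hypothetical, and it assumes that the K, M, G, and T suffixes are binary multiples and that durations are printed as hours:minutes:seconds. It uses three lines copied from the sample output to estimate how much of the last update was spent building the deduplication tables.

```python
import re

# Assumption: size suffixes are binary multiples (K = 2**10 ... T = 2**40).
_UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_stats(text):
    """Return {property name: raw string value} from 'name = value' lines."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip lines such as 'Properties:' that carry no value
            stats[name.strip()] = value.strip()
    return stats


def to_bytes(value):
    """Convert a size such as '953.489698G' to a float number of bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def to_seconds(value):
    """Convert an hours:minutes:seconds duration such as '404:34:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Three lines copied from the sample output.
sample = """\
                   last_duration = 11:35:26
             last_dd_table_build = 02:57:10
               last_dd_table_mem = 3.5273976G
"""

stats = parse_stats(sample)
build_share = to_seconds(stats["last_dd_table_build"]) / to_seconds(stats["last_duration"])
print(f"Table build share of last update: {build_share:.1%}")
print(f"Peak table memory in bytes: {to_bytes(stats['last_dd_table_mem']):.0f}")
```

For the sample values, table construction accounts for roughly a quarter of the update duration, which illustrates the pre-processing cost of the feature.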

Recent replication statistics are also recorded as send alerts, which can be viewed in the BUI and CLI. For more information, see Replication Alerts.

Measuring Deduplicated Replication Statistics

When deduplication is enabled for a replication stream, the data is transformed through several layers of deduplication and compression. Data rates are measured and recorded as the data is transformed. These statistics are recorded in the stats node of a replication action.
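
The layered transformation can be made concrete by comparing each stage of the stream with the one before it. The following Python sketch does this with the last_ values from the sample get output in Deduplicated Replication Statistics. The function name stage_reductions is hypothetical, and the conversion assumes that 1T equals 1024G.

```python
# Assumption: G and T are binary multiples, so 1T = 1024G.
G_PER_T = 1024

# last_* values copied from the sample get output, expressed in G,
# in the order in which the data stream is transformed.
stages = [
    ("logical_bytes", 5.80401479 * G_PER_T),
    ("phys_bytes", 3.57996902 * G_PER_T),
    ("after_dedup", 953.489698),
    ("to_network", 943.954802),
]


def stage_reductions(stages):
    """Return (stage name, fraction of the previous stage's bytes removed)."""
    results = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        results.append((name, 1 - after / before))
    return results


for name, saved in stage_reductions(stages):
    print(f"{name:>12}: {saved:.1%} smaller than the previous stage")
```

Under these assumptions, the sample update shrinks by about 38% from on-disk compression, by about 74% from deduplication, and by about 1% from replication data stream compression.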

To determine if deduplication was sufficiently effective for the replication action, examine the replication statistics.

Single Deduplicated Replication Update Benefits Comparison

  • In the BUI, use the replication finish alerts to compare the phys_bytes and after_dedup statistics to gauge the benefit of deduplicated replication. For information on replication alerts, see Replication Alerts.

  • In the CLI, use the replication action stats node to compare last_phys_bytes and last_after_dedup statistics to gauge the benefit of deduplicated replication. For information on the stats node, see Deduplicated Replication Statistics.

Averaged Deduplicated Replication Updates Benefits Comparison

  • To gauge the average benefit of all deduplicated replication updates performed by this replication action, use the replication action stats node to compare the dd_total_phys_bytes and dd_total_after_dedup statistics. For information on the stats node, see Deduplicated Replication Statistics.
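
As a worked example of this comparison, the following Python sketch uses the dd_total_ values from the sample get output in Deduplicated Replication Statistics. Both byte counts are reported in T, so no unit conversion is needed. The variable names dedup_savings, dedup_ratio, and avg_saved_per_update are illustrative only, and the per-update average assumes that the dd_total_ counters are sums over the dd_total_updates deduplicated updates.

```python
# Values copied from the sample get output; both byte counts are in T.
dd_total_phys_bytes = 71.5993804
dd_total_after_dedup = 18.6228456
dd_total_updates = 20

# Average fraction of the pre-deduplication stream removed by deduplication.
dedup_savings = 1 - dd_total_after_dedup / dd_total_phys_bytes

# The same benefit expressed as a ratio (bytes in per byte out).
dedup_ratio = dd_total_phys_bytes / dd_total_after_dedup

# Assumption: dd_total_* values are sums over the deduplicated updates.
avg_saved_per_update = (dd_total_phys_bytes - dd_total_after_dedup) / dd_total_updates

print(f"Average deduplication savings: {dedup_savings:.1%}")
print(f"Average deduplication ratio:   {dedup_ratio:.2f}:1")
print(f"Average saved per update:      {avg_saved_per_update:.2f}T")
```

For the sample values, deduplication removes about 74% of the stream on average. Whether that is sufficient depends on how it compares with the table build time and memory that the same stats node reports.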