Try to determine if the replication divergence is a result of low disk performance on the consumer using the output of the iostat tool. For more information about diagnosing disk performance problems, see Example: Troubleshooting a Replication Problem Using RUVs and CSNs.
Replication divergence is typically the result of one of the following:
The supplier is not fast enough when sending the data to the consumer. For example, the supplier's changelog has low in-memory cache settings. Confirm these settings by looking at the nsslapd-cachememsize and nsslapd-cachesize attribute values in the cn=changelog5,cn=config entry.
The nsslapd-cachememsize attribute specifies the changelog, or database, cache size in terms of the available memory space. The nsslapd-cachesize attribute specifies the replication changelog, or database, cache size in terms of the number of entries it can hold.
The network's capacity is not large enough to guarantee transport speed at the rate that updates are generated. Network capacity may be the problem when operating over a very low bandwidth link.
On Directory Server 5.1, the network latency is too large to guarantee transport speed at the rate that updates are generated. Network latency can cause problems with Directory Server 5.1 because the replication transport protocol is synchronous.
The consumer is not fast enough to apply the changes it receives. For example, consumer speed can be an issue when disk usage is saturated or when other expensive operations (unindexed searches, for example) run in parallel with replication.
If you are working on the 5.1 version of Directory Server and are experiencing replication divergence, it may be a result of protocol limits. Replication in 5.1 is synchronous, and therefore is not supported over a WAN. If you are replicating over a WAN, you must upgrade.
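Rough arithmetic shows why a synchronous protocol is sensitive to latency. The following is only a sketch under a simplifying assumption: each update costs one full network round trip, because the supplier waits for the consumer's acknowledgement before sending more, and processing time on both servers is ignored. The function names are hypothetical and are not part of Directory Server.

```python
def max_sync_updates_per_second(rtt_ms):
    """Upper bound on updates per second when every update waits one round trip.

    rtt_ms is the measured round-trip time in milliseconds (for example,
    the average reported by ping). This is an idealized bound, not a
    measurement of what the server will actually achieve.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


def link_keeps_up(rtt_ms, generated_per_second):
    """Return True if the idealized bound covers the supplier's update rate."""
    return max_sync_updates_per_second(rtt_ms) >= generated_per_second
```

Under this assumption, a 100 ms round trip allows at most 10 updates per second, so a supplier that receives 15 updates in one second falls behind, while a 1 ms LAN round trip leaves ample headroom.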
If replicating over a LAN, verify the network latency between the supplier and consumers using the ping command. In the 5.1 version of Directory Server, a supplier can only send changes once it receives an acknowledgement from the consumer. This results in consumer downtime that may resemble a halt when in fact the exchange is only slow. For example, you may update a password, but the new password does not go into effect immediately, giving you the impression that you are experiencing a replication divergence.
Analyze the access log of the supplier and see how many updates are received, second by second. For example, the supplier access log should show varied traffic for each second, such as:
13:07:04 14
13:07:05 10
13:07:06 15
13:07:07 5
Next, look in the access log of the consumer. It may show the same number of updates every second, suggesting a bottleneck:
13:07:04 8
13:07:05 8
13:07:06 8
13:07:07 8
If you are experiencing a problem of this kind, it may be the result of your method of network access, bandwidth, or small links.
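Per-second counts like the ones shown above can be produced with a short script. The sketch below is a minimal illustration, not a supported tool. It assumes access log lines that start with a bracketed timestamp such as [21/Apr/2007:13:07:04 -0700] and that name the operation (ADD, MOD, DEL, or MODRDN) as a separate word. Check this against your own logs and adjust the pattern if they differ. The function names and the flatness check are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (verify against your own access log):
# [21/Apr/2007:13:07:04 -0700] conn=11 op=1 msgId=2 - MOD dn="uid=a,dc=example,dc=com"
UPDATE_RE = re.compile(
    r"^\[\d{2}/\w{3}/\d{4}:(\d{2}:\d{2}:\d{2})[^\]]*\].*?\s(ADD|MOD|DEL|MODRDN)\s"
)


def updates_per_second(lines):
    """Count update operations for each HH:MM:SS timestamp seen in the log."""
    counts = Counter()
    for line in lines:
        match = UPDATE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(sorted(counts.items()))


def looks_capped(counts, min_seconds=4):
    """Heuristic: the same count in every second hints at a bottleneck.

    min_seconds is an arbitrary threshold chosen for this sketch.
    """
    values = list(counts.values())
    return len(values) >= min_seconds and len(set(values)) == 1
```

To use it, pass an open log file to updates_per_second once for the supplier and once for the consumer, then compare the two results. Varied counts on the supplier next to a flat count on the consumer match the pattern described above.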