How Replication Works

The topics in this section describe the mechanics involved in the replication process and how specific functionality is achieved.

Replication Initialization
Directory Server Change Processing
Replication Server Selection
Change Replay
Auto Repair
Directory Server Crashes
Replication Server Crashes

Replication Initialization

Before a server can participate in a replicated topology, that server must be initialized with data. That is, a complete data set must be copied onto the server in some way. For information about the ways in which a server can be initialized with data, see Initializing a Replicated Server With Data in Oracle Fusion Middleware Administration Guide for Oracle Unified Directory.

Directory Server Change Processing

When a modification occurs on a directory server, replication code on the directory server performs the following tasks:

Assigns a change number
Generates historical information
Forwards the change to a replication server
Updates the server state

Historical information is stored in the entry and must therefore be included in the operation before the server writes to the back end. The server uses the change number when generating historical information. The change number is therefore generated before the historical information. Both the change number and the historical information are performed as part of the pre-operation phase.

The operation is sent to the replication server before an acknowledgment for the update is sent to the client application that requested the operation. This ensures that a synchronous, assured replication mode can be implemented. For more information, see Assured Replication. The acknowledgment is therefore sent as part of the post-operation phase.

Changes are sent in the order defined by their change numbers. The correct order enables replication servers to make sure that all the changes are forwarded to other directory servers.

Because a directory server is multi-threaded, post-operation plug-ins can be called in a different order to pre-operation plug-ins, for the same operation. The replication code maintains a list of pending changes. This list includes changes that have started, and for which change numbers have already been generated, but that have not yet been sent to the replication server. Changes are added to the list of pending changes in the pre-operation phase. Changes are removed from the list when they are sent to the replication server. If a specific operation reaches the post-operation phase ahead of its change number-defined position, that operation waits until previous operations are sent before being sent to the replication server.

The server state is updated when the operation is sent to the replication server. For more information, see Replication Server State.

Replication Server Selection

When a directory server starts (or when the replication server to which it is connected is stopped), the directory server selects a suitable replication server for publishing and receiving changes. This section describes how the replication server is selected.

Replication Server Selection Algorithm

The directory server uses the following principles to select a suitable replication server:

Filtering. To begin, the directory server creates a list of “eligible” replication servers, from all of the configured replication servers in the topology. The list is created based on the following criteria:
1. Replication servers that have the same group ID (or geographic identifier) as the directory server.
2. Replication servers that have the same generation ID (initial data set) as the directory server.
3. Replication servers that include all of the latest updates that were generated from the directory server.
4. Replication servers that run in the same virtual machine as the directory server.
Note - These criteria are listed in order of preference. So, for example, if a replication server has the same generation ID (criterion 2) as the directory server but does not have the same group ID (criterion 1), it will not be included in the list, unless no replication server in the topology has the same group ID as the directory server.
Load Balancing. When the directory server has compiled a list of eligible replication servers, it selects a replication server in a manner that balances the load across all the replication servers in the topology. This selection is made in accordance with the replication server weight in the topology. For more information, see Replication Server Load Balancing.

Replication Server Load Balancing

In large topologies with several directory servers and several replication servers, it is more efficient to spread the directory servers out across the replication servers in a predefined manner. This is particularly important if the replication servers are running on different types of machines, with different capabilities. If the estimated “global power” of the machines differs significantly from one replication server to another, it is useful to balance the load on the replication servers according to their power.

You can configure the proportional weight of a replication server so that the number of directory servers connecting to each replication server is balanced efficiently. Replication server weight is defined as an integer (1..n). Each replication server in a topology has a default weight of 1. This weight only has meaning in its comparison to the weights of other replication servers in the topology.

The replication server weight determines the proportion of the directory servers currently in the topology that should connect to this particular replication server. The replication server weight is configured as a fraction of the estimated global power of all the replication servers in the topology. For example, if replication server A is estimated to be twice as powerful as replication server B, the weight of replication server A should be twice the weight of replication server B.

The weight of a particular replication server can be represented as (ⁿ/_d) where n is the weight of the replication server and d is the sum of the weights of all the replication servers in the topology.

For information about configuring the replication server weight, see Configuring the Replication Server Weight in Oracle Fusion Middleware Administration Guide for Oracle Unified Directory.

Change Replay

The replay of changes on replicated directory servers is efficient on multi-core and multi-CPU systems. On a directory server, multiple threads read the changes sent by the replication server.

Dependency information is used to decide whether an operation can be replayed immediately. The server checks the server state and the list of operations on which the current operation depends to determine whether those operations have been replayed. If the operations have not been replayed, the server puts the operation in a queue that holds dependency operations. If the operation can be replayed, the server builds an internal operation from information sent by replication servers. The server then runs the internal replay operation.

Internal replay operations built from the operations that are sent by a replication server can conflict with prior operations. Such internal operations cannot therefore always be replayed as if they were taking place on the original directory server. The server checks for conflicts when processing the handleConflictResolution phase.

In the majority of cases, the internal replay operations do not conflict with prior operations. In such cases, the handleConflictResolution phase does nothing. The replication code is therefore optimized to return quickly.

When a conflict does occur, the handleConflictResolution code takes the appropriate action to resolve the conflict. For modify conflicts, the handleConflictResolution code changes the modifications to retain the most recent changes.

When conflict resolution is handled, historical information is updated as for local operations. The operation can then be processed by the core server. Finally, at the end of the operation, the server state is updated.

After completing an operation, the server thread processing the operation checks whether an operation in the dependency queue was waiting for the current operation to complete. If so, that operation is eligible to be replayed, so the thread starts the replay process for the eligible operation. If not, the thread listens for further operations from the replication server.

Auto Repair

Despite efforts to keep servers in sync, directory servers can begin to show incoherent data. Typically, this occurs in the following circumstances:

A disk error taints the stored data
A memory error leads to an error in processing data
A software bug leads to bad data or missing changes

In such cases, tracking and replaying changes is not sufficient to synchronize the incoherent data.

An automatic repair mechanism is provided, which leverages historical information inside entries to determine what the coherent data should be. The replication mechanism then repairs the data on directory servers where the data is bad or missing. The auto repair mechanism is implemented as an LDAP application, and runs on the hosts that run replication servers.

The auto repair application can run in different modes. Depending on the mode in which it is run, the auto repair application performs the following tasks:

Repairs inconsistencies manifested as an error when the server was replaying modifications
Repairs inconsistencies detected by the administrator
Periodically scans directory entries to detect and repair inconsistencies

Note - In the current directory server release, the auto repair mechanism must be run manually. For more information, see Detecting and Resolving Replication Inconsistencies in Oracle Fusion Middleware Administration Guide for Oracle Unified Directory.

Directory Server Crashes

If a directory server crashes, its connection to the replication server is lost. Recent changes that the directory server has processed and committed to its database might not yet have been transmitted to any replication server.

When a directory server restarts, therefore, it must compare its state with the server state of the replication servers to which the directory server connects. If the directory server detects that changes are missing and not yet sent to a replication server, the directory server constructs fake operations from historical information. The directory server sends these fake operations to its replication server.

Because the local server state is not saved after each operation, the directory server cannot trust its saved server state after a crash. Instead, it recalculates its server update state, based on historical information.

Replication Server Crashes

If a replication server crashes, directory servers connect to another replication server in the topology. The directory servers then check for and, if necessary, resend missing changes.

Skip Navigation Links
Exit Print View
	Oracle Fusion Middleware Architecture Reference for Oracle Unified Directory 11g Release 1 (11.1.1)