Replication is the mechanism that automatically copies directory data and changes from one directory server to another directory server. With replication, you can copy a directory tree or subtree that is stored in its own suffix between servers.
You cannot copy the configuration or monitoring information subtrees.
By replicating directory data across servers, you can reduce the access load on a single machine, improving server response time and providing read scalability. Replicating directory entries to a location close to your users also improves directory response time. Replication is generally not a solution for write scalability.
The replication mechanism is described in detail in Chapter 4, Directory Server Replication, in Sun Java System Directory Server Enterprise Edition 6.3 Reference. The following section provides basic information that you need to understand before reviewing the sample topologies described later in this chapter.
A database that participates in replication is defined as a replica.
Directory Server distinguishes between three kinds of replicas:
Master or read-write replica. A read-write database that contains a master copy of the directory data. A master replica can process update requests from directory clients. A topology that contains more than one master is called a multi-master topology.
Consumer replica. A read-only database that contains a copy of the information in the master replica. A consumer replica can process search requests from directory clients but refers update requests to master replicas.
Hub replica. A read-only database (like a consumer replica) that is stored on a Directory Server that supplies one or more consumer replicas.
The following figure illustrates the role of each of these replicas in a replication topology.
The previous figure is for illustration purposes only and is not necessarily a recommended topology. Directory Server 6.x supports an unlimited number of masters in a multi-master topology. A master-only topology is recommended in most cases.
A Directory Server that replicates to other servers is called a supplier. A Directory Server that is updated by other servers is called a consumer. The supplier replays all updates on the consumer through specially designed LDAP v3 extended operations. In terms of performance, a supplier is therefore likely to be a demanding client application for the consumer.
A server can be both a supplier and a consumer, as in the following situations:
In multi-master replication, a master replica is mastered on two different Directory Servers. Each server acts as a supplier and a consumer of the other server.
When the server contains a hub replica, the server receives updates from a supplier and replicates the changes to consumers.
A server that plays the role of a consumer only is called a dedicated consumer.
For a master replica, the server must do the following:
Respond to update requests from directory clients
Maintain historical information and a change log
Initiate replication to consumers
The server that contains the master replica is responsible for recording any changes made to the master replica and for replicating these changes to consumers.
For a hub replica, the server must do the following:
Respond to read requests
Refer update requests to the servers that contain a master replica
Maintain historical information and a change log
Initiate replication to consumers
For a consumer replica, the server must do the following:
Respond to read requests
Maintain historical information
Refer update requests to the servers that contain a master replica
In a multi-master replication configuration, data can be updated simultaneously in different locations. Each master maintains a change log for its replica. The changes that occur on each master are replicated to the other servers.
Multi-master configurations have the following advantages:
Automatic write failover occurs when one master is inaccessible.
Updates can be made on a local master in a geographically distributed environment.
Multi-master replication uses a loose consistency replication model. This means that the same entries may be modified simultaneously on different servers. When updates are sent between the two servers, any conflicting changes must be resolved. Various attributes of a WAN, such as latency, can increase the chance of replication conflicts. Conflict resolution generally occurs automatically. A number of conflict rules determine which change takes precedence. In some cases conflicts must be resolved manually. For more information, see Solving Common Replication Conflicts in Sun Java System Directory Server Enterprise Edition 6.3 Administration Guide.
The number of masters that are supported in a multi-master topology is theoretically unlimited. The number of consumers and hubs is also theoretically unlimited. However, the number of consumers to which a single supplier can replicate depends on the capacity of the supplier server. You can use the SLAMD Distributed Load Generation Engine (SLAMD) to assess the capacity of the supplier server. For information about SLAMD, and to download the SLAMD software, see http://www.slamd.com.
The smallest unit of replication is the suffix. The replication mechanism requires one suffix to correspond to one database. You cannot replicate a suffix, or namespace, that is distributed over two or more databases using custom distribution logic. The unit of replication applies to both consumers and suppliers, which means that you cannot replicate two suffixes to a consumer that holds only one suffix.
Every server that acts as a supplier maintains a change log. A change log is a record that describes the modifications that have occurred on a master replica. The supplier replays these modifications to its consumers. When an entry is modified, renamed, added, or deleted, a change record that describes the LDAP operation is recorded in the change log.
Directory Server uses replication agreements to define how replication occurs between two servers. A replication agreement describes replication between one supplier and one consumer.
A replication agreement identifies the following:
The suffix to replicate
The consumer server to which the data is pushed
The times during which replication can occur
The bind DN and credentials that the supplier must use to bind to the consumer
How the connection is secured, for example, with SSL or client authentication
Information about the replication status for this particular agreement
Information about replication filtering
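As an illustration only, the items listed above can be pictured as fields of a single record. The following Python sketch is not the Directory Server configuration schema: the class name, every field name, and the may_replicate_at helper are hypothetical, chosen to mirror the list one item at a time.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional


@dataclass
class ReplicationAgreement:
    """Hypothetical model of one supplier-to-consumer agreement."""
    suffix: str                      # the suffix to replicate
    consumer_host: str               # the consumer the data is pushed to
    consumer_port: int
    bind_dn: str                     # identity the supplier binds as
    bind_password: str
    security: str = "none"           # assumed labels, e.g. "ssl" or "client-auth"
    # Time windows during which replication can occur; empty means always.
    windows: list[tuple[time, time]] = field(default_factory=list)
    # Replication filtering, modeled here as attributes that are not sent.
    excluded_attributes: frozenset[str] = frozenset()
    last_status: Optional[str] = None  # status for this particular agreement

    def may_replicate_at(self, moment: time) -> bool:
        """Return True if the given time of day falls inside a window."""
        if not self.windows:
            return True
        return any(start <= moment < end for start, end in self.windows)


agreement = ReplicationAgreement(
    suffix="dc=example,dc=com",
    consumer_host="consumer1.example.com",
    consumer_port=636,
    bind_dn="cn=replication manager,cn=config",
    bind_password="secret",
    security="ssl",
    windows=[(time(1, 0), time(5, 0))],
)
print(agreement.may_replicate_at(time(2, 30)))
```

Because an agreement describes one supplier and one consumer, two masters that replicate to each other would need two such records, one in each direction.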
In versions of Directory Server prior to Directory Server 6.x, updates were replicated in chronological order. In this version of the product, updates can be prioritized for replication. Priority is a boolean feature: it is either on or off, and there are no levels of priority. In a queue of updates waiting to be replicated, updates with priority are replicated before updates without priority.
The priority rules are configured according to the following parameters:
The identity of the client
The type of update
The entry or subtree that was updated
The attributes changed by the update
For more information, see Prioritized Replication in Sun Java System Directory Server Enterprise Edition 6.3 Reference.
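The queueing behavior described above can be sketched as a small model. This is not how Directory Server implements prioritized replication. The Update and ReplicationQueue names are hypothetical, and the sketch assumes that updates within the same priority class keep their chronological order, which the text does not state explicitly.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Update:
    """Hypothetical change record waiting to be replicated."""
    dn: str
    prioritized: bool  # boolean priority: on or off, no levels


class ReplicationQueue:
    """Toy queue: prioritized updates first, arrival order within a class."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = count()  # monotonically increasing arrival number

    def enqueue(self, update: Update) -> None:
        # False sorts before True, so key on "not prioritized".
        # The unique arrival number breaks ties, so Update objects
        # themselves are never compared.
        key = (not update.prioritized, next(self._arrival))
        heapq.heappush(self._heap, (*key, update))

    def dequeue(self) -> Update:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = ReplicationQueue()
queue.enqueue(Update("uid=a", prioritized=False))
queue.enqueue(Update("uid=b", prioritized=True))
queue.enqueue(Update("uid=c", prioritized=False))
queue.enqueue(Update("uid=d", prioritized=True))

order = [queue.dequeue().dn for _ in range(len(queue))]
print(order)  # ['uid=b', 'uid=d', 'uid=a', 'uid=c']
```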
A successfully replicated directory service requires comprehensive testing and analysis in a production environment. However, the following basic calculation enables you to start designing a replicated topology. The sections that follow use the result of this calculation as the basis of the replicated topology design.
Estimate the maximum number of searches per second that are required at peak usage time.
This estimate can be called Total searches.
Test the number of searches per second that a single host can achieve.
This estimate can be called Searches per host. Note that this should be evaluated with replication enabled.
The number of searches that a host can achieve is affected by several variables. Among these are the size of the entries, the capacity of the host, and the speed of the network. A number of third-party performance testing tools are available to assist you in conducting these tests. The SLAMD Distributed Load Generation Engine (SLAMD) is an open source Java application designed for stress testing and performance analysis of network-based applications. SLAMD can be used effectively to perform this part of the replication assessment. For information about SLAMD, and to download the SLAMD software, see http://www.slamd.com.
Calculate the number of hosts that are required.
Number of hosts = Total searches / Searches per host
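The calculation above can be written as a short Python sketch. The function name and the sample figures are hypothetical. Rounding up to a whole host is an assumption added here, because a fractional host cannot be deployed.

```python
import math


def hosts_required(total_searches: float, searches_per_host: float) -> int:
    """Number of hosts = Total searches / Searches per host, rounded up."""
    if total_searches < 0:
        raise ValueError("total_searches must not be negative")
    if searches_per_host <= 0:
        raise ValueError("searches_per_host must be positive")
    return math.ceil(total_searches / searches_per_host)


# Hypothetical figures: 48,000 searches per second at peak, and a measured
# 2,000 searches per second on one host with replication enabled.
print(hosts_required(48_000, 2_000))  # 24
```

The result is only a starting point for the design. As noted above, Searches per host should come from a test run with replication enabled.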
Replication can balance the load on Directory Server in the following ways:
By spreading search activities across several servers
By dedicating specific servers to specific tasks or applications
Generally, if the Number of hosts calculated in Assessing Initial Replication Requirements is about 16, or not significantly larger, your topology should include only master servers in a fully connected topology. Fully connected means that every master replicates to every other master in the topology.
The Number of hosts is approximate and depends on the hardware and other details of the deployment.
The following figure assumes that the Number of hosts is two. LDAP operations are divided between two master servers, based on the type of client application. This strategy reduces the load that is placed on each server and increases the total number of operations that can be served by the deployment.
For a similar scenario in a global deployment, see Using Multi-Master Replication Over a WAN.
If your deployment requires a Number of hosts significantly larger than 16, you might need to add dedicated consumers to the topology.
The following figure assumes that the Number of hosts is 24 and, for simplicity, shows only a portion of the topology. (The remaining 10 servers would have an identical configuration, with a total of 8 masters and 16 consumers.)
A change log can be enabled on any of these consumers if you need to do the following:
Promote the consumer to a master in the event of an outage
Perform a binary initialization from a master to any one of the consumers
If the Number of hosts is several hundred, you might want to add hubs to the topology. In such a case, there should be more hubs than masters, with up to 10 hubs for each master. Each hub should handle replication to only 20 consumers at most.
No topology should have the same number of hubs as masters, or the same number of hubs as consumers.
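The hub guidelines above can be collected into a simple planning check. The following Python sketch is an aid for reasoning about candidate topologies, not a tool shipped with Directory Server. The function name and the message wording are hypothetical, and the numeric limits are taken directly from the guidelines in this section.

```python
def check_hub_topology(masters: int, hubs: int, consumers: int) -> list[str]:
    """Return the guidelines that a candidate topology does not meet."""
    problems = []
    if hubs <= masters:
        problems.append("there should be more hubs than masters")
    if hubs > 10 * masters:
        problems.append("there should be at most 10 hubs for each master")
    if consumers > 20 * hubs:
        problems.append("each hub should replicate to at most 20 consumers")
    if hubs == consumers:
        problems.append("hubs and consumers should not be equal in number")
    return problems


# Hypothetical candidates for a deployment of several hundred hosts.
print(check_hub_topology(masters=8, hubs=40, consumers=400))  # []
print(check_hub_topology(masters=8, hubs=8, consumers=100))
```

An empty list means that the candidate is consistent with these guidelines. It does not replace load testing of the supplier servers.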
When the Number of hosts is large, the use of server groups can simplify the topology and improve resource usage. In a topology with 16 masters, the use of four server groups, each containing four masters, is easier to manage than 16 fully meshed masters.
Setting up such a topology involves the following steps:
Configure the 16 masters, without any replication agreements.
Create four server groups and include four masters in each group.
Set up replication agreements between all the masters in a single group.
Set up replication agreements between the first master of each group, the second master of each group, and so forth.
The following figure shows the resulting topology.
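To see why server groups simplify the topology, the agreement layout produced by the steps above can be enumerated and compared with a full mesh. The following Python sketch uses hypothetical master names, and it assumes that the masters that share a position across groups (all the first masters, all the second masters, and so forth) are fully connected to each other. Each pair of masters corresponds to two replication agreements, one in each direction.

```python
from itertools import combinations


def server_group_pairs(groups: int = 4, per_group: int = 4) -> set[tuple[str, str]]:
    """Return the master pairs linked by the server group layout."""
    # Hypothetical names: g1m1 is the first master of the first group.
    layout = [
        [f"g{g + 1}m{m + 1}" for m in range(per_group)]
        for g in range(groups)
    ]
    pairs = set()
    # Agreements between all the masters in a single group.
    for group in layout:
        pairs.update(combinations(group, 2))
    # Agreements between the masters in the same position of each group.
    for position in range(per_group):
        same_position = [group[position] for group in layout]
        pairs.update(combinations(same_position, 2))
    return pairs


grouped = server_group_pairs()
full_mesh_pairs = 16 * 15 // 2  # every master paired with every other

print(len(grouped), len(grouped) * 2)      # 48 pairs, 96 agreements
print(full_mesh_pairs, full_mesh_pairs * 2)  # 120 pairs, 240 agreements
```

Under these assumptions, each master replicates to six other masters instead of fifteen.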