|iPlanet Directory Server 5.1 Deployment Guide|
Chapter 6 Designing the Replication Process
Replicating your directory contents increases the availability and performance of your directory. In Chapter 4 and Chapter 5, you made decisions about the design of your directory tree and your directory topology. This chapter addresses the physical and geographical location of your data, and specifically, how to use replication to ensure that your data is available when and where you need it.
This chapter discusses uses for replication and offers advice on designing a replication strategy for your directory environment. It contains the following sections:
Introduction to Replication
Common Replication Scenarios
Defining a Replication Strategy
Using Replication with other Directory Features
Introduction to Replication
Replication is the mechanism that automatically copies directory data from one Directory Server to another. Using replication, you can copy any directory tree or subtree (stored in its own database) between servers. The Directory Server that holds the master copy of the information automatically copies any updates to all replicas.
Replication enables you to provide a highly available directory service, and to geographically distribute your data. In practical terms, replication brings the following benefits:
Fault tolerance and failover
By replicating directory trees to multiple servers, you can ensure your directory is available even if a hardware, software, or network problem prevents your directory client applications from accessing a particular Directory Server. Your clients are referred to another Directory Server for read and write operations. Note that to support write failover you must have a multi-master replication environment.
Load balancing
By replicating your directory tree across servers, you can reduce the access load on any given machine, thereby improving server response time.
Higher performance and reduced response times
By replicating directory entries to a location close to your users, you can vastly improve directory response times.
Local data management
By replicating directory data, local sites can manage the data that is relevant to them while sharing it with the rest of the enterprise.
Before defining a replication strategy for your directory information, you should understand how replication works. This section describes:
When you consider replication, you always start by making the following fundamental decisions:
What information you want to replicate.
Which server or servers hold the master copy, or supplier replica, of that information.
Which server or servers hold the read-only copy, or consumer replica, of the information.

These decisions cannot be made effectively without an understanding of how the Directory Server handles these concepts. For example, when you decide what information you want to replicate, you need to know the smallest unit of replication that the Directory Server can handle. The following sections define the concepts used by the Directory Server, providing a framework for the global decisions you need to make.
A database that participates in replication is defined as a replica. There are several kinds of replicas:
Master replica: a read-write database that contains a master copy of the directory data. A master replica can process update requests from directory clients.
Consumer replica: a read-only database that contains a copy of the information held in the master replica. A consumer replica can process search requests from directory clients but refers update requests to the master replica.

You can configure a Directory Server to manage several databases, and each database can have a different role in replication. For example, you could have a Directory Server that stores the dc=engineering,dc=siroe,dc=com suffix in a master replica, and the dc=sales,dc=siroe,dc=com suffix in a consumer replica.
A server that manages a master replica that it replicates to other servers is called a supplier server or master server. A server that manages a consumer replica that is updated by a different server is called a consumer server.
It is convenient to talk about the role of a server as a supplier or a consumer, even though it is not always accurate because a server can be both a supplier and a consumer. This is true in the following cases:
When the Directory Server manages a combination of master replicas and consumer replicas;
When the Directory Server acts as a hub supplier, that is, it receives updates from a master server and replicates the changes to consumer servers. For more information, refer to "Cascading Replication".
In multi-master replication, when a master replica is mastered on two different Directory Servers, each Directory Server acts as a supplier and a consumer of the other. For more information, refer to "Multi-Master Replication".

In iPlanet Directory Server 5.1, replication is always initiated by the supplier server, never by the consumer. This operation is called supplier-initiated replication. It allows you to configure a supplier server to push data to one or more consumer servers.
Earlier versions of the iPlanet Directory Server allowed consumer-initiated replication where you could configure consumer servers to pull data from a supplier server. This is replaced, in iPlanet Directory Server 5.1, by a procedure in which the consumer can prompt the supplier to send updates.
For any particular replica, the supplier server must:
Respond to read, add and modify requests from directory clients.
A consumer server must:
Respond to read requests.
In the special case of cascading replication, the hub supplier must:
Respond to read requests.

For more information on cascading replication, refer to "Cascading Replication".
Every supplier server maintains a change log. A change log is a record that describes the modifications that have occurred on a supplier replica. The supplier server then replays these modifications to the replicas stored on consumer servers, or to other masters in the case of multi-master replication.
When an entry is modified, added or deleted, a change record describing the LDAP operation that was performed is recorded in the change log.
In earlier versions of Directory Server, the change log was accessible over LDAP. Now, however, it is intended only for internal use by the server. If you have applications that need to read the change log, you need to use the Retro Change Log Plug-in for backward compatibility. For more information, refer to iPlanet Directory Server Administrator's Guide.
Unit of Replication
In iPlanet Directory Server 5.1, the smallest unit of replication is a database. This means that you can replicate an entire database, but not a subtree within a database. Therefore, when you create your directory tree, you must take your replication plans into consideration. For more information on how to set up your directory tree, refer to Chapter 5 "Designing the Directory Topology."
The replication mechanism also requires that one database correspond to one suffix. This means that you cannot replicate a suffix (or namespace) that is distributed over two or more databases using custom distribution logic.
Directory Servers use replication agreements to define replication. A replication agreement describes replication between one supplier and one consumer. The agreement is configured on the supplier server. It identifies:
The database to replicate
The DN and credentials the supplier server must use to bind on the consumer, called the Replication Manager entry or supplier bind DN (for more information, refer to "Replication Identity")
How the connection is secured (SSL, client authentication)
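The parameters listed above can be collected in a small sketch. The class and field names below are illustrative stand-ins, not the server's actual configuration attributes:

```python
from dataclasses import dataclass


@dataclass
class ReplicationAgreement:
    """Conceptual sketch of what a replication agreement records.

    Field names are hypothetical; the real iPlanet Directory Server
    configuration attributes differ.
    """
    database: str          # suffix/database to replicate
    consumer_host: str     # consumer server that receives updates
    consumer_port: int
    supplier_bind_dn: str  # Replication Manager entry on the consumer
    use_ssl: bool          # whether the connection is secured
    schedule: str          # "always" or a time-of-day window


# One agreement describes replication between one supplier and one consumer.
agreement = ReplicationAgreement(
    database="dc=siroe,dc=com",
    consumer_host="consumer1.siroe.com",
    consumer_port=636,
    supplier_bind_dn="cn=Replication Manager,cn=config",
    use_ssl=True,
    schedule="always",
)
```

Because the agreement is configured on the supplier, a supplier replicating to several consumers holds one such agreement per consumer.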
When replication occurs between two servers, the consumer server authenticates the supplier when it binds to send replication updates. This authentication process requires that the entry used by the supplier to bind to the consumer is stored on the consumer server. This entry is called the Replication Manager entry, or supplier bind DN.
The Replication Manager entry, or any entry you create to fulfill that role, must meet the following criteria:
You must have at least one such entry on every server that manages consumer replicas (or hub replicas).
When you configure replication between two servers, you must identify the Replication Manager (supplier bind DN) on both servers:
On the consumer server or hub supplier, when you configure the consumer replica or hub replica, you must specify this entry as the one authorized to perform replication updates.
Note In the Directory Server Console, this Replication Manager entry is referred to as the supplier bind DN, which may be misleading because the entry does not actually exist on the supplier server. It is called the supplier bind DN because it is the DN that the supplier uses to bind; the entry itself must be present on the consumer so that the consumer can authenticate the supplier when the supplier binds to send replication updates.
Consistency refers to how closely the contents of replicated databases match each other at a given point in time. When you set up replication between two servers, part of the configuration is to schedule updates. With iPlanet Directory Server 5.1, it is always the supplier server that determines when consumer servers need to be updated, and initiates replication.
Directory Server offers the option of keeping replicas always synchronized, or of scheduling updates for a particular time of day or day of the week. The advantage of keeping replicas always in sync is that it provides better data consistency; the cost is the network traffic resulting from the frequent update operations. This solution is best in cases where:

You have a reliable, high-speed connection between servers.

In cases where you can afford looser data consistency, you can choose the update frequency that best suits your needs or that lowers the effect on network traffic. This solution is best in cases where:

You have unreliable or intermittently available network connections (such as a dial-up connection used to synchronize replicas).

In the case of multi-master replication, the replicas on each master are said to be loosely consistent, because at any given time there can be differences in the data stored on each master. This is true even when you have chosen to always keep replicas in sync, because:

There is a latency in the propagation of replication updates between masters.
Common Replication Scenarios
You need to decide how the updates flow from server to server and how the servers interact when propagating replication updates. There are three basic scenarios:
The following sections describe these methods and provide strategies for deciding the method that is most appropriate for your environment. You can also combine these basic scenarios to build the replication topology that best suits your needs.
In the most basic replication configuration, a master server copies a supplier replica directly to one or more consumer servers. In this configuration, all directory modifications occur on the supplier replica on the supplier server, and the consumer servers contain read-only copies of the data.
The supplier server maintains a change log that records all the changes made to the master replica. The supplier server also stores the replication agreement.
The consumer server stores the entry corresponding to the supplier bind DN, so that the consumer can authenticate the supplier when the supplier binds to send replication updates.
The supplier server must propagate all modifications to the consumer replicas. Figure 6-1 shows this simple configuration.
Figure 6-1    Single-Master Replication
Although Figure 6-1 shows just one consumer server, the supplier server can replicate to several consumer servers. The total number of consumer servers that a single supplier server can manage depends on the speed of your network and the total number of entries that are modified on a daily basis. However, you can reasonably expect a supplier server to maintain several consumer servers.
Multi-master configurations have the following advantages:
Automatic write failover when one supplier is inaccessible.

In a multi-master replication environment, master copies of the same information exist on two servers, so data can be updated simultaneously in two different locations. The changes that occur on each server are replicated to the other, which means that each server plays both the supplier and the consumer role.
When the same data is modified on both servers, there is a conflict resolution procedure to determine which change is kept. The Directory Server considers the valid change to be the most recent one.
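The most-recent-change-wins rule can be sketched in a few lines. In the real server the comparison uses internal change sequence numbers rather than wall-clock timestamps; the timestamps below are a stand-in for illustration:

```python
def resolve_conflict(change_a, change_b):
    """Keep the most recent change when both masters modified the same data.

    Each change is a (timestamp, value) pair. A real Directory Server
    compares change sequence numbers, not raw timestamps.
    """
    return change_a if change_a[0] >= change_b[0] else change_b


# Master A wrote at t=100, Master B wrote at t=105: B's later change wins.
winner = resolve_conflict((100, "mail=a@siroe.com"), (105, "mail=b@siroe.com"))
```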
Although two separate servers can have master copies of the same data, within the scope of a single replication agreement there is only one supplier server and one consumer. So, to create a multi-master environment between two supplier servers that share responsibility for the same data, you need to create more than one replication agreement. The following figure shows this configuration:
Figure 6-2    Multi-Master Replication Configuration (Two Masters)
In this illustration, Supplier A and Supplier B each hold a supplier replica of the same data.
The number of masters or suppliers you can have in any replication environment is limited to two. However, the number of consumer servers that hold consumer replicas is not limited. Figure 6-3 shows the replication traffic in an environment with two master servers, and two consumer servers. This figure shows that the consumers can be updated by both masters. The master servers ensure that the changes do not collide.
Figure 6-3    Replication Traffic in a Multi-Master Environment
Cascading replication is very useful in the following cases:
When you need to balance heavy traffic loads: because your supplier servers must handle all update traffic, supporting all replication traffic to consumers as well would put them under a very heavy load. You can offload replication traffic to a hub supplier that can service replication updates to a large number of consumers.
To increase the performance of your directory service: if you direct all client applications performing read operations to the consumers, and all those performing update operations to the supplier, you can remove all of the indexes (except system indexes) from your hub server. This dramatically increases the speed of replication between the supplier and the hub server.

In a cascading replication scenario, a hub supplier receives updates from a supplier server and replays those updates on consumer servers. The hub supplier is a hybrid: it holds a read-only copy of the data, like a typical consumer server, and it maintains a change log, like a typical supplier server.
Because hub suppliers hold only a read-only copy of the data, they pass on copies of the master data exactly as received from the original master. For the same reason, when a hub supplier receives an add or modify request from a directory client, it refers the client to the master server.
This cascading replication scenario is illustrated in Figure 6-4:
Figure 6-4    Cascading Replication Scenario
A similar scenario is illustrated in Figure 6-5 from a different perspective. It shows how the servers are configured (replication agreements, change logs, referrals).
Figure 6-5    Server Configuration in Cascading Replication
You can combine any of the scenarios outlined in the previous sections to best fit your needs. For example, you could combine a multi-master configuration with a cascading configuration to produce something similar to the scenario illustrated in Figure 6-6.
Figure 6-6    Combined Multi-Master and Cascading Replication
Defining a Replication Strategy
The replication strategy that you define is determined by the service you want to provide:
If high availability is your primary concern, you should create a data center with multiple directory servers on a single site. You can use single-master replication to provide read-failover, and multi-master replication to provide write-failover. How to configure replication for high availability is described in "Using Replication for High Availability".
If local availability is your primary concern, you should use replication to geographically distribute data to directory servers in local offices around the world. You can decide to hold a master copy of all information in a single location, such as the company headquarters, or to let local sites manage the parts of the DIT that are relevant for them. The type of replication configuration to set up is described in "Using Replication for Local Availability".
In all cases, you probably want to balance the load of requests serviced by your directory servers and avoid network congestion. Strategies for load balancing your directory servers and your network are provided in "Using Replication for Load Balancing".

To determine your replication strategy, start by performing a survey of your network, your users, your applications, and how they will use the directory service you provide. For guidelines on performing this survey, refer to "Replication Survey."
Once you understand your replication strategy, you can start deploying your directory. This is a case where deploying your service in stages will pay large dividends. By placing your directory into production in stages, you can get a better sense of the loads that your enterprise places on your directory. Unless you can base your load analysis on an already operating directory, be prepared to alter your directory as you develop a better understanding of how your directory is used.
The following sections describe in more detail the factors affecting your replication strategy:
The type of information you need to gather from your survey to help you define your replication strategy includes:
Quality of the LANs and WANs connecting different buildings or remote sites, and the amount of available bandwidth.
For example, a site that manages human resource databases or financial information is likely to put a heavier load on your directory than a site containing engineering staff that uses the directory for simple telephone book purposes.
The number of applications that access the directory, and relative percentage of read/search/compare operations to write operations.
For example, if your messaging server uses the directory, you need to know how many operations it performs for each email message it handles. Other products that rely on the directory are typically products such as authentication applications, or meta-directory applications. For each one you must find out the type and frequency of operations that are performed in the directory.
The number and size of the entries stored in the directory.
Replication Resource Requirements
Using replication requires more resources. Consider the following resource requirements when defining your replication strategy:
On supplier servers, the change log is written after each update operation, so supplier servers that receive many updates require more disk space for the change log. On a single supplier server, if the supplier contains multiple replicated databases, the change log is used even more frequently, and the disk usage is higher still.
Each replication agreement consumes one server thread. So, the number of threads available to client applications is reduced, possibly affecting the server performance for the client applications.
Using Replication for High Availability
Use replication to prevent the loss of a single server from causing your directory to become unavailable. At a minimum you should replicate the local directory tree to at least one backup server.
Some directory architects argue that you should replicate three times per physical location for maximum data reliability. How much you use replication for fault tolerance is up to you, but you should base this decision on the quality of the hardware and networks used by your directory. Unreliable hardware needs more backup servers.
Note You should not use replication as a replacement for a regular data backup policy. For information on backing up your directory data, refer to the iPlanet Directory Server Administrator's Guide.
If you need to guarantee write-failover for all your directory clients, you should use a multi-master replication scenario. If read-failover is sufficient, you can use single-master replication.
LDAP client applications can usually be configured to search only one LDAP server. That is, unless you have written a custom client application to rotate through LDAP servers located at different DNS hostnames, you can only configure your LDAP client application to look at a single DNS hostname for a Directory Server. Therefore, you will probably need to use either DNS round robins or network sorts to provide fail-over to your backup Directory Servers. For information on setting up and using DNS round robins or network sorts, see your DNS documentation.
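A custom client that rotates through LDAP servers, as mentioned above, amounts to a simple failover loop. This is a sketch under the assumption that the client supplies its own bind routine (the `connect` callable and hostnames below are hypothetical):

```python
def connect_with_failover(hosts, connect):
    """Try each Directory Server hostname in order and return the first
    connection that succeeds. `connect` is a placeholder for whatever
    LDAP bind routine the client application uses."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            last_error = err  # server unreachable; try the next one
    raise last_error


# Example: the primary is down, so the client falls back to the replica.
def fake_connect(host):
    if host == "ldap1.siroe.com":
        raise ConnectionError("ldap1 unreachable")
    return f"connected to {host}"


session = connect_with_failover(
    ["ldap1.siroe.com", "ldap2.siroe.com"], fake_connect
)
```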
Alternatively, you can use the iPlanet Directory Access Router product. For more information on iPlanet Directory Access Router, go to http://www.iplanet.com.
Using Replication for Local Availability
Your need to replicate for local availability is determined by the quality of your network as well as the activities of your site. In addition, you should carefully consider the nature of the data contained in your directory and the consequences to your enterprise in the event that the data becomes temporarily unavailable. The more mission critical this data is, the less tolerant you can be of outages caused by poor network connections.
You should use replication for local availability for the following reasons:
You need a local master copy of the data.
This is an important strategy for large, multinational enterprises that need to maintain directory information of interest only to the employees in a specific country. Having a local master copy of the data is also important to any enterprise where interoffice politics dictate that data be controlled at a divisional or organizational level.
You are using unreliable or intermittently available network connections.
Intermittent network connections can occur if you are using unreliable WANs, such as often occurs in international networks.
Your networks periodically experience extremely heavy loads that may cause the performance of your directory to be severely reduced.
Using Replication for Load Balancing
Replication can balance the load on your Directory Servers in several ways:
By spreading your users' search activities across several servers.

One of the more important reasons to replicate directory data is to balance the work load of your network. When possible, you should move data to servers that can be accessed using a reasonably fast and reliable network connection. The most important considerations are the speed and reliability of the network connection between your server and your directory users.
Directory entries generally average around one KB in size. Therefore, every directory lookup adds about one KB to your network load. If your directory users perform around ten directory lookups per day, then for every directory user you will see an increased network load of around 10,000 bytes per day. Given a slow, heavily loaded, or unreliable WAN, you may need to replicate your directory tree to a local server.
You must carefully consider whether the benefit of locally available data is worth the cost of the increased network load because of replication. For example, if you are replicating an entire directory tree to a remote site, you are potentially adding a large strain on your network in comparison to the traffic caused by your users' directory lookups. This is especially true if your directory tree changes frequently, yet you have only a few users at the remote site performing a few directory lookups per day.
For example, consider that your directory tree on average includes in excess of 1,000,000 entries and that it is not unusual for about ten percent of those entries to change every day. If your average directory entry is only one KB in size, this means you could be increasing your network load by 100 MB per day. However, if your remote site has only a few employees, say 100, and they are performing an average of ten directory lookups a day, then the network load caused by their directory access is only one MB per day.
Given the difference in loads caused by replication versus that caused by normal directory usage, you may decide that replication for network load-balancing purposes is not desirable. On the other hand, you may find that the benefits of locally available directory data far outweigh any considerations you may have regarding network loads.
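The load comparison worked through above can be checked directly (taking 1 KB as 1,000 bytes for round numbers, as the text does):

```python
ENTRY_SIZE_KB = 1  # average directory entry size, per the estimate above

# Replication load: 10% of 1,000,000 entries change every day.
entries = 1_000_000
changed_per_day = int(entries * 0.10)                     # 100,000 entries
replication_load_mb = changed_per_day * ENTRY_SIZE_KB / 1000  # 100 MB/day

# Lookup load at the remote site: 100 users, 10 lookups each per day.
lookup_load_mb = 100 * 10 * ENTRY_SIZE_KB / 1000          # 1 MB/day
```

The hundredfold gap between the two figures is what drives the decision between continuous replication, scheduled replication, and no replication at all.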
A good compromise between making data available to local sites without overloading the network is to use scheduled replication. For more information on data consistency and replication schedules, refer to "Data Consistency".
Example of Network Load Balancing
Suppose your enterprise has offices in two cities. Each office has specific subtrees that they manage as follows:
Each office contains a high-speed network, but you are using a dial-up connection to network between the two cities. To balance your network load:
Select one server in each office to be the master server for the locally managed data.
Replicate locally managed data from that server to the corresponding master server in the remote office.
Replicate the directory tree on each master server (including data supplied from the remote office) to at least one local Directory Server to ensure availability of the directory data. You can use multi-master replication for the suffix managed locally, and cascading replication for the suffix that receives a master copy of the data from a remote server.
Example of Load Balancing for Improved Performance
Suppose that your directory must include 1,500,000 entries in support of 1,000,000 users, and that each user performs ten directory lookups a day. Also assume that you are using a messaging server that handles 25,000,000 mail messages a day, and that performs five directory lookups for every mail message that it handles. Therefore, you can expect 125,000,000 directory lookups per day just as a result of mail. Your total combined traffic is, therefore, 135,000,000 directory lookups per day.
Assuming an eight-hour business day, and that your 1,000,000 directory users are clustered in four time zones, your business day (or peak usage period) across four time zones is 12 hours long. Therefore, you must support 135,000,000 directory lookups in a 12-hour day. This equates to 3,125 lookups per second (135,000,000 / (60 * 60 * 12)).
Now, assume that you are using a combination of CPU and RAM with your Directory Servers that allows you to support 500 reads per second. Simple division indicates that you need at least six or seven Directory Servers to support this load. However, for enterprises with 1,000,000 directory users, you should add more Directory Servers for local availability purposes.
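The sizing above can be reproduced in a few lines (the 500 reads-per-second figure is the assumed per-server capacity from the example):

```python
import math

user_lookups = 1_000_000 * 10        # 10,000,000 user lookups per day
mail_lookups = 25_000_000 * 5        # 125,000,000 mail-driven lookups per day
total_lookups = user_lookups + mail_lookups   # 135,000,000 per day

peak_window_s = 12 * 60 * 60         # 12-hour peak across four time zones
lookups_per_second = total_lookups / peak_window_s   # 3,125 lookups/second

reads_per_server = 500               # assumed capacity of one Directory Server
servers_needed = math.ceil(lookups_per_second / reads_per_server)
```

At 3,125 lookups per second and 500 reads per server, seven servers cover the raw read load before any extra replicas are added for local availability.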
You could, therefore, replicate as follows:
Place two Directory Servers in a multi-master configuration in one city to handle all write traffic.
Use these supplier servers to replicate to one or more hub suppliers.
The read, search and compare requests serviced by your directory should be targeted at the consumer servers, thereby freeing the master servers to handle write requests. For a definition of a hub supplier, refer to "Cascading Replication".
Use the hub supplier to replicate to local sites throughout the enterprise.
Replicating to local sites helps balance the work load of your servers and your WANs, as well as ensuring high availability of directory data. Assume that you want to replicate to four sites around the country. You then have four consumers of each hub supplier.
At each site, replicate at least once to ensure high availability, at least for read operations.
Example Replication Strategy for a Small Site
Suppose your entire enterprise is contained within a single building. This building has a very fast (100 Mb per second) and lightly used network. The network is very stable and you are reasonably confident of the reliability of your server hardware and OS platforms. Also, you are sure that a single server's performance will easily handle your site's load.
In this case, you should replicate at least once to ensure availability in the event that your primary server is shut down for maintenance or hardware upgrades. Also, set up a DNS round robin to improve LDAP connection performance in the event that one of your Directory Servers becomes unavailable. Alternatively, use an LDAP proxy such as iPlanet Directory Access Router. For more information on iPlanet Directory Access Router, go to http://www.iplanet.com.
Example Replication Strategy for a Large Site
Suppose your entire enterprise is contained within two buildings. Each building has a very fast (100 Mb per second) and lightly used network. The network is very stable and you are reasonably confident of the reliability of your server hardware and OS platforms. Also, you are sure that a single server's performance will easily handle the load placed on a server within each building.
Also assume that you have slow (ISDN) connections between the buildings, and that this connection is very busy during normal business hours.
Your replication strategy is as follows:
Choose a single server in one of the two buildings to contain a master copy of your directory data.
This server should be placed in the building that contains the largest number of people responsible for the master copy of the directory data. Call this Building A.
Replicate at least once within Building A for high availability of directory data.
Create two replicas in the second building (Building B).
Using Replication with other Directory Features
Replication interacts with other iPlanet Directory Server features to provide advanced replication features. The following sections describe feature interactions to help you better design your replication strategy.
Replication and Access Control
The directory stores ACIs as attributes of entries. This means that the ACI is replicated along with other directory content. This is important because Directory Server evaluates ACIs locally.
For more information about designing access control for your directory, refer to Chapter 7, "Designing a Secure Directory".
Replication and Directory Server Plug-ins
You can use replication with most of the plug-ins delivered with iPlanet Directory Server. There are some exceptions and limitations in the case of multi-master replication with the following plug-ins:
You cannot use multi-master replication with the attribute uniqueness plug-in at all, because this plug-in can validate only attribute values on the same server, and not on both servers in the multi-master set.
You can use the referential integrity plug-in with multi-master replication providing that this plug-in is enabled on just one master in the multi-master set. This ensures that referential integrity updates are made on just one of the master servers, and propagated to the other.
Replication and Database Links
When you distribute entries using chaining, the server containing the database link points to a remote server that contains the actual data. In this environment, you cannot replicate the database link itself. You can, however, replicate the database that contains the actual data on the remote server.
You must not use the replication process as a backup for database links. You must backup database links manually. For more information about chaining and entry distribution, refer to Chapter 5, "Designing the Directory Topology".
Figure 6-7    Replicating Chained Databases
Replication and Schema
When iPlanet Directory Server is used in a replicated environment, the schema must be consistent across all of the directory servers that participate in replication. If the schema is not consistent across servers, the replication process is likely to generate many errors.
The best way to guarantee schema consistency is to make schema modifications on a single master server, even in the case of a multi-master replication environment.
Schema replication happens automatically. If replication has been configured between a supplier and a consumer, schema replication will happen by default.
The logic used by iPlanet Directory Server for schema replication is the same in every replication scenario, and can be described as follows:
Before pushing data to consumer servers, the supplier server checks whether its own version of the schema is in sync with the version of the schema held on consumer servers.
If the schema entries on both supplier and consumers are the same, the replication operation proceeds.
If the version of the schema on the supplier server is more recent than the version stored on the consumer, the supplier server replicates its schema to the consumer before proceeding with the data replication.
Note If the version of the schema on the supplier server is older than the version stored on the consumer, you will probably witness a lot of errors during replication because the schema on the consumer cannot support the new data.
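The decision logic in the steps above can be sketched as a simple version comparison. The integer versions below are illustrative stand-ins for the server's internal schema metadata:

```python
def plan_schema_sync(supplier_version, consumer_version):
    """Decide what the supplier does before pushing data, mirroring the
    steps described above. Versions are hypothetical integers, not the
    server's actual schema tracking mechanism."""
    if supplier_version == consumer_version:
        return "replicate data"
    if supplier_version > consumer_version:
        return "push schema, then replicate data"
    # Consumer schema is newer than the supplier's: replication errors
    # are likely because the consumer data no longer matches.
    return "warning: consumer schema is newer"
```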
If you make schema modifications on two master servers in a multi-master set, the consumers will contain replicated data from the two masters, each with a different schema. Whichever master was updated last "wins," and its schema is propagated to the consumer. In this situation, the schema on the consumers always differs from the schema on one of the masters. To avoid this, make schema modifications on one master only.
Note You must never update the schema on a consumer server because the supplier server is unable to resolve the conflicts that will occur and replication will fail.
Schema should be maintained on a single master supplier server in a replicated topology. If you use only the standard 99user.ldif file, schema changes are replicated to all consumers. Changes made to custom schema files, however, are replicated only if the schema is updated using LDAP or the Directory Server Console; otherwise, you must copy the custom schema files to every server after making changes on the master supplier, and then restart each server, so that all servers hold the same schema files. For more information, refer to "Creating Custom Schema Files".
For more information on schema design, refer to Chapter 3 "How to Design the Schema."
Copyright © 2002 Sun Microsystems, Inc. Some preexisting portions Copyright © 2001 Netscape Communications Corp. All rights reserved.
Last Updated February 26, 2002