Chapter 7 Planning Replication

Every directory service should use replication to ensure data availability in the event that a server becomes unavailable. Therefore, replication is an important strategy that you should plan on using with your directory service.

In this chapter you will learn about replication, its concepts and uses. This chapter also provides advice on how and when to use replication. This chapter includes the following sections:

Replication Overview—This section briefly introduces replication, including the benefits that replication brings to your directory service. This section also includes a brief look at the supplier-consumer architecture used by Netscape's replication strategy.

Replicating Directory Trees—This section describes the kinds of replication that you can configure for your directory service. Topics include whole tree replication, subtree replication, cascading replication, and multiple subtree replication. This section also describes replication initialization, and directory configuration requirements for creating replication.

Building a Highly Available Directory Service—This section provides concepts and examples of how to use replication to build high availability into your directory service. DNS strategies, using replication for load balancing, and using replication for local availability are also discussed. Examples include how to load balance for a small site, a large site, for local data management, and for server traffic. A specific section on how to load balance for Netscape Messaging Server 3.0 is also provided.


Replication Overview
Replication is the mechanism by which directory data is automatically copied from one Directory Server to another. Using replication, you can copy everything from entire directory trees to individual directory entries between servers. Replication provides several important benefits to your directory service:

Before examining the issues behind replication usage, it is useful to understand Netscape's replication architecture.

To begin, every directory object must be mastered by one and only one Directory Server. This mastering Directory Server is called the supplier server because it supplies the object to other servers.

Servers that receive directory objects from supplier servers are called consumer servers.

Any given Directory Server can be both a supplier of directory objects and a consumer of objects supplied to it by other servers.

Supplier Servers

A supplier server is responsible for:

Consumer Servers

A consumer server contains at least one directory entry that has been copied to it by a supplier server. Consumer servers can contain:

Only read operations occur on the consumer server; all other operations are handled on the supplier server. Any time an LDAP client tries to modify entries (add, delete, or change any part of the entry) in a replicated tree, the consumer server automatically refers the LDAP client's request to the supplier server. For more information on referrals, see Chapter 8, "Planning Referrals."


Replicating Directory Trees
In its most basic configuration, a supplier server replicates a directory tree to one or more consumer servers. In this configuration, all directory modifications occur on the supplier server, and the consumer server contains read-only data. The supplier server is responsible for managing (performing modifications to) all the directory data contained on the consumer server.

The supplier server can, of course, replicate its directory tree to more than one consumer server. The total number of consumer servers that a single supplier server can manage depends on the speed of your networks and the total number of entries changing on a daily basis. However, you can reasonably expect a supplier server to maintain several consumer servers.

Cascading Replicas

Cascading is a replication technique that involves a supplier server replicating directory data to a consumer server, and then that consumer server replicating the same data to yet another consumer or consumers.

This form of replication is useful if some network connections between various locations in your organization are better than others. For example, suppose you are mastering your directory data in Minneapolis, and you have consumer servers in Saint Cloud as well as Duluth. Suppose, too, that your network connection between Minneapolis and Saint Cloud is very good, but your network connection between Minneapolis and Duluth is of poor quality. Then, if your network between Saint Cloud and Duluth is of acceptable quality, you can use cascading to move directory data from Minneapolis to Saint Cloud to Duluth:
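The Minneapolis example can be modeled as a small graph in which each server lists the consumers it supplies. The following Python sketch is only an illustration: the CASCADE dictionary and the propagation_order function are hypothetical names, not part of the Directory Server, and the sketch shows the order of hops rather than any real replication traffic.

```python
from collections import deque

# Hypothetical cascade topology: each server lists the consumers it supplies.
CASCADE = {
    "Minneapolis": ["Saint Cloud"],  # supplier that masters the data
    "Saint Cloud": ["Duluth"],       # consumer that re-supplies the data
    "Duluth": [],                    # consumer at the end of the cascade
}

def propagation_order(topology, supplier):
    """Return (supplier, consumer) hops in the order an update would travel."""
    hops = []
    queue = deque([supplier])
    seen = {supplier}
    while queue:
        server = queue.popleft()
        for consumer in topology.get(server, []):
            if consumer not in seen:
                seen.add(consumer)
                hops.append((server, consumer))
                queue.append(consumer)
    return hops

for source, target in propagation_order(CASCADE, "Minneapolis"):
    print(source, "->", target)
```

In this model Minneapolis never contacts Duluth directly, so the poor Minneapolis-to-Duluth link carries no replication traffic.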

Subtree Replication

A supplier server does not have to replicate its entire directory tree to a consumer. Instead, the supplier server can replicate a portion of its tree to other servers. This form of replication is known as subtree replication.

One of the best uses of subtree replication is if you are enabling an extranet. In this case, you may want to master all of your directory data in a single server inside your firewall. However, your trading partners are likely to need access to some portion of that tree, which means it needs to be accessible outside your firewall.

Because it is best to avoid making internal, private information physically available on the Internet (even if your access-control mechanism secures that data from outside intrusion), you should create a special Directory Server that is available only outside your firewall. Then use subtree replication to copy directory data from your internal server to your external server. Only replicate that portion of your directory tree that you want your trading partners to be able to access.
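To make the idea of replicating only a portion of the tree concrete, the following Python sketch selects the entries that fall under a given subtree. It is a simplified illustration under stated assumptions: the DNs (including o=airius.com and ou=partners) are hypothetical, the functions are not part of any Directory Server API, and the DN comparison simply splits on commas, so it does not handle escaped commas or other special cases in real distinguished names.

```python
def in_subtree(dn, base_dn):
    """True if dn is base_dn itself or lies beneath it (naive comparison)."""
    dn_parts = [part.strip().lower() for part in dn.split(",")]
    base_parts = [part.strip().lower() for part in base_dn.split(",")]
    return (len(dn_parts) >= len(base_parts)
            and dn_parts[-len(base_parts):] == base_parts)

def select_subtree(entries, base_dn):
    """Return only the DNs that would be copied to the external server."""
    return [dn for dn in entries if in_subtree(dn, base_dn)]

# Hypothetical entries held by the internal server.
ENTRIES = [
    "o=airius.com",
    "ou=partners,o=airius.com",
    "cn=Pat Smith,ou=partners,o=airius.com",
    "cn=Lee Jones,ou=payroll,o=airius.com",
]

for dn in select_subtree(ENTRIES, "ou=partners,o=airius.com"):
    print(dn)
```

Only the two entries at or below ou=partners are selected; the payroll entry and the tree root stay inside the firewall.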

Multiple Subtree Replication

Large enterprises may find the need to replicate directory subtrees from multiple suppliers to a single consumer. For example, suppose your enterprise has main branch offices in New York and Los Angeles, with smaller satellite offices in Chicago and Dallas. Management of your enterprise's directory tree is evenly split between New York and L.A. However, Chicago and Dallas need local copies of everything.

Then you should replicate subtrees between servers as follows:

Notice that in this arrangement New York and L.A. are both suppliers and consumers of directory data.

The one thing you cannot do with multiple subtree replication is have two different servers manage the same tree, and then replicate those subtrees to a single consumer. For example:
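This restriction can be checked mechanically when you plan your replication agreements. The following Python sketch is a minimal illustration, assuming each agreement is written as a (supplier, consumer, subtree) tuple; the subtree names ou=east and ou=west are hypothetical. It only detects identical subtrees supplied by more than one server, not subtrees that overlap.

```python
from collections import defaultdict

def conflicting_subtrees(agreements):
    """Map (consumer, subtree) to its suppliers when there is more than one."""
    suppliers = defaultdict(set)
    for supplier, consumer, subtree in agreements:
        suppliers[(consumer, subtree.lower())].add(supplier)
    return {key: sorted(names) for key, names in suppliers.items()
            if len(names) > 1}

# Hypothetical plan: each main office masters one subtree.
VALID_PLAN = [
    ("New York", "L.A.", "ou=east,o=airius.com"),
    ("New York", "Chicago", "ou=east,o=airius.com"),
    ("New York", "Dallas", "ou=east,o=airius.com"),
    ("L.A.", "New York", "ou=west,o=airius.com"),
    ("L.A.", "Chicago", "ou=west,o=airius.com"),
    ("L.A.", "Dallas", "ou=west,o=airius.com"),
]

# Not allowed: two servers supply the same subtree to Chicago.
INVALID_PLAN = VALID_PLAN + [("L.A.", "Chicago", "ou=east,o=airius.com")]

print(conflicting_subtrees(VALID_PLAN))
print(conflicting_subtrees(INVALID_PLAN))
```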

Initiating Replication

Replication synchronization can be initiated by either the supplier or the consumer server. You choose which form of synchronization is used for each replication agreement. A replication agreement indicates what directory entries will be replicated, the servers between which the replication is occurring, and when the replication can occur.

If you are using supplier-initiated replication, it is the responsibility of the supplier server to determine when its consumer servers are to be updated.

If you are using consumer-initiated replication, it is the responsibility of the consumer server to determine when it wants to retrieve updates from the supplier server.

From an administrative perspective, there is no difference in managing these two types of synchronization. Both types of synchronization allow you to update the consumer server(s) on a set time interval. However, supplier-initiated replication allows you to configure the supplier server to update its consumer(s) the instant that an object is updated on the supplier, whereas consumer-initiated replication can only occur on a set interval. Therefore:

Other than these conditions, there is no difference in using the two types of synchronization from either an administrative or performance perspective. In general, you should try these two types of synchronization in a lab environment and decide which one you are most comfortable with from a management perspective. Then, if possible, use that one type of synchronization across your enterprise so as to lower training costs for your IS personnel.
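The contents of a replication agreement, and the one functional difference between the two types of synchronization, can be summarized in a small data structure. The following Python sketch is illustrative only: the class and field names are hypothetical and do not correspond to the server's actual configuration format, and the host names are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Initiator(Enum):
    SUPPLIER = "supplier"
    CONSUMER = "consumer"

@dataclass(frozen=True)
class ReplicationAgreement:
    subtree: str                      # what is replicated
    supplier: str                     # server that masters the data
    consumer: str                     # server that receives the data
    initiator: Initiator              # which side starts synchronization
    interval_minutes: Optional[int]   # None means "update on every change"

    def __post_init__(self):
        if self.interval_minutes is None:
            # Immediate updates are only possible when the supplier initiates.
            if self.initiator is not Initiator.SUPPLIER:
                raise ValueError("consumer-initiated replication needs an interval")
        elif self.interval_minutes <= 0:
            raise ValueError("interval must be a positive number of minutes")

# Hypothetical agreements.
immediate = ReplicationAgreement("o=airius.com", "supplier.airius.com",
                                 "consumer1.airius.com", Initiator.SUPPLIER, None)
hourly = ReplicationAgreement("o=airius.com", "supplier.airius.com",
                              "consumer2.airius.com", Initiator.CONSUMER, 60)
print(immediate)
print(hourly)
```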

Directory Entries Used to Support Replication

You must create special directory entries to support replication. The actual type of entry that you need is dependent upon the type of replication that you are using.

Supplier-Initiated Replication

For supplier-initiated replication, you configure each consumer server with a supplier DN. This special distinguished name is used by a supplier to bind to the consumer server and update the consumer's directory. If a directory client attempts to modify a replicated entry on the consumer server, and that client does not bind as the supplier DN, then the consumer server refers that request to the server that originally supplied the entry.

The supplier DN is not an actual entry in the directory tree. Instead, it is an "ethereal" or "virtual" entry. This entry is identified to the consumer server by a configuration parameter in the server's configuration file (see the Netscape Directory Server Administrator's Guide for more information).

Every consumer server must be configured with one and only one supplier DN. Since the supplier DN is a virtual entry, you do not have to set any access-control statements to allow a user binding as this entry to update the consumer server.
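The way a consumer server treats requests against replicated entries can be sketched as a simple decision. The following Python code is a rough model, not the server's implementation: the supplier DN value, the referral URL, and the handle_request function are all hypothetical, and DNs are compared as plain case-insensitive strings.

```python
# Hypothetical values; the real supplier DN is set in the server's
# configuration file.
SUPPLIER_DN = "cn=Replication Supplier"
SUPPLIER_URL = "ldap://supplier.airius.com:389"

def handle_request(operation, bind_dn,
                   supplier_dn=SUPPLIER_DN, supplier_url=SUPPLIER_URL):
    """Decide how a consumer treats a request against a replicated entry."""
    if operation == "search":
        # Read operations are answered locally by the consumer.
        return ("serve", None)
    if bind_dn is not None and bind_dn.lower() == supplier_dn.lower():
        # Only the supplier, bound as the supplier DN, may update the replica.
        return ("apply", None)
    # Any other add, delete, or modify is referred to the supplier.
    return ("refer", supplier_url)

print(handle_request("search", "cn=Pat Smith,o=airius.com"))
print(handle_request("modify", "cn=Replication Supplier"))
print(handle_request("modify", "cn=Pat Smith,o=airius.com"))
```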

Consumer-Initiated Replication

For consumer-initiated replication, a directory entry must be created on the supplier that the consumer can use to bind to the supplier server. This entry must have read and search privileges for the tree that the consumer is retrieving from the supplier, as well as read and search privileges for the supplier server's change log. The change log is a special directory tree on the supplier server that the supplier uses to track changes to its directory tree.

For consumer-initiated replication, it is administratively easiest to create a single directory entry that every consumer server will use to perform the synchronization. However, you may want to create a unique entry for every consumer to use for this purpose. Doing so allows you to see exactly which server is accessing your supplier server when you view the access log. This can help you to resolve replication problems should any occur.

For more information on configuring replication, see the Netscape Directory Server Administrator's Guide.


Building a Highly Available Directory Service
Replication is a vital part of your directory strategy. This is because you are using your directory service to centralize data, most likely data that is mission critical, and thus you will have to use replication to ensure uninterrupted availability of your directory service.

You use replication to ensure high availability of your directory service in three basic ways:

Each of the following sections describes these strategies in detail.

Using Replication for High Availability

Use replication to avoid a situation in which the loss of a single server causes your directory service to become unavailable. At a minimum you should replicate the local directory tree to at least one backup server.

LDAP clients usually can be configured to search only one LDAP server. That is, unless you have written a custom client to rotate through LDAP servers located at different DNS hostnames, you can only configure your LDAP client to look at a single DNS hostname for a Directory Server. Therefore, you will likely need to use either DNS round robins or network sorts to provide fail-over to your backup Directory Servers.

DNS round robins are a feature available in most DNS servers that allows you to configure multiple IP addresses for a single DNS hostname. The DNS server then rotates the order of the IP addresses that it returns as the result of a DNS query. Therefore, if you configure DNS lookups of hostname directory.airius.com to return the IP addresses 123.456.789.1, 123.456.789.2, and 123.456.789.3, the DNS server will return IP addresses in the following orders:

1st Lookup Results   2nd Lookup Results   3rd Lookup Results   4th Lookup Results
123.456.789.1        123.456.789.2        123.456.789.3        123.456.789.1
123.456.789.2        123.456.789.3        123.456.789.1        123.456.789.2
123.456.789.3        123.456.789.1        123.456.789.2        123.456.789.3

Since DNS clients try the IP addresses in the order that they are returned, your LDAP clients will only have to time out once every third request if the machine at one of the three IP addresses becomes unavailable. If the server will be unavailable for a long time, its IP address can be removed from the DNS round robin so as to avoid repeated timeouts.
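The rotation performed by a DNS round robin, and the "one timeout in three requests" behavior, can be simulated in a few lines. The following Python sketch is only a model of the rotation, not of any particular DNS server; the function name is hypothetical, and the addresses are the placeholder values used in this chapter, treated here as opaque strings.

```python
from collections import deque

def round_robin(addresses, lookups):
    """Yield the address list returned for each successive lookup."""
    ring = deque(addresses)
    for _ in range(lookups):
        yield list(ring)
        ring.rotate(-1)  # move the first address to the end of the list

ADDRESSES = ["123.456.789.1", "123.456.789.2", "123.456.789.3"]

for number, answer in enumerate(round_robin(ADDRESSES, 4), start=1):
    print("lookup", number, answer)

# Count how often a client would try an unavailable server first.
down = "123.456.789.2"
answers = list(round_robin(ADDRESSES, 6))
timeouts = sum(1 for answer in answers if answer[0] == down)
print(timeouts, "of", len(answers), "lookups start with the unavailable server")
```

Because each address is listed first in one lookup out of every three, a client that always tries the first address times out on a third of its requests while one server is down.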

The problem with DNS round robins is that they make no attempt to prioritize DNS results based on network location. It is almost always more desirable to try hosts on the local network before trying hosts that are farther away in your network topology.

To solve this problem, some more recent versions of DNS allow network sorts. A network sort allows you to identify a set of IP addresses that are a best choice and a set of IP addresses that are a second best choice. The DNS server will always return the best choice IP addresses at the top of the list with the second best choice at the bottom. The round robin then rotates the top and bottom choices within their groupings. Thus, with network sort, DNS queries would return IP addresses in the following order:


              1st Lookup      2nd Lookup      3rd Lookup      4th Lookup
Best choice   123.456.789.1   123.456.789.2   123.456.789.1   123.456.789.2
              123.456.789.2   123.456.789.1   123.456.789.2   123.456.789.1
Next choice   123.456.789.3   123.456.789.4   123.456.789.5   123.456.789.3
              123.456.789.4   123.456.789.5   123.456.789.3   123.456.789.4
              123.456.789.5   123.456.789.3   123.456.789.4   123.456.789.5
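A network sort can be modeled as two independent round robins whose results are concatenated, with the best-choice group always first. The following Python sketch is a simplified model under that assumption; the function name is hypothetical and the addresses are this chapter's placeholder values.

```python
from collections import deque

def network_sort(best, next_best, lookups):
    """Yield best-choice addresses first, rotating within each group."""
    best_ring, next_ring = deque(best), deque(next_best)
    for _ in range(lookups):
        yield list(best_ring) + list(next_ring)
        best_ring.rotate(-1)
        next_ring.rotate(-1)

BEST = ["123.456.789.1", "123.456.789.2"]
NEXT = ["123.456.789.3", "123.456.789.4", "123.456.789.5"]

for number, answer in enumerate(network_sort(BEST, NEXT, 4), start=1):
    print("lookup", number, answer)
```

Whatever the rotation, a client never sees a next-choice address ahead of a best-choice address.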

Round robins and network sorts are a viable strategy only for read-only operations. You cannot mirror supplier servers on multiple machines and use DNS round robins to provide automatic fail-over. If your supplier server becomes unavailable, you must manually convert a consumer server to the supplier server and then reinitialize the remainder of your consumer servers.

For information on setting up and using DNS round robins or network sorts, see your DNS documentation.

Using Replication for Load Balancing

On average, a Directory Server running on a reasonably fast machine can be expected to handle around 800 search requests per second and thousands of simultaneous directory connections. However, there are a great many factors that can impact server performance, including:

You can use replication to balance the load on your directory service by:

Load Balancing the Server

As the load on your directory service grows, you will have to increase computing power to service the increased traffic. Your need to add hardware to handle increased directory access depends on quite a few factors, including CPU speed and the amount of RAM available to the server. In some cases you may be able to simply add additional RAM to handle an increased load. However, at some point you will have to add a new server entirely.

In terms of mapping the number of supported search requests per second to real-world loads, consider the following:

Load Balancing the Network

One of the more important reasons to replicate directory data is to load balance your network. When possible, you should move data to servers that can be accessed using a reasonably fast and reliable network connection. The most important considerations are the speed and reliability of the network connection between your server and your directory users.

In terms of network load, directory entries generally average around 1 KB in size. Therefore, every directory lookup adds about 1 KB to your network load. If your directory users perform around 10 directory lookups per day, then for every directory user you will see an increased network load of around 10,000 bytes per day. Given a slow, heavily loaded, or unreliable WAN, you may need to replicate your directory tree to a local server.

You must carefully consider whether the benefit of locally available data is worth the cost of the increased network load due to replication. For example, if you are replicating an entire directory tree to a remote site, you are potentially adding a large strain on your network in comparison to the traffic caused by your users' directory lookups. This is especially true if your directory tree is changing frequently, yet you only have a few users at the remote site performing a few directory lookups per day.

For example, consider that your directory tree can easily include in excess of 1,000,000 entries and that it is not unusual for approximately 10% of those entries to change every day. If your average directory entry is only 1 KB in size, this means you could be increasing your network load by 100 MB/day. However, if your remote site only has a few employees, say 100, and they are performing an average of 10 directory lookups a day, then the network load caused by their directory access is only 1 MB per day.

Given the difference in loads caused by replication versus that caused by normal directory usage, you may decide that replication for network load-balancing purposes is not desirable. On the other hand, you may find that the benefits of locally available directory data far outweigh any considerations you may have regarding network loads.
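The comparison between replication traffic and lookup traffic is easy to rerun with your own numbers. The following Python sketch reproduces the arithmetic of the example above; the function names are hypothetical, and it assumes 1 MB = 1,000 KB and that each changed entry is sent once at its full size.

```python
def replication_load_mb(total_entries, daily_change_percent, entry_kb=1):
    """Approximate daily replication traffic to one consumer, in MB."""
    changed_entries = total_entries * daily_change_percent // 100
    return changed_entries * entry_kb / 1000

def lookup_load_mb(users, lookups_per_user, entry_kb=1):
    """Approximate daily traffic from remote users' lookups, in MB."""
    return users * lookups_per_user * entry_kb / 1000

replication = replication_load_mb(1_000_000, 10)
lookups = lookup_load_mb(100, 10)
print("replication:", replication, "MB/day")
print("lookups:", lookups, "MB/day")
```

With the figures from the example, replication costs about 100 times more network traffic per day than serving the remote site's lookups across the WAN.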

Using Replication for Local Availability

There are several reasons why you may want to replicate for local availability. They include:

Your need to replicate for local availability is determined by the quality of your network as well as the activities of your site. In addition, you should carefully consider the nature of the data contained in your directory and the consequences to your enterprise in the event that the data becomes temporarily unavailable. The more mission critical this data is, the less tolerant you can be of outages caused by poor network connections.

Determining Your Replication Strategy

Although there is no way to predict the actual amount of activity that your directory service will experience, some of the factors that will affect directory activity are:

To determine your replication strategy, start by performing a survey of your enterprise's networks. Also examine the physical locations of your users and look at the quality of the WANs connecting those sites.

As a part of this survey, you should find out how many users are at each site and estimate how often directory lookups will be occurring from those locations. For example, a site that manages HR databases or financial information is likely to put a heavier load on your directory service than a site containing engineering staff that uses the directory service for simple telephone book purposes.

Once you have a good understanding of your enterprise's infrastructure, consider the performance numbers discussed earlier in this chapter. As you develop an understanding of your replication needs, make sure you include the following strategies in your replication plans:

Once you have a basic understanding of your replication strategy, you can start deploying your directory service. This is a case where rolling your service out in controlled stages will pay large dividends (see "Deployment Advice"). By placing your directory service into production in stages, you can get a better sense of the loads that your enterprise places on your directory service. Unless you have an existing directory service to base your load analysis on, be prepared to alter your directory service as you develop a better understanding of how your directory is used.

Remember throughout this design phase that the Netscape Directory Server's replication feature is extremely flexible. You will find it very easy to scale or alter your directory topology as the loads on your Directory Server change. Therefore, if you find that you have misjudged the load placed on your service, you can easily add, remove, or move servers to better handle your enterprise's actual requirements.

Example: Load Balancing a Small Site

Suppose your entire enterprise is contained within a single building. This building has a very fast (100 Mbps) and lightly used network. The network is very stable and you are reasonably confident of the reliability of your server hardware and OS platforms. Also, you are sure that a single server's performance will easily handle your site's load.

In this case, you should replicate at least once to ensure availability in the event your primary server is shut down for maintenance or hardware upgrades. Also, set up a DNS round robin to improve LDAP connection performance in the event that one of your Directory Servers becomes unavailable.

Example: Load Balancing a Large Site

Suppose your entire enterprise is contained within two buildings. Each building has a very fast (100 Mbps) and lightly used network. The network is very stable and you are reasonably confident of the reliability of your server hardware and OS platforms. Also, you are sure that a single server's performance will easily handle the load placed on a server within each building.

Also assume that you have a slow (ISDN) connection between the buildings, and that this connection is very busy during normal business hours.

In this case, you should do the following:

Example: Load Balancing for Local Management

Suppose you have an enterprise with offices in two cities. Each office is responsible for managing specific subtrees, as follows:

Each office contains a high-speed network, but you are using a dial-up connection to network between the two cities. Do the following:

Example: Load Balancing for Server Traffic

Suppose that your directory service must include 1,500,000 entries in support of 1,000,000 users, each of whom performs 10 directory lookups a day. Also assume that you are using Netscape Messaging Server 3.0, which is handling 25,000,000 mail messages a day. Since the 3.0 messaging server performs 5 directory lookups for every mail message that it handles, you can expect to see 125,000,000 directory lookups per day just as a result of mail. Adding the 10,000,000 lookups performed by your users, your total combined traffic is therefore 135,000,000 directory lookups per day.

Assuming an 8-hour business day, and that your 1,000,000 directory users are clustered in 4 time zones, your business day (or peak usage) across 4 time zones is 12 hours long. Therefore you must support 135,000,000 directory lookups in a 12-hour day. This equates to 3,125 lookups per second (135,000,000 / (60*60*12)).

That is:

1,000,000 users       @ 10 lookups per user     =   10,000,000 reads/day
25,000,000 messages   @ 5 lookups per message   =  125,000,000 reads/day
                        Total reads/day         =  135,000,000

12-hour day includes 43,200 seconds
Total reads/second = 3,125

Now, assume that you are using a combination of CPU and RAM with your Directory Servers that allows you to support 500 reads per second. Simple division (3,125 / 500 = 6.25) indicates that you need at least 7 Directory Servers to support this load. However, for enterprises with 1,000,000 directory users, you should add additional Directory Servers for local availability purposes.
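The sizing arithmetic in this example can be repeated for your own site with a short calculation. The following Python sketch uses the figures from this example; the function name is hypothetical, and the result is a lower bound that ignores the extra servers you add for availability.

```python
import math

def peak_lookups_per_second(users, lookups_per_user,
                            messages, lookups_per_message, peak_hours):
    """Average lookups per second across the peak-usage window."""
    total_per_day = users * lookups_per_user + messages * lookups_per_message
    return total_per_day / (peak_hours * 60 * 60)

rate = peak_lookups_per_second(
    users=1_000_000, lookups_per_user=10,
    messages=25_000_000, lookups_per_message=5,
    peak_hours=12,
)
reads_per_server = 500  # assumed capacity of one Directory Server
servers_needed = math.ceil(rate / reads_per_server)

print("lookups per second:", rate)
print("minimum Directory Servers:", servers_needed)
```

Rounding up matters here: 6.25 servers' worth of load cannot be carried by 6 servers, so the minimum is 7.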

You could, therefore, replicate as follows:

Example: Replicating for Messaging 3.0

Once you start looking closely at server performance numbers, you will find that the Directory Server is usually not the part of your intranet that causes performance problems. Generally, server for server, you will need more instances of other types of Netscape servers to handle your intranet load than you will need Directory Servers.

For example, suppose your Netscape Directory Server is running on a combination of CPU and memory that allows it to handle 600 to 700 search requests per second. By contrast, a 3.0 Netscape Messaging Server can handle around 25 messaging requests per second. Therefore, for any appreciable intranet population, you will need many more Messaging Servers than Directory Servers.

The question therefore is: how should you best deploy Directory Servers to support a large population of messaging servers? While you will not need to replicate for directory load balancing, you will want to replicate for high availability.

Depending on the quality of your hardware and networks and on the size of your user population, there are two approaches that you may want to take. The first is to replicate only the directory data that a messaging server needs to a Directory Server instance running on the same physical host as that messaging server. This provides the highest possible availability of directory data to the messaging server. However, it is slightly harder to manage, because every time you move a messaging account from one messaging server to another, you have to revisit your directory replication scheme. Therefore, for large user populations that may require constant movement of users from one mail host to another, you should avoid this strategy.

A second strategy is to replicate to a central Directory Server residing on the same network as your messaging servers. The Netscape Directory Server can support roughly four or five 3.0 Messaging Servers at a time (each messaging server can process around 25 mails per second, and each mail requires 5 directory lookups). Consequently, place 1 Directory Server instance for every 4 or 5 messaging servers that you are using. This Directory Server should be on the same physical network as your messaging servers, and as always you should replicate this local server at least once for availability purposes.
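The "four or five" ratio follows directly from the performance figures quoted above. The following Python sketch works through that division; the function name is hypothetical, and the 600 and 700 searches per second are the example capacities used earlier in this section, not guaranteed numbers.

```python
def messaging_servers_per_directory(directory_searches_per_sec,
                                    messages_per_sec=25,
                                    lookups_per_message=5):
    """Whole messaging servers one Directory Server can keep up with."""
    lookups_per_messaging_server = messages_per_sec * lookups_per_message
    return directory_searches_per_sec // lookups_per_messaging_server

low = messaging_servers_per_directory(600)
high = messaging_servers_per_directory(700)
print("each messaging server generates", 25 * 5, "lookups per second")
print("messaging servers per Directory Server:", low, "to", high)
```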

Depending on the number of messaging servers that you are supporting with your Directory Server, you may want to dedicate the Directory Server to handling only messaging lookups. This is especially true if you are replicating to a Directory Server instance running on the same physical host as the messaging server. When you dedicate a Directory Server to support only messaging lookups, do the following:

 

© Copyright 1999 Netscape Communications Corporation, a subsidiary of America Online, Inc. All Rights Reserved.