Sun Java System Directory Server Enterprise Edition 6.1 Deployment Planning Guide

Using Distribution for Write Scalability

Write operations are resource intensive. When a client requests a write operation, the follow sequence of events occurs on the database:

The backend database is locked
The entry is locked in the database cache
The access control check plug-in is called
Any backend pre-operation plug-ins are called
The database transaction begins
The database files are updated
The old entry cache is replaced with new data
The database transaction is committed
Any backend post-operation plug-ins are called
The backend database is unlocked

Because of this complex procedure, an increased number of writes can have a dramatic impact on performance.

As an enterprise grows, more client applications require rapid write access to the directory. Also, as more information is stored in a single Directory Server, the cost of adding or modifying entries in the directory database increases. This is because indexes become larger and it takes longer to manipulate the information that the indexes contain.

In some cases, the service level agreements might only be achieved by having all the data cached in memory. However, the data might be too large to fit on a single memory machine

When the volume of directory data increases to this extent, you need to break up the data so that it can be stored in multiple servers. One approach is to use a hierarchy to divide the information. By separating the information into multiple branches based on some criteria, each branch can be stored on a separate server. Each server can then be configured with chaining or referrals to enable clients to access all the information from a single point.

In this kind of division, each server is responsible for only a part of the directory tree. A distributed directory works in a similar way to the Domain Name Service (DNS). The DNS assigns each portion of the DNS namespace to a particular DNS server. In the same way, you can distribute your directory namespace across servers while maintaining, from a client standpoint, a single directory tree.

A hierarchy-based distribution mechanism has certain disadvantages. The main problem is that this mechanism requires that the clients know exactly where the information is. Alternatively, the clients must perform a broad search to find the data. Another problem is that some directory-enabled applications might not have the capability to deal with the information if it is broken up into multiple branches.

Directory Server supports hierarchy-based distribution in conjunction with the chaining and referral mechanisms. However, a distribution feature is also provided with Directory Proxy Server, which supports smart routing. This feature enables you to decide on the best distribution mechanism for your enterprise.

Using Multiple Databases

Directory Server stores data in high-performance, disk-based LDBM databases. Each database consists of a set of files that contains all of the data that is assigned to this set. You can store different portions of your directory tree in different databases. Imagine, for example, that your directory tree contains three subsuffixes, as shown in the following figure.

Figure 10–6 Directory Tree With Three Subsuffixes

Figure shows a directory tree with one suffix and three
subsuffixes.

The data of the three subsuffixes can be stored in three separate databases as shown in the following figure.

Figure 10–7 Three Subsuffixes Stored in Three Separate Databases

Figure shows three subsuffixes stored in three separate
databases.

When you divide your directory tree among databases, the databases can be distributed across multiple servers. This strategy generally equates to several physical machines, which improves performance. The three databases in the preceding illustration can be stored on two servers as shown in the following figure.

Figure 10–8 Three Databases Stored on Two Separate Servers

Figure shows two databases stored on one server (A) and
one database stored on a different server (B).

When databases are distributed across multiple servers, the amount of work that each server needs to do is reduced. Thus, the directory can be made to scale to a much larger number of entries than would be possible with a single server. Because Directory Server supports dynamic addition of databases, you can add new databases as required, without making the entire directory unavailable.

Using Directory Proxy Server for Distribution

Directory Proxy Server divides directory information into multiple servers but does not require that the hierarchy of the data be altered. An important aspect of data distribution is the ability break up the data set in a logical manner. However, distribution logic that works well for one client application might not work as well for another client application.

For this reason, Directory Proxy Server enables you to specify how data is distributed and how directory requests should be routed. For example, LDAP operations can be routed to different directory servers based on the directory information tree (DIT) hierarchy. The operations can also be routed based on operation type or on a custom distribution algorithm.

Directory Proxy Server effectively hides the distribution details from the client application. From the clients' standpoint, a single directory addresses their directory queries. Client requests are distributed according to a particular distribution method. Different routing strategies can be associated with different portions of the DIT, as explained in the following sections.

Routing Based on the DIT

This strategy can be used to distribute directory entries based on the DIT structure. For example, entries in the subtree o=sales,dc=example,dc=com can be routed to Directory Server A, and entries in the subtree o=hr,dc=example,dc=com can be routed to Directory Server B.

Routing Based on a Custom Algorithm

In some cases, you might want to distribute entries across directory servers without using the DIT structure. Consider, for example, a service provider who stores entries that represent subscribers under ou=subscribers,dc=example,dc=com. As the number of subscribers grows, there might be a need to distribute them across servers based on the range of the subscriber ID. With a custom routing algorithm, subscriber entries with an ID in the range 1-10000 can be located in Directory Server A, and subscriber entries with an ID in the range 10001-infinity can be located in Directory Server B. If the data on server B grows too large, the distribution algorithm can be changed so that entries with an ID starting from 2000 can be located on a new server, Server C.

You can implement your own routing algorithm using the Directory Proxy Server DistributionAlgorithm interface.

Using Directory Proxy Server to Distribute Requests Based on Bind DN

In this scenario, an enterprise distributes customer data between three master servers based on geographical location. Customers that are based in the United Kingdom have their data stored on a master server in London. French customers have their data stored on a master server in Paris. The data for Japanese customers is stored on a master server in Tokyo. Customers can update their own data through a single web-based interface.

Users can update their own information in the directory using a web-based application. During the authentication phase, users enter an email address. email addresses for customers in the UK take the form *@uk.example.com. For French customers, the email addresses take the form *@fr.example.com, and for Japanese customers, *@ja.example.com. Directory Proxy Server receives these requests through an LDAP-enabled client application. Directory Proxy Server then routes the requests to the appropriate master server based on the email address entered during authentication.

This scenario is illustrated in the following figure.

Figure 10–9 Using Directory Proxy Server to Route Requests Based on Bind DN

Figure shows Directory Proxy Server distributing write requests
based on email address.