Performance requirements should be based on typical models of directory usage. In all directory deployments, Directory Server supports one or more client applications, and the requirements of these applications must be assessed. Estimating how much information your directory contains, and how often that information is accessed, involves identifying these applications and determining how they use Directory Server.
The applications that access your directory and the data needs of these applications have a significant impact on performance requirements. When identifying client applications, consider the following:
What types of client applications are accessing Directory Server?
How many users access each of these applications?
What kind of operations do these applications perform?
What are the usage patterns for these operations?
Common applications that might use your directory include the following:
Browser applications, such as white pages. Applications of this type generally access information such as email addresses, telephone numbers, and employee names.
Messaging applications, especially email servers. All email servers require email addresses, user names, and routing information. Some also require more advanced information, such as the location on disk of a user’s mailbox, vacation notification information, and protocol information.
Directory-enabled human resources applications. These applications require more personal information such as government identification numbers, home addresses, home telephone numbers, and salary details.
Security, web portal, or personalization applications. Applications of this type access profile information.
When you have identified the information used by each application, you might see that some types of data are used by more than one application. Performing this kind of exercise during the planning stage can help you to avoid data redundancy.
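The exercise of finding data that several applications share amounts to inverting a map from applications to the attributes they read. The following is a minimal sketch; the application names and attribute lists are hypothetical examples chosen for illustration, not a recommended schema.

```python
from collections import defaultdict

# Hypothetical inventory of which attributes each client application reads.
# Application names and attribute sets are illustrative only.
APP_ATTRIBUTES = {
    "white_pages": {"cn", "mail", "telephoneNumber"},
    "mail_server": {"uid", "mail", "mailHost"},
    "hr_app": {"cn", "employeeNumber", "homePostalAddress", "homePhone"},
}


def shared_attributes(app_attributes):
    """Return {attribute: sorted list of apps} for attributes used by 2+ apps."""
    users = defaultdict(set)
    for app, attrs in app_attributes.items():
        for attr in attrs:
            users[attr].add(app)
    # Keep only attributes that more than one application depends on.
    return {attr: sorted(apps) for attr, apps in users.items() if len(apps) > 1}


if __name__ == "__main__":
    for attr, apps in sorted(shared_attributes(APP_ATTRIBUTES).items()):
        print(f"{attr}: {', '.join(apps)}")
```

Attributes that appear in the result are candidates for a single shared definition rather than separate copies per application.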
The number and size of entries that are stored in the directory depend largely on your data requirements, as described in Chapter 4, Defining Data Characteristics.
Consider the following when calculating the number and size of entries:
Does the deployment require repeated bulk import initialization?
If so, how often are imports performed?
How many entries are imported at a time?
What types of entries are imported?
Must initialization be performed online with the server running?
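Once the number of entries and an average entry size are known, a first rough estimate of raw data volume and bulk import duration is simple arithmetic. The sketch below takes a hypothetical average entry size and a hypothetical import throughput as inputs; measure both on your own hardware and data before relying on the result. The raw size it reports excludes index and database overhead, which increase disk usage.

```python
# All inputs are hypothetical planning figures, not measured values.
def estimate_import(entry_count, avg_entry_bytes, entries_per_second, imports_per_month):
    """Rough sizing for repeated bulk imports.

    Returns (raw_data_mb, hours_per_import, hours_per_month).
    raw_data_mb ignores index and database overhead.
    """
    raw_data_mb = entry_count * avg_entry_bytes / (1024 * 1024)
    hours_per_import = entry_count / entries_per_second / 3600
    return raw_data_mb, hours_per_import, hours_per_import * imports_per_month


if __name__ == "__main__":
    # Example: 2 million entries of about 2 KB each, an assumed throughput of
    # 1000 entries per second, and four full imports per month.
    size_mb, per_import_h, monthly_h = estimate_import(2_000_000, 2048, 1000, 4)
    print(f"{size_mb:.2f} MB raw, {per_import_h:.2f} h per import, {monthly_h:.2f} h per month")
```

If imports must run online with the server in service, the monthly figure indicates how long the server carries the additional import load alongside client traffic.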
In estimating read traffic, consider the following:
How many searches per second are expected?
What types of searches are expected?
For example, unique ID searches, wildcard searches, or exact match searches.
What is the estimated peak search rate?
What is the estimated average search rate?
How many unindexed searches are expected?
In an unindexed search, the database is searched directly rather than through an index file. Unindexed searches occur when the All IDs Threshold is reached within the index file used for the search, when no index file exists, or when the index file is not configured in the way that the search requires.
Unindexed searches are generally more time-consuming than indexed searches.
Are searches concentrated in a particular data center or geographic region?
If one data center receives proportionally more search traffic than other data centers, it might be worth placing additional replicated servers in this data center to balance the load.
Are searches concentrated at a particular time of day?
How many searches are anticipated from within the firewall?
How many searches are anticipated from outside the firewall?
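The three conditions that make a search unindexed can be modeled as a simple decision procedure, which helps when classifying the searches you expect. This is a simplified, hypothetical sketch: it considers one attribute at a time, and the index type labels (`eq`, `sub`) and threshold values are illustrative, not Directory Server configuration syntax or the server's exact internal logic.

```python
from dataclasses import dataclass


# Hypothetical model of an attribute's index configuration.
@dataclass
class IndexConfig:
    types: frozenset         # illustrative labels such as "eq", "pres", "sub"
    all_ids_threshold: int   # illustrative threshold value


def unindexed_reason(attribute, match_type, candidate_count, indexes):
    """Return why a single-attribute search would be unindexed, or None if indexed.

    candidate_count is the number of entries matching the index key;
    in this simplified model the threshold counts as reached at equality.
    """
    index = indexes.get(attribute)
    if index is None:
        return "no index file for attribute"
    if match_type not in index.types:
        return f"index not configured for {match_type} searches"
    if candidate_count >= index.all_ids_threshold:
        return "All IDs Threshold reached"
    return None


if __name__ == "__main__":
    indexes = {
        "uid": IndexConfig(frozenset({"eq"}), 4000),
        "objectclass": IndexConfig(frozenset({"eq"}), 4000),
    }
    print(unindexed_reason("uid", "eq", 1, indexes))
    print(unindexed_reason("uid", "sub", 10, indexes))
    print(unindexed_reason("description", "eq", 5, indexes))
    print(unindexed_reason("objectclass", "eq", 50_000, indexes))
```

Applying a model like this to each expected search type gives a rough count of the unindexed searches to plan for.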
If read performance is crucial to your enterprise, see Chapter 10, Designing a Scaled Deployment, for suggestions on deploying a directory service that is scaled for reads.
In estimating write traffic, consider the following:
How many updates per second are expected?
What types of updates are expected?
What is the estimated peak update rate?
What is the estimated average update rate?
Are updates concentrated in a particular data center or geographic region?
If one data center receives proportionally more update traffic than other data centers, it might be worth placing additional writable servers in this data center to distribute the update load.
Are updates concentrated at a particular time of day?
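The average rate, peak rate, and time-of-day questions for both read and write traffic can be answered from a sample of operation timestamps. The sketch below assumes you have already extracted those timestamps as `datetime` objects, for example from the server's access log; parsing the log is outside its scope. The function name and one-second bucketing are illustrative choices.

```python
from collections import Counter
from datetime import datetime


def rate_profile(timestamps):
    """Summarize operation timestamps for capacity planning.

    Returns (average ops/sec over the observed window, peak ops/sec in any
    one-second bucket, busiest hour of day).
    """
    if not timestamps:
        return 0.0, 0, None
    # Group operations into one-second buckets.
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    # Window covers every second from the first bucket to the last, idle seconds included.
    window = (max(per_second) - min(per_second)).total_seconds() + 1
    average = len(timestamps) / window
    peak = max(per_second.values())
    busiest_hour = Counter(ts.hour for ts in timestamps).most_common(1)[0][0]
    return average, peak, busiest_hour


if __name__ == "__main__":
    sample = [
        datetime(2024, 1, 1, 9, 0, 0),
        datetime(2024, 1, 1, 9, 0, 0, 500_000),
        datetime(2024, 1, 1, 9, 0, 1),
        datetime(2024, 1, 1, 9, 0, 3),
    ]
    print(rate_profile(sample))
```

A large gap between the average and the peak suggests sizing for the peak; a dominant busiest hour indicates traffic concentrated at a particular time of day.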
If write performance is crucial to your enterprise, see Chapter 10, Designing a Scaled Deployment, for suggestions on deploying a directory service that is scaled for writes.
For each client application, determine the maximum response time that is acceptable. The acceptable response time might differ for various geographical locations, and for different kinds of operations.
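One way to turn acceptable response times into a testable target is to compare a high percentile of measured response times against a budget for each kind of operation. This is a minimal sketch; the budget values, the choice of the 95th percentile, and the nearest-rank percentile method are assumptions for illustration.

```python
import math

# Hypothetical per-operation response-time budgets, in milliseconds.
BUDGET_MS = {"search": 50, "modify": 200}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def over_budget(samples_by_op, budgets, pct=95):
    """Return {operation: observed percentile} for operations exceeding budget."""
    result = {}
    for op, samples in samples_by_op.items():
        observed = percentile(samples, pct)
        if observed > budgets[op]:
            result[op] = observed
    return result


if __name__ == "__main__":
    measured = {
        "search": list(range(1, 21)),       # 1..20 ms
        "modify": [100] * 18 + [300, 400],  # two slow modifications
    }
    print(over_budget(measured, BUDGET_MS))
```

Because acceptable response times can vary by location, the dictionary keys could just as well be (location, operation) pairs.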
Estimate the level of synchronicity that is required between master replicas and consumer replicas. The Directory Server replication model is loosely consistent; that is, updates are accepted on a master without requiring communication with the other replicas in a topology. At any given time, the contents of each replica might be different. Over time, the replicas converge until each replica has an identical copy of the data. As part of performance planning, determine the maximum acceptable time for replicas to converge.
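Under this loosely consistent model, the convergence time for a single update is the interval between the master accepting it and the last replica applying it. The sketch below checks that interval against a target; the replica names and timestamps are hypothetical, and how you measure apply times in your topology is left open.

```python
from datetime import datetime, timedelta


def convergence_time(accepted_at, applied_at_by_replica):
    """Time until every replica has applied one update.

    accepted_at: when the master accepted the update.
    applied_at_by_replica: {replica name: when that replica applied it}.
    """
    return max(applied_at_by_replica.values()) - accepted_at


def within_target(accepted_at, applied_at_by_replica, max_convergence):
    """True if the slowest replica converged within the acceptable time."""
    return convergence_time(accepted_at, applied_at_by_replica) <= max_convergence


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    applied = {
        "consumer-1": t0 + timedelta(seconds=2),
        "consumer-2": t0 + timedelta(seconds=7),
    }
    print(convergence_time(t0, applied))                     # 0:00:07
    print(within_target(t0, applied, timedelta(seconds=5)))  # False
```

Only the slowest replica matters for this check, so a single distant or overloaded consumer can determine whether the target is met.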
Directory Server 6.1 includes a new prioritized replication feature. This feature enables you to specify that changes to certain attributes must be replicated as soon as possible. Prioritized replication might affect your decisions about acceptable replication latency. For more information, see Prioritized Replication in Sun Java System Directory Server Enterprise Edition 6.1 Reference.