Sun Directory Server Enterprise Edition 7.0 Deployment Planning Guide

Part III Logical Design

A logical architecture identifies the components of a Directory Server Enterprise Edition deployment, and shows interrelationships between the components. Typically, use cases developed during the technical requirements phase indicate which components the deployment requires. However, the required components can often be derived directly from the business requirements.

This part provides sample logical architectures that are based on typical Directory Server Enterprise Edition deployment scenarios. The information in this part flows from a basic, single-server deployment to more complex deployments that span multiple data centers. The architectures discussed in the later chapters of this part build on the simpler architectures discussed in the earlier chapters.

This part includes the following chapters:

Chapter 9, Designing a Basic Deployment describes a basic Directory Server Enterprise Edition deployment.
Chapter 10, Designing a Scaled Deployment describes a deployment scaled to meet additional service requirements.
Chapter 11, Designing a Global Deployment covers deployment considerations for deployments across multiple data centers.
Chapter 12, Designing a Highly Available Deployment describes deployments designed to meet availability requirements.

Chapter 9 Designing a Basic Deployment

In the simplest Directory Server Enterprise Edition deployment, your directory service requirements can be fulfilled by a single Directory Server, installed on one machine, in a single data center. Such a scenario might occur in a small organization, or if you are running Directory Server for demonstration or evaluation purposes. Note that the technical requirements discussed in the previous chapters apply equally to all deployments.

This chapter describes a basic deployment, involving a single Directory Server.

Basic Deployment Architecture

A basic Directory Server Enterprise Edition deployment includes the following elements:

Directory Server instance files
Directory Server daemon
dsadm and dsconf command-line utilities
Directory Service Control Center (DSCC), if GUI access is required
Console Agent, if DSCC is used

These elements can all be installed on a single machine. The following figure illustrates the high-level architecture of a basic Directory Server Enterprise Edition deployment.

Figure 9–1 Basic Directory Server Enterprise Edition Architecture on a Single Machine

Figure shows a basic deployment with all elements installed on a single server.

In this scenario, internal LDAP and DSML clients can be configured to access Directory Server directly. External HTML clients can be configured to access DSCC over a firewall.

Although all of the components described previously can be installed on a single machine, this is unlikely in a real deployment. A more typical scenario would be the installation of DSCC and the dsconf command-line utility on separate remote machines. All Directory Server hosts could then be configured remotely from these machines. The following figure illustrates this more typical scenario.

Figure 9–2 Basic Directory Server Enterprise Edition Architecture With Remote Directory Service Control Center

Figure shows a basic deployment with the Directory Service Control Center and dsconf installed on a remote server.

The Directory Server instance stores server and application configuration settings, as well as user information. Typically, server and application configuration information is stored in one suffix of Directory Server while user and group entries are stored in another suffix. A suffix refers to the name of the entry in the directory tree, below which data is stored.

Directory Service Control Center (DSCC) is a centralized, web-based user interface for all servers. DSCC locates all servers and applications that are registered with it. DSCC displays the servers in a graphical user interface, where you can manage and configure the servers. The Directory Service Control Center might not be required in a small deployment because all functionality is also provided through a command-line interface.

In the chapters that follow, it is assumed that the Directory Service Control Center is installed on a separate machine. This aspect of the topology is not referred to again in the remaining chapters.

Basic Deployment Setup

Complete installation information is provided in the Sun Directory Server Enterprise Edition 7.0 Installation Guide. The purpose of this section is to provide a clear picture of the elements that make up a basic deployment and how these elements work together.

This section lists the main tasks for setting up the basic deployment described in the previous section.

Install the required shared components, including the security packages.
Install Directory Server, the Console Agent, and the command-line interface.
If you want to manage the server by using the command-line utilities, do the following:
- Create and start a standalone Directory Server instance by using the dsadm command.
- Create and configure a suffix in the new instance, by using the dsconf command.
If you want to manage the server through a graphical user interface, do the following:
- Initialize the Directory Service Control Center.
- Create a Directory Server instance by using the Directory Service Control Center.
- Create and configure a suffix in the new instance by using the Directory Service Control Center.

Improving Performance in a Basic Deployment

In even the most basic deployment, you might want to tune Directory Server to improve performance in specific areas. The following sections describe basic tuning strategies that can be applied to a simple single-server deployment. These strategies can be applied to each server in larger, more complex deployments, for improved performance across the topology.

Using Indexing to Speed Up Searches

Indexes speed up searches by effectively reducing the number of entries a search has to check to find a match. An index contains a list of values. Each value is associated with a list of entry identifiers. Directory Server can look up entries quickly by using the lists of entry identifiers in indexes. Without an index to manage a list of entries, Directory Server must check every entry in a suffix to find matches for a search.

Directory Server processes each search request as follows:

Directory Server receives a search request from a client.
Directory Server examines the request to confirm that the search can be processed.

If Directory Server cannot perform the search, it returns an error to the client and might refer the search to another instance of Directory Server.
Directory Server determines whether it manages one or more indexes that are appropriate to the search.
- If Directory Server manages indexes that are appropriate to the search, the server looks in all of the appropriate indexes for candidate entries. A candidate entry is an entry that might be a match for the search request.
- If Directory Server does not manage an index appropriate to the search, the server generates the set of candidate entries by checking all of the entries in the database.
  
  When Directory Server cannot use indexes, this process consumes more time and system resources.
Directory Server examines each candidate entry to determine whether the entry matches the search criteria.
Directory Server returns matching entries to the client application as it finds the entries.
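
The candidate-selection steps above can be sketched in a few lines of Python. This is a toy model only, not Directory Server code: the `ENTRIES` store, the `INDEXES` dictionary, and the `search` function are hypothetical names, and only equality matches on a single attribute are modelled.

```python
# Toy model of index-assisted search; not Directory Server internals.
ENTRIES = {
    1: {"uid": "bjensen", "l": "Paris"},
    2: {"uid": "kvaughan", "l": "London"},
    3: {"uid": "scarter", "l": "Paris"},
}

# Equality index: attribute -> value -> set of entry identifiers.
# Only "uid" is indexed in this example.
INDEXES = {
    "uid": {"bjensen": {1}, "kvaughan": {2}, "scarter": {3}},
}


def search(attr, value):
    """Return (sorted matching entry IDs, number of candidates examined)."""
    index = INDEXES.get(attr)
    if index is not None:
        # Indexed: candidates come straight from the list of entry IDs.
        candidates = index.get(value, set())
    else:
        # Unindexed: every entry in the suffix becomes a candidate.
        candidates = set(ENTRIES)
    # Each candidate is still checked against the search criteria.
    matches = sorted(
        eid for eid in candidates if ENTRIES[eid].get(attr) == value
    )
    return matches, len(candidates)


print(search("uid", "scarter"))  # uses the uid index
print(search("l", "Paris"))      # no index on l, so all entries are checked
```

In this model the indexed search examines one candidate, while the unindexed search examines all three entries before returning its two matches.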

You can optimize search performance by doing the following:

Preventing Directory Server from performing searches on non-indexed entries
Ensuring that cache sizes are appropriately tuned
Limiting the length of an index

For a comprehensive overview of how indexes work, see Chapter 9, Directory Server Indexing, in Sun Directory Server Enterprise Edition 7.0 Reference. For information about defining indexes, see Chapter 12, Directory Server Indexing, in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

Optimizing Cache for Search Performance

For improved search performance, cache as much directory data as possible in memory. By preventing the directory from reading information from disk, you limit the disk I/O bottleneck. Different possibilities exist for doing this, depending on the size of your directory tree, the amount of memory available, and the hardware used. Depending on the deployment, you might choose to allocate more or less memory to entry and database caches to optimize search performance. You might alternatively choose to distribute searches across Directory Server consumers on different servers.

For more information, see Tuning Cache Settings.

Consider the following scenarios:

All Entries and Indexes Fit Into Memory

In the optimum case, the database cache and the entry cache fit into the physical memory available. The entry caches are large enough to hold all entries in the directory. The database cache is large enough to hold all indexes and entries. In this case, searches find everything in cache. Directory Server never has to go to file system cache or to disk to retrieve entries.

Ensure that database cache can contain all database indexes, even after updates and growth. When space runs out in the database cache for indexes, Directory Server must read indexes from disk for every search request, severely impacting throughput. You can monitor paging and cache activity with DSCC or through the command line.

Appropriate cache sizes must be determined through empirical testing with representative data. In general, the database cache size can be calculated as (total size of database files) x 1.2. Start by allocating a large amount of memory for the caches. Then exercise and monitor Directory Server to observe the result, repeating the process as necessary. Entry caches in particular might use much more memory than you allocate to these caches.
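
As a worked illustration of the rule of thumb above, the following Python sketch computes a starting value for the database cache from a list of database file sizes. The function name and the sample sizes are hypothetical, and the result is only a first estimate to be refined through testing.

```python
def estimate_db_cache_bytes(db_file_sizes_bytes, percent=120):
    """Starting point: (total size of database files) x 1.2.

    Integer arithmetic is used so that the result is exact.
    """
    total = sum(db_file_sizes_bytes)
    return total * percent // 100


MB = 1024 ** 2
# Hypothetical database files of 800 Mbytes and 200 Mbytes.
sizes = [800 * MB, 200 * MB]
print(estimate_db_cache_bytes(sizes) // MB, "Mbytes")
```

For 1000 Mbytes of database files, the sketch suggests starting with a database cache of 1200 Mbytes.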

Size the entry cache so that the entries accessed by the server load within any one-second period remain readily available. Avoid situations in which the contents of the entry cache are replaced many times per second.

Sufficient Memory For 32-Bit Directory Server

Imagine a system with sufficient memory to hold all data in entry and database caches, but no support for a 64-bit Directory Server process. If hardware constraints prevent you from deploying Directory Server on a Solaris system with 64-bit support, size caches appropriately with respect to memory limitations for 32-bit processes. Then leave the remaining memory to the file system cache.

As a starting point when benchmarking performance, size the entry cache to hold as many entries as possible. Keep the database cache relatively small, for example 100 Mbytes, without minimizing it completely, and let the file system cache hold the database pages.

Note –

File system cache is shared with other processes on the system, especially file-based operations. Thus, controlling file system cache is more difficult than controlling other caches, particularly on systems that are not dedicated to Directory Server.

The system might reallocate file system cache to other processes.

Avoid online import in this situation because import cache is associated with the Directory Server process.

Insufficient Memory

Imagine a system with insufficient memory to hold all data in entry and database caches. In this case, do not let the combined entry and database cache sizes exceed the available physical memory. Exceeding physical memory might result in heavy virtual memory paging that could bring the system to a virtual halt.
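
The constraint above can be checked with simple arithmetic before you apply cache settings. The following Python sketch is an illustration only: the function name, the `reserved` allowance for the operating system and other processes, and the sample figures are all assumptions.

```python
def caches_fit(entry_cache_sizes, db_cache_size, physical_memory, reserved=0):
    """Return (fits, bytes_needed) for a planned cache configuration.

    entry_cache_sizes: one size per suffix, in bytes.
    reserved: hypothetical allowance for everything other than the caches.
    """
    needed = sum(entry_cache_sizes) + db_cache_size + reserved
    return needed <= physical_memory, needed


MB = 1024 ** 2
fits, needed = caches_fit(
    entry_cache_sizes=[1500 * MB, 500 * MB],
    db_cache_size=1000 * MB,
    physical_memory=4096 * MB,
    reserved=512 * MB,
)
print(fits, needed // MB)
```

Because entry caches might use more memory than you allocate to them, treat a result that only just fits as a warning rather than as confirmation.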

For small systems, start benchmarking by devoting available memory to entry cache and database caches, with sizes no less than 100 Mbytes each. Try disabling the file system cache by mounting Solaris UFS file systems with the -o forcedirectio option of the mount_ufs command. For more information, see the mount_ufs(1M) man page. Disabling file system cache can prevent the file system cache from using memory needed by Directory Server.

For large Directory Servers running on large machines, maximize the file system cache and reduce the database cache. Verify and correct assumptions through empirical testing.

Optimizing Cache for Write Performance

In addition to planning a deployment for write scalability from the outset, provide enough memory for the database cache to handle updates in memory. Also, minimize disk activity. You can monitor the effectiveness of the database cache by reading the hit ratio in the Directory Service Control Center.

After Directory Server has run for some time, the caches should contain enough entries and indexes that disk reads are no longer necessary. Updates should affect the database cache in memory, with data from the large database cache in memory being flushed only infrequently.

Flushing data to disk during a checkpoint can be a bottleneck. The larger the database cache size, the larger the bottleneck. Storing the database on a separate RAID system, such as a Sun StorEdge™ disk array, can help improve update performance. You can use utilities such as iostat on Solaris systems to isolate potential I/O bottlenecks. For more information, see the iostat(1M) man page.

The following table shows database and log placement recommendations for systems with 2, 3, and 4 disks.

Table 9–1 Isolating Databases and Logs on Different Disks


Disks Available	Recommendations
2	Place the Directory Server database on one disk. Place the transaction log, the access, audit, and error logs and the retro changelog on the other disk.
3	Place the Directory Server database on one disk. Place the transaction log on the second disk. Place the access, audit, and error logs and the retro changelog on the third disk.
4	Place the Directory Server database on one disk. Place the transaction log on the second disk. Place the access, audit, and error logs on the third disk. Place the retro changelog on the fourth disk.

Chapter 10 Designing a Scaled Deployment

The basic deployment described in Chapter 9, Designing a Basic Deployment assumes that a single Directory Server is enough to satisfy the read and write requirements of your organization. Organizations that have large read or write requirements, that is, several clients attempting to access directory data simultaneously, need to use a scaled deployment.

Generally, the number of searches a Directory Server instance can perform per second is directly related to the number and speed of the server's CPUs, provided there is sufficient memory to cache all data. Horizontal read scalability can be achieved by spreading the load across more than one server. This usually means providing additional copies of the data so that clients can read the data from more than one source.

Write operations do not scale horizontally because a write operation to a master server results in a write operation to every replica. The only way to scale write operations horizontally is to split the directory data among multiple databases and place those databases on different servers.

This chapter describes the different ways of scaling a Directory Server Enterprise Edition deployment to handle more reads and writes.

Using Load Balancing for Read Scalability

Load balancing increases performance by spreading the read load across multiple servers. Load balancing can be achieved using replication, Directory Proxy Server, or a combination of the two.

Using Replication for Load Balancing

Replication is the mechanism that automatically copies directory data and changes from one directory server to another directory server. With replication, you can copy a directory tree or subtree that is stored in its own suffix between servers.

Note –

You cannot copy the configuration or monitoring information subtrees.

By replicating directory data across servers, you can reduce the access load on a single machine, improving server response time and providing read scalability. Replicating directory entries to a location close to your users also improves directory response time. Replication is generally not a solution for write scalability.

Basic Replication Concepts

The replication mechanism is described in detail in Chapter 7, Directory Server Replication, in Sun Directory Server Enterprise Edition 7.0 Reference. The following section provides basic information that you need to understand before reviewing the sample topologies described later in this chapter.

Master, Consumer, and Hub Replicas

A database that participates in replication is defined as a replica.

Directory Server distinguishes between three kinds of replicas:

Master or read-write replica. A read-write database that contains a master copy of the directory data. A master replica can process update requests from directory clients. A topology that contains more than one master is called a multi-master topology.
Consumer replica. A read-only database that contains a copy of the information in the master replica. A consumer replica can process search requests from directory clients but refers update requests to master replicas.
Hub replica. A read-only database (like a consumer replica) that is stored on a Directory Server that supplies one or more consumer replicas.

The following figure illustrates the role of each of these replicas in a replication topology.

Figure 10–1 Role of Replicas in a Replication Topology

Figure shows the flow of replication traffic and LDAP traffic.

Note –

The previous figure is for illustration purposes only and is not necessarily a recommended topology. Directory Server supports an unlimited number of masters in a multi-master topology. A master-only topology is recommended in most cases.

Assessing Initial Replication Requirements

A successful replicated directory service requires comprehensive testing and analysis in a production environment. However, the following basic calculation enables you to start designing a replicated topology. The sections that follow use the result of this calculation as the basis of the replicated topology design.

To Determine Initial Replication Requirements

Estimate the maximum number of searches per second that are required at peak usage time.

This estimate can be called Total searches.

Test the number of searches per second that a single host can achieve.

This estimate can be called Searches per host. Note that this should be evaluated with replication enabled.

The number of searches that a host can achieve is affected by several variables. Among these are the size of the entries, the capacity of the host, and the speed of the network. A number of third-party performance testing tools are available to assist you in conducting these tests. The SLAMD Distributed Load Generation Engine (SLAMD) is an open source Java application designed for stress testing and performance analysis of network-based applications. SLAMD can be used effectively to perform this part of the replication assessment. For information about SLAMD, and to download the SLAMD software, see http://www.slamd.com.

Calculate the number of hosts that are required.

Number of hosts = Total searches / Searches per host
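
The calculation above can be expressed as a short Python function. The division is rounded up here, on the assumption that a fractional host must be provisioned as a whole host. The function name and the sample figures are hypothetical.

```python
def number_of_hosts(total_searches, searches_per_host):
    """Number of hosts = Total searches / Searches per host, rounded up."""
    if searches_per_host <= 0:
        raise ValueError("searches_per_host must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_searches // searches_per_host)


# Hypothetical figures: 24000 searches per second at peak,
# 1000 searches per second measured on a single host.
print(number_of_hosts(24000, 1000))
print(number_of_hosts(2500, 1000))
```

The result is only a starting estimate. It does not include spare capacity for host failures or maintenance.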

Load Balancing With Multi-Master Replication in a Single Data Center

Replication can balance the load on Directory Server in the following ways:

By spreading search activities across several servers
By dedicating specific servers to specific tasks or applications

Generally, if the Number of hosts calculated in Assessing Initial Replication Requirements is about 16, or not significantly larger, your topology should include only master servers in a fully connected topology. Fully connected means that every master replicates to every other master in the topology.

Note –

The Number of hosts is approximate and depends on the hardware and other details of the deployment.

The following figure assumes that the Number of hosts is two. LDAP operations are divided between two master servers, based on the type of client application. This strategy reduces the load that is placed on each server and increases the total number of operations that can be served by the deployment.

Figure 10–2 Using Multi-Master Replication for Load Balancing

Figure shows two different kinds of client applications, whose requests are sent to two separate masters.

For a similar scenario in a global deployment, see Multi-Master Replication Over WAN.

Load Balancing With Replication in Large Deployments

If your deployment requires a Number of hosts significantly larger than 16, you might need to add dedicated consumers to the topology.

The following figure assumes that the Number of hosts is 24 and, for simplicity, shows only a portion of the topology. (The remaining 12 servers would have an identical configuration, for a total of 8 masters and 16 consumers.)

Figure 10–3 Using Multi-Master Replication for Load Balancing in a Large Deployment

Figure shows multi-master replication with four masters and eight consumers.

A change log can be enabled on any of these consumers if you need to do the following:

Promote the consumer to a master in the event of an outage
Perform a binary initialization from a master to any one of the consumers

If the Number of hosts is several hundred, you might want to add hubs to the topology. In such a case, there should be more hubs than masters, with up to 10 hubs for each master. Each hub should handle replication to only 20 consumers at most.

No topology should have the same number of hubs as masters, or the same number of hubs as consumers.
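
The hub guidelines above can be collected into a simple check. The following Python sketch is an illustration only: the function name and messages are hypothetical, and the limits are taken directly from the preceding paragraphs.

```python
def check_hub_topology(masters, hubs, consumers):
    """Return a list of guideline violations (empty if none are found)."""
    problems = []
    if hubs <= masters:
        problems.append("use more hubs than masters")
    if hubs > 10 * masters:
        problems.append("use at most 10 hubs for each master")
    if consumers > 20 * hubs:
        problems.append("use at most 20 consumers for each hub")
    if hubs == consumers:
        problems.append("hubs and consumers should differ in number")
    return problems


print(check_hub_topology(masters=4, hubs=20, consumers=400))
print(check_hub_topology(masters=4, hubs=4, consumers=100))
```

The first call describes a topology that satisfies all of the guidelines. The second call reports two violations, because it has as many hubs as masters and more than 20 consumers for each hub.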

Using Server Groups to Simplify Multi-Master Topologies

When the Number of hosts is large, the use of server groups can simplify the topology and improve resource usage. In a topology with 16 masters, the use of four server groups, each containing four masters, is easier to manage than 16 fully meshed masters.

Setting up such a topology involves the following steps:

Configure the 16 masters, without any replication agreements.
Create four server groups and include four masters in each group.
Set up replication agreements between all the masters in a single group.
Set up replication agreements between the first master of each group, the second master of each group, and so forth.
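
To see why server groups simplify the topology, you can count the replication agreements in each design. The following Python sketch rests on two assumptions: each agreement is counted as a one-way supplier-to-consumer link, so that two masters replicating to each other need two agreements, and the masters that share a position across groups are fully connected to each other. The server names are hypothetical.

```python
from itertools import permutations


def full_mesh(masters):
    """All one-way agreements needed to fully connect the given masters."""
    return set(permutations(masters, 2))


def grouped_mesh(groups):
    """Agreements for the server-group layout described in the steps above."""
    agreements = set()
    # Every master replicates to every other master in its own group.
    for group in groups:
        agreements |= full_mesh(group)
    # The first masters of all groups are connected, then the second, and so on.
    for peers in zip(*groups):
        agreements |= full_mesh(peers)
    return agreements


# Four groups of four masters, named g1m1 .. g4m4 for illustration.
groups = [[f"g{g}m{m}" for m in range(1, 5)] for g in range(1, 5)]
all_masters = [name for group in groups for name in group]

print(len(full_mesh(all_masters)))
print(len(grouped_mesh(groups)))
```

Under these assumptions, 16 fully meshed masters require 240 agreements, while the four server groups require 96.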

The following figure shows the resulting topology.

Figure 10–4 Server Groups in Multi-Master Topologies

Figure shows four server groups, each containing four masters.

Using Directory Proxy Server for Load Balancing

Directory Proxy Server can use multiple servers to distribute the load of a single source of data. Directory Proxy Server can also ensure that if one of the servers is unavailable, the data remains available. Apart from distributing data, Directory Proxy Server provides operation-based load balancing. That is, the server is able to route client operations to a specific Directory Server, based on the type of operation.

Directory Proxy Server supports operation-based load balancing, and a variety of load balancing algorithms that determine how the workload is shared between Directory Servers. For a detailed description of each of these algorithms, see Chapter 16, Directory Proxy Server Load Balancing and Client Affinity, in Sun Directory Server Enterprise Edition 7.0 Reference.

The following figure illustrates how the proportional algorithm is used to balance read load across two servers. Operation-based load balancing routes all writes to Master 1, unless that server fails. On failure all reads and writes are routed to Master 2.

Figure 10–5 Using Proportional and Operation-Based Load Balancing in a Scaled Deployment

Figure shows proportional and operation-based load balancing with Directory Proxy Server.

Note that the configuration for load balancing is not recalculated when one server instance fails. You cannot use proportional load balancing to create a “hot standby” server by setting a server's load balancing weight to 0.

Imagine, for example, that you have three servers A, B, and C. Proportional load balancing has been configured such that servers A and B each receive 50% of the load. Server C is configured to have 0% of the load because it is designed to be a standby server only. If server A fails, 100% of the load goes to server B automatically. Only if server B also fails is the load distributed to server C. So, either an instance participates in load balancing all the time, always ready to take part of the load, or all primary instances have to fail before that instance takes any load.
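
The behavior in this example can be modelled in a few lines of Python. This sketch is not the Directory Proxy Server implementation. It reproduces only the outcome described above, and the function name and the even split among zero-weight servers are assumptions.

```python
def effective_shares(weights, available):
    """Share of load for each available server.

    weights: dict of server name -> configured weight.
    available: set of server names that are currently up.
    """
    live = {s: w for s, w in weights.items() if s in available}
    positive = {s: w for s, w in live.items() if w > 0}
    if positive:
        total = sum(positive.values())
        # Zero-weight servers stay idle while any weighted server is up.
        return {s: positive.get(s, 0) / total for s in live}
    if live:
        # Only zero-weight servers remain, so they now take the load.
        return {s: 1 / len(live) for s in live}
    return {}


weights = {"A": 50, "B": 50, "C": 0}
print(effective_shares(weights, {"A", "B", "C"}))
print(effective_shares(weights, {"B", "C"}))
print(effective_shares(weights, {"C"}))
```

Server C receives no load in the first two cases and all of the load in the third, which is why a weight of 0 does not produce a hot standby.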

You can achieve something like a hot standby by using the saturation load balancing algorithm and applying a low weight to the standby server. Although the server is not a true standby server, you can configure the algorithm such that requests are distributed to this server only if the primary servers are under heavy load. Effectively if one primary server is disabled, the load on the other primary servers increases to the extent that requests must be distributed to the standby server.

Using Distribution for Write Scalability

Write operations are resource intensive. When a client requests a write operation, the following sequence of events occurs on the database:

The backend database is locked
The entry is locked in the database cache
The access control check plug-in is called
Any backend pre-operation plug-ins are called
The database transaction begins
The database files are updated
The old entry cache is replaced with new data
The database transaction is committed
Any backend post-operation plug-ins are called
The backend database is unlocked

Because of this complex procedure, an increased number of writes can have a dramatic impact on performance.

As an enterprise grows, more client applications require rapid write access to the directory. Also, as more information is stored in a single Directory Server, the cost of adding or modifying entries in the directory database increases. This is because indexes become larger and it takes longer to manipulate the information that the indexes contain.

In some cases, the service level agreements might only be achieved by having all the data cached in memory. However, the data might be too large to fit in the memory of a single machine.

When the volume of directory data increases to this extent, you need to break up the data so that it can be stored in multiple servers. One approach is to use a hierarchy to divide the information. By separating the information into multiple branches based on some criteria, each branch can be stored on a separate server. Each server can then be configured with chaining or referrals to enable clients to access all the information from a single point.

In this kind of division, each server is responsible for only a part of the directory tree. A distributed directory works in a similar way to the Domain Name Service (DNS). The DNS assigns each portion of the DNS namespace to a particular DNS server. In the same way, you can distribute your directory namespace across servers while maintaining, from a client standpoint, a single directory tree.

A hierarchy-based distribution mechanism has certain disadvantages. The main problem is that this mechanism requires that the clients know exactly where the information is. Alternatively, the clients must perform a broad search to find the data. Another problem is that some directory-enabled applications might not have the capability to deal with the information if it is broken up into multiple branches.

Directory Server supports hierarchy-based distribution in conjunction with the chaining and referral mechanisms. However, a distribution feature is also provided with Directory Proxy Server, which supports smart routing. This feature enables you to decide on the best distribution mechanism for your enterprise.

Using Multiple Databases

Directory Server stores data in high-performance, disk-based LDBM databases. Each database consists of a set of files that contains all of the data that is assigned to this set. You can store different portions of your directory tree in different databases. Imagine, for example, that your directory tree contains three subsuffixes, as shown in the following figure.

Figure 10–6 Directory Tree With Three Subsuffixes

Figure shows a directory tree with one suffix and three subsuffixes.

The data of the three subsuffixes can be stored in three separate databases as shown in the following figure.

Figure 10–7 Three Subsuffixes Stored in Three Separate Databases

Figure shows three subsuffixes stored in three separate databases.

When you divide your directory tree among databases, the databases can be distributed across multiple servers. This strategy generally equates to several physical machines, which improves performance. The three databases in the preceding illustration can be stored on two servers as shown in the following figure.

Figure 10–8 Three Databases Stored on Two Separate Servers

Figure shows two databases stored on one server (A) and one database stored on a different server (B).

When databases are distributed across multiple servers, the amount of work that each server needs to do is reduced. Thus, the directory can be made to scale to a much larger number of entries than would be possible with a single server. Because Directory Server supports dynamic addition of databases, you can add new databases as required, without making the entire directory unavailable.

Using Directory Proxy Server for Distribution

Directory Proxy Server divides directory information among multiple servers but does not require that the hierarchy of the data be altered. An important aspect of data distribution is the ability to break up the data set in a logical manner. However, distribution logic that works well for one client application might not work as well for another client application.

For this reason, Directory Proxy Server enables you to specify how data is distributed and how directory requests should be routed. For example, LDAP operations can be routed to different directory servers based on the directory information tree (DIT) hierarchy. The operations can also be routed based on operation type or on a custom distribution algorithm.

Directory Proxy Server effectively hides the distribution details from the client application. From the clients' standpoint, a single directory addresses their directory queries. Client requests are distributed according to a particular distribution method. Different routing strategies can be associated with different portions of the DIT, as explained in the following sections.

Routing Based on the DIT

This strategy can be used to distribute directory entries based on the DIT structure. For example, entries in the subtree o=sales,dc=example,dc=com can be routed to Directory Server A, and entries in the subtree o=hr,dc=example,dc=com can be routed to Directory Server B.
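
The routing decision in this example amounts to matching the DN of a request against a set of base DNs. The following Python sketch illustrates the idea. It is not Directory Proxy Server configuration, the names are hypothetical, and the naive comma split does not handle escaped commas in DN values.

```python
# Base DN -> target server, taken from the example above.
ROUTES = {
    "o=sales,dc=example,dc=com": "Directory Server A",
    "o=hr,dc=example,dc=com": "Directory Server B",
}


def normalize(dn):
    """Lowercase the DN and strip spaces around its components (simplified)."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def route_by_dit(dn):
    """Return the server for the most specific matching base DN, or None."""
    target = normalize(dn)
    best = None
    for base, server in ROUTES.items():
        base_n = normalize(base)
        # The entry matches if it is the base itself or lies below it.
        if target == base_n or target.endswith("," + base_n):
            if best is None or len(base_n) > len(best[0]):
                best = (base_n, server)
    return best[1] if best else None


print(route_by_dit("uid=bjensen, ou=People, o=Sales, dc=example, dc=com"))
print(route_by_dit("o=hr,dc=example,dc=com"))
print(route_by_dit("uid=jdoe,o=eng,dc=example,dc=com"))
```

The last call returns None because no route covers that subtree. A real deployment would need a default route or an error response for such requests.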

Routing Based on a Custom Algorithm

In some cases, you might want to distribute entries across directory servers without using the DIT structure. Consider, for example, a service provider who stores entries that represent subscribers under ou=subscribers,dc=example,dc=com. As the number of subscribers grows, there might be a need to distribute them across servers based on the range of the subscriber ID. With a custom routing algorithm, subscriber entries with an ID in the range 1-10000 can be located in Directory Server A, and subscriber entries with an ID in the range 10001-infinity can be located in Directory Server B. If the data on server B grows too large, the distribution algorithm can be changed so that entries with an ID above a new, higher boundary (for example, 20001 and up) are located on a new server, Server C.

You can implement your own routing algorithm using the Directory Proxy Server DistributionAlgorithm interface.
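
The range lookup behind the subscriber example can be sketched as follows. This Python model does not use the Directory Proxy Server DistributionAlgorithm interface. It shows only the lookup logic, the class and server names are hypothetical, and the split point of 20001 for Server C is chosen purely for illustration.

```python
from bisect import bisect_right


class RangeRouter:
    """Route subscriber IDs to servers by contiguous ID ranges."""

    def __init__(self, boundaries):
        # boundaries: (first_id, server) pairs; each range runs up to
        # the next first_id, and the last range is open-ended.
        ordered = sorted(boundaries)
        self._starts = [start for start, _ in ordered]
        self._servers = [server for _, server in ordered]

    def route(self, subscriber_id):
        pos = bisect_right(self._starts, subscriber_id) - 1
        if pos < 0:
            raise ValueError(f"no range covers ID {subscriber_id}")
        return self._servers[pos]


router = RangeRouter([(1, "Directory Server A"), (10001, "Directory Server B")])
print(router.route(10000), "/", router.route(10001))

# Later, move the upper part of the ID space to a new server.
router = RangeRouter(
    [(1, "Directory Server A"), (10001, "Directory Server B"), (20001, "Server C")]
)
print(router.route(25000))
```

Changing the boundaries changes only where requests are routed. Existing entries in the affected range must still be moved to the new server.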

Using Directory Proxy Server to Distribute Requests Based on Bind DN

In this scenario, an enterprise distributes customer data between three master servers based on geographical location. Customers that are based in the United Kingdom have their data stored on a master server in London. French customers have their data stored on a master server in Paris. The data for Japanese customers is stored on a master server in Tokyo. Customers can update their own data through a single web-based interface.

Users can update their own information in the directory using a web-based application. During the authentication phase, users enter an email address. Email addresses for customers in the UK take the form *@uk.example.com. For French customers, the email addresses take the form *@fr.example.com, and for Japanese customers, *@ja.example.com. Directory Proxy Server receives these requests through an LDAP-enabled client application. Directory Proxy Server then routes the requests to the appropriate master server based on the email address entered during authentication.

This scenario is illustrated in the following figure.

Figure 10–9 Using Directory Proxy Server to Route Requests Based on Bind DN

Figure shows Directory Proxy Server distributing write requests based on email address.
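The routing decision in this scenario reduces to a lookup on the domain part of the email address. The following Python sketch shows that lookup under simple assumptions: the mapping and the function name are hypothetical, and the sketch does not represent how Directory Proxy Server is actually configured to route on bind DN.

```python
# Hypothetical mapping from email domain to the master that holds the data.
DOMAIN_TO_MASTER = {
    "uk.example.com": "London",
    "fr.example.com": "Paris",
    "ja.example.com": "Tokyo",
}


def master_for_email(email: str):
    """Return the master location for an email address, or None if unknown."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local:
        return None  # not a usable address
    return DOMAIN_TO_MASTER.get(domain.lower())
```

An address that does not match any known domain returns None, so the caller must decide whether to reject the request or fall back to a default server.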

Distributing Data Lower Down in a DIT

In many cases, data distribution is not required at the top of the DIT. However, entries further up the tree might be required by the entries in the portion of the tree that has been distributed. This section provides a sample scenario that shows how to design a distribution strategy in this case.

Logical View of Distributed Data

Example.com has one subtree for groups and a separate subtree for people. The number of group definitions is small and fairly static, while the number of person entries is large, and continues to grow. Example.com therefore requires only the people entries to be distributed across three servers. However, the group definitions, their ACIs, and the ACIs located at the top of the naming context are required to access all entries under the people subtree.

The following illustration provides a logical view of the data distribution requirements.

Figure 10–10 Logical View of Distributed Data

Figure shows how the ou=people branch must be distributed.

Physical View of Data Storage

The ou=people subtree is split across three servers, according to the first letter of the sn attribute for each entry. The naming context (dc=example,dc=com) and the ou=groups containers are stored in one database on each server. This database is accessible to entries under ou=people. The ou=people container is stored in its own database.

The following illustration shows how the data is stored on the individual Directory Servers.

Figure 10–11 Physical View of Data Storage

Figure shows physical storage of distributed data.

Note that the ou=people container is not a subsuffix of the top container.
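The split by first letter of the sn attribute can be sketched as follows. The guide does not state which letters go to which server, so the three letter ranges, the chunk labels, and the function name below are assumptions. The sketch also handles only unaccented Latin letters.

```python
# Hypothetical letter ranges for the three servers.
CHUNKS = [("a", "h", "chunk 1"), ("i", "q", "chunk 2"), ("r", "z", "chunk 3")]


def chunk_for_sn(sn: str) -> str:
    """Return the chunk that stores a person entry with the given surname."""
    first = sn.strip()[:1].lower()
    for low, high, chunk in CHUNKS:
        if low <= first <= high:
            return chunk
    raise ValueError(f"no chunk covers surnames starting with {first!r}")
```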

Directory Server Configuration for Sample Distribution Scenario

Each server described previously can be understood as a distribution chunk. The suffix that contains the naming context and the entries under ou=groups, is the same on each chunk. A multi-master replication agreement is therefore set up for this suffix across each of the three chunks.

For availability, each chunk is also replicated. At least two master replicas are therefore defined for each chunk.

The following illustration shows the Directory Server configuration with three replicas defined for each chunk. For simplification, the replication agreements are only shown for one chunk, although they are the same for the other two chunks.

Figure 10–12 Directory Server Configuration

Figure shows replication topology for distributed data.

Directory Proxy Server Configuration for Sample Distribution Scenario

Client access to directory data through Directory Proxy Server is provided through data views. For information about data views see Chapter 17, Directory Proxy Server Distribution, in Sun Directory Server Enterprise Edition 7.0 Reference.

For this scenario, one data view is required for each distributed suffix, and one data view is required for the naming context (dc=example,dc=com) and the ou=groups subtrees.

The following illustration shows the configuration of Directory Proxy Server data views to provide access to the distributed data.

Figure 10–13 Directory Proxy Server Configuration

Figure shows data view configuration for distributed data.

Considerations for Data Growth

Distributed data is split according to a distribution algorithm. When you decide which distribution algorithm to use, bear in mind that the volume of data might change, and that your distribution strategy must be scalable. Do not use an algorithm that necessitates complete redistribution of data.

A numeric distribution algorithm based on uid, for example, can be scaled fairly easily. If you start with two data segments of uid=0-999 and uid=1000-1999, it is easy to add a third segment of uid=2000-2999 at a later stage.
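To make the scalability point concrete, the following Python sketch measures how many existing entries change server when a third server is added. It compares the range-based scheme described above with a modulo scheme. The modulo scheme does not appear in this guide and is included only as a contrasting assumption. All names are hypothetical.

```python
import bisect


def by_range(lower_bounds):
    """Placement rule: a uid goes to the segment with the greatest lower bound <= uid."""
    def place(uid):
        return bisect.bisect_right(lower_bounds, uid) - 1
    return place


def by_modulo(server_count):
    """Contrasting placement rule: uid modulo the number of servers."""
    def place(uid):
        return uid % server_count
    return place


def moved_fraction(before, after, uids):
    """Fraction of uids whose server changes between two placement rules."""
    uids = list(uids)
    return sum(1 for u in uids if before(u) != after(u)) / len(uids)


EXISTING = range(0, 2000)  # entries present before the third segment is added

# Range-based: add the segment 2000-2999; existing entries stay where they are.
range_moved = moved_fraction(by_range([0, 1000]), by_range([0, 1000, 2000]), EXISTING)

# Modulo-based: going from 2 to 3 servers relocates most existing entries.
modulo_moved = moved_fraction(by_modulo(2), by_modulo(3), EXISTING)
```

With these inputs, range_moved is zero, while modulo_moved is roughly two thirds, which is the kind of wholesale redistribution that the guidance above warns against.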

Using Referrals For Distribution

A referral is information returned by a server that tells a client application which server to contact to proceed with an operation request. If you do not use Directory Proxy Server to manage distribution logic, you must define the relationships between distributed data in another way. One way to define relationships is using referrals.

Directory Server supports three ways of configuring how and when referrals are returned:

Default referrals. The directory returns a default referral when a client application presents a DN for which the server does not have a matching suffix.
Suffix referrals. When an entire suffix has been taken offline for maintenance or security reasons, the server returns the referrals defined by that suffix. Read-only replicas of a suffix also return referrals to the master server when a client requests a write operation.
Smart referrals. These referrals are stored on entries within the directory. Smart referrals point to Directory Servers that have knowledge of the subtree whose DN matches the DN of the entry that contains the smart referral.

The following figure illustrates how referrals are used to direct clients from the UK to the appropriate server in a global topology. In this scenario, the client application must be able to connect to all the servers in the topology (at the TCP/IP level), to enable it to follow the referral.

Figure 10–14 Using Referrals to Direct Clients to a Specific Server

Figure shows client sending a request to consumer Directory Server, which refers the client to a different server in the topology.

Using Directory Proxy Server With Referrals

You can use Directory Proxy Server in conjunction with the referral mechanism to achieve the same result. The advantage of using Directory Proxy Server in this regard is that the load and complexity of client applications is reduced. Client applications are only aware of the Directory Proxy Server URL. If the distribution logic is changed, for any reason, this change is transparent to client applications.

The following figure illustrates how the scenario described previously can be simplified with the use of Directory Proxy Server. Client applications always connect to the Proxy Server, which handles the referrals itself.

Figure 10–15 Using Directory Proxy Server With Referrals

Figure shows clients sending requests to Directory Proxy Server, which handles all referrals.
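The following Python simulation sketches what it means for a proxy to follow referrals on behalf of a client. It is a toy model, not the Directory Proxy Server implementation. The class names, the server labels, the result tuples, and the hop limit are all assumptions introduced for illustration.

```python
class SimServer:
    """Toy directory server that answers locally or returns a referral."""

    def __init__(self, name, entries, referrals):
        self.name = name
        self.entries = entries      # dn -> attributes held locally
        self.referrals = referrals  # suffix -> name of the server to contact

    def search(self, dn):
        if dn in self.entries:
            return ("result", self.entries[dn])
        for suffix, target in self.referrals.items():
            if dn.endswith(suffix):
                return ("referral", target)
        return ("no_such_object", None)


class SimProxy:
    """Toy proxy that chases referrals so that the client never sees them."""

    def __init__(self, servers, entry_point, hop_limit=5):
        self.servers = servers
        self.entry_point = entry_point
        self.hop_limit = hop_limit  # guards against referral loops

    def search(self, dn):
        current = self.entry_point
        for _ in range(self.hop_limit):
            kind, payload = self.servers[current].search(dn)
            if kind != "referral":
                return kind, payload
            current = payload  # follow the referral to the named server
        return ("referral_limit_exceeded", None)


SERVERS = {
    "uk": SimServer("uk", {"uid=amy,ou=uk,dc=example,dc=com": {"cn": "Amy"}},
                    {"ou=us,dc=example,dc=com": "us"}),
    "us": SimServer("us", {"uid=bob,ou=us,dc=example,dc=com": {"cn": "Bob"}},
                    {"ou=uk,dc=example,dc=com": "uk"}),
}
proxy = SimProxy(SERVERS, entry_point="uk")
```

A client that calls proxy.search() for an entry held on the us server receives the result directly, even though the first server contacted only returned a referral.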

Chapter 11 Designing a Global Deployment

In a global deployment, access to directory services is required in more than one geographical location, or data center. This chapter provides strategies for effectively deploying Directory Server Enterprise Edition across multiple data centers. The strategies ensure that the quality of service requirements identified in Chapter 5, Defining Service Level Agreements are not compromised.

This chapter covers the following topics:

Using Replication Across Multiple Data Centers

One of the goals of replication is to enable geographic distribution of the LDAP service. Replication enables you to have identical copies of information on multiple servers and across more than one data center. Replication concepts are outlined in Chapter 10, Designing a Scaled Deployment in this guide, and in Chapter 7, Directory Server Replication, in Sun Directory Server Enterprise Edition 7.0 Reference.

Directory Server supports replication between instances that run on different platforms.

This section covers the following topics:

Multi-Master Replication

In multi-master replication, replicas of the same data exist on more than one server. For information about multi-master replication, see the following sections:

Concepts of Multi-Master Replication

In a multi-master configuration, data is updated on multiple masters. Each master maintains a change log, and the changes made on each master are replicated to the other servers. Each master plays the role of supplier and consumer.

Multi-master configurations have the following advantages:

Automatic write failover occurs when one master is inaccessible.
Updates can be made on a local master in a geographically distributed environment.

Multi-master replication uses a loose consistency replication model. This means that the same entries may be modified simultaneously on different servers. When updates are sent between the two servers, any conflicting changes must be resolved. Various attributes of a WAN, such as latency, can increase the chance of replication conflicts. Conflict resolution generally occurs automatically. A number of conflict rules determine which change takes precedence. In some cases conflicts must be resolved manually. For more information, see Solving Common Replication Conflicts in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

The number of masters that are supported in a multi-master topology is theoretically unlimited. The number of consumers and hubs is also theoretically unlimited. However, the number of consumers to which a single supplier can replicate depends on the capacity of the supplier server. You can use the SLAMD Distributed Load Generation Engine (SLAMD) to assess the capacity of the supplier server. For information about SLAMD, and to download the SLAMD software, see http://www.slamd.com.

Each supplier in a multi-master environment must have a replication agreement. The following figure shows two master servers and their replication agreements.

Figure 11–1 Multi-Master Replication Configuration (Two Masters)

Figure shows multi-master replication with two master servers and their replication agreements.

In the preceding figure, Master A and Master B have a master replica of the same data. Each master has a replication agreement that specifies the replication flow. Master A acts as a master in the scope of Replication Agreement 1, and as a consumer in the scope of Replication Agreement 2.

Multi-master replication can be used for the following tasks:

To replicate updates by using the replica ID.

Replicating updates by replica ID makes it possible for a consumer to be updated by multiple suppliers at the same time, provided that the updates originate from different replica IDs.
To enable or disable a replication agreement.

Replication agreements can be configured but left disabled, then enabled rapidly when required. This feature provides flexibility in replication configuration. This can be done whether you use multiple masters or not.

Multi-Master Replication Over WAN

Directory Server supports multi-master replication over a WAN. This feature enables multi-master replication configurations across geographical boundaries in international, multiple data center deployments.

Generally, if the Number of hosts calculated in Assessing Initial Replication Requirements is less than 16, or not significantly larger, your topology should include only master servers in a fully connected topology. That is, every master replicates to every other master in the topology. In a multi-master replication over WAN configuration, all Directory Server instances separated by a WAN must run Directory Server 5.2 or later. For a multi-master topology with more than 4 masters, Directory Server 6.x is required.

The replication protocol provides full asynchronous support, as well as window, grouping, and compression mechanisms. These features make multi-master replication over a WAN viable. Replication data transfer rates will always be less than what the available physical medium allows in terms of bandwidth. If the update volume between replicas cannot physically be made to fit into the available bandwidth, tuning will not prevent replicas from diverging under heavy update load. Replication delay and update performance are dependent on many factors, including but not limited to modification rate, entry size, server hardware, average latency and average bandwidth.

Internal parameters of the replication mechanism are optimized by default for WANs. However, if you experience slow replication due to the factors mentioned above, you may wish to empirically adjust the window size and group size parameters. You may also be able to schedule your replication to avoid peak network times, thus improving your overall network usage. Finally, Directory Server supports the compression of replication data to optimize bandwidth usage.

When you replicate data over a WAN link, some form of security to ensure data integrity and confidentiality is advised. For more information on security methods available in Directory Server, see Chapter 5, Directory Server Security, in Sun Directory Server Enterprise Edition 7.0 Reference.

Group and Window Mechanisms

Directory Server provides group and window mechanisms to optimize replication flow. The group mechanism enables you to specify that changes are sent in groups, rather than individually. The group size represents the maximum number of data modifications that can be bundled into a single update message. If the network connection appears to be the bottleneck for replication, increase the group size and check replication performance again. For information on configuring the group size, see Configuring Group Size in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

The window mechanism specifies that a certain number of update requests are sent to the consumer, without the supplier having to wait for an acknowledgement from the consumer before continuing. The window size represents the maximum number of update messages that can be sent without immediate acknowledgement from the consumer. It is more efficient to send many messages in quick succession instead of waiting for an acknowledgement after each one. Using the appropriate window size, you can eliminate the time replicas spend waiting for replication updates or acknowledgements to arrive. If your consumer replica is lagging behind the supplier, increase the window size to a higher value than the default, such as 100, and check replication performance again before making further adjustments. When the replication update rate is high and the time between updates is therefore small, even replicas connected by a LAN can benefit from a higher window size. For information on configuring the window size, see Configuring Window Size in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

Both the group and window mechanisms are based on change size. Therefore, optimizing replication performance with these mechanisms might be impractical if the size of your changes varies considerably. If the size of your changes is relatively constant, you can use the group and window mechanisms to optimize incremental and total updates.
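A rough way to reason about the two parameters is a back-of-envelope model in which the supplier sends one window of messages per network round trip. The following Python sketch implements that model. It is an assumption-laden simplification, not a description of the replication protocol: it ignores bandwidth, change size, and processing time on both servers, and the function and parameter names are hypothetical.

```python
import math


def estimated_transfer_seconds(changes, group_size, window_size, rtt_seconds):
    """Estimate the time to ship a backlog, assuming one window per round trip."""
    if min(changes, group_size, window_size) < 1 or rtt_seconds < 0:
        raise ValueError("counts must be >= 1 and the round-trip time >= 0")
    # Each update message carries at most group_size changes.
    messages = math.ceil(changes / group_size)
    # At most window_size messages are in flight before an acknowledgement.
    round_trips = math.ceil(messages / window_size)
    return round_trips * rtt_seconds
```

Under this model, a backlog of 10,000 changes over a link with a 100 ms round trip needs 1,000 round trips with a group size of 1 and a window size of 10, but only 10 round trips with a group size of 10 and a window size of 100. Treat such figures as a starting point for empirical testing rather than as a prediction.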

Replication Compression

In addition to the grouping and window mechanisms, you can configure replication compression on Solaris and Linux platforms. Replication compression streamlines replication flow, which substantially reduces the incidence of bottlenecks in replication over a WAN. Compression of replicated data can increase replication performance in specific cases, such as networks with sufficient CPU but low bandwidth, or when there are bulk changes to be replicated. You can also benefit from replication compression when initializing a remote replica with large entries. Do not set this parameter in a LAN (local area network) where there is wide network bandwidth, because the compression and decompression computations will slow down replication.

The replication mechanism uses the Zlib compression library. Empirically test and select the compression level that gives you best results in your WAN environment for your expected replication usage.
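One way to get a first impression of the trade-off between compression levels is to compress a representative payload with zlib outside of Directory Server. The following Python sketch does this with the standard zlib module. The sample payload is LDIF-like text invented for the example and is only a stand-in for real replication traffic, whose wire format differs.

```python
import zlib


def compare_levels(payload: bytes, levels=(1, 6, 9)):
    """Return {level: compressed size in bytes} for the given zlib levels."""
    results = {}
    for level in levels:
        compressed = zlib.compress(payload, level)
        # Sanity check: the data must survive a round trip unchanged.
        assert zlib.decompress(compressed) == payload
        results[level] = len(compressed)
    return results


# Hypothetical sample data: 100 similar modify operations.
SAMPLE = ("dn: uid=user{0},ou=people,dc=example,dc=com\n"
          "changetype: modify\n"
          "replace: telephoneNumber\n"
          "telephoneNumber: +1 555 01{0:02d}\n\n")
payload = "".join(SAMPLE.format(i) for i in range(100)).encode("utf-8")
sizes = compare_levels(payload)
```

Size is only half of the picture. Higher levels cost more CPU time, so measure elapsed time on your own hardware as well before you settle on a level.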

For more information on configuring replication compression, see Configuring Replication Compression in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

Fully Meshed Multi-Master Topology

In a fully meshed multi-master topology, each master is connected to each of the other masters. A fully meshed topology provides high availability and guaranteed data integrity. The following figure shows a fully meshed, four-way, multi-master replication topology with some consumers.

Figure 11–2 Fully Meshed, Four-Way, Multi-Master Replication Configuration

Figure shows a fully meshed, four-way, multi-master replication topology.

In Figure 11–2, the suffix is held on four masters to ensure that it is always available for modification requests. Each master maintains its own change log. When one of the masters processes a modification request from a client, it records the operation in its change log. The master then sends the replication update to the other masters, and in turn to the other consumers. Each master also stores a Replication Manager entry used to authenticate the other masters when they bind to send replication updates.

Each consumer stores one or more entries that correspond to the Replication Manager entries. The consumers use the entries to authenticate the masters when they bind to send replication updates. It is possible for each consumer to have just one Replication Manager entry that enables all masters to use the same Replication Manager entry for authentication. By default, the consumers have referrals set up for all masters in the topology. When consumers receive modification requests from the clients, they send the referrals back to the client. For more information about referrals, see Referrals and Replication in Sun Directory Server Enterprise Edition 7.0 Reference.

Figure 11–3 presents a detailed view of the replication agreements, change logs, and Replication Manager entries that must be set up on Master A. Figure 11–4 provides the same detailed view for Consumer E.

Figure 11–3 Replication Configuration for Master A (Fully Meshed Topology)

Figure shows the replication agreements, change logs, and Replication Manager entries in a fully meshed replication topology.

Figure 11–4 Replication Configuration for Consumer Server E (Fully Meshed Topology)

Figure shows a detailed view of the Replication Manager entries that must be set up on Consumer E in a fully meshed topology.

Master A requires the following:

A master replica
A change log
Replication Manager entries for Masters B, C, and D, unless you use the same Replication Manager entry on each replica
Replication agreements for Masters B, C, and D, and for Consumers E and F

Consumer E requires the following:

A consumer replica
Replication Manager entries to authenticate Masters A and B when they bind to send replication updates
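The agreements implied by a fully meshed topology can be enumerated mechanically. The following Python sketch generates one agreement for every ordered pair of masters, plus the master-to-consumer agreements. The consumer wiring is partly assumed: the text states that Master A supplies Consumers E and F and that Consumer E is supplied by Masters A and B, and the sketch further assumes that Consumer F is also supplied by Master B. The function name is hypothetical.

```python
from itertools import permutations


def full_mesh_agreements(masters, consumer_suppliers):
    """Return (supplier, target) pairs for a fully meshed master topology."""
    # Every master replicates to every other master.
    agreements = list(permutations(masters, 2))
    # Each consumer is fed by the masters listed for it.
    for consumer, suppliers in consumer_suppliers.items():
        agreements.extend((supplier, consumer) for supplier in suppliers)
    return agreements


MASTERS = ["A", "B", "C", "D"]
CONSUMER_SUPPLIERS = {"E": ["A", "B"], "F": ["A", "B"]}  # F -> B is an assumption

agreements = full_mesh_agreements(MASTERS, CONSUMER_SUPPLIERS)
master_a_targets = sorted(t for s, t in agreements if s == "A")
```

For n masters, the mesh alone requires n × (n − 1) agreements, which is 12 for the four masters here. In this sketch, master_a_targets evaluates to B, C, D, E, and F, matching the list of agreements required on Master A.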

Cascading Replication

In a cascading replication configuration, a server acting as a hub receives updates from a server acting as a supplier. The hub replays those updates to consumers. The following figure illustrates a cascading replication configuration.

Figure 11–5 Cascading Replication Configuration

Cascading replication is useful in the following scenarios:

When there are a lot of consumers.

Because the masters in a replication topology handle all update traffic, it could put them under a heavy load to support replication traffic to the consumers. You can off-load replication traffic to several hubs that can each service replication updates to a subset of the consumers.
To reduce connection costs by using a local hub in geographically distributed environments.

The following figure shows cascading replication to a large number of consumers.

Figure 11–6 Cascading Replication to a Large Number of Consumers

Figure shows cascading replication to a large number of consumers with replication traffic through several hubs.

In Figure 11–6, hubs 1 and 2 relay replication updates to consumers 1 through 10, leaving the master replicas with more resources to process directory updates.

The masters and the hubs maintain a change log. However, only the masters can process directory modification requests from clients. The hubs contain a Replication Manager entry for each master that sends updates to them. Consumers 1 through 10 contain Replication Manager entries for hubs 1 and 2.

The consumers and hubs can process search requests received from clients, but cannot process modification requests. The consumers and hubs refer modification requests to the masters.

Prioritized Replication

Prioritized replication can be used when there is a strong business requirement to have tighter consistency for replicated data on specific attributes. In previous versions of Directory Server, updates were replicated in the order in which they were received. With prioritized replication, you can specify that updates to certain attributes take precedence when they are replicated to other servers in the topology.

Priority is a boolean feature: it is either on or off. There are no levels of priority. In a queue of updates waiting to be replicated, updates with priority are replicated before updates without priority.

Priority rules are configured with the following replication priority rule properties:

The identity of the client, bind-dn.
The type of update, op-type.
The entry or subtree that was updated, base-dn.
The attributes changed by the update, attr.

For information about these properties, see repl-priority(5dsconf).

When the master replicates an update to one or more hubs or consumer replicas, the priority of the update is the same across all of the hubs and consumer replicas. If one parameter is configured in a priority rule for prioritized replication, all updates that match that parameter are prioritized for replication. If two or more parameters are configured in a priority rule for prioritized replication, all updates that match all parameters are prioritized for replication.
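The matching semantics described above, where every parameter configured in a rule must match, can be sketched in Python as follows. The dictionary keys, the update structure, and the function name are hypothetical and do not correspond to the dsconf property syntax. The sketch also assumes that an attribute parameter matches when the update changes at least one of the listed attributes.

```python
def matches_rule(update: dict, rule: dict) -> bool:
    """Return True when the update satisfies every parameter set in the rule."""
    if "bind_dn" in rule and update["bind_dn"].lower() != rule["bind_dn"].lower():
        return False
    if "op_type" in rule and update["op_type"] != rule["op_type"]:
        return False
    if "base_dn" in rule:
        base = rule["base_dn"].lower()
        dn = update["dn"].lower()
        # The updated entry must be the base entry or lie beneath it.
        if dn != base and not dn.endswith("," + base):
            return False
    if "attrs" in rule:
        wanted = {a.lower() for a in rule["attrs"]}
        changed = {a.lower() for a in update["attrs"]}
        if not wanted & changed:
            return False
    return True
```

A rule that sets only the operation type prioritizes every update of that type. Adding an attribute list to the same rule narrows it to updates of that type that also touch one of those attributes.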

In the following scenario, it is possible that a master replica attempts to replicate an update to an entry before it has replicated the addition of the entry:

The entry is added on the master replica and then updated on the master replica
The update operation has replication priority but the add operation does not have replication priority

In this scenario, the update operation cannot be replicated until the add operation is replicated. The update waits for its chronological turn, after the add operation, to be replicated.
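The ordering constraint can be illustrated with a simplified queue model. In the following Python sketch, a prioritized operation moves ahead of the queue only if no earlier pending operation targets the same entry. This is an assumption made for illustration, not the server's actual scheduling algorithm. The sketch tracks dependencies per DN only and ignores parent and child relationships between entries.

```python
def replication_order(pending):
    """Order operations for replication.

    pending: list of dicts with 'seq' (arrival order), 'dn', and 'priority'.
    """
    remaining = sorted(pending, key=lambda op: op["seq"])
    ordered = []
    while remaining:
        seen = set()
        eligible = []
        for op in remaining:
            # An operation is blocked while an earlier one on the same entry waits.
            if op["dn"] not in seen:
                eligible.append(op)
            seen.add(op["dn"])
        # Prefer the oldest eligible prioritized operation, else the oldest operation.
        chosen = next((op for op in eligible if op["priority"]), eligible[0])
        ordered.append(chosen)
        remaining.remove(chosen)
    return ordered


QUEUE = [
    {"seq": 1, "dn": "uid=x", "priority": False},  # modify X
    {"seq": 2, "dn": "uid=y", "priority": False},  # add Y
    {"seq": 3, "dn": "uid=y", "priority": True},   # prioritized update to Y
    {"seq": 4, "dn": "uid=z", "priority": True},   # prioritized update to Z
]
order = [op["seq"] for op in replication_order(QUEUE)]
```

In this model, the prioritized update to Z is sent first, while the prioritized update to Y is held back until the add of Y has been sent.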

Prioritized replication provides the following benefits:

Improved security. Prioritized replication is used by default for account lockout. Imagine for example that an employee leaves your organization, and you lock the employee's account. To ensure that the employee cannot log in to a remote server to which the account lockout has not been replicated, account lockout changes are replicated before other changes are replicated.
Improved consistency. Directory Server replication is loosely consistent. With prioritized replication, you can assure stronger consistency for certain attributes that are considered important in your organization.

Fractional Replication

A global topology (with data centers in different countries) might require restricting replication for security or compliance reasons. For example, legal restrictions might state that specific employee information cannot be copied outside of the U.S.A. Or, a site in Australia might require Australian employee details only.

The fractional replication feature enables only a subset of the attributes that are present in an entry to be replicated. Attribute lists are used to determine which attributes can and cannot be replicated. Fractional replication can only be applied to read-only consumers.

Fractional replication can be used to replicate a subset of the attributes of all entries in a suffix or sub-suffix. Fractional replication can be configured, per agreement, to include attributes in the replication or to exclude attributes from the replication. Usually, fractional replication is configured to exclude attributes. The interdependency between features and attributes make managing a list of included attributes difficult.

Fractional replication can be used for the following purposes:

To filter content for synchronization between intranet and extranet servers
To reduce replication costs when a deployment requires only certain attributes to be available everywhere

Fractional replication is configured with the replication agreement properties repl-fractional-include-attr and repl-fractional-exclude-attr. For information about these properties, see repl-agmt(5dsconf). For information about how to configure fractional replication, see Fractional Replication in Sun Directory Server Enterprise Edition 7.0 Administration Guide.
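The effect of an include list or an exclude list on a replicated entry can be sketched as a simple attribute filter. The following Python function is an illustration only. The function name and the entry representation are hypothetical, and the sketch assumes that an agreement uses either an include list or an exclude list, but not both.

```python
def fractional_view(entry: dict, include=None, exclude=None) -> dict:
    """Return the subset of an entry's attributes that would be replicated."""
    if include and exclude:
        raise ValueError("use either an include list or an exclude list, not both")
    if include:
        allowed = {a.lower() for a in include}
        return {k: v for k, v in entry.items() if k.lower() in allowed}
    blocked = {a.lower() for a in (exclude or [])}
    return {k: v for k, v in entry.items() if k.lower() not in blocked}
```

With an exclude list, attributes added to the schema later are replicated unless you list them. With an include list, they are withheld unless you list them. This difference is one reason exclude lists are usually easier to maintain.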

Sample Replication Strategy for an International Enterprise

In this scenario, an enterprise has two major data centers, one in London and the other in New York, separated by a WAN. The scenario assumes that the network is very busy during normal business hours.

In this scenario, the Number of hosts has been calculated to be eight. A fully connected, 4-way multi-master topology is deployed in each of the two data centers. These two topologies are also fully connected to each other. For ease of comprehension, not all replication agreements between the two data centers are shown in the following diagram.

The replication strategy for this scenario includes the following:

Master copies of directory data are held on servers in both data centers.
A multi-master replication topology is deployed between the data centers to provide high availability and write-failover across the deployment.
Replication across the WAN link is scheduled so that it occurs only during off-peak hours to optimize bandwidth.
To increase performance, client applications are directed to local servers. Clients in the U.S. read from and write to masters in the New York data center. Clients in the UK read from and write to masters in the London data center.

Figure 11–7 Using Multi-Master Replication for Load Balancing in Two Data Centers

Figure shows multi-master replication across two data centers, with four masters in each data center.
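Scheduling replication for off-peak hours comes down to testing whether the current time falls inside a window that may cross midnight. The following Python sketch shows that test. The function name and the example window are hypothetical, and the sketch is unrelated to the way replication schedules are configured in Directory Server. Because London and New York are in different time zones, a real window would need to be expressed in a single reference time zone and be off-peak for both sites.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if now lies in [start, end), allowing windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, for example 22:00 to 06:00.
    return now >= start or now < end
```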

Using Directory Proxy Server in a Global Deployment

In a global enterprise, a centralized data model can cause scalability and performance issues. Directory Proxy Server can be used in such a situation to distribute data efficiently and to route search and update requests appropriately.

Sample Distribution Strategy for a Global Enterprise

In the architecture shown here, a large financial institution has its headquarters in London. The organization has data centers in London, New York, and Hong Kong. Currently, the vast majority of the data that is available to employees resides centrally in legacy RDBMS repositories in London. All access to this data from the financial institution’s client community is over the WAN.

The organization is experiencing scalability and performance problems with this centralized model and decides to move to a distributed data model. The organization also decides to deploy an LDAP directory infrastructure at the same time. Because the data in question is considered “mission critical” it must be deployed in a highly available, fault-tolerant infrastructure.

An analysis of client application profiles has revealed that the data is customer-based. Therefore, 95 percent of the data accessed by a geographical client community is specific to that community. Clients in Asia rarely access data for a customer in North America, although this does happen infrequently. The client community must also update customer information from time to time.

The following figure shows the logical architecture of the distributed solution.

Figure 11–8 Distributed Directory Infrastructure

A distributed architecture with Directory Proxy Server

Given the profile of 95 percent local data access, the organization decides to distribute the directory infrastructure geographically. Multiple directory consumers are deployed in each geographical location: Hong Kong, New York, and London. London consumers are not shown in the diagram for ease of understanding. Each of these consumers is configured to hold the customer data specific to the location. Data for European and Middle East customers is held in the London consumers. Data for North and South American customers is held in the New York consumers. Data for Asian and Pacific Rim customers is held in the Hong Kong consumers.

With this deployment, the overwhelming data requirement of the local client community is located in the community. This strategy provides significant performance improvements over the centralized model. Client requests are processed locally, reducing network overhead. The local directory servers effectively partition the directory infrastructure, which provides increased directory server performance and scalability. Each set of consumer directory servers is configured to return referrals if a client submits an update request. Referrals are also returned if a client submits a search request for data that is located elsewhere.

Client LDAP requests are sent to Directory Proxy Server through a hardware load balancer. The hardware load balancer ensures that clients always have access to at least one Directory Proxy Server. The locally deployed Directory Proxy Server initially routes all requests to the array of local directory servers that hold the local customer data. The instances of Directory Proxy Server are configured to load balance across the array of directory servers. This load balancing provides automatic failover and failback.

Client search requests for local customer information are satisfied by a local directory. Appropriate responses are returned to the client through Directory Proxy Server. Client search requests for geographically “foreign” customer information are initially satisfied by the local directory server by returning a referral back to Directory Proxy Server.

This referral contains an LDAP URL that points to the appropriate geographically distributed Directory Proxy Server instance. The local Directory Proxy Server processes the referral on behalf of the local client. The local Directory Proxy Server then sends the search request to the appropriate distributed instance of Directory Proxy Server. The distributed Directory Proxy Server forwards the search request on to the distributed Directory Server and receives the appropriate response. This response is then returned to the local client through the distributed and the local instances of Directory Proxy Server.

Update requests received by the local Directory Proxy Server are also satisfied initially by a referral returned by the local Directory Server. Directory Proxy Server follows the referral on behalf of the local client. However, this time the proxy forwards the update request to the supplier directory server located in London. The supplier Directory Server applies the update to the supplier database and sends a response back to the local client through the local Directory Proxy Server. Subsequently, the supplier Directory Server propagates the update down to the appropriate consumer Directory Server.

Chapter 12 Designing a Highly Available Deployment

High availability implies an agreed minimum “up time” and level of performance for your directory service. Agreed service levels vary from organization to organization. Service levels might depend on factors such as the time of day systems are accessed, whether or not systems can be brought down for maintenance, and the cost of downtime to the organization. Failure, in this context, is defined as anything that prevents the directory service from providing this minimum level of service.

This chapter covers the following topics:

Availability and Single Points of Failure

Directory Server Enterprise Edition deployments that provide high availability can quickly recover from failures. With a high availability deployment, component failures might impact individual directory queries but should not result in complete system failure. A single point of failure (SPOF) is a system component which, upon failure, renders an entire system unavailable or unreliable. When you design a highly available deployment, you identify potential SPOFs and investigate how these SPOFs can be mitigated.

SPOFs can be divided into three categories:

Hardware failures, for example, server crashes, network failures, power failures, or disk drive crashes
Software failures, for example, Directory Server or Directory Proxy Server crashes
Database corruption

Mitigating SPOFs

You can ensure that failure of a single component does not cause an entire directory service to fail by using redundancy. Redundancy involves providing redundant software components, hardware components, or both. Examples of this strategy include deploying multiple, replicated instances of Directory Server on separate hosts, or using redundant arrays of independent disks (RAID) for storage of Directory Server databases. Redundancy with replicated Directory Servers is the most efficient way to achieve high availability.

Advantages and Disadvantages of Redundancy

The more common approach to providing a highly available directory service is to use redundant server components and replication. Redundant solutions are usually less expensive, easier to implement, and easier to manage. Note that replication, as part of a redundant solution, has numerous functions other than availability. While the main advantage of replication is the ability to split the read load across multiple servers, this advantage causes additional overhead in terms of server management. Replication also offers scalability on read operations and, with proper design, scalability on write operations, within certain limits. For an overview of replication concepts, see Chapter 7, Directory Server Replication, in Sun Directory Server Enterprise Edition 7.0 Reference.

During a failure, a redundant system might provide poor availability. Imagine, for example, an environment in which the load is shared between two redundant server components. The failure of one server component might put an excessive load on the other server, making this server respond more slowly to client requests. A slow response might be considered a failure for clients that rely on quick response times. In other words, the availability of the service, even though the service is operational, might not meet the availability requirements of the client.
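The two-server example can be made concrete with a short calculation. The following Python sketch assumes that load is split evenly across the surviving servers; the request rate and the per-server capacity are hypothetical figures chosen only for illustration.

```python
def per_server_load(total_load, servers_up):
    """Return the load carried by each surviving server, assuming an even split."""
    if servers_up < 1:
        raise ValueError("no servers available")
    return total_load / servers_up


def meets_capacity(total_load, servers_up, capacity_per_server):
    """Return True if each surviving server stays within its sized capacity."""
    return per_server_load(total_load, servers_up) <= capacity_per_server


# Hypothetical figures: 1500 requests per second in total,
# with each server sized for 1000 requests per second.
normal_ok = meets_capacity(1500, 2, 1000)    # 750 per server
degraded_ok = meets_capacity(1500, 1, 1000)  # 1500 on the single survivor
```

In this sketch the service stays up after one failure, but the surviving server is asked to carry more than it was sized for, which is the situation in which response times might breach the agreed service level.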

How Redundancy Handles SPOFs

In terms of the SPOFs that are described at the beginning of this chapter, redundancy handles failure in the following ways:

Single hardware failure. A single hardware failure is fatal to a machine. Therefore, even if you have redundant hardware, manual intervention is required to repair the failure.
Directory Server or Directory Proxy Server failure. The server is automatically restarted.
Database corruption. Depending on the architecture, a redundant solution should be able to survive database corruption.

Redundancy at the Hardware Level

This section provides basic information about hardware redundancy. Many publications provide comprehensive information about using hardware redundancy for high availability. In particular, see “Blueprints for High Availability” published by John Wiley & Sons, Inc.

Hardware SPOFs can be broadly categorized as follows:

Network failures
Failure of the physical servers on which Directory Server or Directory Proxy Server are running
Load balancer failures
Storage subsystem failures
Power supply failures

Failure at the network level can be mitigated by having redundant network components. When designing your deployment, consider having redundant components for the following:

Internet connection
Network interface card
Network cabling
Network switches
Gateways and routers

You can mitigate the load balancer as an SPOF by including a redundant load balancer in your architecture.

In the event of database corruption, you must have a database failover strategy to ensure availability. You can mitigate against SPOFs in the storage subsystem by using redundant server controllers. You can also use redundant cabling between controllers and storage subsystems, redundant storage subsystem controllers, or redundant arrays of independent disks.

If you have only one power supply, loss of this supply could make your entire service unavailable. To prevent this situation, consider providing redundant power supplies for hardware, where possible, and diversifying power sources. Additional methods of mitigating SPOFs in the power supply include using surge protectors, multiple power providers, and local battery backups, and generating power locally.

Failure of an entire data center can occur if, for example, a natural disaster strikes a particular geographic region. In this instance, a well-designed multiple data center replication topology can prevent an entire distributed directory service from becoming unavailable. For more information, see Using Replication and Redundancy for High Availability.

Redundancy at the Software Level

Failure in Directory Server or Directory Proxy Server can include the following:

Excessive response time
Write overload
- Maximized file descriptors
- Maximized file system
- Poor storage configuration
- Too many indexes
Read overload
Cache issues
CPU constraints
Replication issues
- Synchronicity
- Replication propagation delay
- Replication flow
- Replication overload
Large wildcard searches

These SPOFs can be mitigated by having redundant instances of Directory Server and Directory Proxy Server. Redundancy at the software level involves the use of replication. Replication ensures that the redundant servers remain synchronized, and that requests can be rerouted with no downtime. For more information, see Using Replication and Redundancy for High Availability.

Using Replication and Redundancy for High Availability

Replication can be used to prevent the loss of a single server from causing your directory service to become unavailable. A reliable replication topology ensures that the most recent data is available to clients across data centers, even in the case of a server failure. At a minimum, your local directory tree needs to be replicated to at least one backup server. Some directory architects say that you should replicate three times per physical location for maximum data reliability. In deciding how much to use replication for fault tolerance, consider the quality of the hardware and networks used by your directory. Unreliable hardware requires more backup servers.
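As a rough way to reason about how many replicas to deploy, the following Python sketch estimates the probability that at least one replica is available. It assumes that replicas fail independently, which shared hardware, networks, or power rarely guarantee in practice, and the 99% single-server figure is hypothetical.

```python
def combined_availability(single_server_availability, replicas):
    """Probability that at least one of several independent replicas is up."""
    if replicas < 1:
        raise ValueError("at least one replica is required")
    return 1 - (1 - single_server_availability) ** replicas


# Hypothetical single-server availability of 99%.
for count in (1, 2, 3):
    print(count, round(combined_availability(0.99, count), 6))
```

Under these assumptions, less reliable servers need more replicas to reach the same target, which matches the guidance above.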

Do not use replication as a replacement for a regular data backup policy. For information about backing up directory data, see Designing Backup and Restore Policies and Chapter 8, Directory Server Backup and Restore, in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

LDAP client applications are usually configured to search one LDAP server only. Custom client applications can be written to rotate through LDAP servers that are located at different DNS host names. Otherwise, LDAP client applications can only be configured to look at a single DNS host name for Directory Server. You can use Directory Proxy Server, DNS round robins, or network sorts to provide failover to backup Directory Servers. For information about setting up and using DNS round robins or network sorts, see your DNS documentation. For information about how Directory Proxy Server is used in this context, see Using Directory Proxy Server as Part of a Redundant Solution.
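A custom client that rotates through several host names might follow the pattern below. This is a minimal Python sketch: the host names are hypothetical, and the connect argument stands in for whatever call the LDAP client library in use provides to open a connection.

```python
def connect_with_failover(hosts, connect):
    """Try each host in order and return (host, connection) for the first success."""
    failed = []
    for host in hosts:
        try:
            return host, connect(host)
        except OSError:
            # Remember the failure and move on to the next host.
            failed.append(host)
    raise ConnectionError("no LDAP server reachable: " + ", ".join(failed))


def fake_connect(host):
    """Stand-in for a real connection call; pretends ds1 is down."""
    if host == "ds1.example.com":
        raise OSError("connection refused")
    return "connection to " + host


host, connection = connect_with_failover(
    ["ds1.example.com", "ds2.example.com"], fake_connect
)
```

Passing the connection function in as a parameter keeps the rotation logic independent of any particular LDAP library.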

To maintain the ability to read data in the directory, a suitable load balancing strategy must be put in place. Both software and hardware load balancing solutions exist to distribute read load across multiple replicas. Each of these solutions can also determine the state of each replica and manage its participation in the load balancing topology. The solutions might vary in terms of completeness and accuracy.

To maintain write failover over geographically distributed sites, you can use multiple data center replication over WAN. This entails setting up at least two master servers in each data center, and configuring the servers to be fully meshed over the WAN. This strategy prevents loss of service if any of the masters in the topology fail. Write operations must be routed to an alternative server if a writable server becomes unavailable. Various methods can be used to reroute write operations, including Directory Proxy Server.
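To get a feel for the size of a fully meshed topology, the following Python sketch enumerates the replication agreements for two data centers with two masters each. It assumes that each agreement is unidirectional, from one supplier to one consumer; the master names are hypothetical.

```python
from itertools import permutations


def full_mesh_agreements(masters):
    """Return every (supplier, consumer) pair in a fully meshed topology."""
    return list(permutations(masters, 2))


def writable_masters(masters, failed):
    """Return the masters that can still accept writes after some have failed."""
    return [m for m in masters if m not in failed]


masters = ["dc1-master1", "dc1-master2", "dc2-master1", "dc2-master2"]
agreements = full_mesh_agreements(masters)
remaining = writable_masters(masters, {"dc1-master1"})
```

With four masters, the sketch yields twelve agreements, and any single failure still leaves three writable masters that replicate directly to one another.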

The following sections describe how replication and redundancy are used to ensure high availability:

Using Redundant Replication Agreements

Redundant replication agreements enable rapid recovery in the event of failure. The ability to enable and disable replication agreements means that you can set up replication agreements that are used only if the original replication topology fails. Although this intervention is manual, the strategy is much less time consuming than waiting to set up the replication agreement when it is needed. The use of redundant replication agreements is explained and illustrated in Sample Topologies Using Redundancy for High Availability.

Promoting and Demoting Replicas

Promoting or demoting a replica changes its role in the replication topology. In a very large topology that contains dedicated consumers and hubs, online promotion and demotion of replicas can form part of a high availability strategy. Imagine, for example, a multi-master replication scenario, with two hubs configured for additional load balancing and failover. If one master goes offline, you can promote one of the hubs to a master to maintain optimal read-write availability. When the master replica comes back online, a simple demotion back to a hub replica returns you to the original topology.

For more information, see Promoting or Demoting Replicas in Sun Directory Server Enterprise Edition 7.0 Administration Guide.

Using Directory Proxy Server as Part of a Redundant Solution

Directory Proxy Server is designed to support high availability directory deployments. The proxy provides automatic load balancing as well as automatic failover and failback among a set of replicated Directory Servers. Should one or more Directory Servers in the topology become unavailable, the load is proportionally redistributed among the remaining servers.
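Proportional redistribution can be illustrated with a short Python sketch. This is not the algorithm that Directory Proxy Server implements; it only shows the idea of keeping the relative weights of the surviving servers. The server names and weights are hypothetical.

```python
def redistribute(weights, down):
    """Return each live server's share of traffic, preserving relative weights."""
    live = {name: weight for name, weight in weights.items() if name not in down}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no Directory Server available")
    return {name: weight / total for name, weight in live.items()}


# Hypothetical weights: ds1 is sized to take twice the load of ds2 or ds3.
weights = {"ds1": 2, "ds2": 1, "ds3": 1}
normal_shares = redistribute(weights, down=set())
degraded_shares = redistribute(weights, down={"ds3"})
```

When ds3 is removed, ds1 still receives twice the share of ds2, so the load that ds3 carried is spread in proportion to the configured weights.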

Directory Proxy Server actively monitors the Directory Servers to ensure that the servers are still online. The proxy also examines the status of each operation that is performed. Servers might not all be equivalent in throughput and performance. If a primary server becomes unavailable, traffic that is temporarily redirected to a secondary server is directed back to the primary server as soon as the primary server becomes available.
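The failover and failback behavior can be pictured as choosing the highest-priority server that is currently up, every time the health information changes. The following Python sketch assumes a simple ordered list of servers and a hypothetical series of health-check results; it does not reflect how Directory Proxy Server performs its monitoring.

```python
def choose_server(servers_by_priority, is_up):
    """Return the first server in priority order that is reported as up."""
    for server in servers_by_priority:
        if is_up(server):
            return server
    return None


servers = ["primary", "secondary"]

# Hypothetical health-check results over three monitoring intervals.
health_timeline = [
    {"primary": True, "secondary": True},
    {"primary": False, "secondary": True},
    {"primary": True, "secondary": True},
]

chosen = [choose_server(servers, health.get) for health in health_timeline]
```

Because the choice is re-evaluated at each interval, traffic returns to the primary server as soon as it is reported as available again.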

Note that when data is distributed, multiple disconnected replication topologies must be managed, which makes administration more complex. In addition, Directory Proxy Server relies heavily on the proxy authorization control to manage user authorization. A specific administrative user must be created on each Directory Server that is involved in the distribution. These administrative users must be granted proxy access control rights.

Using Application Isolation for High Availability

Directory Proxy Server can also be used to protect a replicated directory service from failure due to a faulty client application. To improve availability, a limited set of masters or replicas is assigned to each application.

Suppose a faulty application causes a server shutdown when the application performs a specific action. If the application fails over to each successive replica, a single problem with one application can result in failure of the entire replicated topology. To avoid such a scenario, you can restrict failover and load balancing of each application to a limited number of replicas. The potential failure is then limited to this set of replicas, and the impact of the failure on other applications is reduced.
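The effect of restricting each application to a limited set of replicas can be simulated with the following Python sketch. The application names, master names, and the mapping between them are hypothetical, and the faulty application is assumed to bring down every server that it reaches.

```python
# Hypothetical mapping of applications to the servers they may use.
ALLOWED_SERVERS = {
    "app1": ["master1", "master3"],
    "app2": ["master2", "master4"],
}


def route(app, servers_up):
    """Return the first allowed server that is up, or None if none is left."""
    for server in ALLOWED_SERVERS[app]:
        if server in servers_up:
            return server
    return None


def simulate_faulty_app(app, servers):
    """Let a faulty application fail over until no allowed server remains."""
    servers_up = set(servers)
    target = route(app, servers_up)
    while target is not None:
        servers_up.discard(target)  # the bug takes this server down
        target = route(app, servers_up)
    return servers_up


survivors = simulate_faulty_app(
    "app1", ["master1", "master2", "master3", "master4"]
)
```

In this sketch the faulty application exhausts only its own two servers, and the other application can still be routed to a working master.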

Sample Topologies Using Redundancy for High Availability

The following sample topologies show how redundancy is used to provide continued service in the event of failure.

Using Replication for Availability in a Single Data Center

The data center that is illustrated in the following figure has a multi-master topology with three masters. In this scenario, the third master is used only for availability in case of failure. Read and write operations are routed to Masters 1 and 2 by Directory Proxy Server, unless a problem occurs. To speed up recovery and to minimize the number of replication agreements, recovery replication agreements are created. These agreements are disabled by default but can be enabled rapidly in the event of a failure.

Figure 12–1 Multi-Master Replication in a Single Data Center

Figure shows a single data center, with three master Directory Servers and a Directory Proxy Server.

Single Data Center Failure Matrix

In the scenario depicted in Figure 12–1, various components might become unavailable. These potential points of failure and the related recovery actions are described in this table.

Table 12–1 Single Data Center Failure Matrix


Failed Component	Action
Master 1	Read and write operations are rerouted to Masters 2 and 3 through Directory Proxy Server while Master 1 is repaired. The recovery replication agreement between Master 2 and Master 3 is enabled so that updates to Master 3 are replicated to Master 2.
Master 2	Read and write operations are rerouted to Masters 1 and 3 while Master 2 is repaired. The recovery replication agreement between Master 1 and Master 3 is enabled so that updates to Master 3 are replicated to Master 1.
Master 3	Because Master 3 is a backup server only, the directory service is not affected if this master fails. Master 3 can be taken offline and repaired without interruption to service.
Directory Proxy Server	Failure of Directory Proxy Server results in severe service interruption. A redundant instance of Directory Proxy Server is advisable in this topology. For an example of such a topology, see Using Multiple Directory Proxy Servers.
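The master rows of the failure matrix can also be expressed as a lookup table, for example as the starting point for an operations runbook. The following Python sketch encodes only what Table 12–1 states; the dictionary layout and the function name are hypothetical.

```python
# Rows for the three masters, taken from the failure matrix above.
FAILURE_MATRIX = {
    "Master 1": {
        "reroute_to": ("Master 2", "Master 3"),
        "enable_agreement": ("Master 2", "Master 3"),
    },
    "Master 2": {
        "reroute_to": ("Master 1", "Master 3"),
        "enable_agreement": ("Master 1", "Master 3"),
    },
    # Master 3 is a backup only, so no rerouting is needed.
    "Master 3": {"reroute_to": None, "enable_agreement": None},
}


def recovery_actions(failed):
    """Return the ordered list of actions for a failed master."""
    row = FAILURE_MATRIX[failed]
    actions = []
    if row["reroute_to"]:
        actions.append(
            "reroute reads and writes to " + " and ".join(row["reroute_to"])
        )
    if row["enable_agreement"]:
        actions.append(
            "enable recovery agreement between "
            + " and ".join(row["enable_agreement"])
        )
    actions.append("repair " + failed)
    return actions


master1_actions = recovery_actions("Master 1")
```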

Single Data Center Recovery Procedure

In a single data center with three masters, read and write capability is maintained if one master fails. This section describes a sample recovery strategy that can be applied to reinstate the failed component.

The following flowchart and procedure assume that one component, Master 1, has failed. If two masters fail simultaneously, read and write operations must be routed to the remaining master while the problems are fixed.

Figure 12–2 Single Data Center Sample Recovery Procedure

Flowchart showing recovery procedure if one component fails.

To Recover on Failure of One Component

1. If Master 1 is not already stopped, stop it.

2. Identify the cause of the failure.
   - If the failure is easily repaired, by replacing a network cable, for example, make the repair and go to Step 3.
   - If the problem is more serious, the failure might take more time to fix.
     1. Ensure that any applications that access Master 1 are redirected to point to Master 2 or Master 3, through Directory Proxy Server.
     2. Check the availability of a recent backup.
        - If a recent backup is available, reinitialize Master 1 from the backup and go to Step 3.
        - If a recent backup is not available, do one of the following:
          - Restart Master 1 and perform a total initialization from Master 2 or from Master 3 to Master 1.

            For details on this procedure, see Initializing Replicas in Sun Directory Server Enterprise Edition 7.0 Administration Guide.
          - If performing a total initialization will take too long, perform an online export from Master 2, or Master 3, and an import to Master 1.

3. Start Master 1, if it is not already started.

4. If Master 1 is in read-only mode, set it to read/write mode.

5. Check that replication is functioning correctly.

   You can use DSCC, dsccmon view-suffixes, or the insync command to check replication.

   For more information, see Getting Replication Status in Sun Directory Server Enterprise Edition 7.0 Administration Guide, dsccmon(1M), and insync(1).
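The decision points in the recovery procedure can be summarized as a small function. The following Python sketch only restates the procedure above as an ordered list of actions; the function name and its three boolean parameters are hypothetical, and the sketch does not run any Directory Server command.

```python
def recovery_steps(easily_repaired, recent_backup, total_init_too_slow):
    """Return the ordered recovery actions for a failed Master 1."""
    steps = [
        "Stop Master 1 if it is not already stopped",
        "Identify the cause of the failure",
    ]
    if easily_repaired:
        steps.append("Make the repair")
    else:
        steps.append(
            "Redirect applications to Master 2 or Master 3 "
            "through Directory Proxy Server"
        )
        if recent_backup:
            steps.append("Reinitialize Master 1 from the backup")
        elif total_init_too_slow:
            steps.append(
                "Export online from Master 2 or Master 3 and import to Master 1"
            )
        else:
            steps.append(
                "Restart Master 1 and perform a total initialization "
                "from Master 2 or Master 3"
            )
    steps += [
        "Start Master 1 if it is not already started",
        "Set Master 1 to read/write mode if it is in read-only mode",
        "Check that replication is functioning correctly",
    ]
    return steps


simple_case = recovery_steps(True, False, False)
no_backup_case = recovery_steps(False, False, True)
```

Whichever branch is taken, the last three actions are the same: start the master, restore read/write mode, and verify replication.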

Using Replication for Availability Across Two Data Centers

Generally in a deployment with two data centers, the same recovery strategy can be applied as described for a single data center. If one or more masters become unavailable, Directory Proxy Server automatically reroutes local reads and writes to the remaining masters.

As in the single data center scenario described previously, recovery replication agreements can be enabled. These agreements ensure that both data centers continue to receive replicated updates in the event of failure. This recovery strategy is illustrated in Figure 12–3.

An alternative to using recovery replication agreements is to use a fully meshed topology in which every master replicates its changes to every other master. While fewer replication agreements might be easier to manage, no technical reason exists for not using a fully meshed topology.

The only SPOF in this scenario would be the Directory Proxy Server in each data center. Redundant Directory Proxy Servers can be deployed to eliminate this problem, as shown in Figure 12–4.

Figure 12–3 Recovery Replication Agreements For Two Data Centers

Multi-master replication topology in two data centers showing redundant recovery replication agreements.

The recovery strategy depends on which combination of components fails. However, after you have a basic strategy in place to cope with multiple failures, you can apply that strategy if other components fail.

In the sample topology depicted in Figure 12–3, assume that Master 1 and Master 3 in the New York data center fail.

In this scenario, Directory Proxy Server automatically reroutes reads and writes in the New York data center to Master 2 and Master 4. This ensures that local read and write capability is maintained at the New York site.

Using Multiple Directory Proxy Servers

The deployment shown in the following figure includes an enterprise firewall that rejects outside access to internal LDAP services. Client LDAP requests that are initiated internally go through Directory Proxy Server by way of a network load balancer, ensuring high availability at the IP level. Direct access to the Directory Servers is prevented, except for the host that is running Directory Proxy Server. Two Directory Proxy Servers are deployed to prevent the proxy from becoming an SPOF.

A fully meshed multi-master topology ensures that all masters can be used at any time in the event of failure of any other master. For simplicity, not all replication agreements are shown in this diagram.

Figure 12–4 Internal High Availability Configuration

A highly available architecture with four Directory Server replicas and two Directory Proxy Servers.

Using Application Isolation

In the scenario illustrated in the following figure, a bug in Application 1 causes Directory Server to fail. The proxy configuration ensures that LDAP requests from Application 1 are only ever sent to Master 1 and to Master 3. When the bug occurs, Masters 1 and 3 fail. However, Applications 2, 3, and 4 are not disabled, because they can still reach a functioning Directory Server.

Figure 12–5 Using Application Isolation in a Scaled Deployment

Figure shows Directory Proxy Server balancing requests based on client application.