Sun Java System Directory Server Enterprise Edition 6.3 Deployment Planning Guide

Using Replication and Redundancy for High Availability

Replication can be used to prevent the loss of a single server from causing your directory service to become unavailable. A reliable replication topology ensures that the most recent data is available to clients across data centers, even in the case of a server failure. At a minimum, your local directory tree needs to be replicated to at least one backup server. Some directory architects say that you should replicate three times per physical location for maximum data reliability. In deciding how much to use replication for fault tolerance, consider the quality of the hardware and networks used by your directory. Unreliable hardware requires more backup servers.

Do not use replication as a replacement for a regular data backup policy. For information about backing up directory data, see Designing Backup and Restore Policies and Chapter 9, Directory Server Backup and Restore, in Sun Java System Directory Server Enterprise Edition 6.3 Administration Guide.

LDAP client applications are usually configured to search one LDAP server only. Custom client applications can be written to rotate through LDAP servers that are located at different DNS host names. Otherwise, LDAP client applications can only be configured to look at a single DNS host name for Directory Server. You can use Directory Proxy Server, DNS round robins, or network sorts to provide failover to backup Directory Servers. For information about setting up and using DNS round robins or network sorts, see your DNS documentation. For information about how Directory Proxy Server is used in this context, see Using Directory Proxy Server as Part of a Redundant Solution.

To maintain the ability to read data in the directory, a suitable load balancing strategy must be put in place. Both software and hardware load balancing solutions exist to distribute read load across multiple replicas. Each of these solutions can also determine the state of each replica and to manage its participation in the load balancing topology. The solutions might vary in terms of completeness and accuracy.

To maintain write failover over geographically distributed sites, you can use multiple data center replication over WAN. This entails setting up at least two master servers in each data center, and configuring the servers to be fully meshed over the WAN. This strategy prevents loss of service if any of the masters in the topology fail. Write operations must be routed to an alternative server if a writable server becomes unavailable. Various methods can be used to reroute write operations, including Directory Proxy Server.

The following sections describe how replication and redundancy are used to ensure high availability:

Using Redundant Replication Agreements

Redundant replication agreements enable rapid recovery in the event of failure. The ability to enable and disable replication agreements means that you can set up replication agreements that are used only if the original replication topology fails. Although this intervention is manual, the strategy is much less time consuming than waiting to set up the replication agreement when it is needed. The use of redundant replication agreements is explained and illustrated in Sample Topologies Using Redundancy for High Availability.

Promoting and Demoting Replicas

Promoting or demoting a replica changes its role in the replication topology. In a very large topology that contains dedicated consumers and hubs, online promotion and demotion of replicas can form part of a high availability strategy. Imagine, for example, a multi-master replication scenario, with two hubs configured for additional load balancing and failover. If one master goes offline, you can promote one of the hubs to a master to maintain optimal read-write availability. When the master replica comes back online, a simple demotion back to a hub replica returns you to the original topology.

For more information, see Promoting or Demoting Replicas in Sun Java System Directory Server Enterprise Edition 6.3 Administration Guide.

Using Directory Proxy Server as Part of a Redundant Solution

Directory Proxy Server is designed to support high availability directory deployments. The proxy provides automatic load balancing as well as automatic failover and fail back among a set of replicated Directory Servers. Should one or more Directory Servers in the topology become unavailable, the load is proportionally redistributed among the remaining servers.

Directory Proxy Server actively monitors the Directory Servers to ensure that the servers are still online. The proxy also examines the status of each operation that is performed. Servers might not all be equivalent in throughput and performance. If a primary server becomes unavailable, traffic that is temporarily redirected to a secondary server is directed back to the primary server as soon as the primary server becomes available.

Note that when data is distributed, multiple disconnected replication topologies must be managed, which makes administration more complex. In addition, Directory Proxy Server relies heavily on the proxy authorization control to manage user authorization. A specific administrative user must be created on each Directory Server that is involved in the distribution. These administrative users must be granted proxy access control rights.

Using Application Isolation for High Availability

Directory Proxy Server can also be used to protect a replicated directory service from failure due to a faulty client application. To improve availability, a limited set of masters or replicas is assigned to each application.

Suppose a faulty application causes a server shutdown when the application performs a specific action. If the application fails over to each successive replica, a single problem with one application can result in failure of the entire replicated topology. To avoid such a scenario, you can restrict failover and load balancing of each application to a limited number of replicas. The potential failure is then limited to this set of replicas, and the impact of the failure on other applications is reduced.

Sample Topologies Using Redundancy for High Availability

The following sample topologies show how redundancy is used to provide continued service in the event of failure.

Using Replication for Availability in a Single Data Center

The data center that is illustrated in the following figure has a multi-master topology with three masters. In this scenario, the third master is used only for availability in thdse event of failure. Read and write operations are routed to Masters 1 and 2 by Directory Proxy Server, unless a problem occurs. To speed up recovery and to minimize the number of replication agreements, recovery replication agreements are created. These agreements are disabled by default but can be enabled rapidly in the event of a failure.

Figure 12–1 Multi-Master Replication in a Single Data Center

Figure shows a single data center, with three master
Directory Servers and a Directory Proxy Server

Single Data Center Failure Matrix

In the scenario depicted in Figure 12–1, various components might become unavailable. These potential points of failure and the related recovery actions are described in this table.

Table 12–1 Single Data Center Failure Matrix


Failed Component	Action
Master 1	Read and write operations are rerouted to Masters 2 and 3 through Directory Proxy Server while Master 1 is repaired. The recovery replication agreement between Master 2 and Master 3 is enabled so that updates to Master 3 are replicated to Master 2.
Master 2	Read and write operations are rerouted to Masters 1 and 3 while Master 2 is repaired. The recovery replication agreement between Master 1 and Master 3 is enabled so that updates to Master 3 are replicated to Master 1.
Master 3	Because Master 3 is a backup server only, the directory service is not affected if this master fails. Master 3 can be taken offline and repaired without interruption to service.
Directory Proxy Server	Failure of Directory Proxy Server results in severe service interruption. A redundant instance of Directory Proxy Server is advisable in this topology. For an example of such a topology, see Using Multiple Directory Proxy Servers.

Single Data Center Recovery Procedure

In a single data center with three masters, read and write capability is maintained if one master fails. This section describes a sample recovery strategy that can be applied to reinstate the failed component.

The following flowchart and procedure assume that one component, Master 1, has failed. If two masters fail simultaneously, read and write operations must be routed to the remaining master while the problems are fixed.

Figure 12–2 Single Data Center Sample Recovery Procedure

Flowchart showing recovery procedure if one component
fails.

To Recover on Failure of One Component

If Master 1 is not already stopped, stop it.

Identify the cause of the failure.
- If the failure is easily repaired, by replacing a network cable, for example, make the repair and go to Step 3.
- If the problem is more serious, the failure might take more time to fix.
1. Ensure that any applications that access Master 1 are redirected to point to Master 2 or Master 3, through Directory Proxy Server.
2. Check the availability of a recent backup.
  - If a recent backup is available, reinitialize Master 1 from the backup and go to Step 3.
  - If a recent backup is not available, do one of the following:
    - Restart Master 1 and perform a total initialization from Master 2 or from Master 3 to Master 1.
      
      For details on this procedure, see Initializing Replicas in Sun Java System Directory Server Enterprise Edition 6.3 Administration Guide.
    - If performing a total initialization will take too long, perform an online export from Master 2, or Master 3, and an import to Master 1.

Start Master 1, if it is not already started.

If Master 1 is in read-only mode, set it to read/write mode.

Check that replication is functioning correctly.

You can use DSCC, dsccmon view-suffixes, or the insync command to check replication.

For more information, see Getting Replication Status in Sun Java System Directory Server Enterprise Edition 6.3 Administration Guide, dsccmon(1M), and insync(1).

Using Replication for Availability Across Two Data Centers

Generally in a deployment with two data centers, the same recovery strategy can be applied as described for a single data center. If one or more masters become unavailable, Directory Proxy Server automatically reroutes local reads and writes to the remaining masters.

As in the single data center scenario described previously, recovery replication agreements can be enabled. These agreements ensure that both data centers continue to receive replicated updates in the event of failure. This recovery strategy is illustrated in Figure 12–3.

An alternative to using recovery replication agreements is to use a fully meshed topology in which every master replicates its changes to every other master. While fewer replication agreements might be easier to manage, no technical reason exists for not using a fully meshed topology.

The only SPOF in this scenario would be the Directory Proxy Server in each data center. Redundant Directory Proxy Servers can be deployed to eliminate this problem, as shown in Figure 12–4.

Figure 12–3 Recovery Replication Agreements For Two Data Centers

Multi-master replication topology in two data centers
showing redundant recovery replication agreements

The recovery strategy depends on which combination of components fails. However, after you have a basic strategy in place to cope with multiple failures, you can apply that strategy if other components fail.

In the sample topology depicted in Figure 12–3, assume that Master 1 and Master 3 in the New York data center fail.

In this scenario, Directory Proxy Server automatically reroutes reads and writes in the New York data center to Master 2 and Master 4. This ensures that local read and write capability is maintained at the New York site.

Using Multiple Directory Proxy Servers

The deployment shown in the following figure includes an enterprise firewall that rejects outside access to internal LDAP services. Client LDAP requests that are initiated internally go through Directory Proxy Server by way of a network load balancer, ensuring high availability at the IP level. Direct access to the Directory Servers is prevented, except for the host that is running Directory Proxy Server. Two Directory Proxy Servers are deployed to prevent the proxy from becoming an SPOF.

A fully meshed multi-master topology ensures that all masters can be used at any time in the event of failure of any other master. For simplicity, not all replication agreements are shown in this diagram.

Figure 12–4 Internal High Availability Configuration

A highly available architecture with four Directory Server replicas
and two Directory Proxy Servers

Using Application Isolation

In the scenario illustrated in the following figure a bug in Application 1 causes Directory Server to fail. The proxy configuration ensures that LDAP requests from Application 1 are only ever sent to Master 1 and to Master 3. When the bug occurs, Masters 1 and 3 fail. However, Applications 2, 3, and 4 are not disabled, because they can still reach a functioning Directory Server.

Figure 12–5 Using Application Isolation in a Scaled Deployment

Figure shows Directory Proxy Server balancing requests based
on client application.