The following sample topologies show how redundancy is used to provide continued service in the event of failure.
The data center that is illustrated in the following figure has a multi-master topology with three masters. In this scenario, the third master is used only for availability in case of failure. Read and write operations are routed to Masters 1 and 2 by Directory Proxy Server, unless a problem occurs. To speed up recovery and to minimize the number of replication agreements, recovery replication agreements are created. These agreements are disabled by default but can be enabled rapidly in the event of a failure.
In the scenario depicted in Figure 12–1, various components might become unavailable. The following table describes these potential points of failure and the corresponding recovery actions.
Table 12–1 Single Data Center Failure Matrix
| Failed Component | Action |
|---|---|
| Master 1 | Read and write operations are rerouted to Masters 2 and 3 through Directory Proxy Server while Master 1 is repaired. The recovery replication agreement between Master 2 and Master 3 is enabled so that updates to Master 3 are replicated to Master 2. |
| Master 2 | Read and write operations are rerouted to Masters 1 and 3 while Master 2 is repaired. The recovery replication agreement between Master 1 and Master 3 is enabled so that updates to Master 3 are replicated to Master 1. |
| Master 3 | Because Master 3 is a backup server only, the directory service is not affected if this master fails. Master 3 can be taken offline and repaired without interruption to service. |
| Directory Proxy Server | Failure of Directory Proxy Server results in severe service interruption. A redundant instance of Directory Proxy Server is advisable in this topology. For an example of such a topology, see Using Multiple Directory Proxy Servers. |
In a single data center with three masters, read and write capability is maintained if one master fails. This section describes a sample recovery strategy that can be applied to reinstate the failed component.
The following flowchart and procedure assume that one component, Master 1, has failed. If two masters fail simultaneously, read and write operations must be routed to the remaining master while the problems are fixed.
1. If Master 1 is not already stopped, stop it.
2. Identify the cause of the failure.
   - If the failure is easily repaired, for example by replacing a network cable, make the repair and go to Step 3.
   - If the problem is more serious, the failure might take more time to fix. In that case:
     - Ensure that any applications that access Master 1 are redirected to point to Master 2 or Master 3, through Directory Proxy Server.
     - Check the availability of a recent backup.
       - If a recent backup is available, reinitialize Master 1 from the backup and go to Step 3.
       - If a recent backup is not available, do one of the following:
         - Restart Master 1 and perform a total initialization from Master 2 or from Master 3 to Master 1. For details on this procedure, see Initializing Replicas in Sun Directory Server Enterprise Edition 7.0 Administration Guide.
         - If performing a total initialization will take too long, perform an online export from Master 2 or Master 3, and an import to Master 1.
3. Start Master 1, if it is not already started.
4. If Master 1 is in read-only mode, set it to read/write mode.
5. Check that replication is functioning correctly.

   You can use DSCC, dsccmon view-suffixes, or the insync command to check replication. For more information, see Getting Replication Status in Sun Directory Server Enterprise Edition 7.0 Administration Guide, dsccmon(1M), and insync(1).
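The branching in the procedure above can be summarized as a checklist generator. This is a minimal sketch for planning or runbook purposes only. The `Diagnosis` fields are hypothetical inputs that an administrator would determine by hand, and the returned strings are descriptions of the documented steps, not commands.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Hypothetical facts gathered while working through the procedure."""
    easily_repaired: bool        # for example, a replaced network cable
    recent_backup: bool          # a recent backup of Master 1 exists
    total_init_too_slow: bool    # total initialization would take too long
    read_only: bool = False      # Master 1 comes back in read-only mode


def recovery_steps(d: Diagnosis) -> list[str]:
    """Return the ordered recovery steps for a failed Master 1."""
    steps = ["stop Master 1 if it is still running", "identify the cause of the failure"]
    if d.easily_repaired:
        steps.append("make the repair")
    else:
        steps.append("redirect applications to Masters 2 and 3 through Directory Proxy Server")
        if d.recent_backup:
            steps.append("reinitialize Master 1 from the backup")
        elif d.total_init_too_slow:
            steps.append("online export from Master 2 or 3, then import to Master 1")
        else:
            steps.append("restart Master 1 and run a total initialization from Master 2 or 3")
    # Step 3 onward applies to every path.
    steps.append("start Master 1 if it is not already started")
    if d.read_only:
        steps.append("set Master 1 to read/write mode")
    steps.append("check replication (DSCC, dsccmon view-suffixes, or insync)")
    return steps
```

Every path converges on Step 3, which is why the sketch appends the start, read/write, and replication-check steps after the branch rather than inside it.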
In a deployment with two data centers, the recovery strategy described for a single data center can generally be applied. If one or more masters become unavailable, Directory Proxy Server automatically reroutes local reads and writes to the remaining masters.
As in the single data center scenario described previously, recovery replication agreements can be enabled. These agreements ensure that both data centers continue to receive replicated updates in the event of failure. This recovery strategy is illustrated in Figure 12–3.
An alternative to using recovery replication agreements is to use a fully meshed topology in which every master replicates its changes to every other master. Fewer replication agreements might be easier to manage, but there is no technical reason not to use a fully meshed topology.
The only single point of failure (SPOF) in this scenario is the Directory Proxy Server in each data center. Redundant Directory Proxy Servers can be deployed to eliminate this problem, as shown in Figure 12–4.
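To make the management trade-off concrete, the sketch below counts agreements. It assumes that each replication agreement is one-way (supplier to consumer), so a fully meshed topology of n masters needs n(n-1) agreements. The ring layout is only an illustrative comparison and is not a topology described in this guide.

```python
from __future__ import annotations

from itertools import permutations


def full_mesh(masters: list[str]) -> list[tuple[str, str]]:
    """Every master supplies every other master: one agreement per ordered pair."""
    return list(permutations(masters, 2))


def ring(masters: list[str]) -> list[tuple[str, str]]:
    """Illustrative comparison: each master supplies only its two ring neighbours."""
    n = len(masters)
    agreements = set()
    for i, m in enumerate(masters):
        neighbour = masters[(i + 1) % n]
        agreements.add((m, neighbour))
        agreements.add((neighbour, m))
    return sorted(agreements)


if __name__ == "__main__":
    for n in (2, 4, 8):
        names = [f"M{i}" for i in range(1, n + 1)]
        print(n, "masters:", len(full_mesh(names)), "full mesh,", len(ring(names)), "ring")
```

The full-mesh count grows quadratically while the ring grows linearly, which is the management cost that recovery replication agreements help avoid.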
The recovery strategy depends on which combination of components fails. However, after you have a basic strategy in place to cope with multiple failures, you can apply the same strategy when other combinations of components fail.
In the sample topology depicted in Figure 12–3, assume that Master 1 and Master 3 in the New York data center fail.
In this scenario, Directory Proxy Server automatically reroutes reads and writes in the New York data center to Master 2 and Master 4. This ensures that local read and write capability is maintained at the New York site.
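The rerouting behavior described above amounts to preferring healthy masters in the client's own data center. The following sketch models that preference. The data-center assignment is hypothetical: Masters 1 to 4 are placed in New York, consistent with the text, and the London masters `L1` and `L2` are invented names. The fallback to the remote data center when no local master is healthy is also an assumption, not documented Directory Proxy Server behavior.

```python
from __future__ import annotations


def select_masters(client_dc: str, location: dict[str, str], healthy: set[str]) -> list[str]:
    """Prefer healthy masters in the client's data center; otherwise any healthy master."""
    local = sorted(m for m in healthy if location.get(m) == client_dc)
    if local:
        return local
    # Assumed fallback: no local master is left, so traffic crosses data centers.
    return sorted(m for m in healthy if m in location)


# Hypothetical placement of masters in the two data centers.
LOCATION = {"M1": "NY", "M2": "NY", "M3": "NY", "M4": "NY", "L1": "LON", "L2": "LON"}

if __name__ == "__main__":
    healthy = set(LOCATION) - {"M1", "M3"}   # Masters 1 and 3 in New York have failed
    print(select_masters("NY", LOCATION, healthy))
```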
The deployment shown in the following figure includes an enterprise firewall that rejects outside access to internal LDAP services. Client LDAP requests that are initiated internally go through Directory Proxy Server by way of a network load balancer, ensuring high availability at the IP level. Direct access to the Directory Servers is prevented, except from the hosts that run Directory Proxy Server. Two Directory Proxy Servers are deployed to prevent the proxy from becoming an SPOF.
A fully meshed multi-master topology ensures that, if any master fails, every other master remains available for use. For simplicity, not all replication agreements are shown in this diagram.
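The network load balancer in front of the two Directory Proxy Servers removes the proxy as an SPOF by sending each connection to a healthy proxy instance. The sketch below models that idea with round-robin selection that skips instances marked down. The instance names and the round-robin policy are assumptions for illustration; a real load balancer's health checks and distribution policy depend on the product used.

```python
from __future__ import annotations


class ProxyBalancer:
    """Round-robin over Directory Proxy Server instances, skipping unhealthy ones."""

    def __init__(self, proxies: list[str]) -> None:
        self.proxies = list(proxies)
        self.healthy = set(self.proxies)
        self._next = 0

    def mark_down(self, proxy: str) -> None:
        self.healthy.discard(proxy)

    def mark_up(self, proxy: str) -> None:
        self.healthy.add(proxy)

    def pick(self) -> str:
        # Try each instance at most once, starting after the last one used.
        for _ in range(len(self.proxies)):
            proxy = self.proxies[self._next]
            self._next = (self._next + 1) % len(self.proxies)
            if proxy in self.healthy:
                return proxy
        raise RuntimeError("no Directory Proxy Server instance is available")


if __name__ == "__main__":
    lb = ProxyBalancer(["dps-1", "dps-2"])   # hypothetical instance names
    lb.mark_down("dps-1")
    print([lb.pick() for _ in range(3)])
```

With one instance down, every connection goes to the survivor, so LDAP service continues. Only when both instances fail does the sketch raise an error.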
In the scenario illustrated in the following figure, a bug in Application 1 causes Directory Server to fail. The proxy configuration ensures that LDAP requests from Application 1 are only ever sent to Master 1 and Master 3. When the bug occurs, Masters 1 and 3 fail. However, Applications 2, 3, and 4 are not affected, because they can still reach a functioning Directory Server.
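The containment effect of this proxy configuration can be checked with a small model. The sketch assumes that a request from the faulty application crashes every master it is routed to. The routes for Applications 2, 3, and 4, and the existence of Masters 2 and 4, are hypothetical because the text does not specify them. The second route table shows the same bug without per-application routing, for contrast.

```python
from __future__ import annotations


def surviving_apps(routes: dict[str, set[str]], faulty_app: str) -> set[str]:
    """Apps that can still reach a master after faulty_app crashes every master it reaches."""
    crashed = routes[faulty_app]
    return {app for app, masters in routes.items() if masters - crashed}


# Hypothetical routing with per-application affinity (Application 1 -> Masters 1 and 3).
WITH_AFFINITY = {
    "app1": {"M1", "M3"},
    "app2": {"M2", "M4"},
    "app3": {"M2", "M4"},
    "app4": {"M2", "M4"},
}

# Hypothetical routing without affinity: every application can reach every master.
ALL_MASTERS = {"M1", "M2", "M3", "M4"}
WITHOUT_AFFINITY = {app: set(ALL_MASTERS) for app in WITH_AFFINITY}

if __name__ == "__main__":
    print(sorted(surviving_apps(WITH_AFFINITY, "app1")))
    print(sorted(surviving_apps(WITHOUT_AFFINITY, "app1")))
```

Without affinity, the faulty application reaches and crashes every master, so no application survives. With affinity, the damage is limited to the masters that Application 1 uses.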