The data center that is illustrated in the following figure has a multi-master topology with three masters. In this scenario, the third master is used only for availability in thdse event of failure. Read and write operations are routed to Masters 1 and 2 by Directory Proxy Server, unless a problem occurs. To speed up recovery and to minimize the number of replication agreements, recovery replication agreements are created. These agreements are disabled by default but can be enabled rapidly in the event of a failure.
In the scenario depicted in Figure 12–1, various components might become unavailable. These potential points of failure and the related recovery actions are described in this table.
Table 12–1 Single Data Center Failure Matrix
Failed Component |
Action |
---|---|
Master 1 |
Read and write operations are rerouted to Masters 2 and 3 through Directory Proxy Server while Master 1 is repaired. The recovery replication agreement between Master 2 and Master 3 is enabled so that updates to Master 3 are replicated to Master 2. |
Master 2 |
Read and write operations are rerouted to Masters 1 and 3 while Master 2 is repaired. The recovery replication agreement between Master 1 and Master 3 is enabled so that updates to Master 3 are replicated to Master 1. |
Master 3 |
Because Master 3 is a backup server only, the directory service is not affected if this master fails. Master 3 can be taken offline and repaired without interruption to service. |
Directory Proxy Server |
Failure of Directory Proxy Server results in severe service interruption. A redundant instance of Directory Proxy Server is advisable in this topology. For an example of such a topology, see Using Multiple Directory Proxy Servers. |
In a single data center with three masters, read and write capability is maintained if one master fails. This section describes a sample recovery strategy that can be applied to reinstate the failed component.
The following flowchart and procedure assume that one component, Master 1, has failed. If two masters fail simultaneously, read and write operations must be routed to the remaining master while the problems are fixed.
If Master 1 is not already stopped, stop it.
Identify the cause of the failure.
If the failure is easily repaired, by replacing a network cable, for example, make the repair and go to Step 3.
If the problem is more serious, the failure might take more time to fix.
Ensure that any applications that access Master 1 are redirected to point to Master 2 or Master 3, through Directory Proxy Server.
Check the availability of a recent backup.
If a recent backup is available, reinitialize Master 1 from the backup and go to Step 3.
If a recent backup is not available, do one of the following:
Restart Master 1 and perform a total initialization from Master 2 or from Master 3 to Master 1.
For details on this procedure, see Initializing Replicas in Sun Java System Directory Server Enterprise Edition 6.0 Administration Guide.
If performing a total initialization will take too long, perform an online export from Master 2, or Master 3, and an import to Master 1.
Start Master 1, if it is not already started.
If Master 1 is in read-only mode, set it to read/write mode.
Check that replication is functioning correctly.
You can use DSCC, dsccmon view-suffixes, or the insync command to check replication.
For more information, see Getting Replication Status in Sun Java System Directory Server Enterprise Edition 6.0 Administration Guide, dsccmon(1M), and insync(1).