6 Understanding Oracle Unified Directory High Availability Deployments

It is essential to have a reliable service as most enterprise applications depend on a directory server. You can deploy Oracle Unified Directory to ensure high availability between directory server instances or between groups of directory server instances.

The following topics explain high availability and how Oracle Unified Directory features help provide continued service if a system failure occurs:

6.1 Overview of High Availability

As more and more businesses and mission-critical applications connect with identities being centrally managed, it has become imperative to have LDAP service available all the time. High availability with performance has become the distinguishing feature of all extranet and enterprise deployments.

High availability is a system design approach and its associated implementation that ensures an agreed level of operational performance during a given measurement period for your directory service.

Agreed service levels vary from one organization to another. Service levels also depend on several factors such as the time of day systems are accessed, whether systems can be brought down for maintenance, and the cost of downtime to the organization. Failure or downtime in this context, is defined as periods when a system is unavailable and prevents from providing the agreed level of service.

Oracle Unified Directory provides elaborate cost-effective and easy-to-use high availability features, which eliminate the downtime and maximize the time when the system is available.

6.2 Understanding Availability and Single Points of Failure

Oracle Unified Directory deployments that provide highly available service can recover from failures and maintain service within agreed level of service. With a high availability deployment, component failures might impact individual directory queries but does not result into a complete system failure.

A single point of failure (SPOF) is a system component, which on failure renders an entire system unavailable or unreliable. When you design a highly available deployment, you identify potential SPOFs and investigate how to mitigate these SPOFs.

The following topics discuss availability and single points of failure:

6.2.1 Understanding Types of SPOFs

A single point of failure (SPOF) is a system component, which on failure renders an entire system unavailable or unreliable. When you design a highly available deployment, you identify potential SPOFs and investigate how to mitigate these SPOFs.

You can divide SPOFs into the following categories:

6.2.1.1 About Hardware Failure

You can broadly categorize the hardware SPOFs as follows:

  • Network failures

  • Failure of the physical servers on which Directory Server or Directory Proxy Server are running

  • Hardware load balancer failures

  • Storage subsystem failures

  • Power supply failures

6.2.1.2 About Software Failure

You can categorize Directory server or proxy server failures as follows:

  • Slow response time

  • Write overload

    • Maximized file descriptors

    • Maximized file system

    • Poor storage configuration

    • Too many indexes

  • Read overload

  • Cache issues

  • CPU constraints

  • Replication issues

    • Out of sync

    • Replication propagation delay

    • Replication flow

    • Replication overload

  • Large wildcard searches

6.2.2 Understanding the Approach to Mitigate SPOFs

A SPOF is hardware or software component that could cause the entire system to become non viable and unusable if the component fails. Redundancy is a strategic approach to handle SPOF.

You can implement redundancy to ensure that failure of a single component does not cause an entire directory service to fail. Redundancy involves providing redundant software components, hardware components, or both. Examples of this strategy include deploying multiple, replicated instances of Directory Server on separate hosts and using redundant arrays of independent disks (RAID) for storage of Directory Server data. Redundancy with replicated Directory Servers is the most efficient way to achieve high availability.

6.3 Overview of Redundancy for High Availability

To ensure reliability and continued services for directory service, you must maintain a high level of system availability, with a seamless transition to redundant systems during a system failure.

Redundancy works for both Directory and proxy servers and allows you to mitigate:

  • Hardware failures, because the traffic can be redirected to another hardware component.

  • Software failures, when the failure cannot be reproduced systematically.

Redundancy handles failure in the following ways:

6.3.1 Understanding Redundancy at the Hardware Level

Hardware happens to be the most crucial SPOF. Hardware could be any part in the network that handles network traffic, controls the system, or manages authentication.

This section provides an overview of hardware redundancy.

Note:

Providing comprehensive information on this topic is beyond the scope of this book. However, there are many publications available that concern using hardware redundancy for high availability, such as "Blueprints for High Availability" published by John Wiley & Sons, Inc.

Failure at the network level can be mitigated by having redundant network components. When designing your deployment, consider having redundant components for the following:

  • Internet connection

  • Network interface card

  • Network cabling

  • Network switches

  • Gateways and routers

You can mitigate the hardware load balancer as an SPOF by including a redundant hardware load balancer in your architecture.

You can mitigate against SPOFs in the storage subsystem by using redundant server controllers. You can also use redundant cabling between controllers and storage subsystems, redundant storage subsystem controllers, or redundant arrays of independent disks.

If you have only one power supply, loss of this supply could make your entire service unavailable. To prevent this situation, consider providing redundant power supplies for hardware, where possible, and diversifying power sources. Additional methods of mitigating SPOFs in the power supply include using surge protectors, multiple power providers, and local battery backups, and emergency local power generators.

Failure of an entire data center can occur if, for example, a natural disaster strikes a particular geographic region. In this instance, a well-designed multiple data center replication topology can prevent a distributed directory service from becoming unavailable. See Sample Topologies Using Redundancy for High Availability.

6.3.2 Understanding Redundancy at Directory Server Level Using Replication

Replication is a common method used to implement redundancy in Oracle Unified Directory Servers. Replication provides a failover system to implement redundancy.

Redundant solutions are usually less expensive, easier to implement, and easier to manage than clustering solutions. In a clustering model, you often have to configure at least two servers to serve the same application workload, where one node is active while the other is passive, on standby.

Be aware that replication, as part of a redundant solution, has numerous functions other than availability. While the main advantage of replication is the ability to split the read across multiple servers, you must balance this advantage with the task to manage the additional servers. Replication also offers scalability on read operations and, with proper design, scalability on write operations, within certain limits. See Understanding the Oracle Unified Directory Replication Model.

The SPOFs described in About Software Failure can be mitigated by having redundant instances of Directory Server. This involves the use of replication. Replication ensures that the redundant servers remain synchronized, and that requests can be rerouted with no downtime.

Replication is used to prevent the loss of a single server from causing your directory service to become unavailable. A reliable replication topology ensures that the most recent data is available to clients across data centers, even when a server fails. At a minimum, your local directory tree must be replicated to at least one backup server. Directory architects recommend you to replicate three times per physical location for maximum data reliability. When the data is replicated at least thrice then, if a directory server failure occurs, the configuration remains highly available and protected. In deciding how much to use replication for fault tolerance, consider the quality of the hardware and networks used by your directory. Unreliable hardware requires more backup servers.

The Oracle Unified Directory replication model is a loosely consistent, multi-master model. In other words, all directory servers in a replicated topology can process both read and write operations. See Understanding the Oracle Unified Directory Replication Model.

Do not use replication as a replacement for a regular data backup policy. Replication is designed to maintain service within a given service level agreement. It is not designed to protect against incorrect data stored in the directory by applications or users. See Backing Up, Purging, and Restoring Data.

To maintain the ability to read data in the directory with the expected Service Level Agreement, a suitable load balancing strategy must be put in place. Both software and hardware load balancing solutions exist to distribute read load across multiple replicas. Each of these solutions can also determine the state of each replica and to manage its participation in the load balancing topology. The solutions might vary in terms of completeness and accuracy.

To maintain write failover over geographically distributed sites, you can use multiple data center replication over WAN. This entails setting up at least two master servers in each data center, and configuring the servers to be fully meshed over the WAN. This strategy prevents loss of service if any of the masters in the topology fail. Write operations must be routed to an alternative server if a writable server becomes unavailable.

6.3.3 Understanding the Use of Directory Proxy Server as Part of a Redundant Solution

You can use proxy servers to implement redundancy via several instances of proxy. This is yet another approach to provide highly available directory service.

Directory Proxy Server is designed to support high availability directory deployments. The proxy provides automatic load balancing as well as automatic failover and fail back among a set of replicated Directory Servers. Should one or more Directory Servers in the topology become unavailable, the load is proportionally redistributed among the remaining servers.

Directory Proxy Server actively monitors the Directory Servers to ensure that the servers are still online. The proxy also examines the status of each operation that is performed. Servers might not all be equivalent in throughput and performance. If a primary server becomes unavailable, traffic that is temporarily redirected to a secondary server is directed back to the primary server as soon as the primary server becomes available.

Note:

If you have distributed data, then you must manage multiple disconnected replication topologies, which makes administration more complex. In addition, Directory Proxy Server relies heavily on the proxy authorization control to manage user authorization. You must create a specific administrative user on each Directory Server that is involved in the distribution, and these administrative users must be granted proxy access control rights.

6.3.4 Understanding the Use of Application Isolation for High Availability

Directory Proxy Server can also be used to protect a replicated directory service from failure due to a faulty client application. To improve availability, a limited set of masters or replicas is assigned to each application.

Consider a scenario where a faulty application causes a server shutdown when the application performs a specific action. If the application fails over to each successive replica, a single problem with one application can result in failure of the entire replicated topology. To avoid such a scenario, you can restrict failover and load balancing of each application to a limited number of replicas. The potential failure is then limited to this set of replicas, and the impact of the failure on other applications is reduced.

6.3.5 Understanding How to Use the Replication Gateway for High Availability

The replication gateway is designed to provide a highly available deployment solution by allowing you to use redundant replication gateway servers for propagating changes made on disparate servers to the entire replication topology.

The replication gateway propagates changes between Oracle Directory Server Enterprise Edition and Oracle Unified Directory topologies. See Understanding the Role of the Replication Gateway.

6.4 Sample Topologies Using Redundancy for High Availability

When a failure occurs, sample topologies show redundancy and replication and provide continuous service.

For sample topologies that show how redundancy and replication can provide continued service when a failure occurs, see the following: