6 Understanding Oracle Unified Directory High Availability Deployments

As more businesses and mission-critical applications rely on centrally managed identities, it has become imperative that the LDAP service be available at all times. High availability, together with performance, has become a distinguishing feature of extranet and enterprise deployments.

This chapter contains the following topics:

6.1 What is High Availability?

High availability is a system design approach and its associated implementation that ensures an agreed level of operational performance during a given measurement period for your directory service.

Agreed service levels vary from one organization to another. Service levels also depend on several factors, such as the time of day that systems are accessed, whether systems can be brought down for maintenance, and the cost of downtime to the organization. Failure or downtime, in this context, is defined as any period when a system is unavailable and therefore cannot provide the agreed level of service.
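As a rough illustration of how an agreed service level translates into tolerated downtime, the following Python sketch converts an availability target into a downtime budget for a measurement period. The function name and the example targets are assumptions chosen for illustration; they are not part of Oracle Unified Directory.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` for a given target.

    availability_pct is the agreed availability as a percentage (0-100).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


if __name__ == "__main__":
    # Hypothetical targets measured over a one-year period.
    year = timedelta(days=365)
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% over one year allows {downtime_budget(target, year)}")
```

A calculation of this kind can help when comparing the cost of additional redundancy against the cost of downtime to the organization.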

Oracle Unified Directory provides comprehensive, cost-effective, and easy-to-use high availability features that minimize downtime and maximize the time during which the system is available.

6.2 Availability and Single Points of Failure

Oracle Unified Directory deployments that provide highly available service can recover from failures and maintain service within the agreed level of service. With a high availability deployment, component failures might impact individual directory queries but do not result in a complete system failure.

A single point of failure (SPOF) is a system component whose failure renders an entire system unavailable or unreliable. When you design a highly available deployment, you identify potential SPOFs and investigate how to mitigate them.

This section contains the following topics:

6.2.1 Types of SPOFs

SPOFs can be divided into the following categories:

6.2.1.1 Hardware Failure

You can broadly categorize the hardware SPOFs as follows:

  • Network failures

  • Failure of the physical servers on which Directory Server or Directory Proxy Server is running

  • Hardware load balancer failures

  • Storage subsystem failures

  • Power supply failures

6.2.1.2 Software Failure

Directory server or proxy server failure can be categorized as follows:

  • Slow response time

  • Write overload

    • Maximized file descriptors

    • Maximized file system

    • Poor storage configuration

    • Too many indexes

  • Read overload

  • Cache issues

  • CPU constraints

  • Replication issues

    • Out of sync

    • Replication propagation delay

    • Replication flow

    • Replication overload

  • Large wildcard searches

6.2.2 Common Approach to Mitigate SPOFs - Redundancy

You can implement redundancy to ensure that failure of a single component does not cause an entire directory service to fail. Redundancy involves providing redundant software components, hardware components, or both. Examples of this strategy include deploying multiple, replicated instances of Directory Server on separate hosts and/or using redundant arrays of independent disks (RAID) for storage of Directory Server data. Redundancy with replicated Directory Servers is the most efficient way to achieve high availability.
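One simple way to start looking for SPOFs is to list each component type in the deployment and flag any type that has fewer than two instances. The Python sketch below does this for a hypothetical inventory. The component names and counts are assumptions for illustration, and a real review must also consider shared dependencies that a simple count does not reveal.

```python
def find_spofs(inventory: dict[str, int]) -> list[str]:
    """Return component types that have fewer than two instances."""
    return sorted(name for name, count in inventory.items() if count < 2)


if __name__ == "__main__":
    # Hypothetical deployment inventory: component type -> number of instances.
    deployment = {
        "directory server": 3,
        "proxy server": 2,
        "hardware load balancer": 1,
        "network switch": 2,
        "power supply": 1,
    }
    for component in find_spofs(deployment):
        print(f"potential SPOF: {component}")
```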

6.3 Using Redundancy for High Availability

To ensure reliability and continued service for your directory, you must maintain a high level of system availability, with a seamless transition to redundant systems during a system failure.

Redundancy works for both directory servers and proxy servers and allows you to mitigate:

  • Hardware failures, because the traffic can be redirected to another hardware component.

  • Software failures, when the failure cannot be reproduced systematically.

Redundancy handles failure in the following ways:

6.3.1 Redundancy at the Hardware Level

This section provides an overview of hardware redundancy. Many publications provide comprehensive information about using hardware redundancy for high availability. In particular, see "Blueprints for High Availability," published by John Wiley & Sons, Inc.

Failure at the network level can be mitigated by having redundant network components. When designing your deployment, consider having redundant components for the following:

  • Internet connection

  • Network interface card

  • Network cabling

  • Network switches

  • Gateways and routers

You can mitigate the hardware load balancer as an SPOF by including a redundant hardware load balancer in your architecture.

You can mitigate SPOFs in the storage subsystem by using redundant server controllers. You can also use redundant cabling between controllers and storage subsystems, redundant storage subsystem controllers, or redundant arrays of independent disks (RAID).

If you have only one power supply, loss of this supply could make your entire service unavailable. To prevent this situation, consider providing redundant power supplies for hardware, where possible, and diversifying power sources. Additional methods of mitigating SPOFs in the power supply include using surge protectors, multiple power providers, local battery backups, and emergency local power generators.

Failure of an entire data center can occur if, for example, a natural disaster strikes a particular geographic region. In this instance, a well-designed multiple data center replication topology can prevent a distributed directory service from becoming unavailable. For more information, see Section 6.4, "Sample Topologies Using Redundancy for High Availability."

6.3.2 Redundancy at Directory Server Level Using Replication

A common method of implementing redundancy for Oracle Unified Directory servers is to use replication. Redundant solutions are usually less expensive, easier to implement, and easier to manage than clustering solutions. This is because, in a clustering model, you often have to configure at least two servers to serve the same application workload: one node is active while the other is passive, on standby. Note that replication, as part of a redundant solution, has numerous functions other than availability. While the main advantage of replication is the ability to split the read load across multiple servers, this advantage must be balanced against the effort of managing the additional servers. Replication also offers scalability on read operations and, with proper design, scalability on write operations, within certain limits. For an overview of replication concepts, see Chapter 7, "Understanding the Oracle Unified Directory Replication Model."

The SPOFs described in Section 6.2.1.2, "Software Failure" can be mitigated by having redundant instances of Directory Server. This involves the use of replication. Replication ensures that the redundant servers remain synchronized, and that requests can be rerouted with no downtime.

Replication is used to prevent the loss of a single server from causing your directory service to become unavailable. A reliable replication topology ensures that the most recent data is available to clients across data centers, even in the case of a server failure. At a minimum, your local directory tree needs to be replicated to at least one backup server. Directory architects recommend that you replicate the data three times per physical location for maximum data reliability. When the data is replicated at least three times, the configuration remains highly available and protected against failure even if a directory server fails. In deciding how much to use replication for fault tolerance, consider the quality of the hardware and networks used by your directory. Unreliable hardware requires more backup servers.
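To see why additional replicas improve availability, and why less reliable hardware calls for more of them, consider the following Python sketch. It estimates the availability of a service that stays up as long as at least one replica is up. It assumes that replica failures are independent, which real deployments only approximate, and the per-server availability figure is a hypothetical value, not a measured one.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes replica failures are independent. Shared power, network, or
    software defects make failures correlated in practice, so treat the
    result as an optimistic estimate.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    per_server = 0.99  # hypothetical availability of one directory server
    for count in (1, 2, 3):
        estimate = combined_availability(per_server, count)
        print(f"{count} replica(s): {estimate:.6f}")
```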

The Oracle Unified Directory replication model is a loosely consistent, multi-master model. In other words, all directory servers in a replicated topology can process both read and write operations. For more information about replication, see Chapter 7, "Understanding the Oracle Unified Directory Replication Model."

Do not use replication as a replacement for a regular data backup policy. Replication is designed to maintain service within a given service level agreement. It is not designed to protect against incorrect data stored in the directory by applications or users. For information about backing up directory data, see Section 20.3, "Backing Up and Restoring Data."

To maintain the ability to read data in the directory within the expected service level agreement, a suitable load balancing strategy must be put in place. Both software and hardware load balancing solutions exist to distribute read load across multiple replicas. Each of these solutions can also determine the state of each replica and manage its participation in the load balancing topology. The solutions might vary in terms of completeness and accuracy.
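The idea of distributing reads while tracking the state of each replica can be sketched as a round-robin balancer that skips replicas marked as unavailable. This is a minimal illustration only: the class and method names are hypothetical, the health checks themselves are left out, and it does not represent how any particular load balancer or Oracle Unified Directory component is implemented.

```python
class ReadBalancer:
    """Round-robin read balancer that skips replicas marked unavailable."""

    def __init__(self, replicas: list[str]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = list(replicas)
        self._healthy = set(replicas)
        self._next = 0  # index of the next replica to try

    def mark_down(self, replica: str) -> None:
        """Remove a replica from rotation, for example after a failed check."""
        self._healthy.discard(replica)

    def mark_up(self, replica: str) -> None:
        """Return a known replica to rotation once it is reachable again."""
        if replica in self._replicas:
            self._healthy.add(replica)

    def pick(self) -> str:
        """Return the next healthy replica in rotation."""
        for _ in range(len(self._replicas)):
            candidate = self._replicas[self._next]
            self._next = (self._next + 1) % len(self._replicas)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy replica available")


if __name__ == "__main__":
    balancer = ReadBalancer(["ds1", "ds2", "ds3"])  # hypothetical replica names
    balancer.mark_down("ds2")
    print([balancer.pick() for _ in range(4)])
```

Here `mark_down` and `mark_up` stand in for whatever monitoring the chosen solution provides. The accuracy of that monitoring determines how quickly a failed replica stops receiving reads.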

To maintain write failover over geographically distributed sites, you can use multiple data center replication over WAN. This entails setting up at least two master servers in each data center, and configuring the servers to be fully meshed over the WAN. This strategy prevents loss of service if any of the masters in the topology fail. Write operations must be routed to an alternative server if a writable server becomes unavailable.
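The routing rule described above can be sketched as follows: send each write to an available master in the local data center and fall back to a master in another data center only when no local master is available. The data center names, master names, and the `route_write` function are hypothetical, and the sketch assumes that availability information is supplied by some external monitor.

```python
from typing import NamedTuple


class Master(NamedTuple):
    name: str
    data_center: str


def route_write(masters: list[Master], available: set[str], local_dc: str) -> Master:
    """Pick a writable master, preferring available masters in local_dc."""
    candidates = [m for m in masters if m.name in available]
    if not candidates:
        raise RuntimeError("no writable master available")
    # sorted() is stable, so masters keep their configured order within each
    # group; local masters (key False) sort before remote ones (key True).
    return sorted(candidates, key=lambda m: m.data_center != local_dc)[0]


if __name__ == "__main__":
    # Two masters in each of two hypothetical data centers.
    topology = [
        Master("m1", "east"),
        Master("m2", "east"),
        Master("m3", "west"),
        Master("m4", "west"),
    ]
    # Both masters in "east" are down, so the write crosses the WAN.
    print(route_write(topology, {"m3", "m4"}, local_dc="east"))
```

Because the choice is recomputed for every write, traffic returns to a local master as soon as one is reported available again.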

6.3.3 Using Directory Proxy Server as Part of a Redundant Solution

Directory Proxy Server is designed to support high availability directory deployments. The proxy provides automatic load balancing as well as automatic failover and fail back among a set of replicated Directory Servers. Should one or more Directory Servers in the topology become unavailable, the load is proportionally redistributed among the remaining servers.
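Proportional redistribution can be illustrated with a short Python sketch. Each server has a configured weight, and when a server becomes unavailable its share is spread over the remaining servers in proportion to their weights. The server names, weights, and function name are assumptions for illustration; the sketch is not the proxy's actual algorithm or configuration syntax.

```python
def redistribute(weights: dict[str, float], available: set[str]) -> dict[str, float]:
    """Return each available server's share of the load, summing to 1.0.

    Shares stay proportional to the configured weights of the servers
    that remain available.
    """
    remaining = {name: w for name, w in weights.items() if name in available}
    total = sum(remaining.values())
    if total <= 0:
        raise RuntimeError("no available server with a positive weight")
    return {name: w / total for name, w in remaining.items()}


if __name__ == "__main__":
    configured = {"ds1": 2.0, "ds2": 1.0, "ds3": 1.0}  # hypothetical weights
    print(redistribute(configured, {"ds1", "ds2", "ds3"}))
    # ds3 becomes unavailable: ds1 and ds2 keep their 2:1 ratio.
    print(redistribute(configured, {"ds1", "ds2"}))
```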

Proxy servers can also be made redundant by deploying several proxy instances, which is another way to provide a highly available directory service.

Directory Proxy Server actively monitors the Directory Servers to ensure that the servers are still online. The proxy also examines the status of each operation that is performed. Servers might not all be equivalent in throughput and performance. If a primary server becomes unavailable, traffic that is temporarily redirected to a secondary server is directed back to the primary server as soon as the primary server becomes available.

Note that when data is distributed, multiple disconnected replication topologies must be managed, which makes administration more complex. In addition, Directory Proxy Server relies heavily on the proxy authorization control to manage user authorization. A specific administrative user must be created on each Directory Server that is involved in the distribution. These administrative users must be granted proxy access control rights.

6.3.4 Using Application Isolation for High Availability

Directory Proxy Server can also be used to protect a replicated directory service from failure due to a faulty client application. To improve availability, a limited set of masters or replicas is assigned to each application.

Suppose a faulty application causes a server shutdown when the application performs a specific action. If the application fails over to each successive replica, a single problem with one application can result in failure of the entire replicated topology. To avoid such a scenario, you can restrict failover and load balancing of each application to a limited number of replicas. The potential failure is then limited to this set of replicas, and the impact of the failure on other applications is reduced.
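The effect of restricting each application to a limited set of replicas can be sketched by modeling the worst case described above, in which a faulty application brings down every replica that it is allowed to fail over to. The application names, replica names, and function name below are hypothetical.

```python
def impact_of_faulty_app(
    assignments: dict[str, set[str]], faulty_app: str
) -> tuple[set[str], set[str]]:
    """Return (replicas lost, other applications left with no replica).

    Models the worst case: the faulty application brings down every
    replica it is allowed to fail over to.
    """
    lost = set(assignments[faulty_app])
    stranded = {
        app
        for app, replicas in assignments.items()
        if app != faulty_app and not (replicas - lost)
    }
    return lost, stranded


if __name__ == "__main__":
    everything = {"r1", "r2", "r3", "r4"}
    # Without isolation, both applications may use every replica.
    shared = {"hr": set(everything), "mail": set(everything)}
    # With isolation, each application is limited to two replicas.
    isolated = {"hr": {"r1", "r2"}, "mail": {"r3", "r4"}}
    print(impact_of_faulty_app(shared, "hr"))
    print(impact_of_faulty_app(isolated, "hr"))
```

In the shared case, the faulty application can take down all four replicas and leave the other application without service. In the isolated case, the damage is limited to the two replicas assigned to the faulty application.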

6.3.5 Using Replication Gateway for High Availability

The replication gateway propagates changes between Oracle Directory Server Enterprise Edition and Oracle Unified Directory topologies. The replication gateway is designed to provide a highly available deployment solution by allowing you to use redundant replication gateway servers for propagating changes made on disparate servers to the entire replication topology. For more information about the replication gateway, see Section 1.4.2, "The Role of the Replication Gateway."

6.4 Sample Topologies Using Redundancy for High Availability

The following sample topologies show how redundancy and replication are used to provide continued service in the event of failure: