Sun OpenSSO Enterprise 8.0 Deployment Planning Guide

Chapter 17 Configuring System Failover and Session Failover for High Availability

This chapter provides information to help you architect your OpenSSO Enterprise deployment to achieve the highest levels of system and session availability. High availability ensures continuous service for your end-users, protects against loss of data due to interrupted user sessions, and increases transaction throughput for optimized system performance.

About High Availability

Two key high-availability elements in an OpenSSO Enterprise deployment are system failover and session failover. These two features help to ensure that no single point of failure exists in the deployment, and that OpenSSO Enterprise service is always available to end-users. You can also configure OpenSSO Enterprise sites to meet more complex business requirements.

System Failover

In this chapter, system failure refers to a hardware or process failure at the OpenSSO Enterprise server, at the Policy Agent, or at a load balancer. For example, hardware can fail because of a mechanical problem or a power outage, or a web container can crash and make OpenSSO Enterprise inaccessible. Whenever possible, install redundant OpenSSO Enterprise servers, OpenSSO Enterprise Policy Agents, and load balancers that requests can fail over to when a system failure occurs. This redundancy helps to ensure that no single point of failure exists in your deployment. Load balancers distribute the workload among OpenSSO Enterprise servers. If a Policy Agent fails, requests are redirected to another Policy Agent. If server hardware fails, requests are routed to other server hardware. Without system failover, a single hardware failure or process failure can cause OpenSSO Enterprise downtime.
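The routing behavior described above can be pictured with a minimal sketch. It is an illustration only, not how any particular load balancer is implemented: the LoadBalancer class, the round-robin policy, and the server names are all assumptions made for this example.

```python
from itertools import cycle


class LoadBalancer:
    """Toy round-robin balancer; the server names are illustrative only."""

    def __init__(self, servers):
        self.health = {name: True for name in servers}  # True means reachable
        self._ring = cycle(servers)

    def mark_down(self, name):
        self.health[name] = False  # simulate a hardware or process failure

    def route(self):
        """Return the next healthy server, or None if every server is down."""
        for _ in range(len(self.health)):
            candidate = next(self._ring)
            if self.health[candidate]:
                return candidate
        return None


lb = LoadBalancer(["opensso-1", "opensso-2"])
before = [lb.route() for _ in range(4)]  # workload alternates between servers
lb.mark_down("opensso-1")
after = [lb.route() for _ in range(3)]   # every request goes to the survivor
```

With only one server behind the balancer, marking that server down makes route() return None, which corresponds to the downtime that redundant servers are meant to prevent.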

Session Failover

Session failover ensures that session data remains accessible to OpenSSO Enterprise servers and OpenSSO Enterprise Policy Agents. Service requests are routed to a failover server, the user's session continues uninterrupted, and no user data is lost. The OpenSSO Enterprise Session Service maintains authenticated session states and continues processing new client requests after the failure. In most cases, without session failover, the user would have to re-authenticate after a system failure and the subsequent service recovery.

Session failover is critical when end-users' transactions involve financial data or other sensitive information that is difficult to recover when a system failure occurs. With session failover, when a system failover occurs, the user's transaction can proceed uninterrupted. Session failover is less important if end-users are, for example, reading but not writing data.

OpenSSO Enterprise Sites

The most basic OpenSSO Enterprise site consists of two or more OpenSSO Enterprise servers and one or more load balancers. When you configure all the components in the site to work under a single site identifier, or name, all components in the site act as one unit. The load balancers in the site are associated with a site identifier. When a component such as a Policy Agent accesses a site, it communicates through the load balancer associated with that site, instead of directly accessing individual OpenSSO Enterprise servers in the site. All the client requests are passed through the load balancer to the OpenSSO Enterprise servers located behind a firewall. Individual OpenSSO Enterprise servers are never directly exposed to entities outside the firewall. The only client that can access the OpenSSO Enterprise servers is a load balancer.

Single-Site Configuration

A single-site configuration usually includes two or more OpenSSO Enterprise servers that are centrally managed and configured under a single site identifier. The single-site configuration is typically used when the OpenSSO Enterprise servers are managed as a single operational unit, such as in a LAN environment.

Multiple-Site Configuration

In a multiple-site configuration, two or more OpenSSO Enterprise servers are configured in each site. A multiple-site configuration is useful when you need to centrally manage OpenSSO Enterprise servers located in distant geographical locations. Multiple-site configuration is usually used in WAN environments, or where sites are managed as separate operational units within a LAN environment. Each site can have one or more load balancers.

While system failover can be configured among all sites in the deployment, session failover is possible only within each site. WAN environments are subject to speed, network latency, firewall, and bandwidth issues. For these reasons, OpenSSO Enterprise session failover is not supported across multiple sites within a LAN or WAN environment.

The following are typical reasons to use a multiple-site configuration:

OpenSSO Enterprise servers are in close proximity in a LAN environment.
Underlying network infrastructure limitations exist.
Operational domains are managed as independent units.
OpenSSO Enterprise servers span network boundaries, as in a WAN environment.

Analyzing the Deployment Architecture

Figure 17–1 illustrates the components you need for basic system failover and session failover in an OpenSSO Enterprise deployment. Key components in this high availability deployment are:

Multiple OpenSSO Enterprise Policy Agents serve as backups when system failure occurs.
A single load balancer distributes the workload among multiple OpenSSO Enterprise Policy Agents. This increases transaction throughput, and ensures failover when a system failure occurs.
Multiple OpenSSO Enterprise servers with respective embedded Directory Servers act as backups when system failure occurs. Embedded Directory Servers ensure that replicated configuration data is always available even during system failure.
Multiple load balancers distribute the workload among multiple OpenSSO Enterprise servers. This increases transaction throughput, and ensures failover when system failure occurs. Additionally, the Policy Agents can be configured to fail over among OpenSSO Enterprise server load balancers when system failure occurs.
When OpenSSO Enterprise is configured for session failover, a Java Message Queue Broker Cluster replicates session data and stores it in the Berkeley Database. When a system failure occurs, the replicated session data is made available to Policy Agents so that the end-user does not lose data and does not have to re-authenticate after system recovery.
Multiple Berkeley Databases are used to store session data, and are configured for session failover. If one Berkeley Database fails, the working Berkeley Database can provide session data to the OpenSSO Enterprise servers for session validation.

In all examples in this chapter, load balancers represent the only access points to OpenSSO Enterprise servers. An access point can be any hardware or software load balancer that is associated with a site and installed in front of OpenSSO Enterprise servers. Policy Agents interact with OpenSSO Enterprise servers through these access points.

The following figure illustrates the components required for basic system failover and session failover in a single-site deployment.

Figure 17–1 Basic OpenSSO Enterprise High Availability Deployment Architecture

Understanding a Typical High-Availability Transaction

In any transaction, OpenSSO Enterprise must determine three things:

Is a valid user session token present?
Is the user authenticated?
Is the user authorized?

At any time during the transaction, if the OpenSSO Enterprise server or the OpenSSO Enterprise Policy Agent is unable to access the information required to determine these three things, then system failover or session failover may occur.
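The three questions above can be read as an ordered series of checks. The following sketch is only a way to visualize that order; the function name, the dictionaries that stand in for the session and policy services, and the returned strings are hypothetical and do not represent the OpenSSO Enterprise API.

```python
def evaluate_request(token, sessions, policies, resource):
    """Answer the three questions in order and return the outcome."""
    # 1. Is a valid user session token present?
    session = sessions.get(token) if token else None
    if session is None:
        return "authenticate"
    # 2. Is the user authenticated?
    if not session["authenticated"]:
        return "authenticate"
    # 3. Is the user authorized?
    if resource not in policies.get(session["user"], set()):
        return "deny"
    return "allow"


# Illustrative data only: one known session and one policy entry.
sessions = {"tok-1": {"user": "alice", "authenticated": True}}
policies = {"alice": {"/payroll"}}

first_visit = evaluate_request(None, sessions, policies, "/payroll")
granted = evaluate_request("tok-1", sessions, policies, "/payroll")
refused = evaluate_request("tok-1", sessions, policies, "/admin")
```

If the component that performs a check cannot reach the session or policy data, that is the point at which system failover or session failover comes into play.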

Figure 17–2 illustrates the first part of a typical high-availability process flow. In the figure, a user attempts to access a protected resource and is successfully authenticated. No system failover or session failover occurs in this first transaction.

The second part of the process flow describes how sessions are handled during subsequent requests by the same user. This second part of the process flow is influenced by two factors:

How OpenSSO Enterprise is configured for high availability
Availability of load balancers and servers

The following figure illustrates a user's first request in a typical high-availability transaction. Process flows for subsequent requests by the same user are presented in detail, and discussed along with their respective configuration examples, in the following sections.

Figure 17–2 Process Flow for High Availability (part 1)

Understanding High Availability Configuration Examples

Businesses use various combinations of single or multiple OpenSSO Enterprise servers and load balancers, in single or multiple sites, to achieve system failover and session failover. The examples in this section illustrate typical high-availability configurations and their respective process flows.

The following comparison summarizes the OpenSSO Enterprise features associated with each configuration example.

Figure 17–3 Comparison of High Availability Configuration Examples

Single OpenSSO Enterprise Server Load Balancer in Single Site, No Session Failover

This is the most basic high-availability configuration. The single OpenSSO Enterprise server load balancer increases transaction throughput. When one OpenSSO Enterprise server is inaccessible, requests are automatically routed to other servers. However, the single load balancer can be a single point of failure. When this load balancer is inaccessible, no OpenSSO Enterprise services or session data are available to the Policy Agents.

Figure 17–4 Single OpenSSO Enterprise Server Load Balancer in a Single Site Configuration

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–5 Process Flow for Single OpenSSO Enterprise Server Load Balancer in a Single Site, No Session Failover

Multiple OpenSSO Enterprise Server Load Balancers in a Single Site, No Session Failover

The following figure illustrates a deployment with multiple OpenSSO Enterprise server load balancers in front of redundant OpenSSO Enterprise servers. In this example, both OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. The load balancers are also configured as login URLs in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store. Regardless of where the configuration is hosted, when one OpenSSO Enterprise server load balancer is inaccessible, all requests are automatically routed to the other load balancer.
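As a rough illustration of listing both load balancers in the agent configuration, consider the following sketch. It is not the Policy Agent's actual logic: the dictionary keys are hypothetical and are not real agent property names, the host names are placeholders, and the reachability check is a stand-in for whatever probe a real agent performs.

```python
# Hypothetical agent settings; the keys are NOT real OpenSSO property names.
agent_config = {
    "server_urls": [
        "https://lb-1.example.com/opensso",
        "https://lb-2.example.com/opensso",
    ],
    "login_urls": [
        "https://lb-1.example.com/opensso/UI/Login",
        "https://lb-2.example.com/opensso/UI/Login",
    ],
}


def first_reachable(urls, is_reachable):
    """Return the first URL, in list order, that passes the reachability check."""
    for url in urls:
        if is_reachable(url):
            return url
    return None


def lb1_is_down(url):
    # Stand-in for a real health probe: pretend lb-1 is inaccessible.
    return "lb-1" not in url


login_url = first_reachable(agent_config["login_urls"], lb1_is_down)
```

Because both load balancers appear in the list, the agent in this sketch still obtains a usable login URL when the first load balancer is down. With a single entry, the same call would return None.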

Figure 17–6 Multiple OpenSSO Enterprise Server Load Balancers in a Single Site, No Session Failover

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–7 Process Flow for Multiple OpenSSO Enterprise Server Load Balancers in a Single Site, No Session Failover

Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites, No Session Failover

This deployment is useful if you want to logically group redundant OpenSSO Enterprise servers in a LAN or WAN environment. For example, you can configure redundant OpenSSO Enterprise servers to work as a single unit under a single site identifier. The redundant OpenSSO Enterprise servers provide one level of system failover. When you deploy multiple sites this way, the OpenSSO Enterprise servers in one site are logically isolated from the OpenSSO Enterprise servers in other sites.

In this example, both OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. The load balancers are also configured as login URLs in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store. When system failure occurs at the load balancer, one site fails over to another site.

The following figure illustrates minimum components required for a multiple-site configuration.

Figure 17–8 Multiple OpenSSO Enterprise Load Balancers in Multiple Sites, No Session Failover

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–9 Process Flow for Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites, No Session Failover

Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover

When you configure OpenSSO Enterprise for session failover, the user's authenticated session state is stored in the Berkeley Database so that it survives a single hardware or software failure. In session failover deployments, you configure the OpenSSO Enterprise servers to communicate with Message Queue brokers, which manage session state persistence in the Berkeley Database. This configuration enables the user's session to fail over to a backup OpenSSO Enterprise server without losing any session state information. The user does not have to log in again. An internal algorithm selects the backup OpenSSO Enterprise server from the available servers in the configuration list.
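The effect of persisting session state can be sketched as follows. This is a simplified model under stated assumptions: a plain dictionary stands in for the Message Queue brokers and the Berkeley Database, and the Server class, its methods, and the token values are invented for the example.

```python
class Server:
    """Toy server; `store` stands in for the Message Queue and database tier."""

    def __init__(self, name, store=None):
        self.name = name
        self.local = {}     # sessions held in this server's memory
        self.store = store  # shared persistent store, or None if not configured

    def login(self, token, user):
        self.local[token] = user
        if self.store is not None:
            self.store[token] = user  # persist the state for failover

    def validate(self, token):
        """Return the user for a token, recovering it from the store if needed."""
        if token in self.local:
            return self.local[token]
        if self.store is not None and token in self.store:
            self.local[token] = self.store[token]  # recovered after failover
            return self.local[token]
        return None  # unknown session: the user must log in again


# With session failover: both servers share the persistent store.
shared = {}
primary, backup = Server("opensso-1", shared), Server("opensso-2", shared)
primary.login("tok-1", "alice")
recovered = backup.validate("tok-1")  # the primary is assumed to have failed

# Without session failover: each server knows only its own sessions.
plain_a, plain_b = Server("opensso-3"), Server("opensso-4")
plain_a.login("tok-2", "bob")
lost = plain_b.validate("tok-2")
```

In the first case the backup server recognizes the session, so the user continues without logging in again. In the second case the session is unknown to the surviving server.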

This type of deployment ensures session state availability even if one of the OpenSSO Enterprise servers is inaccessible due to scheduled maintenance, hardware failure, or software failure. However, the single load balancer can be a single point of failure. When this load balancer is inaccessible, no OpenSSO Enterprise services or session data are available to the Policy Agents.

The following figure illustrates the components in a basic OpenSSO Enterprise deployment using session failover.

Figure 17–10 Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–11 Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover

Multiple OpenSSO Enterprise Server Load Balancers in a Single Site with Session Failover

This deployment is very similar to Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover, but with two important differences. In this deployment, multiple OpenSSO Enterprise server load balancers exist. Additionally, the OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. This deployment provides load balancer failover to ensure continuous service when system failure occurs. When system failure occurs at one load balancer, requests are routed to the other load balancer.

The load balancers are also configured as login URLs in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store.

The following figure illustrates a deployment with multiple OpenSSO Enterprise server load balancers with session failover.

Figure 17–12 Multiple OpenSSO Enterprise Server Load Balancers in a Single Site with Session Failover

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–13 Multiple OpenSSO Enterprise Server Load Balancers in a Single Site with Session Failover

Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites with Session Failover

This deployment is useful if you want to logically group redundant OpenSSO Enterprise servers in a LAN or WAN environment. For example, you can configure redundant OpenSSO Enterprise servers to work as a single unit under a single site identifier. Redundant OpenSSO Enterprise servers provide one level of system failover. When you deploy multiple sites this way, the OpenSSO Enterprise servers in one site are logically isolated from the OpenSSO Enterprise servers in other sites.

For an added level of system failover, you can configure one site to fail over to another site. In this example, both OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. The load balancers are also configured as login URLs in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store. When system failure occurs at the load balancer, one site fails over to another site.

This deployment ensures both system failover and session failover if one of the OpenSSO Enterprise load balancers or one of the OpenSSO Enterprise servers is inaccessible for any reason. The following issues are addressed in this deployment:

Logical grouping of OpenSSO Enterprise servers can be achieved across distant geographic locations within a WAN environment or locally within a LAN environment.
The Message Queue broker and Berkeley Database provide the means for session failover.
Session failover is not supported among multiple sites.
The user's authenticated session state is maintained in the event of a single hardware or software failure. This allows the user session to fail over to a backup OpenSSO Enterprise server without losing session information. If system failure occurs within the site, the user does not have to log in again.
The backup OpenSSO Enterprise server is determined by an internal algorithm. The internal algorithm selects one of the available servers in the same site from the server configuration list.
System failover works among OpenSSO Enterprise servers in different sites.
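The difference between the two failover scopes listed above can be sketched in a few lines. The real selection algorithm is internal to the product and is not described in this chapter, so the first-match rule below is only a placeholder. The SiteServer class, the function names, and the server and site names are likewise assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteServer:
    name: str
    site: str
    up: bool = True


def session_backup(failed, servers):
    """Pick a session failover target: an available server in the same site.

    Choosing the first match in list order is a placeholder for the
    product's internal algorithm.
    """
    for server in servers:
        if server.up and server.site == failed.site and server.name != failed.name:
            return server
    return None


def system_backup(failed, servers):
    """Pick a system failover target: any available server, in any site."""
    same_site = session_backup(failed, servers)
    if same_site is not None:
        return same_site
    for server in servers:
        if server.up and server.name != failed.name:
            return server
    return None


a1 = SiteServer("a1", "site-a", up=False)  # the failed server
a2 = SiteServer("a2", "site-a")
b1 = SiteServer("b1", "site-b")

in_site = session_backup(a1, [a1, a2, b1])  # session survives within site-a
```

If every server in site-a were down, session_backup would return None while system_backup would still return a server in site-b. In that case service continues, but the session is not carried across sites.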

The following figure illustrates a complex high availability deployment using both system failover and session failover in multiple sites.

Figure 17–14 Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites with Session Failover

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–15 Multiple OpenSSO Enterprise Server Load Balancers with Session Failover in Each Site

Considering Assumptions and Dependencies

As you plan your deployment, consider the following assumptions to determine if your environment is appropriate for using system failover and session failover.

Assumptions

Redundant OpenSSO Enterprise servers and Policy Agents are installed for basic load balancing. Additionally, you must manually configure each instance for system failover and session failover.
All OpenSSO Enterprise servers must share the same configuration data. This can be achieved by setting up configuration data replication among multiple instances of OpenDS, or by configuring each OpenSSO Enterprise server to point to the same instance of Sun Directory Server.
You can configure system failover at either the OpenSSO Enterprise Policy Agent or at the OpenSSO Enterprise Client SDK.
When configuring session failover, you must deploy Java Message Queue and the Berkeley Database on a machine other than the one hosting the OpenSSO Enterprise servers. You can configure a single Message Queue with a single Berkeley Database, or you can configure multiple instances of both.

Using Java Message Queue Broker and Berkeley Database for Session Failover

If you configure a single instance of Java Message Queue Broker and a single instance of Berkeley Database to provide session failover for your deployment, no session data replication is possible. If either Message Queue Broker or Berkeley Database fails, then all the stored user sessions are lost. The OpenSSO Enterprise server would operate as if session failover were not configured.

A good practice is to use two instances of Message Queue Broker configured with two instances of Berkeley Database. User sessions are replicated among the Berkeley Database instances. This dual-host configuration is for failover purposes and not for load sharing. Adding more Message Queue Broker instances and Berkeley Database instances does not increase processing capacity. Adding more instances actually reduces the overall session failover processing capacity due to the extra data replication overhead.

The Java Message Queue Broker and Berkeley Database pair should be configured in an active-standby mode so that at any given time only one of the pair is up and running.
The Java Message Queue Broker and Berkeley Database pair on the backup host is used only for failover purposes.
When the primary Java Message Queue Broker and Berkeley Database pair fails, the other pair on the backup host can be started to provide uninterrupted session service.
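The trade-off between a single instance and replicated instances described in this section can be illustrated with a small model. It is a sketch under simplifying assumptions: each dictionary stands in for one Message Queue Broker and Berkeley Database pair, every write is copied to every live instance, and the write counter is only a crude proxy for replication overhead. The ReplicatedStore class and its methods are hypothetical.

```python
class ReplicatedStore:
    """Toy session store that copies every write to each live instance."""

    def __init__(self, instance_count):
        self.instances = [{} for _ in range(instance_count)]
        self.alive = [True] * instance_count
        self.writes = 0  # total copies written, a rough proxy for overhead

    def put(self, token, user):
        for index, data in enumerate(self.instances):
            if self.alive[index]:
                data[token] = user
                self.writes += 1

    def fail(self, index):
        self.alive[index] = False
        self.instances[index].clear()  # data on the failed instance is gone

    def get(self, token):
        for index, data in enumerate(self.instances):
            if self.alive[index] and token in data:
                return data[token]
        return None


single = ReplicatedStore(1)
single.put("tok-1", "alice")
single.fail(0)
after_single_failure = single.get("tok-1")  # session is lost

dual = ReplicatedStore(2)
dual.put("tok-1", "alice")
dual.fail(0)
after_dual_failure = dual.get("tok-1")      # session survives on the replica
```

In this model the dual configuration performs two writes for each stored session where the single configuration performs one. That is the intuition behind the statement that extra instances add replication work rather than capacity. Actual overhead depends on the broker and database configuration.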

Configuring OpenSSO Enterprise for High Availability

A good source of high-availability configuration information is the manual Deployment Example: Single Sign-On, Load Balancing and Failover Using Sun OpenSSO Enterprise 8.0. That manual provides examples with detailed step-by-step instructions for configuring load balancers, OpenSSO Enterprise sites, and OpenSSO Enterprise for system failover and session failover.

Evaluating Benefits and Trade-Offs

Benefits

System failover provides continuous OpenSSO Enterprise service when hardware or software fails.
Session failover ensures uninterrupted transactions and no user data loss during system failure.
In most cases, the user does not have to re-authenticate after system recovery.
Increased transaction throughput through load sharing.
Increased security because Policy Agents never interact directly with OpenSSO Enterprise servers.

Trade-Offs

Configuring OpenSSO Enterprise for session failover has a slight impact on performance.
The firewall must be open between communicating OpenSSO Enterprise components.