Sun OpenSSO Enterprise 8.0 Deployment Planning Guide

Chapter 17 Configuring System Failover and Session Failover for High Availability

This chapter provides information to help you architect your OpenSSO Enterprise deployment to achieve the highest levels of system and session availability. High availability ensures continuous service for your end-users, protects against loss of data due to interrupted user sessions, and increases transaction throughput for optimized system performance.

This chapter includes the following topics:

About High Availability

Two key high-availability elements in an OpenSSO Enterprise deployment are system failover and session failover. These two features help to ensure that no single point of failure exists in the deployment, and that OpenSSO Enterprise service is always available to end-users. You can also configure OpenSSO Enterprise sites to meet more complex business requirements.

System Failover

In this chapter, system failure refers to a hardware or process failure at the OpenSSO Enterprise server, at the Policy Agent, or at a load balancer. Hardware fails due to a mechanical problem or power outage. A web container application crashes causing OpenSSO Enterprise to become inaccessible. These are examples of system failure. Whenever possible, you should install redundant OpenSSO Enterprise servers, OpenSSO Policy Agents, and load balancers to serve as backups, or to fail over to, in the event of a system failure. This helps to ensure that no single point of failure exists in your deployment. Load balancers distribute the workload among OpenSSO Enterprise servers. If a Policy Agent fails, requests are redirected to another Policy Agent. If server hardware fails, requests are routed to other server hardware. Without system failover, a single hardware failure or process failure can cause OpenSSO Enterprise downtime.

Session Failover

Session failover ensures that session data remains accessible to OpenSSO Enterprise servers and OpenSSO Enterprise Policy Agents. Service requests are routed to a failover server, the user's session continues uninterrupted, and no user data is lost. The OpenSSO Enterprise Session Service maintains authenticated session states and continues processing new client requests subsequent to the failure. In most cases, without session failover, after system failure and subsequent service recovery, the user would have to re-authenticate.

Session failover is critical when end-users' transactions involve financial data or other sensitive information that is difficult to recover when a system failure occurs. With session failover, when a system failover occurs, the user's transaction can proceed uninterrupted. Session failover is less important if end-users are, for example, reading but not writing data.

OpenSSO Enterprise Sites

The most basic OpenSSO Enterprise site consists of two or more OpenSSO Enterprise servers and one or more load balancers. When you configure all the components in the site to work under a single site identifier, or name, all components in the site act as one unit. The load balancers in the site are associated with a site identifier. When a component such as a Policy Agent accesses a site, it communicates through the load balancer associated with that site, instead of directly accessing individual OpenSSO Enterprise servers in the site. All the client requests are passed through the load balancer to the OpenSSO Enterprise servers located behind a firewall. Individual OpenSSO Enterprise servers are never directly exposed to entities outside the firewall. The only client that can access the OpenSSO Enterprise servers is a load balancer.

Single-Site Configuration

A single site configuration usually includes two or more OpenSSO servers which are centrally managed and configured under a single site identifier. The single-site configuration is typically used when the OpenSSO Enterprise servers are managed as a single operational unit such as in a LAN environment.

Multiple-Site configuration

In a multiple-site configuration, two or more OpenSSO Enterprise servers are configured in each site. A multiple-site configuration is useful when you need to centrally manage OpenSSO Enterprise servers located in distant geographical locations. Multiple-site configuration is usually used in WAN environments, or where sites are managed as separate operational units within a LAN environment. Each site can have one or more load balancers.

While system failover can be configured among all sites in the deployment, session failover is possible only within each site. WAN environments are subject to speed, network latency, firewall, and bandwidth issues. For these reasons, OpenSSO Enterprise session failover is not supported across multiple sites within a LAN or WAN environment.

The following are typical reasons to use a multiple-site configuration:

Analyzing the Deployment Architecture

Figure 17–1 illustrates the components you need for basic system failover and session failover in an OpenSSO Enterprise deployment. Key components in this high availability deployment are:

In all examples in this chapter, load balancers represent the only access points to OpenSSO Enterprise servers. An access point can be any hardware or software that acts as a load balancer, and is associated with a site, that is installed in front of OpenSSO Enterprise servers. Policy Agents interact with OpenSSO Enterprise servers through these access points.

The following figure illustrates the components required for basic system failover and session failover in a single-site deployment.

Figure 17–1 Basic OpenSSO Enterprise High Availability Deployment Architecture

See previous section for text description.

Understanding a Typical High-Availability Transaction

In any transaction, OpenSSO Enterprise must determine three things:

  1. Is a valid user session token present?

  2. Is the user authenticated?

  3. Is the user authorized?

At any time during the transaction, if the OpenSSO Enterprise server or the OpenSSO Enterprise Policy Agent is unable to access the information required to determine these three things, then system failover or session failover may occur.

Figure 17–2 illustrates the first part of a typical high-availability process flow. In the figure, a user attempts to access a protected resource and is successfully authenticated. No system failover or session failover occurs in this first transaction.

The second part of the process flow describes how sessions are handled during subsequent requests by the same user. This second part of the process flow is influenced by two factors:

The following figure illustrates a user's first request in a typical high-availability transaction. Process flows for subsequent requests by the same user are presented in detail, and discussed along with their respective configuration examples, in the following sections.

Figure 17–2 Process Flow for High Availability (part 1)

Text-based. Needs no further explanation.

Understanding High Availability Configuration Examples

Businesses use various combinations of single or multiple OpenSSO Enterprise servers and load balancers, in single or multiple sites, to achieve system failover and session failover. The following examples illustrate typical high-availability configurations and their respective process flows:

The following table summarizes the OpenSSO Enterprise features associated with each configuration example.

Figure 17–3 Comparison of High Availability Configuration Examples

Text-based. Needs no further explanation.

Single OpenSSO Enterprise Server Load Balancer in Single Site, No Session Failover

This is the most basic high-availability configuration. The single OpenSSO Enterprise server load balancer increases transaction throughput. When one OpenSSO Enterprise server is inaccessible, requests are automatically routed to other servers. However, the single load balancer can be a single point of failure. When this load balancer is inaccessible, no OpenSSO Enterprise services or session data are available to the Policy Agents.

Figure 17–4 Single OpenSSO Enterprise Server Load Balancer in a Single Site Configuration

See following figure for text-based description.

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–5 Process Flow for Single OpenSSO Enterprise Server Load Balancer in a Single Site, No Session Failover

Text-based. No further explanation is necessary.

Multiple OpenSSO Enterprise Server Load Balancers in a Single Site, No Session Failover

The following figure illustrates a deployment with multiple OpenSSO Enterprise server load balancers in front of redundant OpenSSO Enterprise servers. In this example, both OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. The load balancers are also configured as login URL's in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store. Regardless of where the configuration is hosted, when one OpenSSO Enterprise server load balancer is inaccessible, all requests are automatically routed to the other load balancer.

Figure 17–6 Multiple OpenSSO Enterprise Server Load Balancers in a Single Site, No Session Failover

See following figure for text-based description.

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–7 Process Flow for Multiple OpenSSO Enterprise Server Load Balancers in a Single Site, No Session Failover

Text-based. Needs no further explanation.

Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites, No Session Failover

This deployment is useful if you want to logically group redundant OpenSSO Enterprise servers in a LAN or WAN environment. For example, you can configure redundant OpenSSO Enterprise servers to work as a single unit under a single site identifier. The redundant OpenSSO Enterprise servers provide one level of system failover. When you deploy multiple sites this way, the OpenSSO Enterprise servers in one site are logically isolated from the OpenSSO Enterprise servers in other sites.

In this example, both OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. The load balancers are also configured as login URL's in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store. When system failure occurs at the load balancer, one site fails over to another site.

The following figure illustrates minimum components required for a multiple-site configuration.

Figure 17–8 Multiple OpenSSO Enterprise Load Balancers in Multiple Sites, No Session Failover

See following figure for text-based description.

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–9 Process Flow for Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites, No Session Failover

Text-based. No further explanation necessary.

Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover

When you configure OpenSSO Enterprise for session failover, the user's authenticated session state is stored in the Berkeley Database in the event of a single hardware or software failure. In session failover deployments, you configure the OpenSSO Enterprise servers to communicate with Message Queue brokers which manage session state persistence in the Berkeley Database. This configuration enables the users session to fail over to a backup OpenSSO Enterprise server without losing any session state information. The user does not have to login again. The backup OpenSSO Enterprise server is determined among the available servers in the configuration list by an internal algorithm.

This type of deployment ensures the state availability even if one of the OpenSSO Enterprise servers is inaccessible due to scheduled maintenance, hardware failure, or software failure. However, the single load balancer can be a single point of failure. When this load balancer is inaccessible, no OpenSSO Enterprise services or session data are available to the Policy Agents.

The following figure illustrates the components in a basic OpenSSO Enterprise deployment using session failover.

Figure 17–10 Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover

See following figure for text-based description.

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–11 Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover

Text-based. No further explanation necessary.

Multiple OpenSSO Enterprise Server Load Balancers in a Single Site with Session Failover

This deployment is very similar to Single OpenSSO Enterprise Server Load Balancer in a Single Site with Session Failover , but with two important differences. In this deployment multiple OpenSSO Enterprise server load balancers exist. Additionally, the OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. This deployment provides load balancer failover to ensure continuous service when system failure occurs. When system failure occurs at the load balancer, one site fails over to another site.

The load balancers are also configured as login URL's in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store.

The following figure illustrates a deployment with multiple OpenSSO Enterprise server load balancers with session failover.

Figure 17–12 Multiple OpenSSO Enterprise Server Load Balancers in a Single Site with Session Failover

See following figure for text-based description.

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–13 Multiple OpenSSO Enterprise Server Load Balancers in a Single Site with Session Failover

Text-based. No further explanation necessary.

Multiple OpenSSO Enterprise Server Load Balancers in Multiple Sites with Session Failover

This deployment is useful if you want to logically group redundant OpenSSO Enterprise servers in a LAN or WAN environment. For example, you can configure redundant OpenSSO Enterprise servers to work as a single unit under a single site identifier. Redundant OpenSSO Enterprise servers provide one level of system failover. When you deploy multiple sites this way, the OpenSSO Enterprise servers in one site are logically isolated from the OpenSSO Enterprise servers in other sites.

For an added level of system failover, you can configure one site to fail over to another site. In this example, both OpenSSO Enterprise server load balancers are specified in each Policy Agent bootstrap configuration. The load balancers are also configured as login URL's in each Policy Agent configuration. Policy Agent configuration can reside on the same host as the Policy Agent, or can reside in the OpenSSO Enterprise embedded configuration data store. When system failure occurs at the load balancer, one site fails over to another site.

This deployment ensures both system failover and session failover if one of the OpenSSO Enterprise load balancers or one of the OpenSSO Enterprise servers is inaccessible for any reason. The following issues are addressed in this deployment:

The following figure illustrates a complex high availability deployment using both system failover and session failover in multiple sites.

Figure 17–14 Multiple OpenSSO Enterprise Server Load Balancer in Multiple Sites with Session Failover

See following figure for text-based description.

The following figure illustrates the session handling part of the process flow. See Figure 17–2 for a detailed illustration of steps 1 through 13.

Figure 17–15 Multiple OpenSSO Enterprise Server Load Balancers with Session Failover in Each Site

Text-based. No further explanation necessary.

Considering Assumptions and Dependencies

As you plan your deployment, consider the following assumptions to determine if your environment is appropriate for using system failover and session failover.

Assumptions

Using Java Message Queue Broker and Berkeley Database for Session Failover

If you configure a single instance of Java Message Queue Broker and as single instance of Berkeley Database to provide session failover for your deployment, no session data replication is possible. If either Message Queue Broker or Berkeley Database fails, then all the stored user sessions are lost. The OpenSSO Enterprise server would operate as if session failover was not configured.

A good practice is to use two instances of Message Queue Broker configured with two instances of Berkeley Database. User sessions are replicated among the Berkeley Database instances. This dual-host configuration is for failover purposes and not for load sharing. Adding more Message Queue Broker instances and Berkeley Database instances does not increase processing capacity. Adding more instances actually reduces the overall session failover processing capacity due to the extra data replication overhead.

Configuring OpenSSO Enterprise for High Availability

A good source of high-availability configuration information is the manual Deployment Example: Single Sign-On, Load Balancing and Failover Using Sun OpenSSO Enterprise 8.0. In particular, the following chapters provide examples with detailed step-by-step instructions for configuring load balancers, OpenSSO Enterprise sites, and OpenSSO Enterprise for system failover and session failover.

The following are additional resources for configuring system failover and session failover:

Evaluating Benefits and Trade-Offs

Benefits

Trade-Offs