Oracle Waveset 8.1.1 Overview

Chapter 3 Clustering and High Availability

This chapter provides guidance on how to implement a high availability / fault tolerant (HA/FT) Waveset environment.


Note –

Consult the documentation from your web server, application server, and database vendors for best practices on making each of these technologies highly available. This guide is not a substitute for those vendor-specific recommendations.


Assessing the Need for Availability

This section describes how to assess the amount of availability that your specific deployment requires.

Assessing the Cost of Downtime

Because Waveset is not in the transaction path between general users and the systems and applications that they already have access to, Waveset downtime is not the disaster that you might imagine. If Waveset is unavailable, end users are still able to access resources through their provisioned accounts.

The main cost of Waveset downtime is lost productivity. If Waveset is down, end users cannot use Waveset to gain access to systems that they are either locked out of or not provisioned to.

To calculate the cost of downtime, the first number that is needed is the average cost of lost productivity due to end users being unable to access computing resources within the enterprise. In our assessment, this number is called productivity per person hour.

The other major number that needs to be determined is the percentage of end users within the user population who need to use Waveset at any given time. This population usually includes new hires who need to be provisioned, and end users who have forgotten their password if password management is a part of the deployment.

Consider the following hypothetical situation:

Total number of employees: 20,000
Number of password resets in a day: 130
Number of new hires in a day: 30
Number of hours in a work day: 8

For this particular situation, you can calculate the number of people who need Waveset in any given hour: (130 + 30) / 8 = 20.

Using these numbers, you can then estimate the cost of a Waveset outage:

Productivity per person hour: $100
Loss in productivity: 0.5 (50% decrease in productivity due to inability to access system)
Number of people affected: 20
Subtotal: $1,000
Duration of outage: 2 hours
Total immediate loss: $2,000

This example shows that even though the number of users being managed by Waveset is high, the number of users needing Waveset to gain access to systems at any given time is usually low.
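The arithmetic behind this example can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the 8-hour workday is an assumption, chosen because it is the value that turns 130 password resets plus 30 new hires per day into the 20 affected people per hour shown above.

```python
# Sketch of the downtime-cost arithmetic from the hypothetical example.
# hours_per_workday = 8 is an assumption: it is the value that makes
# (130 + 30) / hours equal the 20 affected people in the example.


def people_affected_per_hour(password_resets_per_day: int,
                             new_hires_per_day: int,
                             hours_per_workday: float) -> float:
    """Average number of users who need Waveset during any one hour."""
    return (password_resets_per_day + new_hires_per_day) / hours_per_workday


def outage_cost(productivity_per_person_hour: float,
                productivity_loss: float,
                people_affected: float,
                outage_hours: float) -> float:
    """Immediate productivity loss caused by an outage of the given length."""
    # Subtotal: loss for each hour of outage.
    hourly_loss = productivity_per_person_hour * productivity_loss * people_affected
    return hourly_loss * outage_hours


affected = people_affected_per_hour(130, 30, hours_per_workday=8)
total = outage_cost(100, 0.5, affected, outage_hours=2)
print(f"People affected per hour: {affected:.0f}")
print(f"Total immediate loss: ${total:,.0f}")
```

Substituting your own help-desk and hiring volumes shows quickly how sensitive the total is to the number of users who depend on Waveset in a given hour.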

Another point to consider is that the time it takes to bring a system like Waveset back online is usually less than the time it takes to execute the manual provisioning processes that Waveset is automating. So while Waveset downtime exacts a cost, it is usually less than the cost of using manual processes to give users access to resources.

Understanding the Causes of Downtime

When planning a highly available Waveset deployment, it is worthwhile to consider the causes of downtime.

These causes include the following:

Calculating Return on Investment

Waveset automates processes and reduces lost productivity. The return on investing in a highly-available Waveset architecture is realized by minimizing downtime and averting lost productivity.

You can use the cost of downtime to determine the amount of availability that is ultimately needed for Waveset. In general, a moderate investment in making Waveset highly-available is worthwhile.
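One way to turn the cost of downtime into an availability target is to estimate the expected yearly loss at several availability levels. The following sketch is a hypothetical planning aid, not part of Waveset. It assumes, purely for illustration, that outages matter only during business hours, that a year has 250 eight-hour workdays, and that every outage hour costs the $1,000 subtotal from the earlier example. Substitute your own figures.

```python
# Hypothetical planning helper, not part of Waveset. Assumptions: outages are
# costly only during business hours, a year has 250 eight-hour workdays, and
# every outage hour costs the $1,000 subtotal from the earlier example.

BUSINESS_HOURS_PER_YEAR = 250 * 8  # 2,000 business hours (assumed)
LOSS_PER_OUTAGE_HOUR = 1_000.0     # dollars per outage hour (assumed constant)


def expected_downtime_hours(availability: float,
                            business_hours: float = BUSINESS_HOURS_PER_YEAR) -> float:
    """Expected business hours of downtime per year at a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * business_hours


def expected_annual_loss(availability: float,
                         loss_per_hour: float = LOSS_PER_OUTAGE_HOUR) -> float:
    """Expected yearly productivity loss, in dollars, at a given availability."""
    return expected_downtime_hours(availability) * loss_per_hour


for target in (0.99, 0.995, 0.999):
    print(f"{target:.1%} available: about {expected_downtime_hours(target):.0f} "
          f"hours down, ${expected_annual_loss(target):,.0f} lost per year")
```

Under these assumptions, moving from 99% to 99.9% availability avoids roughly $18,000 of lost productivity per year, a figure you can weigh against the hardware, software, and staffing costs of the more available design.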

When calculating the cost of your investment, remember that purchasing HA/FT hardware and software is only one part of implementing an available solution. Having a knowledgeable staff to keep it up and running is another cost.

Understanding the Waveset High-Availability Feature Set

Waveset is designed to leverage HA infrastructure if it is available. For example, Waveset does not require an application server cluster to achieve high availability, but it can utilize a cluster if it exists.

The following diagram shows the major Waveset components deployed in a non-redundant architecture. The sections that follow describe how the Waveset repository, application server, and gateway can be made highly available.

Figure 3–1 Waveset Standard System Architecture

Logical diagram representing the standard Waveset system architecture.


Note –

See Understanding the System Separation and Physical Proximity Guidelines for information on which components should be physically sited near one another in order to minimize performance issues that could arise due to latency and network congestion.


Making the Repository Highly Available

Waveset stores all of its provisioning and state information in the Waveset repository.

The availability of the database instance storing the Waveset repository is the most critical piece to achieving a highly available Waveset deployment. The repository is the representation of the entire Waveset installation and the data within it must be protected as with other important database applications. At minimum, regular backups must be performed.


Note –

Do not host the Waveset repository on a virtual platform such as a VMware virtual machine because performance (transactions per second) will be hindered significantly.


There can only be one image of the repository. It is not possible to have two separate databases for Waveset and attempt to synchronize them nightly. Oracle recommends using your database's clustering or mirroring capabilities to provide fault tolerance.

Making the Application Server Highly Available

Waveset can run within an application server cluster and take advantage of the added availability and load balancing that a cluster provides. Waveset does not use any J2EE features that require clustering, however.

Waveset uses the HTTP Session object that is available through the Servlet API. This session object tracks a user's visit as the user logs in and performs actions. In a cluster, you can optionally have multiple nodes handle a user's requests during a given session. This is usually not recommended, however, and most installations are configured to send all of a user's requests for a given session to the same server.

It is possible to add additional availability and capacity to the application server running Waveset even if you do not set up a cluster. This is achieved by installing multiple application servers with Waveset, connecting them to the same repository, and putting a load balancer with session affinity in front of all the application servers.
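The routing behavior that such a load balancer provides can be sketched as follows. This is a hypothetical, in-process model for illustration, not a load balancer product or a Waveset API. Each session is pinned to the server that first handled it and is moved to another healthy server only when its server is marked down. The server names are placeholders.

```python
import itertools
from typing import Dict, Iterable


class StickyBalancer:
    """Hypothetical model of a load balancer with session affinity."""

    def __init__(self, nodes: Iterable[str]):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self.affinity: Dict[str, str] = {}  # session ID -> owning server
        self._round_robin = itertools.cycle(self.nodes)

    def mark_down(self, node: str) -> None:
        self.healthy.discard(node)

    def mark_up(self, node: str) -> None:
        self.healthy.add(node)

    def route(self, session_id: str) -> str:
        node = self.affinity.get(session_id)
        if node in self.healthy:
            # Session affinity: keep every request of a session on one server.
            return node
        # New session, or its server has failed: pick the next healthy server.
        for _ in range(len(self.nodes)):
            candidate = next(self._round_robin)
            if candidate in self.healthy:
                self.affinity[session_id] = candidate
                return candidate
        raise RuntimeError("no healthy application servers")


lb = StickyBalancer(["app1", "app2"])
print(lb.route("session-1"), lb.route("session-1"), lb.route("session-2"))
lb.mark_down("app1")
print(lb.route("session-1"))  # moved to the remaining healthy server
```

Because every server is connected to the same repository, any healthy server can accept a failed server's sessions. Without session persistence, however, the user whose session moves must log in again, as described in Scenario 1 later in this chapter.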


Note –

For more information on session affinity, see Frequently Asked Questions Regarding Session Affinity and Session Persistence.


Waveset runs certain tasks, such as scheduled reconciliation tasks, in the background. These tasks are stored in the database and can be picked up by any Waveset server to run. Waveset uses the database to ensure that these tasks are always run to completion, even if it has to fail over to another node.
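This chapter does not describe how Waveset records these tasks internally. The following sketch shows one common technique for letting any node pick up a stored task, under the assumption of a hypothetical task table that is not Waveset's actual repository schema. Each node polls the shared database for due tasks and claims one with a conditional UPDATE, so exactly one node runs each task even if the node on which it was originally scheduled has failed. SQLite stands in for the repository database.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; not Waveset's actual repository schema.
    conn.execute(
        """CREATE TABLE task (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               run_at REAL NOT NULL,                    -- scheduled start time
               status TEXT NOT NULL DEFAULT 'pending',  -- pending|running|done
               owner TEXT                               -- node that claimed it
           )"""
    )
    conn.commit()


def claim_due_task(conn: sqlite3.Connection, node: str, now: float) -> Optional[int]:
    """Claim the oldest due, unclaimed task for `node`; return its id or None."""
    row = conn.execute(
        "SELECT id FROM task WHERE status = 'pending' AND run_at <= ? "
        "ORDER BY run_at, id LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    # Compare-and-set: matches only if the task is still pending, so if two
    # nodes race for the same task, exactly one UPDATE changes a row.
    cur = conn.execute(
        "UPDATE task SET status = 'running', owner = ? "
        "WHERE id = ? AND status = 'pending'",
        (node, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


def complete_task(conn: sqlite3.Connection, task_id: int, node: str) -> None:
    """Mark a task done, but only if `node` is the one that claimed it."""
    conn.execute(
        "UPDATE task SET status = 'done' WHERE id = ? AND owner = ?",
        (task_id, node),
    )
    conn.commit()
```

The design choice here is that no node "owns" a task until the moment it claims it, so the failure of the node that created or scheduled the task does not prevent any other live node from running it.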

Configuring Active Sync Clustering on Application Server Nodes

The sources.hosts setting in the Waveset.properties file controls which hosts in a multi-instance environment are used to execute Active Sync requests. This setting provides a list of hosts that source adapters can run on. Setting it to localhost or null allows source adapters to execute on any host in the web farm, which is the default behavior. Listing one or more hosts restricts execution to those hosts. If you have inbound updates from another system that go to a particular host, use the sources.hosts setting to record the host names.

In addition, you can define a property named sources.resourceName.hosts, which controls where the resource's Active Sync task will run. Replace resourceName with the name of the resource object you wish to specify.
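The host-selection rules described above can be summarized in a small helper. This is a hypothetical sketch of the documented behavior, not code from Waveset, and it makes three assumptions: that the property value is a comma-separated list of host names, that a resource-specific sources.resourceName.hosts value, when present, takes precedence over the global sources.hosts value, and that host names are compared case-insensitively.

```python
from typing import Dict, List, Optional


def _host_list(value: Optional[str]) -> Optional[List[str]]:
    """Parse a host list; None means 'no restriction' (any host may run)."""
    if value is None:
        return None
    # Assumption: comma-separated; host names compared case-insensitively.
    hosts = [h.strip().lower() for h in value.split(",") if h.strip()]
    # Per the text above, localhost or an empty (null) value means any host.
    if not hosts or hosts == ["localhost"]:
        return None
    return hosts


def may_run_active_sync(props: Dict[str, str], resource_name: str, host: str) -> bool:
    """Decide whether `host` may run the Active Sync task for `resource_name`."""
    # Assumption: the resource-specific property overrides the global one.
    specific = props.get(f"sources.{resource_name}.hosts")
    value = specific if specific is not None else props.get("sources.hosts")
    allowed = _host_list(value)
    return allowed is None or host.lower() in allowed


# Hypothetical settings: HR_Feed is pinned to idm-node3, everything else
# may run on idm-node1 or idm-node2.
settings = {"sources.hosts": "idm-node1, idm-node2",
            "sources.HR_Feed.hosts": "idm-node3"}
print(may_run_active_sync(settings, "HR_Feed", "idm-node3"))
print(may_run_active_sync(settings, "LDAP_Sync", "idm-node3"))
```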

Making the Gateway Highly Available

Waveset requires a lightweight gateway to manage resources that cannot be accessed directly from the server. These include systems that require platform-specific, client-side API calls. For example, if Waveset is running on a UNIX-based application server, it cannot make NTLM or ADSI calls to managed NT or Active Directory domains. Because Waveset depends on the gateway to manage these resources, it is important to make the Waveset Gateway highly available.

To prevent the Gateway from becoming a single point of failure, Oracle recommends having multiple machines running a Gateway instance. A network routing device or load balancer should be configured to provide failover if the main Gateway instance fails. The failover device should be configured to use sticky sessions. A failover device without sticky sessions is not a supported configuration and will cause certain Waveset functions to fail.

All Windows domains managed by a Gateway must be part of the same forest. Managing domains across forest boundaries is unsupported. If you have multiple forests, install at least one Gateway in each forest.

Win32 monitoring tools can be configured to watch the gateway.exe process on the Win32 host. In the event that gateway.exe fails, the process can be automatically restarted.
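As an illustration of the restart behavior such a tool provides, the following sketch runs a command and starts it again whenever it exits, up to a fixed number of restarts. The function name, command, restart limit, and delay are hypothetical. In practice, a Windows monitoring product or, if the Gateway is installed as a Windows service, the service's recovery settings would typically fill this role for gateway.exe.

```python
import subprocess
import sys
import time
from typing import List


def supervise(command: List[str], max_restarts: int = 5,
              delay_seconds: float = 5.0) -> int:
    """Run `command`, restarting it each time it exits.

    Stops after `max_restarts` restarts and returns the total number of
    times the process was started.
    """
    starts = 0
    while True:
        starts += 1
        proc = subprocess.Popen(command)
        exit_code = proc.wait()  # block until the watched process dies
        print(f"process exited with code {exit_code}", file=sys.stderr)
        if starts > max_restarts:
            return starts
        time.sleep(delay_seconds)  # brief pause before restarting


# Hypothetical usage: a short-lived stand-in process instead of gateway.exe.
supervise([sys.executable, "-c", "raise SystemExit(1)"],
          max_restarts=2, delay_seconds=0.0)
```

Capping the number of restarts keeps a Gateway that fails immediately on every start from restarting forever; a real monitor would also raise an alert at that point.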

Understanding the Recommended HA Architecture

The following diagram shows the Waveset architecture Oracle recommends if there is no existing web application infrastructure.

Figure 3–2 Waveset High-Availability Architecture

Logical diagram representing the recommended Waveset high-availability architecture.

In an actual deployment, existing redundant application server infrastructure should be used to the extent possible. The value of this architecture is that it uses only load balancers to achieve redundancy at the application server. Load balancers with session affinity detect failed application server instances and fail over to active instances. Load balancers also provide horizontal scaling in the web environment by spreading user requests across a cluster of servers.

Though this is a straightforward architecture, its uptime characteristics are comparable to those of more complex deployments. Because of its simplicity, there are fewer pieces of software to maintain and monitor, and fewer pieces that could fail. Because human error is the number one cause of downtime, a relatively simple solution may achieve better uptime than a more complex one. There is no universally right answer. The point is to understand all of the causes of downtime and choose the architecture that results in the best availability for the investment.


Note –

It would be impossible to describe all of the different HA architectures that are possible with a web application like Waveset.

Because Waveset can be deployed in a variety of possible combinations, it may be most economical to identify existing infrastructure and utilize as much of it as possible when deploying Waveset.


Understanding the Recommended Service Provider HA Architecture

If Waveset Service Provider functionality is to be utilized, Oracle recommends adding a web tier between the user tier and the application tier. The web tier consists of one or more web servers that reside in a demilitarized zone (DMZ) that is separated by a firewall from the application tier.

An LDAP repository is required if Service Provider functionality is to be utilized. In addition, the standard Waveset repository is always required for Service Provider in order to maintain configuration information, forms, rules, and other objects.

Figure 3–3 Waveset Service Provider High-Availability Architecture

Logical diagram representing the recommended Waveset high-availability architecture for a Service Provider implementation.

Understanding Failure Scenarios

This section lists eight failure scenarios and compares two deployments: one with session persistence and one without.

Scenario 1: The No-Workflow Scenario

Scenario Description

The end user or the administrator is editing a form that is not a part of a workflow. The instance on which the user has an established session goes down.

Without Session Persistence

User experience: A nontransparent failover. Upon submitting the form, the user is returned to the login page.

Recovery steps: The user reenters his or her user name and password. Waveset then processes the form and presents the results as the next page immediately following the login.

With Session Persistence

User experience: The user's form is submitted and the results are returned without the user being logged off and required to log in again.

Recovery steps: No user action is needed.

Other Scenario Examples

Scenario 2: The Workflow-in-Progress Scenario

Scenario Description

The end user or the administrator has submitted a form that triggered a workflow. The workflow generally executes on the same instance that holds the user's session; the exception is some scheduled tasks, which can execute on a different instance. This instance goes down while the workflow is in progress.

Without Session Persistence

User experience: A nontransparent failover. Submitting the form returns the user to the login page. The workflow task instance that was executing should be in the repository, but because the node that was executing it is down, the workflow status will be “terminated.”

Recovery steps: The workflow has to be submitted again. This has to be done by going back to the same form and reentering the same information that was used to trigger the workflow before the node failed.

Submitting the same request data again may work in some cases, but not all. If the workflow provisions to more than one resource and some of those resources were provisioned before the failure, the resubmitted workflow must account for the resources that were already provisioned. Note that the terminated workflow remains in the repository until the resultLimit expires on the TaskInstance object.
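One way to make such a resubmission safe is to check each target resource before provisioning and skip any resource on which the account already exists. The following sketch is hypothetical: resubmit, account_exists, and provision stand in for whatever checks and provisioning steps your workflow uses, and are not Waveset APIs.

```python
from typing import Callable, Iterable, List


def resubmit(user: str,
             resources: Iterable[str],
             account_exists: Callable[[str, str], bool],
             provision: Callable[[str, str], None]) -> List[str]:
    """Provision `user` only to resources that still lack an account.

    Returns the resources provisioned by this call, so running it again
    after a partial failure does not create duplicate accounts.
    """
    newly_provisioned = []
    for resource in resources:
        if account_exists(user, resource):
            continue  # provisioned before the node failed; leave it alone
        provision(user, resource)
        newly_provisioned.append(resource)
    return newly_provisioned


# Hypothetical state: the LDAP account was created before the node failed.
accounts = {("jdoe", "LDAP")}
done = resubmit("jdoe", ["LDAP", "AD", "HR DB"],
                account_exists=lambda u, r: (u, r) in accounts,
                provision=lambda u, r: accounts.add((u, r)))
print(done)
```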

With Session Persistence

User experience: A nontransparent failover. The user is not logged out because the session is persisted and reestablished on the new instance. The form submission, however, will probably result in an error because the workflow has been terminated. This failover is nontransparent because recovery actions are needed.

Recovery steps: Same as in the Without Session Persistence mode. The user has to resubmit the request that triggered the previous workflow, with the same or modified parameters.

Other Scenario Examples

Scenario 3: The Workflow-in-Abeyance-or-Sleep Scenario

Scenario Description

This scenario covers situations where the workflow has started, but is waiting for a manual action by an approver.

Without Session Persistence

User experience: Transparent failover with respect to the approver provided that the approver has not yet logged in. After the node failure, when the approver does log in, the approver will still see the approval request in his or her inbox, even though the request was triggered from a node that is no longer up.

Recovery steps: No user action is needed.

With Session Persistence

User experience: Same as in the Without Session Persistence mode.

Recovery steps: Same as in the Without Session Persistence mode.

Other Scenario Examples

Scenario 4: The Work-Item-Edit Scenario

Scenario Description

This scenario includes cases where a user is editing a work item and the node upon which the user has a session goes down before the work item can be submitted.

Without Session Persistence

User experience: A nontransparent failover. When the work item edit form is submitted, the user is logged off and returned to the login page.

Recovery steps: After the user resubmits login credentials, the user's work item is marked completed and the workflow can resume from that point. The workflow should be picked up for execution by the new node, starting from the point at which the user's manual action was marked completed.

With Session Persistence

User experience: When the work item edit form is submitted, the user sees the effect of the submission, for example either the next form in the custom workflow (if there is one) or a success message.

Recovery steps: No user action is needed.

Other Scenario Examples

Scenario 5: The Scheduled-Tasks-in-Progress Scenario

Scenario Description

These scenarios cover cases where a node failure occurs while a reconciliation is in progress or while a report is executing.

Without Session Persistence

User experience: The scheduled task terminates while in progress.

Recovery steps: The scheduled task that was in progress has to be restarted. The task will start from the beginning. (The task will not restart from the point of failure.) This is akin to creating and starting a new task.

With Session Persistence

User experience: Same as in the Without Session Persistence mode.

Recovery steps: Same as in the Without Session Persistence mode.

Other Scenario Examples

Scenario 6: The Scheduled-Task-in-Abeyance Scenario

Scenario Description

These scenarios cover cases where a user's custom workflow has scheduled a task for execution at a later date on a specific node. Before the scheduled date is reached, the node that the task was scheduled on fails.

Without Session Persistence

User experience: The failover is transparent. No recovery actions are required to ensure that the task executes at its scheduled time.

Recovery steps: The scheduled task is taken up by any live node when the scheduled execution time arrives.

With Session Persistence

User experience: Same as in the Without Session Persistence mode.

Recovery steps: Same as in the Without Session Persistence mode.

Other Scenario Examples

Scenario 7: The Web-Services-Workflow-Request-Not-Yet-Received-by-Waveset Scenario

Scenario Description

These scenarios cover cases where the Waveset GUI is not used to launch provisioning. Instead, the user interface is provided by an application that calls Waveset internally through SPML or another custom web service interface. Here, the calling application manages the session of the user working through its UI. Within Waveset, all requests are launched as the “soapadmin” subject.

In such a use case, this failure scenario covers the case where the targeted node fails before it has received the request sent to the Waveset endpoint.

Without Session Persistence

User experience: A transparent failover. The SOAP administrator's credentials are passed in for each SOAP request, either over the wire or within Waveset through a Waveset.properties setting. As long as the node which was to receive this SOAP request has not received the request before going down, the failover is transparent with or without session persistence.

Recovery steps: No action needed. The SOAP request is sent to a live node that executes it.

With Session Persistence

User experience: Same as in the Without Session Persistence mode.

Recovery steps: Same as in the Without Session Persistence mode.

Scenario 8: The Web-Services-Workflow-Request-In-Progress-by-Waveset Scenario

Scenario Description

This scenario is similar to scenario seven. The only difference is that the workflow is in progress when the node fails, or the node has received the SOAP request when the node fails.

Without Session Persistence

User experience: This scenario is similar to scenario two (workflow in progress). The workflow is marked terminated and the user sees an error as a result of the SOAP request.

Recovery steps: The user has to resubmit the form with similar or modified parameters (based on where the failure occurs in the workflow) using the user interface in the third-party application.

With Session Persistence

User experience: Same as in the Without Session Persistence mode.

Recovery steps: Same as in the Without Session Persistence mode.

Frequently Asked Questions Regarding Session Affinity and Session Persistence

Should you have session affinity enabled when scaling application servers horizontally?

Yes.

Should you have session persistence when scaling application servers horizontally?

Unless your business requirements place a very high emphasis on transparent failover in the limited situations where session persistence makes a difference, Oracle recommends against using session persistence. Session persistence has its own performance overhead, so leave it turned off unless transparent failover is absolutely mandated by your business requirements.

If you study the failure scenarios documented in Understanding Failure Scenarios, in six of the eight scenarios there is no difference in the end-user experience or required recovery actions, regardless of whether session persistence is enabled. Only in scenarios one and four does enabling session persistence make any difference.

In these two scenarios, session persistence can provide some failover transparency, but session persistence hurts performance. Based on the size of the session objects, the repository being used for session persistence, and the optimization of your specific application server's session management code, the performance overhead can range from 10 percent to 20 percent or even higher.

Should you have multiple application server instances in a cluster when scaling horizontally?

Clustering the application server instances is not required unless you want session persistence. Failover without session persistence can be achieved even if the application server nodes are not in a cluster.