Sun Java System Application Server 9.1 High Availability Administration Guide

Chapter 1 High Availability in Application Server

This chapter describes the high availability features in the Sun Java System Application Server that are available with the cluster profile and the enterprise profile.

Note –

The HADB software is supplied with the Application Server standalone distribution of Sun Java System Application Server. For information about available distributions of Sun Java System Application Server, see Distribution Types and Their Components in Sun Java System Application Server 9.1 Installation Guide. HADB features are available only in the enterprise profile. For information about profiles, see Usage Profiles in Sun Java System Application Server 9.1 Administration Guide.

This chapter contains the following topics.

Overview of High Availability

High availability applications and services provide their functionality continuously, despite hardware and software failures. Such applications are sometimes described as providing five nines of availability, because they are intended to be available 99.999% of the time.
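To put the 99.999% target in perspective, the following minimal Java sketch converts an availability percentage into the downtime it permits per year. The DowntimeBudget class and its method are hypothetical (they are not part of Application Server), and the calculation assumes a 365-day year.

```java
/**
 * Hypothetical helper: converts an availability target into the maximum
 * downtime it allows per year (365-day year, no leap day).
 */
final class DowntimeBudget {
    private static final double MINUTES_PER_YEAR = 365.0 * 24 * 60;

    private DowntimeBudget() {}

    /** availabilityPercent is, for example, 99.999 for "five nines". */
    static double allowedDowntimeMinutesPerYear(double availabilityPercent) {
        if (availabilityPercent < 0.0 || availabilityPercent > 100.0) {
            throw new IllegalArgumentException("availability must be between 0 and 100");
        }
        // Fraction of the year the service is allowed to be unavailable.
        double unavailableFraction = (100.0 - availabilityPercent) / 100.0;
        return unavailableFraction * MINUTES_PER_YEAR;
    }
}
```

Under these assumptions, five nines allows roughly 5.26 minutes of downtime per year, while 99.9% allows about 525.6 minutes (nearly 8.8 hours).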

Application Server provides the following high availability features:

High Availability Session Persistence
High Availability Java Message Service
RMI-IIOP Load Balancing and Failover

High Availability Session Persistence

Application Server provides high availability of HTTP requests and session data (both HTTP session data and stateful session bean data).

Java EE applications typically have significant amounts of session state data. A web shopping cart is the classic example of session state. An application can also cache frequently needed data in the session object. In fact, almost all applications with significant user interaction need to maintain session state. Both HTTP sessions and stateful session beans (SFSBs) have session state data.

Preserving session state across server failures can be important to end users. For high availability, Application Server provides the following types of storage for session state data:

In-memory replication on other servers in the cluster
High-availability database (HADB)

If the Application Server instance hosting the user session experiences a failure, the session state can be recovered, and the session can continue without loss of information.

For a detailed description of how to set up high availability session persistence, see Chapter 9, Configuring High Availability Session Persistence and Failover.

High Availability Java Message Service

The Java Message Service (JMS) API is a messaging standard that allows Java EE applications and components to create, send, receive, and read messages. It enables distributed communication that is loosely coupled, reliable, and asynchronous. The Sun Java System Message Queue (MQ), which implements JMS, is tightly integrated with Application Server, enabling you to create components that rely on JMS, such as message-driven beans (MDBs).

JMS is made highly available through connection pooling and failover, and through MQ clustering. For more information, see Chapter 10, Java Message Service Load Balancing and Failover.

Connection Pooling and Failover

Application Server supports JMS connection pooling and failover, and pools JMS connections automatically. By default, Application Server selects its primary MQ broker randomly from the specified host list. When failover occurs, MQ transparently transfers the load to another broker and maintains JMS semantics.
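The following Java sketch models the behavior just described. The broker list is shuffled so that the primary broker is chosen at random, and the remaining brokers serve as failover candidates. It is a conceptual illustration only, not the Message Queue client implementation or its configuration API. The BrokerSelector class, the canConnect predicate (a stand-in for a real connection attempt), and the host names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Predicate;

/**
 * Hypothetical model of random primary-broker selection with failover.
 * NOT the Message Queue client implementation; it only illustrates the idea.
 */
final class BrokerSelector {
    private final List<String> brokerHosts;
    private final Random random;

    BrokerSelector(List<String> brokerHosts, Random random) {
        this.brokerHosts = List.copyOf(brokerHosts);
        this.random = random;
    }

    /**
     * Shuffles the host list so the primary broker is chosen at random, then
     * tries each broker in that order until canConnect accepts one.
     */
    Optional<String> connect(Predicate<String> canConnect) {
        List<String> order = new ArrayList<>(brokerHosts);
        Collections.shuffle(order, random);
        for (String host : order) {
            if (canConnect.test(host)) {
                return Optional.of(host); // primary, or the first working fallback
            }
        }
        return Optional.empty(); // every broker in the list is unreachable
    }
}
```

Passing a shared Random lets tests make the order reproducible; in production you would normally use a fresh Random so that different clients pick different primaries.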

For more information about JMS connection pooling and failover, see Connection Pooling and Failover.

MQ Clustering

MQ Enterprise Edition supports multiple interconnected broker instances known as a broker cluster. With broker clusters, client connections are distributed across all the brokers in the cluster. Clustering provides horizontal scalability and improves availability.

For more information about MQ clustering, see Using MQ Clusters with Application Server.

RMI-IIOP Load Balancing and Failover

With RMI-IIOP load balancing, IIOP client requests are distributed to different server instances or name servers, which spreads the load evenly across the cluster, providing scalability. IIOP load balancing combined with EJB clustering and availability also provides EJB failover.

When a client performs a JNDI lookup for an object, the Naming Service binds the request to a particular server instance. From then on, all lookup requests made from that client are sent to the same server instance, so all EJBHome objects are hosted on the same target server, and any bean references obtained afterward are also created on that server. Because each client randomizes its list of target servers when performing JNDI lookups, different clients are bound to different instances, which effectively balances the load. If the target server instance goes down, the lookup or EJB method invocation fails over to another server instance.

IIOP load balancing and failover happen transparently. No special steps are needed during application deployment. If the Application Server instance on which the application client is deployed participates in a cluster, Application Server automatically finds all currently active IIOP endpoints in the cluster. However, a client should have at least two endpoints specified for bootstrapping purposes, in case one of the endpoints has failed.
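The client-side behavior described above can be modeled with the following Java sketch. It assumes a client that randomizes its bootstrap endpoint list once, stays on one endpoint while that endpoint is healthy, and moves to the next endpoint in its randomized order when the current one fails. The StickyEndpointClient class and the isAlive predicate (a stand-in for a real reachability check) are hypothetical. The actual ORB logic in Application Server is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/**
 * Hypothetical model of client-side IIOP endpoint handling: randomize once,
 * stick to one endpoint, fail over only when it is down. Not thread-safe.
 */
final class StickyEndpointClient {
    private final List<String> endpoints;
    private int current = 0;

    StickyEndpointClient(List<String> bootstrapEndpoints, Random random) {
        if (bootstrapEndpoints.size() < 2) {
            // At least two endpoints are recommended for bootstrapping.
            throw new IllegalArgumentException("specify at least two endpoints");
        }
        this.endpoints = new ArrayList<>(bootstrapEndpoints);
        Collections.shuffle(this.endpoints, random); // spreads clients across instances
    }

    /** Returns the endpoint that serves this request, failing over if needed. */
    String route(Predicate<String> isAlive) {
        for (int tried = 0; tried < endpoints.size(); tried++) {
            String candidate = endpoints.get(current);
            if (isAlive.test(candidate)) {
                return candidate; // same endpoint as before while it stays healthy
            }
            current = (current + 1) % endpoints.size(); // fail over to the next one
        }
        throw new IllegalStateException("no endpoint is reachable");
    }
}
```

Because the shuffle happens once per client, different clients start on different instances, while all of one client's requests stay on the same server until it fails.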

For more information on RMI-IIOP load balancing and failover, see Chapter 11, RMI-IIOP Load Balancing and Failover.

More Information

For information about planning a high-availability deployment, including assessing hardware requirements, planning network configuration, and selecting a topology, see Sun Java System Application Server 9.1 Deployment Planning Guide. This manual also provides a high-level introduction to concepts such as:

Application server components such as node agents, domains, and clusters
IIOP load balancing in a cluster
HADB architecture
Message queue failover

For more information about developing applications that take advantage of high availability features, see Sun Java System Application Server 9.1 Developer’s Guide.

Tuning High Availability Servers and Applications

For information on how to configure and tune applications and Application Server for best performance with high availability, see Sun Java System Application Server 9.1 Performance Tuning Guide, which discusses topics such as:

Tuning persistence frequency and persistence scope
Checkpointing stateful session beans
Configuring the JDBC connection pool
Session size
Tuning HADB disk use, memory allocation, performance, and operating system configuration
Configuring the load balancer for best performance

How Application Server Provides High Availability

Application Server provides high availability through the following sub-components and features:

Load Balancer Plug-in

The load balancer plug-in accepts HTTP and HTTPS requests and forwards them to Application Server instances in a cluster. If an instance fails, becomes unavailable (due to network faults), or becomes unresponsive, the load balancer redirects requests to the remaining available instances. The load balancer can also recognize when a failed instance has recovered and redistribute the load accordingly. Application Server provides the load balancer plug-in for Sun Java System Web Server, Apache Web Server, and Microsoft Internet Information Server.
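The failover and recovery behavior of the load balancer can be illustrated with a simple round-robin dispatcher. This Java sketch is hypothetical: it does not model the plug-in's real algorithm, its health checking, or session stickiness, and markFailed and markRecovered stand in for whatever mechanism detects that an instance has failed or recovered.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical round-robin dispatcher that skips failed instances and returns
 * them to rotation once they recover. Not the plug-in's actual algorithm.
 */
final class RoundRobinBalancer {
    private final List<String> instances;
    private final Set<String> unhealthy = new HashSet<>();
    private int next = 0;

    RoundRobinBalancer(List<String> instances) {
        this.instances = new ArrayList<>(instances);
    }

    void markFailed(String instance) { unhealthy.add(instance); }

    void markRecovered(String instance) { unhealthy.remove(instance); }

    /** Picks the next healthy instance, or empty if the whole cluster is down. */
    Optional<String> pick() {
        for (int i = 0; i < instances.size(); i++) {
            String candidate = instances.get(next);
            next = (next + 1) % instances.size();
            if (!unhealthy.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```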

By distributing workload among multiple physical machines, the load balancer increases overall system throughput. It also provides higher availability through failover of HTTP requests. For HTTP session information to persist, you must configure HTTP session persistence.

For simple, stateless applications, a load-balanced cluster may be sufficient. However, for mission-critical applications with session state, use load-balanced clusters with HADB.

Server instances and clusters participating in load balancing must have a homogeneous environment. Usually that means that the server instances reference the same server configuration, can access the same physical resources, and have the same applications deployed to them. Homogeneity ensures that, before and after failures, the load balancer always distributes load evenly across the active instances in the cluster.

For information on configuring load balancing and failover, see Chapter 5, Configuring HTTP Load Balancing.

Storage for Session State Data

Storing session state data enables the session state to be recovered after a server instance in a cluster fails. Recovering the session state enables the session to continue without loss of information. Application Server provides the following types of high availability storage for HTTP session and stateful session bean data:

In-memory replication on other servers in the cluster
High availability database

In-Memory Replication on Other Servers in the Cluster

In-memory replication on other servers provides lightweight storage of session state data without the need to obtain a separate database, such as HADB. This type of replication uses memory on other servers for high availability storage of HTTP session and stateful session bean data. Clustered server instances replicate session state in a ring topology. Each backup instance stores the replicated data in memory. Replication of session state data in memory on other servers enables sessions to be distributed.
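As an illustration of a ring topology, the following Java sketch assigns each instance's replication partner as the next instance in an ordered list, wrapping around at the end. The ordering and the ReplicationRing class are assumptions made for illustration; Application Server and GMS determine the actual replication partners.

```java
import java.util.List;

/**
 * Hypothetical illustration of ring-topology partner assignment: each instance
 * replicates to its successor, and the last instance wraps around to the first.
 */
final class ReplicationRing {
    private ReplicationRing() {}

    static String backupFor(List<String> ringOrder, String instance) {
        int index = ringOrder.indexOf(instance);
        if (index < 0) {
            throw new IllegalArgumentException(instance + " is not in the ring");
        }
        if (ringOrder.size() < 2) {
            throw new IllegalStateException("replication needs at least two instances");
        }
        return ringOrder.get((index + 1) % ringOrder.size());
    }
}
```

If one instance leaves the ring, recomputing successors over the remaining instances closes the gap, so every remaining instance still has a backup partner.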

The use of in-memory replication requires the Group Management Service (GMS) to be enabled. For more information about GMS, see Group Management Service.

If server instances in a cluster are located on different machines, ensure that the following prerequisites are met:

To ensure that GMS and in-memory replication function correctly, the machines must be on the same subnet.
To ensure that in-memory replication functions correctly, the system clocks on all machines in the cluster must be synchronized as closely as possible.
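Before creating a multi-machine cluster, you might script a quick check of the same-subnet prerequisite. The following Java sketch is a hypothetical helper (not an Application Server tool) that compares two IPv4 addresses under a given prefix length, for example 24 for a 255.255.255.0 netmask. It covers only the subnet prerequisite, not clock synchronization.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical pre-flight check for the same-subnet prerequisite (IPv4 only). */
final class SubnetCheck {
    private SubnetCheck() {}

    static boolean sameSubnet(String addressA, String addressB, int prefixLength) {
        if (prefixLength < 0 || prefixLength > 32) {
            throw new IllegalArgumentException("IPv4 prefix length must be 0-32");
        }
        // Java masks int shift distances to 5 bits, so -1 << 32 would not be 0;
        // handle the /0 prefix (match everything) explicitly.
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toInt(addressA) & mask) == (toInt(addressB) & mask);
    }

    private static int toInt(String address) {
        byte[] octets;
        try {
            // A literal such as "192.168.1.10" is parsed without a DNS lookup.
            octets = InetAddress.getByName(address).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + address, e);
        }
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + address);
        }
        return ((octets[0] & 0xFF) << 24) | ((octets[1] & 0xFF) << 16)
                | ((octets[2] & 0xFF) << 8) | (octets[3] & 0xFF);
    }
}
```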

Considerations for using in-memory replication:

Very simple to set up and use. There is little or no administration cost.
More Java Virtual Machine (JVM) heap space is used. The JVM needs tuning for any production system.
Does NOT provide 99.999% availability. Therefore, in-memory replication is unsuitable for any customer that requires this level of availability.
Typically incurs less overhead than HADB does.
Cannot be used to create a highly available Message Queue cluster. This is only possible by using a highly available database.

Although Application Server does cluster its Message Queue instances, the result is only a conventional Message Queue cluster, not a highly available Message Queue cluster.

For information on Message Queue requirements, see Sun Java System Message Queue 4.1 Administration Guide.

High Availability Database

Note –

Application Server provides the High Availability Database (HADB) for high availability storage of HTTP session and stateful session bean data. HADB is designed to support up to 99.999% service and data availability with load balancing, failover, and state recovery. Generally, you must configure and manage HADB independently of Application Server.

Keeping state management responsibilities separate from Application Server has significant benefits. Application Server instances spend their cycles performing as scalable, high-performance application containers, delegating state replication to an external high availability state service. Because of this loosely coupled architecture, Application Server instances can easily be added to or deleted from a cluster. The HADB state replication service can be independently scaled for optimum availability and performance. When an Application Server instance also performs replication, the performance of Java EE applications can suffer and can be subject to longer garbage collection pauses.

Considerations for using High Availability Database:

Provides 99.999% availability if guidelines are followed (redundant power supplies, network infrastructure, and so on).
Needs additional hardware resources and careful sizing to perform efficiently.

However, the HADB nodes can run on separate, lower-cost dedicated machines rather than on large machines, which also helps with redundancy. HADB processes are largely single-threaded and do not make the best use of multi-threaded processors. As a result, more HADB nodes are required to make the best use of the resources, which complicates achieving 99.999% availability.
HADB is complex and can be difficult to administer.
Machines need to be on the same subnet. HADB nodes use UDP multicasting for heartbeats and cluster event notification.

For information on planning and setting up your application server installation for high availability with HADB, including determining hardware configuration, sizing, and topology, see Planning for Availability in Sun Java System Application Server 9.1 Deployment Planning Guide and Chapter 3, Selecting a Topology, in Sun Java System Application Server 9.1 Deployment Planning Guide.

Highly Available Clusters

A cluster is a collection of Application Server instances that work together as one logical entity. A cluster provides a runtime environment for one or more Java EE applications. A highly available cluster integrates a state replication service with clusters and a load balancer.

Using clusters provides the following advantages:

High availability, by allowing for failover protection for the server instances in a cluster. If one server instance goes down, other server instances take over the requests that the unavailable server instance was serving.
Scalability, by allowing for the addition of server instances to a cluster, thus increasing the capacity of the system. The load balancer plug-in distributes requests to the available server instances within the cluster. No disruption in service is required as an administrator adds more server instances to a cluster.

All instances in a cluster:

Reference the same configuration.
Have the same set of deployed applications (for example, a Java EE application EAR file, a web module WAR file, or an EJB JAR file).
Have the same set of resources, resulting in the same JNDI namespace.

Every cluster in the domain has a unique name; furthermore, this name must be unique across all node agent names, server instance names, cluster names, and configuration names. The name must not be the word domain. You perform the same operations on a cluster (for example, deploying applications and creating resources) that you perform on an unclustered server instance.
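The naming rule above can be expressed as a simple check. In this hypothetical Java sketch, the sets of existing names would come from your own inventory of the domain. The comparison is assumed to be exact and case-sensitive, which the text above does not specify.

```java
import java.util.Set;

/**
 * Hypothetical validation of a proposed cluster name: it must not clash with any
 * node agent, server instance, cluster, or configuration name, and must not be
 * the word "domain".
 */
final class ClusterNameRules {
    private ClusterNameRules() {}

    static boolean isValid(String proposed, Set<String> nodeAgents, Set<String> instances,
                           Set<String> clusters, Set<String> configs) {
        if (proposed == null || proposed.isEmpty() || proposed.equals("domain")) {
            return false;
        }
        // Exact, case-sensitive comparison (an assumption of this sketch).
        return !nodeAgents.contains(proposed) && !instances.contains(proposed)
                && !clusters.contains(proposed) && !configs.contains(proposed);
    }
}
```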

Clusters and Configurations

A cluster's settings are derived from a named configuration, which can potentially be shared with other clusters. A cluster whose configuration is not shared by other server instances or clusters is said to have a stand-alone configuration. By default, the name of this configuration is cluster_name-config, where cluster_name is the name of the cluster.

A cluster that shares its configuration with other clusters or instances is said to have a shared configuration.
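The distinction between stand-alone and shared configurations can be illustrated with the following hypothetical Java sketch. It assumes you have a map from each cluster or server instance name to the configuration it references. A configuration referenced by exactly one target is stand-alone; one referenced by more than one target is shared.

```java
import java.util.Map;

/** Hypothetical helper for reasoning about stand-alone versus shared configurations. */
final class ConfigSharing {
    private ConfigSharing() {}

    /** Default name of a cluster's stand-alone configuration. */
    static String defaultConfigName(String clusterName) {
        return clusterName + "-config";
    }

    /** targetToConfig maps each cluster or server instance name to its configuration name. */
    static boolean isShared(String clusterName, Map<String, String> targetToConfig) {
        String config = targetToConfig.get(clusterName);
        if (config == null) {
            throw new IllegalArgumentException("unknown cluster: " + clusterName);
        }
        // Count how many targets (clusters or instances) reference the same configuration.
        long users = targetToConfig.values().stream().filter(config::equals).count();
        return users > 1;
    }
}
```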

Clusters, Instances, Sessions, and Load Balancing

Clusters, server instances, load balancers, and sessions are related as follows:

A server instance is not required to be part of a cluster. However, an instance that is not part of a cluster cannot take advantage of high availability through transfer of session state from one instance to other instances.
The server instances within a cluster can be hosted on one or multiple machines. You can group server instances across different machines into a cluster.
A particular load balancer can forward requests to server instances on multiple clusters. You can use this ability of the load balancer to perform an online upgrade without loss of service. For more information, see “Using Multiple Clusters for Online Upgrades Without Loss of Service” in the chapter “Configuring Clusters”.
A single cluster can receive requests from multiple load balancers. If a cluster is served by more than one load balancer, you must configure the cluster in exactly the same way on each load balancer.
Each session is tied to a particular cluster. Therefore, although you can deploy an application on multiple clusters, session failover will occur only within a single cluster.

The cluster thus acts as a safe boundary for session failover for the server instances within the cluster. You can use the load balancer and upgrade components within the Application Server without loss of service.
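The following hypothetical Java sketch illustrates why the cluster is the failover boundary. When an instance fails, a replacement is chosen only from live members of the same cluster, even if the application is also deployed to another cluster. The cluster membership map, the set of live instances, and the choice of the first eligible member are simplifying assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical failover chooser: a session may move only to a live instance in
 * the same cluster as the instance that owned it.
 */
final class ClusterBoundFailover {
    private ClusterBoundFailover() {}

    static Optional<String> failoverTarget(String failedInstance,
                                           Map<String, List<String>> clusters,
                                           Set<String> liveInstances) {
        for (List<String> members : clusters.values()) {
            if (members.contains(failedInstance)) {
                // Only other live members of this cluster are eligible.
                return members.stream()
                        .filter(i -> !i.equals(failedInstance))
                        .filter(liveInstances::contains)
                        .findFirst();
            }
        }
        return Optional.empty(); // a stand-alone instance has no failover target
    }
}
```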

Recovering from Failures

Using Sun Cluster

Sun Cluster provides automatic failover of the domain administration server, node agents, Application Server instances, Message Queue, and HADB. For more information, see Sun Cluster Data Service for Sun Java System Application Server Guide for Solaris OS.

Use a standard Ethernet interconnect and a subset of Sun Cluster products. This capability is included in Java ES.

Manual Recovery

You can use various techniques to manually recover individual subcomponents:

Recovering the Domain Administration Server

Loss of the Domain Administration Server (DAS) affects only administration. Application Server clusters and applications continue to run as before, even if the DAS is not reachable.

Use any of the following methods to recover the DAS:

Run asadmin backup-domain periodically, so you have periodic snapshots. After a hardware failure, install Application Server on a new machine with the same network identity and run asadmin restore-domain with the backup created earlier. For more information, see Recreating the Domain Administration Server.
Put the domain installation and configuration on a shared and robust file system (NFS, for example). If the primary DAS machine fails, a second machine is brought up with the same IP address and takes over, through either manual intervention or user-supplied automation. Sun Cluster uses a similar approach for making the DAS fault-tolerant.
Zip the Application Server installation and domain root directory. Restore it on the new machine, assigning it the same network identity. This may be the simplest approach if you are using the file-based installation.
Restore from DAS backup. See the AS8.1 UR2 patch 4 instructions.
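If you take periodic snapshots with asadmin backup-domain, a restore script needs to pick the most recent one. This hypothetical Java sketch assumes backup files follow the sjsas_backup_vNNNNN.zip naming shown in the restore-domain example in Recreating the Domain Administration Server, and that a higher number means a newer backup. Verify both assumptions against your own backup directory.

```java
import java.util.Collection;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that picks the newest DAS backup by its version number. */
final class LatestBackup {
    // Assumed naming pattern, for example sjsas_backup_v00001.zip.
    private static final Pattern NAME = Pattern.compile("sjsas_backup_v(\\d+)\\.zip");

    private LatestBackup() {}

    static Optional<String> newest(Collection<String> fileNames) {
        String best = null;
        long bestVersion = -1;
        for (String name : fileNames) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) { // ignore files that do not follow the pattern
                long version = Long.parseLong(m.group(1));
                if (version > bestVersion) {
                    bestVersion = version;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```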

Recovering Node Agents and Server Instances

There are two methods for recovering node agents and server instances.

Keep a backup zip file. There are no explicit commands to back up the node agent and server instances. Simply create a zip file with the contents of the node agents directory. After a failure, unzip the saved backup on a new machine with the same host name and IP address. Use the same installation directory location, operating system, and so on. A file-based installation, package-based installation, or restored backup image must be present on the machine.

Manual recovery. You must use a new host with the same IP address.

Install the Application Server node agent bits on the machine.
See the instructions for AS8.1 UR2 patch 4 installation.
Recreate the node agents. You do not need to create any server instances.
Synchronization will copy and update the configuration and data from the DAS.
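For the backup-zip method, any archiving tool works. If you prefer to automate the step in Java, the following sketch zips every regular file under a directory, such as the node agents directory, using paths relative to that directory. The DirectoryZipper class is hypothetical. The same approach applies to the web server installation directory described next.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical backup helper: zips all regular files under sourceDir into zipFile. */
final class DirectoryZipper {
    private DirectoryZipper() {}

    static void zip(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names always use '/' as the separator.
                String entryName = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

Entries written through java.util.zip in this way do not record Unix file permissions, so check directory permissions after restoring, as the DAS migration procedure does for the generated/tmp directory.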

Recovering Load Balancer and Web Server

There are no explicit commands to back up only a web server configuration. Simply zip the web server installation directory. After failure, unzip the saved backup on a new machine with the same network identity. If the new machine has a different IP address, update the DNS server or the routers.

Note –

This assumes that the web server is either reinstalled or restored from an image first.

The load balancer plug-in (in the plugins directory) and its configuration files are in the web server installation directory, typically /opt/SUNWwbsvr. The web-install/web-instance/config directory contains the loadbalancer.xml file.

Recovering Message Queue

Message Queue (MQ) configurations and resources are stored in the DAS and can be synchronized to the instances. Any other data and configuration information is in the MQ directories, typically under /var/imq, so back up and restore these directories as required. The new machine must already contain the MQ installation. When you restore a machine, be sure to start the MQ brokers as before.

Recovering HADB

Note –

If you have two active HADB nodes, you can configure two spare nodes, on separate machines, that can take over in case of failure. This is a cleaner method than backup and restore, because restoring an HADB backup may bring back stale sessions.

For information on creating a database with spare nodes, see Creating a Database. For information on adding spare nodes to a database, see Adding Nodes. If recovery and self-repair fail, then the spare node takes over automatically.

Using Netbackup

Note –

This procedure has not been tested by Sun QA.

Use Veritas Netbackup to save an image of each machine. In the case of BPIP, back up the four machines that host web servers and Application Servers.

For each restored machine use the same configuration as the original, for example the same host name, IP address, and so on.

For file-based products such as Application Server, back up and restore just the relevant directories. However, for package-based installations such as the web server image, you must back up and restore the entire machine. Packages are installed into the Solaris package database, so if you back up only the directories and later restore them onto a new system, the result is a "deployed" web server that the package database has no record of. This may cause problems with future patching or upgrading.

Do not manually copy and restore the Solaris package database. The alternative is to back up an image of the machine after the components (for example, the web server) are installed. Call this the baseline tar file. When you make changes to the web server, back up the changed directories, for example, those under /opt/SUNWwbsvr. To restore, start with the baseline tar file and then copy over the web server directories that have been modified. You can use the same procedure for MQ (a package-based installation for BPIP). If you upgrade or patch the original machine, be sure to create a new baseline tar file.
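To decide which web server files have changed since the baseline tar file was taken, you could compare file modification times against the time of the baseline. This hypothetical Java sketch lists regular files under a directory whose last-modified time is later than a given baseline time. It assumes modification times are reliable on your system, and it does not detect files deleted since the baseline.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: files under root modified after the baseline was taken. */
final class ChangedSinceBaseline {
    private ChangedSinceBaseline() {}

    static List<Path> changedFiles(Path root, FileTime baselineTime) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> lastModified(p).compareTo(baselineTime) > 0)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Wraps the checked exception so the method can be used inside a stream filter.
    private static FileTime lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```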

If the machine hosting the DAS goes down, the DAS is unavailable until you restore it.

The DAS is the central repository. When you restore server instances and restart them, they are synchronized with information from the DAS only. Hence, all changes must be made through asadmin or the Admin Console.

A daily backup image of HADB may not be useful, because the image may contain outdated application session state.

Recreating the Domain Administration Server

If the machine hosting the domain administration server (DAS) fails, you can recreate the DAS if you have previously backed up the DAS. To recreate a working copy of the DAS, you must have:

One machine (machine1) that contains the original DAS.
A second machine (machine2) that contains a cluster with server instances running applications and catering to clients. The cluster is configured using the DAS on the first machine.
A third backup machine (machine3) where the DAS needs to be recreated in case the first machine crashes.

Note –

You must maintain a backup of the DAS from the first machine. Use asadmin backup-domain to back up the current domain.

To migrate the DAS

The following steps are required to migrate the Domain Administration Server from the first machine (machine1) to the third machine (machine3).

Install the application server on the third machine just as it is installed on the first machine.

This is required so that the DAS can be properly restored on the third machine and there are no path conflicts.
1. Install the application server administration package using the command-line (interactive) mode.
  
  To activate the interactive command-line mode, invoke the installation program using the console option:
  ./bundle-filename -console
  You must have root permission to install using the command-line interface.
2. Deselect the option to install default domain.
  
  Restoration of backed-up domains is supported only between two machines with the same architecture and exactly the same installation paths (use the same as-install and domain-root-dir on both machines).

Copy the backup ZIP file from the first machine into the domain-root-dir on the third machine.

You can also FTP the file.

Restore the ZIP file onto the third machine.
asadmin restore-domain --filename domain-root-dir/sjsas_backup_v00001.zip --clienthostname machine3 domain1
Note –
By specifying the --clienthostname option, you avoid the need to modify the jmx-connector element's client-hostname property in the domain.xml file.

You can back up any domain. However, when recreating the domain, the domain name must be the same as the original.

Change the permissions of the domain-root-dir/domain1/generated/tmp directory on the third machine to match the permissions of the same directory on the first machine.

The default permissions of this directory are: drwx------ (or 700).

For example:
chmod 700 domain-root-dir/domain1/generated/tmp
The example above assumes you are backing up domain1. If you are backing up a domain with a different name, replace domain1 with the name of that domain.
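If you script the restore in Java instead of running chmod, the permission change in this step can be sketched with the java.nio.file POSIX permission API. The RestrictTmpDir class is hypothetical, and the call works only on file systems that support POSIX permissions (for example, Solaris or Linux); elsewhere it throws UnsupportedOperationException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper equivalent to "chmod 700 dir" (drwx------). */
final class RestrictTmpDir {
    private RestrictTmpDir() {}

    static void makeOwnerOnly(Path dir) throws IOException {
        // Owner: read, write, execute (search); group and others: no access.
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rwx------");
        Files.setPosixFilePermissions(dir, ownerOnly);
    }

    /** Returns the permissions in "rwx------" form, for comparing with the first machine. */
    static String describe(Path dir) throws IOException {
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
    }
}
```

For example, RestrictTmpDir.makeOwnerOnly(Path.of("domain-root-dir/domain1/generated/tmp")), with domain-root-dir replaced by your actual domain root directory.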

In the domain-root-dir/domain1/config/domain.xml file on the third machine, update the value of the jms-service element's host attribute.

The original setting of this attribute is as follows:
```
<jms-service ... host="machine1" ... />
```
Modify the setting of this attribute as follows:
```
<jms-service ... host="machine3" ... />
```
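Editing domain.xml in a text editor is the simplest way to make this change. If you automate it instead, the following Java sketch uses the standard DOM and XSLT transformer APIs to set the host attribute on every jms-service element, as this step describes. The JmsHostUpdater class is hypothetical. The sketch skips loading the external DTD (so parsing does not need network access), re-emits the DOCTYPE declaration when writing the file back, and may change insignificant formatting, so keep a copy of domain.xml before running it.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that rewrites the jms-service host attribute in domain.xml. */
final class JmsHostUpdater {
    private JmsHostUpdater() {}

    /** Returns the number of jms-service elements that were updated. */
    static int updateJmsHost(File domainXml, String newHost)
            throws ParserConfigurationException, SAXException, IOException, TransformerException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not fetch the DTD named in the DOCTYPE (feature of the JDK's built-in parser).
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(domainXml);

        NodeList services = doc.getElementsByTagName("jms-service");
        for (int i = 0; i < services.getLength(); i++) {
            ((Element) services.item(i)).setAttribute("host", newHost);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            // Re-emit the original DOCTYPE, which the identity transform does not copy.
            if (doctype.getPublicId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        transformer.transform(new DOMSource(doc), new StreamResult(domainXml));
        return services.getLength();
    }
}
```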

Start the restored domain on machine3:
asadmin start-domain --user admin-user --password admin-password domain1
The DAS contacts all running node agents and provides the node agents with information for contacting the DAS. The node agents use this information to communicate with the DAS.

For any node agents that are not running when the DAS is restarted, change the value of the agent.das.host property in as-install/nodeagents/nodeagent/agent/config/das.properties on machine2 to machine3.

This step is not required for node agents that are running when the DAS is restarted.
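For a node agent that was not running, the das.properties change can be made in a text editor. As a scripted alternative, the following Java sketch uses java.util.Properties to set agent.das.host to a new value (for example, machine3). The DasHostUpdater class is hypothetical. Properties.store rewrites the whole file, so existing comments are lost and the key order may change; keep a copy of the original file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper that points a node agent's das.properties at a new DAS host. */
final class DasHostUpdater {
    private DasHostUpdater() {}

    /** Sets agent.das.host and returns its previous value (or null if it was absent). */
    static String setDasHost(Path dasProperties, String newHost) throws IOException {
        Properties props = new Properties();
        // ISO-8859-1 is the traditional encoding for .properties files.
        try (Reader in = Files.newBufferedReader(dasProperties, StandardCharsets.ISO_8859_1)) {
            props.load(in);
        }
        String previous = props.getProperty("agent.das.host");
        props.setProperty("agent.das.host", newHost);
        try (Writer out = Files.newBufferedWriter(dasProperties, StandardCharsets.ISO_8859_1)) {
            props.store(out, "agent.das.host updated after DAS migration");
        }
        return previous;
    }
}
```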

Restart the node agent on machine2.

Note –
Start the cluster instances using the asadmin start-instance command to allow them to synchronize with the restored domain.