Recovering from Failures

You can use various techniques to manually recover individual subcomponents after hardware failures such as disk crashes.

The following topics are addressed here:

Recovering the Domain Administration Server
Recovering GlassFish Server Instances
Recovering the HTTP Load Balancer and Web Server
Recovering Message Queue

Recovering the Domain Administration Server

Loss of the Domain Administration Server (DAS) affects only administration. GlassFish Server clusters and standalone instances, and the applications deployed to them, continue to run as before, even if the DAS is not reachable

Use any of the following methods to recover the DAS:

Back up the domain periodically, so you have periodic snapshots. After a hardware failure, re-create the DAS on a new host, as described in Re-Creating the Domain Administration Server (DAS) in Oracle GlassFish Server 3.1 Administration Guide.
Put the domain installation and configuration on a shared and robust file system (NFS for example). If the primary DAS host fails, a second host is brought up with the same IP address and will take over with manual intervention or user supplied automation.
Zip the GlassFish Server installation and domain root directory. Restore it on the new host, assigning it the same network identity.

Recovering GlassFish Server Instances

GlassFish Server provide tools for backing up and restoring GlassFish Server instances. For more information, see To Resynchronize an Instance and the DAS Offline.

Recovering the HTTP Load Balancer and Web Server

There are no explicit commands to back up only a web server configuration. Simply zip the web server installation directory. After failure, unzip the saved backup on a new host with the same network identity. If the new host has a different IP address, update the DNS server or the routers.

Note - This assumes that the web server is either reinstalled or restored from an image first.

The Load Balancer Plug-In (plugins directory) and configurations are in the web server installation directory, typically /opt/SUNWwbsvr. The web-install/web-instance/config directory contains the loadbalancer.xml file.

Recovering Message Queue

When a Message Queue broker becomes unavailable, the method you use to restore the broker to operation depends on the nature of the failure that caused the broker to become unavailable:

Power failure or failure other than disk storage
Failure of disk storage

Additionally, the urgency of restoring an unavailable broker to operation depends on the type of the broker:

Standalone Broker. When a standalone broker becomes unavailable, both service availability and data availability are interrupted. Restore the broker to operation as soon as possible to restore availability.
Broker in a Conventional Cluster. When a broker in a conventional cluster becomes unavailable, service availability continues to be provided by the other brokers in the cluster. However, data availability of the persistent data stored by the unavailable broker is interrupted. Restore the broker to operation to restore availability of its persistent data.
Broker in an Enhanced Cluster. When a broker in an enhanced cluster becomes unavailable, service availability and data availability continue to be provided by the other brokers in the cluster. Restore the broker to operation to return the cluster to its previous capacity.

Recovering From Power Failure and Failures Other Than Disk Storage

When a host is affected by a power failure or failure of a non-disk component such as memory, processor or network card, restore Message Queue brokers on the affected host by starting the brokers after the failure has been remedied.

To start brokers serving as Embedded or Local JMS hosts, start the GlassFish instances the brokers are servicing. To start brokers serving as Remote JMS hosts, use the imqbrokerd Message Queue utility.

Recovering from Failure of Disk Storage

Message Queue uses disk storage for software, configuration files and persistent data stores. In a default GlassFish installation, all three of these are generally stored on the same disk: the Message Queue software in as-install-parent/mq, and broker configuration files and persistent data stores (except for the persistent data stores of enhanced clusters, which are housed in highly available databases) in as-install-parent/glassfish/domains/domain1/imq. If this disk fails, restoring brokers to operation is impossible unless you have previously created a backup of these items. To create such a backup, use a utility such as zip, gzip or tar to create archives of these directories and all their content. When creating the backup, you should first quiesce all brokers and physical destinations, as described in Quiescing a Broker and Pausing and Resuming a Physical Destination in the Message Queue 4.5 Administration Guide, respectively. Then, after the failed disk is replaced and put into service, expand the backup archive into the same location.

Restoring the Persistent Data Store From Backup. For many messaging applications, restoring a persistent data store from backup does not produce the desired results because the backed up store does not represent the content of the store when the disk failure occurred. In some applications, the persistent data changes rapidly enough to make backups obsolete as soon as they are created. To avoid issues in restoring a persistent data store, consider using a RAID or SAN data storage solution that is fault tolerant, especially for data stores in production environments.

Skip Navigation Links
Exit Print View
	Oracle GlassFish Server 3.1-3.1.1 High Availability Administration Guide