Sun Java System Application Server Enterprise Edition 8.2 High Availability Administration Guide

Recovering from Failures

Using Sun Cluster

Sun Cluster provides automatic failover of the domain administration server, node agents, Application Server instances, Message Queue, and HADB. For more information, see Sun Cluster Data Service for Sun Java System Application Server Guide for Solaris OS.

Use a standard Ethernet interconnect and a subset of Sun Cluster products. This capability is included in Java ES.

Manual Recovery

You can use various techniques to manually recover individual subcomponents:

Recovering the Domain Administration Server

Loss of the Domain Administration Server (DAS) affects only administration. Application Server clusters and applications continue to run as before, even if the DAS is not reachable.

Use any of the following methods to recover the DAS:

Recovering Node Agents and Server Instances

There are two methods for recovering node agents and server instances.

Keep a backup zip file. There are no explicit commands to back up the node agent and server instances. Simply create a zip file with the contents of the node agents directory. After a failure, unzip the saved backup on a new machine with the same host name and IP address. Use the same install directory location, operating system, and so on. A file-based install, package-based install, or restored backup image must already be present on the machine.

Manual recovery. You must use a new host with the same IP address.

  1. Install the Application Server node agent bits on the machine.

  2. See the instructions for AS8.1 UR2 patch 4 installation.

  3. Recreate the node agents. You do not need to create any server instances.

  4. Synchronization will copy and update the configuration and data from the DAS.
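The first method above (keeping a zip file of the node agents directory) could be scripted roughly as follows. This is a minimal sketch, not a supported procedure: the function names are hypothetical, it assumes the node agents live in a directory named nodeagents directly under the install directory, and it assumes the zip and unzip utilities are installed.

```shell
# Sketch only: archive and restore a node agents directory with zip/unzip.
# The layout INSTALL_DIR/nodeagents is an assumption; adjust as needed.

# backup_nodeagents INSTALL_DIR ARCHIVE
# ARCHIVE must be an absolute path because zip runs from INSTALL_DIR.
backup_nodeagents() {
    # -r recurses into subdirectories; -q suppresses per-file output.
    (cd "$1" && zip -r -q "$2" nodeagents)
}

# restore_nodeagents INSTALL_DIR ARCHIVE
restore_nodeagents() {
    # -o overwrites existing files without prompting; -d sets the target.
    unzip -o -q "$2" -d "$1"
}
```

For example, run backup_nodeagents install-dir /backup/nodeagents.zip on the original machine, and restore_nodeagents install-dir /backup/nodeagents.zip on the replacement machine. The archive path is a placeholder.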

Recovering Load Balancer and Web Server

There are no explicit commands to back up only a web server configuration. Simply zip the web server installation directory. After failure, unzip the saved backup on a new machine with the same network identity. If the new machine has a different IP address, update the DNS server or the routers.

Note –

This assumes that the web server is either reinstalled or restored from an image first.

The load balancer plugin (plugins directory) and configurations are in the web server installation directory, typically /opt/SUNWwbsvr. The web-install/web-instance/config directory contains the loadbalancer.xml file.

Recovering Message Queue

Message Queue (MQ) configurations and resources are stored in the DAS and can be synchronized to the instances. Any other data and configuration information is in the MQ directories, typically under /var/imq, so back up and restore these directories as required. The new machine must already contain the MQ installation. Be sure to start the MQ brokers as before when you restore a machine.

Recovering HADB

If you have two active HADB nodes, you can configure two spare nodes (on separate machines) that can take over in case of failure. This is a cleaner method because backup and restore of HADB may result in stale sessions being restored.

For information on creating a database with spare nodes, see Creating a Database. For information on adding spare nodes to a database, see Adding Nodes. If recovery and self-repair fail, then the spare node takes over automatically.

Using NetBackup

Note –

This procedure has not been tested by Sun QA.

Use Veritas NetBackup to save an image of each machine. In the case of BPIP, back up the four machines with web servers and application servers.

For each restored machine, use the same configuration as the original, for example, the same host name, IP address, and so on.

For file-based products such as Application Server, back up and restore just the relevant directories. However, for package-based installs such as the web server image, you must back up and restore the entire machine. Packages are installed into the Solaris package database. So, if you back up only the directories and subsequently restore them on a new system, the result is a "deployed" web server that the package database knows nothing about. This may cause problems with future patching or upgrading.

Do not manually copy and restore the Solaris package database. The alternative is to back up an image of the machine after the components (for example, the web server) are installed. Call this the baseline tar file. When you make changes to the web server, back up the affected directories, for example, those under /opt/SUNWwbsvr. To restore, start with the baseline tar file and then copy over the web server directories that have been modified. You can use the same procedure for MQ (package-based install for BPIP). If you upgrade or patch the original machine, be sure to create a new baseline tar file.
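The baseline-plus-changes approach can be illustrated with plain tar. This is a simplified sketch under stated assumptions: it archives a single directory tree rather than a full machine image, the function names are hypothetical, and all paths are placeholders.

```shell
# Sketch only: a baseline archive plus a later archive of changed directories.
# Function names and all paths are hypothetical.

# archive_dir ROOT RELATIVE_DIR ARCHIVE
# Archives ROOT/RELATIVE_DIR with paths stored relative to ROOT.
# ARCHIVE must be an absolute path because tar runs from ROOT.
archive_dir() {
    (cd "$1" && tar cf "$3" "$2")
}

# restore_archive ROOT ARCHIVE
# Extracts ARCHIVE under ROOT, overwriting files that already exist.
restore_archive() {
    (cd "$1" && tar xf "$2")
}

# restore_with_overlay ROOT BASELINE CHANGES
# Restores the baseline first, then lays the changed directories on top.
restore_with_overlay() {
    restore_archive "$1" "$2" && restore_archive "$1" "$3"
}
```

The order in restore_with_overlay matters: because the changes archive is extracted last, its files replace the older copies from the baseline.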

If the machine hosting the DAS goes down, the DAS is unavailable until you restore it.

The DAS is the central repository. When you restore server instances and restart them, they are synchronized with information from the DAS only. Hence, all changes must be performed through asadmin or the Admin Console.

A daily backup image of HADB may not work, since the image may contain old application session state.

Recreating the Domain Administration Server

If you have backed up the Domain Administration Server (DAS), you can recreate it if the host machine fails. To recreate a working copy of the DAS, you must have:

Note –

You must maintain a backup of the DAS from the first machine. Use asadmin backup-domain to back up the current domain.
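A small wrapper such as the following could be used to take that backup on a schedule. It is a sketch only: the function name and the ASADMIN variable are hypothetical conveniences, and domain1 is assumed as the default domain name. Depending on the release, the domain may need to be stopped before backup-domain will run; that is not handled here.

```shell
# Sketch only. ASADMIN can be overridden to point at a specific asadmin
# binary; the default assumes asadmin is on the PATH.
ASADMIN=${ASADMIN:-asadmin}

# backup_das [DOMAIN_NAME]
# Runs "asadmin backup-domain" for the named domain (default: domain1).
backup_das() {
    "$ASADMIN" backup-domain "${1:-domain1}"
}
```

Calling backup_das with no argument backs up domain1; pass another domain name to back up a different domain.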

To migrate the Domain Administration Server

To migrate the DAS from the first machine (machine1) to the third machine (machine3), follow these steps:

  1. Install the application server on the third machine just as it is installed on the first machine.

    This is required so that the DAS can be properly restored on the third machine and there are no path conflicts.

    1. Install the application server administration package using the command-line (interactive) mode.

      To activate the interactive command-line mode, invoke the installation program using the console option:

      ./bundle-filename -console

      You must have root permission to install using the command-line interface.

    2. Deselect the option to install default domain.

      Restoration of backed-up domains is supported only between two machines with the same architecture and exactly the same installation paths (use the same install-dir and domain-root-dir on both machines).

  2. Copy the backup ZIP file from the first machine into the domain-root-dir on the third machine.

    You can also FTP the file.

  3. Run the asadmin restore-domain command to restore the ZIP file on the third machine:

    asadmin restore-domain --filename domain-root-dir/ domain1

    You can back up any domain. However, when you recreate the domain, the domain name must be the same as the original.

  4. Change the domain-root-dir/domain1/generated/tmp directory permissions on the third machine to match the permissions of the same directory on the first machine.

    The default permissions of this directory are: drwx------ (or 700).

    For example:

    chmod 700 domain-root-dir/domain1/generated/tmp

    The example above assumes you are backing up domain1. If you are backing up a domain by another name, you should replace domain1 above with the name of the domain being backed up.

  5. Change the host values for the properties in the domain.xml file for the third machine:

  6. Update the domain-root-dir/domain1/config/domain.xml on the third machine.

    For example, search for machine1 and replace it with machine3. So, you can change:

    <jmx-connector><property name="client-hostname" value="machine1"/>...

    to:

    <jmx-connector><property name="client-hostname" value="machine3"/>...

  7. Change:

    <jms-service... host="machine1".../>

    to:

    <jms-service... host="machine3".../>

  8. Start the restored domain on machine3:

    asadmin start-domain --user admin-user --password admin-password domain1

  9. Change the DAS host values for properties under node agent on machine2.

  10. Change the property value in install-dir/nodeagents/nodeagent/agent/config/ on machine2.

  11. Restart the node agent on machine2.

    Note –

    Start the cluster instances using the asadmin start-instance command to allow them to synchronize with the restored domain.
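The host name changes in the procedure above (machine1 to machine3 in domain.xml) can be made with a text editor or with a small script. The following is a minimal sketch using sed; replace_host is a hypothetical helper, not part of the product, and it keeps a copy of the original file with an .orig suffix.

```shell
# Sketch only: replace every occurrence of one host name with another.
# replace_host is a hypothetical helper name.

# replace_host FILE OLD_HOST NEW_HOST
replace_host() {
    # Keep the untouched file as FILE.orig so the change can be reverted.
    cp "$1" "$1.orig" || return 1
    # OLD_HOST is used as a sed pattern, so a dot matches any character,
    # and substrings match too (machine1 also matches inside machine10).
    sed "s/$2/$3/g" "$1.orig" > "$1"
}
```

For example, replace_host domain-root-dir/domain1/config/domain.xml machine1 machine3 updates both the jmx-connector property and the jms-service host in one pass. Review the result before starting the domain, since the replacement is purely textual.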