Cannot Restart the HADB (Sun GlassFish Enterprise Server 2.1 Troubleshooting Guide)

Sun GlassFish Enterprise Server 2.1 Troubleshooting Guide

Cannot Restart the HADB

Description

HADB restart does not work after a double node failure. Additional recovery actions are needed before HADB can be restarted.

Symptoms of a double node failure include:

hadbm status shows that the HADB status is non-operational.
The node status shows that the nodes are in Starting or Recovering state. Even after stopping and then restarting each of the nodes, they remain in the Starting state. Eventually, the node status changes to Stopped.

This problem occurs when mirror HADB host machines have failed or been rebooted, typically after a power outage, or when a machine is rebooted without first stopping the HADB (in a single-machine installation), or when a pair of mirror machines from both Data Redundancy Units (DRUs) are rebooted.

HADB cannot heal itself automatically in such “double failure” situations because the part of the data that resided on the pair nodes is lost. In such cases, the hadbm start command does not succeed, and the hadbm status command shows that HADB is in a non-operational state.

For more information on the DRUs and HADB confutation, see “Administering the High Availability Database” in the Administration Guide, and the Deployment Guide.

Tip –

If the HADB exhibits strange behavior (for example consistent timeout problems), and you want to check whether a restart cures the problem, use the hadbm restart command.

When the HADB is restarted in this manner, HADB services remain available. Conversely, if HADB is started and stopped in separate operations using hadbm stop and hadbm start, HADB services are unavailable while HADB is stopped.

Solution

Verify that the node states show Starting/Recovering, then reset the database. Follow the instructions under “Recovering from Session Data Corruption” in the “Administering the High Availability Database” chapter of the Administration Guide.