Sun GlassFish Enterprise Server 2.1 Release Notes

Starting, stopping, and reconfiguring HADB may fail or hang (6230792, 6230415)

Description

On Solaris 10 Opteron, starting, stopping or reconfiguring HADB using the hadbm command may fail or hang with one of the following errors:


hadbm:Error 22009: The command issued had no progress in the last 
300 seconds.
HADB-E-21070: The operation did not complete within the time limit, 
but has not been cancelled and may complete at a later time.

This may happen if there are inconsistencies reading/writing to a file (nomandevice) which the clu_noman_srv process uses. This problem can be detected by looking for the following messages in the HADB history files:


n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 
does not respond.
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Have not heard from it in 
104.537454 sec.
n:3 NSUP INF 2005-02-11 18:00:33.844 p:731 Child process noman3 733 
did not start.

Solution

The following workaround is unverified, as the problem has not been reproduced manually. However, running this command for the affected node should solve the problem.


hadbm restartnode --level=clear nodeno dbname

Note that all devices for the node will be reinitialized. You may have to stop the node before reinitializing it.