Operation of the Sun Cluster HA for Sun Java System HADB Fault Monitor

This section describes how the Sun Cluster HA for Sun Java System HADB fault monitor operates.

The start method of the HADB resource starts the HADB nodes that are configured to run on the local Sun Cluster node if they are not already running. The method then attempts to start the HADB database. If that attempt fails, the fault monitor probe starts the database later.
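The start sequence described above can be sketched roughly as follows. This is only an illustration of the control flow, not the data service's actual implementation: the function name and the four callables passed to it are hypothetical stand-ins for whatever the resource really uses to query and start HADB nodes and the database.

```python
from typing import Callable, Dict, Iterable, List, Union


def run_start_method(
    local_nodes: Iterable[str],
    node_is_running: Callable[[str], bool],   # hypothetical status check
    start_node: Callable[[str], None],        # hypothetical node starter
    start_database: Callable[[], bool],       # hypothetical; True on success
) -> Dict[str, Union[List[str], bool]]:
    """Start local HADB nodes that are down, then try to start the database."""
    started: List[str] = []
    for node in local_nodes:
        # Only nodes that are not already running are started.
        if not node_is_running(node):
            start_node(node)
            started.append(node)

    # A failed database start is not fatal here: the probe retries later.
    database_started = start_database()
    return {
        "nodes_started": started,
        "database_started": database_started,
        "deferred_to_probe": not database_started,
    }
```

The point of the sketch is that a failed database start does not abort the start method; the outcome is simply left for the probe to handle.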

The Sun Cluster HA for Sun Java System HADB fault monitor probe periodically checks the status of the HADB database and the HADB nodes. The probe restarts failed HADB nodes. The probe also starts the HADB database if the HADB resource was not ready to start the database during the start method. In each iteration, the probe performs the following steps:

1. The probe sleeps for Thorough_Probe_Interval seconds.
2. The probe retrieves the current status of the HADB database and the HADB nodes by running the hadbm status and hadbm status --nodes commands.
3. If the database is not running, the probe checks whether the HADB stopstate file for that database exists on the local Sun Cluster node. The hadbm start command uses the stopstate file to assign roles to nodes when it starts the database.
4. If the stopstate file exists, the HADB resource examines it to determine whether the database can be started.
   - If the database can be started, the probe starts the database and sets the resource status to Online.
   - If the database cannot be started, the probe sets the resource status to Online Degraded.
5. If the database is running, the probe starts the HADB nodes that are configured to run on the local Sun Cluster node.
6. If the database and the local HADB nodes are running and the resource status is Online Degraded, the probe sets the resource status to Online.
7. If every Sun Cluster node in the HADB resource group has had the HADB resource in the Online Degraded state for longer than Stop_timeout seconds, the HADB resource concludes that the database cannot be started. If the Auto_recovery extension property is set to TRUE, the HADB resource then attempts to recover the database.
8. If recovery of the database is attempted, the probe performs the following steps:
   - The probe issues the hadbm clear --fast command on one of the Sun Cluster nodes in the resource group's nodelist. This command clears the database contents, reinitializes the database, and restarts it.
   - If the hadbm clear command succeeds, the command that is specified in Auto_recovery_command is issued on the same Sun Cluster node that issued the hadbm clear command. This command is typically a script that contains the asadmin create-session-store command. The command can also take other actions. For example, it can send mail to the Application Server administrator.
   - If both steps succeed, the probe sets the state of the resource to Online.

The probe then repeats the procedure from the first step.
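The decision logic of one probe iteration can be summarized in a short sketch. This is a simplified model written for illustration, not the fault monitor's real code: the Observation fields, the action strings, and both function names are assumptions, and the sketch leaves the status unchanged when the stopstate file is missing because the description above does not cover that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

ONLINE = "Online"
ONLINE_DEGRADED = "Online Degraded"


@dataclass
class Observation:
    """Hypothetical snapshot of what one probe iteration sees."""
    database_running: bool
    local_nodes_running: bool
    stopstate_exists: bool
    database_startable: bool         # outcome of examining the stopstate file
    degraded_everywhere_for: float   # seconds all cluster nodes were Online Degraded


def plan_iteration(obs: Observation, status: str,
                   stop_timeout: float, auto_recovery: bool) -> Tuple[List[str], str]:
    """Return the actions to take and the resulting resource status."""
    actions: List[str] = []
    if not obs.database_running:
        if obs.stopstate_exists:
            if obs.database_startable:
                actions.append("start database")
                status = ONLINE
            else:
                status = ONLINE_DEGRADED
        # Missing stopstate file: not specified in the text, status unchanged.
    elif not obs.local_nodes_running:
        actions.append("start local nodes")
    elif status == ONLINE_DEGRADED:
        status = ONLINE

    # Recovery is considered only while the resource is still degraded.
    if (status == ONLINE_DEGRADED and auto_recovery
            and obs.degraded_everywhere_for > stop_timeout):
        actions.append("recover database")
    return actions, status


def recover(clear_database: Callable[[], bool],
            recovery_command: Callable[[], bool], status: str) -> str:
    """Run the two recovery steps in order; Online only if both succeed."""
    if clear_database() and recovery_command():
        return ONLINE
    return status
```

In this model, clear_database would stand for running hadbm clear --fast and recovery_command for running the command named by Auto_recovery_command. Because of short-circuit evaluation, the second step runs only when the first one succeeds, which matches the order described in the recovery steps.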

Note: The Thorough_Probe_Interval and Stop_timeout properties can be tuned with the scrgadm command. For details, see “Standard Properties” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.
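As an illustration of such tuning, the following sketch builds a scrgadm command line that changes a standard property of a resource. It assumes the usual scrgadm form for changing a resource (-c to change, -j to name the resource, -y to set a standard property). The helper name, the resource name hadb-rs, and the value 120 are hypothetical; consult the referenced guide for the authoritative syntax and valid values.

```python
from typing import List


def scrgadm_set_standard_property(resource: str, prop: str, value: int) -> List[str]:
    """Build (but do not run) a scrgadm command that sets a standard property."""
    # -c: change an existing object, -j: resource name, -y: standard property
    return ["scrgadm", "-c", "-j", resource, "-y", f"{prop}={value}"]


# Hypothetical resource name and value, for illustration only.
command = scrgadm_set_standard_property("hadb-rs", "Thorough_Probe_Interval", 120)
print(" ".join(command))
```

The helper only assembles the argument list, so it can be reviewed before anything is run on a cluster node.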
