Sun Cluster Data Service for Sun Java System Application Server EE (HADB) Guide for Solaris OS

Operations by the Fault Monitor During a Probe

The start method of the HADB resource starts any HADB nodes that are configured to run on the local Sun Cluster node and are not already running. The method then attempts to start the HADB database. If this attempt fails, the database is started later by the fault monitor probe.
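The start method behavior described above can be sketched as follows. This is a minimal illustration, not the data service's actual implementation: the function name run_start_method and the start_node and start_database callables are hypothetical stand-ins for whatever the resource really invokes.

```python
from typing import Callable, Dict


def run_start_method(
    local_node_running: Dict[int, bool],
    start_node: Callable[[int], None],
    start_database: Callable[[], bool],
) -> bool:
    """Start stopped local HADB nodes, then try once to start the database.

    Returns True if the database start succeeded. False means the start is
    left for a later fault monitor probe iteration.
    """
    for node_number, running in sorted(local_node_running.items()):
        if not running:
            start_node(node_number)  # only nodes that are not already running
    return start_database()


# Usage with stand-in callables that only record what was requested.
started = []
database_started = run_start_method(
    {0: True, 1: False},
    start_node=started.append,
    start_database=lambda: False,  # simulate a failed database start
)
```

In this usage example only node 1 is started, and the False result indicates that the database start is deferred to the probe.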

The fault monitor probe periodically checks the status of the HADB database and the HADB nodes. The probe restarts failed HADB nodes. The probe also starts the HADB database if the HADB resource was not ready to start the database during the start method. In each iteration, the probe executes the following steps:

  1. The probe retrieves the current status of the HADB database and the HADB nodes by executing the hadbm status and hadbm status --nodes commands.

  2. If the database is not running, the probe checks whether the HADB stopstate file corresponding to that database exists on the local Sun Cluster node. The hadbm start command references the stopstate file for role assignment of nodes when it starts the database.

  3. If the stopstate file exists, the HADB resource examines it to determine if the database can be started.

    • If the database can be started, the probe starts the database and sets the resource status to Online.

    • If the database cannot be started, the probe sets the resource status to Online Degraded.

  4. If the database is running, the probe starts any HADB nodes that are configured to run on the local Sun Cluster node and are not already running.

  5. If the database and the local HADB nodes are running, the probe sets the resource status to Online if it was Online Degraded.

  6. If all the Sun Cluster nodes in the HADB resource group have the HADB resource running in the Online Degraded state longer than Stop_timeout seconds, the HADB resource concludes that the database cannot be started. For a description of the Stop_timeout property, see the method_timeout resource property in Appendix A, Standard Properties, in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

  7. If the Auto_recovery extension property is set to TRUE, the HADB resource attempts to recover the database.

  8. If recovery of the database is attempted, the probe executes the following steps:

    • Issues the hadbm clear --fast command on one of the Sun Cluster nodes in the resource group's node list. This command clears the database contents and reinitializes and restarts the database.

    • If the hadbm clear command succeeds, the command specified in Auto_recovery_command is issued on the same Sun Cluster node that issued the hadbm clear command. This command is normally a script that contains the asadmin create-session-store command. The script can also take other actions. For example, it might send mail to the Application Server administrator.

    • If both steps succeed, the probe sets the resource status to Online.
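The decision logic in the steps above can be summarized in a short sketch. It is an illustration under stated assumptions, not the fault monitor's actual code: the Observation fields, probe_once, and should_recover are hypothetical names, and real status retrieval through hadbm is replaced by plain booleans. Two behaviors are assumptions because the steps do not specify them: the status is left unchanged when no stopstate file exists, and the change from Online Degraded to Online happens in the iteration after the local nodes are started.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    ONLINE = "Online"
    ONLINE_DEGRADED = "Online Degraded"


@dataclass
class Observation:
    """Hypothetical snapshot of what one probe iteration sees (step 1)."""
    database_running: bool
    local_nodes_running: bool
    stopstate_exists: bool
    stopstate_allows_start: bool


def probe_once(obs: Observation, current: Status) -> Tuple[List[str], Status]:
    """Return (actions taken, new resource status) for one iteration."""
    actions: List[str] = []
    if not obs.database_running:
        if not obs.stopstate_exists:
            return actions, current  # assumption: not covered by the steps
        if obs.stopstate_allows_start:  # step 3, first bullet
            actions.append("start database")
            return actions, Status.ONLINE
        return actions, Status.ONLINE_DEGRADED  # step 3, second bullet
    if not obs.local_nodes_running:  # step 4
        actions.append("start local nodes")
        return actions, current  # assumption: status is updated next iteration
    if current is Status.ONLINE_DEGRADED:  # step 5
        return actions, Status.ONLINE
    return actions, current


def should_recover(
    degraded_seconds: Dict[str, float],
    stop_timeout: float,
    auto_recovery: bool,
) -> bool:
    """Steps 6 and 7: recover only if every Sun Cluster node has been
    Online Degraded longer than Stop_timeout and Auto_recovery is TRUE.

    degraded_seconds maps each node name to the time spent in Online
    Degraded; use 0.0 for a node that is not degraded.
    """
    if not degraded_seconds:
        return False
    all_stuck = all(s > stop_timeout for s in degraded_seconds.values())
    return all_stuck and auto_recovery
```

For example, probe_once(Observation(False, False, True, False), Status.ONLINE) yields no actions and the Online Degraded status, and should_recover({"node1": 400.0, "node2": 350.0}, 300.0, True) is true because both nodes exceed the timeout.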