Understanding the Sun Cluster HA for N1 Service Provisioning System Fault Monitor

This section describes the probing algorithm and functionality of the Sun Cluster HA for N1 Service Provisioning System fault monitor, and states the conditions, messages, and recovery actions associated with unsuccessful probing.

For conceptual information on fault monitors, see the Sun Cluster Concepts Guide.

Resource Properties

The Sun Cluster HA for N1 Service Provisioning System fault monitor uses the same resource properties as the resource type SUNW.gds. Refer to the SUNW.gds(5) man page for a complete list of resource properties used.

Probing Algorithm and Functionality for the N1 Grid Service Provisioning System Master Server

The probing of the Master Server consists of two parts: one probes Apache Tomcat and the other probes the database.

The following steps are executed to monitor the sanity of the N1 Grid Service Provisioning System Master Server.

Sleeps for Thorough_probe_interval.
Pings the Host that is configured in the Sun Cluster HA for N1 Service Provisioning System Master Server parameter file.
Connects to Apache Tomcat through Host and Port. If the connection succeeds, the probe sends the TestCmd and checks whether the ReturnString comes back. If this test fails, it is rescheduled after 5 seconds. If it fails again, the probe restarts the Sun Cluster HA for N1 Service Provisioning System.

Caution –
The ReturnString must not be Connection refused, because this string is returned when no connection is possible.
If Apache Tomcat is operational, the probe manipulates the database table sc_test. If the connection to the database or the table manipulation fails, the N1 Grid Service Provisioning System Master Server is restarted.
If the Apache Tomcat process and all the database processes have died, pmf interrupts the probe and immediately restarts the N1 Grid Service Provisioning System Master Server.
If the N1 Grid Service Provisioning System Master Server is repeatedly restarted and exhausts the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated, provided that the resource property Failover_enabled is set to TRUE.
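
The Apache Tomcat and database portions of the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: the function names probe_tomcat_once and probe_master_server and the injected db_probe callable are hypothetical, the way TestCmd is terminated on the wire is an assumption, and the ping step and the pmf process monitoring are left out.

```python
import socket
import time


def probe_tomcat_once(host, port, test_cmd, return_string, timeout=5.0):
    """Send test_cmd to host:port and report whether return_string comes back."""
    expected = return_string.encode()
    reply = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Assumption: the command is a single line, such as an HTTP GET request.
            conn.sendall(test_cmd.encode() + b"\r\n\r\n")
            while expected not in reply:
                data = conn.recv(4096)
                if not data:  # peer closed the connection
                    break
                reply += data
    except OSError:
        # Refused connection, timeout, or reset: treat as a failed probe.
        pass
    return expected in reply


def probe_master_server(tomcat_probe, db_probe, sleep=time.sleep, retry_delay=5):
    """Return "ok" if both parts succeed, otherwise "restart"."""
    if not tomcat_probe():
        sleep(retry_delay)  # the failed Tomcat test is rescheduled once
        if not tomcat_probe():
            return "restart"
    # Tomcat is operational, so check the database (for example, the sc_test table).
    if not db_probe():
        return "restart"
    return "ok"
```

A caller would bind the parameter-file values with something like `lambda: probe_tomcat_once(host, port, test_cmd, return_string)` and pass the result as tomcat_probe. Taking sleep as a parameter lets the 5-second delay be replaced in tests.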

Probing Algorithm and Functionality for the N1 Grid Service Provisioning System Remote Agent

The probing of the Remote Agent is done by pmf only.

The following steps are executed to monitor the N1 Grid Service Provisioning System Remote Agent.

If the Remote Agent process has died, pmf immediately restarts the N1 Grid Service Provisioning System Remote Agent.
If the N1 Grid Service Provisioning System Remote Agent is repeatedly restarted and exhausts the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated, provided that the resource property Failover_enabled is set to TRUE.

Probing Algorithm and Functionality for the N1 Grid Service Provisioning System Local Distributor

The probing of the Local Distributor is done by pmf only.

The following steps are executed to monitor the N1 Grid Service Provisioning System Local Distributor.

If the Local Distributor process has died, pmf immediately restarts the N1 Grid Service Provisioning System Local Distributor.
If the N1 Grid Service Provisioning System Local Distributor is repeatedly restarted and exhausts the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated, provided that the resource property Failover_enabled is set to TRUE.
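
The restart-or-failover rule that the Master Server, Remote Agent, and Local Distributor share can be illustrated with a small sliding-window counter. The sketch below is an assumption about how Retry_count, Retry_interval, and Failover_enabled interact, not the cluster framework's actual accounting. The class name RestartPolicy and the "give_up" outcome (for the case where failover is disabled, which this section does not describe) are hypothetical.

```python
from collections import deque


class RestartPolicy:
    def __init__(self, retry_count, retry_interval, failover_enabled=True):
        self.retry_count = retry_count
        self.retry_interval = retry_interval  # seconds
        self.failover_enabled = failover_enabled
        self._restarts = deque()  # timestamps of recent restarts

    def on_failure(self, now):
        """Decide what to do about a failure observed at time `now` (seconds)."""
        # Forget restarts that fall outside the Retry_interval window.
        while self._restarts and now - self._restarts[0] >= self.retry_interval:
            self._restarts.popleft()
        if len(self._restarts) < self.retry_count:
            self._restarts.append(now)
            return "restart"
        # Retry_count is exhausted within Retry_interval.
        return "failover" if self.failover_enabled else "give_up"
```

For example, with retry_count=2 and retry_interval=300, failures at 0 and 10 seconds each lead to a restart, while a third failure at 20 seconds leads to a failover.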