Tuning the Sun Cluster HA for Sun Java System Message Queue Fault Monitor

The Sun Cluster HA for Sun Java System Message Queue fault monitor is contained in the resource that represents Sun Java System Message Queue. You create this resource when you register and configure Sun Cluster HA for Sun Java System Message Queue. For more information, see Registering and Configuring the Sun Cluster HA for Sun Java System Message Queue.

System properties and extension properties of this resource control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Sun Cluster installations. Therefore, you should tune the Sun Cluster HA for Sun Java System Message Queue fault monitor only if you need to modify this preset behavior.

For more information, see the following sections.

Operations by the Fault Monitor During a Probe

The Sun Cluster HA for Sun Java System Message Queue fault monitor uses the Smooth_shutdown extension property. For instructions on setting this property, see Setting Sun Cluster HA for Sun Java System Message Queue Extension Properties.

The Sun Cluster HA for Sun Java System Message Queue probe sends a request to the server to query the health of the Sun Java System Message Queue server instance.

The probe connects to each IP address and port combination defined by the network resource configuration and the Port_list setting for the resource group. If the connection succeeds, the probe reads the port mapper information. Finally, the probe disconnects. If any part of this sequence fails, a failure is recorded.
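The connect, read, and disconnect sequence described above can be sketched in Python as follows. This is only an illustration, not the data service's actual probe. The function name probe_endpoint, the status strings, the default resource name, and the assumption that the server sends data as soon as a client connects are all hypothetical. The message texts are the ones quoted in this section.

```python
import socket


def probe_endpoint(host, port, resource="mq-rs", timeout=5.0):
    """Connect, read once, disconnect. Returns (status, message).

    status is "ok", "partial", or "complete" (hypothetical labels).
    The timeout argument stands in for Probe_timeout in this sketch.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # No connection at all: treated as a complete probe failure.
        return ("complete",
                "Failed to connect to the host <%s> and port <%d>." % (host, port))

    status, message = "ok", ""
    try:
        # Assumption: the server sends its port mapper data on connect.
        if not sock.recv(4096):
            status = "partial"
            message = ("Failed to communicate with server %s port %d: %s"
                       % (host, port, "connection closed before any data"))
    except OSError as exc:  # includes read timeouts
        status = "partial"
        message = ("Failed to communicate with server %s port %d: %s"
                   % (host, port, exc))

    try:
        sock.close()
    except OSError:
        status = "partial"
        message = ("Failed to disconnect from port %d of resource %s."
                   % (port, resource))
    return status, message
```

Calling probe_endpoint once for each probed IP address and port combination yields one status for each combination, which a caller could then feed into its failure history.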

Heavy network traffic, heavy system load, and misconfiguration can cause the query to fail. Misconfiguration can occur if you did not configure the Sun Java System Message Queue server to listen on all the IP address and port combinations that are probed. The Sun Java System Message Queue server should service every port for every IP address that is specified for this resource.
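To check a configuration against the rule above, it can help to list every IP address and port combination that would be probed. The following Python sketch does that under stated assumptions: the helper name probe_targets is hypothetical, the addresses are placeholders, and the Port_list value is assumed to be a comma-separated list of port/protocol entries such as 7676/tcp.

```python
from itertools import product


def probe_targets(addresses, port_list):
    """Return every (address, port) pair implied by the inputs.

    port_list is assumed to look like "7676/tcp,7677/tcp".
    Entries with a protocol other than tcp are skipped in this sketch.
    """
    ports = []
    for entry in port_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        port, _, protocol = entry.partition("/")
        if protocol and protocol != "tcp":
            continue
        ports.append(int(port))
    # Every port is paired with every address.
    return list(product(addresses, ports))
```

For two addresses and two ports, the result contains four pairs. The server should be listening on each of them.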

When the probe fails to connect to the server, a complete probe failure occurs. The following error message is sent, where %s indicates the hostname and %d indicates the port number.

Failed to connect to the host <%s> and port <%d>.

The probe accumulates partial failures that occur within the interval set by the Retry_interval resource property. When the accumulated partial failures add up to a complete failure, the fault monitor takes action.

The following are partial probe failures.

- Failure to disconnect. The following error message is sent, where %d indicates the port number and %s indicates the resource name.
    Failed to disconnect from port %d of resource %s.
- Failure to complete all probe steps within the time that Probe_timeout specifies.
- Failure to read data from the server for other reasons. The following error message is sent, where the first %s indicates the hostname, %d indicates the port number, and the second %s indicates further details about the error.
    Failed to communicate with server %s port %d: %s

Based on the history of failures, a failure can cause either a local restart or a failover of the data service.
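The way partial failures add up within Retry_interval, and the way the failure history leads to a restart or a failover, can be illustrated with a small Python model. This is a sketch under assumptions, not the fault monitor's implementation. The class name FailureHistory and the weights 100 and 50 are hypothetical. The rule that a failover replaces a restart once Retry_count restarts have already happened within Retry_interval is also an assumption, since this section does not define the exact rule.

```python
from collections import deque

COMPLETE_WEIGHT = 100  # assumed weight of a complete failure
PARTIAL_WEIGHT = 50    # assumed weight of one partial failure


class FailureHistory:
    def __init__(self, retry_interval, retry_count):
        self.retry_interval = retry_interval  # seconds
        self.retry_count = retry_count        # restarts allowed per interval
        self.failures = deque()               # (timestamp, weight)
        self.restarts = deque()               # timestamps of restarts

    def record(self, now, weight):
        """Record one probe result; return "none", "restart", or "failover"."""
        cutoff = now - self.retry_interval
        # Forget failures and restarts older than Retry_interval.
        while self.failures and self.failures[0][0] < cutoff:
            self.failures.popleft()
        while self.restarts and self.restarts[0] < cutoff:
            self.restarts.popleft()

        if weight > 0:
            self.failures.append((now, weight))
        if sum(w for _, w in self.failures) < COMPLETE_WEIGHT:
            return "none"

        # The accumulated failures are consumed by the action taken.
        self.failures.clear()
        if len(self.restarts) >= self.retry_count:
            return "failover"
        self.restarts.append(now)
        return "restart"
```

With retry_interval=300 and retry_count=2, two partial failures one minute apart trigger a restart in this model, while two partial failures 400 seconds apart trigger nothing because the first one has expired by the time the second arrives.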