Sun Cluster Data Service for Sun Java System Message Queue Guide for Solaris OS

Probing Algorithm and Functionality

The Sun Cluster HA for Sun Java System Message Queue probe queries the health of the Sun Java System Message Queue server by sending a request to the server. The probe performs the following steps:

  1. Probes the Sun Java System Message Queue instance according to the time-out value set with the Probe_timeout resource property.

  2. Connects to the IP address and port combinations defined by the network resource configuration and the Port_list setting for the resource group. If the connection succeeds, the probe reads the port mapper information. Finally, the probe disconnects. If any part of this sequence fails, a failure is recorded.

    Heavy network traffic, heavy system load, and misconfiguration can cause the query to fail. Misconfiguration can occur if you did not configure the Sun Java System Message Queue server to listen on all of the IP address and port combinations that are probed. The Sun Java System Message Queue server should service every port for every IP address that is specified for this resource.

    A failure to connect to the server is a complete probe failure. The following error message is received upon failure to connect to the server. The %s indicates the hostname and %d indicates the port number.


    Failed to connect to the host <%s> and port <%d>.

  3. Accumulates partial failures that occur within the interval set by the Retry_interval resource property until they add up to a complete failure that requires action.

    The following are partial probe failures.

    • Failure to disconnect from the server. The following error message is received. The %d indicates the port number and %s indicates the resource name.


      Failed to disconnect from port %d of resource %s.

    • Failure to complete all probe steps within the time that is set by the Probe_timeout resource property.

    • Failure to read data from the server for other reasons. The following error message is received. The first %s indicates the hostname and %d indicates the port number. The second %s indicates further details about the error.


      Failed to communicate with server %s port %d: %s
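The connect, read, and disconnect sequence of step 2, together with the failure types listed above, can be sketched as follows. This is an illustration only, not the actual probe implementation. The function name probe_once, the numeric weights, and the time-out message text are hypothetical. The sketch also assumes that the server sends data as soon as a client connects, and it treats a connect attempt that times out as a partial failure, which the text above does not specify.

```python
import socket
import time

# Hypothetical weights: the text says only that partial failures
# accumulate until they equal a complete failure.
COMPLETE_FAILURE = 100
PARTIAL_FAILURE = 50


def probe_once(host, port, resource, probe_timeout):
    """Run one connect/read/disconnect cycle; return (weight, message)."""
    deadline = time.monotonic() + probe_timeout
    timeout_msg = "Probe did not complete within Probe_timeout."  # hypothetical text

    # Connect. A refused or unreachable address is a complete failure.
    try:
        sock = socket.create_connection((host, port), timeout=probe_timeout)
    except socket.timeout:
        return PARTIAL_FAILURE, timeout_msg  # assumption: time-out is partial
    except OSError:
        return COMPLETE_FAILURE, (
            "Failed to connect to the host <%s> and port <%d>." % (host, port))

    # Read whatever the server sends, within the time that is left.
    try:
        sock.settimeout(max(deadline - time.monotonic(), 0.001))
        if not sock.recv(4096):
            raise OSError("connection closed before any data arrived")
    except socket.timeout:
        sock.close()
        return PARTIAL_FAILURE, timeout_msg
    except OSError as err:
        sock.close()
        return PARTIAL_FAILURE, (
            "Failed to communicate with server %s port %d: %s" % (host, port, err))

    # Disconnect. An error here is only a partial failure.
    try:
        sock.close()
    except OSError:
        return PARTIAL_FAILURE, (
            "Failed to disconnect from port %d of resource %s." % (port, resource))
    return 0, "probe succeeded"
```

A caller would run probe_once for every IP address and port combination of the resource and pass each returned weight to whatever logic keeps the failure history.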

Based on the history of failures, a failure can cause either a local restart or a failover of the data service.
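
The way a failure history can lead to either a local restart or a failover can be sketched as follows. This is a simplified model under stated assumptions, not the cluster framework's actual logic. The class name FailureHistory, the weight of 100 for a complete failure, and the retry_count limit on local restarts are assumptions that do not come from the text above; only the Retry_interval window is taken from it.

```python
import time
from collections import deque

COMPLETE_FAILURE = 100  # hypothetical weight of one complete failure


class FailureHistory:
    """Tracks probe failures inside a sliding Retry_interval window."""

    def __init__(self, retry_interval, retry_count):
        self.retry_interval = retry_interval  # window length in seconds
        self.retry_count = retry_count        # assumed limit on local restarts
        self.failures = deque()               # (timestamp, weight) pairs
        self.restarts = deque()               # timestamps of local restarts

    def record(self, weight, now=None):
        """Record one probe result; return "none", "restart", or "failover"."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.retry_interval

        # Forget everything that happened before the window started.
        while self.failures and self.failures[0][0] < cutoff:
            self.failures.popleft()
        while self.restarts and self.restarts[0] < cutoff:
            self.restarts.popleft()

        if weight <= 0:
            return "none"
        self.failures.append((now, weight))

        # Partial failures count only once they add up to a complete failure.
        if sum(w for _, w in self.failures) < COMPLETE_FAILURE:
            return "none"
        self.failures.clear()

        # Too many recent restarts: give up on this node and fail over.
        if len(self.restarts) >= self.retry_count:
            self.restarts.clear()
            return "failover"
        self.restarts.append(now)
        return "restart"
```

In this model, with a window of 300 seconds and a limit of two restarts, two partial failures of weight 50 that arrive 10 seconds apart trigger a local restart, while a partial failure that arrives 400 seconds after the previous one triggers nothing because the earlier failure has already left the window.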