Oracle® Solaris Cluster Data Service for IBM WebSphere MQ Guide

Updated: September 2015

Understanding the HA for WebSphere MQ Fault Monitor

This section describes the probing algorithm and functionality of the HA for WebSphere MQ fault monitor, and states the conditions and recovery actions that are associated with unsuccessful probing.

For conceptual information on fault monitors, see the Oracle Solaris Cluster 4.3 Concepts Guide.

Resource Properties

The HA for WebSphere MQ fault monitor uses the same resource properties as the resource type SUNW.gds. Refer to the SUNW.gds(5) man page for a complete list of the resource properties used.

Probing Algorithm and Functionality

The HA for WebSphere MQ fault monitor is governed by extension properties that control the frequency and timing of its probes. The default values of these properties determine the preset behavior of the fault monitor, which should be suitable for most Oracle Solaris Cluster installations. Therefore, tune the HA for WebSphere MQ fault monitor only if you need to modify this preset behavior, by performing the following tasks (an example command follows the list):

  • Setting the interval between fault monitor probes (Thorough_probe_interval)

  • Setting the time-out for fault monitor probes (Probe_timeout)

  • Setting the number of times the fault monitor attempts to restart the resource (Retry_count)
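
For example, the following command lengthens the probe interval and time-out for a queue manager resource. The resource name wmq-qmgr-rs and the values shown are illustrative only, not defaults:

# clresource set -p Thorough_probe_interval=120 -p Probe_timeout=60 wmq-qmgr-rs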

The HA for WebSphere MQ fault monitor checks the queue manager and other components in an infinite loop. During each cycle, the fault monitor checks the relevant component and reports either success or failure.

If the probe succeeds, the fault monitor returns to its infinite loop and continues with the next cycle of probing and sleeping.

If the fault monitor reports a failure, a request is made to the cluster to restart the resource. Each subsequent failure that the fault monitor reports results in another restart request.

If the number of successive restarts exceeds Retry_count within Thorough_probe_interval, a request is made to fail the resource group over to a different node.
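
To review the values that govern this restart-and-failover decision for a particular resource, you can use a command such as the following (the resource name wmq-qmgr-rs is illustrative):

# clresource show -p Retry_count,Thorough_probe_interval wmq-qmgr-rs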

Operations of the Queue Manager Probe

The IBM MQ queue manager probe checks the queue manager by using a program named create_tdq, which is included in the HA for WebSphere MQ data service.

The create_tdq program connects to the queue manager, creates a temporary dynamic queue, puts a message on that queue, and then disconnects from the queue manager.
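
Although create_tdq itself is supplied with the data service, a similar check can be approximated by hand with the IBM MQ sample programs. The following sketch is illustrative only: it assumes that the samples are installed in /opt/mqm/samp/bin and that the queue manager is named qmgr1. Because SYSTEM.DEFAULT.MODEL.QUEUE is a temporary dynamic model queue, opening it for output creates a temporary dynamic queue that is deleted again when the program disconnects:

# su - mqm -c "echo 'probe message' | /opt/mqm/samp/bin/amqsput SYSTEM.DEFAULT.MODEL.QUEUE qmgr1"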

Operations of the Channel Initiator, Command Server, Listener and Trigger Monitor Probes

The IBM MQ probes for the channel initiator, command server, listener and trigger monitor all operate in a similar manner: each probe simply restarts its component if that component has failed.

The process monitor facility requests a restart of the resource as soon as the component fails.

The channel initiator, command server and trigger monitor all depend on the queue manager being available. The listener has an optional dependency on the queue manager, which is set when the listener resource is configured and registered. Therefore, if the queue manager fails, the channel initiator, command server, trigger monitor and any dependent listener are restarted when the queue manager becomes available again. Such a dependency is expressed through the standard Resource_dependencies resource property, as shown in the following example.
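
For example, the following command makes a trigger monitor resource depend on a queue manager resource. Both resource names, wmq-trm-rs and wmq-qmgr-rs, are illustrative only:

# clresource set -p Resource_dependencies=wmq-qmgr-rs wmq-trm-rs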