Understanding the HA for Oracle E-Business Suite up to Release 12.1 Fault Monitor

This section describes the probing algorithm and functionality of the HA for Oracle E-Business Suite up to release 12.1 fault monitor, and states the conditions and recovery actions associated with unsuccessful probing.

For conceptual information about fault monitors, see the Oracle Solaris Cluster 4.3 Concepts Guide.

Resource Properties

The HA for Oracle E-Business Suite up to release 12.1 fault monitor uses the same resource properties as resource type SUNW.gds. Refer to the SUNW.gds(5) man page for a complete list of resource properties used.

Probing Algorithm and Functionality

The HA for Oracle E-Business Suite up to release 12.1 fault monitor is controlled by the extension properties that control the probing frequency. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Oracle Solaris Cluster installations. Therefore, tune the fault monitor only if you need to modify this preset behavior. Tuning involves the following tasks:

Setting the interval between fault monitor probes (Thorough_probe_interval)
Setting the timeout for fault monitor probes (Probe_timeout)
Setting the number of times the fault monitor attempts to restart the resource (Retry_count)

The HA for Oracle E-Business Suite up to release 12.1 fault monitor performs a check within an infinite loop. During each cycle, the fault monitor checks the relevant component and reports either a failure or success.

If the fault monitor is successful, it returns to its infinite loop and continues the next cycle of probing and sleeping.

If the fault monitor reports a failure, a request is made to the cluster to restart the resource. If the fault monitor reports another failure, another request is made to the cluster to restart the resource. This behavior continues whenever the fault monitor reports a failure.

If successive restarts exceed the Retry_count within the Retry_interval, a request is made to fail over the resource group onto a different node or zone.
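The restart and failover behavior described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the fault monitor's actual implementation: the function name run_probe_cycles, its parameters, and the action strings are hypothetical, and a simulated clock stands in for the sleep between probes.

```python
from collections import deque
from typing import Callable, Deque, List


def run_probe_cycles(
    probe: Callable[[], bool],
    cycles: int,
    probe_interval: float,
    retry_count: int,
    window: float,
) -> List[str]:
    """Simulate probe cycles and return the action taken in each cycle.

    All names here are hypothetical; `window` is the period (in seconds)
    within which restart requests are counted against `retry_count`.
    """
    restart_times: Deque[float] = deque()  # times of recent restart requests
    actions: List[str] = []
    now = 0.0  # simulated clock; a real monitor would sleep between probes

    for _ in range(cycles):
        if probe():
            # Successful probe: nothing to do until the next cycle.
            actions.append("ok")
        else:
            # Forget restart requests that fall outside the counting window.
            while restart_times and now - restart_times[0] > window:
                restart_times.popleft()
            if len(restart_times) >= retry_count:
                # One more restart would exceed retry_count: fail over instead.
                actions.append("failover")
                restart_times.clear()
            else:
                restart_times.append(now)
                actions.append("restart")
        now += probe_interval
    return actions


# Example: one success, three consecutive failures, then a success.
results = iter([True, False, False, False, True])
print(run_probe_cycles(lambda: next(results), 5, 60.0, 2, 300.0))
# prints ['ok', 'restart', 'restart', 'failover', 'ok']
```

In this sketch, two restarts are allowed inside the window, so the third consecutive failure produces a failover request rather than another restart.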

Concurrent Manager Probe

Test whether at least one FND (Concurrent Manager) process is running. If this test fails, the probe restarts the Concurrent Manager Server resource.
Test whether the probe can still connect to the Oracle Database. If this test fails, the probe restarts the Concurrent Manager Server resource.
Calculate the number of concurrent processes running as a percentage of the maximum number of concurrent processes allowed. Then test whether that percentage is less than CON_LIMIT, the value that was specified when the Concurrent Manager Server resource was defined. If the percentage is less than CON_LIMIT, the probe restarts the Concurrent Manager Server resource.
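The three Concurrent Manager checks can be expressed as a single decision function. The following is a minimal sketch: the function name, its arguments, and the returned strings are hypothetical, the inputs are assumed to have been gathered elsewhere (for example, from a process listing and a database connection test), and the CON_LIMIT value of 70 in the usage lines is purely illustrative.

```python
def concurrent_manager_probe(
    fnd_processes: int,
    db_connect_ok: bool,
    running: int,
    max_allowed: int,
    con_limit: float,
) -> str:
    """Return "ok" or a "restart: ..." reason, following the three checks.

    Hypothetical helper: `con_limit` is a percentage, as CON_LIMIT is
    compared against a percentage in the description above.
    """
    # Check 1: at least one FND (Concurrent Manager) process must be running.
    if fnd_processes < 1:
        return "restart: no FND process running"

    # Check 2: the probe must still be able to connect to the database.
    if not db_connect_ok:
        return "restart: cannot connect to database"

    # Check 3: running processes as a percentage of the allowed maximum.
    if max_allowed <= 0:
        raise ValueError("max_allowed must be positive")
    percentage = 100 * running / max_allowed
    if percentage < con_limit:
        return f"restart: {percentage:.0f}% running is below CON_LIMIT"

    return "ok"


print(concurrent_manager_probe(3, True, 8, 10, 70))
print(concurrent_manager_probe(3, True, 5, 10, 70))
```

Note that a percentage exactly equal to CON_LIMIT passes in this sketch, because the condition for a restart is "less than".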

Forms Server in Servlet Mode Probe

Test whether the f60srvm process is running. If f60srvm is found, test whether the f60webmx process is running. If f60webmx is not found, the probe does not restart the resource immediately, because f60srvm usually restarts f60webmx. Instead, the probe retests on its next iteration to determine whether f60webmx is still missing. If f60webmx is still missing after two successive probes, or if f60srvm is not found on any probe, the probe restarts the Forms Server resource.
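Because this probe tolerates one iteration in which f60webmx is missing, it needs to remember the outcome of the previous iteration. A minimal sketch of that state handling follows. The class name, method name, and result strings are hypothetical, and the two boolean inputs are assumed to come from a process check performed elsewhere.

```python
class FormsServletProbe:
    """Hypothetical model of the two-probe rule for f60srvm and f60webmx."""

    def __init__(self) -> None:
        # True if f60webmx was already missing on the previous probe.
        self.webmx_missing_last_probe = False

    def check(self, srvm_running: bool, webmx_running: bool) -> str:
        """Return "ok", "retest", or "restart" for one probe iteration."""
        if not srvm_running:
            # f60srvm missing on any probe: restart immediately.
            self.webmx_missing_last_probe = False
            return "restart"
        if webmx_running:
            # Both processes found: clear any earlier warning state.
            self.webmx_missing_last_probe = False
            return "ok"
        if self.webmx_missing_last_probe:
            # f60webmx missing on two successive probes: restart.
            self.webmx_missing_last_probe = False
            return "restart"
        # First miss: give f60srvm a chance to restart f60webmx.
        self.webmx_missing_last_probe = True
        return "retest"


probe = FormsServletProbe()
print(probe.check(True, False))  # first miss
print(probe.check(True, False))  # second successive miss
```

In this sketch the first call yields "retest" and the second yields "restart"; a successful probe in between would reset the state.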

Forms Server in Socket Mode Probe

Test whether the frmsrv process is running. If this test fails, the probe restarts the Forms Server in Socket Mode resource.