Sun Cluster Data Service for Oracle Application Server Guide for Solaris OS

Understanding the Sun Cluster HA for Oracle Application Server Fault Monitor

This section describes the probing algorithm and functionality of the Sun Cluster HA for Oracle Application Server fault monitor, and states the conditions and recovery actions associated with unsuccessful probing.

For conceptual information on fault monitors, see the Sun Cluster Concepts Guide.

Resource Properties

The Sun Cluster HA for Oracle Application Server fault monitor uses the same resource properties as resource type SUNW.gds. Refer to the SUNW.gds(5) man page for a complete list of resource properties used.

Probing Algorithm and Functionality

The Sun Cluster HA for Oracle Application Server fault monitor is controlled by the extension properties that control the probing frequency. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Sun Cluster installations. Therefore, tune the Sun Cluster HA for Oracle Application Server fault monitor only if you need to modify this preset behavior. Tuning the fault monitor involves the following tasks:

Setting the interval between fault monitor probes (Thorough_probe_interval)
Setting the timeout for fault monitor probes (Probe_timeout)
Setting the number of times the fault monitor attempts to restart the resource (Retry_count)

The Sun Cluster HA for Oracle Application Server fault monitor checks the broker and other components within an infinite loop. During each cycle, the fault monitor checks the relevant component and reports either a success or a failure.

If the probe succeeds, the fault monitor returns to its infinite loop and continues with the next cycle of probing and sleeping.

If the fault monitor reports a failure, it requests that the cluster restart the resource. If the fault monitor then reports another failure, it makes another restart request. This behavior continues for every failure that the fault monitor reports.

If successive restarts exceed the Retry_count within the Thorough_probe_interval, the fault monitor requests that the resource group fail over to a different node or zone.
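
The restart and failover decision described above can be sketched as follows. This is only an illustrative model in Python, not the fault monitor's actual implementation. The class name RestartPolicy, its method, the action strings, and the sliding time window used to count restarts are all assumptions made for the example; window_seconds stands for the interval over which the text says restarts are counted.

```python
from collections import deque


class RestartPolicy:
    """Hypothetical model of the restart-versus-failover decision."""

    def __init__(self, retry_count, window_seconds):
        self.retry_count = retry_count
        self.window_seconds = window_seconds
        self._restarts = deque()  # timestamps of recent restart requests

    def on_probe_result(self, ok, now):
        """Return the action requested for one probe result at time `now`."""
        if ok:
            return "none"  # probe succeeded: sleep, then probe again
        # Forget restart requests that fall outside the counting window.
        while self._restarts and now - self._restarts[0] > self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.retry_count:
            # One more restart would exceed retry_count: fail over instead.
            self._restarts.clear()
            return "failover"
        self._restarts.append(now)
        return "restart"


policy = RestartPolicy(retry_count=2, window_seconds=300)
probe_results = [(0, True), (60, False), (120, False), (180, False)]
actions = [policy.on_probe_result(ok, t) for t, ok in probe_results]
print(actions)  # ['none', 'restart', 'restart', 'failover']
```

With a retry count of 2, the first two failures each produce a restart request, and the third failure inside the window produces a failover request.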

Operations of the Oracle 9iAS Application Server Probe

The Oracle 9iAS Application Server probe performs the following tests:

Test whether the OIDMON process is running. If this test fails, the probe restarts the OIDMON resource.
Test whether the directory service is available by running $ORACLE_HOME/bin/ldapsearch. If this test fails, the probe issues a half failure instead of restarting immediately, because the Oracle Internet Directory Monitor (OIDMON) process usually restarts the Oracle Internet Directory process (OIDLDAP) itself. If the test fails again at the next probe cycle, the probe issues another half failure. When two successive probes each issue a half failure, the probe restarts the OIDLDAP resource.
Test whether each managed OPMN component reported by $ORACLE_HOME/dcm/bin/dcmctl getstate -v is Up. If this test fails, the probe tries to restart the OPMN component. In practice, however, the OPMN process is responsible for restarting these components. If the probe tries to restart an OPMN component that the OPMN process has already tried to start, the duplicate restart is simply ignored.
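
The half-failure handling in the directory service test can be sketched as follows. This is a minimal illustration in Python, not the probe's real code. The function name oidldap_actions and the action strings are hypothetical, and the sketch assumes that a successful ldapsearch between two failures resets the half-failure count, which is how "successive probes" is read here.

```python
def oidldap_actions(ldapsearch_results):
    """Map successive ldapsearch outcomes (True = success) to probe actions."""
    actions = []
    half_failures = 0
    for ok in ldapsearch_results:
        if ok:
            half_failures = 0  # assumption: a success breaks the run of failures
            actions.append("ok")
            continue
        half_failures += 1
        if half_failures >= 2:
            # Two half failures from successive probes: restart OIDLDAP.
            half_failures = 0
            actions.append("restart")
        else:
            # First failure: give OIDMON a chance to restart OIDLDAP itself.
            actions.append("half failure")
    return actions


print(oidldap_actions([True, False, True, False, False]))
# ['ok', 'half failure', 'ok', 'half failure', 'restart']
```

An isolated failure therefore never causes a restart on its own; only a failure that persists into the next probe cycle does.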

Operations of the Oracle Application Server 10g Probe

The Oracle Application Server 10g probe performs the following tests:

Test whether OPMN is working by running $ORACLE_HOME/opmn/bin/opmnctl status. If this test fails, the probe reports an error and requests a restart.
Test whether the EM status is "EMD is up and running". If this test fails, the probe restarts the EM resource.
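
The two tests above amount to running a command within a time limit and inspecting its result. The following Python sketch shows one way such a check could be structured. It is not the probe's actual code: the helper names run_check and em_is_up are hypothetical, and treating a non-zero exit status or a timeout as a failure is an assumption, not documented behavior of opmnctl or of the EM status command.

```python
import subprocess
import sys

EM_UP_TEXT = "EMD is up and running"  # status text quoted in this section


def run_check(argv, timeout_seconds):
    """Run one probe command; return (succeeded, captured stdout)."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_seconds)
    except (OSError, subprocess.TimeoutExpired):
        return False, ""  # command missing, or slower than the timeout
    # Assumption: exit status 0 means the check passed.
    return done.returncode == 0, done.stdout


def em_is_up(status_text):
    """Return True if the EM status output contains the expected text."""
    return EM_UP_TEXT in status_text


# Stand-in command so that the sketch runs anywhere. A real probe would run
# $ORACLE_HOME/opmn/bin/opmnctl status and the EM status command instead.
ok, out = run_check([sys.executable, "-c", "print('EMD is up and running')"], 30)
print(ok, em_is_up(out))
```

The timeout argument plays the role that Probe_timeout plays for the fault monitor: a command that does not finish in time is counted as a failed check.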