Sun Cluster Data Service for Oracle Application Server Guide for Solaris OS

Understanding the Sun Cluster HA for Oracle Application Server Fault Monitor

This section describes the probing algorithm and functionality of the Sun Cluster HA for Oracle Application Server fault monitor.

For conceptual information on fault monitors, see the Sun Cluster Concepts Guide.

Resource Properties

The Sun Cluster HA for Oracle Application Server fault monitor uses the same resource properties as the SUNW.gds resource type. Refer to the SUNW.gds(5) man page for a complete list of the resource properties that are used.

Probing Algorithm and Functionality

Oracle Internet Directory Monitor (OIDMON)

Note –
This test is performed only for the Oracle 9iAS Infrastructure.
- Sleeps for Thorough_probe_interval.
- Tests whether the OIDMON process is running. If this test fails, the probe restarts the OIDMON resource.
- If repeated restarts of the OIDMON resource exhaust the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated.
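
The restart-or-failover decision described above can be sketched as a sliding window over recent restarts. This is only an illustration of the idea, not the cluster framework's implementation: the class name RestartTracker and its method are hypothetical, and the exact boundary (whether the failover happens on the restart that reaches Retry_count or on the one after it) is an assumption.

```python
from collections import deque


class RestartTracker:
    """Tracks restarts and decides when to fail over (illustrative only)."""

    def __init__(self, retry_count, retry_interval):
        self.retry_count = retry_count        # restarts allowed per window
        self.retry_interval = retry_interval  # window length in seconds
        self.restarts = deque()               # timestamps of recent restarts

    def on_probe_failure(self, now):
        """Return 'restart' or 'failover' for a failed probe at time `now`."""
        # Forget restarts that have aged out of the window.
        while self.restarts and now - self.restarts[0] > self.retry_interval:
            self.restarts.popleft()
        # Assumption: once retry_count restarts sit inside the window,
        # the next failure triggers a failover instead of a restart.
        if len(self.restarts) >= self.retry_count:
            return "failover"
        self.restarts.append(now)
        return "restart"
```

With retry_count=2 and retry_interval=300, failures at 0 and 60 seconds each lead to a restart and a third failure at 120 seconds leads to a failover. A failure long after the window has passed is treated as a fresh restart.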

Oracle Internet Directory Process (OIDLDAP)

Note –
This test is performed only for the Oracle 9iAS Infrastructure.
- Sleeps for Thorough_probe_interval.
- Tests whether the directory service is available by running $ORACLE_HOME/bin/ldapsearch. If this test fails, the probe reports a half failure rather than a full failure, because the Oracle Internet Directory Monitor (OIDMON) process usually restarts the Oracle Internet Directory Process (OIDLDAP) itself. If the test fails again at the next probe cycle, the probe reports another half failure. When successive probes report two half failures, the probe restarts the OIDLDAP resource.
- If repeated restarts of the OIDLDAP resource exhaust the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated. In practice a failover is very unlikely, because the OIDLDAP probe reports only a half failure each time the test fails and the OIDMON process usually restarts the OIDLDAP process first.
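
The half-failure behaviour above can be sketched as a small accumulator. The sketch rests on assumptions that the text does not confirm: the weights 50 and 100 are chosen only so that two half failures add up to one full failure, the function name run_probe_cycles is hypothetical, and resetting the count after a successful probe is one reading of "successive probes".

```python
HALF_FAILURE = 50     # assumed weight of one half failure
FULL_FAILURE = 100    # assumed threshold that triggers a restart


def run_probe_cycles(results):
    """Yield the action taken after each probe result.

    `results` is an iterable of booleans: True if the ldapsearch test
    succeeded in that cycle, False if it failed.
    """
    accumulated = 0
    for ok in results:
        if ok:
            accumulated = 0          # a success breaks the run of failures
            yield "ok"
            continue
        accumulated += HALF_FAILURE
        if accumulated >= FULL_FAILURE:
            accumulated = 0          # restart requested; start counting afresh
            yield "restart"
        else:
            yield "half-failure"
```

A single failed cycle followed by a success never reaches the threshold, which matches the case where OIDMON has already brought OIDLDAP back before the next probe runs.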

Oracle Process Management and Notification (OPMN)
- Sleeps for Thorough_probe_interval.
- Tests whether the OPMN process is running. If this test fails, the probe restarts the OPMN resource.
- For Oracle 9iAS, tests whether each managed OPMN component reported by $ORACLE_HOME/dcm/bin/dcmctl getstate -v is Up. If this test fails, the probe tries to (re)start the OPMN component. In practice the OPMN process is itself responsible for restarting these components, so if the OPMN process has already tried to start the component, the probe's duplicate (re)start is simply ignored.
- For Oracle 10g AS, tests whether OPMN is working by running $ORACLE_HOME/opmn/bin/opmnctl status. If this test fails, the probe reports an error and requests a restart.
- If repeated restarts of the OPMN resource exhaust the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated.
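
A command-based test such as the Oracle 10g AS check can be sketched as a helper that runs a command and reports whether it succeeded. The helper name probe_command_ok and the 30-second default timeout are hypothetical, and treating a nonzero exit status from opmnctl status as a failure is an assumption. The actual probe may inspect the command output instead.

```python
import subprocess


def probe_command_ok(argv, timeout=30):
    """Return True if the command exits with status 0 within `timeout` seconds."""
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung command counts as a failed probe.
        return False
    return completed.returncode == 0
```

For example, a caller might pass the full path to opmnctl under ORACLE_HOME followed by "status" as the argument list, and request a restart when the helper returns False.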

Enterprise Manager (EM)
- Sleeps for Thorough_probe_interval.
- Tests whether the EM process is running. If this test fails, the probe restarts the EM resource.
- Tests whether the EM status is reported as "EMD is up and running". If this test fails, the probe restarts the EM resource.
- If repeated restarts of the EM resource exhaust the Retry_count within the Retry_interval, a failover of the resource group to another node is initiated.
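
The two EM tests can be combined into one check, sketched below. The function name em_is_healthy is hypothetical, and the sketch assumes the caller has already determined whether the EM process is running and has captured the text of the EM status report. How that text is obtained is not covered here.

```python
EXPECTED_STATUS = "EMD is up and running"


def em_is_healthy(process_running, status_output):
    """Apply both EM tests: process present, then status text present."""
    if not process_running:
        return False
    # Assumption: a substring match on the status text is sufficient.
    return EXPECTED_STATUS in status_output
```

If either test fails, the function returns False and the caller would request a restart of the EM resource.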