Oracle® Solaris Cluster Data Service for Oracle E-Business Suite as of Release 12.2 Guide

Updated: April 2016
 
 

Tuning the HA for Oracle E-Business Suite 12.2 or Later Fault Monitor

This section describes the probing algorithm and functionality of the HA for Oracle E-Business Suite 12.2 or later fault monitor, and states the conditions and recovery actions associated with unsuccessful probing.

Resource Properties

The HA for Oracle E-Business Suite 12.2 or later fault monitor uses the resource properties that are specified in the resource type ORCL.ebs. Refer to the r_properties(5) man page for a list of the general resource properties that are used. Refer to the Extension Properties section of the ORCL.ebs(5) man page for the list of resource properties that are specific to this resource type.

Probing Algorithm and Functionality

The HA for Oracle E-Business Suite 12.2 or later fault monitor is controlled by extension properties that determine the probing frequency. The default values of these properties define the preset behavior of the fault monitor and are suitable for most Oracle Solaris Cluster installations. You can modify this preset behavior by changing the following settings:

  • The interval between fault monitor probes (Thorough_probe_interval)

  • The timeout for fault monitor probes (Probe_timeout)

  • The number of times the fault monitor attempts to restart the resource (Retry_count)

The HA for Oracle E-Business Suite 12.2 or later fault monitor checks services within an infinite loop. During each cycle, the fault monitor checks the service state and reports either a partial or complete failure or success.

  • If the fault monitor is successful, it returns to the infinite loop and continues the next cycle of probing and sleeping.

  • If the fault monitor reports a complete failure, a request is made to the cluster to restart the resource. Each subsequent complete failure triggers another restart request. If successive restarts exceed the Retry_count within the Thorough_probe_interval, a request is made to fail over the resource group to a different node.

  • If the fault monitor reports a partial failure, the probe's return code is added to the sum of preceding partial failures. Once the sum of successive partial failures reaches or exceeds 100, a complete failure is declared.
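The restart and partial-failure accounting described above can be sketched in a few lines. This is an illustrative model only: the function name, the action strings, and the reset-on-success behavior are assumptions for the sketch, not the data service's actual implementation.

```python
COMPLETE = 100  # a summed partial-failure score of 100 means complete failure

def evaluate_probes(codes, retry_count):
    """Given a sequence of probe return codes (0 = success, 100 = complete
    failure, other values = partial failures that are summed), return the
    list of recovery actions the monitor would request."""
    actions = []
    partial_sum = 0
    restarts = 0
    for code in codes:
        if code == 0:
            partial_sum = 0                 # success resets the running sum
            continue
        partial_sum += code
        if partial_sum >= COMPLETE:         # complete failure declared
            partial_sum = 0
            if restarts < retry_count:
                actions.append("restart")   # ask the cluster to restart
                restarts += 1
            else:
                actions.append("failover")  # restarts exhausted: fail over
                break
    return actions
```

For example, with a Retry_count of 2, three successive complete failures produce two restart requests followed by a failover request, while ten successive partial failures of 10 accumulate to a single complete failure.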

Operations of the Fault Monitor

All service fault monitors check the following:

  • That the RUN File System can be sourced by the Apps OS User.

  • That various entries from the Application Tier's context file can be retrieved. If it is not possible to perform these checks, a complete failure is declared.

The following sections describe the operations of the fault monitor.

Node Manager Fault Monitor

Checks that the Node Manager process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.

As the OS Apps User, the WebLogic Scripting Tool is used to connect to and then disconnect from the Node Manager using the Node Manager Port value (nm_port value from the Application Tier's context file).

If it is possible to successfully connect to the Node Manager, the fault monitor reports a success. Otherwise a complete failure is declared.
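The two checks above reduce to a simple decision function. In the sketch below, the two callables are placeholders (assumptions) standing in for the actual process-ID lookup and the WLST connect/disconnect attempt against nm_port:

```python
def probe_node_manager(pid_exists, wlst_connect):
    """Outcome of the Node Manager probe described above.

    pid_exists   -- callable returning True if the Node Manager PID exists
    wlst_connect -- callable returning True if a WLST connect/disconnect
                    against nm_port succeeds (placeholder for the real check)
    """
    if not pid_exists():
        return "complete failure"   # no process: complete failure
    if not wlst_connect():
        return "complete failure"   # cannot connect: complete failure
    return "success"
```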

WebLogic Admin Server Fault Monitor

Checks that the WebLogic Admin Server process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.

As the OS Apps User, the WebLogic Scripting Tool is used to connect to the Admin Server using the Admin Server Port value (wls_adminport value from the Application Tier's context file). Once connected, it obtains the state of the Admin Server and then disconnects from the Admin Server.

If it is not possible to connect using the WebLogic Scripting Tool, the ${EBS_DOMAIN_HOME}/servers/AdminServer/data/nodemanager/AdminServer.state file is used to determine the state.

If the state is RUNNING, then the fault monitor reports a success.

If the state is STARTING, the fault monitor reports a partial failure value of 10. Successive fault monitor partial failures are summed up and as soon as a value of 100 is reached, a complete failure is declared.

If the state is STANDBY, ADMIN, RESUMING, FAILED_RESTARTING, or ADMIN_ON_ABORTED_STARTUP, the fault monitor reports a partial failure value of 20. Successive fault monitor partial failures are summed up and as soon as a value of 100 is reached, a complete failure is declared.

If the state is any other value, a complete failure is declared.
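The state-to-score mapping above can be summarized as a small lookup. The sketch below is illustrative only; the data service's internal representation is not documented here, and the 100 value simply denotes an immediate complete failure.

```python
# Partial-failure score per WebLogic server state, per the rules above.
PARTIAL_SCORES = {
    "RUNNING": 0,     # success
    "STARTING": 10,   # partial failure, 10 points per probe
    "STANDBY": 20, "ADMIN": 20, "RESUMING": 20,
    "FAILED_RESTARTING": 20, "ADMIN_ON_ABORTED_STARTUP": 20,
}

def score_admin_server_state(state):
    """Map a WebLogic server state to its partial-failure score; any
    unlisted state is an immediate complete failure (100)."""
    return PARTIAL_SCORES.get(state, 100)
```

In other words, ten successive STARTING probes (10 points each), or five successive probes in one of the 20-point states, accumulate to the 100-point threshold at which a complete failure is declared.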

TNS Listener Fault Monitor

Checks that the TNS Listener process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.

As the OS Apps User, the lsnrctl status ${FNDNAM}_${dbSid} command is used to determine the status of the TNS Listener. Values for FNDNAM and dbSid are derived from the Application Tier's context file.

If the status is RUNNING, the fault monitor reports a success. Otherwise, a complete failure is declared.

Oracle Process Manager Fault Monitor

Checks that the Oracle Process Manager process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.

As the OS Apps User, the ${ohs_instance_loc}/opmnctl ping command is used to determine the status of the Oracle Process Manager. The value for ohs_instance_loc is derived from the Application Tier's context file.

If the ping is successful, the fault monitor reports a success. Otherwise a complete failure is declared.

Oracle HTTP Server Fault Monitor

Checks that Oracle HTTP process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.

As the OS Apps User, the opmnctl status ias-component=${ohs_component} command is used to determine the status of the Oracle HTTP Server. The value for ohs_component is derived from the Application Tier's context file.

If the status is ALIVE, the fault monitor reports a success.

If the status is INIT, the fault monitor reports a partial failure value of 50. Successive fault monitor partial failures are summed up and as soon as a value of 100 is reached, a complete failure is declared.

If the status is any other value, a complete failure is declared.
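The Oracle HTTP Server rule differs from the WebLogic ones only in its scores: ALIVE is a success, INIT contributes 50 points (so two successive INIT probes reach the 100-point threshold), and anything else is an immediate complete failure. A hedged sketch, with the numeric scores as the only assumption beyond the text above:

```python
def score_http_server_status(status):
    """ALIVE = success (0), INIT = partial failure of 50, and any other
    status = immediate complete failure (100)."""
    if status == "ALIVE":
        return 0
    if status == "INIT":
        return 50
    return 100
```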

Web Application Services (oacore, oafm, forms, and forms-c4ws) Fault Monitor

Checks that the Web Application Service process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.

As the OS Apps User, the WebLogic Scripting Tool is used to connect to the Web Application Service using the Web Application Service Port value (wls_service port value from the Application Tier's context file). Once connected, it obtains the state of the Web Application Service and then disconnects from the Web Application Service.

If it is not possible to connect using the WebLogic Scripting Tool, the ${EBS_DOMAIN_HOME}/servers/service_servern/data/nodemanager/service_servern.state file is used to determine the state.

If the state is RUNNING, the fault monitor reports a success.

If the state is STARTING, the fault monitor reports a partial failure value of 10. Successive fault monitor partial failures are summed up and as soon as a value of 100 is reached, a complete failure is declared.

If the state is STANDBY, ADMIN, RESUMING, FAILED_RESTARTING, or ADMIN_ON_ABORTED_STARTUP, the fault monitor reports a partial failure value of 20. Successive fault monitor partial failures are summed up and as soon as a value of 100 is reached, a complete failure is declared.

If the state is any other value, a complete failure is declared.

Concurrent Manager Fault Monitor

As the APPS User, SQL*Plus is used to connect to the database and determine the actual and target Concurrent Manager OS process IDs.

When two or more actual process IDs are found, the fault monitor reports a success. Otherwise, a complete failure is declared.
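The counting rule above is a single threshold check. In this sketch, how the actual process IDs are obtained through SQL*Plus is out of scope; the list argument is an assumed stand-in for that query result.

```python
def probe_concurrent_manager(actual_pids):
    """Success when two or more actual Concurrent Manager OS process IDs
    are found; otherwise a complete failure is declared."""
    return "success" if len(actual_pids) >= 2 else "complete failure"
```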

Forms Server Fault Monitor

Checks that the Forms Server process ID exists for the Application Tier. If the process ID does not exist, a complete failure is declared.