Tuning the Oracle Essbase Server Fault Monitors

This section describes the Oracle Essbase Server fault monitor's probing algorithm and functionality, and states the conditions, messages, and recovery actions associated with unsuccessful probing.

For conceptual information about fault monitors, see the Oracle Solaris Cluster 4.3 Concepts Guide.

Resource Properties

The Oracle Essbase Server fault monitor uses the resource properties that are specified in the resource type ORCL.essbase. See the r_properties(5) man page for a list of general resource properties used. See “ORCL.essbase Extension Properties” on page 43 for a list of resource properties for this resource type.

Probing Algorithm and Functionality

The Oracle Essbase Server fault monitor is controlled by properties that determine the probing frequency. The default values of these properties determine the preset behavior of the fault monitor and are suitable for most Oracle Solaris Cluster installations.

You can modify this preset behavior by modifying the following settings:

The interval between fault monitor probes (Thorough_probe_interval).
The timeout for fault monitor probes (Probe_timeout).
The number of times the fault monitor attempts to restart the resource (Retry_count).

The HA for Oracle Essbase Server fault monitor checks the server status within an infinite loop. During each cycle, the fault monitor checks the server state and reports failure or success.
If the fault monitor is successful, it returns to its infinite loop and continues the next cycle of probing and sleeping.
If the fault monitor reports a failure, a request is made to the cluster to restart the resource. Each subsequent failure triggers another restart request. If successive restarts exceed the Retry_count within the Retry_interval, a request is made to fail over the resource group onto a different node.
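
The settings listed above are typically changed with the clresource set command. The following is a minimal sketch, not a definitive procedure: the resource name essbase-rs, the property values, and the set_probe_tuning helper are hypothetical, and the CLRESOURCE variable exists only so that the command can be replaced with a stub for a dry run.

```shell
# Command used to modify resource properties. Override CLRESOURCE
# (for example, CLRESOURCE=echo) to preview the command without running it.
CLRESOURCE="${CLRESOURCE:-clresource}"

# set_probe_tuning RESOURCE THOROUGH_PROBE_INTERVAL PROBE_TIMEOUT RETRY_COUNT
# Hypothetical helper: applies the three tunable settings in one call.
set_probe_tuning() {
    "$CLRESOURCE" set \
        -p Thorough_probe_interval="$2" \
        -p Probe_timeout="$3" \
        -p Retry_count="$4" \
        "$1"
}

# Example (assumed resource name and values; run as a suitably privileged user):
#   set_probe_tuning essbase-rs 120 60 2
```

Setting CLRESOURCE=echo before calling the helper prints the arguments that would be passed, which is a low-risk way to review a change before applying it to a live resource.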

Operation of the Oracle Essbase Server Probe

The following list explains how the Oracle Essbase Server probe operates:

Because the Oracle Essbase Server component is under the control of the Oracle Process Management and Notification (OPMN) Server component, the ORCL.opmn fault probe obtains the status of the Oracle Essbase Server component from the OPMN Server component. The following commands show how the status is obtained:
1. Check whether an Oracle Essbase Server component is found in the output of the following command:
```
$ opmnctl status ias-component=COMPONENT-INSTANCE | grep Ess
```
2. Check whether the status of the Oracle Essbase Server component is ALIVE.
```
$ opmnctl status ias-component=COMPONENT-INSTANCE -noheaders -fmt "%sta"
```
If the fault probe is successful, the resource status is set to OK. The probe returns with an exit code of 0.
If the fault probe fails, the resource status is set to FAULTED. The probe returns with an exit code of 100, causing the resource to attempt to restart.
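
The two checks and the exit codes described above can be combined into a single function. This is an illustrative sketch, not the data service's actual probe script: the probe_essbase name and the OPMNCTL override variable are hypothetical, and the case-insensitive match on the status string is an assumption about the opmnctl output format.

```shell
# Command used to query OPMN. Override OPMNCTL to substitute a stub for testing.
OPMNCTL="${OPMNCTL:-opmnctl}"

# probe_essbase COMPONENT-INSTANCE
# Returns 0 if the component is found and reported alive, 100 otherwise.
probe_essbase() {
    component="$1"

    # Check 1: an Essbase component must appear in the status output.
    if ! "$OPMNCTL" status ias-component="$component" | grep Ess >/dev/null; then
        return 100
    fi

    # Check 2: the reported status must be ALIVE (compared case-insensitively,
    # which is an assumption about how opmnctl prints the status).
    status=$("$OPMNCTL" status ias-component="$component" -noheaders -fmt "%sta" \
        | tr '[:lower:]' '[:upper:]')
    case "$status" in
        *ALIVE*) return 0 ;;
        *)       return 100 ;;
    esac
}
```

Because the opmnctl command is looked up through a variable, the function can be exercised against a stub that prints canned output, without a running OPMN server.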

Note - The Oracle Essbase Server probe also checks for the strong and offline restart resource dependencies on a database resource, an ORCL.opmn resource, and a storage resource, respectively. If the database resource is not in the online state, the ORCL.essbase resource acquires a degraded status.
If the Oracle Essbase Server resource is repeatedly restarted and subsequently exhausts the Retry_count within the Retry_interval, and if Failover_enabled is set to True, a failover to another node is initiated for the resource group.
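
The restart-versus-failover rule can be modeled as a count of recent restarts inside a sliding window. The sketch below is a simplified illustration under stated assumptions, not the cluster framework's implementation: the decide_action function and its arguments are hypothetical, timestamps are plain seconds, Retry_count is treated as exhausted once the number of restarts inside Retry_interval reaches it, and the no-failover result is only a placeholder because this section does not describe what happens when Failover_enabled is not True.

```shell
# decide_action RETRY_COUNT RETRY_INTERVAL FAILOVER_ENABLED NOW [RESTART_TIME...]
# Prints "restart", "failover", or "no-failover" (placeholder outcome).
decide_action() {
    retry_count="$1"
    retry_interval="$2"
    failover_enabled="$3"
    now="$4"
    shift 4

    # Count only the restarts that happened within the last RETRY_INTERVAL seconds.
    recent=0
    for t in "$@"; do
        if [ $((now - t)) -lt "$retry_interval" ]; then
            recent=$((recent + 1))
        fi
    done

    if [ "$recent" -lt "$retry_count" ]; then
        echo restart          # restarts remain available in this window
    elif [ "$failover_enabled" = "True" ]; then
        echo failover         # Retry_count exhausted and failover is allowed
    else
        echo no-failover      # placeholder: behavior not described in this section
    fi
}
```

For example, with a Retry_count of 2 and a Retry_interval of 300 seconds, a failure at time 1000 after restarts at times 800 and 900 leads to a failover, whereas a restart at time 100 has aged out of the window and no longer counts.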