Tuning the HA for Oracle Web Tier Fault Monitors

The HA for Oracle Web Tier fault monitors are contained in the resources whose resource types are ORCL.ohs and ORCL.opmn.

System properties and extension properties of the resource control the behavior of the fault monitor. The default values of these properties determine the default behavior of the fault monitor. The default behavior should be suitable for most Oracle Solaris Cluster installations. Therefore, you should tune the HA for Oracle Web Tier fault monitors only if you need to modify this default behavior.

Tuning the HA for Oracle Web Tier fault monitors involves the following tasks:

- Setting the interval between fault monitor probes
- Setting the timeout for fault monitor probes
- Defining the criteria for persistent faults
- Specifying the failover behavior of a resource

The subsections that follow provide the information about the HA for Oracle Web Tier fault monitors that you need to perform these tasks.

Tune the HA for Oracle Web Tier fault monitor when you register and configure HA for Oracle Web Tier or after initial configuration. For more information, see Registering and Configuring HA for Oracle Web Tier Components.

Updates to the probe_timeout, start_timeout, stop_timeout, and thorough_probe_interval properties result in comparable updates in the opmn.xml file.
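As a minimal sketch of how such a property update might be issued, the following wraps the Oracle Solaris Cluster clresource set command in a small helper. The resource name ohs-rs, the helper names run and tune_probe, the DRY_RUN variable, and the values shown are all hypothetical; by default the helper only prints the command so that it can be reviewed before it is run on a cluster node.

```shell
#!/bin/sh
# Sketch only: the resource name, helper names, and values are hypothetical.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Set the probe interval and probe timeout on one resource.
tune_probe() {
    resource=$1
    interval=$2
    timeout=$3
    run clresource set \
        -p thorough_probe_interval="$interval" \
        -p probe_timeout="$timeout" \
        "$resource"
}

tune_probe ohs-rs 120 60
```

Set DRY_RUN=0 to execute the command rather than print it. Choose values that suit your installation; the numbers above are placeholders, not recommendations.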

For detailed information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster Data Services Planning and Administration Guide.

Operations by the HA for Oracle Web Tier Fault Monitors

The two resource types, ORCL.ohs and ORCL.opmn, contain separate fault probes that query the health of the Oracle HTTP Server and Oracle Process Management and Notification Server components, respectively.

Operations by the Oracle Process Management and Notification Server Fault Monitor

The ORCL.opmn fault probe for the Oracle Process Management and Notification Server component performs the following steps:

Checks that the opmnctl command exists in the /ORACLE-HOME/instances/INSTANCE-NAME/bin directory, and that the script is executable.
Checks that the opmn.xml file is valid by using the following command:
```
$ opmnctl validate
```

If either of these two checks fails, then an attempt is made to fail over (give over) the service to another node.
If both checks succeed, then the command opmnctl ping is run.
- If this command succeeds, the resource status is set to OK and the probe returns with an exit code of 0.
- If this command fails, the resource status is set to FAULTED and the probe returns with an exit code of 100, causing the resource to attempt to restart.
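The decision flow described above can be sketched as a shell function. This is an illustration of the documented steps, not the probe that ships with the ORCL.opmn resource type. The OPMNCTL variable and the function name opmn_probe are hypothetical, and the return value used for the failover case is a placeholder because the source does not state it.

```shell
#!/bin/sh
# Illustrative sketch only, not the shipped ORCL.opmn probe.
# OPMNCTL is a hypothetical variable holding the path to the instance's
# opmnctl script, for example /ORACLE-HOME/instances/INSTANCE-NAME/bin/opmnctl.

opmn_probe() {
    # Checks 1 and 2: opmnctl must be executable and opmn.xml must validate.
    if [ ! -x "$OPMNCTL" ] || ! "$OPMNCTL" validate >/dev/null 2>&1; then
        echo "failover"   # the source says only that a failover is attempted
        return 1          # placeholder value; the real code is not documented here
    fi

    # Check 3: ping the running OPMN server.
    if "$OPMNCTL" ping >/dev/null 2>&1; then
        echo "OK"
        return 0
    fi

    # A failed ping marks the resource FAULTED and leads to a restart attempt.
    echo "FAULTED"
    return 100
}
```

Running the two preliminary checks first keeps a broken installation or an invalid opmn.xml from being treated as an ordinary restartable fault.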

Operations by the Oracle HTTP Server Fault Monitor

Because the Oracle HTTP Server component is under the control of the Oracle Process Management and Notification Server component, the ORCL.ohs fault probe obtains the status of the Oracle HTTP Server component from the Oracle Process Management and Notification Server component. The probe runs in two stages:

Checks that an Oracle HTTP Server component with type OHS is found in the output of the following command:
```
$ opmnctl status ias-component=COMPONENT-INSTANCE -noheaders -fmt "%typ"
```
Checks that the Oracle HTTP Server component is reported as ALIVE by the following command:
```
$ opmnctl status ias-component=COMPONENT-INSTANCE -noheaders -fmt "%sta"
```

If the fault probe is successful, the resource status is set to OK and the probe returns with an exit code of 0. If the fault probe fails, the resource status is set to FAULTED and the probe returns with an exit code of 100, causing the resource to attempt to restart.
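The two stages and the resulting exit codes can be sketched as follows. This is an illustration under stated assumptions, not the probe that ships with the ORCL.ohs resource type. The OPMNCTL variable and the function name ohs_probe are hypothetical, and the sketch assumes that whitespace padding and letter case in the opmnctl output can be ignored when comparing values.

```shell
#!/bin/sh
# Illustrative sketch only, not the shipped ORCL.ohs probe.
# OPMNCTL (path to opmnctl) and the component-name argument are hypothetical.

# Strip whitespace and uppercase a value so comparisons ignore padding and case.
normalize() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

ohs_probe() {
    component=$1

    # Stage 1: the component must be reported with type OHS.
    typ=$("$OPMNCTL" status ias-component="$component" -noheaders -fmt "%typ" 2>/dev/null) || typ=""
    if [ "$(normalize "$typ")" != "OHS" ]; then
        echo "FAULTED"
        return 100
    fi

    # Stage 2: the component must be reported as alive.
    sta=$("$OPMNCTL" status ias-component="$component" -noheaders -fmt "%sta" 2>/dev/null) || sta=""
    if [ "$(normalize "$sta")" != "ALIVE" ]; then
        echo "FAULTED"
        return 100
    fi

    echo "OK"
    return 0
}
```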

Note - If the Oracle HTTP Server component is used as a load-balancer through the mod_wl_ohs plugin, then the Oracle Process Management and Notification Server component can declare that the Oracle HTTP Server component is DOWN if none of the load-balancing targets are available. In these circumstances, the fault probe for the Oracle HTTP Server component attempts to restart the service. You can avoid such behavior by creating a dependency between the load-balancer resource and the target resources.

Actions in Response to Faults

Based on the history of failures, a failure can cause either a local restart or a failover of the data service. For detailed information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Oracle Solaris Cluster Data Services Planning and Administration Guide.