Oracle® Solaris Cluster Data Service for Apache Guide

Updated: August 2018
 
 

Tuning the HA for Apache Fault Monitor

The HA for Apache fault monitor is contained in a resource whose resource type is SUNW.apache.

Standard properties and extension properties of the resource control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Oracle Solaris Cluster installations. Therefore, you should tune the HA for Apache fault monitor only if you need to modify this preset behavior.

Tuning the HA for Apache fault monitor involves the following tasks:

  • Setting the interval between fault monitor probes

  • Setting the timeout for fault monitor probes

  • Defining the criteria for persistent faults

  • Specifying the failover behavior of a resource

Information about the HA for Apache fault monitor that you need to perform these tasks is provided in the subsections that follow.

Tune the HA for Apache fault monitor when you register and configure HA for Apache. For more information, see Registering and Configuring HA for Apache.

For detailed information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Planning and Administering Data Services for Oracle Solaris Cluster 4.4.

Operations by the HA for Apache Fault Monitor

The HA for Apache probe sends a request to the server to query the health of the Apache server.

Operations by the Fault Monitor Before a Probe

Before querying the Apache server, the probe checks to confirm that network resources are configured for this Apache resource. If no network resources are configured, an error message (No network resources found for resource) is logged, and the probe exits with failure.

Operations for a Web Server

For a web server, the probe connects to the Apache server on each IP address/port combination in turn. On each connection, the probe performs an HTTP 1.0 HEAD check by sending the HTTP request and receiving a response.

The result of this query can be either a failure or a success. If the probe successfully receives a reply from the Apache server, the probe returns to its infinite loop and continues the next cycle of probing and sleeping.

The query can fail for various reasons, such as heavy network traffic, heavy system load, and misconfiguration. Misconfiguration can occur if you did not configure the Apache server to listen on all of the IP address/port combinations that are being probed. The Apache server should service every port for every IP address that is specified for this resource.
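The HEAD check described above can be sketched in Python. This is an illustration only, not the fault monitor's actual implementation: the function names, the "/" request path, and the 4096-byte read size are all assumptions.

```python
import socket


def build_head_request(path="/"):
    """Return the bytes of a minimal HTTP/1.0 HEAD request (path is an assumption)."""
    return f"HEAD {path} HTTP/1.0\r\n\r\n".encode("ascii")


def parse_status_code(reply):
    """Extract the numeric status code from the first line of an HTTP reply."""
    status_line = reply.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = status_line.split()
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {status_line!r}")
    return int(parts[1])


def probe_endpoint(host, port, timeout):
    """Send one HEAD request to host:port and return the status code."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request())
        reply = sock.recv(4096)
    return parse_status_code(reply)


def probe_all(endpoints, timeout):
    """Probe every (host, port) pair in turn.

    Each pair maps to a status code, or to the exception that was raised.
    """
    results = {}
    for host, port in endpoints:
        try:
            results[(host, port)] = probe_endpoint(host, port, timeout)
        except (OSError, ValueError) as exc:
            results[(host, port)] = exc
    return results
```

Probing every pair, rather than stopping at the first success, mirrors the requirement that the Apache server service every port for every IP address specified for the resource.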

The following probe failures are treated as complete failures:

  • Failure to connect to the server. The following error message flags this failure, where the first %s is the hostname and %d is the port number.

    Failed to connect to %s port %d %s

  • Running out of time (exceeding the resource property timeout Probe_timeout) while trying to connect to the server.

  • Failure to send the probe string to the server. The following error message flags this failure, where the first %s is the hostname, %d is the port number, and the second %s gives further details about the error.

    Failed to communicate with server %s port %d: %s

When the monitor accumulates two partial failures within the interval set by the resource property Retry_interval, it counts them as one complete failure.

The following probe failures are treated as partial failures:

  • Running out of time (exceeding the resource property timeout Probe_timeout) while trying to read the reply from the server to the probe's query.

  • Failing to read data from the server for other reasons. The following error message flags this failure, where the first %s is the hostname, %d is the port number, and the second %s gives further details about the error.

    Failed to communicate with server %s port %d: %s
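The distinction between complete and partial failures, and the rule that two partial failures within Retry_interval count as one complete failure, can be sketched as follows. The stage labels and class names are hypothetical, and the choice to consume both partial failures once they have been paired is an assumption that the text does not confirm.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    COMPLETE = "complete"


# Hypothetical labels for the stage at which a probe failed.
COMPLETE_STAGES = {"connect", "connect_timeout", "send"}
PARTIAL_STAGES = {"read_timeout", "read"}


def classify(stage):
    """Map a failure stage (or None for no failure) to a probe outcome."""
    if stage is None:
        return Outcome.SUCCESS
    if stage in COMPLETE_STAGES:
        return Outcome.COMPLETE
    if stage in PARTIAL_STAGES:
        return Outcome.PARTIAL
    raise ValueError(f"unknown failure stage: {stage!r}")


class FailureHistory:
    """Track complete failures, pairing partial failures inside retry_interval."""

    def __init__(self, retry_interval):
        self.retry_interval = retry_interval  # seconds
        self._pending_partial = None          # time of an unpaired partial failure
        self.complete_failures = []           # times of complete failures

    def record(self, outcome, now):
        """Record one probe outcome at time `now`; return the complete-failure count."""
        if outcome is Outcome.COMPLETE:
            self.complete_failures.append(now)
        elif outcome is Outcome.PARTIAL:
            pending = self._pending_partial
            if pending is not None and now - pending <= self.retry_interval:
                # Two partial failures inside the interval: one complete failure.
                self.complete_failures.append(now)
                self._pending_partial = None  # assumption: both partials are consumed
            else:
                self._pending_partial = now
        return len(self.complete_failures)
```

For example, with a retry interval of 300 seconds, partial failures at 0 and 100 seconds yield one complete failure, whereas partial failures at 150 and 500 seconds are too far apart to be paired.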

Operations for a Monitored URI List

If you have configured URIs in the Monitor_Uri_List extension property, then the probe connects to the HA for Apache server and performs an HTTP 1.1 GET check by sending an HTTP request and receiving a response for each of the URIs in Monitor_Uri_List. If the HTTP server return code is 500 (Internal Server Error) or if the connection fails, the probe takes action.


Note -  The Monitor_Uri_List extension property supports HTTP requests only. It does not support HTTPS requests.

The result of the HTTP requests is either failure or success. If all of the requests successfully receive a reply from the HA for Apache server, the probe returns and continues the next cycle of probing and sleeping.

Heavy network traffic, heavy system load, and misconfiguration can cause the HTTP GET probe to fail. Misconfiguration of the Monitor_Uri_List property can cause a failure if a URI in the Monitor_Uri_List includes an incorrect port or hostname. For example, if the web server instance is listening on logical host schost-1 and the URI was specified as http://schost-2/servlet/monitor, the probe will try to contact schost-2 to request /servlet/monitor.
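A misconfigured URI of this kind can be caught before it reaches the probe. The sketch below is not part of HA for Apache: the helper names are hypothetical, and the default port of 80 for URIs without an explicit port is an assumption. It checks a URI against the logical hosts and ports the server is expected to listen on, and shows the GET check and the "take action" condition from the description above.

```python
import http.client
from urllib.parse import urlsplit


def check_monitor_uri(uri, logical_hosts, ports):
    """Return a list of problems found in one Monitor_Uri_List entry."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme != "http":
        problems.append(f"unsupported scheme {parts.scheme!r}: only http is supported")
    known_hosts = {host.lower() for host in logical_hosts}
    if parts.hostname not in known_hosts:
        problems.append(f"host {parts.hostname!r} is not a configured logical host")
    port = parts.port if parts.port is not None else 80  # assumed default
    if port not in ports:
        problems.append(f"port {port} is not a configured port")
    return problems


def fetch_status(uri, timeout):
    """Issue an HTTP/1.1 GET for the URI and return the status code."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def needs_action(status):
    """True if the connection failed (status is None) or the server returned 500."""
    return status is None or status == 500
```

With the example above, `check_monitor_uri("http://schost-2/servlet/monitor", ["schost-1"], [80])` reports that schost-2 is not a configured logical host.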

Operations for a Secure Web Server

For a secure (HTTPS) web server, the probe connects to each IP address and port combination. If this connection attempt succeeds, the probe disconnects and returns with a success status. No further checks are performed.
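The connect-only check described above amounts to opening a TCP connection and closing it again. A minimal Python sketch, with a hypothetical function name and a caller-supplied timeout:

```python
import socket


def connect_check(host, port, timeout):
    """Return True if a TCP connection to host:port can be opened, else False.

    The connection is closed immediately; no data is sent or read.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes refused connections and timeouts
        return False
```

Because no request is sent, this check confirms only that the server accepts connections, not that it can serve content.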

Actions in Response to Faults

Based on the history of failures, a failure can cause either a local restart or a failover of the data service. For detailed information, see Tuning Fault Monitors for Oracle Solaris Cluster Data Services in Planning and Administering Data Services for Oracle Solaris Cluster 4.4.