Sun Cluster Data Service for Sun Java System Web Server Guide for Solaris OS

Sun Cluster HA for Sun Java System Web Server Fault Monitor

The probe for Sun Cluster HA for Sun Java System Web Server uses a request to the server to query the health of that server. Before the probe actually queries the server, a check is made to confirm that network resources are configured for this web server resource. If no network resources are configured, an error message (No network resources found for resource) is logged, and the probe exits with failure.

The probe must address two configurations of Sun Java System Web Server: the secure instance and the insecure instance.

If the web server is in secure mode and the probe cannot get the secure ports from the configuration file, an error message (Unable to parse configuration file) is logged, and the probe exits with failure. The probes for secure and insecure instances share common steps.

The probe uses the time-out value that the resource property Probe_timeout specifies to limit the time spent trying to successfully probe Sun Java System Web Server. See “Standard Properties” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS for details on this resource property.

The Network_resources_used resource-property setting on the Sun Java System Web Server resource determines the set of IP addresses that the web server uses. The Port_list resource-property setting determines the list of port numbers that Sun Java System Web Server uses. The fault monitor assumes that the web server is listening on all combinations of IP address and port. If you customize your web server configuration to listen on different port numbers (in addition to port 80), ensure that your resultant configuration file (magnus.conf) contains all possible combinations of IP addresses and ports. The fault monitor attempts to probe all such combinations and might fail if the web server is not listening on a particular IP address and port combination.
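As a rough illustration of what "all combinations" means, the following Python sketch enumerates every IP address and port pair that a monitor of this kind would attempt. The helper name probe_targets and the sample addresses and ports are hypothetical placeholders, not values or interfaces from Sun Cluster, and ports are shown as plain integers for simplicity.

```python
from itertools import product


def probe_targets(ip_addresses, ports):
    """Return every (IP address, port) pair, IP-major, in input order."""
    return list(product(ip_addresses, ports))


# Hypothetical values standing in for the addresses behind
# Network_resources_used and the ports in Port_list.
ips = ["192.0.2.10", "192.0.2.11"]
ports = [80, 8080]

for ip, port in probe_targets(ips, ports):
    # The web server is expected to listen on each of these pairs.
    print(f"{ip}:{port}")
```

With two addresses and two ports, the web server would need to listen on four address and port pairs for every probe to succeed.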

The probe executes the following steps.

  1. The probe uses the specified IP address and port combination to connect to the web server. If the connection is unsuccessful, the probe concludes that a complete failure has occurred. The probe then records the failure and takes appropriate action.

  2. If the probe successfully connects, the probe checks whether the web server is running in secure mode. If so, the probe disconnects and returns with a success status. No further checks are performed for a secure Sun Java System Web Server.

    However, if the web server is running in insecure mode, the probe sends an HTTP 1.0 HEAD request to the web server and waits for the response. The request can be unsuccessful for various reasons, including heavy network traffic, heavy system load, and misconfiguration.

    Misconfiguration can occur when the web server is not configured to listen on all IP address and port combinations that are being probed. The web server should service every port for every IP address specified for this resource.

    Misconfigurations can also result if the Network_resources_used and Port_list resource properties are not set correctly while you create the resource.

    If the reply to the query is not received within the Probe_timeout resource time limit, the probe treats the attempt as a failure of Sun Cluster HA for Sun Java System Web Server. The failure is recorded in the probe's history.

    A probe failure can be a complete or partial failure. The following probe failures are considered complete failures.

    • Failure to connect to the server. The following error message flags this failure, where %s is the host name and %d is the port number.

      Failed to connect to %s port %d

    • Running out of time (exceeding the resource-property timeout Probe_timeout) after trying to connect to the server.

    • Failure to successfully send the probe string to the server. The following error message flags this failure, where the first %s is the host name, %d is the port number, and the second %s gives further details about the error.

      Failed to communicate with server %s port %d: %s

    The following probe failures are considered partial failures. The monitor accumulates two partial failures within the resource-property interval Retry_interval and counts them as one failure.

    • Running out of time (exceeding the resource-property timeout Probe_timeout) while trying to read the reply from the server to the probe's query.

    • Failing to read data from the server for other reasons. The following error message flags this failure, where the first %s is the host name, %d is the port number, and the second %s gives further details about the error.

      Failed to communicate with server %s port %d: %s

  3. The probe connects to Sun Java System Web Server and performs an HTTP 1.1 GET check by sending an HTTP request to each of the URIs in Monitor_Uri_List. If the HTTP server return code is 500 (Internal Server Error) or if the connection fails, the probe takes action.

    The result of the HTTP requests is either failure or success. If all of the requests successfully receive a reply from Sun Java System Web Server, the probe returns and continues with the next cycle of probing and sleeping.

    Heavy network traffic, heavy system load, and misconfiguration can cause the HTTP GET probe to fail. Misconfiguration of the Monitor_Uri_List property can cause a failure if a URI in the Monitor_Uri_List includes an incorrect port or host name. For example, if the web server instance is listening on logical host schost-1 and the URI was specified as http://schost-2/servlet/monitor, the probe will try to contact schost-2 to request /servlet/monitor, and the request is likely to fail because the monitored instance is not listening on that host.

    Based on the history of failures, a failure can cause either a local restart or a failover of the data service. This action is further described in “Sun Cluster Data Service Fault Monitors” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.
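The insecure-mode probe steps and the failure accounting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual fault monitor: the names probe_once and FailureHistory are hypothetical, the time-out is applied to each socket operation rather than to the probe as a whole, an empty reply is treated as a partial failure, and a partial failure is modeled as half a failure so that two of them within Retry_interval add up to one.

```python
import socket
from collections import deque

COMPLETE, PARTIAL, SUCCESS = "complete", "partial", "success"


def probe_once(host, port, probe_timeout):
    """Classify one insecure-mode probe of host:port (illustrative only)."""
    try:
        sock = socket.create_connection((host, port), timeout=probe_timeout)
    except OSError:
        # Refused connection or connect time-out: complete failure.
        return COMPLETE
    with sock:
        try:
            # HTTP/1.0 HEAD request used as the probe string.
            sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
        except OSError:
            # Could not send the probe string: complete failure.
            return COMPLETE
        try:
            reply = sock.recv(1024)
        except OSError:
            # Read time-out or other read error: partial failure.
            return PARTIAL
    # Assumption: a connection closed without any reply is also partial.
    return SUCCESS if reply else PARTIAL


class FailureHistory:
    """Accumulate failures inside a sliding Retry_interval window."""

    def __init__(self, retry_interval):
        self.retry_interval = retry_interval
        self.events = deque()  # (timestamp, weight) pairs, oldest first

    def record(self, outcome, now):
        """Record an outcome at time `now`; return failures in the window."""
        # Assumption: a partial failure weighs half of a complete failure.
        weight = {COMPLETE: 1.0, PARTIAL: 0.5}.get(outcome, 0.0)
        if weight:
            self.events.append((now, weight))
        # Discard events older than the retry interval.
        while self.events and now - self.events[0][0] > self.retry_interval:
            self.events.popleft()
        return sum(w for _, w in self.events)
```

A caller would run probe_once for each IP address and port combination, pass each outcome to FailureHistory.record together with a timestamp in seconds, and compare the returned total against whatever threshold triggers a restart or failover. That threshold and the resulting action are outside the scope of this sketch.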