Sun Cluster Data Service for Sun Java System Application Server Guide for Solaris OS

Tuning the Sun Cluster HA for Sun Java System Application Server Fault Monitor

This section explains the Sun Cluster HA for Sun Java System Application Server Fault Monitor.

This section provides the following information:

  • Extension Properties

  • Probing Algorithm and Functionality

Extension Properties

The Sun Cluster HA for Sun Java System Application Server Fault Monitor uses the extension properties described in the following table. The Tunable entry indicates whether you can update the property dynamically or only when you create the resource.

Use the scrgadm -x parameter=value command-line option to configure extension properties when you create the Sun Java System Application Server resource. See the SUNW.s1as(5M) man page for more information about extension properties. See “Standard Properties” in Sun Cluster Data Services Planning and Administration Guide for Solaris OS for details on all Sun Cluster Data Service properties.
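As an illustration of how extension properties are passed at creation time, the following minimal sketch assembles a scrgadm argument list in Python. The resource name, resource group name, configuration path, and URI are hypothetical placeholders, and a real resource typically needs additional standard properties that are not shown here.

```python
import shlex


def scrgadm_create_args(resource, group, ext_props):
    """Build an argument list that creates a SUNW.s1as resource.

    Each extension property becomes one "-x name=value" pair.
    """
    args = ["scrgadm", "-a", "-j", resource, "-g", group, "-t", "SUNW.s1as"]
    for name, value in ext_props.items():
        args += ["-x", f"{name}={value}"]
    return args


# Hypothetical names and paths, for illustration only.
args = scrgadm_create_args(
    "appsrv-rs",
    "appsrv-rg",
    {
        "Confdir_list": "/global/appsrv/domains/domain1/server1",
        "Monitor_Uri_List": "http://schost-1:8000/servlet/monitor",
    },
)

# Print the command line for review rather than executing it.
print(shlex.join(args))
```

Printing the command instead of running it keeps the sketch safe to try on a machine that is not a cluster node.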

Table 1–2 Sun Cluster HA for Sun Java System Application Server Extension Properties

Name/Data Type 

Description 

Confdir_list (string array)

The complete path to the configuration directory of a particular instance of the Sun Java System Application Server. 

Default: None

Tunable: At creation

Monitor_Uri_List (string)

A single URI or a list of URIs that the fault monitor can use to test the functionality of the Sun Java System Application Server. The fault monitor tests the application server by performing an HTTP GET on each URI. To probe deployed application functionality, set the property to one or more URIs that are serviced by applications deployed on the Sun Java System Application Server. If the HTTP response code is 500 (Internal Server Error) or if the connection fails, the probe takes action. See the probe method for more details.

Default: Null

Tunable: Any time
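The check that Monitor_Uri_List enables can be approximated with a short Python sketch. This is not the fault monitor's implementation. It only mirrors the rule stated in the table: a response code of 500 or a failed connection counts as a failure, and other response codes do not. The function names and the default time-out are assumptions made for illustration.

```python
import http.client
from urllib.parse import urlsplit


def uri_target(uri):
    """Split a URI into the (host, port, path) that a probe would contact."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, path


def is_failure_status(status):
    """Only 500 (Internal Server Error) is treated as a failure."""
    return status == 500


def probe_uri(uri, timeout=5.0):
    """Return True if the URI answers with any code other than 500."""
    host, port, path = uri_target(uri)
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, name lookup failure, time-out, or bad reply.
        return False
    finally:
        conn.close()
    return not is_failure_status(status)
```

Calling probe_uri("http://schost-1:8000/servlet/monitor") from a host that can reach the logical hostname would return False only when the connection fails or the application answers with 500. The hostname and port in that call are placeholders.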

Probing Algorithm and Functionality

The Sun Cluster HA for Sun Java System Application Server probe sends a request to the server to query the health of the Sun Java System Application Server. The probe executes the following steps:

  1. Probes the Sun Java System Application Server instance according to the time-out value set with the Probe_timeout resource property.

  2. Connects to the IP address and port combinations defined by the network resource configuration and the Port_list setting for the resource group. If the resource is configured with an empty Port_list, this step is skipped. If the connection succeeds, the probe disconnects. If the connection fails, the failure is recorded.

    Heavy network traffic, heavy system load, and misconfiguration can cause the query to fail. Misconfiguration can occur if you did not configure the Sun Java System Application Server to listen on all of the IP address/port combinations that are probed. The Sun Java System Application Server should service every port for every IP address that is specified for the resource.

  3. Connects to the Sun Java System Application Server and performs an HTTP 1.1 GET check by sending an HTTP request to each of the URIs in Monitor_Uri_List and receiving a response.

    The result of each HTTP request is either failure or success. If all of the requests successfully receive a reply from the Sun Java System Application Server, the probe returns and continues the next cycle of probing and sleeping.

    Heavy network traffic, heavy system load, and misconfiguration can cause the HTTP GET probe to fail. Misconfiguration of the Monitor_Uri_List property can cause a failure if a URI in the Monitor_Uri_List includes an incorrect port or hostname. For example, if the application server instance is listening on logical host schost-1 and the URI was specified as http://schost-2/servlet/monitor, the probe will try to contact schost-2 to request /servlet/monitor. Because the request goes to schost-2 rather than schost-1, the probe can fail even though the instance is healthy.

  4. Records a failure in the history log if the reply to the probe is not received within the Probe_timeout limit. The probe considers this scenario a failure on the part of the Sun Java System Application Server data service. A Sun Java System Application Server probe failure can be a complete failure or a partial failure.

    If the reply to the probe is received within the Probe_timeout limit, the HTTP response code is checked. If the response code is 500 “Internal Server Error”, the probe is considered a complete failure. All other response codes are ignored.

    The following are complete probe failures.

    • The following error message is received upon failure to connect to the server. The %s indicates the hostname and %d indicates the port number.


      Failed to connect to the host <%s> and port <%d>.

    • The following error message is received upon receiving a response code of 500 “Internal Server Error”.


      HTTP GET Response Code for probe of %s is 500. Failover will be in progress

    • The following error message is received upon failure to successfully send the probe string to the server. The first %s indicates the hostname, the %d indicates the port number, and the second %s indicates further details about the error.


      Write to server failed: server %s port %d: %s.

  5. The monitor accumulates partial failures that occur within the Retry_interval resource property setting until they equal a complete failure.

    The following are partial probe failures:

    • The following error message is received when there is a failure to disconnect before the Probe_timeout setting lapses. The %d indicates the port number and the %s indicates the resource name.


      Failed to disconnect from port %d of resource %s.

    • Failure to complete all probe steps within Probe_timeout time is a partial failure.

    • The following error message is received when there is a failure to read data from the server for other reasons. The first %s indicates the hostname and %d indicates the port number. The second %s indicates further details about the error.


      Failed to communicate with server %s port %d: %s

  6. Based on the history of failures, a failure can cause either a local restart or a failover of the data service.
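The bookkeeping in the last two steps can be sketched as follows. This is a simplified model, not the monitor's actual logic. The weight of a partial failure (partial_weight) and the limit on local restarts (retry_count) are assumptions introduced for illustration, because this section does not state how many partial failures equal a complete failure or when a restart gives way to a failover.

```python
from collections import deque


class FailureHistory:
    """Track probe failures inside a sliding Retry_interval window."""

    def __init__(self, retry_interval, retry_count, partial_weight=0.5):
        self.retry_interval = retry_interval  # window length in seconds
        self.retry_count = retry_count        # assumed limit on local restarts
        self.partial_weight = partial_weight  # assumed weight of one partial failure
        self.partials = deque()               # timestamps of partial failures
        self.restarts = deque()               # timestamps of local restarts

    def _expire(self, stamps, now):
        # Drop entries that are older than the retry interval.
        while stamps and now - stamps[0] > self.retry_interval:
            stamps.popleft()

    def record(self, now, complete):
        """Record one failure and return "none", "restart", or "failover"."""
        self._expire(self.partials, now)
        self._expire(self.restarts, now)
        if not complete:
            self.partials.append(now)
            if len(self.partials) * self.partial_weight < 1.0:
                return "none"  # not yet equal to a complete failure
        self.partials.clear()
        if len(self.restarts) >= self.retry_count:
            return "failover"
        self.restarts.append(now)
        return "restart"


history = FailureHistory(retry_interval=300, retry_count=2)
actions = [
    history.record(0, complete=False),    # first partial failure
    history.record(10, complete=False),   # second partial adds up to a complete failure
    history.record(20, complete=True),    # complete failure
    history.record(30, complete=True),    # restart limit reached
    history.record(1000, complete=True),  # earlier restarts have aged out
]
print(actions)
```

Keeping timestamps in a deque makes it cheap to discard failures that fall outside the window, which is what lets an isolated partial failure be forgotten instead of eventually triggering a restart.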