Defining the Criteria for Persistent Faults (Sun Cluster Data Service for SAP DB Guide for Solaris OS)

Defining the Criteria for Persistent Faults

To minimize the disruption that transient faults in a resource cause, a fault monitor restarts the resource in response to such faults. For persistent faults, more disruptive action than restarting the resource is required:

For the SAP DB resource, the fault monitor fails over the resource to another node. The SAP DB resource is a failover resource.
For the SAP xserver resource, the fault monitor takes the resource offline. The SAP xserver is a scalable resource.

The fault monitors treat a fault as persistent if the number of attempts to restart a resource exceeds a specified threshold within a specified retry interval. You define these criteria by setting the threshold and the retry interval to values that accommodate the performance characteristics of your cluster and your availability requirements.
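The rule can be illustrated with a short sketch. The following Python code is not part of Sun Cluster. It assumes that restart attempts are counted in a sliding window as long as the retry interval, which is one plausible reading of the rule. The class name RestartTracker and its methods are hypothetical.

```python
from collections import deque


class RestartTracker:
    """Hypothetical helper: counts restart attempts inside a sliding retry interval."""

    def __init__(self, retry_count, retry_interval):
        self.retry_count = retry_count        # threshold (maximum allowed restarts)
        self.retry_interval = retry_interval  # window length in seconds
        self._attempts = deque()              # timestamps of recent restart attempts

    def record_restart(self, now):
        """Record a restart attempt at time `now` (seconds).

        Returns True if the fault should now be treated as persistent, that is,
        if the attempts inside the retry interval exceed the threshold.
        """
        self._attempts.append(now)
        # Assumption: attempts older than the retry interval no longer count.
        while now - self._attempts[0] > self.retry_interval:
            self._attempts.popleft()
        return len(self._attempts) > self.retry_count


tracker = RestartTracker(retry_count=2, retry_interval=300)
for t in (0, 100, 200):
    print(t, tracker.record_restart(t))
```

With a threshold of 2 and a retry interval of 300 seconds, the third attempt inside the window is the one that exceeds the threshold.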

Dependencies of the Threshold and the Retry Interval on Other Properties

The maximum length of time that is required for a single restart of a faulty resource is the sum of the values of the following properties:

Thorough_probe_interval system property
Probe_timeout extension property

To ensure that you allow enough time for the threshold to be reached within the retry interval, use the following expression to calculate values for the retry interval and the threshold:

retry-interval ≥ threshold × (thorough-probe-interval + probe-timeout)
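As a worked example of this expression, the following Python sketch computes the smallest retry interval that satisfies it, and the largest threshold that fits in a given retry interval. The function names are hypothetical, and the property values are illustrative only. Substitute the values that are set on your own resources.

```python
def min_retry_interval(threshold, thorough_probe_interval, probe_timeout):
    """Smallest retry interval (seconds) that satisfies the expression above."""
    return threshold * (thorough_probe_interval + probe_timeout)


def max_threshold(retry_interval, thorough_probe_interval, probe_timeout):
    """Largest threshold that still satisfies the expression for a given retry interval."""
    return retry_interval // (thorough_probe_interval + probe_timeout)


# Illustrative values only: a 60-second thorough probe interval
# and a 90-second probe timeout.
print(min_retry_interval(2, 60, 90))   # 2 * (60 + 90) = 300
print(max_threshold(600, 60, 90))      # 600 // 150 = 4
```

With these illustrative values, a threshold of 2 requires a retry interval of at least 300 seconds.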

System Properties for Setting the Threshold and the Retry Interval

To set the threshold and the retry interval, set the following system properties:

To set the threshold, set the Retry_count system property to the maximum allowed number of restarts.
To set the retry interval, set the Retry_interval system property to the interval in seconds that you require.

Set these properties for each resource that contains a Sun Cluster HA for SAP DB fault monitor that you need to tune. The resource types of these resources are shown in Table 1–3.
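The following Python sketch shows one way to assemble the command that sets both properties on a resource. It assumes the scrgadm utility of Sun Cluster 3.x, where -c changes a configuration, -j names the resource, and -y sets a standard property. Check the man page for your release before you run the command. The function name and the resource name sapdb-rs are hypothetical.

```python
def scrgadm_tuning_command(resource, retry_count, retry_interval):
    """Build (but do not run) a scrgadm command line that sets both properties.

    Assumption: scrgadm accepts -c (change), -j <resource>, and
    -y <property>=<value> for standard properties.
    """
    if retry_count < 0:
        raise ValueError("retry_count must not be negative")
    if retry_interval <= 0:
        raise ValueError("retry_interval must be a positive number of seconds")
    return [
        "scrgadm", "-c",
        "-j", resource,
        "-y", f"Retry_count={retry_count}",
        "-y", f"Retry_interval={retry_interval}",
    ]


# "sapdb-rs" is a hypothetical resource name.
print(" ".join(scrgadm_tuning_command("sapdb-rs", 4, 900)))
```

The sketch only prints the command so that you can review it. Run the command once for each resource that you need to tune.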

Other Effects of the Retry Interval

Besides defining a criterion for persistent faults, the retry interval affects the response of a fault monitor to the following faults:

Unavailability of SAP xserver that the SAP DB fault monitor detects. If the SAP DB fault monitor detects twice within the retry interval that SAP xserver is unavailable, the SAP DB fault monitor restarts SAP xserver.
Persistent system errors. A persistent system error is a system error that occurs four times within the retry interval. If a persistent system error occurs, the fault monitor restarts SAP xserver.
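The two occurrence counts above can be sketched as follows. This Python code is an illustration, not the fault monitor's implementation. The names THRESHOLDS and EventWindow are hypothetical. The sliding window, and the reset of the count after a restart is triggered, are assumptions.

```python
from collections import defaultdict, deque

# Occurrence counts taken from the text above.
THRESHOLDS = {"xserver_unavailable": 2, "system_error": 4}


class EventWindow:
    """Hypothetical helper: counts fault events per kind inside the retry interval."""

    def __init__(self, retry_interval):
        self.retry_interval = retry_interval
        self._events = defaultdict(deque)  # kind -> timestamps of recent events

    def record(self, kind, now):
        """Record an event; return True if SAP xserver would be restarted."""
        times = self._events[kind]
        times.append(now)
        # Assumption: events older than the retry interval no longer count.
        while now - times[0] > self.retry_interval:
            times.popleft()
        if len(times) >= THRESHOLDS[kind]:
            times.clear()  # Assumption: counting starts again after a restart.
            return True
        return False


window = EventWindow(retry_interval=300)
print(window.record("xserver_unavailable", 0))
print(window.record("xserver_unavailable", 100))
```

Unlike the restart threshold, which must be exceeded, these counts trigger the restart as soon as they are reached.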