Oracle® Solaris Cluster Data Services Developer's Guide

Updated: July 2014, E39646-01
 
 

Determining the Fault Monitor Action

The xfnts_probe method calls scds_fm_action() to determine the action to take.

The logic in scds_fm_action() is as follows:

  • Maintain a cumulative failure history over the time window specified by the Retry_interval property.

  • If the cumulative failure value reaches 100 (complete failure), restart the data service. If the time specified by Retry_interval is exceeded, reset the failure history.

  • If the number of restarts exceeds the value of the Retry_count property within the time specified by Retry_interval, fail over the data service.
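
The three rules above can be modeled in a few lines of C. The following is a minimal sketch, not the DSDL implementation: fm_history_t, fm_action_t, fm_action(), and COMPLETE_FAILURE are hypothetical names, times are plain seconds, and the history is kept as a single fixed window that is cleared once Retry_interval has passed. Clearing the cumulative value after a restart is also an assumption.

```c
#include <assert.h>

/* Hypothetical names throughout: a model of the documented rules,
 * not the code behind scds_fm_action(). */
#define COMPLETE_FAILURE 100

typedef enum { ACTION_NONE, ACTION_RESTART, ACTION_FAILOVER } fm_action_t;

typedef struct {
    long window_start;   /* time (seconds) at which the current window began */
    int  cumulative;     /* accumulated probe failure value */
    int  restarts;       /* restarts performed in the current window */
    long retry_interval; /* stand-in for the Retry_interval property */
    int  retry_count;    /* stand-in for the Retry_count property */
} fm_history_t;

/* Decide what to do after one probe that reported probe_status (0..100). */
fm_action_t fm_action(fm_history_t *h, int probe_status, long now)
{
    assert(probe_status >= 0 && probe_status <= COMPLETE_FAILURE);

    /* Retry_interval exceeded: reset the history. */
    if (now - h->window_start > h->retry_interval) {
        h->window_start = now;
        h->cumulative = 0;
        h->restarts = 0;
    }

    /* Add this probe's result to the cumulative failure value. */
    h->cumulative += probe_status;
    if (h->cumulative < COMPLETE_FAILURE)
        return ACTION_NONE;

    /* Complete failure reached. Assumption: the cumulative value
     * starts over once an action has been taken. */
    h->cumulative = 0;

    /* One more restart would exceed Retry_count: fail over instead. */
    if (h->restarts >= h->retry_count)
        return ACTION_FAILOVER;

    h->restarts++;
    return ACTION_RESTART;
}
```

In this model, with retry_interval set to 300 and retry_count set to 2, two partial failures of 50 inside the window produce a restart, and a third complete failure inside the same window produces a failover.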

For example, suppose the probe connects to the xfs server but fails to disconnect. This result indicates that the server is running but might be hung or temporarily under heavy load. The failure to disconnect reports a partial failure (50) to scds_fm_action(). This value is below the threshold for restarting the data service, but it is retained in the failure history.

If the server again fails to disconnect during the next probe, another 50 is added to the failure history maintained by scds_fm_action(). Provided that both failures fall within Retry_interval, the cumulative failure value is now 100, so scds_fm_action() restarts the data service.
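
The arithmetic in this example can be checked with a short C sketch. The helper names probe_status() and probes_until_restart() are hypothetical, and treating a failed connection as a complete failure (100) is an assumption that the text above does not state. Only the value of 50 for a failed disconnect and the threshold of 100 come from the description.

```c
#include <assert.h>
#include <stdbool.h>

#define COMPLETE_FAILURE 100  /* restart threshold named in the text */

/* Hypothetical helper: map the outcome of one probe to a failure value. */
int probe_status(bool connected, bool disconnected)
{
    if (!connected)
        return COMPLETE_FAILURE;      /* assumption: no connection at all */
    if (!disconnected)
        return COMPLETE_FAILURE / 2;  /* partial failure (50), as in the text */
    return 0;                         /* healthy probe */
}

/* Hypothetical helper: count how many consecutive probes with the same
 * failure value are needed to reach the threshold. Returns 0 if the
 * threshold is never reached. Assumes all probes fall within one
 * Retry_interval window. */
int probes_until_restart(int status)
{
    assert(status >= 0 && status <= COMPLETE_FAILURE);
    if (status == 0)
        return 0;

    int total = 0;
    int probes = 0;
    while (total < COMPLETE_FAILURE) {
        total += status;
        probes++;
    }
    return probes;
}
```

Calling probes_until_restart(probe_status(true, false)) yields 2, which matches the two failed disconnects described above.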