Sun Cluster Data Services Developer's Guide for Solaris OS

Determining the Fault Monitor Action

The xfnts_probe method calls scds_fm_action() to determine the action to take. The logic in scds_fm_action() is as follows:

For example, suppose the probe connects to the xfs server but fails to disconnect. This indicates that the server is running but might be hung or temporarily under load. The failure to disconnect sends a partial failure (50) to scds_fm_action(). This value is below the threshold for restarting the data service, but it is recorded in the failure history.

If the server again fails to disconnect during the next probe, another 50 is added to the failure history maintained by scds_fm_action(). The cumulative failure value is now 100, so scds_fm_action() restarts the data service.
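The accumulation in this example can be modeled with a short, self-contained C sketch. It is not the DSDL implementation of scds_fm_action(); the names failure_history, record_probe_result, PARTIAL_FAILURE, and COMPLETE_FAILURE are hypothetical and exist only for illustration. The sketch models only the running total described above, and it assumes that the history is cleared once a restart is triggered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not part of the DSDL API. */
#define COMPLETE_FAILURE 100  /* cumulative value that triggers a restart */
#define PARTIAL_FAILURE   50  /* value reported when the disconnect fails */

typedef struct {
    int cumulative;  /* running sum of reported failure values */
    int restarts;    /* number of restarts triggered so far */
} failure_history;

/*
 * Add one probe result to the history.
 * Returns 1 if the cumulative value reached the restart threshold,
 * 0 otherwise.
 */
static int record_probe_result(failure_history *h, int probe_status)
{
    assert(h != NULL);

    if (probe_status <= 0) {
        return 0;  /* a successful probe adds nothing to the history */
    }

    h->cumulative += probe_status;

    if (h->cumulative >= COMPLETE_FAILURE) {
        h->restarts++;
        h->cumulative = 0;  /* assumption: history is cleared after a restart */
        return 1;
    }
    return 0;
}
```

Calling record_probe_result() twice with PARTIAL_FAILURE on a zero-initialized failure_history returns 0 for the first probe (cumulative value 50) and 1 for the second (cumulative value 100), which mirrors the two failed disconnects described above.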