The DSDL absorbs much of the complexity of implementing a fault monitor by providing a predetermined model. A Monitor_start method launches the fault monitor, under the control of PMF, when the resource starts on a node. The fault monitor runs in a loop for as long as the resource is running on the node. The high-level logic of a DSDL fault monitor is as follows:
The scds_fm_sleep function uses the Thorough_probe_interval property to determine the amount of time between probes. Any application process failure that PMF detects during this interval leads to a restart of the resource.
The probe itself returns a value indicating the severity of the failure, from 0 (no failure) to 100 (complete failure).
The probe return value is passed to the scds_fm_action function, which maintains a cumulative failure history within the interval specified by the Retry_interval property.
The scds_fm_action function determines what to do in the event of failure, as follows:
If the cumulative failure is below 100, do nothing.
If the cumulative failure reaches 100 (complete failure), restart the data service. If Retry_interval is exceeded, reset the history.
If the number of restarts exceeds the value of the Retry_count property within the time specified by Retry_interval, fail over the data service.
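The decision rules above can be modeled in a few lines of C. The following is only a sketch of the logic as described here, not the DSDL implementation: the names fm_history, fm_decision, fm_history_init, and fm_decide are hypothetical, time is passed in as plain seconds, and the history is assumed to be kept in a simple fixed window that restarts once Retry_interval has elapsed. It also assumes that "exceeds Retry_count" means the failover happens on the restart after the last permitted one.

```c
#include <assert.h>

/* Hypothetical model of the action step; none of these names are DSDL APIs. */
typedef enum { FM_NONE, FM_RESTART, FM_FAILOVER } fm_decision;

typedef struct {
    long retry_interval;  /* seconds; models the Retry_interval property */
    int  retry_count;     /* models the Retry_count property */
    long window_start;    /* time at which the current history began */
    int  cumulative;      /* accumulated probe severity in this window */
    int  restarts;        /* restarts requested in this window */
} fm_history;

void fm_history_init(fm_history *h, long retry_interval, int retry_count,
                     long now)
{
    h->retry_interval = retry_interval;
    h->retry_count = retry_count;
    h->window_start = now;
    h->cumulative = 0;
    h->restarts = 0;
}

/* Record one probe result (0..100) taken at time `now` and decide what to do. */
fm_decision fm_decide(fm_history *h, int severity, long now)
{
    assert(severity >= 0 && severity <= 100);

    /* Assumption: once Retry_interval is exceeded, the whole history resets. */
    if (now - h->window_start > h->retry_interval) {
        h->window_start = now;
        h->cumulative = 0;
        h->restarts = 0;
    }

    h->cumulative += severity;
    if (h->cumulative < 100)
        return FM_NONE;          /* below complete failure: do nothing */

    /* Complete failure reached: clear the partial sum and count a restart. */
    h->cumulative = 0;
    h->restarts++;
    if (h->restarts > h->retry_count)
        return FM_FAILOVER;      /* too many restarts inside the window */
    return FM_RESTART;
}
```

With Retry_count set to 2 in this model, two partial failures of 50 add up to one restart, a second complete failure causes another restart, and a third complete failure inside the same window yields a failover. A real fault monitor would call the DSDL action function rather than reimplement this bookkeeping.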