Sun Cluster 3.0 Data Services Installation and Configuration Guide

Monitoring of the Abnormal Exit of the Server Process

The Process Monitor Facility (PMF) monitors the data service process. On abnormal exit, the PMF invokes an action script supplied by the data service to communicate the failure to the data service fault monitor.

The PMF action script and the probe communicate over a UNIX domain socket. The socket is intended to carry only one kind of message: the notification, sent by the PMF through the action script, that the data service has exited abnormally. This event is treated as a total failure of the data service.
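The notification path described above can be sketched as follows. This is a minimal illustration, not the Sun Cluster implementation: the function names, the datagram socket type, and the message text ABNORMAL_EXIT are all assumptions, because the guide does not specify the wire format.

```python
import os
import socket

# Hypothetical message text; the guide does not define the wire format.
ABNORMAL_EXIT = b"ABNORMAL_EXIT"


def open_probe_socket(path):
    """Probe side: bind a datagram UNIX domain socket at `path`."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by an earlier run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def notify_probe(path, message=ABNORMAL_EXIT):
    """Action-script side: send one notification to the probe's socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, path)
```

In this sketch the probe calls open_probe_socket once at startup, and the action script calls notify_probe each time the PMF invokes it.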

The data service fault probe runs in an infinite loop. In each iteration it sleeps for an adjustable amount of time, set by the resource property Thorough_probe_interval, and while sleeping it polls for messages from the PMF action script. If the server process exits abnormally during this interval, the PMF action script informs the probe.
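One way to combine the sleep and the polling is to wait on the socket with a timeout equal to the probe interval. The sketch below assumes this approach; the function name wait_for_failure is hypothetical, and the actual probe may be structured differently.

```python
import select
import socket


def wait_for_failure(sock, thorough_probe_interval):
    """Sleep for up to `thorough_probe_interval` seconds, waking early if
    the action script sends a message on `sock`.

    Returns True if a notification arrived, False if the interval elapsed.
    """
    readable, _, _ = select.select([sock], [], [], thorough_probe_interval)
    if readable:
        sock.recv(64)  # drain the notification so the next wait starts clean
        return True
    return False
```

A probe built this way would call wait_for_failure at the top of each loop iteration and treat a True result as an abnormal exit of the server process.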

The probe then updates the status of the data service to "Service daemon not running" and takes action. The action is either to restart the data service locally or to fail it over to a secondary cluster node. The probe decides between the two by checking the values of the resource properties Retry_count and Retry_interval for the data service application resource.
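The restart-or-failover decision can be illustrated with a short sketch. It assumes that Retry_count is the number of local restarts allowed within any window of Retry_interval seconds, and that the probe keeps a list of earlier restart times. The function name and the bookkeeping are hypothetical.

```python
def decide_action(restart_times, now, retry_count, retry_interval):
    """Return "restart" or "failover" for a failed data service.

    restart_times: timestamps, in seconds, of earlier local restarts.
    now: current time, in seconds, on the same clock.
    """
    # Count only the restarts that fall inside the current retry window.
    recent = [t for t in restart_times if now - t < retry_interval]
    if len(recent) < retry_count:
        return "restart"
    return "failover"
```

For example, with Retry_count set to 2 and Retry_interval set to 300, a third failure within 300 seconds of two earlier restarts would lead to a failover under this assumption.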