Sun Cluster 3.0 Data Services Developers' Guide

Evaluating Restart Versus Failover

A probefail value other than 0 (success) means that either the nslookup command timed out or the reply came from a server other than the sample service's DNS server. In either case, the DNS server is not functioning as expected, and the fault monitor calls the decide_restart_or_failover function to determine whether to restart the data service locally or to request that the RGM relocate the data service to a different node. If probefail is 0, the fault monitor logs a message that the probe succeeded.


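	if [ "$probefail" -ne 0 ]; then
		# Probe failed: choose between a local restart and a failover.
		decide_restart_or_failover
	else
		# Probe succeeded: record the result in the system log.
		logger -p ${SYSLOG_FACILITY}.err \
			-t "[$RESOURCETYPE_NAME,$RESOURCEGROUP_NAME,$RESOURCE_NAME]" \
			"${ARGV0} Probe for resource HA-DNS successful"
	fi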

The decide_restart_or_failover function uses a time window (Retry_interval) and a failure count (Retry_count) to determine whether to restart DNS locally or to request that the RGM relocate the data service to a different node. It implements the following conditional logic.
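
The decision can be sketched as a small shell function. This is only an illustration under stated assumptions, not the listing from the sample data service: the function name decide_action, its argument order, and its one-line output format are all hypothetical, and the sketch only reports the action instead of restarting DNS or contacting the RGM.

```shell
#!/bin/sh
# Hypothetical sketch of the restart-or-failover decision.
# All names below are illustrative assumptions, not the sample's code.
#   $1 now             current time, in seconds
#   $2 window_start    time of the first failure in the current window
#   $3 retries         restarts already attempted in the current window
#   $4 retry_count     value of the Retry_count property
#   $5 retry_interval  value of the Retry_interval property, in seconds
# Prints "<action> <new_window_start> <new_retries>".
decide_action() {
    now=$1 window_start=$2 retries=$3 retry_count=$4 retry_interval=$5

    if [ "$retries" -eq 0 ]; then
        # First failure: open a new window and restart locally.
        echo "restart $now 1"
    elif [ $((now - window_start)) -gt "$retry_interval" ]; then
        # The window has elapsed: begin counting again and restart locally.
        echo "restart $now 1"
    elif [ "$retries" -ge "$retry_count" ]; then
        # Limit reached inside the window: relocation should be requested.
        echo "failover $window_start $retries"
    else
        # Under the limit inside the window: restart locally and count it.
        echo "restart $window_start $((retries + 1))"
    fi
}

# Example: Retry_count=2, Retry_interval=300 seconds.
decide_action 100 0 0 2 300     # first failure
decide_action 150 100 1 2 300   # second failure inside the window
decide_action 200 100 2 2 300   # limit reached inside the window
decide_action 500 100 2 2 300   # window elapsed, count starts over
```

Keeping the decision separate from the actions makes the logic easy to exercise with fixed timestamps. A caller would read the printed action and then either restart the data service or ask the RGM to relocate it.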

If the number of restarts reaches the limit (Retry_count) within the time interval (Retry_interval), the function requests that the RGM relocate the data service to a different node. If the number of restarts is under the limit, or if the interval has elapsed so that the count begins again, the function attempts to restart DNS on the same node. Note the following about this function: