The Sun Cluster HA for Samba fault monitor is controlled by the extension properties that control the probing frequency. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Sun Cluster installations. Therefore, you should tune the Sun Cluster HA for Samba fault monitor only if you need to modify this preset behavior.
Tuning the Sun Cluster HA for Samba fault monitor involves the following tasks:
Setting the interval between fault monitor probes (Thorough_probe_interval)
Setting the time-out for fault monitor probes (Probe_timeout)
Setting the number of times the fault monitor attempts to restart the resource (Retry_count)
The Sun Cluster HA for Samba fault monitor checks the smbd, nmbd, and winbindd components within an infinite loop. During each cycle, the fault monitor checks the relevant component and reports either a failure or a success.
If the probe is successful, the fault monitor returns to its infinite loop and continues the next cycle of probing and sleeping.
If the fault monitor reports a failure, a request is made to the cluster to restart the resource. Each subsequent failure results in another restart request.
If successive restarts exceed the Retry_count within the Thorough_probe_interval, a request is made to fail over the resource group onto a different node or zone.
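The restart and failover behavior described above can be modeled with a short sketch. This is only an illustration under stated assumptions: the real decision is made by the cluster framework, not by code like this, and the class name RestartTracker, the method on_probe_failure, and the sample values (a retry count of 2 and a 60-second window) are hypothetical.

```python
from collections import deque


class RestartTracker:
    """Hypothetical model: choose between a restart and a failover request."""

    def __init__(self, retry_count, window_seconds):
        self.retry_count = retry_count        # restarts allowed inside the window
        self.window_seconds = window_seconds  # length of the sliding time window
        self.restarts = deque()               # timestamps of recent restart requests

    def on_probe_failure(self, now):
        # Forget restart requests that have aged out of the window.
        while self.restarts and now - self.restarts[0] > self.window_seconds:
            self.restarts.popleft()
        # One more restart would exceed the allowed count: request a failover.
        if len(self.restarts) >= self.retry_count:
            self.restarts.clear()
            return "failover"
        self.restarts.append(now)
        return "restart"


tracker = RestartTracker(retry_count=2, window_seconds=60)
# Three probe failures at t = 0, 10, and 20 seconds.
print([tracker.on_probe_failure(t) for t in (0, 10, 20)])
# prints ['restart', 'restart', 'failover']
```

In this model, failures that are spaced further apart than the window never accumulate, so an occasional isolated failure results only in a restart.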
The winbindd daemon resolves user and group information as a service to the Name Service Switch. When winbindd is running, the Name Service Cache daemon (nscd) must be turned off. To disable this daemon, refer to Step 4 in How to Prepare Samba for Sun Cluster HA for Samba.
The winbind fault monitor periodically checks that the fault monitor user can be retrieved by using getent passwd samba-fault-monitor-user.
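As a rough analogue of this check, the following sketch asks the operating system's password database, which is consulted through the Name Service Switch, whether a user name resolves. It uses Python's standard pwd module rather than the getent command that the fault monitor runs, and the function name user_resolves is hypothetical.

```python
import pwd


def user_resolves(username):
    """Return True if the user name can be looked up in the password database."""
    try:
        pwd.getpwnam(username)  # raises KeyError when no entry is found
    except KeyError:
        return False
    return True


# Replace the placeholder with the actual fault monitor user name.
print(user_resolves("samba-fault-monitor-user"))
```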
The Samba probe checks the nmbd daemon using the nmblookup program for each interface specified within the smb.conf file.
The Samba probe checks the smbd daemon using the smbclient program together with the samba-fault-monitor-user to access the scmondir share.
If smbclient cannot connect, network or server problems could be the cause. These errors may be transient and correctable within a few seconds. Therefore, before the probe declares a failure, smbclient is retried within 85% of the available Probe_timeout less 15 seconds, which is approximately the time-out for the first smbclient failure.
However, retrying is realistic only if Probe_timeout is 30 seconds or more. If Probe_timeout is below 30 seconds, smbclient is tried only once.
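The retry budget can be worked through with a small sketch. It assumes that "85% of the available Probe_timeout less 15 seconds" means 0.85 x Probe_timeout - 15; the agent's actual arithmetic may differ (for example, 85% of the time that remains after the first 15 seconds). The function name smbclient_retry_window is hypothetical.

```python
def smbclient_retry_window(probe_timeout):
    """Return the seconds available for retrying smbclient (0.0 = single attempt).

    Assumption: the window is 85% of Probe_timeout minus the roughly
    15 seconds consumed by the first failed smbclient attempt.
    """
    if probe_timeout < 30:
        # Below 30 seconds there is no realistic room for a retry.
        return 0.0
    return (85 * probe_timeout) / 100 - 15


for timeout in (20, 30, 60):
    print(timeout, smbclient_retry_window(timeout))
```

Under this reading, the minimum Probe_timeout of 30 seconds leaves about 10 seconds for retries, and each additional second of Probe_timeout adds 0.85 seconds to the retry window.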