C.4.1 Sun Cluster HA for NFS Fault Probes (Sun Cluster 2.2 System Administration Guide)

The probing server runs two types of periodic probes against another server's NFS service.

- The probing server sends a NULL RPC to all daemon processes on the target node that are required to provide NFS service; these daemons are rpcbind, mountd, nfsd, lockd, and statd.
- The probing server does an end-to-end test: it tries to mount an NFS file system from the other node, and then to read and write a file in that file system. It does this end-to-end test for every file system that the other node is currently sharing. Because the mount is expensive, it is executed less often than the other probes.
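
The NULL RPC probe can be illustrated at the wire level. The following is a minimal sketch, not the Sun Cluster implementation: it builds an ONC RPC version 2 call to procedure 0 (the NULL procedure) with AUTH_NONE credentials and checks whether a reply reports success. The program numbers are the standard ONC RPC assignments for these services; the function names are hypothetical, and transport handling and the port lookup through rpcbind are omitted.

```python
import struct

# Standard ONC RPC program numbers for the daemons named above.
RPC_PROGRAMS = {
    "rpcbind": 100000,
    "nfsd": 100003,
    "mountd": 100005,
    "lockd": 100021,
    "statd": 100024,
}

CALL, REPLY = 0, 1      # RPC message types
RPC_VERSION = 2         # ONC RPC protocol version
AUTH_NONE = 0           # authentication flavor
MSG_ACCEPTED = 0        # reply_stat value
SUCCESS = 0             # accept_stat value


def build_null_call(xid, program, version):
    """Return the XDR-encoded body of a NULL (procedure 0) RPC call."""
    return struct.pack(
        ">10I",
        xid, CALL, RPC_VERSION, program, version,
        0,             # procedure 0 is the NULL procedure
        AUTH_NONE, 0,  # credentials: flavor, body length
        AUTH_NONE, 0,  # verifier: flavor, body length
    )


def null_reply_ok(reply, xid):
    """Return True if reply is an accepted, successful answer to call xid."""
    if len(reply) < 24:
        return False
    rxid, mtype, reply_stat = struct.unpack(">3I", reply[:12])
    if (rxid, mtype, reply_stat) != (xid, REPLY, MSG_ACCEPTED):
        return False
    _flavor, verf_len = struct.unpack(">2I", reply[12:20])
    offset = 20 + ((verf_len + 3) // 4) * 4  # skip the padded verifier body
    if len(reply) < offset + 4:
        return False
    (accept_stat,) = struct.unpack(">I", reply[offset:offset + 4])
    return accept_stat == SUCCESS
```

Over UDP, the call body would be sent as a single datagram; over TCP, it would additionally need a record-marking header. A daemon that answers the NULL call is alive and servicing RPCs, which is all this probe type is meant to establish.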

If any of these probes fails, the probing node considers taking over from the serving node. However, certain conditions can delay or prevent the takeover:

Grace period for local restart - Before doing the takeover, the probing node waits for a short time period that is intended to:
- Give the victim node a chance to notice its own sickness and fix the problem by doing a local restart of its own daemons
- Give the victim node a chance to be less busy (if it is merely overloaded)
After waiting, the probing node retries the probe and continues with takeover consideration only if the probe fails again. In general, a takeover requires two entire timeouts of the basic probe, which allows for a slow server.
Multiple public networks - If the other node is on multiple public networks, the probing node will try the probe on at least two of them.
Locks - Some backup utilities exploit the lockfs(1M) facility, which locks out various types of updates on a file system so that the backup can take a snapshot of an unchanging file system. Unfortunately, in the NFS context, lockfs(1M) makes a file system appear unavailable; NFS clients see the condition "NFS server not responding." Before doing a takeover, the probing node queries the other node to find out whether the file system is in lockfs state. If it is, the takeover is inhibited, because the lockfs is part of a normal, intended administrative procedure for doing backup. Note that not all backup utilities use lockfs; some permit NFS service to continue uninterrupted.
Daemons - Unresponsiveness of the lockd and statd daemons does not cause a takeover. Together, lockd and statd provide network locking for NFS files. If these daemons are unresponsive, the condition is merely logged to syslog, and a takeover does not occur. In the course of their normal work, lockd and statd must perform RPCs to client machines, so a dead or partitioned client can cause them to hang for long periods of time. Thus, a bad client can make lockd and statd on the server look sick. If a takeover by the probing server were to occur, the probing server would probably be stalled by the bad client in the same way. With the current model, a bad client will not cause a false takeover.
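
The conditions above can be summarized as a small decision procedure. The following is a sketch under stated assumptions, not the actual fault monitor code: the function names, the parameters, and the way probe results are represented are all hypothetical, and the multiple-public-network check is left out.

```python
import time

# Daemons whose unresponsiveness is logged but never triggers a takeover.
LOG_ONLY_DAEMONS = {"lockd", "statd"}


def probe_with_grace(probe, grace_seconds, sleep=time.sleep):
    """Run probe(); on failure, wait out the grace period and retry once.

    Returns True if either attempt succeeds. The probe callable and the
    length of the grace period are assumptions for illustration.
    """
    if probe():
        return True
    sleep(grace_seconds)  # let the server restart its daemons or shed load
    return probe()


def should_consider_takeover(failed_daemons, end_to_end_failed, fs_in_lockfs):
    """Apply the NFS-specific inhibitors to failures that survived the retry.

    failed_daemons:    names of daemons that did not answer the NULL RPC
    end_to_end_failed: True if the mount/read/write test failed
    fs_in_lockfs:      True if the serving node reports lockfs state
    """
    if fs_in_lockfs:
        # The file system is locked for an intended backup procedure.
        return False
    # Failures of lockd and statd are only logged, so they are ignored here.
    critical = set(failed_daemons) - LOG_ONLY_DAEMONS
    return bool(critical) or end_to_end_failed
```

For example, `should_consider_takeover({"lockd", "statd"}, False, False)` evaluates to `False`, while a persistent nfsd failure on a file system that is not in lockfs state evaluates to `True`. A `True` result means only that takeover consideration continues; it does not by itself start a takeover.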

After passing these Sun Cluster HA for NFS-specific tests, the process of considering whether or not to do a takeover continues with calls to hactl(1M) (see "C.1.2 Sanity Checking of Probing Node").

The probing server also checks its own NFS service. The logic is similar to the probes of the other server, but instead of doing takeovers, the server logs error messages to syslog and attempts to restart any daemon whose process no longer exists. In other words, a daemon process is restarted only when it has exited or crashed. A restart is not attempted if the daemon process still exists but is not responding, because that would require killing the daemon without knowing which data structures it is updating. A restart is also not done if a local restart has been attempted too recently (within the last hour); instead, the other server is told to consider doing a takeover (provided the other server passes its own sanity checks). Finally, the rpcbind daemon is never restarted, because there is no way to inform processes that had registered with rpcbind that they need to re-register.
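
The local restart rules can be sketched as a single function. This is an illustration only: the function name, its parameters, and the returned action strings are hypothetical. The sketch also assumes one reading of the text, namely that only the "restarted too recently" case asks the other server to consider a takeover, while the other non-restart cases are only logged.

```python
RESTART_INTERVAL_SECONDS = 3600  # "within the last hour"


def local_action(daemon, process_exists, responding, seconds_since_restart=None):
    """Decide what the local probe does about one of its own NFS daemons.

    seconds_since_restart is None if no local restart has been attempted.
    Returns one of: "ok", "log_only", "restart", "request_takeover".
    """
    if responding:
        return "ok"
    if process_exists:
        # Hung but alive: killing it could interrupt an update in progress.
        return "log_only"
    if daemon == "rpcbind":
        # Registered services could not be told to re-register.
        return "log_only"
    if (seconds_since_restart is not None
            and seconds_since_restart < RESTART_INTERVAL_SECONDS):
        # A local restart was tried too recently; hand the decision over.
        return "request_takeover"
    return "restart"
```

Under these assumptions, `local_action("nfsd", False, False)` returns `"restart"`, and the same call with `seconds_since_restart=600` returns `"request_takeover"`.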