Sun Cluster 3.0 U1 Data Services Installation and Configuration Guide

Sun Cluster HA for NFS Fault Monitor

The Sun Cluster HA for NFS fault monitor involves the following two processes.

NFS system fault monitoring, which involves monitoring the NFS daemons (nfsd, mountd, statd, and lockd) and taking corrective action when they fail.
Status check, which is specific to each NFS resource. The fault monitor of each resource checks the status of each shared path to monitor the file systems that the resource exports.

Fault Monitor Startup

An NFS resource start method starts the NFS system fault monitor. This start method first checks whether the NFS system fault monitor (nfs_daemons_probe) is already running under the process monitor (pmfadm). If it is not, the start method starts the nfs_daemons_probe process under the control of the process monitor. It then starts the resource fault monitor (nfs_probe), also under the control of the process monitor.
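The start sequence can be sketched in Python as a minimal model, not the actual Sun Cluster start method. The `ProcessMonitor` class is a hypothetical in-memory stand-in for the process monitor (it does not invoke pmfadm), and the `nfs_probe.<resource>` tag format is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessMonitor:
    """Hypothetical stand-in for the process monitor: it only tracks tags."""
    running: set = field(default_factory=set)

    def is_running(self, tag: str) -> bool:
        return tag in self.running

    def start(self, tag: str) -> None:
        self.running.add(tag)


def nfs_resource_start(pmf: ProcessMonitor, resource: str) -> list:
    """Sketch of the start sequence; returns the tags it started, in order."""
    started = []
    # The system fault monitor is shared by the node, so start it only once.
    if not pmf.is_running("nfs_daemons_probe"):
        pmf.start("nfs_daemons_probe")
        started.append("nfs_daemons_probe")
    # Always start this resource's own fault monitor (hypothetical tag format).
    tag = f"nfs_probe.{resource}"
    pmf.start(tag)
    started.append(tag)
    return started
```

Starting a second NFS resource on the same node adds only that resource's `nfs_probe` monitor, because `nfs_daemons_probe` is already running.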

Fault Monitor Stop

The NFS resource Monitor_stop method stops the resource fault monitor. This method also stops the NFS system fault monitor if no other NFS resource fault monitor runs on the local node.
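The stop logic can be sketched the same way. Here `running` is a hypothetical set of process monitor tags standing in for what the process monitor reports, and the `nfs_probe.<resource>` tag format is again an illustrative assumption.

```python
def nfs_monitor_stop(running: set, resource: str) -> None:
    """Sketch of Monitor_stop against a hypothetical set of monitor tags."""
    # Stop this resource's fault monitor.
    running.discard(f"nfs_probe.{resource}")
    # Stop the shared system fault monitor only when no other NFS resource
    # fault monitor is still running on the local node.
    if not any(tag.startswith("nfs_probe.") for tag in running):
        running.discard("nfs_daemons_probe")
```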

NFS System Fault Monitor Process

The system fault monitor probes rpcbind, statd, lockd, nfsd, and mountd to check that each process is present and responds to a null RPC call. This monitor uses the following NFS extension properties.

`Rpcbind_nullrpc_timeout`
`Statd_nullrpc_timeout`
`Lockd_nullrpc_timeout`
`Nfsd_nullrpc_timeout`
`Mountd_nullrpc_timeout`
`Rpcbind_nullrpc_reboot`
`Nfsd_nullrpc_restart`
`Mountd_nullrpc_restart`

See "Configuring Sun Cluster HA for NFS Extension Properties" to review or set extension properties.

Each system fault monitor probe cycle performs the following steps.

Sleeps for `Cheap_probe_interval`.
Probes rpcbind.
    If the process fails and `Failover_mode=HARD`, reboots the system.
    If a null RPC call fails while `Rpcbind_nullrpc_reboot=True` and `Failover_mode=HARD`, reboots the system.
Probes statd and lockd.
    If either daemon process fails, restarts both daemons.
    If a null RPC call fails, logs a message to syslog but does not restart the daemon.
Probes nfsd and mountd.
    If either daemon process fails, restarts that daemon.
    If a null RPC call fails, restarts mountd if the cluster file system device is available and the extension property `Mountd_nullrpc_restart=True`.
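The probe cycle steps can be expressed as a pure decision function. This Python sketch assumes the caller has already slept for `Cheap_probe_interval` and collected one result per daemon. `ProbeResult`, `probe_cycle_actions`, the `cfs_device_available` flag, and the action strings are hypothetical names, and the actual RPC probing and restarts are not shown. Null RPC failures of nfsd are not handled because the steps describe a null RPC restart only for mountd.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    process_alive: bool  # the daemon process exists
    nullrpc_ok: bool     # the daemon answered a null RPC call


def probe_cycle_actions(results: dict,
                        failover_mode: str,
                        rpcbind_nullrpc_reboot: bool,
                        mountd_nullrpc_restart: bool,
                        cfs_device_available: bool) -> list:
    """Return the actions for one probe cycle (hypothetical action strings)."""
    actions = []

    # rpcbind: reboot only when Failover_mode=HARD.
    rpcbind = results["rpcbind"]
    if failover_mode == "HARD" and (
            not rpcbind.process_alive
            or (not rpcbind.nullrpc_ok and rpcbind_nullrpc_reboot)):
        return ["reboot"]

    # statd and lockd: a dead process restarts both; a failed null RPC is only logged.
    if not (results["statd"].process_alive and results["lockd"].process_alive):
        actions += ["restart statd", "restart lockd"]
    for name in ("statd", "lockd"):
        r = results[name]
        if r.process_alive and not r.nullrpc_ok:
            actions.append(f"syslog: {name} null RPC failed")

    # nfsd and mountd: restart whichever process has died.
    for name in ("nfsd", "mountd"):
        if not results[name].process_alive:
            actions.append(f"restart {name}")

    # mountd null RPC failure: restart only when both conditions hold.
    mountd = results["mountd"]
    if (mountd.process_alive and not mountd.nullrpc_ok
            and cfs_device_available and mountd_nullrpc_restart):
        actions.append("restart mountd")
    return actions
```

Keeping the decisions separate from the probing makes each rule easy to check in isolation, for example that a dead rpcbind triggers a reboot only when `Failover_mode=HARD`.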

If any NFS daemon fails to restart, the status of all online NFS resources is set to FAULTED. When all NFS daemons have been restarted and are healthy, the resource status is set to ONLINE again.
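The status rule reduces to one check across the daemons. In this sketch, `daemon_healthy` is an assumed mapping from each NFS daemon name to whether it is running and healthy after the restart attempt, and `nfs_resource_statuses` is a hypothetical name.

```python
def nfs_resource_statuses(online_resources: list,
                          daemon_healthy: dict) -> dict:
    """Map every online NFS resource to a status based on daemon health."""
    # A single unhealthy daemon faults every online NFS resource;
    # only when all daemons are healthy do the resources return to ONLINE.
    status = "ONLINE" if all(daemon_healthy.values()) else "FAULTED"
    return {resource: status for resource in online_resources}
```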

NFS Resource Monitor Process

Before starting the resource monitor probes, the monitor reads all shared paths from the dfstab file and stores them in memory. In each probe cycle, it probes every shared path by performing stat() on the path.
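Reading the shared paths can be sketched as follows, assuming each non-comment dfstab line is a share command of the form `share [-F fstype] [-o options] [-d description] pathname`, so the path is the last argument. `read_shared_paths` is a hypothetical helper; `shlex` handles a quoted `-d` description that contains spaces.

```python
import shlex


def read_shared_paths(dfstab_text: str) -> list:
    """Return the pathname argument of every share command in dfstab text."""
    paths = []
    for line in dfstab_text.splitlines():
        # comments=True drops "#" comments; blank lines split to [].
        argv = shlex.split(line, comments=True)
        if argv and argv[0] == "share":
            # The pathname is the last argument of a share command.
            paths.append(argv[-1])
    return paths
```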

Each resource monitor fault probe performs the following steps.

Sleeps for `Thorough_probe_interval`.
Rereads dfstab and refreshes the in-memory list of paths if the file has changed since the last read.
Probes all shared paths by performing stat() on each path.

If any path is not functional, the resource status is set to FAULTED. If all paths are functional, the resource status is set to ONLINE again.
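A resource probe cycle might look like the following sketch. It assumes dfstab lines are share commands whose last argument is the path, and it detects changes to dfstab through the file's modification time. `ResourceProbe`, `run`, and the `set_status` callback are hypothetical; setting the real resource status is left to the caller.

```python
import os
import shlex
import time
from typing import Callable, Optional


def _shared_paths(dfstab_path: str) -> list:
    """Return the last argument of every share command in dfstab."""
    paths = []
    with open(dfstab_path) as f:
        for line in f:
            argv = shlex.split(line, comments=True)
            if argv and argv[0] == "share":
                paths.append(argv[-1])
    return paths


class ResourceProbe:
    def __init__(self, dfstab_path: str) -> None:
        self.dfstab_path = dfstab_path
        self.mtime_ns: Optional[int] = None
        self.paths: list = []

    def refresh_if_changed(self) -> None:
        # Reread dfstab only when its modification time differs from the last read.
        mtime_ns = os.stat(self.dfstab_path).st_mtime_ns
        if mtime_ns != self.mtime_ns:
            self.paths = _shared_paths(self.dfstab_path)
            self.mtime_ns = mtime_ns

    def probe_once(self) -> str:
        self.refresh_if_changed()
        for path in self.paths:
            try:
                os.stat(path)  # the per-path probe
            except OSError:
                return "FAULTED"  # any non-functional path faults the resource
        return "ONLINE"


def run(probe: ResourceProbe, thorough_probe_interval: float,
        set_status: Callable[[str], None]) -> None:
    """Probe forever, sleeping Thorough_probe_interval before each cycle."""
    while True:
        time.sleep(thorough_probe_interval)
        set_status(probe.probe_once())
```

A plain `os.stat` call can block on an unresponsive file system; this sketch does not bound it with a timeout.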