NFS Administration Guide

NFS Troubleshooting Procedures

To determine where the NFS service has failed, you need to follow several procedures to isolate the failure. Check for the following items:

Can the client reach the server?
Can the client contact the NFS services on the server?
Are the NFS services running on the server?

In the process of checking these items, it might become apparent that other portions of the network are not functioning, such as the name service or the physical network hardware. The Solaris Naming Administration Guide contains debugging procedures for the NIS+ name service. It might also become obvious that the problem is not at the client end (for instance, if you get at least one trouble call from every subnet in your work area). In this case, it saves time to assume that the problem lies with the server or the network hardware near the server, and to start debugging at the server rather than at the client.

How to Check Connectivity on an NFS Client

Check that the NFS server is reachable from the client. On the client, type the following command.
% /usr/sbin/ping bee
bee is alive
If the command reports that the server is alive, remotely check the NFS server (see "How to Remotely Check the NFS Server").

If the server is not reachable from the client, make sure that the local name service is running. For NIS+ clients, type the following:

% /usr/lib/nis/nisping -u
Last updates for directory eng.acme.com. :
Master server is eng-master.acme.com.
        Last update occurred at Mon Jun  5 11:16:10 1995

Replica server is eng1-replica-58.acme.com.
        Last Update seen was Mon Jun  5 11:16:10 1995

If the name service is running, make sure that the client has received the correct host information by typing the following:
% /usr/bin/getent hosts bee
129.144.83.117	bee.eng.acme.com

If the host information is correct, but the server is not reachable from the client, run the ping command from another client.

If the command run from a second client fails, see "How to Verify the NFS Service on the Server".

If the server is reachable from the second client, use ping to check connectivity of the first client to other systems on the local net.

If this fails, check the networking software configuration on the client (/etc/netmasks, /etc/nsswitch.conf, and so forth).

If the software is correct, check the networking hardware.

Try moving the client onto a second net drop.
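
The first decisions in this procedure can be sketched as a small shell function. This is an illustrative sketch only: the function name check_client_path and the PING and GETENT override variables are hypothetical, and the sketch assumes a Solaris-style ping that returns exit status 0 when the host answers.

```shell
#!/bin/sh
# Sketch of the client-side triage order described in this procedure.
# PING and GETENT are hypothetical override hooks (not part of Solaris);
# they default to the commands used in the steps above.
PING=${PING:-/usr/sbin/ping}
GETENT=${GETENT:-/usr/bin/getent}

# check_client_path SERVER
# Prints one line that names the next step to take.
check_client_path() {
    server=$1
    if $PING "$server" >/dev/null 2>&1; then
        echo "reachable: remotely check the NFS server"
    elif ! $GETENT hosts "$server" >/dev/null 2>&1; then
        echo "no host entry: check the name service"
    else
        echo "unreachable: try ping from a second client"
    fi
}
```

For example, check_client_path bee prints the first message when bee answers the ping. The remaining steps (checking other systems on the local net, the software configuration, and the hardware) still need to be done by hand.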

How to Remotely Check the NFS Server

Check that the NFS services have started on the NFS server by typing the following command:

% rpcinfo -s bee|egrep 'nfs|mountd'
 100003  3,2    tcp,udp                          nfs     superuser
 100005  3,2,1  ticots,ticotsord,tcp,ticlts,udp  mountd  superuser

If the daemons have not been started, see "How to Restart NFS Services".
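
If you want to script this check, the following sketch reads the rpcinfo -s output and succeeds only when both services are registered. The function name nfs_services_registered is hypothetical, and the sketch assumes that the service name appears in the fourth column, as in the sample output above.

```shell
#!/bin/sh
# nfs_services_registered: reads `rpcinfo -s` output on standard input and
# returns 0 only if both the nfs and mountd services are listed.
# Assumes the column order shown above: program, versions, netids,
# service, owner.
nfs_services_registered() {
    awk '
        $4 == "nfs"    { nfs = 1 }
        $4 == "mountd" { mountd = 1 }
        END { exit !(nfs && mountd) }
    '
}
```

A possible use is: rpcinfo -s bee | nfs_services_registered || echo "NFS services are not registered on bee"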

Check that the server's nfsd processes are responding. On the client, type the following command.
% /usr/bin/rpcinfo -u bee nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
If the server is running, it prints a list of program and version numbers. Using the -t option tests the TCP connection. If this fails, skip to "How to Verify the NFS Service on the Server".

Check that the server's mountd is responding, by typing the following command.
% /usr/bin/rpcinfo -u bee mountd
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
Using the -t option tests the TCP connection. If either attempt fails, skip to "How to Verify the NFS Service on the Server".
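
The four calls from the two preceding steps (nfs and mountd, each over UDP and TCP) can be run in one pass. The sketch below is one possible way to do that. The function name probe_nfs_rpc and the RPCINFO override variable are hypothetical, and the sketch assumes that rpcinfo returns a nonzero exit status when the call fails.

```shell
#!/bin/sh
# RPCINFO is a hypothetical override hook; it defaults to the command
# used in the steps above.
RPCINFO=${RPCINFO:-/usr/bin/rpcinfo}

# probe_nfs_rpc SERVER
# Calls the nfs and mountd services over UDP (-u) and TCP (-t). Prints
# one line for each combination that does not respond, and returns a
# nonzero status if any call failed.
probe_nfs_rpc() {
    server=$1
    failed=0
    for service in nfs mountd; do
        for transport in -u -t; do
            if ! $RPCINFO "$transport" "$server" "$service" >/dev/null 2>&1; then
                echo "no response: rpcinfo $transport $server $service"
                failed=1
            fi
        done
    done
    return $failed
}
```

If probe_nfs_rpc bee reports any failure, continue with "How to Verify the NFS Service on the Server".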

Check the local autofs service if it is being used:
% cd /net/wasp
Choose a /net or /home mount point that you know should work properly. If this doesn't work, then as root on the client, type the following to restart the autofs service:
# /etc/init.d/autofs stop
# /etc/init.d/autofs start

Verify that the file system is shared as expected on the server.
% /usr/sbin/showmount -e bee
/usr/src                eng
/export/share/man       (everyone)
Check the entry on the server and the local mount entry for errors. Also check the name space. In this instance, if the first client is not in the eng netgroup, then that client would not be able to mount the /usr/src file system.
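
To look up the access list for a single file system, you can filter the showmount output. The following is a minimal sketch: the function name export_access is hypothetical, and it assumes an awk that accepts the -v option (on some Solaris releases this might mean nawk or /usr/xpg4/bin/awk).

```shell
#!/bin/sh
# export_access PATH: reads `showmount -e` output on standard input and
# prints the access list shown for PATH. Returns nonzero if PATH does not
# appear in the export list.
export_access() {
    awk -v path="$1" '
        $1 == path { print $2; found = 1 }
        END { exit !found }
    '
}
```

For example, /usr/sbin/showmount -e bee | export_access /usr/src prints the netgroup or host list that is allowed to mount /usr/src. You still need to confirm separately that the client belongs to that netgroup.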

Check all entries that include mounting information in all of the local files. The list includes /etc/vfstab and all the /etc/auto_* files.
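
For /etc/vfstab, one way to pull out only the NFS entries for review is sketched below. The function name list_nfs_vfstab is hypothetical, and the sketch assumes the standard seven-field vfstab layout in which the fourth field is the file system type. The /etc/auto_* maps use a different format and are not covered by this sketch.

```shell
#!/bin/sh
# list_nfs_vfstab FILE: prints "resource mount-point options" for every
# NFS entry in a vfstab-format file.
# Assumed field order: device to mount, device to fsck, mount point,
# FS type, fsck pass, mount at boot, mount options.
list_nfs_vfstab() {
    awk '
        $1 ~ /^#/   { next }             # skip comment lines
        $4 == "nfs" { print $1, $3, $7 }
    ' "$1"
}
```

Running list_nfs_vfstab /etc/vfstab on the client gives a short list that you can compare against the showmount output from the server.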

How to Verify the NFS Service on the Server

Log on to the server as root.

Check that the server can reach the clients.
# ping lilac
lilac is alive

If the client is not reachable from the server, make sure that the local name service is running. For NIS+ clients, type the following:

% /usr/lib/nis/nisping -u
Last updates for directory eng.acme.com. :
Master server is eng-master.acme.com.
        Last update occurred at Mon Jun  5 11:16:10 1995

Replica server is eng1-replica-58.acme.com.
        Last Update seen was Mon Jun  5 11:16:10 1995

If the name service is running, check the networking software configuration on the server (/etc/netmasks, /etc/nsswitch.conf, and so forth).

Type the following command to check whether the nfsd daemon is running.

# rpcinfo -u localhost nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
# ps -ef | grep nfsd
root    232      1  0  Apr 07     ?     0:01 /usr/lib/nfs/nfsd -a 16
root   3127   2462  1  09:32:57  pts/3  0:00 grep nfsd

Also use the -t option with rpcinfo to check the TCP connection. If these commands fail, restart the NFS service (see "How to Restart NFS Services").

Type the following command to check whether the mountd daemon is running.

# /usr/bin/rpcinfo -u localhost mountd
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
# ps -ef | grep mountd
root    145      1 0 Apr 07  ?     21:57 /usr/lib/autofs/automountd
root    234      1 0 Apr 07  ?     0:04  /usr/lib/nfs/mountd
root   3084 2462 1 09:30:20 pts/3  0:00  grep mountd

Also use the -t option with rpcinfo to check the TCP connection. If these commands fail, restart the NFS service (see "How to Restart NFS Services").
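
Note that grep mountd also matches automountd and the grep process itself, as the sample output shows. If you script this check, a stricter filter helps. The sketch below is one possibility: the function name daemon_pids is hypothetical, it assumes that the daemon appears in the ps listing with an absolute path (as in the examples above), and it assumes an awk that accepts the -v option.

```shell
#!/bin/sh
# daemon_pids NAME: reads `ps -ef` output on standard input and prints the
# PID (second column) of each process whose command basename is exactly
# NAME. Lines whose command is not an absolute path, such as the grep
# process, are ignored.
daemon_pids() {
    awk -v name="$1" '
        {
            for (i = 3; i <= NF; i++) {
                if (substr($i, 1, 1) == "/") {   # first absolute path is the command
                    n = split($i, parts, "/")
                    if (parts[n] == name) print $2
                    break
                }
            }
        }
    '
}
```

For example, ps -ef | daemon_pids mountd prints only the PID of /usr/lib/nfs/mountd and skips automountd. No output means that the daemon is not running.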

Type the following command to check whether the rpcbind daemon is running.
# /usr/bin/rpcinfo -u localhost rpcbind
program 100000 version 1 ready and waiting
program 100000 version 2 ready and waiting
program 100000 version 3 ready and waiting
If rpcbind seems to be hung, either reboot the server or follow the steps in "How to Warm-Start rpcbind".

How to Restart NFS Services

To enable daemons without rebooting, become superuser and type the following commands.

# /etc/init.d/nfs.server stop
# /etc/init.d/nfs.server start

This stops the daemons and then restarts them, provided there is an entry in /etc/dfs/dfstab.
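
Because the daemons are restarted only when /etc/dfs/dfstab has an entry, it can be useful to check the file first. The sketch below is a rough approximation under stated assumptions: the function name has_share_entries is hypothetical, and it treats any uncommented line that begins with the share command as an entry, which might not match exactly what the nfs.server script tests.

```shell
#!/bin/sh
# has_share_entries FILE: returns 0 if FILE (dfstab format) contains at
# least one uncommented line that begins with the share command.
has_share_entries() {
    grep '^[[:space:]]*share[[:space:]]' "$1" >/dev/null 2>&1
}
```

For example, has_share_entries /etc/dfs/dfstab || echo "no share entries, so the daemons will not be restarted"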

How to Warm-Start rpcbind

If the NFS server cannot be rebooted because of work in progress, you can restart rpcbind without restarting all of the services that use RPC by completing a warm start, as described in this procedure.

As root on the server, get the PID for rpcbind.

Run ps to get the PID (which is the value in the second column).

# ps -ef |grep rpcbind
    root   115     1  0   May 31 ?        0:14 /usr/sbin/rpcbind
    root 13000  6944  0 11:11:15 pts/3    0:00 grep rpcbind

Send a SIGTERM signal to the rpcbind process.

In this example, term is the signal that is to be sent and 115 is the PID for the program (see the kill(1) man page). This causes rpcbind to create a list of the current registered services in /tmp/portmap.file and /tmp/rpcbind.file.
# kill -s term 115
Note -
If you do not kill the rpcbind process with the -s term option, then you cannot complete a warm start of rpcbind and will have to reboot the server to restore service.

Restart rpcbind.

Do a warm restart of the command so that the files created by the kill command are consulted, and the process resumes without requiring that all of the RPC services be restarted (see the rpcbind(1M) man page).
# /usr/sbin/rpcbind -w
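
The three steps of this procedure can be combined as sketched below. Treat this as an illustration, not a supported script: the function names rpcbind_pid and warm_start_rpcbind are hypothetical, and the loop that waits for the old process to exit is an added assumption that is not part of the documented procedure.

```shell
#!/bin/sh
# rpcbind_pid: reads `ps -ef` output on standard input and prints the PID
# (second column) of the first /usr/sbin/rpcbind process. The grep
# process is skipped because its last field is not the full path.
rpcbind_pid() {
    awk '$NF == "/usr/sbin/rpcbind" { print $2; exit }'
}

# warm_start_rpcbind: hypothetical wrapper for the steps above.
# Run as root on the server.
warm_start_rpcbind() {
    pid=$(ps -ef | rpcbind_pid)
    if [ -z "$pid" ]; then
        echo "rpcbind is not running" >&2
        return 1
    fi
    # Send the term signal, as the procedure requires, so that rpcbind
    # saves its list of registered services.
    kill -s term "$pid" || return 1
    # Assumption: wait until the old process has exited before restarting.
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
    /usr/sbin/rpcbind -w
}
```

Only rpcbind_pid can be tried safely on sample output. Run warm_start_rpcbind only on a server where a warm start is actually needed.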

How to Identify Which Host Is Providing NFS File Service

Run the nfsstat command with the -m option to gather current NFS information.

The name of the current server is printed after "currserver=".

% nfsstat -m
/usr/local from bee,wasp:/export/share/local
 Flags: vers=3,proto=tcp,sec=sys,hard,intr,llock,link,symlink,
		acl,rsize=32768,wsize=32768,retrans=5
 Failover: noresponse=0, failover=0, remap=0, currserver=bee
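
When a client has many mounts, you can reduce the nfsstat -m output to the mount point and its current server. The following is a minimal sketch that assumes the output layout shown above; the function name current_servers is hypothetical.

```shell
#!/bin/sh
# current_servers: reads `nfsstat -m` output on standard input and prints
# "mount-point server" for each mount that reports a currserver= value.
current_servers() {
    awk '
        / from / { mount = $1 }          # remember the mount point
        /currserver=/ {
            sub(/.*currserver=/, "")     # drop everything up to the value
            sub(/[ ,].*/, "")            # drop anything after the value
            print mount, $0
        }
    '
}
```

For the sample output above, nfsstat -m | current_servers would print /usr/local bee.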