System Administration Guide, Volume 3

Other Useful Commands

These commands can be useful when troubleshooting NFS problems.

nfsstat

You can use this command to gather statistical information about NFS and RPC connections. The syntax of the command is:

nfsstat [ -cmnrsz ]

where -c displays client-side information, -m displays statistics for each NFS mounted file system, -n specifies that NFS information is to be displayed (both client and server side), -r displays RPC statistics, -s displays the server-side information, and -z specifies that the statistics should be set to zero. If no options are supplied on the command line, the -cnrs options are used.

Gathering server-side statistics can be important for debugging problems when new software or hardware is added to the computing environment. Running this command at least once a week, and storing the numbers, provides a good history of previous performance.

Using the nfsstat Command


# nfsstat -s

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen      xdrcall    dupchecks  dupreqs
11420263   0          0          0           0          1428274    19
Connectionless:
calls      badcalls   nullrecv   badlen      xdrcall    dupchecks  dupreqs
14569706   0          0          0           0          953332     1601

Server nfs:
calls      badcalls
24234967   226
Version 2: (13073528 calls)
null       getattr    setattr    root        lookup     readlink   read
138612 1%  1192059 9% 45676 0%   0 0%        9300029 71% 9872 0%   1319897 10%
wrcache    write      create     remove      rename     link       symlink
0 0%       805444 6%  43417 0%   44951 0%    3831 0%    4758 0%    1490 0%
mkdir      rmdir      readdir    statfs
2235 0%    1518 0%    51897 0%   107842 0%
Version 3: (11114810 calls)
null       getattr    setattr    lookup      access     readlink   read
141059 1%  3911728 35% 181185 1% 3395029 30% 1097018 9% 4777 0%   960503 8%
write      create     mkdir      symlink     mknod      remove     rmdir
763996 6%  159257 1%  3997 0%    10532 0%    26 0%      164698 1%  2251 0%
rename     link       readdir    readdirplus fsstat     fsinfo     pathconf
53303 0%   9500 0%    62022 0%   79512 0%    3442 0%    34275 0%   3023 0%
commit    
73677 0%  

Server nfs_acl:
Version 2: (1579 calls)
null       getacl     setacl     getattr     access    
0 0%       3 0%       0 0%       1000 63%    576 36%   
Version 3: (45318 calls)
null       getacl     setacl    
0 0%       45318 100% 0 0%

This is an example of NFS server statistics. The first five lines deal with RPC and the remaining lines report NFS activities. In both sets of statistics, knowing the average number of badcalls or calls and the number of calls per week, can help identify when something is going wrong. The badcalls value reports the number of bad messages from a client and can point out network hardware problems.

Some of the connections generate write activity on the disks. A sudden increase in these statistics could indicate trouble and should be investigated. For NFS version 2 statistics, the connections to note are: setattr, write, create, remove, rename, link, symlink, mkdir, and rmdir. For NFS version 3 statistics, the value to watch is commit. If the commit level is high in one NFS server as compared to another almost identical one, check that the NFS clients have enough memory. The number of commit operations on the server grow when clients do not have available resources.

pstack

This command displays a stack trace for each process. It must be run by root. You can use it to determine where a process is hung. The only option allowed with this command is the PID of the process that you want to check (see the proc(1) man page).

The following example is checking the nfsd process that is running.


# /usr/proc/bin/pstack 243
243:    /usr/lib/nfs/nfsd -a 16
 ef675c04 poll     (24d50, 2, ffffffff)
 000115dc ???????? (24000, 132c4, 276d8, 1329c, 276d8, 0)
 00011390 main     (3, efffff14, 0, 0, ffffffff, 400) + 3c8
 00010fb0 _start   (0, 0, 0, 0, 0, 0) + 5c

It shows that the process is waiting for a new connection request. This is a normal response. If the stack shows that the process is still in poll after a request is made, the process might be hung. Follow the instructions in "How to Restart NFS Services" to fix this problem. Review the instructions in "NFS Troubleshooting Procedures" to fully verify that your problem is a hung program.

rpcinfo

This command generates information about the RPC service running on a system. You can also use it to change the RPC service. Many options are available with this command (see the rpcinfo(1M) man page). This is a shortened synopsis for some of the options that you can use with the command:

rpcinfo [ -m | -s ] [ hostname ]

rpcinfo [ -t | -u ] [ hostname ] [ progname ]

where -m displays a table of statistics of the rpcbind operations, -s displays a concise list of all registered RPC programs, -t displays the RPC programs that use TCP, -u displays the RPC programs that use UDP, hostname selects the host name of the server you need information from, and progname selects the RPC program to gather information about. If no value is given for hostname, the local host name is used. You can substitute the RPC program number for progname, but many users will remember the name and not the number. You can use the -p option in place of the -s option on those systems that do not run the NFS version 3 software.

The data generated by this command can include:

Using the rpcinfo Command

This example gathers information on the RPC services running on a server. The text generated by the command is filtered by the sort command to make it more readable. Several lines listing RPC services have been deleted from the example.


% rpcinfo -s bee |sort -n
   program version(s) netid(s)                         service     owner
    100000  2,3,4     udp,tcp,ticlts,ticotsord,ticots  portmapper  superuser
    100001  4,3,2     ticlts,udp                       rstatd      superuser
    100002  3,2       ticots,ticotsord,tcp,ticlts,udp  rusersd     superuser
    100003  3,2       tcp,udp                          nfs         superuser
    100005  3,2,1     ticots,ticotsord,tcp,ticlts,udp  mountd      superuser
    100008  1         ticlts,udp                       walld       superuser
    100011  1         ticlts,udp                       rquotad     superuser
    100012  1         ticlts,udp                       sprayd      superuser
    100021  4,3,2,1   ticots,ticotsord,ticlts,tcp,udp  nlockmgr    superuser
    100024  1         ticots,ticotsord,ticlts,tcp,udp  status      superuser
    100026  1         ticots,ticotsord,ticlts,tcp,udp  bootparam   superuser
    100029  2,1       ticots,ticotsord,ticlts          keyserv     superuser
    100068  4,3,2     tcp,udp                          cmsd        superuser
    100078  4         ticots,ticotsord,ticlts          kerbd       superuser
    100083  1         tcp,udp                          -           superuser
    100087  11        udp                              adm_agent   superuser
    100088  1         udp,tcp                          -           superuser
    100089  1         tcp                              -           superuser
    100099  1         ticots,ticotsord,ticlts          pld         superuser
    100101  10        tcp,udp                          event       superuser
    100104  10        udp                              sync        superuser
    100105  10        udp                              diskinfo    superuser
    100107  10        udp                              hostperf    superuser
    100109  10        udp                              activity    superuser
	.
	.
    100227  3,2       tcp,udp                          -           superuser
    100301  1         ticlts                           niscachemgr superuser
    390100  3         udp                              -           superuser
1342177279  1,2       tcp                              -           14072

This example shows how to gather information about a particular RPC service using a particular transport on a server.


% rpcinfo -t bee mountd
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
% rpcinfo -u bee nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting

The first example checks the mountd service running over TCP. The second example checks the NFS service running over UDP.

snoop

This command is often used to watch for packets on the network. It must be run as root. It is a good way to make sure that the network hardware is functioning on both the client and the server. Many options are available (see the snoop(1M) man page). A shortened synopsis of the command follows:

snoop [ -d device ] [ -o filename ] [ host hostname ]

where -d device specifies the local network interface, -o filename stores all the captured packets into the named file, and hostname indicates to display only packets going to and from a specific host.

The -d device option is useful on those servers that have multiple network interfaces. You can use many other expressions besides setting the host. A combination of command expressions with grep can often generate data that is specific enough to be useful.

When troubleshooting, make sure that packets are going to and from the proper host. Also, look for error messages. Saving the packets to a file can make it much easier to review the data.

truss

You can use this command to see if a process is hung. It must be run by root. You can use many options with this command (see the truss(1) man page). A shortened syntax of the command is:

truss [ -t syscall ] -p pid

where -t syscall selects system calls to trace, and -p pid indicates the PID of the process to be traced. The syscall may be a comma-separated list of system calls to be traced. Also, starting syscall with a ! selects to exclude the listed system calls from the trace.

This example shows that the process is waiting for another connection request from a new client.


# /usr/bin/truss -p 243
poll(0x00024D50, 2, -1)         (sleeping...)

This is a normal response. If the response does not change after a new connection request has been made, the process could be hung. Follow the instructions in "How to Restart NFS Services" to fix the hung program. Review the instructions in "NFS Troubleshooting Procedures" to fully verify that your problem is a hung program.