Commands for Troubleshooting NFS Problems

This section describes the commands you can use for troubleshooting NFS problems.

`nfsstat` Command

This command displays statistical information about NFS and RPC connections. Use the following syntax to display NFS server and client statistics:

nfsstat [-cmnrsz]

-c: Displays client-side information.
-m: Displays statistics for each NFS-mounted file system.
-n: Displays the NFS information on both the client side and the server side.
-r: Displays RPC statistics.
-s: Displays the server-side information.
-z: Specifies that the statistics should be set to zero.

If no options are supplied, the -cnrs options are used.

Gathering server-side statistics can be important for debugging problems when new software or new hardware is added to the computing environment. Running this command at least once a week and storing the output provides a history of previous performance to compare against.
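One way to keep such a history is a small wrapper that writes each run to a timestamped file. The following is a minimal sketch, not part of the Solaris tooling: the function name snapshot_nfsstat, the STATS_CMD override, and the default directory /var/tmp/nfsstat-history are assumptions chosen for illustration.

```shell
#!/bin/sh
# Save one timestamped copy of the server-side statistics.
# Hypothetical helper: the name, STATS_CMD, and the default directory
# are illustrative assumptions, not part of nfsstat itself.
snapshot_nfsstat() {
    dir=${1:-/var/tmp/nfsstat-history}
    cmd=${STATS_CMD:-"nfsstat -s"}
    mkdir -p "$dir" || return 1
    out="$dir/nfsstat-s.$(date +%Y%m%d-%H%M%S)"
    # $cmd is deliberately unquoted so that "nfsstat -s" splits
    # into the command name and its option.
    $cmd > "$out" || return 1
    # Print the path of the file that was written.
    echo "$out"
}

# Usage (for example from a weekly cron job):
#   snapshot_nfsstat /var/tmp/nfsstat-history
```

Keeping each run in its own file makes it straightforward to compare any two weeks with standard text tools.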

Example 10 Displaying NFS Server Statistics

$ nfsstat -s

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs    
719949194  0          0          0          0          58478624   33         
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs    
73753609   0          0          0          0          987278     7254       

Server NFSv2:
calls      badcalls   referrals  referlinks
25733      0          0          0

Server NFSv3:
calls      badcalls   referrals  referlinks
132880073  0          0          0

Server NFSv4:
calls      badcalls   referrals  referlinks
488884996  4          0          0
Version 2: (746607 calls)
null       getattr    setattr    root       lookup     readlink   read       
883 0%     60 0%      45 0%      0 0%       177446 23% 1489 0%    537366 71% 
wrcache    write      create     remove     rename     link       symlink    
0 0%       1105 0%    47 0%      59 0%      28 0%      10 0%      9 0%       
mkdir      rmdir      readdir    statfs     
26 0%      0 0%       27926 3%   108 0%     
Version 3: (728863853 calls)
null          getattr       setattr       lookup        access        
1365467 0%    496667075 68% 8864191 1%    66510206 9%   19131659 2%   
readlink      read          write         create        mkdir         
414705 0%     80123469 10%  18740690 2%   4135195 0%    327059 0%     
symlink       mknod         remove        rmdir         rename        
101415 0%     9605 0%       6533288 0%    111810 0%     366267 0%     
link          readdir       readdirplus   fsstat        fsinfo        
2572965 0%    519346 0%     2726631 0%    13320640 1%   60161 0%      
pathconf      commit        
13181 0%      6248828 0%    
Version 4: (54871870 calls)
null                compound            
266963 0%           54604907 99%        
Version 4: (167573814 operations)
reserved            access              close               commit              
0 0%                2663957 1%          2692328 1%          1166001 0%          
create              delegpurge          delegreturn         getattr             
167423 0%           0 0%                1802019 1%          26405254 15%        
getfh               link                lock                lockt               
11534581 6%         113212 0%           207723 0%           265 0%              
locku               lookup              lookupp             nverify             
230430 0%           11059722 6%         423514 0%           21386866 12%        
open                openattr            open_confirm        open_downgrade      
2835459 1%          4138 0%             18959 0%            3106 0%             
putfh               putpubfh            putrootfh           read                
52606920 31%        0 0%                35776 0%            4325432 2%          
readdir             readlink            remove              rename              
606651 0%           38043 0%            560797 0%           248990 0%           
renew               restorefh           savefh              secinfo             
2330092 1%          8711358 5%          11639329 6%         19384 0%            
setattr             setclientid         setclientid_confirm verify              
453126 0%           16349 0%            16356 0%            2484 0%             
write               release_lockowner   illegal             
3247770 1%          0 0%                0 0%                

Server nfs_acl:
Version 2: (694979 calls)
null        getacl      setacl      getattr     access      getxattrdir 
0 0%        42358 6%    0 0%        584553 84%  68068 9%    0 0%        
Version 3: (2465011 calls)
null        getacl      setacl      getxattrdir 
0 0%        1293312 52% 1131 0%     1170568 47%

The example shows how to display the statistics for RPC and NFS activities. In both sets of statistics, knowing the typical number of calls and badcalls per week helps you recognize when something has changed. The badcalls value reports the number of bad messages from a client. A rising value can indicate network hardware problems.
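To track these values over time, the calls and badcalls columns can be pulled out of the "Server rpc" section of the output. The following is a minimal sketch that assumes the layout shown in Example 10; the function name rpc_badcalls is hypothetical, and the awk program assumes a POSIX-style awk (on Solaris that might mean nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk).

```shell
#!/bin/sh
# Read "nfsstat -s" output on stdin and print calls and badcalls
# for each transport class in the "Server rpc:" section.
# Hypothetical helper; assumes the column layout shown in Example 10.
rpc_badcalls() {
    awk '
        /^Server rpc:/           { in_rpc = 1; next }
        /^Server /               { in_rpc = 0 }
        # "Connection oriented:" or "Connectionless:" names the class.
        in_rpc && /^Connection/  { label = $0; sub(/:$/, "", label); next }
        # The header row announces that the next row holds the numbers.
        in_rpc && $1 == "calls"  { want = 1; next }
        in_rpc && want           { printf "%s calls=%s badcalls=%s\n", label, $1, $2; want = 0 }
    '
}

# Usage:
#   nfsstat -s | rpc_badcalls
```

Appending this summary to a log each week gives a compact record of how badcalls changes relative to calls.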

Some of the operations generate write activity on the disks. A sudden increase in these statistics could indicate trouble and should be investigated. For NFS Version 2 statistics, the operations to note are setattr, write, create, remove, rename, link, symlink, mkdir, and rmdir. For NFS Version 3 and NFS Version 4 statistics, the value to watch is commit. If the commit count is much higher on one NFS server than on another, almost identical server, check that the NFS clients have enough memory. The number of commit operations on the server grows when clients do not have available resources.
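Comparing commit counts between servers is easier when the number is extracted from the per-version tables. The following is a minimal sketch that assumes the alternating name row and value row layout shown in Example 10; the function name nfs_op_count is hypothetical, and a POSIX-style awk is assumed.

```shell
#!/bin/sh
# Read "nfsstat -s" output on stdin and print the count for one
# operation in one NFS version table.
# Hypothetical helper. Arguments: $1 = version number, $2 = operation.
nfs_op_count() {
    awk -v ver="$1" -v op="$2" '
        # Stop before the nfs_acl tables, which reuse "Version N:" headers.
        /^Server nfs_acl:/    { exit }
        /^Version [0-9]+:/    { active = ($2 == (ver ":")); expect_values = 0; next }
        /^Server / || NF == 0 { active = 0; next }
        !active               { next }
        # Rows alternate: operation names, then "count percent" pairs.
        !expect_values        { n = split($0, names, " "); expect_values = 1; next }
        {
            split($0, vals, " ")
            for (i = 1; i <= n; i++)
                if (names[i] == op) { print vals[2 * i - 1]; exit }
            expect_values = 0
        }
    '
}

# Usage:
#   nfsstat -s | nfs_op_count 3 commit
```

Running the same call on two similar servers yields two numbers that can be compared directly.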

`pstack` Command

The pstack command displays a stack trace for each process. The pstack command must be run by the owner of the process or by root. You can use the pstack command to determine where a process is hung. The only option that is allowed with this command is the process ID of the process that you want to check. For more information about the pstack command, see the proc(1) man page.

Example 11 Displaying Stack Trace for NFS Process

$ /usr/bin/pgrep nfsd
243
$ /usr/bin/pstack 243
243:    /usr/lib/nfs/nfsd -a 16
 ef675c04 poll     (24d50, 2, ffffffff)
 000115dc ???????? (24000, 132c4, 276d8, 1329c, 276d8, 0)
 00011390 main     (3, efffff14, 0, 0, ffffffff, 400) + 3c8
 00010fb0 _start   (0, 0, 0, 0, 0, 0) + 5c

The example shows that the process is waiting for a new connection request, which is a normal response. If the stack shows that the process is still in poll after a request is made, the process might be hung. For more information about fixing a hung process, see How to Restart NFS Service. For more information about troubleshooting NFS, see NFS Troubleshooting Procedures.
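When this check is repeated, it can help to reduce the trace to the name of the innermost function. The following is a minimal sketch that assumes the single-threaded trace layout shown in Example 11; the function name top_frame is hypothetical, and for a multithreaded process it reports only the first frame that appears in the output.

```shell
#!/bin/sh
# Read pstack output on stdin and print the function name of the
# first stack frame (the innermost call).
# Hypothetical helper; assumes frame lines start with a hex address.
top_frame() {
    awk '$1 ~ /^[0-9a-fA-F]+$/ && NF >= 2 { print $2; exit }'
}

# Usage (run as the process owner or as root):
#   /usr/bin/pstack 243 | top_frame
```

If the reported frame is still poll after a client request has been sent, that matches the hung-process symptom described above.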

`rpcinfo` Command

The rpcinfo command generates information about the RPC service that is running on a system. Use one of the following syntax forms to display information about the RPC service:

rpcinfo [-m|-s] [hostname]
rpcinfo [-T transport] [hostname] [progname]
rpcinfo [-t|-u] [hostname] [progname]

-m: Displays a table of statistics of the rpcbind operations
-s: Displays a concise list of all registered RPC programs
-T transport: Displays information about services that use specific transports or protocols
-t: Probes the RPC programs that use TCP
-u: Probes the RPC programs that use UDP
transport: Specifies the transport or protocol for the services
hostname: Specifies the host name of the server
progname: Specifies the name of the RPC program

For more information about the available options, see the rpcinfo(1M) man page.

If no value is given for hostname, the local host name is used. You can substitute the RPC program number for progname, but the name is more commonly used. You can use the -p option in place of the -s option on those systems that do not run the NFS Version 3 software.

The data that is generated by this command can include the following:

RPC program number
Version number for a specific program
Transport protocol in use
Name of the RPC service
Owner of the RPC service

Example 12 Displaying RPC Service Information

$ rpcinfo -s bee |sort -n
   program version(s) netid(s)                                    service     owner
    100000  2,3,4     udp6,tcp6,udp,tcp,ticlts,ticotsord,ticots   portmapper  superuser
    100001  4,3,2     udp6,udp,ticlts                             rstatd      superuser
    100003  4,3,2     tcp,udp,tcp6,udp6                           nfs         1
    100005  3,2,1     ticots,ticotsord,tcp,tcp6,ticlts,udp,udp6   mountd      superuser
    100007  1,2,3     ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6   ypbind      1
    100011  1         udp6,udp,ticlts                             rquotad     superuser
    100021  4,3,2,1   tcp,udp,tcp6,udp6                           nlockmgr    1
    100024  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6   status      superuser
    100068  5,4,3,2   ticlts                                        -         superuser
    100083  1         ticotsord                                     -         superuser
    100133  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6     -         superuser
    100134  1         ticotsord                                     -         superuser
    100155  1         ticotsord                                   smserverd   superuser
    100169  1         ticots,ticotsord,ticlts                       -         superuser
    100227  3,2       tcp,udp,tcp6,udp6                           nfs_acl     1
    100234  1         ticotsord                                     -         superuser
    390113  1         tcp                                           -         superuser
    390435  1         tcp                                           -         superuser
    390436  1         tcp                                           -         superuser
1073741824  1         tcp,tcp6                                      -         1

The example shows information about the RPC services that are running on a server. The output is piped through the sort -n command, which orders it numerically by program number to make the information more readable. Several lines that list RPC services have been deleted from the example.

You can gather information about a particular RPC service by selecting a particular transport on a server. The following example checks the mountd service that is running over TCP.

$ rpcinfo -t bee mountd
program 100005 Version 1 ready and waiting
program 100005 Version 2 ready and waiting
program 100005 Version 3 ready and waiting

The following example checks the NFS service that is running over UDP.

$ rpcinfo -u bee nfs
program 100003 Version 2 ready and waiting
program 100003 Version 3 ready and waiting
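Probe output like the two examples above can be reduced to a list of version numbers, which is convenient when checking that an expected protocol version is registered. The following is a minimal sketch; the function name ready_versions is hypothetical, and it assumes each line has the form "program N version V ready and waiting", matching the word "version" in either letter case.

```shell
#!/bin/sh
# Read "rpcinfo -t" or "rpcinfo -u" output on stdin and print the
# version numbers that are reported as ready, one per line.
# Hypothetical helper; assumes the line format shown in the examples.
ready_versions() {
    awk 'tolower($3) == "version" && /ready and waiting/ { print $4 }'
}

# Usage:
#   rpcinfo -t bee mountd | ready_versions
```

An empty result from this filter would suggest that the probe did not report any ready version for that transport.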

`snoop` Command

The snoop command monitors packets on the network and must be run as the root user. Using this command is a good way to verify that the network hardware is functioning on both the NFS client and the NFS server.

Use the following command syntax to monitor packets on the network:

snoop [-d device] [-o filename] [host hostname]

-d device: Specifies the local network interface
-o filename: Stores all the captured packets into the named file
hostname: Displays packets going to and from a specific host only

The -d device option is useful on servers that have multiple network interfaces. You can use many expressions other than setting the host. A combination of command expressions with grep can often generate data that is specific enough to be useful. For more information about the available options, see the snoop(1M) man page.

When troubleshooting, make sure that packets are going to and from the proper host. Also, look for error messages. Saving the packets to a file can simplify the review of the data.

`truss` Command

You can use the truss command to check whether a process is hung. The truss command must be run by the owner of the process or by root.

Use the following command syntax to check whether a process is hung:

truss [-t syscall] -p pid

-t syscall: Selects system calls to trace
-p pid: Indicates the PID of the process to be traced

syscall is a comma-separated list of system calls to be traced. Starting the list with an ! character excludes the listed system calls from the trace. For more information about the available options, see the truss(1) man page.

Example 13 Displaying Process Status

$ /usr/bin/truss -p 243
poll(0x00024D50, 2, -1)         (sleeping...)

The example shows that the process is waiting for another connection request, which is a normal response. If the response does not change after a new connection request has been made, the process could be hung.

For information about restarting the NFS service, see How to Restart NFS Service. For information about troubleshooting a hung process, see NFS Troubleshooting Procedures.

Managing Network File Systems in Oracle® Solaris 11.3