Managing Network File Systems in Oracle® Solaris 11.4

Updated: August 2021
 
 

Commands for Troubleshooting NFS Problems

This section describes the commands you can use for troubleshooting NFS problems.

nfsstat Command

This command displays statistical information about NFS and RPC connections. Use the following syntax to display NFS server and client statistics:

nfsstat [-cmnrsz]

-c

Displays client-side information

-m

Displays statistics for each NFS-mounted file system

-n

Displays the NFS information on both the client side and the server side

-r

Displays RPC statistics

-s

Displays the server-side information

-z

Specifies that the statistics should be set to zero

If no options are supplied, the -cnrs options are used.

Gathering server-side statistics can be important for debugging problems when new software or new hardware is added to the computing environment. Running this command at least once a week and storing the output provides a history of previous performance to compare against.
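
One way to keep such a history is to save a dated snapshot on each run. The following is a minimal sketch, not an Oracle-supplied tool: the function name save_nfsstat_snapshot, the NFSSTAT_CMD override, and the log directory are assumptions made for illustration.

```shell
# Hypothetical helper: write the output of "nfsstat -s" to a dated file.
# NFSSTAT_CMD is an assumed override (it defaults to nfsstat) so that the
# function can be exercised on a system that does not have NFS installed.
save_nfsstat_snapshot() {
    logdir=$1                          # directory that holds the history
    stamp=$(date +%Y%m%d)              # one file per day
    mkdir -p "$logdir" || return 1
    "${NFSSTAT_CMD:-nfsstat}" -s > "$logdir/nfsstat-s.$stamp"
}
```

For example, calling save_nfsstat_snapshot /var/tmp/nfsstat-history from a weekly scheduled job would leave one file per run, and any two files could then be compared with diff. The directory name here is only a placeholder.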

Example 10  Displaying NFS Server Statistics
# nfsstat -s

Server rpc: 
Connection oriented: 
calls   badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs 
11459   0          0          0          0          0          0 
Connectionless: 
calls   badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs 
0       0          0          0          0          0          0 

Server NFSv2: 
calls      badcalls   referrals  referlinks
0          0          0          0 

Server NFSv3: 
calls      badcalls   referrals  referlinks
0          0          0          0 

Server NFSv4: 
calls      badcalls   referrals  referlinks
11456      3          0          8 
Version 2: (0 calls)
null     getattr  setattr  root     lookup   readlink read     wrcache  
0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0% 
write    create   remove   rename   link     symlink  mkdir    rmdir
0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0%     0 0% 
readdir  statfs 
0 0%     0 0% 
Version 3: (0 calls)
null        getattr     setattr     lookup      access   readlink 
0 0%        0 0%        0 0%        0 0%        0 0%     0 0% 
read        write       create   mkdir       symlink     mknod 
0 0%        0 0%        0 0%     0 0%        0 0%        0 0% 
remove     rmdir       rename      link        readdir     readdirplus 
0 0%       0 0%        0 0%        0 0%        0 0%        0 0% 
fsstat      fsinfo      pathconf    commit 
0 0%        0 0%        0 0%        0 0% 
Version 4.0: (10617 calls)
null                compound 
775 7%              9842 92% 
Version 4.0: (42291 operations) 
reserved        access             close 
0 0%            498 1%             28 0% 
commit          create             delegpurge 
14 0%           270 0%             0 0% 
delegreturn     getattr            getfh 
0 0%            4709 11%           5841 13% 
link            lock               lockt 
22 0%           74 0%              25 0% 
locku           lookup             lookupp 
24 0%           10192 24%          9 0% 
nverify         open               openattr 
106 0%          1248 2%            0 0% 
open_confirm    open_downgrade     putfh 
1174 2%         14 0%              4328 10%
putpubfh        putrootfh            read 
2 0%            7607 17%             20 0%
readdir         readlink             remove 
92 0%           8 0%                 1472 3%
rename          renew                restorefh 
35 0%           9 0%                 9 0%
savefh          secinfo              setattr 
64 0%           6 0%                 1792 4%
setclientid     setclientid_confirm  verify 
1234 2%         1228 2%              106 0%
write           release_lockowner 
30 0%           1 0% 
Version 4.1: (827 calls)
null                 compound 
0 0%                 827 100% 
Version 4.1: (1561 operations) 
reserved              access               close 
0 0%                  0 0%                 10 0% 
commit                create               delegpurge
0 0%                  6 0%                 0 0% 
delegreturn           getattr              getfh 
7 0%                  3 0%                 63 4% 
link                  lock                 lockt 
0 0%                  2 0%                 0 0% 
locku                 lookup                lookupp 
2 0%                  262 16%               10 0% 
nverify               open                  openattr 
0 0%                  47 3%                 0 0% 
open_confirm          open_downgrade        putfh 
0 0%                  0 0%                  31 1% 
putpubfh              putrootfh            read 
0 0%                  150 9%               3 0% 
readdir               readlink             remove 
3 0%                  0 0%                 7 0% 
rename                renew                restorefh 
2 0%                  0 0%                 0 0% 
savefh                secinfo              setattr 
1 0%                  0 0%               8 0% 
setclientid           setclientid_confirm  verify 
0 0%                  0 0%                 1 0% 
write                 release_lockowner    backchannel_ctl 
6 0%                  0 0%                 1 0% 
bind_conn_to_session  exchange_id          create_session 
1 0%                  267 17%              223  14% 
destroy_session       free_stateid         get_dir_delegation 
17 1%                 1 0%                 0 0%
getdeviceinfo         getdevicelist        layoutcommit 
0 0%                  0 0%                 0 0%
layoutget             layoutreturn         secinfo_no_name 
0 0%                  0 0%                 3 0%
sequence              set_ssv              test_stateid 
321 20%               0 0%                 4 0%
want_delegation       destroy_clientid     reclaim_complete 
0 0%                  12 0%                87 5% 
illegal
0 0% 

Server nfs_acl: 
Version 2: (0 calls) 
null        getacl      setacl      getattr     access      getxattrdir 
0 0%        0 0%        0 0%        0 0%        0 0%        0 0% 
Version 3: (0 calls) 
null        getacl      setacl      getxattrdir 
0 0%        0 0%        0 0%        0 0% 

The example shows how to display the statistics for RPC and NFS activities. In both sets of statistics, knowing the typical number of calls and badcalls per week helps you recognize when a value is abnormal. The badcalls value reports the number of bad messages received from clients. A high or rising badcalls value can indicate network hardware problems.
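
To track the calls and badcalls values over time, it can help to extract them from the saved output. The following sketch assumes the layout shown in Example 10, where each section header is followed by a row of column names and then a row of values. The function name nfs_badcalls is hypothetical.

```shell
# Print "calls badcalls" for one section of "nfsstat -s" output read on stdin.
# $1 is the section header without the colon, for example "Server NFSv4".
nfs_badcalls() {
    awk -v section="$1" '
        index($0, section ":") == 1 { found = 1; next }   # section header line
        found && $1 == "calls" {                          # row of column names
            getline                                       # values are on the next row
            print $1, $2
            exit
        }
    '
}
```

With the output in Example 10 as input, nfsstat -s | nfs_badcalls "Server NFSv4" would print 11456 3.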

Some of the operations generate write activity on the disks. A sudden increase in the counts for these operations could indicate trouble and should be investigated. For NFS Version 2 statistics, the operations to note are setattr, write, create, remove, rename, link, symlink, mkdir, and rmdir. For NFS Version 3 and NFS Version 4 statistics, the value to watch is commit. If the commit count is high on one NFS server compared to another almost identical server, check that the NFS clients have enough memory. The number of commit operations on the server grows when clients do not have available resources. This example displays the statistics for both NFS Version 4.0 and NFS Version 4.1.
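
To compare the commit count between two servers, you could pull a single counter out of the per-version blocks. This is a sketch under the assumption that the output keeps the layout of Example 10, with a row of operation names followed by a row of count and percentage pairs. The function name nfs_op_count is hypothetical.

```shell
# Print the count for one operation in one "Version" block of
# "nfsstat -s" output read on stdin.
# $1 = version as printed by nfsstat (2, 3, 4.0, 4.1); $2 = operation name.
nfs_op_count() {
    awk -v ver="$1" -v op="$2" '
        /^(Version|Server) / {
            # Track whether the current block belongs to the requested version.
            in_ver = (index($0, "Version " ver ":") == 1)
            next
        }
        in_ver {
            for (i = 1; i <= NF; i++)
                if ($i == op) {
                    getline                 # counts are on the row below the names
                    print $(2 * i - 1)      # each name maps to a "count percent" pair
                    exit
                }
        }
    '
}
```

With the output in Example 10 as input, nfsstat -s | nfs_op_count 4.0 commit would print 14.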

pstack Command

The pstack command displays a stack trace for each specified process. The pstack command must be run by the owner of the process or by root. You can use the pstack command to determine where a process is hung. The only required argument is the ID of the process that you want to check or the name of a core file. For more information about the pstack command, see the proc(1) man page.

Example 11  Displaying Stack Trace for NFS Process
# /usr/bin/pgrep nfsd
243
# /usr/bin/pstack 243
243:    /usr/lib/nfs/nfsd -a 16
 ef675c04 poll     (24d50, 2, ffffffff)
 000115dc ???????? (24000, 132c4, 276d8, 1329c, 276d8, 0)
 00011390 main     (3, efffff14, 0, 0, ffffffff, 400) + 3c8
 00010fb0 _start   (0, 0, 0, 0, 0, 0) + 5c

The example shows that the process is waiting for a new connection request, which is a normal response. If the stack shows that the process is still in poll after a request is made, the process might be hung. For more information about fixing a hung process, see How to Restart NFS Service. For more information about troubleshooting NFS, see NFS Troubleshooting Procedures.
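
To compare the state of the process before and after a request, you could record the function in the top stack frame each time. The following sketch assumes the single-stack layout shown in Example 11, with a header line followed by one line per frame. The function name pstack_top_frame is hypothetical.

```shell
# Print the function name in the top stack frame of pstack output read on stdin.
# Assumes a "pid: command" header line followed by "address function (args)"
# lines. Separator lines that start with "-" are skipped.
pstack_top_frame() {
    awk 'NR > 1 && NF >= 2 && $1 !~ /^-/ { print $2; exit }'
}
```

With the output in Example 11 as input, pstack 243 | pstack_top_frame would print poll.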

rpcinfo Command

The rpcinfo command generates information about the RPC service that is running on a system. Use the following command syntaxes to display information about the RPC service:

rpcinfo [-m|-s] [hostname]
rpcinfo [-T transport] [hostname] [progname]
rpcinfo [-t|-u] [hostname] [progname]

-m

Displays a table of statistics of the rpcbind operations

-s

Displays a concise list of all registered RPC programs

-T

Displays information about services that use specific transports or protocols

-t

Probes the RPC programs that use TCP

-u

Probes the RPC programs that use UDP

transport

Specifies the transport or protocol for the services

hostname

Specifies the host name of the server

progname

Specifies the name of the RPC program

For more information about the available options, see the rpcinfo(8) man page.

If no value is given for hostname, the local host name is used. You can substitute the RPC program number for progname, but the name is more commonly used. You can use the -p option in place of the -s option on those systems that do not run the NFS Version 3 software.

The data that is generated by this command can include the following:

  • RPC program number

  • Version number for a specific program

  • Transport protocol in use

  • Name of the RPC service

  • Owner of the RPC service

Example 12  Displaying RPC Service Information
# rpcinfo -s bee |sort -n
   program version(s) netid(s)                                    service     owner
    100000  2,3,4     udp6,tcp6,udp,tcp,ticlts,ticotsord,ticots   portmapper  superuser
    100001  4,3,2     udp6,udp,ticlts                             rstatd      superuser
    100003  4,3,2     tcp,udp,tcp6,udp6                           nfs         1
    100005  3,2,1     ticots,ticotsord,tcp,tcp6,ticlts,udp,udp6   mountd      superuser
    100007  1,2,3     ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6   ypbind      1
    100011  1         udp6,udp,ticlts                             rquotad     superuser
    100021  4,3,2,1   tcp,udp,tcp6,udp6                           nlockmgr    1
    100024  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6   status      superuser
    100068  5,4,3,2   ticlts                                        -         superuser
    100083  1         ticotsord                                     -         superuser
    100133  1         ticots,ticotsord,ticlts,tcp,udp,tcp6,udp6     -         superuser
    100134  1         ticotsord                                     -         superuser
    100155  1         ticotsord                                   smserverd   superuser
    100169  1         ticots,ticotsord,ticlts                       -         superuser
    100227  3,2       tcp,udp,tcp6,udp6                           nfs_acl     1
    100234  1         ticotsord                                     -         superuser
    390113  1         tcp                                           -         superuser
    390435  1         tcp                                           -         superuser
    390436  1         tcp                                           -         superuser
1073741824  1         tcp,tcp6                                      -         1

The example shows information about the RPC services that are running on a server. The output of the command is piped through sort -n to order the entries by program number, which makes the information more readable. Several lines that list RPC services have been deleted from the example.
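
When you are interested in only one service, you could select its entry from the listing. The following sketch assumes the five-column layout shown in Example 12. The function name rpc_service_entry is hypothetical.

```shell
# Print "program versions netids" for one service from "rpcinfo -s"
# output read on stdin. $1 is the service name, for example nfs.
rpc_service_entry() {
    awk -v svc="$1" '$4 == svc { print $1, $2, $3 }'
}
```

With the output in Example 12 as input, rpcinfo -s bee | rpc_service_entry nfs would print 100003 4,3,2 tcp,udp,tcp6,udp6.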

You can check whether a particular RPC service responds over a particular transport on a server. The following example checks the mountd service that is running over TCP.

# rpcinfo -t bee mountd
program 100005 Version 1 ready and waiting
program 100005 Version 2 ready and waiting
program 100005 Version 3 ready and waiting

The following example checks the NFS service that is running over UDP.

# rpcinfo -u bee nfs
program 100003 Version 2 ready and waiting
program 100003 Version 3 ready and waiting
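
In a script, you might want to test whether a specific version of a service answered the probe. The following sketch assumes the "ready and waiting" message format shown in the examples above. The function name rpc_version_ready is hypothetical.

```shell
# Succeed (exit status 0) if the rpcinfo probe output on stdin reports
# the given version as ready. $1 is the version number to look for.
rpc_version_ready() {
    awk -v want="$1" '
        /ready and waiting/ && $4 == want { ok = 1 }
        END { exit !ok }
    '
}
```

For example, rpcinfo -u bee nfs | rpc_version_ready 3 would succeed for the output shown above, and the same test for version 4 would fail.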

snoop Command

The snoop command monitors packets on the network. The snoop command must be run as the root user. Using this command is a good way to verify that the network hardware is functioning on both the client and the server.

Use the following command syntax to monitor packets on the network:

snoop [-d device] [-o filename] [host hostname]

-d device

Specifies the local network interface

-o filename

Stores all the captured packets into the named file

hostname

Displays packets going to and from a specific host only

The -d device option is useful on servers that have multiple network interfaces. You can use many expressions other than setting the host. A combination of command expressions with grep can often generate data that is specific enough to be useful. For more information about the available options, see the snoop(8) man page.

When troubleshooting, make sure that packets are going to and from the proper host. Also, look for error messages. Saving the packets to a file can simplify the review of the data.
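
The syntax above can be wrapped in a small function so that captures are always taken the same way. This is only a sketch: the function name capture_host_packets and the SNOOP_CMD override are assumptions, and the interface name, file name, and host name in the usage note are placeholders.

```shell
# Hypothetical wrapper around the snoop syntax shown above.
# $1 = network interface, $2 = capture file, $3 = host to watch.
# SNOOP_CMD is an assumed override (it defaults to snoop) so that the
# function can be exercised without root privileges.
capture_host_packets() {
    device=$1
    capfile=$2
    peer=$3
    "${SNOOP_CMD:-snoop}" -d "$device" -o "$capfile" host "$peer"
}
```

For example, capture_host_packets net0 /var/tmp/nfs.cap bee would run snoop -d net0 -o /var/tmp/nfs.cap host bee. The saved file can later be displayed with snoop -i.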

truss Command

You can use the truss command to check if a process is hung. The truss command must be run by the owner of the process or by root.

Use the following command syntax to check if a process is hung:

# truss [ -t syscall ] -p pid

-t syscall

Selects system calls to trace

-p pid

Indicates the PID of the process to be traced

syscall is a comma-separated list of system calls to be traced. Starting the list with an ! character excludes the listed system calls from the trace.

Example 13  Displaying Process Status
# /usr/bin/truss -p 243
poll(0x00024D50, 2, -1)         (sleeping...)

The example shows that the process is waiting for another connection request, which is a normal response. If the response does not change after a new connection request has been made, the process could be hung.
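
To check the same condition from a saved trace, you could test whether the last line of the trace still shows a sleeping system call. The following sketch assumes the line format shown in Example 13 and that the trace was saved to a file, for example with the -o option of truss. The function name truss_sleeping_call is hypothetical.

```shell
# Print the name of the system call if the last non-empty line of truss
# output on stdin ends with "(sleeping...)". Print nothing otherwise.
truss_sleeping_call() {
    awk '
        NF { last = $0 }                       # remember the last non-empty line
        END {
            if (last ~ /\(sleeping\.\.\.\)[ \t]*$/) {
                sub(/\(.*/, "", last)          # keep only the text before the first "("
                print last
            }
        }
    '
}
```

With the output in Example 13 as input, the function would print poll.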

For information about restarting the NFS service, see How to Restart NFS Service. For information about troubleshooting a hung process, see NFS Troubleshooting Procedures.

You can also use the following options with the truss command to display timing information for each system call that the process makes.

-d

Represents time in seconds relative to the beginning of the trace

-A

Prints current time of day at the end of a system call

-D

Represents the elapsed time for the LWP that incurred an event, since the last reported event incurred by that LWP

-E

Represents the difference in time elapsed between the beginning and end of a system call

The options are used with the following command syntax:

# truss [ -dADE ] { command | -p pid }

Example 14  Displaying List of System Calls with the Time Stamps
# /usr/bin/truss -dDE ls
Base time stamp:  1467139321.491964  [ Tue Jun 28 11:42:01 PDT 2016 ]
 0.000000     0.000000     0.000000    execve("/usr/xpg6/bin/ls", 0xFFFF80E3E89E6698, 0xFFFF80E3E89E66A8)  argc = 1
 0.001893     0.001893     0.000042    sysinfo(SI_MACHINE, "i86pc", 257)                = 6

The example shows the system calls made by the ls program. The first line of the output gives the base time stamp, which is the time at which the trace started. The first column shows the time relative to the base time stamp. The second column shows the elapsed time for the LWP that incurred the event since the last reported event incurred by that LWP. The third column shows the amount of time spent within that system call.
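
The third column is often the most useful when you are looking for slow system calls. The following sketch lists the calls whose time in the call exceeds a threshold. It assumes the three-column layout produced by the -dDE options in Example 14, without an LWP ID column. The function name truss_slow_calls is hypothetical.

```shell
# List system calls whose time in the call (third column of "truss -dDE"
# output) exceeds a threshold in seconds. Reads the trace on stdin.
# $1 = threshold in seconds, for example 0.00001.
truss_slow_calls() {
    awk -v limit="$1" '
        $1 ~ /^[0-9.]+$/ && $3 + 0 > limit + 0 {   # skip the base time stamp line
            name = $4
            sub(/\(.*/, "", name)                  # strip the argument list
            print name, $3
        }
    '
}
```

With the output in Example 14 as input and a threshold of 0.00001, the function would print sysinfo 0.000042.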

For more information about the available options, see the truss(1) man page.