The overall tuning process must include client tuning. Sometimes, tuning the client yields more improvement than fixing the server. For example, adding 4 Mbytes of memory to each of 100 clients dramatically decreases the load on an NFS server.
Check the client statistics for NFS problems by typing nfsstat -c at the % prompt.
Look for errors and retransmits.
    client % nfsstat -c
    Client rpc:
    calls      badcalls   retrans    badxids    timeouts   waits      newcreds
    384687     1          52         7          52         0          0
    badverfs   timers     toobig     nomem      cantsend   bufulocks
    0          384        0          0          0          0
    Client nfs:
    calls      badcalls   clgets     cltoomany
    379496     0          379558     0
    Version 2: (379599 calls)
    null       getattr    setattr    root       lookup     readlink   read
    0 0%       178150 46% 614 0%     0 0%       39852 10%  28 0%      89617 23%
    wrcache    write      create     remove     rename     link       symlink
    0 0%       56078 14%  1183 0%    1175 0%    71 0%      51 0%      0 0%
    mkdir      rmdir      readdir    statfs
    49 0%      0 0%       987 0%     11744 3%
The output of the nfsstat -c command shows only 52 retransmits (retrans) and 52 time-outs (timeouts) out of 384687 calls.
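To put those counts in proportion, the retransmission rate can be worked out directly from the sample output. The short Python sketch below does that arithmetic. The variable and function names are illustrative only, and the 5% figure is the retrans threshold given in Table 3-10.

```python
# Counters copied from the sample "Client rpc:" output in this section.
calls = 384687
retrans = 52
timeouts = 52


def percent_of_calls(count, total_calls):
    """Return count as a percentage of total_calls (0.0 if there were no calls)."""
    return 100.0 * count / total_calls if total_calls else 0.0


retrans_pct = percent_of_calls(retrans, calls)
timeout_pct = percent_of_calls(timeouts, calls)

print(f"retrans:  {retrans_pct:.4f}% of calls")
print(f"timeouts: {timeout_pct:.4f}% of calls")
# Table 3-10 treats retrans above 5% of calls as a sign of trouble.
print("retrans above 5% threshold:", retrans_pct > 5.0)
```

For the sample output, both rates come to roughly 0.014% of calls, far below the 5% threshold.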
The nfsstat -c display contains the following fields:
Table 3-9 Output of the nfsstat -c Command
Field | Description |
---|---|
calls | Total number of calls sent |
badcalls | Total number of calls rejected by RPC |
retrans | Total number of retransmissions |
badxid | Number of times that a duplicate acknowledgment was received for a single NFS request |
timeout | Number of calls that timed out |
wait | Number of times a call had to wait because no client handle was available |
newcred | Number of times the authentication information had to be refreshed |
Table 3-9, shown earlier in this chapter, describes the NFS operations. Table 3-10 explains the output of the nfsstat -c command and what action to take.
Table 3-10 Description of the nfsstat -c Command Output
If | Then |
---|---|
retrans > 5% of the calls | The requests are not reaching the server. |
badxid is approximately equal to badcalls | The network is slow. Consider installing a faster network or installing subnets. |
badxid is approximately equal to timeouts | Most requests are reaching the server, but the server is slower than expected. Watch expected times using nfsstat -m. |
badxid is close to 0 | The network is dropping requests. Reduce rsize and wsize in the mount options. |
null > 0 | A large number of null calls suggests that the automounter is retrying the mount frequently. The timeout values for the mount are too short. Increase the mount timeout parameter, timeo, on the automounter command line. |
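The rules in Table 3-10 can be expressed as a small checking function, sketched below in Python. This is only an illustration under stated assumptions. The function name and its parameters are hypothetical. The table gives no numbers for "approximately equal" or "close to 0", so the `rel_tol` and `near_zero_frac` cut-offs are invented defaults. The sketch also assumes that the badxid comparisons are only consulted once retransmissions exceed 5% of calls, which the table does not state explicitly.

```python
import math


def diagnose_client_rpc(calls, retrans, badxid, badcalls, timeouts,
                        null_calls=0, rel_tol=0.2, near_zero_frac=0.05):
    """Apply the Table 3-10 rules to counters read from nfsstat -c.

    rel_tol and near_zero_frac are hypothetical cut-offs for
    "approximately equal" and "close to 0"; the table gives no numbers.
    """
    findings = []
    if calls > 0 and 100.0 * retrans / calls > 5.0:
        findings.append("retrans > 5% of calls: requests are not reaching the server")
        # Assumption: the badxid comparisons matter only when retrans is high.
        if math.isclose(badxid, badcalls, rel_tol=rel_tol):
            findings.append("badxid ~ badcalls: the network is slow")
        if math.isclose(badxid, timeouts, rel_tol=rel_tol):
            findings.append("badxid ~ timeouts: the server is slower than expected")
        if badxid <= near_zero_frac * timeouts:
            findings.append("badxid close to 0: the network is dropping requests")
    if null_calls > 0:
        findings.append("null > 0: the automounter may be retrying mounts")
    return findings


# Counters from the sample output: retrans is far below 5%, so nothing is flagged.
print(diagnose_client_rpc(calls=384687, retrans=52, badxid=7,
                          badcalls=1, timeouts=52))

# Made-up counters where badxid tracks timeouts, which points at a slow server.
for finding in diagnose_client_rpc(calls=1000, retrans=100, badxid=95,
                                   badcalls=1, timeouts=100):
    print(finding)
```

More than one rule can match the same counters, so the function returns every matching finding rather than choosing one.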
The third-party tools you can use for NFS and networks include:
NetMetrix (Hewlett-Packard)
SharpShooter (Network General)
Display statistics for each NFS mounted file system by typing nfsstat -m.
The statistics include the server name and address, mount flags, current read and write sizes, transmission count, and the timers used for dynamic transmission.
    client % nfsstat -m
    /export/home from server:/export/home
     Flags:   vers=2,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
     Lookups: srtt=10 (25ms), dev=4 (20ms), cur=3 (60ms)
     Reads:   srtt=9 (22ms), dev=7 (35ms), cur=4 (80ms)
     Writes:  srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)
     All:     srtt=11 (27ms), dev=4 (20ms), cur=3 (60ms)
The following terms appear in the output of the nfsstat -m command:
Table 3-11 Description of the Output of the nfsstat -m Command
Term | Description |
---|---|
srtt | Smoothed round-trip time |
dev | Estimated deviation |
cur | Current backed-off timeout value |
The numbers in parentheses in the nfsstat -m output are the actual times in milliseconds. The other values are unscaled values kept by the operating system kernel, and you can ignore them. Response times are shown for lookups, reads, writes, and a combination of all of these operations (All). Table 3-12 lists the action to take for each nfsstat -m result.
Table 3-12 Results of the nfsstat -m Command
If | Then |
---|---|
srtt > 50 ms | That mount point is slow. Check the network and the server for the disk(s) that provide that mount point. See "To Check the Network" and "To Check the NFS Server" earlier in this chapter. |
The message "NFS server not responding" is displayed | Try increasing the timeo parameter in the /etc/vfstab file to eliminate the messages and improve performance. Doubling the initial timeo parameter value is a good baseline. After changing the timeo value in the vfstab file, invoke the nfsstat -c command and observe the badxid value returned by the command. Follow the recommendations for the nfsstat -c command earlier in this section. |
Lookups: cur > 80 ms | The requests are taking too long to process. This indicates a slow network or a slow server. |
Reads: cur > 150 ms | The requests are taking too long to process. This indicates a slow network or a slow server. |
Writes: cur > 250 ms | The requests are taking too long to process. This indicates a slow network or a slow server. |
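The timer thresholds in Table 3-12 lend themselves to an automated check. The Python sketch below pulls the millisecond values out of nfsstat -m text and compares them with those thresholds. It assumes the output has exactly the layout of the sample in this section and covers a single mounted file system. The names `parse_timers` and `check_timers` are hypothetical, and applying the srtt limit to every row (including All) is an assumption, since the table does not say which row it refers to.

```python
import re

# Thresholds from Table 3-12, in milliseconds.
SRTT_LIMIT_MS = 50
CUR_LIMITS_MS = {"Lookups": 80, "Reads": 150, "Writes": 250}

# Matches e.g. "Reads: srtt=9 (22ms), dev=7 (35ms), cur=4 (80ms)" and keeps
# only the millisecond values; the unscaled kernel values are skipped.
TIMER_LINE = re.compile(
    r"(Lookups|Reads|Writes|All):\s*"
    r"srtt=\d+ \((\d+)ms\),\s*dev=\d+ \((\d+)ms\),\s*cur=\d+ \((\d+)ms\)"
)


def parse_timers(text):
    """Return {operation: {"srtt": ms, "dev": ms, "cur": ms}} for one mount."""
    timers = {}
    for op, srtt, dev, cur in TIMER_LINE.findall(text):
        timers[op] = {"srtt": int(srtt), "dev": int(dev), "cur": int(cur)}
    return timers


def check_timers(timers):
    """Return a list of Table 3-12 conditions that the timers exceed."""
    findings = []
    for op, t in timers.items():
        if t["srtt"] > SRTT_LIMIT_MS:
            findings.append(f"{op}: srtt {t['srtt']} ms > {SRTT_LIMIT_MS} ms")
        limit = CUR_LIMITS_MS.get(op)  # no cur limit is given for "All"
        if limit is not None and t["cur"] > limit:
            findings.append(f"{op}: cur {t['cur']} ms > {limit} ms")
    return findings


# The sample nfsstat -m output from this section.
SAMPLE = """\
/export/home from server:/export/home
 Flags: vers=2,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
 Lookups: srtt=10 (25ms), dev=4 (20ms), cur=3 (60ms)
 Reads: srtt=9 (22ms), dev=7 (35ms), cur=4 (80ms)
 Writes: srtt=7 (17ms), dev=3 (15ms), cur=2 (40ms)
 All: srtt=11 (27ms), dev=4 (20ms), cur=3 (60ms)
"""

timers = parse_timers(SAMPLE)
print(timers["Reads"])
print(check_timers(timers))
```

For the sample mount, every timer is within its limit, so the list of findings is empty.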