The server used in the following examples is a large SPARCserver 690 configuration.
Determine what is being exported by typing share.
server% share
-               /export/home   rw=netgroup   ""
-               /var/mail      rw=netgroup   ""
-               /cdrom/solaris_2_3_ab   ro   ""
Display the file systems mounted and the disk drive on which the file system is mounted by typing df -k.
If a file system is over 100 percent full, it may cause NFS write errors on the clients.
server% df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c1t0d0s0      73097   36739   29058    56%    /
/dev/dsk/c1t0d0s3     214638  159948   33230    83%    /usr
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
swap                  501684      32  501652     0%    /tmp
/dev/dsk/c1t0d0s4     582128  302556  267930    53%    /var/mail
/dev/md/dsk/d100     7299223  687386  279377    96%    /export/home
/vol/dev/dsk/c0t6/solaris_2_3_ab
                      113512  113514       0   100%    /cdrom/solaris_2_3_ab
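Since a nearly full file system can trigger NFS write errors on clients, it can help to filter the df -k output for file systems close to capacity. The following one-liner is an illustrative convenience, not part of the procedure; the 90 percent threshold is an arbitrary choice for this sketch.

```shell
# Flag file systems over 90% of capacity (threshold is illustrative).
# In df -k output, field 5 is the capacity column and field 6 the mount point.
df -k | awk 'NR > 1 { sub(/%/, "", $5); if ($5 + 0 > 90) print $6, $5 "%" }'
```

Run against output like the example above, this would report /export/home at 96%, for instance.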
For this example, the /var/mail and /export/home file systems are used.
Determine on which disk the file systems returned by the df -k command are stored.
In the previous example, note that /var/mail is stored on /dev/dsk/c1t0d0s4 and /export/home is stored on /dev/md/dsk/d100, an Online: DiskSuite metadisk.
If an Online: DiskSuite metadisk is returned by the df -k command, determine the disk number by typing /usr/opt/SUNWmd/sbin/metastat disknumber.
In the previous example, /usr/opt/SUNWmd/sbin/metastat d100 determines what physical disk /dev/md/dsk/d100 uses.
The d100 disk is a mirrored disk. Each mirror is made up of three striped disks of one size concatenated with four striped disks of another size. There is also a hot spare disk. This system uses IPI disks (idX). SCSI disks (sdX) are treated identically.
server% /usr/opt/SUNWmd/sbin/metastat d100
d100: metamirror
    Submirror 0: d10
        State: Okay
    Submirror 1: d20
        State: Okay
    Regions which are dirty: 0%

d10: Submirror of d100
    State: Okay
    Hot spare pool: hsp001
    Size: 15536742 blocks
    Stripe 0: (interlace : 96 blocks)
        Device              Start Block  Dbase  State  Hot Spare
        /dev/dsk/c1t1d0s7             0  No     Okay
        /dev/dsk/c2t2d0s7             0  No     Okay
        /dev/dsk/c1t3d0s7             0  No     Okay
    Stripe 1: (interlace : 64 blocks)
        Device              Start Block  Dbase  State  Hot Spare
        /dev/dsk/c3t1d0s7             0  No     Okay
        /dev/dsk/c4t2d0s7             0  No     Okay
        /dev/dsk/c3t3d0s7             0  No     Okay
        /dev/dsk/c4t4d0s7             0  No     Okay

d20: Submirror of d100
    State: Okay
    Hot spare pool: hsp001
    Size: 15536742 blocks
    Stripe 0: (interlace : 96 blocks)
        Device              Start Block  Dbase  State  Hot Spare
        /dev/dsk/c2t1d0s7             0  No     Okay
        /dev/dsk/c1t2d0s7             0  No     Okay
        /dev/dsk/c2t3d0s7             0  No     Okay
    Stripe 1: (interlace : 64 blocks)
        Device              Start Block  Dbase  State  Hot Spare
        /dev/dsk/c4t1d0s7             0  No     Okay
        /dev/dsk/c3t2d0s7             0  No     Okay
        /dev/dsk/c4t3d0s7             0  No     Okay
        /dev/dsk/c3t4d0s7             0  No     Okay
/dev/dsk/c2t4d0s7
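With this many component drives, picking the /dev/dsk entries out of the metastat display by hand is tedious. The following one-liner is a convenience sketch, not part of the guide; it assumes the component devices appear in the first column of the output, as in the display above.

```shell
# List only the component drives of the metadisk (column layout as above).
/usr/opt/SUNWmd/sbin/metastat d100 | awk '/\/dev\/dsk/ { print $1 }'
```

Each line of this output can then be fed to the whatdev script described in the next step.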
Determine the /dev/dsk entries for each exported file system. Use either the whatdev script to find the instance or nickname for the drive or type ls -lL /dev/dsk/c1t0d0s4 and more /etc/path_to_inst to find the /dev/dsk entries. An explanation of these steps follows.
To determine the /dev/dsk entries for exported file systems with the whatdev script, follow these steps:
Type the following whatdev script using a text editor.
#!/bin/csh
# print out the drive name - st0 or sd0 - given the /dev entry
# first get something like "/iommu/.../.../sd@0,0"
set dev = `/bin/ls -l $1 | nawk '{ n = split($11, a, "/"); split(a[n],b,":"); for(i = 4; i < n; i++) printf("/%s",a[i]); printf("/%s\n", b[1]) }'`
if ( $dev == "" ) exit
# then get the instance number and concatenate with the "sd"
nawk -v dev=$dev '$1 ~ dev { n = split(dev, a, "/"); split(a[n], \
b, "@"); printf("%s%s\n", b[1], $2) }' /etc/path_to_inst
Determine the /dev/dsk entry for the file system by typing df /filesystemname.
In this example you would type df /var/mail.
server% df /var/mail
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c1t0d0s4     582128  302556  267930    53%    /var/mail
Determine the disk number by typing whatdev diskname (the disk name returned by the df /filesystemname command).
In this example you would type whatdev /dev/dsk/c1t0d0s4. Disk number id8 is returned, which is IPI disk 8.
server% whatdev /dev/dsk/c1t0d0s4
id8
Repeat steps b and c for each file system not stored on a metadisk (/dev/md/dsk).
If the file system is stored on a metadisk (/dev/md/dsk), look at the metastat output and run the whatdev script on each drive included in the metadisk.
In this example type whatdev /dev/dsk/c2t1d0s7.
There are 14 disks in the /export/home file system. Running the whatdev script on the /dev/dsk/c2t1d0s7 disk, one of the 14 disks comprising the /export/home file system, returns the following display.
server% whatdev /dev/dsk/c2t1d0s7
id17
Note that /dev/dsk/c2t1d0s7 is disk id17; this is IPI disk 17.
Go to Step 7.
If you did not use the whatdev script, identify the /dev/dsk entries for exported file systems with ls -lL. Follow these steps:
List the drive and its major and minor device numbers by typing ls -lL disknumber.
For example, for the /var/mail file system, type: ls -lL /dev/dsk/c1t0d0s4.
ls -lL /dev/dsk/c1t0d0s4
brw-r-----   1 root     66, 68 Dec 22 21:51 /dev/dsk/c1t0d0s4
Locate the minor device number in the ls -lL output.
In the previous screen example, the first number following the file ownership (root), 66, is the major number. The second number, 68, is the minor device number.
Determine the disk number.
Divide the minor device number, 68 in the previous example, by 8 (68/8 = 8.5).
Truncate the fraction. The number 8 is the disk number.
Determine the slice (partition) number.
Look at the number following the s (for slice) in the disk number. For example, in /dev/dsk/c1t0d0s4, the 4 following the s refers to slice 4.
Now you know that the disk number is 8 and the slice number is 4. This disk is either sd8 (SCSI) or ip8 (IPI).
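The arithmetic in this step can be scripted. The fragment below is a small illustrative sketch, not part of the guide; it assumes eight slices per disk, so the minor number encodes disk * 8 + slice. The guide reads the slice from the sN suffix instead, but the remainder recovers the same value here.

```shell
#!/bin/sh
# Derive disk and slice numbers from a minor device number (8 slices/disk).
minor=68
disk=`expr $minor / 8`     # integer division truncates the fraction: 8
slice=`expr $minor % 8`    # the remainder is the slice number: 4
echo "disk $disk slice $slice"
```

For the /var/mail example above this prints disk 8 slice 4, matching the sd8/ip8 and slice 4 conclusion.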
View the disk statistics for each disk by typing iostat -x 15. The -x option supplies extended disk statistics. The 15 means disk statistics are gathered every 15 seconds.
server% iostat -x 15
                          extended disk statistics
disk      r/s  w/s   Kr/s  Kw/s  wait  actv  svc_t  %w  %b
id10      0.1  0.2    0.4   1.0   0.0   0.0   24.1   0   1
id11      0.1  0.2    0.4   0.9   0.0   0.0   24.5   0   1
id17      0.1  0.2    0.4   1.0   0.0   0.0   31.1   0   1
id18      0.1  0.2    0.4   1.0   0.0   0.0   24.6   0   1
id19      0.1  0.2    0.4   0.9   0.0   0.0   24.8   0   1
id20      0.0  0.0    0.1   0.3   0.0   0.0   25.4   0   0
id25      0.0  0.0    0.1   0.2   0.0   0.0   31.0   0   0
id26      0.0  0.0    0.1   0.2   0.0   0.0   30.9   0   0
id27      0.0  0.0    0.1   0.3   0.0   0.0   31.6   0   0
id28      0.0  0.0    0.0   0.0   0.0   0.0    5.1   0   0
id33      0.0  0.0    0.1   0.2   0.0   0.0   36.1   0   0
id34      0.0  0.2    0.1   0.3   0.0   0.0   25.3   0   1
id35      0.0  0.2    0.1   0.4   0.0   0.0   26.5   0   1
id36      0.0  0.0    0.1   0.3   0.0   0.0   35.6   0   0
id8       0.0  0.1    0.2   0.7   0.0   0.0   47.8   0   0
id9       0.1  0.2    0.4   1.0   0.0   0.0   24.8   0   1
sd15      0.1  0.1    0.3   0.5   0.0   0.0   84.4   0   0
sd16      0.1  0.1    0.3   0.5   0.0   0.0   93.0   0   0
sd17      0.1  0.1    0.3   0.5   0.0   0.0   79.7   0   0
sd18      0.1  0.1    0.3   0.5   0.0   0.0   95.3   0   0
sd6       0.0  0.0    0.0   0.0   0.0   0.0  109.1   0   0
The iostat -x 15 output identifies each disk by number. In the next procedure you will use a sed script to translate the disk numbers into file system names.
The output for the extended disk statistics is:
Table 3-3 Output of the iostat -x 15 Command (Extended Disk Statistics)

r/s      Reads per second
w/s      Writes per second
Kr/s     Kbytes read per second
Kw/s     Kbytes written per second
wait     Average number of transactions waiting for service (queue length)
actv     Average number of transactions actively being serviced
svc_t    Average service time, in milliseconds
%w       Percentage of time the queue is not empty
%b       Percentage of time the disk is busy
Translate disk numbers into file system names.
Use iostat and sar. One quick way to do this is to use a sed script.
Type a sed script using a text editor similar to the following d2fs.server sed script.
Your sed script should substitute the file system name for the disk number.
In this example, disk id8 is substituted for /var/mail and disks id9, id10, id11, id17, id18, id19, id25, id26, id27, id28, id33, id34, id35, and id36 are substituted for /export/home.
#!/bin/sh
# d2fs.server - substitute file system names for disk numbers
sed 's/id8 /var\/mail /
s/id9 /export\/home /
s/id10 /export\/home /
s/id11 /export\/home /
s/id17 /export\/home /
s/id18 /export\/home /
s/id19 /export\/home /
s/id25 /export\/home /
s/id26 /export\/home /
s/id27 /export\/home /
s/id28 /export\/home /
s/id33 /export\/home /
s/id34 /export\/home /
s/id35 /export\/home /
s/id36 /export\/home /'
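Assuming the sed commands are saved in a file named d2fs.server and made executable (chmod +x d2fs.server), one substitution can be spot-checked with a quick echo test. Because the replacement text contains slashes, each slash in the file system name must be escaped as \/ inside the s/// command:

```shell
# Verify the id8 -> var/mail substitution on a sample iostat line.
echo 'id8       0.0  0.1' | sed 's/id8 /var\/mail /'
```

This prints the sample line with var/mail in place of id8.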
Run the iostat -xc 15 command through the sed script by typing iostat -xc 15 | d2fs.server.
The options to the previous iostat -xc 15 | d2fs.server command are explained below.
Table 3-4 Options to the iostat -xc 15 | d2fs.server Command

-x    Supplies extended disk statistics
-c    Reports the percentage of time the system was in user mode (us), system mode (sy), waiting for I/O (wt), and idling (id)
15    Means disk statistics are gathered every 15 seconds
The following explains the output and headings of the iostat -xc 15 | d2fs.server command.
% iostat -xc 15 | d2fs.server
                          extended disk statistics                cpu
disk         r/s  w/s  Kr/s  Kw/s  wait  actv  svc_t  %w  %b   us sy wt id
export/home  0.1  0.2   0.4   1.0   0.0   0.0   24.1   0   1    0 11  2 86
export/home  0.1  0.2   0.4   0.9   0.0   0.0   24.5   0   1
export/home  0.1  0.2   0.4   1.0   0.0   0.0   31.1   0   1
export/home  0.1  0.2   0.4   1.0   0.0   0.0   24.6   0   1
export/home  0.1  0.2   0.4   0.9   0.0   0.0   24.8   0   1
id20         0.0  0.0   0.1   0.3   0.0   0.0   25.4   0   0
export/home  0.0  0.0   0.1   0.2   0.0   0.0   31.0   0   0
export/home  0.0  0.0   0.1   0.2   0.0   0.0   30.9   0   0
export/home  0.0  0.0   0.1   0.3   0.0   0.0   31.6   0   0
export/home  0.0  0.0   0.0   0.0   0.0   0.0    5.1   0   0
export/home  0.0  0.0   0.1   0.2   0.0   0.0   36.1   0   0
export/home  0.0  0.2   0.1   0.3   0.0   0.0   25.3   0   1
export/home  0.0  0.2   0.1   0.4   0.0   0.0   26.5   0   1
export/home  0.0  0.0   0.1   0.3   0.0   0.0   35.6   0   0
var/mail     0.0  0.1   0.2   0.7   0.0   0.0   47.8   0   0
id9          0.1  0.2   0.4   1.0   0.0   0.0   24.8   0   1
sd15         0.1  0.1   0.3   0.5   0.0   0.0   84.4   0   0
sd16         0.1  0.1   0.3   0.5   0.0   0.0   93.0   0   0
sd17         0.1  0.1   0.3   0.5   0.0   0.0   79.7   0   0
sd18         0.1  0.1   0.3   0.5   0.0   0.0   95.3   0   0
sd6          0.0  0.0   0.0   0.0   0.0   0.0  109.1   0   0
The following is a description of the output for the iostat -xc 15 | d2fs.server command.
Table 3-5 Output for the iostat -xc 15 Command

disk     Name of the disk device
r/s      Average read operations per second
w/s      Average write operations per second
Kr/s     Average Kbytes read per second
Kw/s     Average Kbytes written per second
wait     Number of requests outstanding in the device driver queue
actv     Number of requests active in the disk hardware queue
%w       Occupancy of the wait queue
%b       Occupancy of the active queue (device busy)
svc_t    Average service time in milliseconds for a complete disk request; includes wait time, active queue time, seek, rotation, and transfer latency
us       User mode CPU time
sy       System mode CPU time
wt       Wait for I/O time
id       Idle time
Run the sar -d 15 1000 command through the sed script by typing sar -d 15 1000 | d2fs.server.
server% sar -d 15 1000 | d2fs.server
12:44:17   device        %busy   avque   r+w/s   blks/s  avwait  avserv
12:44:18   export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           id20              0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           var/mail          0     0.0       0        0     0.0     0.0
           export/home       0     0.0       0        0     0.0     0.0
           sd15              7     0.1       4      127     0.0    17.6
           sd16              6     0.1       3      174     0.0    21.6
           sd17              5     0.0       3      127     0.0    15.5
The sar -d option reports the activities of the disk devices. The 15 means that data is collected every 15 seconds. The 1000 means that data is collected 1000 times. The following terms and abbreviations explain the output.
Table 3-6 Output of the sar -d 15 1000 | d2fs.server Command

device   Name of the disk device being monitored
%busy    Percentage of time the device spent servicing a transfer request (same as iostat %b)
avque    Average number of requests outstanding during the monitored period, measured only when the queue was occupied (same as iostat actv)
r+w/s    Number of read and write transfers to the device, per second (same as iostat r/s + w/s)
blks/s   Number of 512-byte blocks transferred to the device, per second (same as iostat 2 * (Kr/s + Kw/s))
avwait   Average time, in milliseconds, that transfer requests wait in the queue, measured only when the queue is occupied (iostat wait gives the length of this queue)
avserv   Average time, in milliseconds, for a transfer request to be completed by the device; for disks this includes seek, rotational latency, and data transfer times
For file systems that are exported via NFS, check the %b/%busy value.
If it is more than 30 percent, check the svc_t value.
The %b value, the percentage of time the disk is busy, is returned by the iostat command. The %busy value, the percentage of time the device spent servicing a transfer request, is returned by the sar command. If the %b and %busy values are greater than 30 percent, go to Step e. Otherwise, go to Step 9.
Calculate the svc_t/(avserv + avwait) value.
The svc_t value, the average service time in milliseconds, is returned by the iostat command. The avserv value, the average time (milliseconds) for a transfer request to be completed by the device, is returned by the sar command. Add the avwait to get the same measure as svc_t.
If the svc_t value, the average total service time in milliseconds, is more than 40 ms, the disk is taking a long time to respond. An NFS request with disk I/O will appear to be slow by the NFS clients. The NFS response time should be less than 50 ms on average, to allow for NFS protocol processing and network transmission time. The disk response should be less than 40 ms.
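The two thresholds above (%b over 30 percent, svc_t over 40 ms) can be checked mechanically. The following filter is a hypothetical helper, not part of the guide: it reads iostat -x style lines on standard input and flags disks over both thresholds. The column positions assume the layout shown earlier (disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b); use nawk instead of awk on older Solaris releases.

```shell
#!/bin/sh
# Flag disks with %b > 30 and svc_t > 40 ms (thresholds from the text above).
awk 'NF == 10 && $1 != "disk" && $10 + 0 > 30 && $8 + 0 > 40 {
    printf("%s: %s%% busy, %s ms service time - check this disk\n", $1, $10, $8)
}'
```

Pipe live statistics through it, for example iostat -x 15 | sh flagdisks, where flagdisks is a hypothetical file name for this script.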
The average service time in milliseconds is a function of the disk: with fast disks the average service time will be lower than with slow disks.
Collect data on a regular basis by uncommenting the lines in the crontab file of the sys user so that sar collects the data for one month.
Performance data will be continuously collected to provide a history of sar results.
root# crontab -l sys
#ident  "@(#)sys        1.5     92/07/14 SMI"   /* SVr4.0 1.2 */
#
# The sys crontab should be used to do performance collection.
# See cron and performance manual pages for details on startup.
0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
A few hundred Kbytes will be used at most in /var/adm/sa.
Spread the load over the disks.
If the disks are overloaded, stripe the file system over multiple disks using Solstice DiskSuite or Online: DiskSuite. Reduce the number of accesses and spread peak access loads out in time with a Prestoserve write cache (see "Using Solstice DiskSuite or Online: DiskSuite to Spread Disk Access Load" in Chapter 4, Configuring the Server and the Client to Maximize NFS Performance).
Adjust the buffer cache if you have read-only file systems (see "Adjusting the Buffer Cache (bufhwm)" in Chapter 4, Configuring the Server and the Client to Maximize NFS Performance).
Display server statistics to identify NFS problems by typing nfsstat -s.
The -s option displays server statistics.
server% nfsstat -s
Server rpc:
calls      badcalls   nullrecv   badlen     xdrcall
480421     0          0          0          0

Server nfs:
calls      badcalls
480421     2
null       getattr    setattr    root       lookup     readlink   read
95 0%      140354 29% 10782 2%   0 0%       110489 23% 286 0%     63095 13%
wrcache    write      create     remove     rename     link       symlink
0 0%       139865 29% 7188 1%    2140 0%    91 0%      19 0%      231 0%
mkdir      rmdir      readdir    statfs
435 0%     127 0%     2514 1%    2710 1%
The NFS server display shows the number of NFS calls received (calls) and rejected (badcalls), and the counts and percentages for the various calls that were made. The following terms explain the output of the nfsstat -s command.
Table 3-7 Description of the Output of the nfsstat -s Command

calls      Total number of RPC calls received
badcalls   Total number of calls rejected by the RPC layer (the sum of badlen and xdrcall)
nullrecv   Number of times an RPC call was not available when it was thought to be received
badlen     Number of RPC calls with a length shorter than a minimum-sized RPC call
xdrcall    Number of RPC calls whose header could not be XDR decoded
Table 3-8 explains the nfsstat -s command output and what actions to take.
Table 3-8 Description of the nfsstat -s Command Output
If writes > 5%**, install a Prestoserve NFS accelerator (SBus card or NVRAM-NVSIMM) for peak performance. See "Prestoserve NFS Accelerator" in Chapter 4, Configuring the Server and the Client to Maximize NFS Performance.

If there are any badcalls, the network may be overloaded. Badcalls are calls rejected by the RPC layer and are the sum of badlen and xdrcall. Identify an overloaded network using network interface statistics.

If readlink > 10% of total lookup calls on NFS servers, NFS clients are using excessive symbolic links on the file systems exported by the server. Replace each symbolic link with a directory, and mount both the underlying file system and the symbolic link's target on the NFS client. See Step 11.

If getattr > 40%, increase the client attribute cache using the actimeo option. Make sure that the DNLC and inode caches are large. Use vmstat -s to determine the percent hit rate (cache hits) for the DNLC and, if needed, increase ncsize in the /etc/system file. See Step 12 later in this chapter and "Directory Name Lookup Cache (DNLC)" in Chapter 4, Configuring the Server and the Client to Maximize NFS Performance.

** The number of writes, 29%, is very high.
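The 29 percent figure can be confirmed from the raw counters in the nfsstat -s display above (139865 write calls out of 480421 total NFS calls):

```shell
# Integer percentage of write calls: 139865 * 100 / 480421
expr 139865 \* 100 / 480421
```

This prints 29, well over the 5 percent threshold in the table.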
Eliminate symbolic links.
If symlink is greater than ten percent in the output of the nfsstat -s command, eliminate symbolic links. In the following example, /usr/tools/dist/sun4 is a symbolic link for /usr/dist/bin.
Type rm /usr/dist/bin to eliminate the symbolic link for /usr/dist/bin.
# rm /usr/dist/bin
Make /usr/dist/bin a directory by typing mkdir /usr/dist/bin.
# mkdir /usr/dist/bin
Mount the directories and type the following:
client# mount server:/usr/dist/bin
client# mount server:/usr/tools/dist/sun4
client# mount
View the Directory Name Lookup Cache (DNLC) hit rate by typing vmstat -s.
This command returns the hit rate (cache hits).
% vmstat -s
... lines omitted
79062 total name lookups (cache hits 94%)
16 toolong
If the hit rate is less than 90 percent, and there is no problem with the number of names that are too long to be cached (toolong), increase the ncsize variable in the /etc/system file by typing:
set ncsize=5000
Directory names less than 30 characters long are cached and names that are too long to be cached are also reported.
The default value of ncsize is: ncsize (name cache) = 17 * maxusers + 90
For NFS server benchmarks ncsize has been set as high as 16000.
For maxusers = 2048 ncsize would be set at 34906.
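The figure for maxusers = 2048 follows directly from the default sizing formula above:

```shell
# ncsize = 17 * maxusers + 90, with maxusers = 2048
expr 17 \* 2048 + 90
```

This prints 34906, matching the value quoted in the text.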
For more information on the Directory Name Lookup Cache, see "Directory Name Lookup Cache (DNLC)" in Chapter 4, Configuring the Server and the Client to Maximize NFS Performance.
Reboot the system.
Check the system state if the system has a Prestoserve NFS accelerator by typing /usr/sbin/presto. Verify that it is in the UP state.
server% /usr/sbin/presto
state = UP, size = 0xfff80 bytes
statistics interval: 1 day, 23:17:50 (170270 seconds)
write cache efficiency: 65%
All 2 batteries are ok
If it is in the error state, see the Prestoserve User's Guide.
This completes the steps you use to check the server. Continue by checking each client.