Solaris Tunable Parameters Reference Manual

UFS

bufhwm

Description

Maximum amount of memory for caching I/O buffers. The buffers are used for writing file system metadata (superblocks, inodes, indirect blocks, and directories). Buffers are allocated as needed until the amount to be allocated would exceed bufhwm. At this point, enough buffers are reclaimed to satisfy the request.

For historical reasons, this parameter does not require the ufs: prefix.
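
For example, to cap the buffer cache at approximately 8 Mbytes, a line such as the following could be added to /etc/system (the value is illustrative only; bufhwm is expressed in Kbytes):

* Cap I/O buffer cache memory at 8000 Kbytes (illustrative value)
set bufhwm=8000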

Data Type

Signed integer

Default

2% of physical memory

Range

80 Kbytes to 20% of physical memory

Units

Kbytes

Dynamic?

No. The value is used to compute hash bucket sizes and is then stored in a data structure that adjusts the value in the field as buffers are allocated and deallocated. Attempting to adjust this value on a running system without following the locking protocol can lead to incorrect operation.

Validation

If bufhwm is less than 80 Kbytes, or greater than the lesser of 20% of physical memory or twice the current amount of kernel heap, it is reset to that lesser value. The following message appears on the system console and in the /var/adm/messages file.


"binit: bufhwm out of range (value attempted). Using N."

Value attempted refers to the value entered in /etc/system or by using the kadb -d command. N is the value computed by the system based on available system memory.

When to Change

Because buffers are allocated only as they are needed, the overhead from the default setting is the allocation of the control structures needed to handle the maximum possible number of buffers. These structures consume 52 bytes per potential buffer on a 32-bit kernel and 104 bytes per potential buffer on a 64-bit kernel. On a 512-Mbyte system running a 64-bit kernel, this consumes 104 x 10144 bytes, or about 1 Mbyte. The header allocation assumes buffers are 1 Kbyte in size, although in most cases the actual buffer size is larger.
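
As a quick check of that arithmetic (10144 is the default bufhwm, in Kbytes, on such a system, and each potential 1-Kbyte buffer requires one 104-byte header):

$ echo '10144 * 104' | bc
1054976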

The amount of memory that has not yet been allocated in the buffer pool can be found by examining the bfreelist structure in the kernel with a kernel debugger. The field of interest is bufsize, which gives the possible remaining memory in bytes. For example, looking at it with the buf macro in mdb:


# mdb -k
Loading modules: [ unix krtld genunix ip nfs ipc ]
> bfreelist$<buf
bfreelist:
[ elided ]
bfreelist + 0x78:	bufsize			[ elided ]
				 	      75734016

bufhwm on this system, with 6 Gbytes of memory, is 122277. It is not possible to directly determine the number of header structures in use, because the actual buffer size requested is usually larger than 1 Kbyte. However, some space might profitably be reclaimed from control structure allocation on this system.
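
For reference, the current value of bufhwm can be read directly from the running kernel with mdb (the value shown is the one from this 6-Gbyte system):

# mdb -k
> bufhwm/D
bufhwm:
bufhwm:         122277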

The same structure on the 512-Mbyte system shows that only 4 Kbytes of the 10144 Kbytes has not been allocated. Examining the biostats kstat with kstat -n biostats also shows that the system had a reasonable ratio of buffer_cache_hits to buffer_cache_lookups. This indicates that the default setting is reasonable for that system.
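
The biostats kstat can be displayed as follows. The fields of interest are buffer_cache_hits and buffer_cache_lookups; the values are elided here, and the exact set of statistics reported varies by release:

# kstat -n biostats
module: unix                            instance: 0
name:   biostats                        class:    misc
        buffer_cache_hits               [ elided ]
        buffer_cache_lookups            [ elided ]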

Commitment Level

Unstable

ndquot

Description

Number of quota structures to allocate for the UFS file system. Relevant only if quotas are enabled on one or more UFS file systems. For historical reasons, this parameter does not require the ufs: prefix.

Data Type

Signed integer

Default

((maxusers x 40) / 4) + max_nprocs
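
For example, on a system where maxusers is 1024, and assuming the default max_nprocs of 10 + (16 x maxusers) = 16394, the default works out as follows:

$ echo '(1024 * 40) / 4 + 16394' | bc
26634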

Range

0 to MAXINT

Units

Quota structures

Dynamic?

No

Validation

None. Excessively large values can hang the system.

When to Change

When the default number of quota structures is not enough. This situation is indicated by the following message displayed on the console or written in the message log.


dquot table full
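
The limit can be raised by setting ndquot in /etc/system. The value below is illustrative only:

* Enlarge the UFS quota structure table (illustrative value)
set ndquot=10000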

Commitment Level

Unstable

ufs_ninode

Description

Number of inodes to be held in memory. Inodes are cached globally (for UFS), not on a per-file system basis.

ufs_ninode is used to compute two key limits that affect the handling of inode caching: a high watermark of ufs_ninode / 2 and a low watermark of ufs_ninode / 4.

When the system is done with an inode, one of two things can happen:

  1. The file referred to by the inode is no longer on the system, so the inode is deleted. After it is deleted, the space goes back into the inode cache for use by another inode (which is read from disk or created for a new file).

  2. The file still exists but is no longer referenced by a running process. The inode is then placed on the idle queue. Any referenced pages are still in memory.

When inodes are idled, the kernel defers the idling process to a later time. If a file system is a logging file system, the kernel also defers deletion of inodes. Two kernel threads handle this deferred work, each responsible for one of the queues.

When processing is deferred, the system drops the inode onto either a delete queue or an idle queue, each of which has a thread that can run to process it. When the inode is placed on a queue, the queue occupancy is checked against the low watermark. If the occupancy exceeds the low watermark, the thread associated with the queue is awakened. After it is awakened, the thread runs through the queue, forcing any pages associated with each inode out to disk and freeing the inode. The thread stops when it has removed 50% of the inodes that were on the queue at the time it was awakened.

A second mechanism is in place in case the idle thread is unable to keep up with the load. When the system needs to find a vnode, it goes through the ufs_vget routine. The first thing vget does is check the length of the idle queue. If the length is above the high watermark, vget pops two inodes off the idle queue and “idles” them (flushes pages and frees the inodes). It does this before it gets an inode for its own use.

The system does attempt to optimize by placing inodes with no in-core pages at the head of the idle list and inodes with pages at the end of the idle list, but it does no other ordering of the list. Inodes are always removed from the front of the idle queue.

The only time that inodes are removed from the queues as a whole is when a sync, unmount, or remount occurs.

For historical reasons, this parameter does not require the ufs: prefix.
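
For example, setting ufs_ninode as follows in /etc/system (the value is illustrative only) would give an idle-queue high watermark of 64000 inodes and a low watermark of 32000:

* Hold more inodes in memory (illustrative value)
set ufs_ninode=128000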

Data Type

Signed integer

Default

ncsize

Range

0 to MAXINT

Units

Inodes

Dynamic?

Yes

Validation

If ufs_ninode is less than or equal to zero, the value is set to ncsize.

When to Change

When the default number of inodes is not enough. If the maxsize reached field reported by kstat -n inode_cache is larger than the maxsize field in the same kstat, the value of ufs_ninode might be too small. Excessive inode idling (described previously) can also be a problem: in the inode_cache kstat, thread idles are inodes idled by the background threads, while vget idles are inodes idled by the requesting process before using an inode.
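
For example (field values elided; the statistic names are those reported by the inode_cache kstat, and the exact set reported varies by release):

# kstat -n inode_cache
module: ufs                             instance: 0
name:   inode_cache                     class:    ufs
        maxsize                         [ elided ]
        maxsize reached                 [ elided ]
        thread idles                    [ elided ]
        vget idles                      [ elided ]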

Commitment Level

Unstable

ufs:ufs_WRITES

Description

If ufs_WRITES is non-zero, the number of bytes outstanding for writes on a file is checked (see ufs_HW, described next) to determine whether the write should be issued immediately or deferred until only ufs_LW bytes are outstanding. The total number of bytes outstanding is tracked on a per-file basis, so that reaching the limit on one file does not affect writes to other files.

Data Type

Signed integer

Default

1 (enabled)

Range

0 (disabled), 1 (enabled)

Units

Toggle (on/off)

Dynamic?

Yes

Validation

None

When to Change

When you want UFS write throttling turned off entirely. If sufficient I/O capacity does not exist, disabling this parameter can result in long service queues for disks.
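
Throttling can be disabled by setting the parameter to zero in /etc/system:

* Turn off UFS write throttling entirely
set ufs:ufs_WRITES=0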

Commitment Level

Unstable

ufs:ufs_LW and ufs:ufs_HW

Description

ufs_HW is the barrier value for the number of bytes outstanding on a single file. If the number of bytes outstanding is greater than this value and ufs_WRITES is set, the write is deferred. The write is deferred by putting the thread issuing the write to sleep on a condition variable.

ufs_LW is the barrier value for the number of bytes outstanding on a single file below which the condition variable is signaled. When a write completes and the number of bytes outstanding falls below ufs_LW, the condition variable is signaled, which causes all threads waiting on the variable to awaken and try to issue their writes.

Data Type

Signed integer

Default

8 x 1024 x 1024 for ufs_LW and 16 x 1024 x 1024 for ufs_HW

Range

0 to MAXINT

Units

Bytes

Dynamic?

Yes

Validation

None

Implicit

ufs_LW and ufs_HW have meaning only if ufs_WRITES is not equal to zero. ufs_HW and ufs_LW should be changed together to avoid needless churning, in which processes awaken and find either that they cannot issue a write (when ufs_LW and ufs_HW are too close together) or that they have waited longer than necessary (when ufs_LW and ufs_HW are too far apart).
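
A sketch of such a coupled change in /etc/system, doubling both defaults (the values are illustrative only):

* Raise both write-throttling barriers together (illustrative values)
set ufs:ufs_LW=16777216
set ufs:ufs_HW=33554432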

When to Change

Consider changing these values when file systems consist of striped volumes. The aggregate bandwidth available can easily exceed the current value of ufs_HW. Unfortunately, this is not a per-file system setting.

Also consider a change when ufs_throttles is a non-trivial number. Currently, ufs_throttles can be accessed only with a kernel debugger.
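
For example, with mdb (the value, a count, is elided here):

# mdb -k
> ufs_throttles/D
ufs_throttles:
ufs_throttles:  [ elided ]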

Commitment Level

Unstable