5 Buffers and Buffering

Data buffering and management is an essential service that the DTrace framework provides for its clients, such as the dtrace command. This chapter explores data buffering in detail and describes the options that you can use to change DTrace's buffer management policies.

Principal Buffers

The principal buffer is present in every DTrace invocation and is the buffer to which tracing actions record their data by default. These actions include the following: printa, printf, stack, trace, and tracemem.

The principal buffers are always allocated on a per-CPU basis. This policy is not tunable, but you can restrict tracing and buffer allocation to a single CPU by using the cpu option.

Principal Buffer Policies

DTrace permits tracing in highly constrained contexts in the kernel. In particular, DTrace permits tracing in contexts in which kernel software might not reliably allocate memory. One consequence of this flexibility is that DTrace might attempt to trace data at a time when no buffer space is available, so it must have a policy for dealing with such situations as they arise. You might choose to tune that policy based on the needs of a given experiment. Sometimes the appropriate policy is to discard the new data. Other times, it is desirable to reuse the space containing the oldest recorded data so that new data can be traced. Most often, the desired policy is to minimize the likelihood of running out of available space in the first place. To accommodate these varying demands, DTrace supports several different buffer policies. You select the policy with the bufpolicy option, which can be set on a per-consumer basis. See Options and Tunables for more details.

switch Policy

By default, the principal buffer has a switch buffer policy. Under this policy, per-CPU buffers are allocated in pairs, where one buffer is active and the other buffer is inactive. When a DTrace consumer attempts to read a buffer, the kernel first switches the inactive and active buffers. Buffer switching is done in such a manner that there is no window in which tracing data can be lost. When the buffers are switched, the newly inactive buffer is copied out to the DTrace consumer. This policy ensures that the consumer always sees a self-consistent buffer, because a buffer is never simultaneously traced to and copied out. This technique also avoids introducing a window of time in which tracing is paused or otherwise prevented. The rate at which the buffer is switched and read out is controlled by the consumer with the switchrate option. As with any rate option, switchrate can be specified with any time suffix, but defaults to rate-per-second. For more information about switchrate and other options, see Options and Tunables.

Under the switch policy, if a given enabled probe would trace more data than there is space available in the active principal buffer, the data is dropped and a per-CPU drop count is incremented. In the event of one or more drops, dtrace displays a message similar to the following:

dtrace: 11 drops on CPU 0

If a given record is larger than the total buffer size, the record is dropped, regardless of buffer policy. You can reduce or eliminate drops, either by increasing the size of the principal buffer with the bufsize option, or by increasing the switching rate with the switchrate option.

Under the switch policy, scratch memory for DTrace subroutines is allocated out of the active buffer.
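The switch behavior described above can be sketched as a small Python model. This is a toy illustration only, not DTrace's implementation: the class name SwitchBuffer, its methods, and the byte counts are hypothetical, and the model tracks record sizes rather than real trace data.

```python
class SwitchBuffer:
    """Toy model of one CPU's pair of principal buffers (hypothetical names)."""

    def __init__(self, bufsize):
        self.bufsize = bufsize  # capacity of each buffer in the pair, in bytes
        self.active = []        # sizes of records traced since the last switch
        self.used = 0           # bytes consumed in the active buffer
        self.drops = 0          # per-CPU drop count

    def trace(self, nbytes):
        """Record nbytes in the active buffer, or count a drop if it does not fit."""
        if self.used + nbytes > self.bufsize:
            self.drops += 1
            return False
        self.active.append(nbytes)
        self.used += nbytes
        return True

    def switch(self):
        """Swap the buffers and hand the newly inactive contents to the consumer."""
        snapshot, self.active, self.used = self.active, [], 0
        return snapshot


cpu0 = SwitchBuffer(bufsize=64)
for _ in range(5):
    cpu0.trace(16)              # the fifth 16-byte record no longer fits
first_read = cpu0.switch()      # the consumer reads at its switch rate
cpu0.trace(16)                  # tracing continues into the empty buffer
print(len(first_read), cpu0.drops, cpu0.used)
```

In this model, a larger bufsize or more frequent calls to switch() both reduce the drop count, which mirrors the effect of the bufsize and switchrate options.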

fill Policy

For some problems, you might want to use a single, in-kernel buffer. You can approximate this approach under the switch policy by incrementing a variable in D and predicating an exit action on its value, but such an implementation does not eliminate the possibility of drops. To request a single, large in-kernel buffer and continue tracing until one or more of the per-CPU buffers have filled, use the fill buffer policy. Under this policy, tracing continues until an enabled probe attempts to trace more data than can fit in the remaining principal buffer space. When insufficient space remains, the buffer is marked as filled and the consumer is notified that at least one of its per-CPU buffers is filled. When dtrace detects a single filled buffer, tracing is stopped, all buffers are processed, and dtrace exits. No further data is traced to a filled buffer even if the data would fit in the buffer.

To use the fill policy, set the bufpolicy option to fill. For example, the following command traces every system call entry into a per-CPU 2 KB buffer with the buffer policy set to fill:

# dtrace -n syscall:::entry -b 2k -x bufpolicy=fill
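The fill semantics can be illustrated with a short Python model. It is a sketch under stated assumptions rather than a description of the kernel code: FillBuffer, run_until_filled, and the event list are hypothetical names and data, and the model ignores the space reserved for END probes.

```python
class FillBuffer:
    """Toy model of one CPU's principal buffer under the fill policy."""

    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.used = 0
        self.filled = False

    def trace(self, nbytes):
        if self.filled:
            return False        # nothing more is traced, even if it would fit
        if self.used + nbytes > self.bufsize:
            self.filled = True  # mark the buffer as filled and notify the consumer
            return False
        self.used += nbytes
        return True


def run_until_filled(events, ncpus, bufsize):
    """Replay (cpu, nbytes) events until any one per-CPU buffer is filled."""
    buffers = [FillBuffer(bufsize) for _ in range(ncpus)]
    for cpu, nbytes in events:
        buffers[cpu].trace(nbytes)
        if buffers[cpu].filled:
            break               # the consumer stops tracing on all CPUs
    return buffers


# Hypothetical workload on two CPUs with 2 KB buffers, as in -b 2k.
events = [(0, 1000), (1, 500), (0, 1000), (0, 100), (1, 500)]
buffers = run_until_filled(events, ncpus=2, bufsize=2048)
print([(b.used, b.filled) for b in buffers])
```

In this run, the 100-byte record on CPU 0 does not fit, so CPU 0 is marked as filled and the final event on CPU 1 is never traced.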

fill Policy and END Probes

END probes usually do not fire until tracing has been explicitly stopped by the DTrace consumer. END probes are guaranteed to fire only on one CPU, but the CPU on which the probe fires is undefined. With fill buffers, tracing is explicitly stopped when at least one of the per-CPU principal buffers has been marked as filled. If the fill policy is selected, the END probe might fire on a CPU that has a filled buffer. To accommodate END tracing in fill buffers, DTrace calculates the amount of space that is potentially consumed by END probes and subtracts this space from the size of the principal buffer. If the net size is negative, DTrace does not start and dtrace outputs the following error message:

dtrace: END enablings exceed size of principal buffer

The reservation mechanism ensures that a full buffer always has sufficient space for any END probes.
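The reservation arithmetic can be written out as a one-function Python sketch. The function name usable_fill_space and the byte counts are hypothetical, and the way DTrace actually computes the space needed by END probes is not modeled here.

```python
def usable_fill_space(bufsize, end_bytes):
    """Return the space left for ordinary tracing after reserving END space.

    bufsize   -- requested principal buffer size in bytes
    end_bytes -- space potentially consumed by END probes (assumed known)
    """
    net = bufsize - end_bytes
    if net < 0:
        # Mirrors the error message that dtrace reports in this situation.
        raise ValueError("END enablings exceed size of principal buffer")
    return net


print(usable_fill_space(2048, 256))
```

Because the reserved space is subtracted up front, a buffer that is marked as filled in this model still has end_bytes available for the END probe.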

ring Policy

The DTrace ring buffer policy assists with tracing the events leading up to a failure. If reproducing the failure takes hours or days, you might want to keep only the most recent data. When a principal buffer has filled, tracing wraps around to the first entry, overwriting older tracing data. You establish the ring buffer by specifying bufpolicy=ring as follows:

# dtrace -s foo.d -x bufpolicy=ring
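The overwrite behavior of a ring buffer can be sketched in Python as follows. This is a toy model with hypothetical names (RingBuffer, trace, consume); it evicts whole records from a deque instead of wrapping a byte offset, which is a simplification of what a real ring buffer does.

```python
from collections import deque


class RingBuffer:
    """Toy model of one CPU's principal buffer under the ring policy."""

    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.records = deque()  # (payload, nbytes) pairs, oldest first
        self.used = 0

    def trace(self, payload, nbytes):
        if nbytes > self.bufsize:
            return False        # larger than the whole buffer: always dropped
        while self.used + nbytes > self.bufsize:
            _, old_bytes = self.records.popleft()  # overwrite the oldest data
            self.used -= old_bytes
        self.records.append((payload, nbytes))
        self.used += nbytes
        return True

    def consume(self):
        """Return the surviving payloads from oldest to youngest."""
        return [payload for payload, _ in self.records]


ring = RingBuffer(bufsize=48)
for n in range(1, 6):
    ring.trace(n, 16)           # five 16-byte records into a 48-byte buffer
print(ring.consume())
```

Only the three most recent records survive in this run, which is the property that makes the policy useful for capturing the events that lead up to a failure.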

When used to create a ring buffer, dtrace does not display any output until the dtrace process is terminated. At that time, the ring buffer is consumed and processed. The dtrace command processes each ring buffer in CPU order. Within a CPU's buffer, trace records are displayed in order from oldest to youngest. Just as with the switch buffering policy, no ordering exists between records from different CPUs. If such an ordering is required, you should trace the timestamp variable as part of your tracing request.
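If each record carries a traced timestamp, a global ordering can be reconstructed after the fact. The following Python sketch shows one way to do that, assuming that the output has already been parsed into per-CPU lists of (timestamp, cpu, probefunc) tuples. The record layout, the sample values, and the function name merge_by_timestamp are all hypothetical.

```python
import heapq


def merge_by_timestamp(per_cpu_records):
    """Merge per-CPU record lists into one list ordered by timestamp.

    Each inner list must already be ordered from oldest to youngest,
    which is how records are displayed within one CPU's buffer.
    """
    return list(heapq.merge(*per_cpu_records, key=lambda rec: rec[0]))


# Hypothetical parsed records: (timestamp, cpu, probefunc)
cpu0 = [(100, 0, "open"), (250, 0, "read")]
cpu1 = [(120, 1, "mmap"), (300, 1, "close")]
merged = merge_by_timestamp([cpu0, cpu1])
for record in merged:
    print(record)
```

A merge is sufficient here, rather than a full sort, because each per-CPU list is already in order.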

The following example demonstrates the use of a #pragma option directive to enable ring buffering:

#pragma D option bufpolicy=ring
#pragma D option bufsize=16k

syscall:::entry
/execname == $1/
{
  trace(timestamp);
}

syscall::exit:entry
{
  exit(0);
}

Other Buffers

Principal buffers exist in every DTrace enabling. Beyond principal buffers, some DTrace consumers might have additional in-kernel data buffers, such as an aggregation buffer, and one or more speculative buffers. See Aggregations and Speculative Tracing for more details.

Buffer Sizes

The size of each buffer can be tuned on a per-consumer basis. Separate options are provided to tune each buffer size, as shown in the following table.

Buffer         Size Option
Aggregation    aggsize
Principal      bufsize
Speculative    specsize

Each of these options is set with a value that denotes the size. As with any size option, the value might have an optional size suffix. See Options and Tunables for more details.
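As an illustration of how a size value with a suffix maps to a byte count, the following Python sketch converts strings such as 10m or 2k. It assumes that the suffixes k, m, g, and t denote powers of 1024 and are case-insensitive; check Options and Tunables for the authoritative list. The function name parse_size is hypothetical and is not part of any DTrace interface.

```python
# Assumed suffix table: powers of 1024, matched case-insensitively.
_SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value):
    """Convert a size string such as '10m' or '4096' to a number of bytes."""
    text = value.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)            # no suffix: the value is already in bytes


print(parse_size("10m"), parse_size("2k"), parse_size("512"))
```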

For example, you would set the buffer size to 10 megabytes on the dtrace command line as follows:

# dtrace -P syscall -x bufsize=10m

Alternatively, you can use the -b option with the dtrace command:

# dtrace -P syscall -b 10m

Finally, you can set bufsize by using a pragma, for example:

#pragma D option bufsize=10m

The buffer size that you select denotes the size of the buffer on each CPU. Under the switch buffer policy, where buffers are allocated in pairs, bufsize denotes the size of each of the two buffers on each CPU. The default buffer size is four megabytes.
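Because the size applies per CPU, and per buffer in a switch pair, the total kernel memory that is requested for principal buffers grows with the CPU count. The following Python sketch works through the arithmetic. It assumes that the fill and ring policies use one buffer per CPU. The function name principal_buffer_bytes and the CPU count of 8 are hypothetical.

```python
MIB = 1024 * 1024


def principal_buffer_bytes(bufsize, ncpus, policy="switch"):
    """Estimate the total principal buffer memory across all CPUs."""
    # Assumption: switch allocates a pair per CPU, other policies one buffer.
    buffers_per_cpu = 2 if policy == "switch" else 1
    return bufsize * ncpus * buffers_per_cpu


# The default four-megabyte bufsize on a hypothetical 8-CPU system.
print(principal_buffer_bytes(4 * MIB, ncpus=8) // MIB)
print(principal_buffer_bytes(4 * MIB, ncpus=8, policy="ring") // MIB)
```

Under these assumptions, the default settings request 64 megabytes in total for the switch policy and 32 megabytes for the ring policy on that system.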

Buffer Resizing Policy

Occasionally, the system might not have adequate free kernel memory to allocate a buffer of the desired size, either because not enough memory is available or because the DTrace consumer has exceeded one of the tunable limits that are described in Options and Tunables. You can configure the policy for buffer allocation failure by using the bufresize option, which defaults to auto. Under the auto buffer resize policy, the size of a buffer is halved until a successful allocation occurs. dtrace generates a message if a buffer, as allocated, is smaller than the requested size, as shown in the following example:

# dtrace -P syscall -b 4g
dtrace: description 'syscall' matched 430 probes
dtrace: buffer size lowered to 128m ...

A similar message is generated when the size of an aggregation buffer is lowered:

# dtrace -P syscall'{@a[probefunc] = count()}' -x aggsize=1g
dtrace: description 'syscall' matched 430 probes
dtrace: aggregation size lowered to 128m ...

Alternatively, you can require manual intervention after buffer allocation failure by setting bufresize to manual. Under this policy, an allocation failure prevents DTrace from starting:

# dtrace -P syscall -x bufsize=1g -x bufresize=manual
dtrace: description 'syscall' matched 430 probes
dtrace: could not enable tracing: Not enough space
#

The buffer resizing policy for all buffers (principal, speculative and aggregation) is dictated by the bufresize option.
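The two resize policies can be contrasted in a short Python sketch. This is a toy model only: the function name allocate_buffer is hypothetical, and the available parameter stands in for whatever memory or tunable limit causes a real allocation to fail.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def allocate_buffer(requested, available, bufresize="auto"):
    """Return the buffer size that is granted under the given resize policy."""
    size = requested
    while size > available:     # stand-in for a failed kernel allocation
        if bufresize == "manual":
            raise MemoryError("could not enable tracing: Not enough space")
        size //= 2              # auto: halve the size and try again
        if size == 0:
            raise MemoryError("could not enable tracing: Not enough space")
    return size


granted = allocate_buffer(4 * GIB, available=200 * MIB)
if granted < 4 * GIB:
    print(f"buffer size lowered to {granted // MIB}m")
```

With a hypothetical limit of 200 megabytes, a 4-gigabyte request is halved five times and settles at 128 megabytes. The same request with bufresize set to manual raises an error instead.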