Solaris Dynamic Tracing Guide

Chapter 38 Performance Considerations

Because DTrace causes additional work in the system, enabling DTrace always affects system performance in some way. Often, this effect is negligible, but it can become substantial if many probes are enabled with costly enablings. This chapter describes techniques for minimizing the performance effect of DTrace.

Limit Enabled Probes

Dynamic instrumentation techniques enable DTrace to provide unparalleled tracing coverage of the kernel and of arbitrary user processes. While this coverage allows revolutionary new insight into system behavior, it can also cause an enormous probe effect. If tens of thousands or hundreds of thousands of probes are enabled, the effect on the system can easily be substantial. Therefore, you should enable only as many probes as you need to solve a problem. You should not, for example, enable all FBT probes if a more concise enabling will answer your question. Your question might allow you to concentrate on a specific module of interest or on a specific function.
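
As a minimal sketch of a more concise enabling, the following script restricts FBT to the entry probes of one module and, separately, to the entry probe of one function. The module name ufs and the function name ufs_read are illustrative assumptions; substitute the module and function that are relevant to your question.

```dtrace
/*
 * Entry probes for a single module only, instead of fbt:::entry.
 * "ufs" is an assumed example module.
 */
fbt:ufs::entry
{
	@calls[probefunc] = count();
}

/*
 * Narrower still: the entry probe of a single function.
 * "ufs_read" is an assumed example function.
 */
fbt::ufs_read:entry
{
	@reads = count();
}
```

Both aggregations are printed when the script exits. Either clause can be used on its own; the second enables a single probe where fbt:::entry would enable tens of thousands.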

When using the pid provider, you should be especially careful. Because the pid provider can instrument every instruction, you could enable millions of probes in an application, and therefore slow the target process to a crawl.
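
The following sketch shows one way to keep a pid provider enabling small, assuming the question is about the sizes passed to malloc() in libc.so.1. The choice of library and function is an assumption made for illustration. Leaving the probe name field empty would match the entry probe, the return probe, and a probe for every instruction in the function, so the example names the entry probe explicitly.

```dtrace
/*
 * One probe: the entry point of malloc() in the target process.
 * Writing pid$target:libc.so.1:malloc: (empty probe name) would
 * instead enable a probe at every instruction in the function.
 */
pid$target:libc.so.1:malloc:entry
{
	/* arg0 is the first argument to malloc(), the requested size. */
	@sizes = quantize(arg0);
}
```

Because the script uses the $target macro variable, run it with the dtrace -p option to attach to an existing process or with the -c option to start a command.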

DTrace can also be used in situations where large numbers of probes must be enabled for a question to be answered. Enabling a large number of probes might slow down the system quite a bit, but it will never induce fatal failure on the machine. You should therefore not hesitate to enable many probes if required.

Use Aggregations

As discussed in Chapter 9, Aggregations, DTrace's aggregations allow for a scalable way of aggregating data. Associative arrays might appear to offer similar functionality to aggregations. However, by nature of being global, general-purpose variables, they cannot offer the linear scalability of aggregations. You should therefore prefer to use aggregations over associative arrays when possible. The following example is not recommended:

syscall:::entry
{
	totals[execname]++;
}

syscall::rexit:entry
{
	printf("%40s %d\n", execname, totals[execname]);
	totals[execname] = 0;
}

The following example is preferable:

syscall:::entry
{
	@totals[execname] = count();
}

END
{
	printa("%40s %@d\n", @totals);
}
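
If intermediate output is wanted rather than a single report at END, the aggregation can be printed and reset from a profile provider tick probe. This is a sketch only; the ten-second interval is an arbitrary assumption.

```dtrace
syscall:::entry
{
	@totals[execname] = count();
}

/* Fires once every ten seconds on a single CPU. */
profile:::tick-10sec
{
	printa("%40s %@d\n", @totals);

	/* clear() resets the counts to zero but keeps the keys. */
	clear(@totals);
}
```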

Use Cacheable Predicates

DTrace predicates are used to filter unwanted data from the experiment: data is traced only if a specified condition is found to be true. When enabling many probes, you generally use predicates of a form that identifies a specific thread or threads of interest, such as /self->traceme/ or /pid == 12345/. Although many of these predicates evaluate to a false value for most threads in most probes, the evaluation itself can become costly when done for many thousands of probes. To reduce this cost, DTrace caches the evaluation of a predicate if it includes only thread-local variables (for example, /self->traceme/) or immutable variables (for example, /pid == 12345/). The cost of evaluating a cached predicate is much smaller than the cost of evaluating a non-cached predicate, especially if the predicate involves thread-local variables, string comparisons, or other relatively costly operations. While predicate caching is transparent to the user, it does imply some guidelines for constructing optimal predicates, as shown in the following table:

Cacheable	Uncacheable
`self->mumble`	`mumble[curthread]`, `mumble[pid, tid]`
`execname`	`curpsinfo->pr_fname`, `curthread->t_procp->p_user.u_comm`
`pid`	`curpsinfo->pr_pid`, `curthread->t_procp->p_pidp->pid_id`
`tid`	`curlwpsinfo->pr_lwpid`, `curthread->t_tid`
`curthread`	`curthread->any member`, `curlwpsinfo->any member`, `curpsinfo->any member`
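
To illustrate the first column of the table, the following sketch filters on the process name by using the built-in execname variable rather than dereferencing curpsinfo. The program name "myprogram" is a placeholder.

```dtrace
/*
 * Cacheable: the predicate refers only to the execname built-in
 * variable and a string constant. A predicate that compared
 * curpsinfo->pr_fname to the same string would select the same
 * processes but, according to the table, would not be cacheable.
 */
syscall:::entry
/execname == "myprogram"/
{
	@calls[probefunc] = count();
}
```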

The following example is not recommended:

syscall::read:entry
{
	follow[pid, tid] = 1;
}

fbt:::
/follow[pid, tid]/
{}

syscall::read:return
/follow[pid, tid]/
{
	follow[pid, tid] = 0;
}

The following example using thread-local variables is preferable:

syscall::read:entry
{
	self->follow = 1;
}

fbt:::
/self->follow/
{}

syscall::read:return
/self->follow/
{
	self->follow = 0;
}

A predicate must consist exclusively of cacheable expressions in order to be cacheable. The following predicates are all cacheable:

/execname == "myprogram"/
/execname == $$1/
/pid == 12345/
/pid == $1/
/self->traceme == 1/

The following examples, which use global variables, are not cacheable:

/execname == one_to_watch/
/traceme[execname]/
/pid == pid_i_care_about/
/self->traceme == my_global/
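
When the value being compared is known before tracing starts, one way to avoid a global variable is to pass the value as a macro argument, which yields one of the cacheable forms listed earlier. The following is a sketch under that assumption; the script name used in the invocation below is hypothetical.

```dtrace
/*
 * $1 is replaced by the first argument on the dtrace command line
 * when the script is compiled, so the predicate compares pid to a
 * constant instead of reading a global variable.
 */
syscall:::entry
/pid == $1/
{
	@calls[probefunc] = count();
}
```

For example, if the script is saved as bypid.d, the command dtrace -s bypid.d 12345 counts system calls made by process 12345.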