Chapter 15 Performance Considerations

DTrace creates additional work in the system, so enabling DTrace always affects system performance in some way. Often, this effect is negligible, but it can become substantial if many probes with costly enablings are enabled. This chapter describes some techniques for minimizing the performance effect of DTrace.

15.1 Limiting Enabled Probes

Dynamic instrumentation techniques enable DTrace to provide unparalleled tracing coverage of the kernel and arbitrary user processes. While this coverage provides revolutionary new insight into system behavior, it can also cause an enormous probe effect. If tens of thousands or hundreds of thousands of probes are enabled, the effect on the system can easily be substantial. Therefore, enable only as many probes as you need to solve a problem. For example, do not enable all syscall probes if a more concise enabling can answer your question. Your question might require only that you concentrate on a specific module of interest or a specific function.
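As a minimal sketch of a more concise enabling, the following script enables only the read and write entry probes instead of every syscall probe. The choice of read and write is an assumption made for illustration; substitute the system calls that are relevant to your question.

```d
/*
 * Narrow enabling: two syscall entry probes rather than syscall:::entry,
 * which would enable the entry probe of every system call.
 */
syscall::read:entry,
syscall::write:entry
{
  /* Count calls per process name and per system call. */
  @calls[execname, probefunc] = count();
}
```

The aggregation is printed automatically when the script exits.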

Caution

When using the pid provider, be especially careful. Because the pid provider can instrument every instruction, you could enable millions of probes in an application and therefore slow the target process to a crawl.
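As an illustrative sketch, the following script limits a pid provider enabling to the entry probe of a single function rather than wildcarding the function and name fields. The function name malloc is an assumption chosen for illustration, and the script assumes that the target process is identified with the dtrace -p or -c option so that $target is set.

```d
/* Narrow enabling: only the entry probe of one function in the target. */
pid$target::malloc:entry
{
  /* arg0 is the first argument to the function, here the requested size. */
  @sizes["malloc request size (bytes)"] = quantize(arg0);
}
```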

You can also use DTrace in situations where large numbers of probes must be enabled to answer a question. Enabling a large number of probes might slow down the system significantly, but it never induces fatal failure on the system. You should therefore not hesitate to enable many probes, if so required.

15.2 Using Aggregations

As discussed in Chapter 3, Aggregations, DTrace aggregations provide a scalable way to aggregate data. Associative arrays might appear to offer functionality that is similar to aggregations, but because they are global, general-purpose variables, associative arrays cannot offer the linear scalability of aggregations. Therefore, use aggregations in preference to associative arrays whenever possible. For example, the following D program uses an associative array to aggregate data:

syscall:::entry
{
  totals[execname]++;
}

syscall::exit_group:entry
{
  printf("%40s %d\n", execname, totals[execname]);
  totals[execname] = 0;
}

The following D program is preferred because it uses an aggregation to achieve the same result:

syscall:::entry
{
  @totals[execname] = count();
}

END
{
  printa("%40s %@d\n", @totals);
}
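If periodic output is wanted rather than a single report at the end, one possible variation is sketched below. It assumes that the profile provider's tick-10sec probe is available, and the 10-second interval is an arbitrary choice for illustration.

```d
syscall:::entry
{
  @totals[execname] = count();
}

/* Every 10 seconds, print the counts gathered so far, then discard them. */
tick-10sec
{
  printa("%40s %@d\n", @totals);
  trunc(@totals);
}
```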

15.3 Using Cacheable Predicates

You use DTrace predicates to filter unwanted data from the experiment by tracing data only if a specified condition is true. When enabling many probes, you generally use predicates of a form that identifies a specific thread or threads of interest, such as /self->traceme/ or /pid == 12345/. Although many of these predicates evaluate to false for most threads in most probes, the evaluation itself can become costly when done for many thousands of probes.

To reduce this cost, DTrace caches the evaluation of a predicate if it includes only thread-local variables, such as /self->traceme/, or immutable variables, such as /pid == 12345/. The cost of evaluating a cached predicate is much less than the cost of evaluating a non-cached predicate, especially if the predicate involves thread-local variables, string comparisons, or other relatively costly operations. Predicate caching is transparent to the user, but it does imply some guidelines for constructing optimal predicates, as outlined in the following table.

Cacheable       Uncacheable
self->mumble    mumble[curthread], mumble[pid, tid]
execname        curpsinfo->pr_fname, ((struct task_struct *)curthread)->comm
pid             curpsinfo->pr_pid, ((struct task_struct *)curthread)->pid
tid             curlwpsinfo->pr_lwpid, ((struct task_struct *)curthread)->pid
curthread       curthread->any_member, curlwpsinfo->any_member, curpsinfo->any_member
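A predicate written against curpsinfo can usually be restated with the equivalent cacheable variable. The following sketch filters on execname rather than on curpsinfo->pr_fname; the process name myprogram is a placeholder.

```d
/*
 * Uncacheable form, shown for contrast only:
 *   syscall:::entry /curpsinfo->pr_fname == "myprogram"/ { ... }
 *
 * Cacheable form of the same filter:
 */
syscall:::entry
/execname == "myprogram"/
{
  /* Count system calls made by the process of interest. */
  @calls[probefunc] = count();
}
```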

The following example uses an associative array in the predicate and is not cacheable:

syscall::read:entry
{
  follow[pid, tid] = 1;
}

lockstat:::
/follow[pid, tid]/
{}

syscall::read:return
/follow[pid, tid]/
{
  follow[pid, tid] = 0;
}

Using a cacheable, thread-local variable, per the following example, is preferable:

syscall::read:entry
{
  self->follow = 1;
}

lockstat:::
/self->follow/
{}

syscall::read:return
/self->follow/
{
  self->follow = 0;
}

For a predicate to be cacheable, it must consist exclusively of cacheable expressions. All of the following predicates are cacheable:

/execname == "myprogram"/

/execname == $$1/

/pid == 12345/

/pid == $1/

/self->traceme == 1/

The following examples, which use global variables, are not cacheable:

/execname == one_to_watch/

/traceme[execname]/

/pid == pid_i_care_about/

/self->traceme == my_global/
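One way to keep such a predicate cacheable is to supply the value as a macro argument instead of storing it in a global variable. The following sketch assumes that the script is saved under the hypothetical name watch.d and is invoked with the process ID as its first argument, for example, dtrace -s watch.d 12345.

```d
/*
 * $1 is replaced by the first macro argument when the script is compiled,
 * so the predicate compares pid with a constant and remains cacheable.
 */
syscall:::entry
/pid == $1/
{
  @calls[probefunc] = count();
}
```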