fbt and Tail-Call Optimization

When one function ends by calling another function, the compiler can perform tail-call optimization, in which the called function reuses the caller's stack frame. This technique is most common on the SPARC architecture, where the compiler reuses the caller's register window in the called function to minimize register window pressure.

With tail-call optimization, the return probe of the calling function fires before the entry probe of the called function. This ordering can be confusing. For example, to record a particular function and every function that is called from it, directly or indirectly, you might use the following script:

fbt::foo:entry
{
        self->traceme = 1;
}

fbt:::entry
/self->traceme/
{
        printf("called %s", probefunc);
}

fbt::foo:return
/self->traceme/
{
        self->traceme = 0;
}

However, if foo ends in an optimized tail-call, the tail-called function and any functions that it calls are not captured, because the foo return probe clears self->traceme before the entry probe of the tail-called function fires. The kernel cannot be dynamically de-optimized, and DTrace does not misrepresent how the code is structured. You must therefore be aware of when tail-call optimization might be in use.
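The effect of the probe ordering on the preceding script can be sketched as a small simulation. The following C code is only an illustrative model under stated assumptions, not part of DTrace: the struct, the function names bar and baz, and both firing sequences are hypothetical, and a single plain variable stands in for the thread-local self->traceme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One simulated fbt probe firing (hypothetical model, not a DTrace API). */
struct firing {
        const char *func;       /* function name, as in probefunc */
        int is_entry;           /* 1 for an entry probe, 0 for a return probe */
};

/* foo calls bar in the ordinary way: foo returns last. */
static const struct firing normal_order[] = {
        { "foo", 1 }, { "bar", 1 }, { "baz", 1 },
        { "baz", 0 }, { "bar", 0 }, { "foo", 0 }
};

/* foo tail-calls bar: the foo return probe fires before the bar entry probe. */
static const struct firing tail_call_order[] = {
        { "foo", 1 }, { "foo", 0 }, { "bar", 1 },
        { "baz", 1 }, { "baz", 0 }, { "bar", 0 }
};

/*
 * Apply the three clauses of the script, in program order, to each firing
 * and count how many entry probes would be reported.
 */
size_t
count_traced(const struct firing *f, size_t n)
{
        int traceme = 0;
        size_t traced = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                int is_foo = (strcmp(f[i].func, "foo") == 0);

                if (is_foo && f[i].is_entry)
                        traceme = 1;    /* fbt::foo:entry */
                if (f[i].is_entry && traceme)
                        traced++;       /* fbt:::entry /self->traceme/ */
                if (is_foo && !f[i].is_entry)
                        traceme = 0;    /* fbt::foo:return */
        }
        return (traced);
}

/* Self-check: three functions are reported normally, only foo with a tail-call. */
void
check_orderings(void)
{
        assert(count_traced(normal_order, 6) == 3);
        assert(count_traced(tail_call_order, 6) == 1);
}
```

In this model, the ordinary ordering reports foo, bar, and baz, whereas the tail-call ordering reports only foo, because the flag is already cleared when bar is entered.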

Tail-call optimization is likely to be used in source code similar to the following example:

        return (bar());

Tail-call optimization can also be used in source code similar to the following example:

        (void) bar();
        return;

Conversely, function source code that ends like the following example cannot have its call to bar optimized, because the call to bar is not a tail-call:

        bar();
        return (rval);
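For reference, the three fragments above can be placed in complete functions. This is a minimal sketch: the function names, the bar_calls counter, and the body of bar are hypothetical, chosen only so that the code compiles and can be checked. Whether a given compiler actually emits a tail-call for the first two forms depends on the compiler, the optimization level, and the target architecture, and a compiler might instead inline a helper as small as this bar.

```c
#include <assert.h>

/* Hypothetical stand-in for bar(); it counts how often it is called. */
static int bar_calls;

static int
bar(void)
{
        return (++bar_calls);
}

/* The call is the last action and its value is returned: a tail-call. */
int
foo_returns_bar(void)
{
        return (bar());
}

/* The result is discarded and nothing follows the call: also a tail-call. */
void
foo_discards_bar(void)
{
        (void) bar();
        return;
}

/*
 * Not a tail-call: foo must regain control after bar returns in order
 * to return rval, so its frame cannot be handed over to bar.
 */
int
foo_returns_rval(int rval)
{
        bar();
        return (rval);
}

/* Self-check: all three forms behave identically with or without the optimization. */
void
check_forms(void)
{
        bar_calls = 0;
        assert(foo_returns_bar() == 1);
        foo_discards_bar();
        assert(bar_calls == 2);
        assert(foo_returns_rval(7) == 7);
        assert(bar_calls == 3);
}
```

The optimization does not change what these functions compute. It changes only the instructions at the end of the caller, which is why it is visible to fbt but not to the program itself.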

You can determine whether a call has been tail-call optimized using the following techniques:

  • While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.

  • After DTrace has stopped, use mdb to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.

Due to the instruction set architecture, tail-call optimization is far more common on SPARC systems than on x86 systems. The following example uses mdb to discover tail-call optimization in the kernel's dup function:

# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'

While this command is running, run a program that calls dup(2), such as a bash process. The command should produce output similar to the following example:

dup+0x10
^C

Now examine the function with mdb:

# echo "dup::dis" | mdb -k
dup:                            sra       %o0, 0, %o0
dup+4:                          mov       %o7, %g1
dup+8:                          clr       %o2
dup+0xc:                        clr       %o1
dup+0x10:                       call      -0x1278       <fcntl>
dup+0x14:                       mov       %g1, %o7

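The output shows that dup+0x10 is a call to the fcntl() function rather than a ret instruction, so the call to fcntl has been tail-call optimized. The mov instruction in the delay slot of the call, at dup+0x14, restores the return address that the instruction at dup+4 saved in %g1, so fcntl returns directly to the caller of dup.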