Solaris Dynamic Tracing Guide

Tail-call Optimization

When one function ends by calling another function, the compiler can engage in tail-call optimization, in which the function being called reuses the caller's stack frame. This procedure is most commonly used in the SPARC architecture, where the compiler reuses the caller's register window in the function being called in order to minimize register window pressure.

The presence of this optimization causes the return probe of the calling function to fire before the entry probe of the called function. This ordering can lead to quite a bit of confusion. For example, if you wanted to record all functions called from a particular function and any functions that this function calls, you might use the following script:

fbt::foo:entry
{
	self->traceme = 1;
}

fbt:::entry
/self->traceme/
{
	printf("called %s", probefunc);
}

fbt::foo:return
/self->traceme/
{
	self->traceme = 0;
}

However, if foo() ends in an optimized tail-call, the tail-called function, and therefore any functions that it calls, will not be captured. The kernel cannot be dynamically deoptimized on the fly, and DTrace does not wish to engage in a lie about how code is structured. Therefore, you should be aware of when tail-call optimization might be used.

Tail-call optimization is likely to be used in source code similar to the following example:

	return (bar());

Or in source code similar to the following example:

	(void) bar();
	return;

Conversely, function source code that ends like the following example cannot have its call to bar() optimized, because the call to bar() is not a tail-call:

	bar();
	return (rval);

You can determine whether a call has been tail-call optimized using the following technique:

While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.
After DTrace has stopped, use mdb(1) to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.

Due to the instruction set architecture, tail-call optimization is far more common on SPARC systems than on x86 systems. The following example uses mdb to discover tail-call optimization in the kernel's dup() function:

# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'

While this command is running, run a program that performs a dup(2), such as a bash process. The above command should provide output similar to the following example:

dup+0x10
^C

Now examine the function with mdb:

# echo "dup::dis" | mdb -k
dup:                            sra       %o0, 0, %o0
dup+4:                          mov       %o7, %g1
dup+8:                          clr       %o2
dup+0xc:                        clr       %o1
dup+0x10:                       call      -0x1278       <fcntl>
dup+0x14:                       mov       %g1, %o7

The output shows that dup+0x10 is a call to the fcntl() function and not a ret instruction. Therefore, the call to fcntl() is an example of tail-call optimization.