Solaris Dynamic Tracing Guide

Chapter 20 `fbt` Provider

This chapter describes the Function Boundary Tracing (FBT) provider, which provides probes associated with the entry to and return from most functions in the Solaris kernel. The function is the fundamental unit of program text. In a well-designed system, each function performs a discrete and well-defined operation on a specified object or series of like objects. Therefore, even on the smallest Solaris systems, FBT will provide on the order of 20,000 probes.

Similar to other DTrace providers, FBT has no probe effect when it is not explicitly enabled. When enabled, FBT only induces a probe effect in probed functions. While the FBT implementation is highly specific to the instruction set architecture, FBT has been implemented on both SPARC and x86 platforms. For each instruction set, there are a small number of functions that do not call other functions and are highly optimized by the compiler (so-called leaf functions) that cannot be instrumented by FBT. Probes for these functions are not present in DTrace.

Effective use of FBT probes requires knowledge of the operating system implementation. Therefore, it is recommended that you use FBT only when developing kernel software or when other providers are not sufficient. Other DTrace providers, including syscall, sched, proc, and io, can be used to answer most system analysis questions without requiring operating system implementation knowledge.

Probes

FBT provides a probe at the boundary of most functions in the kernel. The boundary of a function is crossed by entering the function and by returning from the function. FBT thus provides two probes for each instrumented function: one upon entry to the function, and one upon return from the function. These probes are named entry and return, respectively. Every FBT probe specifies both a module name and a function name as part of its probe description.
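As a minimal sketch of how the four parts of an FBT probe description fit together, the following script enables the entry and return probes of a single function and prints the built-in probeprov, probemod, probefunc, and probename variables. The choice of the ufs module and the ufs_read function is only an assumption for illustration; any module and function for which FBT provides probes on your system would work equally well.

```d
/*
 * Illustrative sketch: ufs and ufs_read are assumed to exist on the
 * system being traced.  Substitute any module and function that FBT
 * provides probes for.
 */
fbt:ufs:ufs_read:entry,
fbt:ufs:ufs_read:return
{
	/* provider:module:function:name of the probe that fired */
	printf("%s:%s:%s:%s\n", probeprov, probemod, probefunc, probename);
}
```

Leaving a field of the probe description empty acts as a wildcard, so fbt:ufs::entry matches the entry probe of every instrumented function in the ufs module.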

Probe arguments

`entry` probes

The arguments to entry probes are the same as the arguments to the corresponding operating system kernel function. These arguments can be accessed with their native types by using the args[] array, or as int64_t values by using the arg0 .. argn variables.
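The two access styles can be compared side by side. The sketch below assumes the entry probe for delay(9F), whose single parameter is a tick count of type clock_t; the aggregation names @typed and @raw are arbitrary and chosen only for this illustration.

```d
fbt::delay:entry
{
	/* Typed access: args[0] has the C type of delay(9F)'s parameter. */
	@typed["args[0] (ticks)"] = quantize(args[0]);

	/* Untyped access: arg0 is the same value, seen as an int64_t. */
	@raw["arg0 (ticks)"] = quantize(arg0);
}
```

For a scalar parameter the two distributions should be identical. The typed form becomes more useful when the parameter is a pointer to a structure, because its members can then be dereferenced without an explicit cast.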

`return` probes

While a given function only has a single point of entry, it may have many different points where it returns to its caller. You are usually interested in either the value that a function returned or the fact that the function returned at all rather than the specific return path taken. FBT therefore collects a function's multiple return sites into a single return probe. If the exact return path is of interest, you can examine the return probe args[0] value, which indicates the offset (in bytes) of the returning instruction in the function text.

If the function has a return value, the return value is stored in args[1]. If a function does not have a return value, args[1] is not defined.
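As a rough illustration, the following sketch counts how often each return site of a function is taken and which value it returned. It assumes the kernel ioctl() function, which returns an integer, and uses the untyped arg0 and arg1 variables; the aggregation name @sites is arbitrary.

```d
fbt::ioctl:return
{
	/*
	 * arg0: byte offset of the returning instruction in the function.
	 * arg1: the value that the function returned.
	 */
	@sites[arg0, arg1] = count();
}

END
{
	printa("offset 0x%x  returned %d  count %@d\n", @sites);
}
```

Each line of output corresponds to one combination of return offset and return value, which makes it easy to see whether a particular value always comes from the same return path.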

Examples

You can use FBT to easily explore the kernel's implementation. The following example script records the first ioctl(2) from any xclock process and then follows the subsequent code path through the kernel:

/*
 * To make the output more readable, we want to indent every function entry
 * (and unindent every function return).  This is done by setting the
 * "flowindent" option.
 */
#pragma D option flowindent

syscall::ioctl:entry
/execname == "xclock" && guard++ == 0/
{
	self->traceme = 1;
	printf("fd: %d", arg0);
}

fbt:::
/self->traceme/
{}

syscall::ioctl:return
/self->traceme/
{
	self->traceme = 0;
	exit(0);
}

Running this script results in output similar to the following example:

# dtrace -s ./xioctl.d
dtrace: script './xioctl.d' matched 26254 probes
CPU FUNCTION                                 
  0  => ioctl                                 fd: 3
  0    -> ioctl                               
  0      -> getf                              
  0        -> set_active_fd                   
  0        <- set_active_fd                   
  0      <- getf                              
  0      -> fop_ioctl                         
  0        -> sock_ioctl                      
  0          -> strioctl                      
  0            -> job_control_type            
  0            <- job_control_type            
  0            -> strcopyout                  
  0              -> copyout                   
  0              <- copyout                   
  0            <- strcopyout                  
  0          <- strioctl                      
  0        <- sock_ioctl                      
  0      <- fop_ioctl                         
  0      -> releasef                          
  0        -> clear_active_fd                 
  0        <- clear_active_fd                 
  0        -> cv_broadcast                    
  0        <- cv_broadcast                    
  0      <- releasef                          
  0    <- ioctl                               
  0  <= ioctl

The output shows that an xclock process called ioctl() on a file descriptor that appears to be associated with a socket.

You can also use FBT when trying to understand kernel drivers. For example, the ssd(7D) driver has many code paths by which EIO may be returned. FBT can be easily used to determine the precise code path that resulted in an error condition, as shown in the following example:

fbt:ssd::return
/arg1 == EIO/
{
	printf("%s+%x returned EIO.", probefunc, arg0);
}

For more information on any one return of EIO, you might want to speculatively trace all FBT probes, and then commit() or discard() the speculation based on the return value of a specific function. See Chapter 13, Speculative Tracing for details on speculative tracing.
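A speculative version of this idea might look like the following sketch. It is modeled on the general pattern for speculations rather than taken from any particular driver analysis: the choice of fop_ioctl() as the function whose return value decides the outcome is an assumption for illustration, and nested or recursive calls are simply ignored by the entry predicate.

```d
#pragma D option flowindent

/* Start a speculation when the function of interest is entered. */
fbt::fop_ioctl:entry
/!self->spec/
{
	self->spec = speculation();
}

/* While a speculation is active, trace every FBT probe into it. */
fbt:::
/self->spec/
{
	speculate(self->spec);
}

/* Keep the speculatively traced code path only if EIO was returned. */
fbt::fop_ioctl:return
/self->spec && arg1 == EIO/
{
	commit(self->spec);
	self->spec = 0;
}

/* Otherwise throw the speculative data away. */
fbt::fop_ioctl:return
/self->spec && arg1 != EIO/
{
	discard(self->spec);
	self->spec = 0;
}
```

Because speculation() returns zero when no speculative buffer is available, the /self->spec/ predicates also guard against tracing into an invalid speculation.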

Alternatively, you can use FBT to understand the functions called within a specified module. The following example lists all of the functions called in UFS:

# dtrace -n fbt:ufs::entry'{@a[probefunc] = count()}'
dtrace: description 'fbt:ufs::entry' matched 353 probes
^C
  ufs_ioctl                                                         1
  ufs_statvfs                                                       1
  ufs_readlink                                                      1
  ufs_trans_touch                                                   1
  wrip                                                              1
  ufs_dirlook                                                       1
  bmap_write                                                        1
  ufs_fsync                                                         1
  ufs_iget                                                          1
  ufs_trans_push_inode                                              1
  ufs_putpages                                                      1
  ufs_putpage                                                       1
  ufs_syncip                                                        1
  ufs_write                                                         1
  ufs_trans_write_resv                                              1
  ufs_log_amt                                                       1
  ufs_getpage_miss                                                  1
  ufs_trans_syncip                                                  1
  getinoquota                                                       1
  ufs_inode_cache_constructor                                       1
  ufs_alloc_inode                                                   1
  ufs_iget_alloced                                                  1
  ufs_iget_internal                                                 2
  ufs_reset_vnode                                                   2
  ufs_notclean                                                      2
  ufs_iupdat                                                        2
  blkatoff                                                          3
  ufs_close                                                         5
  ufs_open                                                          5
  ufs_access                                                        6
  ufs_map                                                           8
  ufs_seek                                                         11
  ufs_addmap                                                       15
  rdip                                                             15
  ufs_read                                                         15
  ufs_rwunlock                                                     16
  ufs_rwlock                                                       16
  ufs_delmap                                                       18
  ufs_getattr                                                      19
  ufs_getpage_ra                                                   24
  bmap_read                                                        25
  findextent                                                       25
  ufs_lockfs_begin                                                 27
  ufs_lookup                                                       46
  ufs_iaccess                                                      51
  ufs_imark                                                        92
  ufs_lockfs_begin_getpage                                        102
  bmap_has_holes                                                  102
  ufs_getpage                                                     102
  ufs_itimes_nolock                                               107
  ufs_lockfs_end                                                  125
  dirmangled                                                      498
  dirbadname                                                      498

If you know the purpose or arguments of a kernel function, you can use FBT to understand how or why the function is being called. For example, putnext(9F) takes a pointer to a queue(9S) structure as its first argument. The q_qinfo member of the queue structure is a pointer to a qinit(9S) structure. The qi_minfo member of the qinit structure is a pointer to a module_info(9S) structure, which contains the module name in its mi_idname member. The following example puts this information together by using the FBT entry probe in putnext to track putnext(9F) calls by module name:

fbt::putnext:entry
{
	@calls[stringof(args[0]->q_qinfo->qi_minfo->mi_idname)] = count();
}

Running the above script results in output similar to the following example:

# dtrace -s ./putnext.d
^C

  iprb                                                              1
  rpcmod                                                            1
  pfmod                                                             1
  timod                                                             2
  vpnmod                                                            2
  pts                                                              40
  conskbd                                                          42
  kb8042                                                           42
  tl                                                               58
  arp                                                             108
  tcp                                                             126
  ptm                                                             249
  ip                                                              313
  ptem                                                            340
  vuid2ps2                                                        361
  ttcompat                                                        412
  ldterm                                                          413
  udp                                                             569
  strwhead                                                        624
  mouse8042                                                       726

You can also use FBT to determine the time spent in a particular function. The following example shows how to determine the callers of the DDI delaying routines drv_usecwait(9F) and delay(9F).

fbt::delay:entry,
fbt::drv_usecwait:entry
{
	self->in = timestamp;
}

fbt::delay:return,
fbt::drv_usecwait:return
/self->in/
{
	@snoozers[stack()] = quantize(timestamp - self->in);
	self->in = 0;
}

This example script is particularly interesting to run during boot. Chapter 36, Anonymous Tracing describes the procedure for performing anonymous tracing during system boot. Upon reboot, you might see output similar to the following example:

# dtrace -ae

              ata`ata_wait+0x34
              ata`ata_id_common+0xf5
              ata`ata_disk_id+0x20
              ata`ata_drive_type+0x9a
              ata`ata_init_drive+0xa2
              ata`ata_attach+0x50
              genunix`devi_attach+0x75
              genunix`attach_node+0xb2
              genunix`i_ndi_config_node+0x97
              genunix`i_ddi_attachchild+0x4b
              genunix`devi_attach_node+0x3d
              genunix`devi_config_one+0x1d0
              genunix`ndi_devi_config_one+0xb0
              devfs`dv_find+0x125
              devfs`devfs_lookup+0x40
              genunix`fop_lookup+0x21
              genunix`lookuppnvp+0x236
              genunix`lookuppnat+0xe7
              genunix`lookupnameat+0x87
              genunix`cstatat_getvp+0x134

           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@                    4105     
            8192 |@@@@                                     783      
           16384 |@@@@@@@@@@@@@@                           2793     
           32768 |                                         16       
           65536 |                                         0


              kb8042`kb8042_wait_poweron+0x29
              kb8042`kb8042_init+0x22
              kb8042`kb8042_attach+0xd6
              genunix`devi_attach+0x75
              genunix`attach_node+0xb2
              genunix`i_ndi_config_node+0x97
              genunix`i_ddi_attachchild+0x4b
              genunix`devi_attach_node+0x3d
              genunix`devi_config_one+0x1d0
              genunix`ndi_devi_config_one+0xb0
              genunix`resolve_pathname+0xa5
              genunix`ddi_pathname_to_dev_t+0x16
              consconfig_dacf`consconfig_load_drivers+0x14
              consconfig_dacf`dynamic_console_config+0x6c
              consconfig`consconfig+0x8
              unix`stubs_common_code+0x3b

           value  ------------- Distribution ------------- count    
          262144 |                                         0        
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      221      
         1048576 |@@@@                                     29       
         2097152 |                                         0        


              usba`hubd_enable_all_port_power+0xed
              usba`hubd_check_ports+0x8e
              usba`usba_hubdi_attach+0x275
              usba`usba_hubdi_bind_root_hub+0x168
              uhci`uhci_attach+0x191
              genunix`devi_attach+0x75
              genunix`attach_node+0xb2
              genunix`i_ndi_config_node+0x97
              genunix`i_ddi_attachchild+0x4b
              genunix`i_ddi_attach_node_hierarchy+0x49
              genunix`attach_driver_nodes+0x49
              genunix`ddi_hold_installed_driver+0xe3
              genunix`attach_drivers+0x28

           value  ------------- Distribution ------------- count    
        33554432 |                                         0        
        67108864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3        
       134217728 |                                         0

Tail-call Optimization

When one function ends by calling another function, the compiler can engage in tail-call optimization, in which the function being called reuses the caller's stack frame. This procedure is most commonly used in the SPARC architecture, where the compiler reuses the caller's register window in the function being called in order to minimize register window pressure.

The presence of this optimization causes the return probe of the calling function to fire before the entry probe of the called function. This ordering can lead to quite a bit of confusion. For example, if you wanted to record all functions called from a particular function and any functions that this function calls, you might use the following script:

fbt::foo:entry
{
	self->traceme = 1;
}

fbt:::entry
/self->traceme/
{
	printf("called %s", probefunc);
}

fbt::foo:return
/self->traceme/
{
	self->traceme = 0;
}

However, if foo() ends in an optimized tail-call, the tail-called function, and therefore any functions that it calls, will not be captured. The kernel cannot be dynamically deoptimized on the fly, and DTrace does not wish to engage in a lie about how code is structured. Therefore, you should be aware of when tail-call optimization might be used.

Tail-call optimization is likely to be used in source code similar to the following example:

	return (bar());

Or in source code similar to the following example:

	(void) bar();
	return;

Conversely, function source code that ends like the following example cannot have its call to bar() optimized, because the call to bar() is not a tail-call:

	bar();
	return (rval);

You can determine whether a call has been tail-call optimized using the following technique:

While running DTrace, trace arg0 of the return probe in question. arg0 contains the offset of the returning instruction in the function.
After DTrace has stopped, use mdb(1) to look at the function. If the traced offset contains a call to another function instead of an instruction to return from the function, the call has been tail-call optimized.

Due to the instruction set architecture, tail-call optimization is far more common on SPARC systems than on x86 systems. The following example uses mdb to discover tail-call optimization in the kernel's dup() function:

# dtrace -q -n fbt::dup:return'{printf("%s+0x%x", probefunc, arg0);}'

While this command is running, run a program that performs a dup(2), such as a bash process. The above command should provide output similar to the following example:

dup+0x10
^C

Now examine the function with mdb:

# echo "dup::dis" | mdb -k
dup:                            sra       %o0, 0, %o0
dup+4:                          mov       %o7, %g1
dup+8:                          clr       %o2
dup+0xc:                        clr       %o1
dup+0x10:                       call      -0x1278       <fcntl>
dup+0x14:                       mov       %g1, %o7

The output shows that dup+0x10 is a call to the fcntl() function and not a ret instruction. Therefore, the call to fcntl() is an example of tail-call optimization.

Assembly Functions

You might observe functions that seem to enter but never return or vice versa. Such rare functions are generally hand-coded assembly routines that branch to the middle of other hand-coded assembly functions. These functions should not impede analysis: the branched-to function must still return to the caller of the branched-from function. That is, if you enable all FBT probes, you should see the entry to one function and the return from another function at the same stack depth.

Instruction Set Limitations

Some functions cannot be instrumented by FBT. The exact nature of uninstrumentable functions is specific to the instruction set architecture.

x86 Limitations

Functions that do not create a stack frame on x86 systems cannot be instrumented by FBT. Because the register set for x86 is extraordinarily small, most functions must put data on the stack and therefore create a stack frame. However, some x86 functions do not create a stack frame and therefore cannot be instrumented. Actual numbers vary, but typically fewer than five percent of functions cannot be instrumented on the x86 platform.

SPARC Limitations

Leaf routines hand-coded in assembly language on SPARC systems cannot be instrumented by FBT. The majority of the kernel is written in C, and all functions written in C can be instrumented by FBT.

Breakpoint Interaction

FBT works by dynamically modifying kernel text. Because kernel breakpoints also work by modifying kernel text, if a kernel breakpoint is placed at an entry or return site before loading DTrace, FBT will refuse to provide a probe for the function, even if the kernel breakpoint is subsequently removed. If the kernel breakpoint is placed after loading DTrace, both the kernel breakpoint and the DTrace probe will correspond to the same point in text. In this situation, the breakpoint will trigger first, and then the probe will fire when the debugger resumes the kernel. It is recommended that kernel breakpoints not be used concurrently with DTrace. If breakpoints are required, use the DTrace breakpoint() action instead.

Module Loading

The Solaris kernel can dynamically load and unload kernel modules. When FBT is loaded and a module is dynamically loaded, FBT automatically provides new probes associated with the new module. If a loaded module has unenabled FBT probes, the module may be unloaded; the corresponding probes will be destroyed as the module is unloaded. If a loaded module has enabled FBT probes, the module is considered busy, and cannot be unloaded.

Stability

The FBT provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see Chapter 39, Stability.

Element	Name stability	Data stability	Dependency class
Provider	Evolving	Evolving	ISA
Module	Private	Private	Unknown
Function	Private	Private	Unknown
Name	Evolving	Evolving	ISA
Arguments	Private	Private	ISA

Because FBT exposes the kernel implementation, nothing about it is Stable, and the Module and Function name and data stability are explicitly Private. The data stability for Provider and Name is Evolving, but all other data stabilities are Private: they are artifacts of the current implementation. The dependency class for FBT is ISA: while FBT is available on all current instruction set architectures, there is no guarantee that FBT will be available on arbitrary future instruction set architectures.
