DTrace User Guide

Chapter 4 Using DTrace

This chapter examines the use of DTrace for common basic tasks, and has information on several different types of tracing.

Performance Monitoring

Several DTrace providers implement probes that correspond to existing performance monitoring tools:

You can use the DTrace facility to extract the same information that the bundled tools provide, but with greater flexibility. The DTrace facility provides arbitrary kernel information that is available at the time that the probes fire. The DTrace facility enables you to receive information such as process identification, thread identification, and stack traces.

Examining Performance Problems With The sysinfo Provider

The sysinfo provider makes available probes that correspond to the sys kernel statistics. These statistics provide the input for system monitoring utilities such as mpstat. The sysinfo provider probes fire immediately before the sys named kstat increments. The probes that are provided by the sysinfo provider are in the following list.

bawrite

Probe that fires whenever a buffer is about to be asynchronously written out to a device.

bread

Probe that fires whenever a buffer is physically read from a device. bread fires after the buffer has been requested from the device, but before blocking pending its completion.

bwrite

Probe that fires whenever a buffer is about to be written out to a device, whether synchronously or asynchronously.

cpu_ticks_idle

Probe that fires when the periodic system clock has made the determination that a CPU is idle. Note that this probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed idle.

cpu_ticks_kernel

Probe that fires when the periodic system clock has made the determination that a CPU is executing in the kernel. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed to be executing in the kernel.

cpu_ticks_user

Probe that fires when the periodic system clock has made the determination that a CPU is executing in user mode. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed to be running in user-mode.

cpu_ticks_wait

Probe that fires when the periodic system clock has made the determination that a CPU is otherwise idle, but some threads are waiting for I/O on the CPU. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that has been deemed waiting on I/O.

idlethread

Probe that fires whenever a CPU enters the idle loop.

intrblk

Probe that fires whenever an interrupt thread blocks.

inv_swtch

Probe that fires whenever a running thread is forced to involuntarily give up the CPU.

lread

Probe that fires whenever a buffer is logically read from a device.

lwrite

Probe that fires whenever a buffer is logically written to a device

modload

Probe that fires whenever a kernel module is loaded.

modunload

Probe that fires whenever a kernel module is unloaded.

msg

Probe that fires whenever a msgsnd(2) or msgrcv(2) system call is made, but before the message queue operations have been performed.

mutex_adenters

Probe that fires whenever an attempt is made to acquire an owned adaptive lock. If this probe fires, one of the lockstat provider's adaptive-block or adaptive-spin probes also fires.

namei

Probe that fires whenever a name lookup is attempted in the filesystem.

nthreads

Probe that fires whenever a thread is created.

phread

Probe that fires whenever a raw I/O read is about to be performed.

phwrite

Probe that fires whenever a raw I/O write is about to be performed.

procovf

Probe that fires whenever a new process cannot be created because the system is out of process table entries.

pswitch

Probe that fires whenever a CPU switches from executing one thread to executing another.

readch

Probe that fires after each successful read, but before control is returned to the thread that is performing the read. A read can occur through the read(2), readv(2) or pread(2) system calls. arg0 contains the number of bytes that were successfully read.

rw_rdfails

Probe that fires whenever an attempt is made to read-lock a reader or writer when the lock is held by a writer or desired by a writer. If this probe fires, the lockstat provider's rw-block probe also fires.

rw_wrfails

Probe that fires whenever an attempt is made to write-lock a reader or writer lock when the lock is held by readers or by another writer. If this probe fires, the lockstat provider's rw-block probe also fires.

sema

Probe that fires whenever a semop(2) system call is made, but before any semaphore operations have been performed.

sysexec

Probe that fires whenever an exec(2) system call is made.

sysfork

Probe that fires whenever a fork(2) system call is made.

sysread

Probe that fires whenever a read, readv, or pread system call is made.

sysvfork

Probe that fires whenever a vfork(2) system call is made.

syswrite

Probe that fires whenever a write(2), writev(2), or pwrite(2) system call is made.

trap

Probe that fires whenever a processor trap occurs. Note that some processors, in particular UltraSPARC variants, handle some lightweight traps through a mechanism that does not cause this probe to fire.

ufsdirblk

Probe that fires whenever a directory block is read from the UFS file system. See ufs(7FS) for details on UFS.

ufsiget

Probe that fires whenever an inode is retrieved. See ufs(7FS) for details on UFS.

ufsinopage

Probe that fires after an in-core inode without any associated data pages has been made available for reuse. See ufs(7FS) for details on UFS.

ufsipage

Probe that fires after an in-core inode with associated data pages has been made available for reuse. This probe fires after the associated data pages have been flushed to disk. See ufs(7FS) for details on UFS.

wait_ticks_io

Probe that fires when the periodic system clock has made the determination that a CPU is otherwise idle but some threads are waiting for I/O on the CPU. This probe fires in the context of the system clock and therefore fires on the CPU running the system clock. The cpu_t argument (arg2) indicates the CPU that is described as waiting for I/O. No semantic difference between wait_ticks_io and cpu_ticks_wait; wait_ticks_io exists solely for historical reasons.

writech

Probe that fires after each successful write, but before control is returned to the thread performing the write. A write can occur through the write, writev, or pwrite system calls. arg0 contains the number of bytes that were successfully written.

xcalls

Probe that fires whenever a cross-call is about to be made. A cross-call is the operating system's mechanism for one CPU to request immediate work of another CPU.


Example 4–1 Using the quantize Aggregation Function With the sysinfo Probes

The quantize aggregation function displays a power-of-two frequency distribution bar graph of its argument. The following example uses the quantize function to determine the size of the read calls that are performed by all processes on the system over a period of ten seconds. The arg0 argument for the sysinfo probes states the amount by which to increment the statistic. This value is 1 for most sysinfo probes. Two exceptions are the readch and writech probes. For these probes, the arg0 argument is set to the actual number of bytes that are read or are written, respectively.


# cat -n read.d
   1  #!/usr/sbin/dtrace -s
   2  sysinfo:::readch
   3  {
   4     @[execname] = quantize(arg0);
   5  }
   6
   7  tick-10sec
   8  {
   9     exit(0);
  10  }

# dtrace -s read.d
dtrace: script 'read.d' matched 5 probes
CPU        ID                    FUNCTION:NAME
  0     36754                      :tick-10sec

  bash
         value  ---------- Distribution ---------- count
             0 |                                   0
             1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
             2 |                                   0

  file
         value  ---------- Distribution ---------- count
            -1 |                                   0
             0 |                                   2
             1 |                                   0
             2 |                                   0
             4 |                                   6
             8 |                                   0
            16 |                                   0
            32 |                                   6
            64 |                                   6
           128 |@@                                 16
           256 |@@@@                               30
           512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      199
          1024 |                                   0
          2048 |                                   0
          4096 |                                   1
          8192 |                                   1
         16384 |                                   0

  grep
         value  ---------- Distribution ---------- count
            -1 |                                   0
             0 |@@@@@@@@@@@@@@@@@@@                99
             1 |                                   0
             2 |                                   0
             4 |                                   0
             8 |                                   0
            16 |                                   0
            32 |                                   0
            64 |                                   0
           128 |                                   1
           256 |@@@@                               25
           512 |@@@@                               23
          1024 |@@@@                               24
          2048 |@@@@                               22
          4096 |                                   4
          8192 |                                   3
         16384 |                                   0


Example 4–2 Finding the Source of Cross-Calls

In this example, consider the following output form the mpstat(1M) command:


CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0 2189   0 1302    14    1  215   12   54   28    0 12995   13  14   0  73
  1 3385   0 1137   218  104  195   13   58   33    0 14486   19  15   0  66
  2 1918   0 1039    12    1  226   15   49   22    0 13251   13  12   0  75
  3 2430   0 1284   220  113  201   10   50   26    0 13926   10  15   0  75

The values in the xcal and syscl columns are atypically high, reflecting a possible drain on system performance. The system is relatively idle and is not spending an unusual amount of time waiting for I/O. The numbers in the xcal column are scaled per second and are read from the xcalls field of the sys kstat. To see which executables are responsible for the cross-calls, enter the following dtrace command:


# dtrace -n 'xcalls {@[execname] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
  find                                                   2
  cut                                                    2
  snmpd                                                  2
  mpstat                                                22
  sendmail                                             101
  grep                                                 123
  bash                                                 175
  dtrace                                               435
  sched                                                784
  xargs                                              22308
  file                                               89889
#

This output indicates that the bulk of the cross calls are originating from file(1) and xargs(1) processes. You can find these processes with the pgrep(1) and ptree(1) commands.


# pgrep xargs
15973
# ptree 15973
204   /usr/sbin/inetd -s
  5650  in.telnetd
    5653  -sh
      5657  bash
        15970 /bin/sh ./findtxt configuration
          15971 cut -f1 -d:
            15973 xargs file
              16686 file /usr/bin/tbl /usr/bin/troff /usr/bin/ul /usr/bin/vgrind /usr/bin/catman

This output indicates that the xargs and file commands form part of a custom user shell script. To locate this script, you can perform the following commands:


# find / -name findtxt
/usrs1/james/findtxt
# cat /usrs1/james/findtxt
#!/bin/sh
find / -type f | xargs file | grep text | cut -f1 -d: > /tmp/findtxt$$
cat /tmp/findtxt$$ | xargs grep $1
rm /tmp/findtxt$$
#

This script runs many process concurrently. A large amount of interprocess communication is happening through pipes. The number of pipes makes the script resource intensive. The script attempts to find every text file on the system and then searches each file for a specific text.


Tracing User Processes

This section focuses on the DTrace facilities that are useful for tracing user process activity and provides examples to illustrate their use.

Using the copyin() and copyinstr() Subroutines

DTrace probes execute in the Solaris kernel. Probes use the copyin() or copyinstr() subroutines to copy user process data into the kernel's address space.

Consider the following write() system call:

ssize_t write(int fd, const void *buf, size_t nbytes);

The following D program illustrates an incorrect attempt to print the contents of a string that is passed to the write system call:

syscall::write:entry
{
	printf("%s", stringof(arg1)); /* incorrect use of arg1 */
}

When you run this script, DTrace produces error messages similar to the following example.


dtrace: error on enabled probe ID 1 (ID 37: syscall::write:entry): \
    invalid address (0x10038a000) in action #1

The arg1 variable is an address that refers to memory in the process that is executing the system call. Use the copyinstr() subroutine to read the string at that address. Record the result with the printf() action:

syscall::write:entry
{
	printf("%s", copyinstr(arg1)); /* correct use of arg1 */

The output of this script shows all of the strings that are passed to the write system call.

Avoiding Errors

The copyin() and copyinstr() subroutines cannot read from user addresses which have not yet been touched. A valid address might cause an error if the page that contains that address has not been faulted in by an access attempt. Consider the following example:


# dtrace -n syscall::open:entry'{ trace(copyinstr(arg0)); }'
dtrace: description 'syscall::open:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
dtrace: error on enabled probe ID 2 (ID 50: syscall::open:entry): invalid address
(0x9af1b) in action #1 at DIF offset 52

In the output from the previous example, the application was functioning properly and the address in arg0 was valid. However, the address in arg0 referred to a page that the corresponding process had not accessed. To resolve this issue, wait for the kernel or application to use the data before tracing the data. For example, you might wait until the system call returns to apply copyinstr(), as shown in the following example:


# dtrace -n syscall::open:entry'{ self->file = arg0; }' \
-n syscall::open:return'{ trace(copyinstr(self->file)); self->file = 0; }'
dtrace: description 'syscall::open:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  2     51                      open:return   /dev/null                        

Eliminating dtrace Interference

If you trace every call to the write system call, you will cause a cascade of output. Each call to the write() function causes the dtrace command to call the write() function as it displays the output. This feedback loop is a good example of how the dtrace command can interfere with the desired data. You can use a simple predicate to avoid this behavior, as shown in the following example:

syscall::write:entry
/pid != $pid/
{
	printf("%s", stringof(copyin(arg1, arg2)));
}

The $pid macro variable expands to the process identifier of the process that enabled the probes. The pid variable contains the process identifier of the process whose thread was running on the CPU where the probe was fired. The predicate /pid != $pid/ ensures that the script does not trace any events related to the running of this script.

syscall Provider

The syscall provider enables you to trace every system call entry and return. You can use the prstat(1M) command to see examine process behavior.


$ prstat -m -p 31337
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
 13499 user1     53  44 0.0 0.0 0.0 0.0 2.5 0.0  4K  24  9K   0 mystery/6

This example shows that the process is consuming a large amount of system time. One possible explanation for this behavior is that the process is executing a large number of system calls. You can use a simple D program specified on the command line to see which system calls are happening most often:


# dtrace -n syscall:::entry'/pid == 31337/{ @syscalls[probefunc] = count(); }'
dtrace: description 'syscall:::entry' matched 215 probes
^C

  open                                                              1
  lwp_park                                                          2
  times                                                             4
  fcntl                                                             5
  close                                                             6
  sigaction                                                         6
  read                                                             10
  ioctl                                                            14
  sigprocmask                                                     106
  write                                                          1092

This report shows a large number of system calls to the write() function. You can use the syscall provider to further examine the source of all the write() system calls:


# dtrace -n syscall::write:entry'/pid == 31337/{ @writes[arg2] = quantize(); }'
dtrace: description 'syscall::write:entry' matched 1 probe
^C

           value  ------------- Distribution ------------- count    
               0 |                                         0        
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1037
               2 |@                                        3        
               4 |                                         0        
               8 |                                         0        
              16 |                                         0        
              32 |@                                        3        
              64 |                                         0        
             128 |                                         0        
             256 |                                         0        
             512 |                                         0        
            1024 |@                                        5        
            2048 |                                         0        

The output shows that the process is executing many write() system calls with a relatively small amount of data.

The ustack() Action

The ustack() action traces the user thread's stack. If a process that opens many files occasionally fails in the open() system call, you can use the ustack() action to discover the code path that executes the failed open():

syscall::open:entry
/pid == $1/
{
	self->path = copyinstr(arg0);
}

syscall::open:return
/self->path != NULL && arg1 == -1/
{
	printf("open for '%s' failed", self->path);
	ustack();
}

This script also illustrates the use of the $1 macro variable. This macro variable takes the value of the first operand that is specified on the dtrace command line:


# dtrace -s ./badopen.d 31337
dtrace: script './badopen.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0     40                      open:return open for '/usr/lib/foo' failed
              libc.so.1`__open+0x4
              libc.so.1`open+0x6c
              420b0
              tcsh`dosource+0xe0
              tcsh`execute+0x978
              tcsh`execute+0xba0
              tcsh`process+0x50c
              tcsh`main+0x1d54
              tcsh`_start+0xdc

The ustack() action records program counter (PC) values for the stack. The dtrace command resolves those PC values to symbol names by looking though the process's symbol tables. The dtrace command prints out PC values that cannot be resolved as hexadecimal integers.

When a process exits or is killed before the ustack() data is formatted for output, the dtrace command might be unable to convert the PC values in the stack trace to symbol names. In that event the dtrace command displays these values as hexadecimal integers. To work around this limitation, specify a process of interest with the -c or -p option to dtrace. If the process ID or command is not known in advance, the following example D program that can be used to work around the limitation. The example uses the open system call probe, but this technique can be used with any script that uses the ustack action.

 syscall::open:entry
{
	ustack();
	stop_pids[pid] = 1;
}

syscall::rexit:entry
/stop_pids[pid] != 0/
{
	printf("stopping pid %d", pid);
	stop();
	stop_pids[pid] = 0;
}

The previous script stops a process just before the process exits, if the ustack() action has been applied to a thread in that process. This technique ensures that the dtrace command can resolve the PC values to symbolic names. The value of stop_pids[pid] is set to 0 after clearing the dynamic variable.

The pid Provider

The pid provider enables you to trace any instruction in a process. Unlike most other providers, pid probes are created on demand, based on the probe descriptions found in your D programs.

User Function Boundary Tracing

The simplest mode of operation for the pid provider is as the user space analogue to the fbt provider. The following example program traces all function entries and returns that are made from a single function. The $1 macro variable expands to the first operand on the command line. This macro variable is the process ID for the process to trace. The $2 macro variable expands to the second operand on the command line. This macro variable is the name of the function that all function calls are traced from.


Example 4–3 userfunc.d: Trace User Function Entry and Return

pid$1::$2:entry
{
	self->trace = 1;
}

pid$1::$2:return
/self->trace/
{
	self->trace = 0;
}

pid$1:::entry,
pid$1:::return
/self->trace/
{
}

This script produces output that is similar to the following example:


# ./userfunc.d 15032 execute
dtrace: script './userfunc.d' matched 11594 probes
  0  -> execute                               
  0    -> execute                             
  0      -> Dfix                              
  0      <- Dfix                              
  0      -> s_strsave                         
  0        -> malloc                          
  0        <- malloc                          
  0      <- s_strsave                         
  0      -> set                               
  0        -> malloc                          
  0        <- malloc                          
  0      <- set                               
  0      -> set1                              
  0        -> tglob                           
  0        <- tglob                           
  0      <- set1                              
  0      -> setq                              
  0        -> s_strcmp                        
  0        <- s_strcmp                        
...

The pid provider can only be used on processes that are already running. You can use the $target macro variable and the dtrace options -c and -p to create and instrument processes of interest using the dtrace facility. The following D script determines the distribution of function calls that are made to libc by a particular subject process:

pid$target:libc.so::entry
{
	@[probefunc] = count();
}

To determine the distribution of such calls made by the date(1) command, execute the following command:


# dtrace -s libc.d -c date
dtrace: script 'libc.d' matched 2476 probes
Fri Jul 30 14:08:54 PDT 2004
dtrace: pid 109196 has exited

  pthread_rwlock_unlock                                             1
  _fflush_u                                                         1
  rwlock_lock                                                       1
  rw_write_held                                                     1
  strftime                                                          1
  _close                                                            1
  _read                                                             1
  __open                                                            1
  _open                                                             1
  strstr                                                            1
  load_zoneinfo                                                     1

...
  _ti_bind_guard                                                   47
  _ti_bind_clear                                                   94

Tracing Arbitrary Instructions

You can use the pid provider to trace any instruction in any user function. Upon demand, the pid provider creates a probe for every instruction in a function. The name of each probe is the offset of its corresponding instruction in the function expressed as a hexadecimal integer. To enable a probe that is associated with the instruction at offset 0x1c in function foo of module bar.so in the process with PID 123, use the following command.


# dtrace -n pid123:bar.so:foo:1c

To enable all of the probes in the function foo, including the probe for each instruction, you can use the command:


# dtrace -n pid123:bar.so:foo:

The following example demonstrates how to combine the pid provider with speculative tracing to trace every instruction in a function.


Example 4–4 errorpath.d: Trace User Function Call Error Path

pid$1::$2:entry
{
	self->spec = speculation();
	speculate(self->spec);
	printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4);
}

pid$1::$2:
/self->spec/
{
	speculate(self->spec);
}

pid$1::$2:return
/self->spec && arg1 == 0/
{
	discard(self->spec);
	self->spec = 0;
}

pid$1::$2:return
/self->spec && arg1 != 0/
{
	commit(self->spec);
	self->spec = 0;
}

When errorpath.d executes, the output of the script is similar to the following example.


# ./errorpath.d 100461 _chdir
dtrace: script './errorpath.d' matched 19 probes
CPU     ID                    FUNCTION:NAME
  0  25253                     _chdir:entry 81e08 6d140 ffbfcb20 656c73 0
  0  25253                     _chdir:entry
  0  25269                         _chdir:0
  0  25270                         _chdir:4
  0  25271                         _chdir:8
  0  25272                         _chdir:c
  0  25273                        _chdir:10
  0  25274                        _chdir:14
  0  25275                        _chdir:18
  0  25276                        _chdir:1c
  0  25277                        _chdir:20
  0  25278                        _chdir:24
  0  25279                        _chdir:28
  0  25280                        _chdir:2c
  0  25268                    _chdir:return

Anonymous Tracing

This section describes tracing that is not associated with any DTrace consumer. Anonymous tracing is used in situations when no DTrace consumer processes can run. Only the super user may create an anonymous enabling. Only one anonymous enabling can exist at any time.

Anonymous Enablings

To create an anonymous enabling, use the -A option with a dtrace command invocation that specifies the desired probes, predicates, actions and options. The dtrace command adds a series of driver properties that represent your request to the configuration file for the dtrace(7D) driver. The configuration file is typically /kernel/drv/dtrace.conf. The dtrace driver reads these properties when the driver is loaded. The driver enables the specified probes with the specified actions and creates an anonymous state to associate with the new enabling. The dtrace driver is normally loaded on demand, along with any drivers that act as dtrace providers. To allow tracing during boot, the dtrace driver must be loaded as early as possible. The dtrace command adds the necessary forceload statements to /etc/system (see system(4) for each required dtrace provider and for the dtrace driver.

When the system boots, the dtrace driver sends a message indicating that the configuration file has been successfully processed. An anonymous enabling can set any of the options that are available during normal use of the dtrace command.

To remove an anonymous enabling, specify the -A option to the dtrace command without any probe descriptions.

Claiming Anonymous State

When the machine has completely booted, you can claim an existing anonymous state by specifying the -a option with the dtrace command. By default, the -a option claims the anonymous state and processes the existing data, then continues to run. To consume the anonymous state and exit, add the -e option.

When the anonymous state has been consumed from the kernel, the anonymous state cannot be replaced. If you attempt to claim an anonymous tracing state that does not exist, the dtrace command generates a message that is similar to the following example:


dtrace: could not enable tracing: No anonymous tracing state

If drops or errors occur, the dtrace command generates the appropriate messages when the anonymous state is claimed. The messages for drops and errors are the same for both anonymous and non-anonymous state.

Anonymous Tracing Examples

The following example shows an anonymous DTrace enabling for every probe in the iprb(7D) module:


# dtrace -A -m iprb
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# reboot

After rebooting, the dtrace driver prints a message on the console to indicate that the driver is enabling the specified probes:


  ...
  Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  NOTICE: enabling probe 0 (:iprb::)
  NOTICE: enabling probe 1 (dtrace:::ERROR)
  configuring IPv4 interfaces: iprb0.
  ...

After rebooting the machine, specifying the -a option with the dtrace command consumes the anonymous state:


# dtrace -a
  CPU     ID                    FUNCTION:NAME
    0  22954                      _init:entry 
    0  22955                     _init:return 
    0  22800                  iprbprobe:entry 
    0  22934          iprb_get_dev_type:entry 
    0  22935         iprb_get_dev_type:return 
    0  22801                 iprbprobe:return 
    0  22802                 iprbattach:entry 
    0  22874               iprb_getprop:entry 
    0  22875              iprb_getprop:return 
    0  22934          iprb_get_dev_type:entry 
    0  22935         iprb_get_dev_type:return 
    0  22870             iprb_self_test:entry 
    0  22871            iprb_self_test:return 
    0  22958            iprb_hard_reset:entry 
    0  22959           iprb_hard_reset:return 
    0  22862       iprb_get_eeprom_size:entry 
    0  22826              iprb_shiftout:entry 
    0  22828            iprb_raiseclock:entry 
    0  22829           iprb_raiseclock:return 
  ...

The following example focuses only on functions that are called from iprbattach().

fbt::iprbattach:entry
{
	self->trace = 1;
}

fbt:::
/self->trace/
{}

fbt::iprbattach:return
{
	self->trace = 0;
}

Run the following commands to clear the previous settings from the driver configuration file, install the new anonymous tracing request, and reboot:


# dtrace -AFs iprb.d
dtrace: cleaned up old anonymous enabling in /kernel/drv/dtrace.conf
dtrace: cleaned up forceload directives in /etc/system
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# reboot

After rebooting, the dtrace driver prints a different message on the console to indicate the slightly different enabling:


  ...
  Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  NOTICE: enabling probe 0 (fbt::iprbattach:entry)
  NOTICE: enabling probe 1 (fbt:::)
  NOTICE: enabling probe 2 (fbt::iprbattach:return)
  NOTICE: enabling probe 3 (dtrace:::ERROR)
  configuring IPv4 interfaces: iprb0.
  ...

After the machine has finished booting, run the dtrace command with the -a and the -e options to consume the anonymous data and then exit.


# dtrace -ae
  CPU FUNCTION                                 
    0  -> iprbattach                            
    0    -> gld_mac_alloc                       
    0      -> kmem_zalloc                       
    0        -> kmem_cache_alloc                
    0          -> kmem_cache_alloc_debug        
    0            -> verify_and_copy_pattern     
    0            <- verify_and_copy_pattern     
    0            -> tsc_gethrtime               
    0            <- tsc_gethrtime               
    0            -> getpcstack                  
    0            <- getpcstack                  
    0            -> kmem_log_enter              
    0            <- kmem_log_enter              
    0          <- kmem_cache_alloc_debug        
    0        <- kmem_cache_alloc                
    0      <- kmem_zalloc                       
    0    <- gld_mac_alloc                       
    0    -> kmem_zalloc                         
    0      -> kmem_alloc                        
    0        -> vmem_alloc                      
    0          -> highbit                       
    0          <- highbit                       
    0          -> lowbit                        
    0          <- lowbit                        
    0          -> vmem_xalloc                   
    0            -> highbit                     
    0            <- highbit                     
    0            -> lowbit                      
    0            <- lowbit                      
    0            -> segkmem_alloc               
    0              -> segkmem_xalloc            
    0                -> vmem_alloc              
    0                  -> highbit               
    0                  <- highbit               
    0                  -> lowbit                
    0                  <- lowbit                
    0                  -> vmem_seg_alloc        
    0                    -> highbit             
    0                    <- highbit             
    0                    -> highbit             
    0                    <- highbit             
    0                    -> vmem_seg_create     
  ...

Speculative Tracing

This section discusses the DTrace facility for speculative tracing. Speculative tracing is the ability to tentatively trace data and decide whether to commit the data to a tracing buffer or discard it. The primary mechanism to filter out uninteresting events is the predicate mechanism. Predicates are useful when you know at the time that a probe fires whether or not the probe event is of interest. Predicates are not well suited to dealing with situations where you do not know if a given probe event is of interest or not until after the probe fires.

If a system call is occasionally failing with a common error code, you might want to examine the code path that leads to the error condition. You can use the speculative tracing facility to tentatively trace data at one or more probe locations, then decide to commit the data to the principal buffer at another probe location. The resulting trace data contains only the output of interest and requires no postprocessing.

Speculation Interfaces

The following table describes the DTrace speculation functions.

Table 4–1 DTrace Speculation Functions

Function Name 

Arguments 

Description 

speculation

None 

Returns an identifier for a new speculative buffer 

speculate

ID 

Denotes that the remainder of the clause should be traced to the speculative buffer specified by ID 

commit

ID 

Commits the speculative buffer that is associated with ID 

discard

ID 

Discards the speculative buffer associated with ID 

Creating a Speculation

The speculation() function allocates a speculative buffer and returns a speculation identifier. Use the speculation identifier in subsequent calls to the speculate() function. A speculation identifier of zero is always invalid, but can be passed to speculate(), commit() or discard(). If a call to speculation() fails, the dtrace command generates a message that is similar to the following example.


dtrace: 2 failed speculations (no speculative buffer space available)

Using a Speculation

To use a speculation, use a clause to pass an identifier that has been returned from speculation() to the speculate() function before any data-recording actions. All data-recording actions in a clause that contains a speculate() are speculatively traced. The D compiler generates a compile-time error if a call to speculate() follows data recording actions in a D probe clause. Clauses can contain either speculative tracing requests or non-speculative tracing requests, but not both.

Aggregating actions, destructive actions, and the exit action may never be speculative. Any attempt to take one of these actions in a clause that contains a speculate() results in a compile-time error. A speculate() function may not follow a previous speculate() function. Only one speculation is permitted per clause. A clause that contains only a speculate() function will speculatively trace the default action, which is defined to trace only the enabled probe ID.

The typical use of the speculation() function is to assign the result of the speculation() function to a thread-local variable. That thread-local variable acts as a subsequent predicate to other probes, as well as an argument to speculate().


Example 4–5 Typical Use of The speculation() Function

syscall::open:entry
{
	self->spec = speculation();
}

syscall:::
/self->spec/
{
	speculate(self->spec);
	printf("this is speculative");
}

Committing a Speculation

Commit speculations by using the commit() function. When you commit a speculative buffer the buffer's data is copied into the principal buffer. If the data in the speculative buffer exceeds the available space in the principal buffer, no data is copied and the drop count for the buffer increments. If the buffer has been speculatively traced on more than one CPU, the speculative data on the committing CPU is copied immediately, while speculative data on other CPUs is copied after the commit().

A speculative buffer that is being committed is not available to subsequent speculation() calls until each per-CPU speculative buffer is completely copied into its corresponding per-CPU principal buffer. Subsequent attempts to write the results of a speculate() function call to the committing buffer discard the data without generating an error. Subsequent calls to commit() or discard() also fail without generating an error. A clause that contains a commit() function cannot contain a data recording action, but a clause can contain multiple commit() calls to commit disjoint buffers.

Discarding a Speculation

Discard speculations by using the discard() function. If the speculation has only been active on the CPU that is calling the discard() function, the buffer is immediately available for subsequent calls to the speculation() function. If the speculation has been active on more than one CPU, the discarded buffer will be available for subsequent calls to the speculation() function after the call to discard(). If no speculative buffers are available at the time that the speculation() function is called adtrace message that is similar to the following example is generated:


dtrace: 905 failed speculations (available buffer(s) still busy)

Speculation Example

One potential use for speculations is to highlight a particular code path. The following example shows the entire code path under the open(2) system call when the open() fails.


Example 4–6 specopen.d: Code Flow for Failed open()

#!/usr/sbin/dtrace -Fs

syscall::open:entry,
syscall::open64:entry
{
	/*
	 * The call to speculation() creates a new speculation.  If this fails,
	 * dtrace(1M) will generate an error message indicating the reason for
	 * the failed speculation(), but subsequent speculative tracing will be
	 * silently discarded.
	 */
	self->spec = speculation();
	speculate(self->spec);

	/*
	 * Because this printf() follows the speculate(), it is being 
	 * speculatively traced; it will only appear in the data buffer if the
	 * speculation is subsequently commited.
	 */
	printf("%s", stringof(copyinstr(arg0)));
}

fbt:::
/self->spec/
{
	/*
	 * A speculate() with no other actions speculates the default action:
	 * tracing the EPID.
	 */
	speculate(self->spec);
}

syscall::open:return,
syscall::open64:return
/self->spec/
{
	/*
	 * To balance the output with the -F option, we want to be sure that
	 * every entry has a matching return.  Because we speculated the
	 * open entry above, we want to also speculate the open return.
	 * This is also a convenient time to trace the errno value.
	 */
	speculate(self->spec);
	trace(errno);
}

syscall::open:return,
syscall::open64:return
/self->spec && errno != 0/
{
	/*
	 * If errno is non-zero, we want to commit the speculation.
	 */
	commit(self->spec);
	self->spec = 0;
}

syscall::open:return,
syscall::open64:return
/self->spec && errno == 0/
{
	/*
	 * If errno is not set, we discard the speculation.
	 */
	discard(self->spec);
	self->spec = 0;
}

When you run the previous script, the script generates output that is similar to the following example.


# ./specopen.d
dtrace: script './specopen.d' matched 24282 probes
CPU FUNCTION                                 
  1  => open                                  /var/ld/ld.config
  1    -> open                                
  1      -> copen                             
  1        -> falloc                          
  1          -> ufalloc                       
  1            -> fd_find                     
  1              -> mutex_owned               
  1              <- mutex_owned               
  1            <- fd_find                     
  1            -> fd_reserve                  
  1              -> mutex_owned               
  1              <- mutex_owned               
  1              -> mutex_owned               
  1              <- mutex_owned               
  1            <- fd_reserve                  
  1          <- ufalloc                       
  1          -> kmem_cache_alloc              
  1            -> kmem_cache_alloc_debug      
  1              -> verify_and_copy_pattern   
  1              <- verify_and_copy_pattern   
  1              -> file_cache_constructor    
  1                -> mutex_init              
  1                <- mutex_init              
  1              <- file_cache_constructor    
  1              -> tsc_gethrtime             
  1              <- tsc_gethrtime             
  1              -> getpcstack                
  1              <- getpcstack                
  1              -> kmem_log_enter            
  1              <- kmem_log_enter            
  1            <- kmem_cache_alloc_debug      
  1          <- kmem_cache_alloc              
  1          -> crhold                        
  1          <- crhold                        
  1        <- falloc                          
  1        -> vn_openat                       
  1          -> lookupnameat                  
  1            -> copyinstr                   
  1            <- copyinstr                   
  1            -> lookuppnat                  
  1              -> lookuppnvp                
  1                -> pn_fixslash             
  1                <- pn_fixslash             
  1                -> pn_getcomponent         
  1                <- pn_getcomponent         
  1                -> ufs_lookup              
  1                  -> dnlc_lookup           
  1                    -> bcmp                
  1                    <- bcmp                
  1                  <- dnlc_lookup           
  1                  -> ufs_iaccess           
  1                    -> crgetuid            
  1                    <- crgetuid            
  1                    -> groupmember         
  1                      -> supgroupmember    
  1                      <- supgroupmember
  1                    <- groupmember         
  1                  <- ufs_iaccess           
  1                <- ufs_lookup              
  1                -> vn_rele                 
  1                <- vn_rele                 
  1                -> pn_getcomponent         
  1                <- pn_getcomponent         
  1                -> ufs_lookup              
  1                  -> dnlc_lookup           
  1                    -> bcmp                
  1                    <- bcmp                
  1                  <- dnlc_lookup           
  1                  -> ufs_iaccess           
  1                    -> crgetuid            
  1                    <- crgetuid            
  1                  <- ufs_iaccess           
  1                <- ufs_lookup              
  1                -> vn_rele                 
  1                <- vn_rele                 
  1                -> pn_getcomponent         
  1                <- pn_getcomponent         
  1                -> ufs_lookup              
  1                  -> dnlc_lookup           
  1                    -> bcmp                
  1                    <- bcmp                
  1                  <- dnlc_lookup           
  1                  -> ufs_iaccess           
  1                    -> crgetuid            
  1                    <- crgetuid            
  1                  <- ufs_iaccess           
  1                  -> vn_rele               
  1                  <- vn_rele               
  1                <- ufs_lookup              
  1                -> vn_rele                 
  1                <- vn_rele                 
  1              <- lookuppnvp                
  1            <- lookuppnat                  
  1          <- lookupnameat                  
  1        <- vn_openat                       
  1        -> setf                            
  1          -> fd_reserve                    
  1            -> mutex_owned                 
  1            <- mutex_owned                 
  1            -> mutex_owned                 
  1            <- mutex_owned                 
  1          <- fd_reserve                    
  1          -> cv_broadcast                  
  1          <- cv_broadcast                  
  1        <- setf                            
  1        -> unfalloc                        
  1          -> mutex_owned                   
  1          <- mutex_owned                   
  1          -> crfree                        
  1          <- crfree                        
  1          -> kmem_cache_free               
  1            -> kmem_cache_free_debug       
  1              -> kmem_log_enter            
  1              <- kmem_log_enter            
  1              -> tsc_gethrtime             
  1              <- tsc_gethrtime             
  1              -> getpcstack                
  1              <- getpcstack                
  1              -> kmem_log_enter            
  1              <- kmem_log_enter
  1              -> file_cache_destructor     
  1                -> mutex_destroy           
  1                <- mutex_destroy           
  1              <- file_cache_destructor     
  1              -> copy_pattern              
  1              <- copy_pattern              
  1            <- kmem_cache_free_debug       
  1          <- kmem_cache_free               
  1        <- unfalloc                        
  1        -> set_errno                       
  1        <- set_errno                       
  1      <- copen                             
  1    <- open                                
  1  <= open                                          2