Tracing User Processes

This section focuses on the DTrace facilities that are useful for tracing user process activity and provides examples to illustrate their use.

Using the `copyin()` and `copyinstr()` Subroutines

DTrace probes execute in the Solaris kernel. Probes use the copyin() or copyinstr() subroutines to copy user process data into the kernel's address space.

Consider the following write() system call:

ssize_t write(int fd, const void *buf, size_t nbytes);

The following D program illustrates an incorrect attempt to print the contents of a string that is passed to the write system call:

syscall::write:entry
{
    printf("%s", stringof(arg1)); /* incorrect use of arg1 */
}

When you run this script, DTrace produces error messages similar to the following example.

dtrace: error on enabled probe ID 1 (ID 37: syscall::write:entry): \
    invalid address (0x10038a000) in action #1

The arg1 variable is an address that refers to memory in the process that is executing the system call. Use the copyinstr() subroutine to read the string at that address. Record the result with the printf() action:

syscall::write:entry
{
    printf("%s", copyinstr(arg1)); /* correct use of arg1 */

The output of this script shows all of the strings that are passed to the write system call.

Avoiding Errors

The copyin() and copyinstr() subroutines cannot read from user addresses which have not yet been touched. A valid address might cause an error if the page that contains that address has not been faulted in by an access attempt. Consider the following example:

# dtrace -n syscall::open:entry'{ trace(copyinstr(arg0)); }'
dtrace: description 'syscall::open:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
dtrace: error on enabled probe ID 2 (ID 50: syscall::open:entry): invalid address
(0x9af1b) in action #1 at DIF offset 52

In the output from the previous example, the application was functioning properly and the address in arg0 was valid. However, the address in arg0 referred to a page that the corresponding process had not accessed. To resolve this issue, wait for the kernel or application to use the data before tracing the data. For example, you might wait until the system call returns to apply copyinstr(), as shown in the following example:

# dtrace -n syscall::open:entry'{ self->file = arg0; }' \ -n syscall::open:return'{ trace(copyinstr(self->file)); self->file = 0; }'
dtrace: description 'syscall::open:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  2     51                      open:return   /dev/null

Eliminating `dtrace` Interference

If you trace every call to the write system call, you will cause a cascade of output. Each call to the write() function causes the dtrace command to call the write() function as it displays the output. This feedback loop is a good example of how the dtrace command can interfere with the desired data. You can use a simple predicate to avoid this behavior, as shown in the following example:

syscall::write:entry
/pid != $pid/
{
    printf("%s", stringof(copyin(arg1, arg2)));
}

The $pid macro variable expands to the process identifier of the process that enabled the probes. The pid variable contains the process identifier of the process whose thread was running on the CPU where the probe was fired. The predicate /pid != $pid/ ensures that the script does not trace any events related to the running of this script.

`syscall` Provider

The syscall provider enables you to trace every system call entry and return. You can use the prstat(1M) command to see examine process behavior.

$ prstat -m -p 31337
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
 13499 user1     53  44 0.0 0.0 0.0 0.0 2.5 0.0  4K  24  9K   0 mystery/6

This example shows that the process is consuming a large amount of system time. One possible explanation for this behavior is that the process is executing a large number of system calls. You can use a simple D program specified on the command line to see which system calls are happening most often:

# dtrace -n syscall:::entry'/pid == 31337/{ @syscalls[probefunc] = count(); }'
dtrace: description 'syscall:::entry' matched 215 probes
^C

  open                                                              1
  lwp_park                                                          2
  times                                                             4
  fcntl                                                             5
  close                                                             6
  sigaction                                                         6
  read                                                             10
  ioctl                                                            14
  sigprocmask                                                     106
  write                                                          1092

This report shows a large number of system calls to the write() function. You can use the syscall provider to further examine the source of all the write() system calls:

# dtrace -n syscall::write:entry'/pid == 31337/{ @writes[arg2] = quantize(); }'
dtrace: description 'syscall::write:entry' matched 1 probe
^C

           value  ------------- Distribution ------------- count    
               0 |                                         0        
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1037
               2 |@                                        3        
               4 |                                         0        
               8 |                                         0        
              16 |                                         0        
              32 |@                                        3        
              64 |                                         0        
             128 |                                         0        
             256 |                                         0        
             512 |                                         0        
            1024 |@                                        5        
            2048 |                                         0

The output shows that the process is executing many write() system calls with a relatively small amount of data.

The `ustack()` Action

The ustack() action traces the user thread's stack. If a process that opens many files occasionally fails in the open() system call, you can use the ustack() action to discover the code path that executes the failed open():

syscall::open:entry
/pid == $1/
{
    self->path = copyinstr(arg0);
}

syscall::open:return
/self->path != NULL && arg1 == -1/
{
    printf("open for '%s' failed", self->path);
    ustack();
}

This script also illustrates the use of the $1 macro variable. This macro variable takes the value of the first operand that is specified on the dtrace command line:

# dtrace -s ./badopen.d 31337
dtrace: script './badopen.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0     40                      open:return open for '/usr/lib/foo' failed
              libc.so.1`__open+0x4
              libc.so.1`open+0x6c
              420b0
              tcsh`dosource+0xe0
              tcsh`execute+0x978
              tcsh`execute+0xba0
              tcsh`process+0x50c
              tcsh`main+0x1d54
              tcsh`_start+0xdc

The ustack() action records program counter (PC) values for the stack. The dtrace command resolves those PC values to symbol names by looking though the process's symbol tables. The dtrace command prints out PC values that cannot be resolved as hexadecimal integers.

When a process exits or is killed before the ustack() data is formatted for output, the dtrace command might be unable to convert the PC values in the stack trace to symbol names. In that event the dtrace command displays these values as hexadecimal integers. To work around this limitation, specify a process of interest with the -c or -p option to dtrace. If the process ID or command is not known in advance, the following example D program that can be used to work around the limitation. The example uses the open system call probe, but this technique can be used with any script that uses the ustack action.

 syscall::open:entry
{
    ustack();
    stop_pids[pid] = 1;
}

syscall::rexit:entry
/stop_pids[pid] != 0/
{
    printf("stopping pid %d", pid);
    stop();
    stop_pids[pid] = 0;
}

The previous script stops a process just before the process exits, if the ustack() action has been applied to a thread in that process. This technique ensures that the dtrace command can resolve the PC values to symbolic names. The value of stop_pids[pid] is set to 0 after clearing the dynamic variable.

The `pid` Provider

The pid provider enables you to trace any instruction in a process. Unlike most other providers, pid probes are created on demand, based on the probe descriptions found in your D programs.

User Function Boundary Tracing

The simplest mode of operation for the pid provider is as the user space analogue to the fbt provider. The following example program traces all function entries and returns that are made from a single function. The $1 macro variable expands to the first operand on the command line. This macro variable is the process ID for the process to trace. The $2 macro variable expands to the second operand on the command line. This macro variable is the name of the function that all function calls are traced from.

Example 4-3 userfunc.d: Trace User Function Entry and Return

pid$1::$2:entry
{
    self->trace = 1;
}

pid$1::$2:return
/self->trace/
{
    self->trace = 0;
}

pid$1:::entry,
pid$1:::return
/self->trace/
{
}

This script produces output that is similar to the following example:

# ./userfunc.d 15032 execute
dtrace: script './userfunc.d' matched 11594 probes
  0  -> execute                               
  0    -> execute                             
  0      -> Dfix                              
  0      <- Dfix                              
  0      -> s_strsave                         
  0        -> malloc                          
  0        <- malloc                          
  0      <- s_strsave                         
  0      -> set                               
  0        -> malloc                          
  0        <- malloc                          
  0      <- set                               
  0      -> set1                              
  0        -> tglob                           
  0        <- tglob                           
  0      <- set1                              
  0      -> setq                              
  0        -> s_strcmp                        
  0        <- s_strcmp                        
...

The pid provider can only be used on processes that are already running. You can use the $target macro variable and the dtrace options -c and -p to create and instrument processes of interest using the dtrace facility. The following D script determines the distribution of function calls that are made to libc by a particular subject process:

pid$target:libc.so::entry
{
    @[probefunc] = count();
}

To determine the distribution of such calls made by the date(1) command, execute the following command:

# dtrace -s libc.d -c date
dtrace: script 'libc.d' matched 2476 probes
Fri Jul 30 14:08:54 PDT 2004
dtrace: pid 109196 has exited

  pthread_rwlock_unlock                                             1
  _fflush_u                                                         1
  rwlock_lock                                                       1
  rw_write_held                                                     1
  strftime                                                          1
  _close                                                            1
  _read                                                             1
  __open                                                            1
  _open                                                             1
  strstr                                                            1
  load_zoneinfo                                                     1

...
  _ti_bind_guard                                                   47
  _ti_bind_clear                                                   94

Tracing Arbitrary Instructions

You can use the pid provider to trace any instruction in any user function. Upon demand, the pid provider creates a probe for every instruction in a function. The name of each probe is the offset of its corresponding instruction in the function expressed as a hexadecimal integer. To enable a probe that is associated with the instruction at offset 0x1c in function foo of module bar.so in the process with PID 123, use the following command.

# dtrace -n pid123:bar.so:foo:1c

To enable all of the probes in the function foo, including the probe for each instruction, you can use the command:

# dtrace -n pid123:bar.so:foo:

The following example demonstrates how to combine the pid provider with speculative tracing to trace every instruction in a function.

Example 4-4 errorpath.d: Trace User Function Call Error Path

pid$1::$2:entry
{
    self->spec = speculation();
    speculate(self->spec);
    printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4);
}

pid$1::$2:
/self->spec/
{
    speculate(self->spec);
}

pid$1::$2:return
/self->spec && arg1 == 0/
{
    discard(self->spec);
    self->spec = 0;
}

pid$1::$2:return
/self->spec && arg1 != 0/
{
    commit(self->spec);
    self->spec = 0;
}

When errorpath.d executes, the output of the script is similar to the following example.

# ./errorpath.d 100461 _chdir
dtrace: script './errorpath.d' matched 19 probes
CPU     ID                    FUNCTION:NAME
  0  25253                     _chdir:entry 81e08 6d140 ffbfcb20 656c73 0
  0  25253                     _chdir:entry
  0  25269                         _chdir:0
  0  25270                         _chdir:4
  0  25271                         _chdir:8
  0  25272                         _chdir:c
  0  25273                        _chdir:10
  0  25274                        _chdir:14
  0  25275                        _chdir:18
  0  25276                        _chdir:1c
  0  25277                        _chdir:20
  0  25278                        _chdir:24
  0  25279                        _chdir:28
  0  25280                        _chdir:2c
  0  25268                    _chdir:return