About Aggregations
DTrace stores the results of aggregating functions in objects called aggregations. The aggregation results are indexed using a tuple of expressions similar to those used for associative arrays. In D, the syntax for an aggregation is:
@name[ keys ] = aggfunc ( args );
where name is the name of the aggregation, keys is a comma-separated list of D expressions, aggfunc is one of the DTrace aggregating functions, and args is a comma-separated list of arguments appropriate for the aggregating function. The aggregation name is a D identifier that is prefixed with the special character @. All aggregations named in your D programs are global variables; there are no thread- or clause-local aggregations. The aggregation names are kept in a separate identifier namespace from other D global variables. Remember that a and @a are not the same variable if you reuse names. The special aggregation name @ can be used to name an anonymous aggregation in simple D programs. The D compiler treats this name as an alias for the aggregation name @_.
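As an illustrative sketch (the probe and variable choices here are arbitrary, not from the original text), the following clause uses a global variable a, an aggregation @a, and the anonymous aggregation @ side by side; all three are distinct variables:

```d
syscall::write:entry
{
	a++;                     /* D global variable named a */
	@a[execname] = count();  /* aggregation named @a -- a separate namespace */
	@ = count();             /* anonymous aggregation; alias for @_ */
}
```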
The DTrace aggregating functions are shown in the following table. Most aggregating functions take just a single argument that represents the new data.
Table 3-1 DTrace Aggregating Functions
Function | Arguments | Result |
---|---|---|
count | none | The number of times called. |
sum | scalar expression | The total value of the specified expressions. |
avg | scalar expression | The arithmetic average of the specified expressions. |
min | scalar expression | The smallest value among the specified expressions. |
max | scalar expression | The largest value among the specified expressions. |
stddev | scalar expression | The standard deviation of the specified expressions. |
lquantize | scalar expression, lower bound, upper bound, step value | A linear frequency distribution, sized by the specified range, of the values of the specified expressions. Increments the value in the highest bucket that is less than the specified expression. |
quantize | scalar expression | A power-of-two frequency distribution of the values of the specified expressions. Increments the value in the highest power-of-two bucket that is less than the specified expression. |
llquantize | scalar expression, factor (base of the logarithm), low exponent, high exponent, number of steps per order of magnitude | A frequency distribution table of the values specified in the expression. Increments the value in the highest bucket that is less than the specified expression. The values are grouped logarithmically at first and then grouped linearly. |
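Several aggregating functions can be applied to the same data in a single clause. The following sketch (not from the original text; the aggregation names are arbitrary) summarizes successful read system calls, whose return value arg0 carries the number of bytes read:

```d
syscall::read:return
/arg0 >= 0/
{
	@calls = count();       /* number of successful reads */
	@bytes = sum(arg0);     /* total bytes read */
	@mean = avg(arg0);      /* average read size */
	@smallest = min(arg0);  /* smallest read */
	@largest = max(arg0);   /* largest read */
}
```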
For example, to count the number of write system calls in the system, you can use an informative string as a key and the count aggregating function:
syscall::write:entry { @counts["write system calls"] = count(); }
The dtrace command prints aggregation results by default when the process terminates, either as the result of an explicit END action or when the user presses Control-C. The following example output shows the result of running this command, waiting for a few seconds, and pressing Control-C.
# dtrace -s writes.d
dtrace: script './writes.d' matched 1 probe
^C
write system calls 179
#
You can count system calls per process name by using the execname variable as the key to an aggregation:
syscall::write:entry { @counts[execname] = count(); }
The following example output shows the result of running this command, waiting for a few seconds, and pressing Control-C:
# dtrace -s writesbycmd.d
dtrace: script './writesbycmd.d' matched 1 probe
^C
dtrace 1
cat 4
sed 9
head 9
grep 14
find 15
tail 25
mountd 28
expr 72
sh 291
tee 814
def.dir.flp 1996
make.bin 2010
#
Alternatively, you might want to further examine writes organized by both executable name and file descriptor. The file descriptor is the first argument to write, so the following example uses a key consisting of both execname and arg0:
syscall::write:entry { @counts[execname, arg0] = count(); }
Running this command results in a table with both executable name and file descriptor, as shown in the following example:
# dtrace -s writesbycmdfd.d
dtrace: script './writesbycmdfd.d' matched 1 probe
^C
cat 1 58
sed 1 60
grep 1 89
tee 1 156
tee 3 156
make.bin 5 164
acomp 1 263
macrogen 4 286
cg 1 397
acomp 3 736
make.bin 1 880
iropt 4 1731
#
The following example displays the average time spent in the write system call, organized by process name. This example uses the avg aggregating function, specifying the expression to average as the argument. The example averages the wall clock time spent in the system call:
syscall::write:entry
{
        self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
        @time[execname] = avg(timestamp - self->ts);
        self->ts = 0;
}
The following example output shows the result of running this command, waiting for a few seconds, and pressing Control-C:
# dtrace -s writetime.d
dtrace: script './writetime.d' matched 2 probes
^C
iropt 31315
acomp 37037
make.bin 63736
tee 68702
date 84020
sh 91632
dtrace 159200
ctfmerge 321560
install 343300
mcs 394400
get 413695
ctfconvert 594400
bringover 1332465
tail 1335260
#
The average can be useful, but often does not provide sufficient detail to understand the distribution of data points. To understand the distribution in further detail, use the quantize aggregating function as shown in the following example:
syscall::write:entry
{
        self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
        @time[execname] = quantize(timestamp - self->ts);
        self->ts = 0;
}
Because each line of output becomes a frequency distribution diagram, the output of this script is substantially longer than previous ones. The following example shows a selection of sample output:
  lint
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |                                         2
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@                      74
          131072 |@@@@@@@@@@@@@@@                          59
          262144 |@@@                                      14
          524288 |                                         0

  acomp
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@                             840
           16384 |@@@@@@@@@@@                              750
           32768 |@@                                       165
           65536 |@@@@@@                                   460
          131072 |@@@@@@                                   446
          262144 |                                         16
          524288 |                                         0
         1048576 |                                         1
         2097152 |                                         0

  iropt
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@                  4149
           16384 |@@@@@@@@@@                               1798
           32768 |@                                        332
           65536 |@                                        325
          131072 |@@                                       431
          262144 |                                         3
          524288 |                                         2
         1048576 |                                         1
         2097152 |                                         0
Notice that the rows for the frequency distribution are always power-of-two values. Each row indicates the number of elements greater than or equal to the corresponding value, but less than the next larger row value. For example, the preceding output shows that iropt had 4,149 writes taking between 8,192 nanoseconds and 16,383 nanoseconds, inclusive.
While quantize is useful for getting quick insight into the data, you might want to examine a distribution across linear values instead. To display a linear value distribution, use the lquantize aggregating function. The lquantize() function takes three arguments in addition to a D expression: a lower bound, an upper bound, and a step. For example, if you wanted to look at the distribution of writes by file descriptor, a power-of-two quantization would not be effective. Instead, use a linear quantization with a small range, as shown in the following example:
syscall::write:entry { @fds[execname] = lquantize(arg0, 0, 100, 1); }
Running this script for several seconds yields a large amount of information. The following example shows a selection of typical output:
  mountd
           value  ------------- Distribution ------------- count
              11 |                                         0
              12 |@                                        4
              13 |                                         0
              14 |@@@@@@@@@@@@@@@@@@@@@@@@@                70
              15 |                                         0
              16 |@@@@@@@@@@@@                             34
              17 |                                         0

  xemacs-20.4
           value  ------------- Distribution ------------- count
               6 |                                         0
               7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  521
               8 |                                         0
               9 |                                         1
              10 |                                         0

  make.bin
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  3596
               2 |                                         0
               3 |                                         0
               4 |                                         42
               5 |                                         50
               6 |                                         0

  acomp
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@                                    1156
               2 |                                         0
               3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         6635
               4 |@                                        297
               5 |                                         0

  iropt
           value  ------------- Distribution ------------- count
               2 |                                         0
               3 |                                         299
               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  20144
               5 |                                         0
You can also use the lquantize aggregating function to aggregate on time since some point in the past. This technique allows you to observe a change in behavior over time. The following example displays the change in system call behavior over the lifetime of a process executing the date command:
syscall::exec:return,
syscall::exece:return
/execname == "date"/
{
        self->start = vtimestamp;
}

syscall:::entry
/self->start/
{
        /*
         * You linearly quantize on the current virtual time minus the
         * process's start time.  Divide by 1000 to yield microseconds
         * rather than nanoseconds.  The range runs from 0 to 10
         * milliseconds in steps of 100 microseconds; you can expect that
         * no date process will take longer than 10 milliseconds to
         * complete.
         */
        @a["system calls over time"] =
            lquantize((vtimestamp - self->start) / 1000, 0, 10000, 100);
}

syscall::rexit:entry
/self->start/
{
        self->start = 0;
}
The preceding script provides greater insight into system call behavior when many date processes are executed. To see this result, run sh -c 'while true; do date >/dev/null; done' in one window, while executing the D script in another. The script produces a profile of the system call behavior of the date command:
# dtrace -s dateprof.d
dtrace: script './dateprof.d' matched 218 probes
^C
system calls over time
value ------------- Distribution ------------- count
< 0 | 0
0 |@@ 20530
100 |@@@@@@ 48814
200 |@@@ 28119
300 |@ 14646
400 |@@@@@ 41237
500 | 1259
600 | 218
700 | 116
800 |@ 12783
900 |@@@ 28133
1000 | 7897
1100 |@ 14065
1200 |@@@ 27549
1300 |@@@ 25715
1400 |@@@@ 35011
1500 |@@ 16734
1600 | 498
1700 | 256
1800 | 369
1900 | 404
2000 | 320
2100 | 555
2200 | 54
2300 | 17
2400 | 5
2500 | 1
2600 | 7
2700 | 0
This output provides a rough idea of the different phases of the date command with respect to the services required of the kernel. To better understand these phases, you might want to understand which system calls are being called when. If so, you could change the D script to aggregate on the variable probefunc instead of a constant string.
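That change amounts to replacing the constant string key in the middle clause of the earlier script with probefunc. A sketch of the modified clause (the rest of the script stays the same):

```d
syscall:::entry
/self->start/
{
        @a[probefunc] =
            lquantize((vtimestamp - self->start) / 1000, 0, 10000, 100);
}
```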
You can use the llquantize aggregating function to group data both logarithmically and linearly. The arguments to the llquantize() function, in order, are: the expression, the base, the low exponent, the high exponent, and the number of steps per order of magnitude. The following example displays system call latencies in the microsecond range. Note that timestamp is in nanoseconds and 1000 nanoseconds = 1 microsecond.
syscall:::entry
{
        self->ts = timestamp;
}

syscall:::return
/self->ts/
{
        /*
         * The expression "timestamp - self->ts" gives the system call
         * latency in nanoseconds.
         */
        @ = llquantize(timestamp - self->ts, 10, 3, 5, 10);
        self->ts = 0;
}
In this example, the expression @ = llquantize(timestamp - self->ts, 10, 3, 5, 10); generates a frequency distribution of the observed system call latencies. The distribution runs from 10^3 nanoseconds through 10^5 nanoseconds (inclusive). For each order of magnitude, the data is displayed linearly in groups of 10. This example generates output similar to the following:
           value  ------------- Distribution ------------- count
          < 1000 |@@@@@@                                   12899
            1000 |@@@@@@@@@@@@                             26357
            2000 |@                                        3202
            3000 |@                                        1869
            4000 |@                                        2110
            5000 |@@                                       4716
            6000 |@@                                       3998
            7000 |@                                        1617
            8000 |@@                                       4924
            9000 |@                                        2515
           10000 |@@@@@@@                                  15307
           20000 |@                                        2240
           30000 |@                                        1327
           40000 |@                                        1369
           50000 |                                         990
           60000 |                                         1057
           70000 |                                         631
           80000 |                                         453
           90000 |                                         434
          100000 |@                                        1570
          200000 |                                         228
          300000 |                                         45
          400000 |                                         59
          500000 |                                         60
          600000 |                                         52
          700000 |                                         30
          800000 |                                         22
          900000 |                                         17
      >= 1000000 |                                         513
You can use the stddev aggregating function to characterize the distribution of data points. This example shows the average and standard deviation of the time it takes to execute processes:
syscall::exece:entry
{
        self->ts = timestamp;
}

syscall::exece:return
/self->ts/
{
        t = timestamp - self->ts;
        @execavg[probefunc] = avg(t);
        @execsd[probefunc] = stddev(t);
        self->ts = 0;
}

END
{
        printf("AVERAGE:");
        printa(@execavg);
        printf("\nSTDDEV:");
        printa(@execsd);
}
The sample output is as follows:
# dtrace -s ./stddev.d
dtrace: script './stddev.d' matched 3 probes
^C
CPU ID FUNCTION:NAME
0 2 :END AVERAGE:
exece 7053786
STDDEV:
exece 9470351
For aggregations using count() and sum(), the result may be displayed as a histogram by using the agghist runtime option. The following example shows how agghist might be used when counting the number of system calls invoked by each executable:
syscall:::entry { @[execname] = count(); }
# dtrace -s syscall.d -x agghist
dtrace: description 'syscall:::entry' matched 221 probes
^C
key -------------- Distribution ------------- count
fmd | 1
inetd | 1
rad | 1
cat | 47
gdm-simple-greet | 107
mail | 119
httpd |@ 174
sstored |@ 288
quota |@ 316
svc.configd |@ 351
Xorg |@ 392
bash |@@ 550
nscd |@@@@ 1299
sshd |@@@@@@@@ 2794
dtrace |@@@@@@@@@@@@@@@@@@@@@ 7218
By default, aggregation histograms are displayed such that the full width available for a bar represents 100% of the measured variable. The aggzoom option scales the display such that the result with the highest frequency count consumes the full width available. Relative proportions are preserved, but smaller bars may become more visible. The following example shows the result of adding aggzoom:
# dtrace -s syscall.d -x agghist -x aggzoom
dtrace: description 'syscall:::entry' matched 221 probes
^C
key -------------- Distribution ------------- count
fmd | 1
inetd | 1
rad | 1
cat | 47
VBoxService | 74
gdm-simple-greet |@ 110
mail |@ 119
httpd |@ 207
sstored |@@ 310
quota |@@ 316
svc.configd |@@ 351
Xorg |@@ 425
bash |@@@ 692
nscd |@@@@@@ 1299
sshd |@@@@@@@@@@@@@@ 2979
dtrace |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8246
The quantize() and lquantize() aggregating actions can sometimes generate a large amount of output. For these two actions, the aggpack option provides a compressed output and improves readability. The following example measures the distribution of times spent within each system call:
syscall:::entry
{
        self->ts = timestamp;
}

syscall:::return
/self->ts/
{
        @[probefunc] = quantize(timestamp - self->ts);
        self->ts = 0;
}
# dtrace -s syscalls.d -x aggpack
dtrace: script 'syscalls.d' matched 442 probes
^C
key min -------------------------- max | count
sigaction 256 : X_ : 536870912 | 4
fstatat 256 : X : 536870912 | 1
getuid 256 : X : 536870912 | 2
mmap 256 : X : 536870912 | 1
uadmin 256 : X : 536870912 | 2
zone 256 : X : 536870912 | 2
getrandom 256 : xx : 536870912 | 2
waitsys 256 : X : 536870912 | 2
sysconfig 256 : _x__ : 536870912 | 7
setitimer 256 : xx : 536870912 | 4
lseek 256 : x __ : 536870912 | 4
brk 256 : _ ___ : 536870912 | 4
lwp_sigmask 256 : X_ : 536870912 | 13
read 256 : Xx : 536870912 | 3
schedctl 256 : X : 536870912 | 1
setcontext 256 : X : 536870912 | 1
writev 256 : X : 536870912 | 2
recvmsg 256 : _x__ : 536870912 | 10
clock_gettime 256 : X__ : 536870912 | 93
write 256 : _ X : 536870912 | 4
p_online 256 : X__ _ : 536870912 | 1024
getdents 256 : xx : 536870912 | 4
ioctl 256 : xx____ _ __ : 536870912 | 2068
lwp_park 256 : xx : 536870912 | 2
pollsys 256 : _x _ __ : 536870912 | 13
nanosleep 256 : _X : 536870912 | 5
portfs 256 : _ X _ : 536870912 | 39
Each histogram now occupies a single line, in which the characters _, x, and X are used to approximate bars for populated intervals. The aggpack and aggzoom options may be combined to draw attention to otherwise emptier intervals. The following output repeats the previous example with both options enabled:
key min ----------------------------- max | count
fcntl 64 : X : 4294967296 | 1
sigaction 64 : Xx : 4294967296 | 4
getrandom 64 : X : 4294967296 | 1
mmap 64 : X : 4294967296 | 1
pset 64 : X x : 4294967296 | 3
fstatat 64 : XXX : 4294967296 | 3
brk 64 : x Xx : 4294967296 | 4
getuid 64 : X : 4294967296 | 7
uadmin 64 : X : 4294967296 | 7
zone 64 : X : 4294967296 | 7
setcontext 64 : X : 4294967296 | 1
sysconfig 64 : _XX_ : 4294967296 | 17
waitsys 64 : xX : 4294967296 | 7
lwp_sigmask 64 : X__ : 4294967296 | 17
lseek 64 : x _XxX : 4294967296 | 15
read 64 : xX : 4294967296 | 13
close 64 : XXXX : 4294967296 | 4
pread 64 : X : 4294967296 | 1
schedctl 64 : X : 4294967296 | 1
so_socket 64 : X : 4294967296 | 1
sendmsg 64 : X_ : 4294967296 | 8
setitimer 64 : _XX : 4294967296 | 26
openat 64 : X X X : 4294967296 | 3
write 64 : XX_ _ : 4294967296 | 15
writev 64 : _X_ : 4294967296 | 14
p_online 64 : X__ : 4294967296 | 1024
recvmsg 64 : _xX_ : 4294967296 | 83
clock_gettime 64 : _X__ : 4294967296 | 509
getdents 64 : _X_ : 4294967296 | 14
ioctl 64 : _X_________ : 4294967296 | 8263
lwp_park 64 : _ X__ : 4294967296 | 12
portfs 64 : _ X _ : 4294967296 | 223
pollsys 64 : xX_ __ _xX__ : 4294967296 | 82
nanosleep 64 : xX _ : 4294967296 | 44