3.3.5 Using the quantize Function

The average and standard deviation can be useful for crude characterization, but often do not provide sufficient detail to understand the distribution of data points. To understand the distribution in further detail, use the quantize aggregating function, as shown in the following example, which is saved in a file named wrquantize.d:

syscall::write:entry
{
  self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
  @time[execname] = quantize(timestamp - self->ts);
  self->ts = 0;
}

Because each aggregation key (here, each process name) now produces a frequency distribution diagram rather than a single line, the output of this script is substantially longer than that of the previous scripts. The following example shows a selection of sample output:

# dtrace -s wrquantize.d 
dtrace: script 'wrquantize.d' matched 2 probes
^C
...
  bash                                              
           value  ------------- Distribution ------------- count    
            8192 |                                         0        
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         4        
           32768 |                                         0        
           65536 |                                         0        
          131072 |@@@@@@@@                                 1        
          262144 |                                         0        

  gnome-terminal                                    
           value  ------------- Distribution ------------- count    
            4096 |                                         0        
            8192 |@@@@@@@@@@@@@                            5        
           16384 |@@@@@@@@@@@@@                            5        
           32768 |@@@@@@@@@@@                              4        
           65536 |@@@                                      1        
          131072 |                                         0        

  Xorg                                              
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@                                  4        
            8192 |@@@@@@@@@@@@@                            8        
           16384 |@@@@@@@@@@@@                             7        
           32768 |@@@                                      2        
           65536 |@@                                       1        
          131072 |                                         0        
          262144 |                                         0        
          524288 |                                         0        
         1048576 |                                         0        
         2097152 |@@@                                      2        
         4194304 |                                         0        

  firefox                                           
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@                                      22       
            8192 |@@@@@@@@@@@                              90       
           16384 |@@@@@@@@@@@@@                            107      
           32768 |@@@@@@@@@                                72       
           65536 |@@@                                      28       
          131072 |                                         3        
          262144 |                                         0        
          524288 |                                         1        
         1048576 |                                         1        
         2097152 |                                         0

The row values of the frequency distribution are always powers of two. Each row shows the number of elements that are greater than or equal to that row's value, but less than the value of the next larger row. Because timestamp is measured in nanoseconds, the values here are write times in nanoseconds. For example, the previous output shows that firefox had 107 writes that took between 16,384 nanoseconds and 32,767 nanoseconds, inclusive.
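The row-selection rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DTrace's implementation: the names quantize_bucket, quantize, and latencies_ns are hypothetical, the sample values are made up, and the sketch handles only non-negative values, which is all that a write time can be.

```python
from collections import Counter


def quantize_bucket(value):
    """Return the power-of-two row that a non-negative integer falls into."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value == 0:
        return 0
    # bit_length() - 1 equals floor(log2(value)), so the result is the
    # largest power of two that does not exceed value.
    return 1 << (value.bit_length() - 1)


def quantize(samples):
    """Count how many samples land in each power-of-two row."""
    hist = Counter()
    for sample in samples:
        hist[quantize_bucket(sample)] += 1
    return hist


# Made-up write times in nanoseconds, chosen to sit on row boundaries.
latencies_ns = [9000, 16384, 20000, 32767, 32768]
hist = quantize(latencies_ns)

for row in sorted(hist):
    print(f"{row:>8} | {hist[row]}")
```

In this sketch, 16,384 and 32,767 both land in the 16384 row, while 32,768 starts the next row, which matches the inclusive range given for the firefox example.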

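The previous example shows how many writes fall into each range of write times. You might also want to know which write times contribute the most to the overall run time. For this purpose, you can pass an optional second argument, the increment, to the quantize function. Each call then adds the increment, rather than 1, to the matching row. The default increment is 1, but the argument can be any D expression and can be negative.

The following example shows a modified script that uses each write time as its own increment, so that every row reports the total time spent in writes of that duration rather than the number of such writes: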

syscall::write:entry
{
  self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
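  self->delta = timestamp - self->ts;
  /* Weight each sample by its own duration so that rows show total time. */
  @time[execname] = quantize(self->delta, self->delta);
  /* Assign 0 to release the thread-local variables. */
  self->ts = 0;
  self->delta = 0;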
}
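To see what the weighted form changes, the following Python sketch accumulates the same samples twice: once with the default increment of 1 and once with each sample's duration as the increment, as the modified script does. It is a simulation under stated assumptions, not DTrace itself: the helper names and the sample data in deltas_ns are hypothetical, and only non-negative values are handled.

```python
from collections import Counter


def quantize_bucket(value):
    """Return the power-of-two row for a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return 0 if value == 0 else 1 << (value.bit_length() - 1)


def quantize(hist, value, increment=1):
    """Add increment (default 1) to the row that value falls into."""
    hist[quantize_bucket(value)] += increment


# Hypothetical write times in nanoseconds: many fast writes, two slow ones.
deltas_ns = [20000] * 107 + [600000, 1500000]

by_count = Counter()  # like quantize(delta)
by_time = Counter()   # like quantize(delta, delta)
for delta in deltas_ns:
    quantize(by_count, delta)
    quantize(by_time, delta, delta)

for row in sorted(by_count):
    print(f"{row:>8} | count={by_count[row]:>4} total_ns={by_time[row]}")
```

With this made-up data, the 16384 row dominates by count (107 writes), but the two slow writes account for roughly as much total time (2,100,000 ns) as all of the fast writes combined (2,140,000 ns), which is the kind of contribution that the weighted form makes visible.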