About Aggregations
DTrace stores the results of aggregating functions in objects called aggregations. The aggregation results are indexed using a tuple of expressions similar to those used for associative arrays. In D, the syntax for an aggregation is:
@name[ keys ] = aggfunc ( args );
where name is the name of the aggregation, keys is a comma-separated list of D expressions, aggfunc is one of the DTrace aggregating functions, and args is a comma-separated list of arguments appropriate for the aggregating function. The aggregation name is a D identifier that is prefixed with the special character @. All aggregations named in your D programs are global variables; there are no thread- or clause-local aggregations. The aggregation names are kept in a separate identifier namespace from other D global variables. Remember that a and @a are not the same variable if you reuse names. The special aggregation name @ can be used to name an anonymous aggregation in simple D programs. The D compiler treats this name as an alias for the aggregation name @_.
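As an illustrative sketch (the probe and variable choices here are arbitrary, not from the original text), the following clause uses a global variable a, an aggregation @a, and the anonymous aggregation @ side by side; all three are distinct variables:

```d
syscall::write:entry
{
	a++;                     /* D global variable named a */
	@a[execname] = count();  /* aggregation named @a -- a separate namespace */
	@ = count();             /* anonymous aggregation; alias for @_ */
}
```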
The DTrace aggregating functions are shown in the following table. Most aggregating functions take just a single argument that represents the new data.
Table 3-1 DTrace Aggregating Functions
Function | Arguments | Result |
---|---|---|
count | none | The number of times called. |
sum | scalar expression | The total value of the specified expressions. |
avg | scalar expression | The arithmetic average of the specified expressions. |
min | scalar expression | The smallest value among the specified expressions. |
max | scalar expression | The largest value among the specified expressions. |
stddev | scalar expression | The standard deviation of the specified expressions. |
lquantize | scalar expression, lower bound, upper bound, step value | A linear frequency distribution, sized by the specified range, of the values of the specified expressions. Increments the value in the highest bucket that is less than the specified expression. |
quantize | scalar expression | A power-of-two frequency distribution of the values of the specified expressions. Increments the value in the highest power-of-two bucket that is less than the specified expression. |
llquantize | scalar expression, factor (base of the logarithm), low exponent, high exponent, number of steps per order of magnitude | A frequency distribution table of the values specified in the expression. Increments the value in the highest bucket that is less than the specified expression. The values are grouped logarithmically at first and then grouped linearly. |
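Several aggregating functions can be applied to the same data in a single clause. The following sketch (not from the original text; the aggregation names are arbitrary) summarizes successful read system calls, whose return value arg0 carries the number of bytes read:

```d
syscall::read:return
/arg0 >= 0/
{
	@calls = count();       /* number of successful reads */
	@bytes = sum(arg0);     /* total bytes read */
	@mean = avg(arg0);      /* average read size */
	@smallest = min(arg0);  /* smallest read */
	@largest = max(arg0);   /* largest read */
}
```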
For example, to count the number of write system calls in the system, you can use an informative string as a key and the count aggregating function:
syscall::write:entry { @counts["write system calls"] = count(); }
The dtrace command prints aggregation results by default when the process terminates, either as the result of an explicit END action or when the user presses Control-C. The following example output shows the result of running this command, waiting for a few seconds, and pressing Control-C.
# dtrace -s writes.d
dtrace: script './writes.d' matched 1 probe
^C
write system calls 179
#
You can count system calls per process name by using the execname variable as the key to an aggregation:
syscall::write:entry { @counts[execname] = count(); }
The following example output shows the result of running this command, waiting for a few seconds, and pressing Control-C:
# dtrace -s writesbycmd.d
dtrace: script './writesbycmd.d' matched 1 probe
^C
dtrace 1
cat 4
sed 9
head 9
grep 14
find 15
tail 25
mountd 28
expr 72
sh 291
tee 814
def.dir.flp 1996
make.bin 2010
#
Alternatively, you might want to further examine writes organized by both executable name and file descriptor. The file descriptor is the first argument to write, so the following example uses a key consisting of both execname and arg0:
syscall::write:entry { @counts[execname, arg0] = count(); }
Running this command results in a table with both executable name and file descriptor, as shown in the following example:
# dtrace -s writesbycmdfd.d
dtrace: script './writesbycmdfd.d' matched 1 probe
^C
cat 1 58
sed 1 60
grep 1 89
tee 1 156
tee 3 156
make.bin 5 164
acomp 1 263
macrogen 4 286
cg 1 397
acomp 3 736
make.bin 1 880
iropt 4 1731
#
The following example displays the average time spent in the write system call, organized by process name. This example uses the avg aggregating function, specifying the expression to average as the argument. The example averages the wall clock time spent in the system call:
syscall::write:entry
{
        self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
        @time[execname] = avg(timestamp - self->ts);
        self->ts = 0;
}
The following example output shows the result of running this command, waiting for a few seconds, and pressing Control-C:
# dtrace -s writetime.d
dtrace: script './writetime.d' matched 2 probes
^C
iropt 31315
acomp 37037
make.bin 63736
tee 68702
date 84020
sh 91632
dtrace 159200
ctfmerge 321560
install 343300
mcs 394400
get 413695
ctfconvert 594400
bringover 1332465
tail 1335260
#
The average can be useful, but often does not provide sufficient detail to understand the distribution of data points. To understand the distribution in further detail, use the quantize aggregating function as shown in the following example:
syscall::write:entry
{
        self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
        @time[execname] = quantize(timestamp - self->ts);
        self->ts = 0;
}
Because each line of output becomes a frequency distribution diagram, the output of this script is substantially longer than previous ones. The following example shows a selection of sample output:
  lint
           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |                                         2
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@                      74
          131072 |@@@@@@@@@@@@@@@                          59
          262144 |@@@                                      14
          524288 |                                         0

  acomp
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@                             840
           16384 |@@@@@@@@@@@                              750
           32768 |@@                                       165
           65536 |@@@@@@                                   460
          131072 |@@@@@@                                   446
          262144 |                                         16
          524288 |                                         0
         1048576 |                                         1
         2097152 |                                         0

  iropt
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@                  4149
           16384 |@@@@@@@@@@                               1798
           32768 |@                                        332
           65536 |@                                        325
          131072 |@@                                       431
          262144 |                                         3
          524288 |                                         2
         1048576 |                                         1
         2097152 |                                         0
Notice that the rows for the frequency distribution are always power-of-two values. Each row indicates the number of elements greater than or equal to the corresponding value, but less than the next larger row value. For example, the preceding output shows that iropt had 4,149 writes taking between 8,192 nanoseconds and 16,383 nanoseconds, inclusive.
While quantize is useful for getting quick insight into the data, you might want to examine a distribution across linear values instead. To display a linear value distribution, use the lquantize aggregating function. The lquantize() function takes three arguments in addition to a D expression: a lower bound, an upper bound, and a step. For example, if you wanted to look at the distribution of writes by file descriptor, a power-of-two quantization would not be effective. Instead, use a linear quantization with a small range, as shown in the following example:
syscall::write:entry { @fds[execname] = lquantize(arg0, 0, 100, 1); }
Running this script for several seconds yields a large amount of information. The following example shows a selection of typical output:
  mountd
           value  ------------- Distribution ------------- count
              11 |                                         0
              12 |@                                        4
              13 |                                         0
              14 |@@@@@@@@@@@@@@@@@@@@@@@@@                70
              15 |                                         0
              16 |@@@@@@@@@@@@                             34
              17 |                                         0

  xemacs-20.4
           value  ------------- Distribution ------------- count
               6 |                                         0
               7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  521
               8 |                                         0
               9 |                                         1
              10 |                                         0

  make.bin
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  3596
               2 |                                         0
               3 |                                         0
               4 |                                         42
               5 |                                         50
               6 |                                         0

  acomp
           value  ------------- Distribution ------------- count
               0 |                                         0
               1 |@@@@@                                    1156
               2 |                                         0
               3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         6635
               4 |@                                        297
               5 |                                         0

  iropt
           value  ------------- Distribution ------------- count
               2 |                                         0
               3 |                                         299
               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  20144
               5 |                                         0
You can also use the lquantize aggregating function to aggregate on time since some point in the past. This technique allows you to observe a change in behavior over time. The following example displays the change in system call behavior over the lifetime of a process executing the date command:
syscall::exec:return,
syscall::exece:return
/execname == "date"/
{
        self->start = vtimestamp;
}

syscall:::entry
/self->start/
{
        /*
         * You linearly quantize on the current virtual time minus the
         * process's start time.  Divide by 1000 to yield microseconds
         * rather than nanoseconds.  The range runs from 0 to 10
         * milliseconds in steps of 100 microseconds; you can expect that
         * no date process will take longer than 10 milliseconds to
         * complete.
         */
        @a["system calls over time"] =
            lquantize((vtimestamp - self->start) / 1000, 0, 10000, 100);
}

syscall::rexit:entry
/self->start/
{
        self->start = 0;
}
The preceding script provides greater insight into system call behavior when many date processes are executed. To see this result, run sh -c 'while true; do date >/dev/null; done' in one window, while executing the D script in another. The script produces a profile of the system call behavior of the date command:
# dtrace -s dateprof.d
dtrace: script './dateprof.d' matched 218 probes
^C
system calls over time
value ------------- Distribution ------------- count
< 0 | 0
0 |@@ 20530
100 |@@@@@@ 48814
200 |@@@ 28119
300 |@ 14646
400 |@@@@@ 41237
500 | 1259
600 | 218
700 | 116
800 |@ 12783
900 |@@@ 28133
1000 | 7897
1100 |@ 14065
1200 |@@@ 27549
1300 |@@@ 25715
1400 |@@@@ 35011
1500 |@@ 16734
1600 | 498
1700 | 256
1800 | 369
1900 | 404
2000 | 320
2100 | 555
2200 | 54
2300 | 17
2400 | 5
2500 | 1
2600 | 7
2700 | 0
This output provides a rough idea of the different phases of the date command with respect to the services required of the kernel. To better understand these phases, you might want to understand which system calls are being called when. If so, you could change the D script to aggregate on the variable probefunc instead of a constant string.
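That change amounts to replacing the constant string key in the middle clause of the earlier script with probefunc. A sketch of the modified clause (the rest of the script stays the same):

```d
syscall:::entry
/self->start/
{
        @a[probefunc] =
            lquantize((vtimestamp - self->start) / 1000, 0, 10000, 100);
}
```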
You can use the llquantize aggregating function to group data both logarithmically and linearly. The arguments to the llquantize() function, in order, are: the expression, the base, the low exponent, the high exponent, and the number of steps per order of magnitude. The following example displays system call latencies in the microsecond range. Note that timestamp is in nanoseconds and 1000 nanoseconds = 1 microsecond.
syscall:::entry
{
        self->ts = timestamp;
}

syscall:::return
/self->ts/
{
        /*
         * The expression "timestamp - self->ts" gives the system call
         * latency in nanoseconds.
         */
        @ = llquantize(timestamp - self->ts, 10, 3, 5, 10);
        self->ts = 0;
}
In this example, the expression @ = llquantize(timestamp - self->ts, 10, 3, 5, 10); generates a frequency distribution of the observed system call latencies. The distribution runs from 10^3 nanoseconds through 10^5 nanoseconds (inclusive). For each order of magnitude, the data is displayed linearly in groups of 10. This example generates output similar to the following:
           value  ------------- Distribution ------------- count
          < 1000 |@@@@@@                                   12899
            1000 |@@@@@@@@@@@@                             26357
            2000 |@                                        3202
            3000 |@                                        1869
            4000 |@                                        2110
            5000 |@@                                       4716
            6000 |@@                                       3998
            7000 |@                                        1617
            8000 |@@                                       4924
            9000 |@                                        2515
           10000 |@@@@@@@                                  15307
           20000 |@                                        2240
           30000 |@                                        1327
           40000 |@                                        1369
           50000 |                                         990
           60000 |                                         1057
           70000 |                                         631
           80000 |                                         453
           90000 |                                         434
          100000 |@                                        1570
          200000 |                                         228
          300000 |                                         45
          400000 |                                         59
          500000 |                                         60
          600000 |                                         52
          700000 |                                         30
          800000 |                                         22
          900000 |                                         17
      >= 1000000 |                                         513
You can use the stddev aggregating function to characterize the distribution of data points. This example shows the average and standard deviation of the time it takes to execute processes:
syscall::exece:entry
{
        self->ts = timestamp;
}

syscall::exece:return
/self->ts/
{
        t = timestamp - self->ts;
        @execavg[probefunc] = avg(t);
        @execsd[probefunc] = stddev(t);
        self->ts = 0;
}

END
{
        printf("AVERAGE:");
        printa(@execavg);
        printf("\nSTDDEV:");
        printa(@execsd);
}
The sample output is as follows:
# dtrace -s ./stddev.d
dtrace: script './stddev.d' matched 3 probes
^C
CPU ID FUNCTION:NAME
0 2 :END AVERAGE:
exece 7053786
STDDEV:
exece 9470351
For aggregations using count() and sum(), the result may be displayed as a histogram by using the agghist runtime option. The following example shows how agghist might be used when counting the number of system calls invoked by each executable:
syscall:::entry { @[execname] = count(); }
# dtrace -s syscall.d -x agghist
dtrace: description 'syscall:::entry' matched 221 probes
^C
key -------------- Distribution ------------- count
fmd | 1
inetd | 1
rad | 1
cat | 47
gdm-simple-greet | 107
mail | 119
httpd |@ 174
sstored |@ 288
quota |@ 316
svc.configd |@ 351
Xorg |@ 392
bash |@@ 550
nscd |@@@@ 1299
sshd |@@@@@@@@ 2794
dtrace |@@@@@@@@@@@@@@@@@@@@@ 7218
By default, aggregation histograms are displayed such that the full width available for a bar represents 100% of the measured variable. The aggzoom option scales the display such that the result with the highest frequency count consumes the full width available. Relative proportions are preserved, but smaller bars may become more visible. The following example shows the result of adding aggzoom:
# dtrace -s syscall.d -x agghist -x aggzoom
dtrace: description 'syscall:::entry' matched 221 probes
^C
key -------------- Distribution ------------- count
fmd | 1
inetd | 1
rad | 1
cat | 47
VBoxService | 74
gdm-simple-greet |@ 110
mail |@ 119
httpd |@ 207
sstored |@@ 310
quota |@@ 316
svc.configd |@@ 351
Xorg |@@ 425
bash |@@@ 692
nscd |@@@@@@ 1299
sshd |@@@@@@@@@@@@@@ 2979
dtrace |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8246
The quantize() and lquantize() aggregating actions can sometimes generate a large amount of output. For these two actions, the aggpack option provides a compressed output and improves readability. The following example measures the distribution of times spent within each system call:
syscall:::entry
{
        self->ts = timestamp;
}

syscall:::return
/self->ts/
{
        @[probefunc] = quantize(timestamp - self->ts);
        self->ts = 0;
}
# dtrace -s syscalls.d -x aggpack
dtrace: script 'syscalls.d' matched 442 probes
^C
key min -------------------------- max | count
sigaction 256 : X_ : 536870912 | 4
fstatat 256 : X : 536870912 | 1
getuid 256 : X : 536870912 | 2
mmap 256 : X : 536870912 | 1
uadmin 256 : X : 536870912 | 2
zone 256 : X : 536870912 | 2
getrandom 256 : xx : 536870912 | 2
waitsys 256 : X : 536870912 | 2
sysconfig 256 : _x__ : 536870912 | 7
setitimer 256 : xx : 536870912 | 4
lseek 256 : x __ : 536870912 | 4
brk 256 : _ ___ : 536870912 | 4
lwp_sigmask 256 : X_ : 536870912 | 13
read 256 : Xx : 536870912 | 3
schedctl 256 : X : 536870912 | 1
setcontext 256 : X : 536870912 | 1
writev 256 : X : 536870912 | 2
recvmsg 256 : _x__ : 536870912 | 10
clock_gettime 256 : X__ : 536870912 | 93
write 256 : _ X : 536870912 | 4
p_online 256 : X__ _ : 536870912 | 1024
getdents 256 : xx : 536870912 | 4
ioctl 256 : xx____ _ __ : 536870912 | 2068
lwp_park 256 : xx : 536870912 | 2
pollsys 256 : _x _ __ : 536870912 | 13
nanosleep 256 : _X : 536870912 | 5
portfs 256 : _ X _ : 536870912 | 39
Each histogram now occupies a single line, in which the characters _, x, and X are used to approximate bars for populated intervals. The aggpack and aggzoom options may be combined to draw attention to otherwise emptier intervals. The following output repeats the previous example with both options enabled:
key min ----------------------------- max | count
fcntl 64 : X : 4294967296 | 1
sigaction 64 : Xx : 4294967296 | 4
getrandom 64 : X : 4294967296 | 1
mmap 64 : X : 4294967296 | 1
pset 64 : X x : 4294967296 | 3
fstatat 64 : XXX : 4294967296 | 3
brk 64 : x Xx : 4294967296 | 4
getuid 64 : X : 4294967296 | 7
uadmin 64 : X : 4294967296 | 7
zone 64 : X : 4294967296 | 7
setcontext 64 : X : 4294967296 | 1
sysconfig 64 : _XX_ : 4294967296 | 17
waitsys 64 : xX : 4294967296 | 7
lwp_sigmask 64 : X__ : 4294967296 | 17
lseek 64 : x _XxX : 4294967296 | 15
read 64 : xX : 4294967296 | 13
close 64 : XXXX : 4294967296 | 4
pread 64 : X : 4294967296 | 1
schedctl 64 : X : 4294967296 | 1
so_socket 64 : X : 4294967296 | 1
sendmsg 64 : X_ : 4294967296 | 8
setitimer 64 : _XX : 4294967296 | 26
openat 64 : X X X : 4294967296 | 3
write 64 : XX_ _ : 4294967296 | 15
writev 64 : _X_ : 4294967296 | 14
p_online 64 : X__ : 4294967296 | 1024
recvmsg 64 : _xX_ : 4294967296 | 83
clock_gettime 64 : _X__ : 4294967296 | 509
getdents 64 : _X_ : 4294967296 | 14
ioctl 64 : _X_________ : 4294967296 | 8263
lwp_park 64 : _ X__ : 4294967296 | 12
portfs 64 : _ X _ : 4294967296 | 223
pollsys 64 : xX_ __ _xX__ : 4294967296 | 82
nanosleep 64 : xX _ : 4294967296 | 44