Data Normalization
When aggregating data over some period of time, you might want to normalize the data with respect to some constant factor. This technique makes it easier to compare disjoint data. For example, when aggregating system calls, you might want to output them as a per-second rate instead of as an absolute count over the course of the run. The DTrace normalize action enables you to normalize data in this way. The parameters to normalize are an aggregation and a normalization factor. When the aggregation is printed, each value is shown divided by the normalization factor.
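The arithmetic that normalize applies at output time can be modeled in a few lines. The sketch below is a hypothetical model, not DTrace code: the names elapsed_seconds and normalized_output are illustrative, and it assumes, as in D, that both the elapsed-seconds factor and the normalized values use truncating integer division.

```python
# Hypothetical model of normalize.d's arithmetic, not DTrace itself.
# Assumption: as in D, both the factor and the normalized values use
# truncating integer division.
from __future__ import annotations

NANOSEC_PER_SEC = 1_000_000_000  # nanoseconds in one second


def elapsed_seconds(start_ns: int, end_ns: int) -> int:
    """Mirror (timestamp - start) / 1000000000 from the END clause."""
    return (end_ns - start_ns) // NANOSEC_PER_SEC


def normalized_output(counts: dict[str, int], factor: int) -> dict[str, int]:
    """Divide each value by the factor for display; assumes factor > 0."""
    return {key: value // factor for key, value in counts.items()}


if __name__ == "__main__":
    # Hypothetical run of 12.4 seconds: the fractional 0.4 s is dropped.
    factor = elapsed_seconds(0, 12_400_000_000)
    print(factor)  # 12
    # Hypothetical raw counts: 95 / 12 is about 7.9, which is shown as 7.
    print(normalized_output({"sh": 95, "ksh": 600}, factor))
```

Because of the truncation, a process with fewer calls than the number of elapsed seconds is reported as 0, which is why several processes in the sample output below show a per-second rate of 0.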
Example 3-1 Normalizing an Aggregation With normalize.d
The following example counts system calls by process name (execname) and, when tracing ends, normalizes the counts to a per-second rate:
#pragma D option quiet

BEGIN
{
    /*
     * Get the start time, in nanoseconds.
     */
    start = timestamp;
}

syscall:::entry
{
    @func[execname] = count();
}

END
{
    /*
     * Normalize the aggregation based on the number of seconds it has
     * been running. (There are 1,000,000,000 nanoseconds in one second.)
     */
    normalize(@func, (timestamp - start) / 1000000000);
}
Running the preceding script for a brief period of time results in the following output on a physical machine:
# dtrace -s ./normalize.d
^C
syslogd 0
rpc.rusersd 0
utmpd 0
xbiff 0
in.routed 1
sendmail 2
echo 2
FvwmAuto 2
stty 2
cut 2
init 2
pt_chmod 3
picld 3
utmp_update 3
httpd 4
xclock 5
basename 6
tput 6
sh 7
tr 7
arch 9
expr 10
uname 11
mibiisa 15
dirname 18
dtrace 40
ksh 48
java 58
xterm 100
nscd 120
fvwm2 154
prstat 180
perfbar 188
Xsun 1309
normalize sets the normalization factor for the specified aggregation, but this action does not modify the underlying data. You can therefore remove the normalization later with the denormalize action, which takes only an aggregation and causes the raw values to be printed again. Adding denormalize to the preceding example returns both the raw system call counts and the per-second rates.
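The point that normalize only records a factor, leaving the data intact, can be illustrated with a small model. This is a hypothetical Python sketch, not DTrace's implementation: the Aggregation class and its methods are illustrative names, and modeling denormalize as resetting the factor to 1 is an assumption about its observable effect. The raw counts are taken from the sample output of denorm.d.

```python
# Hypothetical model of normalize/denormalize semantics, not DTrace code.
from __future__ import annotations


class Aggregation:
    def __init__(self) -> None:
        self.data: dict[str, int] = {}  # raw values; never rescaled
        self.factor = 1                 # a factor of 1 means "not normalized"

    def count(self, key: str, n: int = 1) -> None:
        self.data[key] = self.data.get(key, 0) + n

    def normalize(self, factor: int) -> None:
        # Only records the factor; self.data is left untouched.
        self.factor = factor

    def denormalize(self) -> None:
        # Modeled as resetting the factor so the raw values print again.
        self.factor = 1

    def printa(self) -> dict[str, int]:
        # Division happens only when the aggregation is displayed.
        return {k: v // self.factor for k, v in self.data.items()}


if __name__ == "__main__":
    agg = Aggregation()
    agg.count("Xsun", 11616)  # raw counts from the denorm.d sample output
    agg.count("fvwm2", 4156)
    agg.normalize(14)
    print(agg.printa())       # {'Xsun': 829, 'fvwm2': 296}
    agg.denormalize()
    print(agg.printa())       # {'Xsun': 11616, 'fvwm2': 4156}
```

Keeping the division at display time is what makes it possible to print the same aggregation both ways in one END clause.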
Example 3-2 Denormalizing an Aggregation With denorm.d
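#pragma D option quiet

BEGIN
{
    start = timestamp;
}

syscall:::entry
{
    @func[execname] = count();
}

END
{
    /* Elapsed run time, truncated to whole seconds. */
    this->seconds = (timestamp - start) / 1000000000;
    printf("Ran for %d seconds.\n", this->seconds);

    /* Print the counts divided by the number of seconds. */
    printf("Per-second rate:\n");
    normalize(@func, this->seconds);
    printa(@func);

    /* Remove the normalization and print the raw counts. */
    printf("\nRaw counts:\n");
    denormalize(@func);
    printa(@func);
}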
Running the preceding script for a brief period of time produces output similar to the following example:
# dtrace -s ./denorm.d
^C
Ran for 14 seconds.
Per-second rate:
syslogd 0
in.routed 0
xbiff 1
sendmail 2
elm 2
picld 3
httpd 4
xclock 6
FvwmAuto 7
mibiisa 22
dtrace 42
java 55
xterm 75
adeptedit 118
nscd 127
prstat 179
perfbar 184
fvwm2 296
Xsun 829
Raw counts:
syslogd 1
in.routed 4
xbiff 21
sendmail 30
elm 36
picld 43
httpd 56
xclock 91
FvwmAuto 104
mibiisa 314
dtrace 592
java 774
xterm 1062
adeptedit 1665
nscd 1781
prstat 2506
perfbar 2581
fvwm2 4156
Xsun 11616
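Aggregations can also be renormalized. If normalize is called more than once for the same aggregation, the normalization factor is the factor specified in the most recent call. The following example prints per-second rates every 10 seconds. Because the counts are never cleared and the factor is the total elapsed time, each printout shows the average rate since the script started: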
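The effect of renormalizing on each tick can be sketched with a hypothetical Python model, not DTrace code. The Aggregation class, the simulate helper, the process name, and the per-interval call counts are all illustrative assumptions; the model only encodes the two facts stated above: the counts keep accumulating, and the most recent normalize call sets the factor.

```python
# Hypothetical model of renormalize.d, not DTrace code: counts keep
# accumulating, and each tick-10sec firing renormalizes by total elapsed time.
from __future__ import annotations


class Aggregation:
    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.factor = 1

    def count(self, key: str, n: int = 1) -> None:
        self.data[key] = self.data.get(key, 0) + n

    def normalize(self, factor: int) -> None:
        self.factor = factor  # the most recent call wins

    def printa(self) -> dict[str, int]:
        return {k: v // self.factor for k, v in self.data.items()}


def simulate(calls_per_interval: list[int], interval_s: int = 10) -> list[dict[str, int]]:
    """Return what each tick would print for one hypothetical process."""
    agg = Aggregation()
    snapshots = []
    for i, calls in enumerate(calls_per_interval, start=1):
        agg.count("ksh", calls)
        agg.normalize(i * interval_s)  # elapsed seconds since BEGIN
        snapshots.append(agg.printa())
    return snapshots


if __name__ == "__main__":
    # Hypothetical: 300 calls in the first 10 s, 900 in the next 10 s.
    for snapshot in simulate([300, 900]):
        print(snapshot)  # {'ksh': 30}, then {'ksh': 60}
```

In this model the second tick reports 1200 / 20 = 60 calls per second, the average since the start, rather than the 90 per second seen during the second interval alone.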
Example 3-3 Renormalizing an Aggregation With renormalize.d
#pragma D option quiet

BEGIN
{
    start = timestamp;
}

syscall:::entry
{
    @func[execname] = count();
}

tick-10sec
{
    /* Renormalize by the total elapsed seconds, then print. */
    normalize(@func, (timestamp - start) / 1000000000);
    printa(@func);
}