3.4 Data Normalization

When aggregating data over a period of time, you might want to normalize the data with respect to a constant factor. This technique makes disjoint data easier to compare. For example, when aggregating system calls, you might want to output system calls as a per-second rate instead of as an absolute value over the course of the run. The DTrace normalize action enables you to normalize data in this way. The parameters to normalize are an aggregation and a normalization factor. The output of the aggregation shows each value divided by the normalization factor.

The following example counts system calls by process name and normalizes the counts to a per-second rate:

#pragma D option quiet

BEGIN
{
  /*
   * Get the start time, in nanoseconds.
   */
  start = timestamp;
}

syscall:::entry
{
  @func[execname] = count();
}

END
{
  /*
   * Normalize the aggregation based on the number of seconds we have
   * been running. (There are 1,000,000,000 nanoseconds in one second.)
   */
  normalize(@func, (timestamp - start) / 1000000000);
}

Running the above script for a brief period of time produces output similar to the following on a desktop machine, where each value is the number of system calls per second made by the named process:

# dtrace -s normalize.d 
^C
  memballoon                                                        1
  udisks-daemon                                                     1
  vmstats                                                           1
  rtkit-daemon                                                      2
  automount                                                         2
  gnome-panel                                                       3
  gnome-settings-                                                   5
  NetworkManager                                                    6
  gvfs-afc-volume                                                   6
  metacity                                                          6
  qpidd                                                             9
  hald-addon-inpu                                                  14
  gnome-terminal                                                   19
  Xorg                                                             35
  VBoxClient                                                       52
  X11-NOTIFY                                                      104
  java                                                            143
  dtrace                                                          309
  sh                                                            36467
  date                                                          68142

normalize sets the normalization factor for the specified aggregation, but this action does not modify the underlying data. The denormalize action, which takes only an aggregation as its parameter, removes the normalization factor so that the raw values are displayed again. Adding the denormalize action to the preceding example returns both per-second rates and raw system call counts:

#pragma D option quiet

BEGIN
{
  start = timestamp;
}

syscall:::entry
{
  @func[execname] = count();
}

END
{
  /* Elapsed run time in whole seconds (integer division truncates). */
  this->seconds = (timestamp - start) / 1000000000;
  printf("Ran for %d seconds.\n", this->seconds);
  printf("Per-second rate:\n");
  normalize(@func, this->seconds);
  printa(@func);
  printf("\nRaw counts:\n");
  denormalize(@func);
  printa(@func);
}

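Running the above script for a brief period of time produces output similar to the following example. Each per-second rate is the raw count divided by the elapsed seconds and truncated to an integer, so a process that made fewer than 7 system calls in this 7-second run shows a rate of 0: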

# dtrace -s denorm.d 
^C
Ran for 7 seconds.
Per-second rate:

  audispd                                                           0
  auditd                                                            0
  memballoon                                                        0
  rtkit-daemon                                                      0
  timesync                                                          1
  gnome-power-man                                                   1
  vmstats                                                           1
  automount                                                         2
  udisks-daemon                                                     2
  gnome-panel                                                       2
  metacity                                                          2
  gnome-settings-                                                   3
  qpidd                                                             4
  clock-applet                                                      4
  gvfs-afc-volume                                                   5
  crond                                                             6
  gnome-terminal                                                    7
  vminfo                                                           15
  hald-addon-inpu                                                  32
  VBoxClient                                                       45
  Xorg                                                             63
  X11-NOTIFY                                                       90
  java                                                            126
  dtrace                                                          315
  sh                                                            31430
  date                                                          58724

Raw counts:

  audispd                                                           1
  auditd                                                            4
  memballoon                                                        4
  rtkit-daemon                                                      6
  timesync                                                          8
  gnome-power-man                                                   9
  vmstats                                                          12
  automount                                                        16
  udisks-daemon                                                    16
  gnome-panel                                                      20
  metacity                                                         20
  gnome-settings-                                                  22
  qpidd                                                            28
  clock-applet                                                     34
  gvfs-afc-volume                                                  40
  crond                                                            42
  gnome-terminal                                                   54
  vminfo                                                          105
  hald-addon-inpu                                                 225
  VBoxClient                                                      318
  Xorg                                                            444
  X11-NOTIFY                                                      634
  java                                                            883
  dtrace                                                         2207
  sh                                                           220016
  date                                                         411073
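
The normalization factor does not have to be an elapsed time. Any constant divisor works, for example a unit conversion. The following is a minimal sketch of that idea rather than part of the original example set: it sums the time spent in read system calls per process in nanoseconds and normalizes by 1000 to report microseconds. The aggregation name @elapsed and the thread-local variable self->ts are illustrative names, and the choice of the read system call is an assumption made only to keep the script short.

#pragma D option quiet

syscall::read:entry
{
  /* Record when this thread entered read(), in nanoseconds. */
  self->ts = timestamp;
}

syscall::read:return
/self->ts/
{
  /* Accumulate the nanoseconds spent in read() for this process name. */
  @elapsed[execname] = sum(timestamp - self->ts);
  self->ts = 0;
}

END
{
  /*
   * Display microseconds instead of nanoseconds.
   * (There are 1,000 nanoseconds in one microsecond.)
   */
  normalize(@elapsed, 1000);
}

As with the per-second rates, the displayed values are truncated to integers, and the underlying nanosecond sums are left unchanged.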

Aggregations can also be renormalized. If normalize is called more than once for the same aggregation, the normalization factor is the factor specified in the most recent call. The following example prints, every 10 seconds, the per-second rate averaged over the time since the script started:

#pragma D option quiet

BEGIN
{
  start = timestamp;
}

syscall:::entry
{
  @func[execname] = count();
}

tick-10sec
{
  /* Renormalize by the total elapsed seconds, replacing the previous factor. */
  normalize(@func, (timestamp - start) / 1000000000);
  printa(@func);
}
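
The rates printed by the script above are cumulative averages, because the counts are never reset. If a rate for each interval is wanted instead, one possible variation is sketched below. It is not taken from the original example set, and it assumes that the tick-10sec probe fires close to its 10-second schedule, so a fixed factor of 10 is used rather than a measured elapsed time. The sketch uses the clear action to reset the counts after each report.

#pragma D option quiet

syscall:::entry
{
  @func[execname] = count();
}

tick-10sec
{
  /* Each reporting interval is assumed to be 10 seconds long. */
  normalize(@func, 10);
  printa(@func);

  /* Reset the counts so that the next interval starts from zero. */
  clear(@func);
}

Because clear resets the values but keeps the keys, a process that was seen in an earlier interval continues to be listed, with a rate of 0, in intervals where it makes no system calls.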