sleep
and wakeup
In enqueue
and dequeue
, the final example demonstrated that a burst in run queue length was due to runnable xterm
processes. One hypothesis is that the observations resulted from a change in virtual desktop. You can use the wakeup probe to explore this hypothesis by determining who is waking the xterm
processes, and when, as shown in the following example:
#pragma D option quiet dtrace:::BEGIN { start = timestamp; } sched:::wakeup /stringof(args[1]->pr_fname) == "xterm"/ { @[execname] = lquantize((timestamp - start) / 1000000000, 0, 10); } profile:::tick-1sec /++x == 10/ { exit(0); }
To investigate the hypothesis, run the preceding script, waiting roughly five seconds, and switch your virtual desktop exactly once. If the burst of runnable xterm
processes is due to switching the virtual desktop, the output should show a burst of wakeup activity at the five second mark.
# dtrace -s ./xterm.d
Xsun
value -------------- Distribution ------------ count
4 | 0
5 |@ 1
6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 32
7 | 0
The output does show that the X server is waking xterm processes, clustered around the time that you switched virtual desktops. If you wanted to understand the interaction between the X server and the xterm
processes, you could aggregate on user stack traces when the X server fires the wakeup
probe.
Understanding the performance of client/server systems like the X windowing system requires understanding the clients on whose behalf the server is doing work. This kind of question is difficult to answer with conventional performance analysis tools. However, if you have a model where a client sends a message to the server and sleeps pending the server's processing, you can use the wakeup probe to determine the client for whom the request is being performed, as shown in the following example:
self int last; sched:::wakeup /self->last && args[0]->pr_stype == SOBJ_CV/ { @[stringof(args[1]->pr_fname)] = sum(vtimestamp - self->last); self->last = 0; } sched:::wakeup /execname == "Xsun" && self->last == 0/ { self->last = vtimestamp; }
Running the preceding script results in output similar to the following example:
dtrace -s ./xwork.d
dtrace: script './xwork.d' matched 14 probes
^C
xterm 9522510
soffice.bin 9912594
fvwm2 100423123
MozillaFirebird 312227077
acroread 345901577
This output reveals that much Xsun
work is being done on behalf of the processes acroread
, MozillaFirebird
and, to a lesser degree, fvwm2
. Notice that the script only examined wake-ups from condition variable synchronization objects SOBJ_CV
. As described in proc pr_stype Values, condition variables are the type of synchronization object typically used to synchronize for reasons other than access to a shared data region. In the case of the X server, a client will wait for data in a pipe by sleeping on a condition variable.
You can additionally use the sleep
probe along with the wakeup
probe to understand which applications are blocking on which applications, and for how long, as shown in the following example:
#pragma D option quiet sched:::sleep /!(curlwpsinfo->pr_flag & PR_ISSYS) && curlwpsinfo->pr_stype == SOBJ_CV/ { bedtime[curlwpsinfo->pr_addr] = timestamp; } sched:::wakeup /bedtime[args[0]->pr_addr]/ { @[stringof(args[1]->pr_fname), execname] = quantize(timestamp - bedtime[args[0]->pr_addr]); bedtime[args[0]->pr_addr] = 0; } END { printa("%s sleeping on %s:\n%@d\n", @); }
The tail of the output from running the example script for several seconds on a desktop system resembles the following example:
# dtrace -s ./whofor.d
^C
...
xterm sleeping on Xsun:
value -------------- Distribution ------------ count
131072 | 0
262144 | 12
524288 | 2
1048576 | 0
2097152 | 5
4194304 |@@@ 45
8388608 | 1
16777216 | 9
33554432 |@@@@@ 83
67108864 |@@@@@@@@@@@ 164
134217728 |@@@@@@@@@@ 147
268435456 |@@@@ 56
536870912 |@ 17
1073741824 | 9
2147483648 | 1
4294967296 | 3
8589934592 | 1
17179869184 | 0
fvwm2 sleeping on Xsun:
value -------------- Distribution ------------ count
32768 | 0
65536 |@@@@@@@@@@@@@@@@@@@@@@ 67
131072 |@@@@@ 16
262144 |@@ 6
524288 |@ 3
1048576 |@@@@@ 15
2097152 | 0
4194304 | 0
8388608 | 1
16777216 | 0
33554432 | 0
67108864 | 1
134217728 | 0
268435456 | 0
536870912 | 1
1073741824 | 1
2147483648 | 2
4294967296 | 2
8589934592 | 2
17179869184 | 0
34359738368 | 2
68719476736 | 0
syslogd sleeping on syslogd:
value -------------- Distribution ------------ count
17179869184 | 0
34359738368 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3
68719476736 | 0
MozillaFirebird sleeping on MozillaFirebird:
value -------------- Distribution ------------ count
65536 | 0
131072 | 3
262144 |@@ 14
524288 | 0
1048576 |@@@ 18
2097152 | 0
4194304 | 0
8388608 | 1
16777216 | 0
33554432 | 1
67108864 | 3
134217728 |@ 7
268435456 |@@@@@@@@@@ 53
536870912 |@@@@@@@@@@@@@@ 78
1073741824 |@@@@ 25
2147483648 | 0
4294967296 | 0
8589934592 |@ 7
17179869184 | 0
You might want to understand how and why MozillaFirebird
is blocking on itself. You could modify the preceding script as shown in the following example to answer this question:
#pragma D option quiet sched:::sleep /execname == "MozillaFirebird" && curlwpsinfo->pr_stype == SOBJ_CV/ { bedtime[curlwpsinfo->pr_addr] = timestamp; } sched:::wakeup /execname == "MozillaFirebird" && bedtime[args[0]->pr_addr]/ { @[args[1]->pr_pid, args[0]->pr_lwpid, pid, curlwpsinfo->pr_lwpid] = quantize(timestamp - bedtime[args[0]->pr_addr]); bedtime[args[0]->pr_addr] = 0; } sched:::wakeup /bedtime[args[0]->pr_addr]/ { bedtime[args[0]->pr_addr] = 0; } END { printa("%d/%d sleeping on %d/%d:\n%@d\n", @); }
Running the modified script for several seconds results in output similar to the following example:
# dtrace -s ./firebird.d
^C
100459/1 sleeping on 100459/13:
value -------------- Distribution ------------ count
262144 | 0
524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
1048576 | 0
100459/13 sleeping on 100459/1:
value -------------- Distribution ------------ count
16777216 | 0
33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
67108864 | 0
100459/1 sleeping on 100459/2:
value -------------- Distribution ------------ count
16384 | 0
32768 |@@@@ 5
65536 |@ 2
131072 |@@@@@ 6
262144 | 1
524288 |@ 2
1048576 | 0
2097152 |@@ 3
4194304 |@@@@ 5
8388608 |@@@@@@@@ 9
16777216 |@@@@@ 6
33554432 |@@ 3
67108864 | 0
100459/1 sleeping on 100459/5:
value -------------- Distribution ------------ count
16384 | 0
32768 |@@@@@ 12
65536 |@@ 5
131072 |@@@@@@ 15
262144 | 1
524288 | 1
1048576 | 2
2097152 |@ 4
4194304 |@@@@@ 13
8388608 |@@@ 8
16777216 |@@@@@ 13
33554432 |@@ 6
67108864 |@@ 5
134217728 |@ 4
268435456 | 0
536870912 | 1
1073741824 | 0
100459/2 sleeping on 100459/1:
value -------------- Distribution ------------ count
16384 | 0
32768 |@@@@@@@@@@@@@@ 11
65536 | 0
131072 |@@ 2
262144 | 0
524288 | 0
1048576 |@@@@ 3
2097152 |@ 1
4194304 |@@ 2
8388608 |@@ 2
16777216 |@ 1
33554432 |@@@@@@ 5
67108864 | 0
134217728 | 0
268435456 | 0
536870912 |@ 1
1073741824 |@ 1
2147483648 |@ 1
4294967296 | 0
100459/5 sleeping on 100459/1:
value -------------- Distribution ------------ count
16384 | 0
32768 | 1
65536 | 2
131072 | 4
262144 | 7
524288 | 1
1048576 | 5
2097152 | 10
4194304 |@@@@@@ 77
8388608 |@@@@@@@@@@@@@@@@@@@@@@@ 270
16777216 |@@@ 43
33554432 |@ 20
67108864 |@ 14
134217728 | 5
268435456 | 2
536870912 | 1
1073741824 | 0
You can also use the sleep
and wakeup
probes to understand the performance of door servers such as the name service cache daemon, as shown in the following example:
sched:::sleep /curlwpsinfo->pr_stype == SOBJ_SHUTTLE/ { bedtime[curlwpsinfo->pr_addr] = timestamp; } sched:::wakeup /execname == "nscd" && bedtime[args[0]->pr_addr]/ { @[stringof(curpsinfo->pr_fname), stringof(args[1]->pr_fname)] = quantize(timestamp - bedtime[args[0]->pr_addr]); bedtime[args[0]->pr_addr] = 0; } sched:::wakeup /bedtime[args[0]->pr_addr]/ { bedtime[args[0]->pr_addr] = 0; }
The tail of the output from running the preceding script on a large mail server resembles the following example:
imapd value -------------- Distribution ------------ count 16384 | 0 32768 | 2 65536 |@@@@@@@@@@@@@@@@@ 57 131072 |@@@@@@@@@@@ 37 262144 | 3 524288 |@@@ 11 1048576 |@@@ 10 2097152 |@@ 9 4194304 | 1 8388608 | 0 mountd value -------------- Distribution ------------ count 65536 | 0 131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 49 262144 |@@@ 6 524288 | 1 1048576 | 0 2097152 | 0 4194304 |@@@@ 7 8388608 |@ 3 16777216 | 0 sendmail value -------------- Distribution ------------ count 16384 | 0 32768 |@ 18 65536 |@@@@@@@@@@@@@@@@@ 205 131072 |@@@@@@@@@@@@@ 154 262144 |@ 23 524288 | 5 1048576 |@@@@ 50 2097152 | 7 4194304 | 5 8388608 | 2 16777216 | 0 automountd value -------------- Distribution ------------ count 32768 | 0 65536 |@@@@@@@@@@ 22 131072 |@@@@@@@@@@@@@@@@@@@@@@@ 51 262144 |@@ 6 524288 | 1 1048576 | 0 2097152 | 2 4194304 | 2 8388608 | 1 16777216 | 1 33554432 | 1 67108864 | 0 134217728 | 0 268435456 | 1 536870912 | 0
You might be interested in the unusual data points for automountd
or the persistent data point at over one millisecond for sendmail
. You can add additional predicates to the preceding script to hone in on the causes of any exceptional or anomalous.