sleep and wakeup

`sleep` and `wakeup`

In enqueue and dequeue, the final example demonstrated that a burst in run queue length was due to runnable xterm processes. One hypothesis is that the observations resulted from a change in virtual desktop. You can use the wakeup probe to explore this hypothesis by determining who is waking the xterm processes, and when, as shown in the following example:

#pragma D option quiet

dtrace:::BEGIN
{
        start = timestamp;
}

sched:::wakeup
/stringof(args[1]->pr_fname) == "xterm"/
{
        @[execname] = lquantize((timestamp - start) / 1000000000, 0, 10);
}

profile:::tick-1sec
/++x == 10/
{
        exit(0);
}

To investigate the hypothesis, run the preceding script, waiting roughly five seconds, and switch your virtual desktop exactly once. If the burst of runnable xterm processes is due to switching the virtual desktop, the output should show a burst of wakeup activity at the five second mark.

# dtrace -s ./xterm.d

  Xsun

           value  -------------- Distribution ------------ count    
               4 |                                         0        
               5 |@                                        1        
               6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   32       
               7 |                                         0

The output does show that the X server is waking xterm processes, clustered around the time that you switched virtual desktops. If you wanted to understand the interaction between the X server and the xterm processes, you could aggregate on user stack traces when the X server fires the wakeup probe.

Understanding the performance of client/server systems like the X windowing system requires understanding the clients on whose behalf the server is doing work. This kind of question is difficult to answer with conventional performance analysis tools. However, if you have a model where a client sends a message to the server and sleeps pending the server's processing, you can use the wakeup probe to determine the client for whom the request is being performed, as shown in the following example:

self int last;

sched:::wakeup
/self->last && args[0]->pr_stype == SOBJ_CV/
{
        @[stringof(args[1]->pr_fname)] = sum(vtimestamp - self->last);
        self->last = 0;
}

sched:::wakeup
/execname == "Xsun" && self->last == 0/
{
        self->last = vtimestamp;
}

Running the preceding script results in output similar to the following example:

dtrace -s ./xwork.d
dtrace: script './xwork.d' matched 14 probes
^C
  xterm                                                       9522510
  soffice.bin                                                 9912594
  fvwm2                                                     100423123
  MozillaFirebird                                           312227077
  acroread                                                  345901577

This output reveals that much Xsun work is being done on behalf of the processes acroread, MozillaFirebird and, to a lesser degree, fvwm2. Notice that the script only examined wake-ups from condition variable synchronization objects SOBJ_CV. As described in proc pr_stype Values, condition variables are the type of synchronization object typically used to synchronize for reasons other than access to a shared data region. In the case of the X server, a client will wait for data in a pipe by sleeping on a condition variable.

You can additionally use the sleep probe along with the wakeup probe to understand which applications are blocking on which applications, and for how long, as shown in the following example:

#pragma D option quiet

sched:::sleep
/!(curlwpsinfo->pr_flag & PR_ISSYS) && curlwpsinfo->pr_stype == SOBJ_CV/
{
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        @[stringof(args[1]->pr_fname), execname] =
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

END
{
        printa("%s sleeping on %s:\n%@d\n", @);
}

The tail of the output from running the example script for several seconds on a desktop system resembles the following example:

# dtrace -s ./whofor.d
^C
...
 xterm sleeping on Xsun:

           value  -------------- Distribution ------------ count    
          131072 |                                         0        
          262144 |                                         12       
          524288 |                                         2        
         1048576 |                                         0        
         2097152 |                                         5        
         4194304 |@@@                                      45       
         8388608 |                                         1        
        16777216 |                                         9        
        33554432 |@@@@@                                    83       
        67108864 |@@@@@@@@@@@                              164      
       134217728 |@@@@@@@@@@                               147      
       268435456 |@@@@                                     56       
       536870912 |@                                        17       
      1073741824 |                                         9        
      2147483648 |                                         1        
      4294967296 |                                         3        
      8589934592 |                                         1        
     17179869184 |                                         0        

fvwm2 sleeping on Xsun:

           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |@@@@@@@@@@@@@@@@@@@@@@                   67       
          131072 |@@@@@                                    16       
          262144 |@@                                       6        
          524288 |@                                        3        
         1048576 |@@@@@                                    15       
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         1        
        16777216 |                                         0        
        33554432 |                                         0        
        67108864 |                                         1        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |                                         1        
      1073741824 |                                         1        
      2147483648 |                                         2        
      4294967296 |                                         2        
      8589934592 |                                         2        
     17179869184 |                                         0        
     34359738368 |                                         2        
     68719476736 |                                         0        

syslogd sleeping on syslogd:

           value  -------------- Distribution ------------ count    
     17179869184 |                                         0        
     34359738368 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3        
     68719476736 |                                         0        

MozillaFirebird sleeping on MozillaFirebird:

           value  -------------- Distribution ------------ count    
           65536 |                                         0        
          131072 |                                         3        
          262144 |@@                                       14       
          524288 |                                         0        
         1048576 |@@@                                      18       
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         1        
        16777216 |                                         0        
        33554432 |                                         1        
        67108864 |                                         3        
       134217728 |@                                        7        
       268435456 |@@@@@@@@@@                               53       
       536870912 |@@@@@@@@@@@@@@                           78       
      1073741824 |@@@@                                     25       
      2147483648 |                                         0        
      4294967296 |                                         0        
      8589934592 |@                                        7        
     17179869184 |                                         0

You might want to understand how and why MozillaFirebird is blocking on itself. You could modify the preceding script as shown in the following example to answer this question:

#pragma D option quiet

sched:::sleep
/execname == "MozillaFirebird" && curlwpsinfo->pr_stype == SOBJ_CV/
{
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/execname == "MozillaFirebird" && bedtime[args[0]->pr_addr]/
{
        @[args[1]->pr_pid, args[0]->pr_lwpid, pid, curlwpsinfo->pr_lwpid] = 
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        bedtime[args[0]->pr_addr] = 0;
}

END
{
        printa("%d/%d sleeping on %d/%d:\n%@d\n", @);
}

Running the modified script for several seconds results in output similar to the following example:

# dtrace -s ./firebird.d
^C

 100459/1 sleeping on 100459/13:

           value  -------------- Distribution ------------ count    
          262144 |                                         0        
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
         1048576 |                                         0        

100459/13 sleeping on 100459/1:

           value  -------------- Distribution ------------ count    
        16777216 |                                         0        
        33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
        67108864 |                                         0        

100459/1 sleeping on 100459/2:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@                                     5        
           65536 |@                                        2        
          131072 |@@@@@                                    6        
          262144 |                                         1        
          524288 |@                                        2        
         1048576 |                                         0        
         2097152 |@@                                       3        
         4194304 |@@@@                                     5        
         8388608 |@@@@@@@@                                 9        
        16777216 |@@@@@                                    6        
        33554432 |@@                                       3        
        67108864 |                                         0        

100459/1 sleeping on 100459/5:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@@                                    12       
           65536 |@@                                       5        
          131072 |@@@@@@                                   15       
          262144 |                                         1        
          524288 |                                         1        
         1048576 |                                         2        
         2097152 |@                                        4        
         4194304 |@@@@@                                    13       
         8388608 |@@@                                      8        
        16777216 |@@@@@                                    13       
        33554432 |@@                                       6        
        67108864 |@@                                       5        
       134217728 |@                                        4        
       268435456 |                                         0        
       536870912 |                                         1        
      1073741824 |                                         0        

100459/2 sleeping on 100459/1:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@                           11       
           65536 |                                         0        
          131072 |@@                                       2        
          262144 |                                         0        
          524288 |                                         0        
         1048576 |@@@@                                     3        
         2097152 |@                                        1        
         4194304 |@@                                       2        
         8388608 |@@                                       2        
        16777216 |@                                        1        
        33554432 |@@@@@@                                   5        
        67108864 |                                         0        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |@                                        1        
      1073741824 |@                                        1        
      2147483648 |@                                        1        
      4294967296 |                                         0        

100459/5 sleeping on 100459/1:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |                                         1        
           65536 |                                         2        
          131072 |                                         4        
          262144 |                                         7        
          524288 |                                         1        
         1048576 |                                         5        
         2097152 |                                         10       
         4194304 |@@@@@@                                   77       
         8388608 |@@@@@@@@@@@@@@@@@@@@@@@                  270      
        16777216 |@@@                                      43       
        33554432 |@                                        20       
        67108864 |@                                        14       
       134217728 |                                         5        
       268435456 |                                         2        
       536870912 |                                         1        
      1073741824 |                                         0

You can also use the sleep and wakeup probes to understand the performance of door servers such as the name service cache daemon, as shown in the following example:

sched:::sleep
/curlwpsinfo->pr_stype == SOBJ_SHUTTLE/
{
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/execname == "nscd" && bedtime[args[0]->pr_addr]/
{
        @[stringof(curpsinfo->pr_fname), stringof(args[1]->pr_fname)] =
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        bedtime[args[0]->pr_addr] = 0;
}

The tail of the output from running the preceding script on a large mail server resembles the following example:

imapd
           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |                                         2        
           65536 |@@@@@@@@@@@@@@@@@                        57       
          131072 |@@@@@@@@@@@                              37       
          262144 |                                         3        
          524288 |@@@                                      11       
         1048576 |@@@                                      10       
         2097152 |@@                                       9        
         4194304 |                                         1        
         8388608 |                                         0        

  mountd                                            
           value  -------------- Distribution ------------ count    
           65536 |                                         0        
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            49       
          262144 |@@@                                      6        
          524288 |                                         1        
         1048576 |                                         0        
         2097152 |                                         0        
         4194304 |@@@@                                     7        
         8388608 |@                                        3        
        16777216 |                                         0        

  sendmail
           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@                                        18       
           65536 |@@@@@@@@@@@@@@@@@                        205      
          131072 |@@@@@@@@@@@@@                            154      
          262144 |@                                        23       
          524288 |                                         5        
         1048576 |@@@@                                     50       
         2097152 |                                         7        
         4194304 |                                         5        
         8388608 |                                         2        
        16777216 |                                         0        

  automountd                                        
           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |@@@@@@@@@@                               22       
          131072 |@@@@@@@@@@@@@@@@@@@@@@@                  51       
          262144 |@@                                       6        
          524288 |                                         1        
         1048576 |                                         0        
         2097152 |                                         2        
         4194304 |                                         2        
         8388608 |                                         1        
        16777216 |                                         1        
        33554432 |                                         1        
        67108864 |                                         0        
       134217728 |                                         0        
       268435456 |                                         1        
       536870912 |                                         0

You might be interested in the unusual data points for automountd or the persistent data point at over one millisecond for sendmail. You can add additional predicates to the preceding script to hone in on the causes of any exceptional or anomalous.