11.8.3.2 enqueue and dequeue Probes

You might want to know on which CPUs processes and threads are waiting to run. You can use the enqueue probe along with the dequeue probe to answer this question. Type the following source code and save it in a file named qtime.d:

sched:::enqueue
{
  a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id] =
  timestamp;
}

sched:::dequeue
/a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id]/
{
  @[args[2]->cpu_id] = quantize(timestamp -
    a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id]);
  a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id] = 0;
}

Running the previous script for several seconds results in output that is similar to the following:

# dtrace -s qtime.d 
dtrace: script 'qtime.d' matched 16 probes
^C

        1
           value  ------------- Distribution ------------- count    
            8192 |                                         0        
           16384 |                                         1        
           32768 |@                                        47       
           65536 |@@@@@@@                                  365      
          131072 |@@@@@@@@@@@@                             572      
          262144 |@@@@@@@@@@@@                             570      
          524288 |@@@@@@@                                  354      
         1048576 |@                                        57       
         2097152 |                                         7        
         4194304 |                                         1        
         8388608 |                                         1        
        16777216 |                                         0        

        0
           value  ------------- Distribution ------------- count    
            8192 |                                         0        
           16384 |                                         6        
           32768 |@                                        49       
           65536 |@@@@@                                    261      
          131072 |@@@@@@@@@@@@@                            753      
          262144 |@@@@@@@@@@@@                             704      
          524288 |@@@@@@@@                                 455      
         1048576 |@                                        74       
         2097152 |                                         9        
         4194304 |                                         2        
         8388608 |                                         0

Rather than looking at wait times, you might want to examine the length of the run queue over time. Using the enqueue and dequeue probes, you can set up an associative array to track the queue length. Type the following source code and save it in a file named qlen.d:

sched:::enqueue
{
  this->len = qlen[args[2]->cpu_id]++;
  @[args[2]->cpu_id] = lquantize(this->len, 0, 100);
}

sched:::dequeue
/qlen[args[2]->cpu_id]/
{
  qlen[args[2]->cpu_id]--;
}

Running the previous script on a largely idle dual-core processor system for approximately 30 seconds results in output that is similar to the following:

# dtrace -s qlen.d 
dtrace: script 'qlen.d' matched 16 probes
^C

        1
           value  ------------- Distribution ------------- count    
             < 0 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        8124     
               1 |@@@@@@                                   1558     
               2 |@                                        160      
               3 |                                         51       
               4 |                                         24       
               5 |                                         13       
               6 |                                         11       
               7 |                                         9        
               8 |                                         6        
               9 |                                         0        

        0
           value  ------------- Distribution ------------- count    
             < 0 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           8569     
               1 |@@@@@@@@@                                2429     
               2 |@                                        292      
               3 |                                         25       
               4 |                                         8        
               5 |                                         5        
               6 |                                         4        
               7 |                                         4        
               8 |                                         1        
               9 |                                         0

The output is roughly what you would expect for an idle system: the majority of the time that a runnable thread is enqueued, the run queues were very short (three or fewer threads in length). However, given that the system was largely idle, the exceptional data points at the bottom of each table might be unexpected. For example, why were the run queues as long as 8 runnable threads? To explore this question further, you could write a D script that displays the contents of the run queue when the length of the run queue is long. This problem is complicated because D enablings cannot iterate over data structures, and therefore cannot simply iterate over the entire run queue. Even if D enablings could do so, you should avoid dependencies on the kernel's internal data structures.

For this type of script, you would enable the enqueue and dequeue probes and then use both speculations and associative arrays. Whenever a thread is enqueued, the script increments the length of the queue and records the timestamp in an associative array keyed by the thread. You cannot use a thread-local variable in this case because a thread might be enqueued by another thread. The script then checks to see if the queue length exceeds the maximum, and if so, the script starts a new speculation, and records the timestamp and the new maximum. Then, when a thread is dequeued, the script compares the enqueue timestamp to the timestamp of the longest length: if the thread was enqueued before the timestamp of the longest length, the thread was in the queue when the longest length was recorded. In this case, the script speculatively traces the thread's information. When the kernel dequeues the last thread that was enqueued at the timestamp of the longest length, the script commits the speculation data. Type the following source code and save it in a file named whoqueue.d:

#pragma D option quiet
#pragma D option nspec=4
#pragma D option specsize=100k

int maxlen;
int spec[int];
sched:::enqueue
{
  this->len = ++qlen[this->cpu = args[2]->cpu_id];
  in[args[0]->pr_addr] = timestamp;
}

sched:::enqueue
/this->len > maxlen && spec[this->cpu]/
{
  /*
   * There is already a speculation for this CPU. We just set a new
   * record, so we’ll discard the old one.
   */
  discard(spec[this->cpu]);
}

sched:::enqueue
/this->len > maxlen/
{
  /*
   * We have a winner. Set the new maximum length and set the timestamp
   * of the longest length.
   */
  maxlen = this->len;
  longtime[this->cpu] = timestamp;
  /*
   * Now start a new speculation, and speculatively trace the length.
   */
  this->spec = spec[this->cpu] = speculation();
  speculate(this->spec);
  printf("Run queue of length %d:\n", this->len);
}

sched:::dequeue
/(this->in = in[args[0]->pr_addr]) &&
  this->in <= longtime[this->cpu = args[2]->cpu_id]/
{
  speculate(spec[this->cpu]);
  printf(" %d/%d (%s)\n",
    args[1]->pr_pid, args[0]->pr_lwpid,
    stringof(args[1]->pr_fname));
}

sched:::dequeue
/qlen[args[2]->cpu_id]/
{
  in[args[0]->pr_addr] = 0;
  this->len = --qlen[args[2]->cpu_id];
}

sched:::dequeue
/this->len == 0 && spec[this->cpu]/
{
  /*
   * We just processed the last thread that was enqueued at the time
   * of longest length; commit the speculation, which by now contains
   * each thread that was enqueued when the queue was longest.
   */
  commit(spec[this->cpu]);
  spec[this->cpu] = 0;
}

Running the previous script on the same system results in output that is similar to the following:

# dtrace -s whoqueue.d
Run queue of length 1:
 2850/2850 (java)
Run queue of length 2:
 4034/4034 (kworker/0:1)
 16/16 (sync_supers)
Run queue of length 3:
 10/10 (ksoftirqd/1)
 1710/1710 (hald-addon-inpu)
 25350/25350 (dtrace)
Run queue of length 4:
 2852/2852 (java)
 2850/2850 (java)
 1710/1710 (hald-addon-inpu)
 2099/2099 (Xorg)
Run queue of length 5:
 3149/3149 (notification-da)
 2417/2417 (gnome-settings-)
 2437/2437 (gnome-panel)
 2461/2461 (wnck-applet)
 2432/2432 (metacity)
Run queue of length 9:
 3685/3685 (firefox)
 3149/3149 (notification-da)
 2417/2417 (gnome-settings-)
 2437/2437 (gnome-panel)
 2852/2852 (java)
 2452/2452 (nautilus)
 2461/2461 (wnck-applet)
 2432/2432 (metacity)
 2749/2749 (gnome-terminal)
^C