Solaris Dynamic Tracing Guide

enqueue and dequeue

When a CPU becomes idle, the dispatcher looks for work enqueued on other (non-idle) CPUs. The following example uses the dequeue probe to understand how often applications are transferred and by which CPU:

#pragma D option quiet

sched:::dequeue
/args[2]->cpu_id != --1 && cpu != args[2]->cpu_id &&
    (curlwpsinfo->pr_flag & PR_IDLE)/
{
	@[stringof(args[1]->pr_fname), args[2]->cpu_id] =
	    lquantize(cpu, 0, 100);
}
END
{
	printa("%s stolen from CPU %d by:\n%@d\n", @);
}

The tail of the output from running the above script on a 4 CPU system results in output similar to the following example:


# dtrace -s ./whosteal.d
^C
...
 nscd stolen from CPU 1 by:

           value  -------------- Distribution ------------ count    
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 28       
               3 |                                         0        

snmpd stolen from CPU 1 by:

           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@                                        1        
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     31       
               3 |@@                                       2        
               4 |                                         0        

sched stolen from CPU 1 by:

           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@                                       3        
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         36       
               3 |@@@@                                     5        
               4 |                                         0

Instead of knowing which CPUs took which work, you might want to know the CPUs on which processes and threads are waiting to run. You can use the enqueue and dequeue probes together to answer this question:

sched:::enqueue
{
	self->ts = timestamp;
}

sched:::dequeue
/self->ts/
{
	@[args[2]->cpu_id] = quantize(timestamp - self->ts);
	self->ts = 0;
}

Running the above script for several seconds results in output similar to the following example:


# dtrace -s ./qtime.d
dtrace: script './qtime.d' matched 5 probes
^C
       -1
           value  -------------- Distribution ------------ count    
            4096 |                                         0        
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2        
           16384 |                                         0        

        0
           value  -------------- Distribution ------------ count    
            1024 |                                         0        
            2048 |@@@@@@@@@@@@@@@                          262      
            4096 |@@@@@@@@@@@@@                            227      
            8192 |@@@@@                                    87       
           16384 |@@@                                      54       
           32768 |                                         7        
           65536 |                                         9        
          131072 |                                         1        
          262144 |                                         5        
          524288 |                                         4        
         1048576 |                                         2        
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         0        
        16777216 |                                         1        
        33554432 |                                         2        
        67108864 |                                         2        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |                                         0        
      1073741824 |                                         1        
      2147483648 |                                         1        
      4294967296 |                                         0        

        1
           value  -------------- Distribution ------------ count    
            1024 |                                         0        
            2048 |@@@@                                     49       
            4096 |@@@@@@@@@@@@@@@@@@@@                     241      
            8192 |@@@@@@@                                  91       
           16384 |@@@@                                     55       
           32768 |                                         7        
           65536 |                                         3        
          131072 |                                         2        
          262144 |                                         1        
          524288 |                                         0        
         1048576 |                                         0        
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         0        
        16777216 |                                         0        
        33554432 |                                         3        
        67108864 |                                         1        
       134217728 |                                         4        
       268435456 |                                         2        
       536870912 |                                         0        
      1073741824 |                                         3        
      2147483648 |                                         2        
      4294967296 |                                         0

Notice the non-zero values at the bottom of the example output. These data points reveal several instances on both CPUs where a thread was enqueued to run for several seconds.

Instead of looking at wait times, you might want to examine the length of the run queue over time. Using the enqueue and dequeue probes, you can set up an associative array to track the queue length:

sched:::enqueue
{
	this->len = qlen[args[2]->cpu_id]++;
	@[args[2]->cpu_id] = lquantize(this->len, 0, 100);
}

sched:::dequeue
/qlen[args[2]->cpu_id]/
{
	qlen[args[2]->cpu_id]—;
}

Running the above script for approximately 30 seconds on a largely idle uniprocessor laptop system results in output similar to the following example:


# dtrace -s ./qlen.d
dtrace: script './qlen.d' matched 5 probes
^C
        0
           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@                110626   
               1 |@@@@@@@@@                                41142    
               2 |@@                                       12655    
               3 |@                                        5074     
               4 |                                         1722     
               5 |                                         701      
               6 |                                         302      
               7 |                                         63       
               8 |                                         23       
               9 |                                         12       
              10 |                                         24       
              11 |                                         58       
              12 |                                         14       
              13 |                                         3        
              14 |                                         0

The output is roughly what you would expect for an idle system: the majority of the time that a runnable thread is enqueued, the run queue was very short (three or fewer threads in length). However, given that the system was largely idle, the exceptional data points at the bottom of the table might be unexpected. For example, why was the run queue as long as 13 runnable threads? To explore this question, you could write a D script that displays the contents of the run queue when the length of the run queue is long. This problem is complicated because D enablings cannot iterate over data structures, and therefore cannot simply iterate over the entire run queue. Even if D enablings could do so, you should avoid dependencies on the kernel's internal data structures.

For this type of script, you would enable the enqueue and dequeue probes and use both speculations and associative arrays. Whenever a thread is enqueued, the script increments the length of the queue and records the timestamp in an associative array keyed by the thread. You cannot use a thread-local variable in this case because a thread might be enqueued by another thread. The script then checks to see if the queue length exceeds the maximum. If it does, the script starts a new speculation, and records the timestamp and the new maximum. Then, when a thread is dequeued, the script compares the enqueue timestamp to the timestamp of the longest length: if the thread was enqueued before the timestamp of the longest length, the thread was in the queue when the longest length was recorded. In this case, the script speculatively traces the thread's information. Once the kernel dequeues the last thread that was enqueued at the timestamp of the longest length, the script commits the speculation data. This script is shown below:

#pragma D option quiet
#pragma D option nspec=4
#pragma D option specsize=100k

int maxlen;
int spec[int];

sched:::enqueue
{
	this->len = ++qlen[this->cpu = args[2]->cpu_id];
	in[args[0]->pr_addr] = timestamp;
}

sched:::enqueue
/this->len > maxlen && spec[this->cpu]/
{
	/*
	 * There is already a speculation for this CPU.  We just set a new
	 * record, so we'll discard the old one.
	 */
	discard(spec[this->cpu]);
}

sched:::enqueue
/this->len > maxlen/
{
	/*
	 * We have a winner.  Set the new maximum length and set the timestamp
	 * of the longest length.
	 */
	maxlen = this->len;
	longtime[this->cpu] = timestamp;	

	/*
	 * Now start a new speculation, and speculatively trace the length.
	 */
	this->spec = spec[this->cpu] = speculation();
	speculate(this->spec);
	printf("Run queue of length %d:\n", this->len);
}

sched:::dequeue
/(this->in = in[args[0]->pr_addr]) &&
    this->in <= longtime[this->cpu = args[2]->cpu_id]/
{
	speculate(spec[this->cpu]);
	printf("  %d/%d (%s)\n", 
	    args[1]->pr_pid, args[0]->pr_lwpid,
	    stringof(args[1]->pr_fname));
}

sched:::dequeue
/qlen[args[2]->cpu_id]/
{
	in[args[0]->pr_addr] = 0;
	this->len = --qlen[args[2]->cpu_id];
}

sched:::dequeue
/this->len == 0 && spec[this->cpu]/
{
	/*
	 * We just processed the last thread that was enqueued at the time
	 * of longest length; commit the speculation, which by now contains
	 * each thread that was enqueued when the queue was longest.
	 */
	commit(spec[this->cpu]);
	spec[this->cpu] = 0;
}

Running the above script on the same uniprocessor laptop results in output similar to the following example:


# dtrace -s ./whoqueue.d
Run queue of length 3:
 0/0 (sched)
  0/0 (sched)
  101170/1 (dtrace)
Run queue of length 4:
  0/0 (sched)
  100356/1 (Xsun)
  100420/1 (xterm)
  101170/1 (dtrace)
Run queue of length 5:
  0/0 (sched)
  0/0 (sched)
  100356/1 (Xsun)
  100420/1 (xterm)
  101170/1 (dtrace)
Run queue of length 7:
  0/0 (sched)
  100221/18 (nscd)
  100221/17 (nscd)
  100221/16 (nscd)
  100221/13 (nscd)
  100221/14 (nscd)
  100221/15 (nscd)
Run queue of length 16:
  100821/1 (xterm)
  100768/1 (xterm)
  100365/1 (fvwm2)
  101118/1 (xterm)
  100577/1 (xterm)
  101170/1 (dtrace)
  101020/1 (xterm)
  101089/1 (xterm)
  100795/1 (xterm)
  100741/1 (xterm)
  100710/1 (xterm)
  101048/1 (xterm)
  100697/1 (MozillaFirebird-)
  100420/1 (xterm)
  100394/1 (xterm)
  100368/1 (xterm)
^C

The output reveals that the long run queues are due to many runnable xterm processes. This experiment coincided with a change in virtual desktop, and therefore the results are probably due to some sort of X event processing.