Oracle® Solaris 11.3 DTrace (Dynamic Tracing) Guide

Updated: July 2018
 
 

sched Provider

The sched provider makes available probes related to CPU scheduling. Because CPUs are the one resource that all threads must consume, the sched provider is useful for understanding systemic behavior. For example, you can use the sched provider to understand when and why threads sleep, run, change priority, or wake other threads.
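
For example, you can list every probe that the sched provider publishes on your system by using the -l and -P options of the dtrace(1M) command:

# dtrace -l -P sched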

sched Probes

The sched probes are described in the following table.

Table 55  List of sched Probes
change-pri
    Fires whenever a thread's priority is about to be changed. The lwpsinfo_t of the thread is pointed to by args[0]. The thread's current priority is in the pr_pri field of this structure. The psinfo_t of the process containing the thread is pointed to by args[1]. The thread's new priority is contained in args[2].

cpucaps-sleep
    Fires immediately before the current thread is placed on a wait queue. The lwpsinfo_t of the waiting thread is pointed to by args[0]. The psinfo_t of the process containing the waiting thread is pointed to by args[1].

cpucaps-wakeup
    Fires immediately after a thread is removed from a wait queue. The lwpsinfo_t of the thread being removed is pointed to by args[0]. The psinfo_t of the process containing that thread is pointed to by args[1].

dequeue
    Fires immediately before a runnable thread is dequeued from a run queue. The lwpsinfo_t of the thread being dequeued is pointed to by args[0]. The psinfo_t of the process containing the thread is pointed to by args[1]. The cpuinfo_t of the CPU from which the thread is being dequeued is pointed to by args[2]. If the thread is being dequeued from a run queue that is not associated with a particular CPU, the cpu_id member of this structure will be -1.

enqueue
    Fires immediately before a runnable thread is enqueued to a run queue. The lwpsinfo_t of the thread being enqueued is pointed to by args[0]. The psinfo_t of the process containing the thread is pointed to by args[1]. The cpuinfo_t of the CPU to which the thread is being enqueued is pointed to by args[2]. If the thread is being enqueued to a run queue that is not associated with a particular CPU, the cpu_id member of this structure will be -1. The value in args[3] is a boolean indicating whether the thread will be enqueued at the front of the run queue: the value is non-zero if the thread will be enqueued at the front of the run queue, and zero if the thread will be enqueued at the back of the run queue.

off-cpu
    Fires when the current CPU is about to end execution of a thread. The curcpu variable indicates the current CPU. The curlwpsinfo variable indicates the thread that is ending execution. The curpsinfo variable describes the process containing the current thread. The lwpsinfo_t structure of the thread that the current CPU will next execute is pointed to by args[0]. The psinfo_t of the process containing the next thread is pointed to by args[1].

on-cpu
    Fires when a CPU has just begun execution of a thread. The curcpu variable indicates the current CPU. The curlwpsinfo variable indicates the thread that is beginning execution. The curpsinfo variable describes the process containing the current thread.

preempt
    Fires immediately before the current thread is preempted. After this probe fires, the current thread will select a thread to run and the off-cpu probe will fire for the current thread. In some cases, a thread on one CPU will be preempted, but the preempting thread will run on another CPU in the meantime. In this situation, the preempt probe will fire, but the dispatcher will be unable to find a higher priority thread to run and the remain-cpu probe will fire instead of the off-cpu probe.

remain-cpu
    Fires when a scheduling decision has been made, but the dispatcher has elected to continue to run the current thread. The curcpu variable indicates the current CPU. The curlwpsinfo variable indicates the thread that is beginning execution. The curpsinfo variable describes the process containing the current thread.

schedctl-nopreempt
    Fires when a thread is preempted and then re-enqueued at the front of the run queue due to a preemption control request. See schedctl_init(3C) for details on preemption control. As with preempt, either off-cpu or remain-cpu will fire after schedctl-nopreempt. Because schedctl-nopreempt denotes a re-enqueuing of the current thread at the front of the run queue, remain-cpu is more likely to fire after schedctl-nopreempt than off-cpu. The lwpsinfo_t of the thread being preempted is pointed to by args[0]. The psinfo_t of the process containing the thread is pointed to by args[1].

schedctl-preempt
    Fires when a thread that is using preemption control is nonetheless preempted and re-enqueued at the back of the run queue. See schedctl_init(3C) for details on preemption control. As with preempt, either off-cpu or remain-cpu will fire after schedctl-preempt. Unlike schedctl-nopreempt, schedctl-preempt denotes a re-enqueuing of the current thread at the back of the run queue. As a result, off-cpu is more likely to fire after schedctl-preempt than remain-cpu. The lwpsinfo_t of the thread being preempted is pointed to by args[0]. The psinfo_t of the process containing the thread is pointed to by args[1].

schedctl-yield
    Fires when a thread that had its time slice artificially extended through preemption control executes code to yield the CPU to other threads.

sleep
    Fires immediately before the current thread sleeps on a synchronization object. The type of the synchronization object is contained in the pr_stype member of the lwpsinfo_t pointed to by curlwpsinfo. The address of the synchronization object is contained in the pr_wchan member of the lwpsinfo_t pointed to by curlwpsinfo. The meaning of this address is a private implementation detail, but the address value may be treated as a token unique to the synchronization object.

surrender
    Fires when a CPU has been instructed by another CPU to make a scheduling decision, often because a higher-priority thread has become runnable.

tick
    Fires as a part of clock tick-based accounting. In clock tick-based accounting, CPU accounting is performed by examining which threads and processes are running when a fixed-interval interrupt fires. The lwpsinfo_t that corresponds to the thread that is being assigned CPU time is pointed to by args[0]. The psinfo_t that corresponds to the process that contains the thread is pointed to by args[1].

wakeup
    Fires immediately before the current thread wakes a thread sleeping on a synchronization object. The lwpsinfo_t of the sleeping thread is pointed to by args[0]. The psinfo_t of the process containing the sleeping thread is pointed to by args[1]. The type of the synchronization object is contained in the pr_stype member of the lwpsinfo_t of the sleeping thread. The address of the synchronization object is contained in the pr_wchan member of the lwpsinfo_t of the sleeping thread. The meaning of this address is a private implementation detail, but the address value may be treated as a token unique to the synchronization object.
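
As an illustration of how these probe arguments are used, the following minimal sketch counts enqueue operations per CPU, using args[3] to distinguish threads placed at the front of the run queue from those placed at the back:

sched:::enqueue
{
        /* Key on the target CPU and the front/back flag in args[3]. */
        @[args[2]->cpu_id, args[3] != 0 ? "front" : "back"] = count();
}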

sched Probe Arguments

The argument types for the sched probes are listed in the following table; the arguments are described in Table 55, List of sched Probes.

Table 56  List of sched Probe Arguments
Probe                 args[0]         args[1]        args[2]        args[3]
change-pri            lwpsinfo_t *    psinfo_t *     pri_t          -
cpucaps-sleep         lwpsinfo_t *    psinfo_t *     -              -
cpucaps-wakeup        lwpsinfo_t *    psinfo_t *     -              -
dequeue               lwpsinfo_t *    psinfo_t *     cpuinfo_t *    -
enqueue               lwpsinfo_t *    psinfo_t *     cpuinfo_t *    int
off-cpu               lwpsinfo_t *    psinfo_t *     -              -
on-cpu                -               -              -              -
preempt               -               -              -              -
remain-cpu            -               -              -              -
schedctl-nopreempt    lwpsinfo_t *    psinfo_t *     -              -
schedctl-preempt      lwpsinfo_t *    psinfo_t *     -              -
schedctl-yield        lwpsinfo_t *    psinfo_t *     -              -
sleep                 -               -              -              -
surrender             lwpsinfo_t *    psinfo_t *     -              -
tick                  lwpsinfo_t *    psinfo_t *     -              -
wakeup                lwpsinfo_t *    psinfo_t *     -              -

Many sched probes have arguments consisting of a pointer to an lwpsinfo_t and a pointer to a psinfo_t, indicating a thread and the process containing that thread, respectively. These structures are described in detail in the lwpsinfo_t and psinfo_t sections. For more information, see Table 56, List of sched Probe Arguments.
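
For example, a minimal sketch that uses this argument pattern is the following script, which counts clock ticks attributed to each thread by aggregating on the process name from args[1] and the LWP ID from args[0]:

sched:::tick
{
        /* Count ticks per (process name, LWP ID) pair. */
        @[stringof(args[1]->pr_fname), args[0]->pr_lwpid] = count();
}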

sched cpuinfo_t Structure

The cpuinfo_t structure defines a CPU. Arguments to both the enqueue and dequeue probes include a pointer to a cpuinfo_t. Additionally, the cpuinfo_t corresponding to the current CPU is pointed to by the curcpu variable. The definition of the cpuinfo_t structure is as follows:

typedef struct cpuinfo {
        processorid_t cpu_id;           /* CPU identifier */
        psetid_t cpu_pset;              /* processor set identifier */
        chipid_t cpu_chip;              /* chip identifier */
        lgrp_id_t cpu_lgrp;             /* locality group identifier */
        processor_info_t cpu_info;      /* CPU information */
} cpuinfo_t;

The cpu_id member is the processor identifier, as returned by psrinfo and p_online. For more information, see the psrinfo(1M) and p_online(2) man pages.

The cpu_pset member is the processor set that contains the CPU, if any. See psrset(1M) for more details on processor sets.

The cpu_chip member is the identifier of the physical chip. Physical chips may contain several CPUs. For more information, see the psrinfo(1M) man page.

The cpu_lgrp member is the identifier of the latency group associated with the CPU. See liblgrp(3LIB) for details on latency groups.

The cpu_info member is the processor_info_t structure associated with the CPU, as returned by processor_info. For more information, see the processor_info(2) man page.
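
As a quick sketch of how these members might be used, the following script counts on-cpu events for each combination of chip and locality group by dereferencing the curcpu variable:

sched:::on-cpu
{
        /* Key on the physical chip and locality group of the current CPU. */
        @[curcpu->cpu_chip, curcpu->cpu_lgrp] = count();
}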

Using the sched Provider

This section describes sched provider examples.

sched on-cpu and off-cpu Probes

One common question you might want answered is which CPUs are running threads and for how long. You can use the on-cpu and off-cpu probes to easily answer this question on a system-wide basis as shown in the following example:

sched:::on-cpu
{
        self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
        @[cpu] = quantize(timestamp - self->ts);
        self->ts = 0;
}

Running the preceding script results in output similar to the following example:

# dtrace -s ./where.d
dtrace: script './where.d' matched 5 probes
^C

        0
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@                                       37       
            8192 |@@@@@@@@@@@@@                            212      
           16384 |@                                        30       
           32768 |                                         10       
           65536 |@                                        17       
          131072 |                                         12       
          262144 |                                         9        
          524288 |                                         6        
         1048576 |                                         5        
         2097152 |                                         1        
         4194304 |                                         3        
         8388608 |@@@@                                     75       
        16777216 |@@@@@@@@@@@@                             201      
        33554432 |                                         6        
        67108864 |                                         0        

        1
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@                                        6        
            8192 |@@@@                                     23       
           16384 |@@@                                      18       
           32768 |@@@@                                     22       
           65536 |@@@@                                     22       
          131072 |@                                        7        
          262144 |                                         5        
          524288 |                                         2        
         1048576 |                                         3        
         2097152 |@                                        9        
         4194304 |                                         4        
         8388608 |@@@                                      18       
        16777216 |@@@                                      19       
        33554432 |@@@                                      16       
        67108864 |@@@@                                     21       
       134217728 |@@                                       14       
       268435456 |                                         0

The preceding output shows that threads tend to run either for less than 100 microseconds at a stretch or for approximately 10 milliseconds; the histogram for CPU 0 shows a noticeable gap between these two clusters of data. You might also want to know which CPUs are running a particular process. You can use the on-cpu and off-cpu probes to answer this question as well. The following script displays which CPUs run a specified application over a period of ten seconds:

#pragma D option quiet

dtrace:::BEGIN
{
        start = timestamp;
}

sched:::on-cpu
/execname == $$1/
{
        self->ts = timestamp;
}

sched:::off-cpu
/self->ts/
{
        @[cpu] = sum(timestamp - self->ts);
        self->ts = 0;
}

profile:::tick-1sec
/++x == 10/
{
        exit(0);
}

dtrace:::END
{
        printf("CPU distribution over %d seconds:\n\n",
            (timestamp - start) / 1000000000);
        printf("CPU microseconds\n--- ------------\n");
        normalize(@, 1000);
        printa("%3d %@d\n", @);
}

Running the preceding script on a large mail server and specifying the IMAP daemon results in output similar to the following example:

# dtrace -s ./whererun.d imapd
CPU distribution over 10 seconds:

CPU microseconds
--- ------------
 15 10102
 12 16377
 21 25317
 19 25504
 17 35653
 13 41539
 14 46669
 20 57753
 22 70088
 16 115860
 23 127775
 18 160517

Oracle Solaris takes into account the amount of time that a thread has been sleeping when selecting a CPU on which to run the thread: a thread that has been sleeping for less time tends not to migrate. You can use the off-cpu and on-cpu probes to observe this behavior:

sched:::off-cpu
/curlwpsinfo->pr_state == SSLEEP/
{
        self->cpu = cpu;
        self->ts = timestamp;
}

sched:::on-cpu
/self->ts/
{
        @[self->cpu == cpu ?
            "sleep time, no CPU migration" : "sleep time, CPU migration"] =
            lquantize((timestamp - self->ts) / 1000000, 0, 500, 25);
        self->ts = 0;
        self->cpu = 0;
}

Running the preceding script for approximately 30 seconds results in output similar to the following example:

# dtrace -s ./howlong.d
dtrace: script './howlong.d' matched 5 probes
^C
 sleep time, CPU migration                         
           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@@@@@@                                  6838     
              25 |@@@@@                                    4714     
              50 |@@@                                      3108     
              75 |@                                        1304     
             100 |@                                        1557     
             125 |@                                        1425     
             150 |                                         894      
             175 |@                                        1526     
             200 |@@                                       2010     
             225 |@@                                       1933     
             250 |@@                                       1982     
             275 |@@                                       2051     
             300 |@@                                       2021     
             325 |@                                        1708     
             350 |@                                        1113     
             375 |                                         502      
             400 |                                         220      
             425 |                                         106      
             450 |                                         54       
             475 |                                         40       
          >= 500 |@                                        1716     

  sleep time, no CPU migration                      
           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@@@@@@@@@@@                             58413    
              25 |@@@                                      14793    
              50 |@@                                       10050    
              75 |                                         3858     
             100 |@                                        6242     
             125 |@                                        6555     
             150 |                                         3980     
             175 |@                                        5987     
             200 |@                                        9024     
             225 |@                                        9070     
             250 |@@                                       10745    
             275 |@@                                       11898    
             300 |@@                                       11704    
             325 |@@                                       10846    
             350 |@                                        6962     
             375 |                                         3292     
             400 |                                         1713     
             425 |                                         585      
             450 |                                         201      
             475 |                                         96       
          >= 500 |                                         3946

The example output shows that there are many more occurrences of non-migration than migration, and that when sleep times are longer, migrations are more likely. The distributions are noticeably different in the sub-100-millisecond range, but look very similar as the sleep times get longer. This result suggests that sleep time is not factored into the scheduling decision once a certain threshold is exceeded.

The final example using off-cpu and on-cpu shows how to use these probes along with the pr_stype field to determine why threads sleep and for how long:

sched:::off-cpu
/curlwpsinfo->pr_state == SSLEEP/
{
        /*
         * In sleeping state.  Track the sobj type.
         */
        self->sobj = curlwpsinfo->pr_stype;
        self->bedtime = timestamp;
}

sched:::off-cpu
/curlwpsinfo->pr_state == SRUN/
{
        self->bedtime = timestamp;
}

sched:::on-cpu
/self->bedtime && !self->sobj/
{
        @["preempted"] = quantize(timestamp - self->bedtime);
        self->bedtime = 0;
}

sched:::on-cpu
/self->sobj/
{
        @[self->sobj == SOBJ_MUTEX ? "kernel-level lock" :
            self->sobj == SOBJ_RWLOCK ? "rwlock" :
            self->sobj == SOBJ_CV ? "condition variable" :
            self->sobj == SOBJ_SEMA ? "semaphore" :
            self->sobj == SOBJ_USER ? "user-level lock" :
            self->sobj == SOBJ_USER_PI ? "user-level prio-inheriting lock" :
            self->sobj == SOBJ_SHUTTLE ? "shuttle" : "unknown"] =
            quantize(timestamp - self->bedtime);
        self->sobj = 0;
        self->bedtime = 0;
}

Running the preceding script for several seconds results in output similar to the following example:

# dtrace -s ./whatfor.d
dtrace: script './whatfor.d' matched 12 probes
^C
 kernel-level lock
           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@@@@@                                 3        
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            11       
          131072 |@@                                       1        
          262144 |                                         0        

  preempted
           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |                                         4        
           65536 |@@@@@@@@                                 408      
          131072 |@@@@@@@@@@@@@@@@@@@@@@                   1031     
          262144 |@@@                                      156      
          524288 |@@                                       116      
         1048576 |@                                        51       
         2097152 |                                         42       
         4194304 |                                         16       
         8388608 |                                         15       
        16777216 |                                         4        
        33554432 |                                         8        
        67108864 |                                         0        

  semaphore
           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |@@                                       61       
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@                 553      
          262144 |@@                                       63       
          524288 |@                                        36       
         1048576 |                                         7        
         2097152 |                                         22       
         4194304 |@                                        44       
         8388608 |@@@                                      84       
        16777216 |@                                        36       
        33554432 |                                         3        
        67108864 |                                         6        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |                                         0        
      1073741824 |                                         0        
      2147483648 |                                         0        
      4294967296 |                                         0        
      8589934592 |                                         0        
     17179869184 |                                         1        
     34359738368 |                                         0        

  shuttle                                           
           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |@@@@@                                    2        
          131072 |@@@@@@@@@@@@@@@@                         6        
          262144 |@@@@@                                    2        
          524288 |                                         0        
         1048576 |                                         0        
         2097152 |                                         0        
         4194304 |@@@@@                                    2        
         8388608 |                                         0        
        16777216 |                                         0        
        33554432 |                                         0        
        67108864 |                                         0        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |                                         0        
      1073741824 |                                         0        
      2147483648 |                                         0        
      4294967296 |@@@@@                                    2        
      8589934592 |                                         0        
     17179869184 |@@                                       1        
     34359738368 |                                         0        

  condition variable                                
           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |                                         122      
          131072 |@@@@@                                    1579     
          262144 |@                                        340      
          524288 |                                         268      
         1048576 |@@@                                      1028     
         2097152 |@@@                                      1007     
         4194304 |@@@                                      1176     
         8388608 |@@@@                                     1257     
        16777216 |@@@@@@@@@@@@@@                           4385     
        33554432 |                                         295      
        67108864 |                                         157      
       134217728 |                                         96       
       268435456 |                                         48       
       536870912 |                                         144      
      1073741824 |                                         10       
      2147483648 |                                         22       
      4294967296 |                                         18       
      8589934592 |                                         5        
     17179869184 |                                         6        
     34359738368 |                                         4        
     68719476736 |                                         0

sched CPU Examples

When a CPU becomes idle, the dispatcher looks for work enqueued on other non-idle CPUs. The following example uses the dequeue probe to understand how often applications are transferred and by which CPU:

#pragma D option quiet

sched:::dequeue
/args[2]->cpu_id != -1 && cpu != args[2]->cpu_id &&
    (curlwpsinfo->pr_flag & PR_IDLE)/
{
        @[stringof(args[1]->pr_fname), args[2]->cpu_id] =
            lquantize(cpu, 0, 100);
}

END
{
        printa("%s stolen from CPU %d by:\n%@d\n", @);
}

The tail of the output from running the preceding script on a four-CPU system resembles the following example:

# dtrace -s ./whosteal.d
^C
...
 nscd stolen from CPU 1 by:

           value  -------------- Distribution ------------ count    
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 28       
               3 |                                         0        

snmpd stolen from CPU 1 by:

           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@                                        1        
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     31       
               3 |@@                                       2        
               4 |                                         0        

sched stolen from CPU 1 by:

           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@                                       3        
               1 |                                         0        
               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         36       
               3 |@@@@                                     5        
               4 |                                         0

Instead of knowing which CPUs took which work, you might want to know the CPUs on which processes and threads are waiting to run. You can use the enqueue and dequeue probes together to answer this question:

sched:::enqueue
{
      a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id] =
          timestamp;
}

sched:::dequeue
/a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id]/
{
      @[args[2]->cpu_id] = quantize(timestamp -
          a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id]);
      a[args[0]->pr_lwpid, args[1]->pr_pid, args[2]->cpu_id] = 0;
}

Running the preceding script for several seconds results in output similar to the following example:

# dtrace -s ./qtime.d
dtrace: script './qtime.d' matched 5 probes
^C

        1
           value  ------------- Distribution ------------- count    
            1024 |                                         0        
            2048 |                                         10       
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@               4316     
            8192 |@@@@@@@                                  1115     
           16384 |@@@                                      549      
           32768 |@@                                       337      
           65536 |@@                                       330      
          131072 |                                         13       
          262144 |                                         6        
          524288 |                                         4        
         1048576 |                                         2        
         2097152 |                                         1        
         4194304 |                                         0        

        0
           value  ------------- Distribution ------------- count    
            1024 |                                         0        
            2048 |                                         22       
            4096 |@@@@@@@@@@@@@@@@                         2747     
            8192 |@@@@@@@                                  1205     
           16384 |@@@@@@@@@@@@                             1942     
           32768 |@@@                                      469      
           65536 |@@                                       304      
          131072 |                                         28       
          262144 |                                         16       
          524288 |                                         5        
         1048576 |                                         1        
         2097152 |                                         2        
         4194304 |                                         1        
         8388608 |                                         0

Instead of looking at wait times, you might want to examine the length of the run queue over time. Using the enqueue and dequeue probes, you can set up an associative array to track the queue length:

sched:::enqueue
{
        this->len = qlen[args[2]->cpu_id]++;
        @[args[2]->cpu_id] = lquantize(this->len, 0, 100);
}

sched:::dequeue
/qlen[args[2]->cpu_id]/
{
        qlen[args[2]->cpu_id]--;
}

Running the preceding script for approximately 30 seconds on a largely idle uniprocessor system results in output similar to the following example:

# dtrace -s ./qlen.d
dtrace: script './qlen.d' matched 5 probes
^C
        0
           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@                110626   
               1 |@@@@@@@@@                                41142    
               2 |@@                                       12655    
               3 |@                                        5074     
               4 |                                         1722     
               5 |                                         701      
               6 |                                         302      
               7 |                                         63       
               8 |                                         23       
               9 |                                         12       
              10 |                                         24       
              11 |                                         58       
              12 |                                         14       
              13 |                                         3        
              14 |                                         0

The output is roughly what you would expect for an idle system: the majority of the time that a runnable thread is enqueued, the run queue was very short, three or fewer threads in length. However, given that the system was largely idle, the exceptional data points at the bottom of the table might be unexpected. For example, why was the run queue as long as 13 runnable threads? To explore this question, you could write a D script that displays the contents of the run queue when the length of the run queue is long. This problem is complicated because D enablings cannot iterate over data structures, and therefore cannot simply iterate over the entire run queue. Even if D enablings could do so, you should avoid dependencies on the kernel's internal data structures.

For this type of script, you would enable the enqueue and dequeue probes and use both speculations and associative arrays. Whenever a thread is enqueued, the script increments the length of the queue and records the timestamp in an associative array keyed by the thread. You cannot use a thread-local variable in this case because a thread might be enqueued by another thread. The script then checks to see if the queue length exceeds the maximum. If it does, the script starts a new speculation, and records the timestamp and the new maximum. Then, when a thread is dequeued, the script compares the enqueue timestamp to the timestamp of the longest length: if the thread was enqueued before the timestamp of the longest length, the thread was in the queue when the longest length was recorded. In this case, the script speculatively traces the thread's information. Once the kernel dequeues the last thread that was enqueued at the timestamp of the longest length, the script commits the speculation data. This script is shown below:

#pragma D option quiet
#pragma D option nspec=4
#pragma D option specsize=100k

int maxlen;
int spec[int];

sched:::enqueue
{
        this->len = ++qlen[this->cpu = args[2]->cpu_id];
        in[args[0]->pr_addr] = timestamp;
}

sched:::enqueue
/this->len > maxlen && spec[this->cpu]/
{
        /*
         * There is already a speculation for this CPU.  As a new record 
         * is set, discard the old one.
         */
        discard(spec[this->cpu]);
}

sched:::enqueue
/this->len > maxlen/
{
        /*
         * You have a winner.  Set the new maximum length and set the timestamp
         * of the longest length.
         */
        maxlen = this->len;
        longtime[this->cpu] = timestamp;        
        /*
         * Now start a new speculation, and speculatively trace the length.
         */
        this->spec = spec[this->cpu] = speculation();
        speculate(this->spec);
        printf("Run queue of length %d:\n", this->len);
}

sched:::dequeue
/(this->in = in[args[0]->pr_addr]) &&
    this->in <= longtime[this->cpu = args[2]->cpu_id]/
{
        speculate(spec[this->cpu]);
        printf("  %d/%d (%s)\n", 
            args[1]->pr_pid, args[0]->pr_lwpid,
            stringof(args[1]->pr_fname));
}

sched:::dequeue
/qlen[args[2]->cpu_id]/
{
        in[args[0]->pr_addr] = 0;
        this->len = --qlen[args[2]->cpu_id];
}

sched:::dequeue
/this->len == 0 && spec[this->cpu]/
{
        /*
         * You just processed the last thread that was enqueued at the time
         * of longest length. Commit the speculation, which by now contains
         * each thread that was enqueued when the queue was longest.
         */
        commit(spec[this->cpu]);
        spec[this->cpu] = 0;
}

Running the preceding script on the same uniprocessor results in output similar to the following example:

# dtrace -s ./whoqueue.d
Run queue of length 3:
  0/0 (sched)
  0/0 (sched)
  101170/1 (dtrace)
Run queue of length 4:
  0/0 (sched)
  100356/1 (Xsun)
  100420/1 (xterm)
  101170/1 (dtrace)
Run queue of length 5:
  0/0 (sched)
  0/0 (sched)
  100356/1 (Xsun)
  100420/1 (xterm)
  101170/1 (dtrace)
Run queue of length 7:
  0/0 (sched)
  100221/18 (nscd)
  100221/17 (nscd)
  100221/16 (nscd)
  100221/13 (nscd)
  100221/14 (nscd)
  100221/15 (nscd)
Run queue of length 16:
  100821/1 (xterm)
  100768/1 (xterm)
  100365/1 (fvwm2)
  101118/1 (xterm)
  100577/1 (xterm)
  101170/1 (dtrace)
  101020/1 (xterm)
  101089/1 (xterm)
  100795/1 (xterm)
  100741/1 (xterm)
  100710/1 (xterm)
  101048/1 (xterm)
  100697/1 (MozillaFirebird-)
  100420/1 (xterm)
  100394/1 (xterm)
  100368/1 (xterm)
^C

The output reveals that the long run queues are due to many runnable xterm processes. This experiment coincided with a change in virtual desktop, and therefore the results are probably due to some sort of X event processing.

sleep and wakeup

The final example in the previous section demonstrated that a burst in run queue length was due to runnable xterm processes. One hypothesis is that the observations resulted from a change in virtual desktop. You can use the wakeup probe to test this hypothesis by determining who is waking the xterm processes, and when, as shown in the following example:

#pragma D option quiet

dtrace:::BEGIN
{
        start = timestamp;
}

sched:::wakeup
/stringof(args[1]->pr_fname) == "xterm"/
{
        @[execname] = lquantize((timestamp - start) / 1000000000, 0, 10);
}

profile:::tick-1sec
/++x == 10/
{
        exit(0);
}

To test the hypothesis, run the preceding script, wait roughly five seconds, and switch your virtual desktop exactly once. If the burst of runnable xterm processes is due to switching the virtual desktop, the output should show a burst of wakeup activity at the five-second mark.

# dtrace -s ./xterm.d

  Xsun

           value  -------------- Distribution ------------ count    
               4 |                                         0        
               5 |@                                        1        
               6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   32       
               7 |                                         0

The output does show that the X server is waking xterm processes, clustered around the time that you switched virtual desktops. If you wanted to understand the interaction between the X server and the xterm processes, you could aggregate on user stack traces when the X server fires the wakeup probe.
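
A minimal sketch of that follow-up investigation appears below; the Xsun and xterm names are taken from this example's workload and would differ on other systems:

sched:::wakeup
/execname == "Xsun" && stringof(args[1]->pr_fname) == "xterm"/
{
        /* Aggregate on the X server's user stack at each xterm wakeup. */
        @[ustack()] = count();
}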

Understanding the performance of client/server systems like the X windowing system requires understanding the clients on whose behalf the server is doing work. This kind of question is difficult to answer with conventional performance analysis tools. However, if you have a model where a client sends a message to the server and sleeps pending the server's processing, you can use the wakeup probe to determine the client for whom the request is being performed, as shown in the following example:

self int last;

sched:::wakeup
/self->last && args[0]->pr_stype == SOBJ_CV/
{
        @[stringof(args[1]->pr_fname)] = sum(vtimestamp - self->last);
        self->last = 0;
}

sched:::wakeup
/execname == "Xsun" && self->last == 0/
{
        self->last = vtimestamp;
}

Running the preceding script results in output similar to the following example:

# dtrace -s ./xwork.d
dtrace: script './xwork.d' matched 14 probes
^C
  xterm                                                       9522510
  soffice.bin                                                 9912594
  fvwm2                                                     100423123
  MozillaFirebird                                           312227077
  acroread                                                  345901577

This output reveals that much of the Xsun work is being done on behalf of the processes acroread, MozillaFirebird and, to a lesser degree, fvwm2. Notice that the script only examined wakeups from condition variable synchronization objects (SOBJ_CV). As described in Table 50, proc pr_stype Values, condition variables are the type of synchronization object typically used to synchronize for reasons other than access to a shared data region. In the case of the X server, a client will wait for data in a pipe by sleeping on a condition variable.

You can additionally use the sleep probe along with the wakeup probe to understand which applications are blocking on which applications, and for how long, as shown in the following example:

#pragma D option quiet

sched:::sleep
/!(curlwpsinfo->pr_flag & PR_ISSYS) && curlwpsinfo->pr_stype == SOBJ_CV/
{
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        @[stringof(args[1]->pr_fname), execname] =
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

END
{
        printa("%s sleeping on %s:\n%@d\n", @);
}

The tail of the output from running the example script for several seconds on a system resembles the following example:

# dtrace -s ./whofor.d
^C
...
 xterm sleeping on Xsun:

           value  -------------- Distribution ------------ count    
          131072 |                                         0        
          262144 |                                         12       
          524288 |                                         2        
         1048576 |                                         0        
         2097152 |                                         5        
         4194304 |@@@                                      45       
         8388608 |                                         1        
        16777216 |                                         9        
        33554432 |@@@@@                                    83       
        67108864 |@@@@@@@@@@@                              164      
       134217728 |@@@@@@@@@@                               147      
       268435456 |@@@@                                     56       
       536870912 |@                                        17       
      1073741824 |                                         9        
      2147483648 |                                         1        
      4294967296 |                                         3        
      8589934592 |                                         1        
     17179869184 |                                         0        

fvwm2 sleeping on Xsun:

           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |@@@@@@@@@@@@@@@@@@@@@@                   67       
          131072 |@@@@@                                    16       
          262144 |@@                                       6        
          524288 |@                                        3        
         1048576 |@@@@@                                    15       
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         1        
        16777216 |                                         0        
        33554432 |                                         0        
        67108864 |                                         1        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |                                         1        
      1073741824 |                                         1        
      2147483648 |                                         2        
      4294967296 |                                         2        
      8589934592 |                                         2        
     17179869184 |                                         0        
     34359738368 |                                         2        
     68719476736 |                                         0        

syslogd sleeping on syslogd:

           value  -------------- Distribution ------------ count    
     17179869184 |                                         0        
     34359738368 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3        
     68719476736 |                                         0        

MozillaFirebird sleeping on MozillaFirebird:

           value  -------------- Distribution ------------ count    
           65536 |                                         0        
          131072 |                                         3        
          262144 |@@                                       14       
          524288 |                                         0        
         1048576 |@@@                                      18       
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         1        
        16777216 |                                         0        
        33554432 |                                         1        
        67108864 |                                         3        
       134217728 |@                                        7        
       268435456 |@@@@@@@@@@                               53       
       536870912 |@@@@@@@@@@@@@@                           78       
      1073741824 |@@@@                                     25       
      2147483648 |                                         0        
      4294967296 |                                         0        
      8589934592 |@                                        7        
     17179869184 |                                         0

You might want to understand how and why MozillaFirebird is blocking on itself. You could modify the preceding script as shown in the following example to answer this question:

#pragma D option quiet

sched:::sleep
/execname == "MozillaFirebird" && curlwpsinfo->pr_stype == SOBJ_CV/
{
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/execname == "MozillaFirebird" && bedtime[args[0]->pr_addr]/
{
        @[args[1]->pr_pid, args[0]->pr_lwpid, pid, curlwpsinfo->pr_lwpid] = 
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        bedtime[args[0]->pr_addr] = 0;
}

END
{
        printa("%d/%d sleeping on %d/%d:\n%@d\n", @);
}

Running the modified script for several seconds results in output similar to the following example:

# dtrace -s ./firebird.d
^C

 100459/1 sleeping on 100459/13:

           value  -------------- Distribution ------------ count    
          262144 |                                         0        
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
         1048576 |                                         0        

100459/13 sleeping on 100459/1:

           value  -------------- Distribution ------------ count    
        16777216 |                                         0        
        33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1        
        67108864 |                                         0        

100459/1 sleeping on 100459/2:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@                                     5        
           65536 |@                                        2        
          131072 |@@@@@                                    6        
          262144 |                                         1        
          524288 |@                                        2        
         1048576 |                                         0        
         2097152 |@@                                       3        
         4194304 |@@@@                                     5        
         8388608 |@@@@@@@@                                 9        
        16777216 |@@@@@                                    6        
        33554432 |@@                                       3        
        67108864 |                                         0        

100459/1 sleeping on 100459/5:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@@                                    12       
           65536 |@@                                       5        
          131072 |@@@@@@                                   15       
          262144 |                                         1        
          524288 |                                         1        
         1048576 |                                         2        
         2097152 |@                                        4        
         4194304 |@@@@@                                    13       
         8388608 |@@@                                      8        
        16777216 |@@@@@                                    13       
        33554432 |@@                                       6        
        67108864 |@@                                       5        
       134217728 |@                                        4        
       268435456 |                                         0        
       536870912 |                                         1        
      1073741824 |                                         0        

100459/2 sleeping on 100459/1:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@                           11       
           65536 |                                         0        
          131072 |@@                                       2        
          262144 |                                         0        
          524288 |                                         0        
         1048576 |@@@@                                     3        
         2097152 |@                                        1        
         4194304 |@@                                       2        
         8388608 |@@                                       2        
        16777216 |@                                        1        
        33554432 |@@@@@@                                   5        
        67108864 |                                         0        
       134217728 |                                         0        
       268435456 |                                         0        
       536870912 |@                                        1        
      1073741824 |@                                        1        
      2147483648 |@                                        1        
      4294967296 |                                         0        

100459/5 sleeping on 100459/1:

           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |                                         1        
           65536 |                                         2        
          131072 |                                         4        
          262144 |                                         7        
          524288 |                                         1        
         1048576 |                                         5        
         2097152 |                                         10       
         4194304 |@@@@@@                                   77       
         8388608 |@@@@@@@@@@@@@@@@@@@@@@@                  270      
        16777216 |@@@                                      43       
        33554432 |@                                        20       
        67108864 |@                                        14       
       134217728 |                                         5        
       268435456 |                                         2        
       536870912 |                                         1        
      1073741824 |                                         0        

You can also use the sleep and wakeup probes to understand the performance of door servers such as the name service cache daemon, as shown in the following example:

sched:::sleep
/curlwpsinfo->pr_stype == SOBJ_SHUTTLE/
{
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/execname == "nscd" && bedtime[args[0]->pr_addr]/
{
        @[stringof(curpsinfo->pr_fname), stringof(args[1]->pr_fname)] =
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        bedtime[args[0]->pr_addr] = 0;
}

The tail of the output from running the preceding script on a large mail server resembles the following example:

imapd
           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |                                         2        
           65536 |@@@@@@@@@@@@@@@@@                        57       
          131072 |@@@@@@@@@@@                              37       
          262144 |                                         3        
          524288 |@@@                                      11       
         1048576 |@@@                                      10       
         2097152 |@@                                       9        
         4194304 |                                         1        
         8388608 |                                         0        

  mountd                                            
           value  -------------- Distribution ------------ count    
           65536 |                                         0        
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            49       
          262144 |@@@                                      6        
          524288 |                                         1        
         1048576 |                                         0        
         2097152 |                                         0        
         4194304 |@@@@                                     7        
         8388608 |@                                        3        
        16777216 |                                         0        

  sendmail
           value  -------------- Distribution ------------ count    
           16384 |                                         0        
           32768 |@                                        18       
           65536 |@@@@@@@@@@@@@@@@@                        205      
          131072 |@@@@@@@@@@@@@                            154      
          262144 |@                                        23       
          524288 |                                         5        
         1048576 |@@@@                                     50       
         2097152 |                                         7        
         4194304 |                                         5        
         8388608 |                                         2        
        16777216 |                                         0        

  automountd                                        
           value  -------------- Distribution ------------ count    
           32768 |                                         0        
           65536 |@@@@@@@@@@                               22       
          131072 |@@@@@@@@@@@@@@@@@@@@@@@                  51       
          262144 |@@                                       6        
          524288 |                                         1        
         1048576 |                                         0        
         2097152 |                                         2        
         4194304 |                                         2        
         8388608 |                                         1        
        16777216 |                                         1        
        33554432 |                                         1        
        67108864 |                                         0        
       134217728 |                                         0        
       268435456 |                                         1        
       536870912 |                                         0

You might be interested in the unusual data points for automountd, or the persistent data point at over one millisecond for sendmail. You can add additional predicates to the preceding script to home in on the causes of any exceptional or anomalous results.
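For example, the following sketch narrows the measurement to a single process. This is illustrative only: it assumes the same general approach as the preceding script, in which a per-thread timestamp is recorded when a thread sleeps and the elapsed time is quantized at wakeup; the bedtime array name and the choice of sendmail as the target are assumptions, not part of the original script.

sched:::sleep
/stringof(curpsinfo->pr_fname) == "sendmail"/
{
        /* Record when this sendmail thread went to sleep. */
        bedtime[curlwpsinfo->pr_addr] = timestamp;
}

sched:::wakeup
/bedtime[args[0]->pr_addr]/
{
        /* Quantize the sleep duration, keyed by the woken process. */
        @[stringof(args[1]->pr_fname)] =
            quantize(timestamp - bedtime[args[0]->pr_addr]);
        bedtime[args[0]->pr_addr] = 0;
}

You could further key the aggregation by the waking process (curpsinfo->pr_fname in the wakeup clause) to see which processes are responsible for the anomalous wakeups.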

preempt and remain-cpu

Because Oracle Solaris is a preemptive system, higher priority threads preempt lower priority ones. Preemption can induce a significant latency bubble in the lower priority thread, so you might want to know which threads are being preempted by which other threads. The following example shows how to use the preempt and remain-cpu probes to display this information:

#pragma D option quiet

sched:::preempt
{
        self->preempt = 1;
}

sched:::remain-cpu
/self->preempt/
{
        self->preempt = 0;
}

sched:::off-cpu
/self->preempt/
{
        /*
         * If the current thread was preempted, record which thread
         * it ended up giving the CPU to.
         */
        @[stringof(args[1]->pr_fname), args[0]->pr_pri, execname,
            curlwpsinfo->pr_pri] = count();
        self->preempt = 0;
}

END
{
        printf("%30s %3s %30s %3s %5s\n", "PREEMPTOR", "PRI",
            "PREEMPTED", "PRI", "#");
        printa("%30s %3d %30s %3d %5@d\n", @);
}

Running the preceding script for several seconds on a system results in output similar to the following example:

# dtrace -s ./whopreempt.d
^C
                     PREEMPTOR PRI                      PREEMPTED PRI     #
                         sched  60                           Xsun  53     1
                         xterm  59                           Xsun  53     1
               MozillaFirebird  57                           Xsun  53     1
                        mpstat 100                          fvwm2  59     1
                         sched  99                MozillaFirebird  57     1
                         sched  60                         dtrace  30     1
                        mpstat 100                           Xsun  59     2
                         sched  60                           Xsun  54     2
                         sched  99                          sched  60     2
                         fvwm2  59                           Xsun  44     2
                         sched  99                           Xsun  44     2
                         sched  60                          xterm  59     2
                         sched  99                           Xsun  53     2
                         sched  99                           Xsun  54     3
                         sched  60                          fvwm2  59     3
                         sched  60                           Xsun  59     3
                         sched  99                           Xsun  59     4
                         fvwm2  59                           Xsun  54     8
                         fvwm2  59                           Xsun  53     9
                          Xsun  59                MozillaFirebird  57    10
                         sched  60                MozillaFirebird  57    14
               MozillaFirebird  57                           Xsun  44    16
               MozillaFirebird  57                           Xsun  54    18

Using sched change-pri

Preemption is based on priorities, so you might want to observe changes in priority over time. The following example uses the change-pri probe to display this information:

sched:::change-pri
{
        @[stringof(args[0]->pr_clname)] =
            lquantize(args[2] - args[0]->pr_pri, -50, 50, 5);
}

The example script captures the degree to which priority is raised or lowered, and aggregates by scheduling class. Running the preceding script results in output similar to the following example:

# dtrace -s ./pri.d
dtrace: script './pri.d' matched 10 probes
^C
 IA                                                
           value  -------------- Distribution ------------ count    
           < -50 |                                         20       
             -50 |@                                        38       
             -45 |                                         4        
             -40 |                                         13       
             -35 |                                         12       
             -30 |                                         18       
             -25 |                                         18       
             -20 |                                         23       
             -15 |                                         6        
             -10 |@@@@@@@@                                 201      
              -5 |@@@@@@                                   160      
               0 |@@@@@                                    138      
               5 |@                                        47       
              10 |@@                                       66       
              15 |@                                        36       
              20 |@                                        26       
              25 |@                                        28       
              30 |                                         18       
              35 |                                         22       
              40 |                                         8        
              45 |                                         11       
           >= 50 |@                                        34       

  TS                                                
           value  -------------- Distribution ------------ count    
             -15 |                                         0        
             -10 |@                                        1        
              -5 |@@@@@@@@@@@@                             7        
               0 |@@@@@@@@@@@@@@@@@@@@                     12       
               5 |                                         0        
              10 |@@@@@                                    3        
              15 |                                         0

The output shows the priority manipulation of the Interactive (IA) scheduling class. Instead of seeing priority manipulation, you might want to see the priority values of a particular process and thread over time. The following script uses the change-pri probe to display this information:

#pragma D option quiet

BEGIN
{
        start = timestamp;
}

sched:::change-pri
/args[1]->pr_pid == $1 && args[0]->pr_lwpid == $2/
{
        printf("%d %d\n", timestamp - start, args[2]);
}

tick-1sec
/++n == 5/
{
        exit(0);
}

To see the change in priorities over time, type the following command in one window:

$ echo $$
139208
$ while true ; do let i=0 ; done

In another window, run the script and redirect the output to a file:

# dtrace -s ./pritime.d 139208 1 > /tmp/pritime.out

You can use the file /tmp/pritime.out that is generated in the preceding example as input to plotting software to graphically display priority over time. gnuplot is a freely available plotting package that is included on the Oracle Solaris Freeware Companion CD. By default, gnuplot is installed in /opt/sfw/bin.
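For example, you might plot the priority data with commands like the following. This is a sketch: the steps plot style is one reasonable choice for step-wise priority changes, and the exact syntax can vary between gnuplot versions.

$ /opt/sfw/bin/gnuplot
gnuplot> plot "/tmp/pritime.out" with steps

By default, gnuplot plots the first column of the file (nanoseconds since the script started) on the x axis and the second column (the thread's new priority) on the y axis.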

sched tick Examples

Oracle Solaris uses tick-based CPU accounting, in which a system clock interrupt fires at a fixed interval and attributes CPU utilization to the threads and processes running at the time of the tick. The following example shows how to use the tick probe to observe this attribution:

# dtrace -n sched:::tick'{@[stringof(args[1]->pr_fname)] = count()}'
^C
  arch                                                              1
  sh                                                                1
  sed                                                               1
  echo                                                              1
  ls                                                                1
  FvwmAuto                                                          1
  pwd                                                               1
  awk                                                               2
  basename                                                          2
  expr                                                              2
  resize                                                            2
  tput                                                              2
  uname                                                             2
  fsflush                                                           2
  dirname                                                           4
  vim                                                               9
  fvwm2                                                            10
  ksh                                                              19
  xterm                                                            21
  Xsun                                                             93
  MozillaFirebird                                                 260

The system clock frequency varies from operating system to operating system, but generally ranges from 25 hertz to 1024 hertz. The Oracle Solaris system clock frequency is adjustable, but defaults to 100 hertz.
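If you want to verify the clock frequency on a particular system, one possible approach, assuming you have access to the kernel debugger, is to read the hz kernel variable, which holds the number of clock ticks per second:

# echo "hz/D" | mdb -k

On a system running at the default frequency, the reported value would be 100.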

The tick probe only fires if the system clock detects a runnable thread. To use the tick probe to observe the system clock's frequency, you must have a thread that is always runnable. In one window, create a looping shell as shown in the following example:

$ while true ; do let i=0 ; done

In another window, run the following script:

uint64_t last[int];

sched:::tick
/last[cpu]/
{
        @[cpu] = min(timestamp - last[cpu]);
}

sched:::tick
{
        last[cpu] = timestamp;
}

Running the preceding script results in output similar to the following example:

# dtrace -s ./ticktime.d
dtrace: script './ticktime.d' matched 2 probes
^C

  0          9883789

The minimum interval is about 9.88 milliseconds, which indicates that the default clock tick interval is 10 milliseconds (a frequency of 100 hertz). The observed minimum is somewhat less than 10 milliseconds due to jitter.

One deficiency of tick-based accounting is that the system clock that performs the accounting is often also responsible for dispatching any time-related scheduling activity. As a result, if a thread is to perform some amount of work every clock tick, that is, every 10 milliseconds, the system will either over-account or under-account for the thread, depending on whether the accounting is done before or after time-related dispatching activity. In Oracle Solaris, accounting is performed before time-related dispatching. As a result, the system will under-account for threads that run at a regular interval. If such threads run for less than the clock tick interval, they can effectively hide behind the clock tick. The following example shows the degree to which the system has such threads:

sched:::tick,
sched:::enqueue
{
        @[probename] = lquantize((timestamp / 1000000) % 10, 0, 10);
}

The output of the example script is two distributions of the millisecond offset within a ten millisecond interval, one for the tick probe and another for enqueue:

# dtrace -s ./tick.d
dtrace: script './tick.d' matched 4 probes
^C
  tick                                              
           value  -------------- Distribution ------------ count    
               6 |                                         0        
               7 |@                                        3        
               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   79       
               9 |                                         0        

  enqueue                                           
           value  -------------- Distribution ------------ count    
             < 0 |                                         0        
               0 |@@                                       267      
               1 |@@                                       300      
               2 |@@                                       259      
               3 |@@                                       291      
               4 |@@@                                      360      
               5 |@@                                       305      
               6 |@@                                       295      
               7 |@@@@                                     522      
               8 |@@@@@@@@@@@@                             1315     
               9 |@@@                                      337

The output histogram named tick shows that the clock tick is firing at an 8 millisecond offset. If scheduling were not at all associated with the clock tick, the output for enqueue would be evenly spread across the ten millisecond interval. However, the output shows a spike at the same 8 millisecond offset, indicating that at least some threads in the system are being scheduled on a time basis.
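To identify which threads are being enqueued in lockstep with the clock tick, you might extend the script along the following lines. This sketch assumes the same 10 millisecond tick interval and hardcodes the 8 millisecond offset observed in the preceding output; on another system, you would substitute the offset at which your tick probe fires.

sched:::enqueue
/(timestamp / 1000000) % 10 == 8/
{
        /* Count enqueues that coincide with the observed tick offset. */
        @[stringof(args[1]->pr_fname)] = count();
}

Processes that account for a disproportionate share of these enqueues are likely candidates for threads that hide behind the clock tick.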

cpucaps-sleep and cpucaps-wakeup

You can use the cpucaps-sleep and cpucaps-wakeup probes to understand the impact that CPU caps have on specific processes and threads. The following example shows how much time various processes spend on wait queues:

sched:::cpucaps-sleep
{
  sleep[args[1]->pr_pid] = timestamp;
}

sched:::cpucaps-wakeup
/sleep[args[1]->pr_pid]/
{
  @sleeps[args[1]->pr_fname] = quantize(timestamp - sleep[args[1]->pr_pid]);
  sleep[args[1]->pr_pid] = 0;
}

Running the preceding script results in output similar to the following example:

# dtrace -s ./capswait.d
dtrace: script './capswait.d' matched 2 probes
^C

  exmh                                              
           value  ------------- Distribution ------------- count    
         8388608 |                                         0        
        16777216 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4        
        33554432 |                                         0        

  scan                                              
           value  ------------- Distribution ------------- count    
        16777216 |                                         0        
        33554432 |@@@@@@@@@@@@@@@@@@@@                     1        
        67108864 |                                         0        
       134217728 |@@@@@@@@@@@@@@@@@@@@                     1        
       268435456 |                                         0        

  firefox-bin                                       
           value  ------------- Distribution ------------- count    
         4194304 |                                         0        
         8388608 |@@                                       1        
        16777216 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      19       
        33554432 |@@@@                                     2        
        67108864 |                                         0

sched Stability

The sched provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see DTrace Stability Mechanisms.

Table 57  Stability Mechanism for the sched Provider

  Element      Name Stability   Data Stability   Dependency Class
  Provider     Evolving         Evolving         ISA
  Module       Private          Private          Unknown
  Function     Private          Private          Unknown
  Name         Evolving         Evolving         ISA
  Arguments    Evolving         Evolving         ISA