13.6 Introducing the D Programming Language

13.6.1 Probe Clauses
13.6.2 Pragmas
13.6.3 Global Variables
13.6.4 Predicates
13.6.5 Scalar Arrays and Associative Arrays
13.6.6 Pointers and External Variables
13.6.7 Address Spaces
13.6.8 Thread-local Variables
13.6.9 Speculations
13.6.10 Aggregations

D programs describe the probes that are to be enabled together with the predicates and actions that are bound to the probes. D programs can also declare variables and define new types. This section provides an introduction to the important features that you are likely to encounter in simple D programs.

13.6.1 Probe Clauses

D programs consist of a set of one or more probe clauses. Each probe clause takes the general form shown here:

probe_description_1 [, probe_description_2]...
[/ predicate_statement /]
{
  [action_statement;]
  .
  .
  .
} 

Every probe clause begins with a list of one or more probe descriptions in this form:

provider:module:function:probe_name

where the fields are as follows:

provider

The name of the DTrace provider that is publishing this probe. For kernel probes, the provider name typically corresponds to the name of the DTrace kernel module that performs the instrumentation to enable the probe, for example, proc. When tracing a DTrace-enabled, user-space application or library, this field takes the form namePID, where name is the name of the provider as defined in the provider definition file that was used to build the application or library and PID is the process ID of the running executable.

module

The name of the kernel module, library, or user-space program in which the probe is located, if any, for example, vmlinux. This module is not the same as the kernel module that implements a provider.

function

The name of the function in which the probe is located, for example, do_fork.

probe_name

The name of the probe usually describes its location within a function, for example, create, entry, or return.

The compiler interprets the fields from right to left. For example, the probe description settimeofday:entry would match a probe with function settimeofday and name entry regardless of the value of the probe's provider and module fields. You can regard a probe description as a pattern that matches one or more probes based on their names. You can omit the leading colons before a probe name if the probe that you want to use has a unique name. If several providers publish probes with the same name, use the available fields to obtain the correct probe. If you do not specify a provider, you might obtain unexpected results if multiple probes have the same name. Specifying a provider but leaving the module, function, and probe name fields blank, matches all probes in a provider. For example, syscall::: matches every probe published by the syscall provider.
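The right-to-left interpretation described above can be sketched as follows. This is a minimal illustration rather than a recommended script: the first clause leaves the provider and module unconstrained, and the second names the provider explicitly. The choice of values to trace (probeprov and execname) is only for demonstration.

```d
/* Function and probe name only: provider and module are unconstrained. */
settimeofday:entry
{
        trace(probeprov);       /* shows which provider published the probe */
}

/* Provider named explicitly; the module field is left blank. */
syscall::settimeofday:entry
{
        trace(execname);        /* name of the command that made the call */
}
```

If a probe matches both descriptions, both clauses run when it fires.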

The optional predicate statement uses criteria such as process ID, command name, or timestamp to determine whether the associated actions should take place. If you omit the predicate, any associated actions always run if the probe is triggered.

You can use the ?, *, and [] shell wildcards in probe descriptions. For example, syscall::[gs]et*: matches all syscall probes for function names that begin with get or set. If necessary, use the \ character to escape wildcard characters that form part of a name.
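As a small sketch of wildcard matching, the following clause enables only the entry probes of system calls whose names begin with get or set, and records the name of each call as it occurs. Restricting the match to entry probes and tracing probefunc are illustrative choices.

```d
/* [gs]et* matches function names that begin with "get" or "set". */
syscall::[gs]et*:entry
{
        trace(probefunc);       /* name of the matched system call */
}
```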

You can enable the same actions for more than one probe description. For example, the following D program uses the trace() function to record a timestamp each time that any process invokes a system call whose name contains the string mem or soc:

syscall::*mem*:entry, syscall::*soc*:entry
{
        trace(timestamp);
}

By default, the trace() function writes the result to the principal buffer, which is accessible by other probe clauses within a D program, and whose contents dtrace displays when the program exits.

13.6.2 Pragmas

You can use compiler directives called pragmas in a D program. Pragma lines begin with a # character, and are usually placed at the beginning of a D program. The primary use of pragmas is to set run-time DTrace options. For example, the following pragma statements suppress all output except for traced data and permit destructive operations.

#pragma D option quiet
#pragma D option destructive
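A minimal sketch of a complete program that uses a pragma might look like the following. The file name hello.d and the message text are hypothetical. With the quiet option set, only the printf() output appears.

```d
#pragma D option quiet

BEGIN
{
        printf("hello from D\n");
        exit(0);                /* stop tracing immediately */
}
```

You would run such a script with dtrace -s hello.d.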

13.6.3 Global Variables

D provides fundamental data types for integers and floating-point constants. You can perform arithmetic only on integers in D programs. D does not support floating-point operations. D provides floating-point types for compatibility with ANSI-C declarations and types. You can trace floating-point data objects and use the printf() function to format them for output. In the current implementation, DTrace supports only the 64-bit data model for writing D programs.

You can use declarations to introduce D variables and external C symbols, or to define new types for use in D. The following example program, tick.d, declares and initializes the variable i when the D program starts, displays its initial value, increments the variable and prints its value once every second, and displays the final value when the program exits.

BEGIN
{
  i = 0;
  trace(i);
}

profile:::tick-1sec
{
  printf("i=%d\n",++i);
}

END
{
  trace(i);
}

When run, the program produces output such as the following until you type Ctrl-C:

# dtrace -s tick.d 
dtrace: script 'tick.d' matched 3 probes
CPU     ID               FUNCTION:NAME
  1      1                       :BEGIN       0
  1    618                       :tick-1sec i=1

  1    618                       :tick-1sec i=2

  1    618                       :tick-1sec i=3

  1    618                       :tick-1sec i=4

  1    618                       :tick-1sec i=5

^C
  0      2                       :END         5

Whenever a probe is triggered, dtrace displays the number of the CPU core on which the probe fired, the ID of the probe, and the name of the function and the probe. BEGIN and END are DTrace probes that trigger when the dtrace program starts and finishes.

To suppress all output except that from printa(), printf(), and trace(), specify #pragma D option quiet in the program or the -q option to dtrace.

# dtrace -q -s tick.d 
0i=1
i=2
i=3
i=4
i=5
^C
5
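The tick.d program creates i implicitly on first assignment. As a sketch of the alternative, a global variable can also be declared explicitly outside any probe clause. The variable name nreads is hypothetical, and the example relies on D initializing global variables to zero.

```d
#pragma D option quiet

int nreads;             /* explicitly declared global; starts at zero */

syscall::read:entry
{
        nreads++;
}

END
{
        printf("read() calls observed: %d\n", nreads);
}
```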

13.6.4 Predicates

Predicates are logic statements that select whether DTrace invokes the actions that are associated with a probe. For example, the predicates in the following program sc1000.d examine the value of the variable i. This program also demonstrates how to include C-style comments.

#pragma D option quiet

BEGIN
{
  /* Initialize i */
  i = 1000; 
}

syscall:::entry
/i > 0/
{
  /* Decrement i */
  i--; 
}

syscall:::entry
/(i % 100) == 0/
{
  /* Print i after every 100 system calls */
  printf("i = %d\n",i); 
}

syscall:::entry
/i == 0/
{
  printf("i = 0; 1000 system calls invoked\n");
  exit(0);  /* Exit with a value of 0 */
}

The program initializes i with a value of 1000, decrements its value by 1 whenever a process invokes a system call, prints its value after every 100 system calls, and exits when the value of i reaches 0. Running the program in quiet mode produces output similar to the following:

# dtrace -s sc1000.d 
i = 900
i = 700
i = 800
i = 600
i = 500
i = 400
i = 300
i = 200
i = 100
i = 0
i = 0; 1000 system calls invoked

Note that the order of the countdown sequence is not as expected. The output for i=800 appears after the output for i=700. If you turn off quiet mode, it becomes apparent that the reason is that dtrace is collecting information from probes that can be triggered on all the CPU cores. You cannot expect runtime output from DTrace to be sequential in a multithreaded environment.

# dtrace -s sc1000.d 
dtrace: script 'sc1000.d' matched 889 probes
CPU     ID                FUNCTION:NAME
  0    457           clock_gettime:entry i = 900

  0    413                   futex:entry i = 700

  1     41                   lseek:entry i = 800

  1     25                    read:entry i = 600

  1     25                    read:entry i = 500

  1     25                    read:entry i = 400

  1     71                  select:entry i = 300

  1     71                  select:entry i = 200

  1     25                    read:entry i = 100

  1     25                    read:entry i = 0

  1     25                    read:entry i = 0; 1000 system calls invoked

The next example is an executable DTrace script that displays the file descriptor, output string, and string length specified to the write() system call whenever the date command is run on the system.

#!/usr/sbin/dtrace -s
#pragma D option quiet

syscall::write:entry
/execname == "date"/
{
  printf("%s(%d, %s, %4d)\n", probefunc, arg0, copyinstr(arg1), arg2);
} 

If you run the script from one window, while typing the date command in another, you see output such as the following in the first window:

write(1, Wed Aug 15 10:42:34 BST 2012
,   29) 
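Predicates can combine several conditions with the usual C logical operators. The following sketch, a variation on the script above, reports only writes that the date command makes to file descriptor 1. The message wording is illustrative.

```d
#pragma D option quiet

syscall::write:entry
/execname == "date" && arg0 == 1/
{
        /* arg0 is the file descriptor; arg2 is the byte count. */
        printf("%s wrote %d bytes to fd %d\n", execname, arg2, arg0);
}
```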

13.6.5 Scalar Arrays and Associative Arrays

The D language supports scalar arrays, which correspond directly in concept and syntax with arrays in C. A scalar array is a fixed-length group of consecutive memory locations that each store a value of the same type. You access scalar arrays by referring to each location with an integer starting from zero. In D programs, you would usually use scalar arrays to access array data within the operating system.

For example, you would use the following statement to declare a scalar array sa of 5 integers:

int sa[5];

As in C, sa[0] refers to the first array element, sa[1] refers to the second, and so on up to sa[4] for the fifth element.

The D language also supports a special kind of variable called an associative array. An associative array is similar to a scalar array in that it associates a set of keys with a set of values, but in an associative array the keys are not limited to integers of a fixed range. In the D language, you can index associative arrays by a list of one or more values of any type. Together the individual key values form a tuple that you use to index into the array and access or modify the value that corresponds to that key. Each tuple key must be of the same length and must have the same key types in the same order. The value associated with each element of an associative array is also of a single fixed type for the entire array.

For example, the following statement defines a new associative array aa of value type int with the tuple signature string, int, and stores the integer value 828 in the array:

aa["foo", 271] = 828;

Once you have defined an array, you can access its elements in the same way as any other variable. For example, the following statement modifies the array element previously stored in aa by incrementing the value from 828 to 829:

aa["foo", 271]++;

You can define additional elements for the array by specifying a different tuple with the same tuple signature, as shown here:

aa["bar", 314] = 159;
aa["foo", 577] = 216;

The array elements aa["foo", 271] and aa["foo", 577] are distinct because the values of their tuples differ in the value of their second key.
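To show an associative array inside a probe clause, the following sketch keeps a per-process counter keyed by the tuple (command name, process ID). The array name reads and the threshold of 10 are hypothetical. Because global variables can be updated from several CPUs at once, treat the counts as approximate.

```d
#pragma D option quiet

syscall::read:entry
{
        /* Tuple signature is (string, int); unset elements read as zero. */
        reads[execname, pid]++;
}

syscall::read:entry
/reads[execname, pid] == 10/
{
        printf("%s (pid %d) has called read() 10 times\n", execname, pid);
}
```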

Syntactically, scalar arrays and associative arrays are very similar. You can declare an associative array of integers referenced by an integer key as follows:

int ai[int];

You could reference an element of this array using an expression such as ai[0]. However, from a storage and implementation perspective, the two kinds of array are very different. The scalar array sa consists of five consecutive memory locations numbered from zero, and the index refers to an offset in the storage allocated for the array. An associative array such as ai has no predefined size and it does not store elements in consecutive memory locations. In addition, associative array keys have no relationship to the storage location of the corresponding value. If you access the associative array elements ai[0] and ai[-5], DTrace allocates only two words of storage, which are not necessarily consecutive in memory. The tuple keys that you use to index associative arrays are abstract names for the corresponding value, and they bear no relationship to the location of the value in memory.

If you create an array using an initial assignment and use a single integer expression as the array index, for example, a[0] = 2;, the D compiler always creates a new associative array, even though the statement could also be interpreted as an assignment to a scalar array. If you want to use a scalar array, you must explicitly declare its type and size.

13.6.6 Pointers and External Variables

The implementation of pointers in the D language gives you the ability to create and manipulate the memory addresses of data objects in the operating system kernel, and to store the contents of those data objects in variables and associative arrays. The syntax of D pointers is the same as the syntax of pointers in ANSI-C. For example, the following statement declares a D global variable named p that is a pointer to an integer.

int *p;

This declaration means that p itself is a 64-bit integer whose value is the address in memory of another integer.

If you want to create a pointer to a data object inside the kernel, you can compute its address by using the & reference operator. For example, the kernel source code declares an unsigned long max_pfn variable. You can access the value of such an external variable in the D language by prefixing it with the ` (backquote) scope operator:

value = `max_pfn;

If more than one kernel module declares a variable with the same name, prefix the scoped external variable with the name of the module. For example, foo`bar would refer to the address of the bar() function provided by the module foo.

You can extract the address of an external variable by applying the & operator and store it as a pointer:

p = &`max_pfn;

You can use the * dereference operator to refer to the object that a pointer addresses:

value = *p;
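Put together, the statements above might be combined into a short script such as the following sketch. It assumes that p is declared as a pointer to unsigned long so that its type matches the kernel's declaration of max_pfn. The output format is illustrative.

```d
#pragma D option quiet

/* Assumption: pointer type chosen to match unsigned long max_pfn. */
unsigned long *p;

BEGIN
{
        p = &`max_pfn;                  /* address of the kernel variable */
        printf("max_pfn = %u\n", *p);   /* dereference to read its value */
        exit(0);
}
```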

You cannot apply the & operator to DTrace objects such as associative arrays, built-in functions, and variables. If you create composite structures, it is possible to construct expressions that retrieve the kernel addresses of DTrace objects. However, DTrace does not guarantee to preserve the addresses of such objects across probe firings.

You cannot use the * dereference operator on the left-hand side of an assignment expression. You may only assign values directly to D variables by name or by applying the array index operator [] to a scalar array or an associative array.

You cannot use pointers to perform indirect function calls. You may only call DTrace functions directly by name.

13.6.7 Address Spaces

DTrace executes D programs within the address space of the operating system kernel. An Oracle Linux system manages one address space for the operating system kernel, and one for each user process. Because each address space provides the illusion that it can access all of the memory on the system, the same virtual address might be used in different address spaces, but it would translate to different locations in physical memory. If your D programs use pointers, you need to be aware of which address space those pointers belong to.

For example, if you use the syscall provider to instrument entry to a system call such as pipe() that takes a pointer to an integer or to an array of integers as an argument, it is not valid to use the * or [] operators to dereference that pointer or array. The address is in the address space of the user process that performed the system call, and not in the address space of the kernel. Dereferencing the address in D accesses the kernel's address space, which would result in an invalid address error or return unexpected data to your D program.

To access user process memory from a DTrace probe, use one of the copyin(), copyinstr(), or copyinto() functions with an address in user space.

The following D programs show two alternative, equivalent ways to print the file descriptor, string, and string length arguments that a process passed to the write() system call:

syscall::write:entry
{
  printf("fd=%d buf=%s count=%d", arg0, stringof(copyin(arg1, arg2)), arg2);
}

syscall::write:entry
{
  printf("fd=%d buf=%s count=%d", arg0, copyinstr(arg1, arg2), arg2);
}

The arg0, arg1 and arg2 variables contain the value of the fd, buf, and count arguments to the system call. Note that the value of arg1 is an address in the address space of the process, and not in the address space of the kernel.

In this example, it is necessary to use the stringof() function with copyin() so that DTrace converts the retrieved user data to a string. The copyinstr() function always returns a string.

To avoid confusion, you should name and comment variables that store user addresses appropriately. You should also store user addresses as variables of type uintptr_t so that you do not accidentally compile D code that dereferences them.
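As a sketch of this advice, the following program saves the user-space buffer address passed to read() in a thread-local variable of type uintptr_t, and copies the data into the kernel only after the call returns. The variable name ubuf is hypothetical, and the data is printed as a string purely for illustration, so binary data would not display cleanly.

```d
#pragma D option quiet

syscall::read:entry
{
        /* User-space address: keep it as an integer, never dereference it. */
        self->ubuf = (uintptr_t)arg1;
}

syscall::read:return
/self->ubuf != 0 && arg0 > 0/
{
        /* On return, arg0 holds the number of bytes that were read. */
        printf("%s read: %s\n", execname, stringof(copyin(self->ubuf, arg0)));
}

syscall::read:return
/self->ubuf != 0/
{
        self->ubuf = 0;         /* reset even if the read failed */
}
```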

13.6.8 Thread-local Variables

Thread-local variables are defined within the scope of execution of a thread on the system. To indicate that a variable is thread-local, you prefix it with self-> as shown in the following example.

#pragma D option quiet

syscall::read:entry
{
  self->t = timestamp; /* Initialize a thread-local variable */
}

syscall::read:return
/self->t != 0/
{
  printf("%s (pid:tid=%d:%d) spent %d microseconds in read()\n",
  execname, pid, tid, ((timestamp - self->t)/1000)); /* Divide by 1000 -> microseconds */

  self->t = 0; /* Reset the variable */
}

This D program (readtrace.d) displays the command name, process ID, thread ID, and elapsed time in microseconds whenever a process invokes the read() system call.

# dtrace -s readtrace.d
gnome-terminal (pid:tid=2774:2774) spent 27 microseconds in read()
gnome-terminal (pid:tid=2774:2774) spent 16 microseconds in read()
hald-addon-inpu (pid:tid=1662:1662) spent 26 microseconds in read()
hald-addon-inpu (pid:tid=1662:1662) spent 17 microseconds in read()
Xorg (pid:tid=2046:2046) spent 18 microseconds in read()
...

13.6.9 Speculations

The speculative tracing facility in DTrace allows you to tentatively trace data and then later decide whether to commit the data to a tracing buffer or discard the data. Predicates are the primary mechanism for filtering out uninteresting events. Predicates are useful when you know at the time that a probe fires whether or not the probe event is of interest. However, in some situations, you might not know whether a probe event is of interest until after the probe fires.

For example, if a system call is occasionally failing with an error code in errno, you might want to examine the code path leading to the error condition. You can write trace data at one or more probe locations to speculative buffers, and then choose which data to commit to the principal buffer at another probe location. As a result, your trace data contains only the output of interest, no post-processing is required, and the DTrace overhead is minimized.

To create a speculative buffer, use the speculation() function. This function returns a speculation identifier, which you use in subsequent calls to the speculate() function.

Call the speculate() function before performing any data-recording actions in a clause. DTrace directs all subsequent data that you record in a clause to the speculative buffer. You can create only one speculation in any given clause.

Typically, you assign a speculation identifier to a thread-local variable, and then use that variable as a predicate to other probes as well as an argument to speculate(). For example:

#!/usr/sbin/dtrace -Fs

syscall::open:entry
{
  /*
   * The call to speculation() creates a new speculation. If this fails,
   * dtrace will generate an error message indicating the reason for
   * the failed speculation(), but subsequent speculative tracing will be
   * silently discarded.
   */
  self->spec = speculation();
  speculate(self->spec);

  /*
   * Because this printf() follows the speculate(), it is being
   * speculatively traced; it will only appear in the data buffer if the
   * speculation is subsequently committed.
   */
  printf("%s", copyinstr(arg0));
}

syscall::open:return
/self->spec/
{
  /*
   * To balance the output with the -F option, we want to be sure that
   * every entry has a matching return. Because we speculated the
   * open entry above, we want to also speculate the open return.
   * This is also a convenient time to trace the errno value.
   */
  speculate(self->spec);
  trace(errno);
}

If a speculative buffer contains data that you want to retain, use the commit() function to copy its contents to the principal buffer. If you want to delete the contents of a speculative buffer, use the discard() function. The following example clauses commit or discard the speculative buffer based on the value of the errno variable:

syscall::open:return
/self->spec && errno != 0/
{
  /*
   * If errno is non-zero, we want to commit the speculation.
   */
  commit(self->spec);
  self->spec = 0;
}

syscall::open:return
/self->spec && errno == 0/
{
  /*
   * If errno is not set, we discard the speculation.
   */
  discard(self->spec);
  self->spec = 0;
}

Running this script produces output similar to the following example when the open() system call fails:

# ./specopen.d
dtrace: script './specopen.d' matched 4 probes
CPU FUNCTION
  1  => open                                  /var/ld/ld.config
  1  <= open                                          2
  1  => open                                  /images/UnorderedList16.gif
  1  <= open                                          4
...

13.6.10 Aggregations

DTrace provides the following built-in functions for aggregating the data that individual probes gather.

Aggregating Function

Description

avg(scalar_expression)

Returns the arithmetic mean of the expressions that are specified as arguments.

count()

Returns the number of times that the function has been called.

lquantize(scalar_expression, lower_bound, upper_bound, step_interval)

Returns a linear frequency distribution of the expressions that are specified as arguments, scaled to the specified lower bound, upper bound, and step interval. Increments the value in the highest bucket that is smaller than the specified expression.

max(scalar_expression)

Returns the maximum value of the expressions that are specified as arguments.

min(scalar_expression)

Returns the minimum value of the expressions that are specified as arguments.

quantize(scalar_expression)

Returns a power-of-two frequency distribution of the expressions that are specified as arguments. Increments the value of the highest power-of-two bucket that is smaller than the specified expression.

stddev(scalar_expression)

Returns the standard deviation of the expressions that are specified as arguments.

sum(scalar_expression)

Returns the sum of the expressions that are specified as arguments.

DTrace indexes the results of an aggregation using a tuple expression similar to that used for an associative array:

@name[list_of_keys] = aggregating_function(args);

The name of the aggregation is prefixed with an @ character. All aggregations are global. If you do not specify a name, the aggregation is anonymous. The keys describe the data that the aggregating function is collecting.

For example, the following command counts the number of write() system calls invoked by processes until you type Ctrl-C.

# dtrace -n syscall::write:entry'{ @["write() calls"] = count(); }'
dtrace: description 'syscall::write:entry' matched 1 probe
^C

  write() calls                                              9
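The other aggregating functions are used in the same way. As a sketch, the following clause summarizes the byte count passed to write() in three forms: an average, a linear distribution, and a power-of-two distribution. The aggregation names, key strings, and lquantize() bounds are illustrative choices.

```d
syscall::write:entry
{
        /* arg2 is the count argument of write(). */
        @avgsize["average write() size"] = avg(arg2);
        @linear["write() size, linear"] = lquantize(arg2, 0, 1024, 128);
        @pow2["write() size, power of two"] = quantize(arg2);
}
```

When you type Ctrl-C, dtrace prints each aggregation in turn.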

The next example counts the number of both read() and write() system calls:

# dtrace -n syscall::write:entry,syscall::read:entry\
'{ @[strjoin(probefunc,"() calls")] = count(); }'
dtrace: description 'syscall::write:entry,syscall::read:entry' matched 2 probes
^C

  write() calls                                            150
  read() calls                                            1555

Note

If you specify the -q option to dtrace or #pragma D option quiet in a D program, DTrace suppresses the automatic printing of aggregations. In this case, you must use a printa() statement to display the information.
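A minimal sketch of printing an aggregation explicitly in quiet mode follows. The aggregation name @calls and the format string are illustrative. In a printa() format, %@d stands for the aggregated value and the other conversions apply to the keys.

```d
#pragma D option quiet

syscall::read:entry,
syscall::write:entry
{
        @calls[probefunc] = count();
}

END
{
        /* One line per key: system call name, then its count. */
        printa("%-10s %@d\n", @calls);
}
```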