3 D Program Syntax Reference

This reference describes how to write D programs that can be used with DTrace to enable probes and perform operations.

Program Structure

A D program consists of a set of clauses. Each clause describes the probes to enable, an optional predicate that controls when to run, and one or more statements that describe what to do when the probe fires. D programs can also contain declarations of variables and definitions of new types. A probe clause declaration uses the following structure:

probe descriptions 
/ predicate / 
{
  statements
}

Probe Descriptions

Probe descriptions ideally express the full description for a probe and take the form:

provider:module:function:name

The field descriptors are defined as follows:

provider

The name of the DTrace provider that the probe belongs to.

module

If the probe corresponds to a specific program location, the name of the kernel module, library, or user-space program in which the probe is found. Some probes might be associated with a module name that isn't tied to a particular source location in cases where they relate to more abstract tracepoints.

function

If the probe corresponds to a specific program location, the name of the program function in which the probe is found.

name

The name that provides some idea of the probe's semantic meaning, such as BEGIN or END.

DTrace recognizes a form of shorthand when referencing probes. By convention, if you don't specify all the fields of a probe description, DTrace can match a request to all the probes with matching values in the parts of the name that you do specify. For example, you can reference the probe name BEGIN in a script to match any probe with the name field BEGIN, regardless of the value of the provider, module, and function fields. Such a reference appears as:

BEGIN

If a probe is referenced in a D program and it doesn't use a full probe description, the fields are interpreted based on an order of precedence:

  • A single component matches the probe name, expressed as:
    name
  • Two components match the function and probe name, expressed as:
    function:name
  • Three components match the module, function, and probe name, expressed as:
    module:function:name

Although probes can also be referenced by their ID, this value can change over time. The number of probes on the system doesn't directly correlate to the ID, because new provider modules can be loaded at any time and some providers also offer the ability to create new probes on-the-fly. Avoid using the numerical probe ID to reference a probe.

Probe descriptions also support a pattern-matching syntax similar to the shell globbing pattern matching syntax that's described in the sh(1) manual page. For example, you can use the asterisk symbol (*) to perform a wildcard match, as in the following description:

sdt:::tcp*

If any fields are blank in the probe description, a wildcard match is performed on that field.

Unless you intend to match several probes, specify the full probe description to avoid unpredictable results.

Table 3-1 Probe Name Pattern Matching Characters

Symbol  Description
*       Matches any string, including the null string.
?       Matches any single character.
[]      Matches any one of the characters inside the square brackets. A pair of characters separated by - matches any character between the pair, inclusive. If the first character after the [ is !, any character not within the set is matched.
\       Interpret the next character as itself, without any special meaning.

To successfully match and enable a probe, the complete probe description must match on every field. A probe description field that isn't a pattern must exactly match the corresponding field of the probe. Note that a description field that's empty matches any probe.

Several probes can be included in a comma-separated list. By including several probes in the description, the same predicate and function sequences are applied when each probe is activated.
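As a minimal sketch of a comma-separated probe list, the following clause applies one set of statements to two probe descriptions, one of which uses a wildcard in the function field and leaves the module field blank. The syscall provider, the probefunc built-in variable, and the count() aggregating function are assumptions about the system being traced and aren't introduced in this section; use dtrace -l to check which probes are available.

```d
/*
 * Illustrative only: the probe names are assumptions about the
 * probes that exist on the system being traced.
 */
syscall::read:entry,
syscall::write*:entry
{
	/* Count activations, keyed by the function field of the probe. */
	@calls[probefunc] = count();
}
```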

Predicates

Predicates are expressions that appear between a pair of slashes (/ /) and are evaluated at probe firing time to decide whether the associated functions must be processed. Predicates are the primary conditional construct that is used for building more complex control flow in a D program. You can omit the predicate section of the probe clause entirely for any probe so that the functions are always processed when the probe is activated.

Predicate expressions can use any of the D operators and can include any D data objects such as variables and constants. The predicate expression must evaluate to a value of integer or pointer type so that it can be considered as true or false. As with all D expressions, a zero value is interpreted as false and any non-zero value is interpreted as true.
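The following sketch shows a predicate in place. It assumes the syscall provider and the execname and pid built-in variables, none of which are defined in this section, and the process name "bash" is an arbitrary choice. The statements run only when the predicate evaluates to a non-zero value.

```d
syscall::write:entry
/execname == "bash"/
{
	/* Reached only when the predicate is true for this firing. */
	@writes[pid] = count();
}
```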

Statements

Statements are described by a list of expressions or functions that are separated by semicolons (;) and within braces ({}). An empty set of braces with no statements included causes the default action to be processed. The Default Action reports the probe activation.

A program can consist of several probe-clause declarations. Clauses run in program order.
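As a small illustration of program order, the two clauses below are attached to the same BEGIN probe. Under the rule stated above, the clause that appears first in the source is expected to run first. The trace() and exit() functions are assumed here and are described elsewhere.

```d
BEGIN
{
	trace("first");
}

BEGIN
{
	trace("second");
	exit(0);	/* stop tracing after both clauses have run */
}
```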

A program can be stored on the file system and can be run by the DTrace utility. You can transform a program into an executable script by prepending the file with an interpreter directive that calls the dtrace command along with any required options, as a single argument, to run the program. See the sh(1) manual page for more information on adding the interpreter line to the beginning of a script. The interpreter directive might look as follows:

#!/usr/sbin/dtrace -qs

A script can also include D pragma directives to set runtime and compiler options. See DTrace Runtime and Compile-time Options Reference for more information on including this information in a script.
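A sketch of a complete script that combines an interpreter directive with a pragma directive might look as follows. The path to the dtrace command is taken from the earlier example and can differ between systems; the quiet option used here is an assumption about the options listed in the options reference.

```d
#!/usr/sbin/dtrace -s

/* Suppress the default output so that only printf output appears. */
#pragma D option quiet

BEGIN
{
	printf("tracing started\n");
	exit(0);
}
```

After the file is made executable, for example with chmod +x, it can be run directly by a user with sufficient privileges.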

Types, Operators, and Expressions

D provides the ability to access and manipulate various data objects: variables and data structures can be created and changed, data objects that are defined in the OS kernel and user processes can be accessed, and integer, floating-point, and string constants can be declared. D provides a superset of the ANSI C operators that are used to manipulate objects and create complex expressions. This section describes the detailed set of rules for types, operators, and expressions.

Identifier Names and Keywords

D identifier names are composed of uppercase and lowercase letters, digits, and underscores, where the first character must be a letter or underscore. All identifier names beginning with an underscore (_) are reserved for use by the D system libraries. Avoid using these names in D programs. By convention, D programmers typically use mixed-case names for variables and all uppercase names for constants.

D language keywords are special identifiers that are reserved for use in the programming language syntax itself. These names are always specified in lowercase and must not be used for the names of D variables. The following table lists the keywords that are reserved for use by the D language.

Table 3-2 D Keywords

auto*      do*     if*         register*  string+      unsigned
break*     double  import*+    restrict*  stringof+    void
case*      else*   inline      return*    struct       volatile
char       enum    int         self+      switch*      while*
const      extern  long        short      this+        xlate+
continue*  float   offsetof+   signed     translator+
counter*+  for*    probe*+     sizeof     typedef
default*   goto*   provider*+  static*    union

D reserves for use as keywords a superset of the ANSI C keywords. The keywords reserved for future use by the D language are marked with “*”. The D compiler produces a syntax error if you try to use a keyword that's reserved for future use. The keywords that are defined by D but not defined by ANSI C are marked with “+”. D provides the complete set of types and operators found in ANSI C. The major difference in D programming is the absence of control-flow constructs. Note that keywords associated with control-flow in ANSI C are reserved for future use in D.

Data Types and Sizes

D provides fundamental data types for integers and floating-point constants. Arithmetic can only be performed on integers in D programs. Floating-point constants can be used to initialize data structures, but floating-point arithmetic isn't permitted in D. D provides a 64-bit data model for use in writing programs.

The names of the integer types and their sizes in the 64-bit data model are shown in the following table. Integers are always represented in twos-complement form in the native byte-encoding order of a system.

Table 3-3 D Integer Data Types

Type Name  64-bit Size
char       1 byte
short      2 bytes
int        4 bytes
long       8 bytes
long long  8 bytes

Integer types, including char, can be prefixed with the signed or unsigned qualifier. Integers are implicitly signed unless the unsigned qualifier is specified. The D compiler also provides the type aliases that are listed in the following table.

Table 3-4 D Integer Type Aliases

Type Name  Description
int8_t     1-byte signed integer
int16_t    2-byte signed integer
int32_t    4-byte signed integer
int64_t    8-byte signed integer
intptr_t   Signed integer of size equal to a pointer
uint8_t    1-byte unsigned integer
uint16_t   2-byte unsigned integer
uint32_t   4-byte unsigned integer
uint64_t   8-byte unsigned integer
uintptr_t  Unsigned integer of size equal to a pointer

These type aliases are equivalent to using the name of the corresponding base type listed in the previous table and are appropriately defined for each data model. For example, the uint8_t type name is an alias for the type unsigned char.

Note:

The predefined type aliases can't be used in files that are included by the preprocessor.

D provides floating-point types for compatibility with ANSI C declarations and types. Floating-point operators aren't available in D, but floating-point data objects can be traced and formatted with the printf function. You can use the floating-point types that are listed in the following table.

Table 3-5 D Floating-Point Data Types

Type Name    64-bit Size
float        4 bytes
double       8 bytes
long double  16 bytes

D also provides the special type string to represent ASCII strings. Strings are discussed in more detail in DTrace String Processing.

Constants

Integer constants can be written in decimal (12345), octal (012345), or hexadecimal (0x12345) format. Octal (base 8) constants must be prefixed with a leading zero. Hexadecimal (base 16) constants must be prefixed with either 0x or 0X. Integer constants are assigned the smallest type among int, long, and long long that can represent their value. If the value is negative, the signed version of the type is used. If the value is positive and too large to fit in the signed type representation, the unsigned type representation is used. You can apply one of the suffixes listed in the following table to any integer constant to explicitly specify its D type.

Suffix      D type
u or U      unsigned version of the type selected by the compiler
l or L      long
ul or UL    unsigned long
ll or LL    long long
ull or ULL  unsigned long long
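The three notations for integer constants can be compared with a short sketch such as the following, which prints the same digit sequence interpreted as decimal, octal, and hexadecimal. The printf() and exit() functions are assumed here and are described elsewhere.

```d
BEGIN
{
	/* Prints 12345 5349 74565 */
	printf("%d %d %d\n", 12345, 012345, 0x12345);
	exit(0);
}
```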

Floating-point constants are always written in decimal format and must contain either a decimal point (12.345), an exponent (123e45), or both (123.34e-5). Floating-point constants are assigned the type double by default. You can apply one of the suffixes listed in the following table to any floating-point constant to explicitly specify its D type.

Suffix  D type
f or F  float
l or L  long double

Character constants are written as a single character or escape sequence that's inside a pair of single quotes ('a'). Character constants are assigned the int type rather than char and are equivalent to an integer constant with a value that's determined by that character's value in the ASCII character set. See the ascii(7) manual page for a list of characters and their values. You can also use any of the special escape sequences that are listed in the following table. D uses the same escape sequences as those found in ANSI C.

Table 3-6 Character Escape Sequences

Escape Sequence  Represents       Escape Sequence  Represents
\a               alert            \\               backslash
\b               backspace        \?               question mark
\f               form feed        \'               single quote
\n               newline          \"               double quote
\r               carriage return  \0oo             octal value 0oo
\t               horizontal tab   \xhh             hexadecimal value 0xhh
\v               vertical tab     \0               null character

You can include more than one character specifier inside single quotes to create integers with individual bytes that are initialized according to the corresponding character specifiers. The bytes are read left-to-right from a character constant and assigned to the resulting integer in the order corresponding to the native endianness of the operating environment. Up to eight character specifiers can be included in a single character constant.

String constants of any length can be composed by enclosing them in a pair of double quotes ("hello"). A string constant can't contain a literal newline character. To create strings containing newlines, use the \n escape sequence instead of a literal newline. String constants can contain any of the special character escape sequences that are shown earlier for character constants. Similar to ANSI C, strings are represented as arrays of characters that end with a null character (\0) that's implicitly added to each string constant you declare. String constants are assigned the special D type string. The D compiler provides a set of special features for comparing and tracing character arrays that are declared as strings.
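As a brief sketch of character and string constants, the following clause prints the integer value of a character constant and a string constant that embeds a newline by using an escape sequence. The printf() and exit() functions are assumed here and are described elsewhere.

```d
BEGIN
{
	/* A character constant has type int; 'a' is 97 in ASCII. */
	printf("%d\n", 'a');

	/* The \n escape sequence splits the output across two lines. */
	printf("%s\n", "first line\nsecond line");
	exit(0);
}
```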

Arithmetic Operators

Binary arithmetic operators are described in the following table. These operators all have the same meaning for integers that they do in ANSI C.

Table 3-7 Binary Arithmetic Operators

Operator  Description
+         Integer addition
-         Integer subtraction
*         Integer multiplication
/         Integer division
%         Integer modulus

Arithmetic in D can only be performed on integer operands or on pointers. Arithmetic can't be performed on floating-point operands in D programs. The DTrace execution environment doesn't take any action on integer overflow or underflow. You must check for these conditions in situations where overflow and underflow can occur.

However, the DTrace execution environment does automatically check for and report division by zero errors resulting from improper use of the / and % operators. If a D program contains an invalid division operation that's detectable at compile time, a compile error is returned and the compilation fails. If the invalid division operation takes place at run time, processing of the current clause stops and the ERROR probe is activated. If the D program has no clause for the ERROR probe, the error is printed and tracing continues. Otherwise, the actions in the clause assigned to the ERROR probe are processed. Errors that are detected by DTrace have no effect on other DTrace users or on the OS kernel. You therefore don't need to be concerned about causing any damage if a D program inadvertently contains one of these errors.
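The run-time case can be sketched as follows. The divisor is held in a variable so that the compiler can't detect the invalid division at compile time; the variable names and the message text are hypothetical. Under the behavior described above, the BEGIN clause is expected to stop at the division and the ERROR clause to run.

```d
BEGIN
{
	divisor = 0;
	/* Division by zero at run time: processing of this clause stops here. */
	result = 10 / divisor;
}

ERROR
{
	printf("caught a run-time error\n");
	exit(0);
}
```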

In addition to these binary operators, the + and - operators can also be used as unary operators, and these operators have higher precedence than any of the binary arithmetic operators. The order of precedence and associativity properties for all D operators is presented in Operator Precedence. You can control precedence by grouping expressions in parentheses (()).

Relational Operators

Binary relational operators are described in the following table. These operators all have the same meaning that they do in ANSI C.

Table 3-8 D Relational Operators

Operator  Description
<         Left-hand operand is less than right-hand operand
<=        Left-hand operand is less than or equal to right-hand operand
>         Left-hand operand is greater than right-hand operand
>=        Left-hand operand is greater than or equal to right-hand operand
==        Left-hand operand is equal to right-hand operand
!=        Left-hand operand isn't equal to right-hand operand

Relational operators are most often used to write D predicates. Each operator evaluates to a value of type int, which is equal to one if the condition is true, or zero if it's false.

Relational operators can be applied to pairs of integers, pointers, or strings. If pointers are compared, the result is equivalent to an integer comparison of the two pointers interpreted as unsigned integers. If strings are compared, the result is determined as if by performing a strcmp() on the two operands. The following table shows some example D string comparisons and their results.

D string comparison    Result
"coffee" < "espresso"  Returns 1 (true)
"coffee" == "coffee"   Returns 1 (true)
"coffee" >= "mocha"    Returns 0 (false)
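The comparisons in the table can be tried with a sketch such as the following, which prints the int result of each relational expression. Given the results listed in the table, the output is expected to be 1 1 0. The printf() and exit() functions are assumed here and are described elsewhere.

```d
BEGIN
{
	printf("%d %d %d\n",
	    "coffee" < "espresso",
	    "coffee" == "coffee",
	    "coffee" >= "mocha");
	exit(0);
}
```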

Relational operators can also be used to compare a data object associated with an enumeration type with any of the enumerator tags defined by the enumeration.

Logical Operators

Binary logical operators are listed in the following table. The first two operators are equivalent to the corresponding ANSI C operators.

Table 3-9 D Logical Operators

Operator  Description
&&        Logical AND: true if both operands are true
||        Logical OR: true if one or both operands are true
^^        Logical XOR: true if exactly one operand is true

Logical operators are most often used in writing D predicates. The logical AND operator performs the following short-circuit evaluation: if the left-hand operand is false, the right-hand expression isn't evaluated. The logical OR operator also performs the following short-circuit evaluation: if the left-hand operand is true, the right-hand expression isn't evaluated. The logical XOR operator doesn't short-circuit. Both expression operands are always evaluated.

In addition to the binary logical operators, the unary ! operator can be used to perform a logical negation of a single operand: it converts a zero operand into a one and a non-zero operand into a zero. By convention, D programmers use ! when working with integers that are meant to represent Boolean values and == 0 when working with non-Boolean integers, although the expressions are equivalent.

The logical operators can be applied to operands of integer or pointer types. The logical operators interpret pointer operands as unsigned integer values. As with all logical and relational operators in D, operands are true if they have a non-zero integer value and false if they have a zero integer value.

Bitwise Operators

D provides the bitwise operators that are listed in the following table for manipulating individual bits inside integer operands. These operators all have the same meaning as in ANSI C.

Table 3-10 D Bitwise Operators

Operator  Description
~         Unary operator that can be used to perform a bitwise negation of a single operand: it converts each zero bit in the operand into a one bit, and each one bit in the operand into a zero bit
&         Bitwise AND
|         Bitwise OR
^         Bitwise XOR
<<        Shift the left-hand operand left by the number of bits specified by the right-hand operand
>>        Shift the left-hand operand right by the number of bits specified by the right-hand operand

The shift operators are used to move bits left or right in a particular integer operand. Shifting left fills empty bit positions on the right-hand side of the result with zeroes. Shifting right using an unsigned integer operand fills empty bit positions on the left-hand side of the result with zeroes. Shifting right using a signed integer operand fills empty bit positions on the left-hand side with the value of the sign bit, also known as an arithmetic shift operation.

Shifting an integer value by a negative number of bits or by a number of bits larger than the number of bits in the left-hand operand itself produces an undefined result. The D compiler produces an error message if the compiler can detect this condition when you compile the D program.

Assignment Operators

Binary assignment operators are listed in the following table. You can only modify D variables and arrays. Kernel data objects and constants can't be modified using the D assignment operators. The assignment operators have the same meaning as they do in ANSI C.

Table 3-11 D Assignment Operators

Operator  Description
=         Set the left-hand operand equal to the right-hand expression value.
+=        Increment the left-hand operand by the right-hand expression value.
-=        Decrement the left-hand operand by the right-hand expression value.
*=        Multiply the left-hand operand by the right-hand expression value.
/=        Divide the left-hand operand by the right-hand expression value.
%=        Modulo the left-hand operand by the right-hand expression value.
|=        Bitwise OR the left-hand operand with the right-hand expression value.
&=        Bitwise AND the left-hand operand with the right-hand expression value.
^=        Bitwise XOR the left-hand operand with the right-hand expression value.
<<=       Shift the left-hand operand left by the number of bits specified by the right-hand expression value.
>>=       Shift the left-hand operand right by the number of bits specified by the right-hand expression value.

Aside from the assignment operator =, the other assignment operators are provided as shorthand for using the = operator with one of the other operators that were described earlier. For example, the expression x = x + 1 is equivalent to the expression x += 1. These assignment operators adhere to the same rules for operand types as the binary forms described earlier.

The result of any assignment operator is an expression equal to the new value of the left-hand expression. You can use the assignment operators or any of the operators described thus far in combination to form expressions of arbitrary complexity. You can use parentheses () to group terms in complex expressions.

Increment and Decrement Operators

D provides the special unary ++ and -- operators for incrementing and decrementing pointers and integers. These operators have the same meaning as they do in ANSI C. These operators can be applied to variables and to the individual elements of a struct, union, or array. The operators can be applied either before or after the variable name. If the operator appears before the variable name, the variable is first changed and then the resulting expression is equal to the new value of the variable. For example, the following two code fragments produce identical results:

x += 1; y = x;

y = ++x;

If the operator appears after the variable name, then the variable is changed after its current value is returned for use in the expression. For example, the following two code fragments produce identical results:

y = x; x -= 1;

y = x--;

You can use the increment and decrement operators to create new variables without declaring them. If a variable declaration is omitted and the increment or decrement operator is applied to a variable, the variable is implicitly declared to be of type int64_t.
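As a sketch of implicit declaration, the following program never declares the variable reads; applying ++ to it creates it as an int64_t, as described above. The variable name is hypothetical, and the syscall provider and the END probe are assumptions about the system being traced.

```d
syscall::read:entry
{
	/* First use declares reads implicitly as int64_t. */
	reads++;
}

END
{
	printf("read calls: %d\n", reads);
}
```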

To use the increment and decrement operators on elements of an array or struct, place the operator after or before the full reference to the element:

int foo[5];
struct { int a; } bar;

bar.a++;
foo[1]++;
--foo[1];

The increment and decrement operators can be applied to integer or pointer variables. When applied to integer variables, the operators increment or decrement the corresponding value by one. When applied to pointer variables, the operators increment or decrement the pointer address by the size of the data type that's referenced by the pointer.

Conditional Expressions

D doesn't provide the facility to use if-then-else constructs. Instead, conditional expressions, by using the ternary operator (?:), can be used to approximate some of this functionality. The ternary operator associates a triplet of expressions, where the first expression is used to conditionally evaluate one of the other two.

For example, the following D statement could be used to set a variable x to one of two strings, depending on the value of i:

x = i == 0 ? "zero" : "non-zero";

In the previous example, the expression i == 0 is first evaluated to determine whether it's true or false. If the expression is true, the second expression is evaluated and its value is returned. If the expression is false, the third expression is evaluated and its value is returned.

As with any D operator, you can use several ?: operators in a single expression to create more complex expressions. For example, the following expression would take a char variable c containing one of the characters 0-9, a-f, or A-F, and return the value of this character when interpreted as a digit in a hexadecimal (base 16) integer:

hexval = (c >= '0' && c <= '9') ? c - '0' : (c >= 'a' && c <= 'f') ? c + 10 - 'a' : c + 10 - 'A';

To be evaluated for its truth value, the first expression that's used with ?: must be a pointer or integer. The second and third expressions can be of any compatible types. You can't construct a conditional expression where, for example, one path returns a string and another path returns an integer. The second and third expressions must be true expressions that have a value. Therefore, data reporting functions can't be used in these expressions because those functions don't return a value. To conditionally trace data, use a predicate instead.
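The two approaches can be contrasted with a sketch such as the following. The first clause uses ?: to choose between two string values of the same type; the second uses a predicate to trace data conditionally. The syscall provider, the use of arg2 as the requested byte count of read, the threshold of 4096, and the aggregation name are all assumptions made for illustration.

```d
syscall::read:entry
{
	/* Both result expressions are strings, so the types are compatible. */
	@sizes[arg2 > 4096 ? "large" : "small"] = count();
}

syscall::read:entry
/arg2 > 4096/
{
	/* trace() returns no value, so it is guarded by a predicate instead. */
	trace(arg2);
}
```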

Type Conversions

When expressions are constructed by using operands of different but compatible types, type conversions are performed to determine the type of the resulting expression. The D rules for type conversions are the same as the arithmetic conversion rules for integers in ANSI C. These rules are sometimes referred to as the usual arithmetic conversions.

Each integer type is ranked in the order char, short, int, long, long long, with each unsigned type assigned a rank higher than its signed equivalent, but below the next integer type. When you construct an expression using two integer operands such as x + y and the operands are of different integer types, the operand type with the higher rank is used as the result type.

If a conversion is required, the operand with the lower rank is first promoted to the type of the higher rank. Promotion doesn't change the value of the operand: it only extends the value to a larger container according to its sign. If an unsigned operand is promoted, the unused high-order bits of the resulting integer are filled with zeroes. If a signed operand is promoted, the unused high-order bits are filled by performing sign extension. If a signed type is converted to an unsigned type, the signed type is first sign-extended and then assigned the new, unsigned type that's determined by the conversion.

Integers and other types can also be explicitly cast from one type to another. Pointers and integers can be cast to any integer or pointer types, but not to other types.

An integer or pointer cast is formed using an expression such as the following:

y = (int)x;

In this example, the destination type is within parentheses and used to prefix the source expression. Integers are cast to types of higher rank by performing promotion. Integers are cast to types of lower rank by zeroing the excess high-order bits of the integer.

Because D doesn't include floating-point arithmetic, no floating-point operand conversion or casting is permitted and no rules for implicit floating-point conversion are defined.
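A minimal sketch of an explicit cast to a type of lower rank follows. The variable names are hypothetical. Under the rule described above, the cast to short is expected to keep only the low-order 16 bits of the value, 0x5678.

```d
BEGIN
{
	x = 0x12345678;		/* fits in an int */
	y = (short)x;		/* excess high-order bits are discarded */
	printf("%x %x\n", x, y);
	exit(0);
}
```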

Operator Precedence

D includes complex rules for operator precedence and associativity. The rules provide precise compatibility with the ANSI C operator precedence rules. The entries in the following table are in order from highest precedence to lowest precedence.

Table 3-12 D Operator Precedence and Associativity

Operators                                                Associativity
() [] -> .                                               Left to right
! ~ ++ -- + - * & (type) sizeof stringof offsetof xlate  Right to left (these are the unary operators)
* / %                                                    Left to right
+ -                                                      Left to right
<< >>                                                    Left to right
< <= > >=                                                Left to right
== !=                                                    Left to right
&                                                        Left to right
^                                                        Left to right
|                                                        Left to right
&&                                                       Left to right
^^                                                       Left to right
||                                                       Left to right
?:                                                       Right to left
= += -= *= /= %= &= ^= |= <<= >>=                        Right to left
,                                                        Left to right

The comma (,) operator that's listed in the table is for compatibility with the ANSI C comma operator. It can be used to evaluate a set of expressions in left-to-right order and return the value of the rightmost expression. This operator is provided for compatibility with C and usage isn't recommended.

The () entry listed in the table of operator precedence represents a function call. A comma is also used in D to list arguments to functions and to form lists of associative array keys. Note that this comma isn't the same as the comma operator and doesn't guarantee left-to-right evaluation. The D compiler provides no guarantee regarding the order of evaluation of arguments to a function or keys to an associative array. Be careful of using expressions with interacting side-effects, such as the pair of expressions i and i++, in these contexts.

The [] entry listed in the table of operator precedence represents an array or associative array reference. Note that aggregations are also treated as associative arrays. The [] operator can also be used to index into fixed-size C arrays.

The following table provides further explanation for the function of several miscellaneous operators that are provided by the D language.

Operators  Description
sizeof     Computes the size of an object.
offsetof   Computes the offset of a type member.
stringof   Converts the operand to a string.
xlate      Translates a data type.
unary &    Computes the address of an object.
unary *    Dereferences a pointer to an object.
-> and .   Accesses a member of a structure or union type.

Type and Constant Definitions

This section describes how to declare type aliases and named constants in D. It also discusses D type and namespace management for program and OS types and identifiers.

typedefs

The typedef keyword is used to declare an identifier as an alias for an existing type. The typedef declaration is used outside of probe clauses in the following form:

typedef existing-type new-type ;

where existing-type is any type declaration and new-type is an identifier to be used as the alias for this type. For example, the D compiler uses the following declaration internally to create the uint8_t type alias:

typedef unsigned char uint8_t;

You can use type aliases anywhere that a normal type can be used, such as the type of a variable or associative array value or tuple member. You can also combine typedef with more elaborate declarations such as the definition of a new struct, as shown in the following example:

typedef struct foo {
  int x;
  int y;
} foo_t;

In the previous example, struct foo is defined together with its alias, foo_t, so both names refer to the same type. Linux C system headers often use the suffix _t to denote a typedef alias.
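As a sketch of using such an alias, the following program repeats the declaration and applies the sizeof and offsetof operators to foo_t. With two 4-byte int members, the expected output is 8 4. The printf() and exit() functions are assumed here and are described elsewhere.

```d
typedef struct foo {
	int x;
	int y;
} foo_t;

BEGIN
{
	/* Size of the whole struct, then the byte offset of member y. */
	printf("%d %d\n", sizeof (foo_t), offsetof(foo_t, y));
	exit(0);
}
```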

Enumerations

Defining symbolic names for constants in a program eases readability and simplifies the process of maintaining the program in the future. One method is to define an enumeration, which associates a set of integers with a set of identifiers called enumerators that the compiler recognizes and replaces with the corresponding integer value. An enumeration is defined by using a declaration such as the following:

enum colors {
  RED,
  GREEN,
  BLUE
};

The first enumerator in the enumeration, RED, is assigned the value zero and each subsequent identifier is assigned the next integer value.

You can also specify an explicit integer value for any enumerator by suffixing it with an equal sign and an integer constant, as shown in the following example:

enum colors {
  RED = 7,
  GREEN = 9,
  BLUE
};

The enumerator BLUE is assigned the value 10 by the compiler because it has no value specified and the previous enumerator is set to 9. When an enumeration is defined, the enumerators can be used anywhere in a D program that an integer constant is used. In addition, the enumeration enum colors is also defined as a type that's equivalent to an int. The D compiler permits a variable of enum type to be used anywhere an int can be used and permits any integer value to be assigned to a variable of enum type. You can also omit the enum name in the declaration, if the type name isn't needed.
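The enumerator values can be checked with a short sketch such as the following, which declares a hypothetical variable c of the enumeration type and prints the value of BLUE and the result of comparing c with it. The expected output is 10 1.

```d
enum colors {
	RED = 7,
	GREEN = 9,
	BLUE
};

enum colors c;

BEGIN
{
	c = BLUE;
	/* BLUE follows GREEN = 9, so it is 10; the comparison yields 1. */
	printf("%d %d\n", BLUE, c == BLUE);
	exit(0);
}
```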

Enumerators are visible in all the following clauses and declarations in a program. Therefore, you can't define the same enumerator identifier in more than one enumeration. However, you can define more than one enumerator with the same value in either the same or different enumerations. You can also assign integers that have no corresponding enumerator to a variable of the enumeration type.

The D enumeration syntax is the same as the corresponding syntax in ANSI C. D also provides access to enumerations that are defined in the OS kernel and its loadable modules. Note that these enumerators aren't globally visible in a D program. Kernel enumerators are only visible if you specify one as an argument in a comparison with an object of the corresponding enumeration type. This feature protects D programs against inadvertent identifier name conflicts with the large collection of enumerations that are defined in the OS kernel.

Inlines

D named constants can also be defined by using inline directives, which provide a more general means of creating identifiers that are replaced by predefined values or expressions during compilation. Inline directives are a more powerful form of lexical replacement than the #define directive provided by the C preprocessor because the replacement is assigned an actual type and is performed by using the compiled syntax tree and not a set of lexical tokens. An inline directive is specified by using a declaration of the following form:

inline type name = expression;

where type is a type declaration of an existing type, name is any valid D identifier that isn't previously defined as an inline or global variable, and expression is any valid D expression. After the inline directive is processed, the D compiler substitutes the compiled form of expression for each subsequent instance of name in the program source.

For example, the following D program would trace the string "hello" and integer value 123:

inline string hello = "hello";
inline int number = 100 + 23;

BEGIN
{
  trace(hello);
  trace(number);
}

An inline name can be used anywhere a global variable of the corresponding type is used. If the inline expression can be evaluated to an integer or string constant at compile time, then the inline name can also be used in contexts that require constant expressions, such as scalar array dimensions.
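The following sketch shows an inline name in a context that requires a constant expression. The names NSLOTS and slots are hypothetical and chosen only for illustration; the sketch assumes that the inline expression evaluates to an integer constant at compile time, as described above.

```dtrace
inline int NSLOTS = 4;

/* NSLOTS is a compile-time integer constant, so it can size a scalar array. */
int slots[NSLOTS];

BEGIN
{
  slots[NSLOTS - 1] = 42;
  trace(slots[NSLOTS - 1]);
  exit(0);
}
```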

The inline expression is validated for syntax errors as part of evaluating the directive. The expression result type must be compatible with the type that's defined by the inline, according to the same rules used for the D assignment operator (=). An inline expression can't reference the inline identifier itself: recursive definitions aren't permitted.

The DTrace software packages install several D source files in the system directory /usr/lib64/dtrace/installed-version, which contain inline directives that you can use in D programs.

For example, the signal.d library includes directives of the following form:

inline int SIGHUP = 1;
inline int SIGINT = 2;
inline int SIGQUIT = 3;
...

These inline definitions provide you with access to the current set of Oracle Linux signal names, as described in the sigaction(2) manual page. Similarly, the errno.d library contains inline directives for the C errno constants that are described in the errno(3) manual page.

By default, the D compiler includes all of the provided D library files automatically so that you can use these definitions in any D program.
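As a hedged illustration of using these library definitions, the following sketch compares a signal number against the SIGINT inline. It assumes the proc provider's signal-send probe, where args[1] describes the receiving process and args[2] holds the signal number; check the proc provider documentation for the argument layout on your system.

```dtrace
proc:::signal-send
/args[2] == SIGINT/
{
  /* SIGINT is replaced by the value 2 from the signal.d library. */
  printf("%s (pid %d) sent SIGINT to pid %d\n",
      execname, pid, args[1]->pr_pid);
}
```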

Type Namespaces

In traditional languages such as ANSI C, type visibility is determined by whether a type is nested inside a function or other declaration. Types declared at the outer scope of a C program are associated with a single global namespace and are visible throughout the entire program. Types that are defined in C header files are typically included in this outer scope. Unlike these languages, D provides access to types from several outer scopes.

D is a language that provides dynamic observability across different layers of a software stack, including the OS kernel, an associated set of loadable kernel modules, and user processes that are running on the system. A single D program can instantiate probes to gather data from several kernel modules or other software entities that are compiled into independent binary objects. Therefore, more than one data type of the same name, sometimes with different definitions, might be present in the universe of types that are available to DTrace and the D compiler. To manage this situation, the D compiler associates each type with a namespace, which is identified by the containing program object. Types from a particular kernel level object, such as the main kernel or a kernel module, can be accessed by specifying the object name and the back quote (`) scoping operator in any type name.

For a kernel module named foo that contains the following C type declaration:

typedef struct bar {
  int x;
} bar_t;

The types struct bar and bar_t could be accessed from D using the following type names:

struct foo`bar
foo`bar_t

For example, the kernel includes a task_struct that's described in include/linux/sched.h. The definition of this struct depends on kernel configuration at build. You can find out information about the struct, such as its size, by referencing it as follows:

sizeof(struct vmlinux`task_struct)

The back quote operator can be used in any context where a type name is appropriate, including when specifying the type for D variable declarations or cast expressions in D probe clauses.
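For instance, a complete program that reports the size of the structure could look like the following sketch. The reported size depends on the kernel configuration, so no particular value is implied.

```dtrace
BEGIN
{
  /* The vmlinux` prefix selects the type namespace of the core kernel. */
  printf("task_struct is %d bytes\n", sizeof(struct vmlinux`task_struct));
  exit(0);
}
```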

The D compiler also provides two special, built-in type namespaces that use the names C and D. The C type namespace is initially populated with the standard ANSI C intrinsic types, such as int. In addition, type definitions that are acquired by using the C preprocessor (cpp), by running the dtrace -C command, are processed by, and added to, the C scope. So, you can include C header files containing type declarations that are already visible in another type namespace without causing a compilation error.

The D type namespace is initially populated with the D type intrinsics, such as int and string, and the built-in D type aliases, such as uint64_t. Any new type declarations that appear in the D program source are automatically added to the D type namespace. If you create a complex type such as a struct in a D program consisting of member types from other namespaces, the member types are copied into the D namespace by the declaration.

When the D compiler encounters a type declaration that doesn't specify an explicit namespace using the back quote operator, the compiler searches the set of active type namespaces to find a match by using the specified type name. The C namespace is always searched first, followed by the D namespace. If the type name isn't found in either the C or D namespace, the type namespaces of the active kernel modules are searched in load address order, which doesn't guarantee any ordering properties among the loadable modules. To avoid type name conflicts with other kernel modules, use the scoping operator when accessing types that are defined in loadable kernel modules.

The D compiler uses the compressed ANSI C debugging information that's provided with the core Linux kernel modules to access the types that are associated with the OS source code, without the need to access the corresponding C include files. Note that this symbolic debugging information might not be available for all kernel modules on the system. The D compiler reports an error if you try to access a type within the namespace of a module that lacks the compressed C debugging information that's intended for use with DTrace.

Variables

D provides several variable types: scalar variables, associative arrays, scalar arrays, and multidimensional scalar arrays. Variables can be created by declaring them explicitly, but are most often created implicitly on first use. Variables can be restricted to clause or thread scope to avoid name conflicts and to control the lifetime of a variable explicitly.

Scalar Variables

Scalar variables are used to represent individual, fixed-size data objects, such as integers and pointers. Scalar variables can also be used for fixed-size objects that are composed of one or more primitive or composite types. D provides the ability to create arrays of objects and composite structures. DTrace also represents strings as fixed-size scalars by permitting them to grow to a predefined maximum length.

To create a scalar variable, you can write an assignment expression of the following form:

name = expression ;

where name is any valid D identifier and expression is any value or expression that the variable contains.

DTrace includes several built-in scalar variables that can be referenced within D programs. The values of these variables are automatically populated by DTrace. See DTrace Built-in Variable Reference for a complete list of these variables.

Associative Arrays

Associative arrays are used to represent collections of data elements that can be retrieved by specifying a key. Associative arrays differ from normal, fixed-size arrays in that they have no predefined limit on the number of elements and can use any expression as a key. Furthermore, elements in an associative array aren't stored in consecutive storage locations.

To create an associative array, you can write an assignment expression of the following form:

name [ key ] = expression ;

where name is any valid D identifier, key is a comma-separated list of one or more expressions, often as string values, and expression is the value that's contained by the array for the specified key.

The type of each object that's contained in the array is also fixed for all elements in the array. You can use any of the assignment operators that are defined in Types, Operators, and Expressions to change associative array elements, subject to the operand rules defined for each operator. The D compiler produces an appropriate error message if you try an incompatible assignment. You can use any type with an associative array key or value that can be used with a scalar variable.

You can reference values in an associative array by specifying the array name and the appropriate key.

You can remove an element of an associative array by assigning 0 to it. When you remove an element, the storage that's used for that element is deallocated and made available to the system for use.
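The following sketch pulls these operations together: it creates an element, references it by key, and then removes it by assigning 0. The array name start and the choice of the read() system call are illustrative assumptions.

```dtrace
syscall::read:entry
{
  /* Create an element keyed by the tuple (pid, tid). */
  start[pid, tid] = timestamp;
}

syscall::read:return
/start[pid, tid]/
{
  /* Reference the element by using the same key. */
  printf("%s: read took %d ns\n", execname, timestamp - start[pid, tid]);

  /* Assigning 0 removes the element and frees its storage. */
  start[pid, tid] = 0;
}
```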

Scalar Arrays

Scalar arrays are a fixed-length group of consecutive memory locations that each store a value of the same type. Scalar arrays are accessed by referring to each location with an integer, starting from zero. Scalar arrays aren't used as often in D as associative arrays.

A D scalar array of 5 integers is declared by using the type int and suffixing the declaration with the number of elements in square brackets, for example:

int s[5];

The D expression s[0] refers to the first array element, s[1] refers to the second, and so on. DTrace performs bounds checking on the indexes of scalar arrays at compile time to help catch bad index references early.

Note:

Scalar arrays and associative arrays are syntactically similar. You can declare an associative array of integers referenced by an integer key as follows:

int a[int];

You can also reference this array using the expression a[0], but from a storage and implementation perspective, the two arrays are different. The scalar array s consists of five consecutive memory locations numbered from zero, and the index refers to an offset in the storage that's allocated for the array. However, the associative array a has no predefined size and doesn't store elements in consecutive memory locations. In addition, associative array keys have no relationship to the corresponding value storage location. You can access associative array elements a[0] and a[-5] and only two words of storage are allocated by DTrace. Furthermore, these elements don't have to be consecutive. Associative array keys are abstract names for the corresponding values and have no relationship to the value storage locations.

If you create an array using an initial assignment and use a single integer expression as the array index, for example, a[0] = 2, the D compiler always creates a new associative array, even though in this expression a could also be interpreted as an assignment to a scalar array. Scalar arrays must be predeclared in this situation so that the D compiler can recognize the definition of the array size and infer that the array is a scalar array.
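A short sketch of the distinction, using the same names s and a as the declarations in this note, might look like the following. The values assigned are arbitrary.

```dtrace
int s[5];     /* scalar array: predeclared with a fixed size */
int a[int];   /* associative array keyed by an int */

BEGIN
{
  s[4] = 10;    /* index is an offset into five consecutive locations */
  a[-5] = 20;   /* key is an abstract name, so negative values are fine */
  a[0] = 30;
  printf("%d %d %d\n", s[4], a[-5], a[0]);
  exit(0);
}
```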

Multidimensional Scalar Arrays

Multidimensional scalar arrays are used infrequently in D, but are provided for compatibility with ANSI C and for observing and accessing OS data structures that are created by using this capability in C. A multidimensional array is declared as a consecutive series of scalar array sizes within square brackets [] following the base type. For example, to declare a fixed-size, two-dimensional array of integers that has 12 rows and 34 columns, you would write the following declaration:

int s[12][34];

A multidimensional scalar array is accessed by using similar notation. For example, to access the value stored at row 0 and column 1, you would write the D expression as follows:

s[0][1]

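Storage locations for multidimensional scalar array values are computed by multiplying the row number by the total number of columns declared and then adding the column number. For example, with the declaration int s[12][34], the value s[2][5] is stored at location 2 * 34 + 5 = 73, counting from zero.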

Be careful not to confuse the multidimensional array syntax with the D syntax for associative array accesses (that is, s[0][1] isn't the same as s[0,1]). If you use an incompatible key expression with an associative array or try an associative array access of a scalar array, the D compiler reports an appropriate error message and refuses to compile the program.

Variable Scope

Variable scoping is used to define where variable names are valid within a program and to avoid variable naming collisions. By using scoped variables you can control the availability of the variable instance to the whole program, a particular thread, or a specific clause.

The following table lists and describes the three primary variable scopes that are available. Note that external variables provide a fourth scope that falls outside of the control of the D program.

Scope         Syntax         Initial Value   Thread-safe?   Description
Global        myname         0               No             Any probe that fires on any thread accesses the same instance of the variable.
Thread-local  self->myname   0               Yes            Any probe that fires on a thread accesses the thread-specific instance of the variable.
Clause-local  this->myname   Not defined     Yes            Any probe that fires accesses an instance of the variable specific to that particular firing of the probe.

Note:

Note the following information:

  • Scalar variables and associative arrays have a global scope and aren't multi-processor safe (MP-safe). Because the value of such variables can be changed by more than one processor, a variable can become corrupted if more than one probe changes it.

  • Aggregations are MP-safe even though they have a global scope because independent copies are updated locally before a final aggregation produces the global result.
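To make the scopes concrete, the following sketch uses one variable of each scope, together with an aggregation. The names calls, start, elapsed, and latency are hypothetical, and the read() system call is chosen only as an example.

```dtrace
syscall::read:entry
{
  calls++;                    /* global: shared by all threads, not MP-safe */
  self->start = timestamp;    /* thread-local: one instance per thread */
}

syscall::read:return
/self->start/
{
  /* clause-local: valid only for this firing of the probe */
  this->elapsed = timestamp - self->start;

  /* aggregation: global in scope, but MP-safe */
  @latency[execname] = quantize(this->elapsed);

  self->start = 0;
}

END
{
  printf("read() calls seen: %d\n", calls);
}
```

Because calls is a plain global variable, its final value can undercount on a busy multiprocessor system. An aggregation such as @n = count() is the safer choice when an exact total matters.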

Global Variables

Global variables are used to declare variable storage that's persistent across the entire D program. Global variables provide the broadest scope.

Global variables of any type can be defined in a D program, including associative arrays. The following are some example global variable definitions:

x = 123; /* integer value */
s = "hello"; /* string value */
a[123, 'a'] = 456; /* associative array */

Global variables are created automatically on their first assignment and use the type appropriate for the right side of the first assignment statement. Except for scalar arrays, you don't need to explicitly declare global variables before using them. To create a declaration anyway, you must place it outside of program clauses, for example:

int x; /* declare int x as a global variable */
int a[unsigned long long, char]; /* declare a as a global associative array */

syscall::read:entry
{
  x = 123;
  a[123, 'a'] = 456;
}

D variable declarations can't assign initial values. You can use a BEGIN probe clause to assign any initial values. All global variable storage is filled with zeroes by DTrace before you first reference the variable.
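The following sketch shows one way to initialize a declared global variable in a BEGIN clause. The names limit and ticks are hypothetical, and the sketch assumes that the profile provider's tick-1sec probe is available.

```dtrace
int limit;   /* a declaration can't carry an initializer */
int ticks;   /* starts at zero */

BEGIN
{
  limit = 5;   /* assign the initial value here instead */
}

tick-1sec
{
  ticks++;
}

tick-1sec
/ticks >= limit/
{
  printf("stopping after %d ticks\n", ticks);
  exit(0);
}
```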

Thread-Local Variables

Thread-local variables are used to declare variable storage that's local to each OS thread. Thread-local variables are useful in situations where you want to enable a probe and mark every thread that fires the probe with some tag or other data.

Thread-local variables are referenced by applying the -> operator to the special identifier self, for example:

syscall::read:entry
{
  self->read = 1;
}

This D fragment example enables the probe on the read() system call and associates a thread-local variable named read with each thread that fires the probe. Similar to global variables, thread-local variables are created automatically on their first assignment and assume the type that's used on the right-hand side of the first assignment statement, which is int in this example.

Each time the self->read variable is referenced in the D program, the data object that's referenced is the one associated with the OS thread that was executing when the corresponding DTrace probe fired. You can think of a thread-local variable as an associative array that's implicitly indexed by a tuple that describes the thread's identity in the system. A thread's identity is unique over the lifetime of the system: if the thread exits and the same OS data structure is used to create a new thread, the new thread doesn't reuse the same DTrace thread-local storage identity.

When you have defined a thread-local variable, you can reference it for any thread in the system, even if the variable in question hasn't been previously assigned for that particular thread. If a thread's copy of the thread-local variable hasn't yet been assigned, the data storage for the copy is defined to be filled with zeroes. As with associative array elements, underlying storage isn't allocated for a thread-local variable until a non-zero value is assigned to it. Also, as with associative array elements, assigning zero to a thread-local variable causes DTrace to deallocate the underlying storage. Always assign zero to thread-local variables that are no longer in use.
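As a sketch of these rules, the following program increments a thread-local counter without initializing it and releases the storage when the thread exits. The variable name writes is hypothetical, and the sketch assumes that the proc provider's lwp-exit probe fires in the context of the exiting thread.

```dtrace
syscall::write:entry
{
  /* A copy that hasn't been assigned yet reads as zero. */
  self->writes++;
}

proc:::lwp-exit
/self->writes/
{
  printf("%s (tid %d) made %d write calls\n", execname, tid, self->writes);

  /* Assign zero so that DTrace can deallocate the storage. */
  self->writes = 0;
}
```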

Thread-local variables of any type can be defined in a D program, including associative arrays. The following are some example thread-local variable definitions:

self->x = 123; /* integer value */
self->s = "hello"; /* string value */
self->a[123, 'a'] = 456; /* associative array */

You don't need to explicitly declare thread-local variables before using them. To create a declaration anyway, you must place it outside of program clauses and prepend the keyword self, for example:

self int x; /* declare int x as a thread-local variable */ 
syscall::read:entry
{
  self->x = 123;
}

Thread-local variables are kept in a separate namespace from global variables so that you can reuse names. Remember that x and self->x aren't the same variable if you overload names in a program.

Clause-Local Variables

Clause-local variables are used to restrict the storage of a variable to the particular firing of a probe. Clause-local is the narrowest scope. When a probe fires on a CPU, the D script is run in program order. Each clause-local variable is instantiated with an undefined value the first time it is used in the script. The same instance of the variable is used in all clauses until the D script has completed running for that particular firing of the probe.

Clause-local variables can be referenced and assigned by prefixing with this->:

BEGIN
{
  this->secs = timestamp / 1000000000;
  ...
}

To declare a clause-local variable explicitly before using it, you can do so by using the this keyword:

this int x;  /* an integer clause-local variable */
this char c; /* a character clause-local variable */

BEGIN
{
  this->x = 123;
  this->c = 'D';
}

Note that if a program contains several clauses for a single probe, clause-local variables remain intact as the clauses run sequentially, so a value assigned in one clause is still available in later clauses that enable the same probe. However, the values of clause-local variables are undefined in the first clause processed for a specified probe. To avoid unexpected results, assign each clause-local variable an appropriate value before using it.
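A minimal sketch of this behavior uses two clauses for the BEGIN probe. The variable name secs and the value 60 are arbitrary.

```dtrace
BEGIN
{
  /* First clause for this probe: assign before any use. */
  this->secs = 60;
}

BEGIN
{
  /* Second clause for the same firing: the value is still intact. */
  trace(this->secs * 2);
  exit(0);
}
```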

Clause-local variables can be defined using any scalar variable type, but associative arrays can't be defined using clause-local scope. The scope of clause-local variables only applies to the corresponding variable data, not to the name and type identity defined for the variable. When a clause-local variable is defined, this name and type signature can be used in any later D program clause.

You can use clause-local variables to accumulate intermediate results of calculations or as temporary copies of other variables. Access to a clause-local variable is much faster than access to an associative array. Therefore, if you need to reference an associative array value several times in the same D program clause, it's more efficient to copy it into a clause-local variable first and then reference the local variable repeatedly.

External Variables

The D language uses the back quote character (`) as a special scoping operator for accessing symbols or variables that are defined in the OS, outside of the D program itself.

DTrace instrumentation runs inside the Oracle Linux OS kernel. So, in addition to accessing special DTrace variables and probe arguments, you can also access kernel data structures, symbols, and types. These capabilities enable advanced DTrace users, administrators, service personnel, and driver developers to examine low-level behavior of the OS kernel and device drivers.

For example, the Oracle Linux kernel contains a C declaration of a system variable named max_pfn. This variable is declared in C in the kernel source code as follows:

unsigned long max_pfn;

To trace the value of this variable in a D program, you can write the following D statement:

trace(`max_pfn);

DTrace associates each kernel symbol with the type that's used for the symbol in the corresponding OS C code, which provides source-based access to the local OS data structures.

Kernel symbol names are kept in a separate namespace from D variable and function identifiers, so you don't need to be concerned about these names conflicting with other D variables. When you prefix a variable with a back quote, the D compiler searches the known kernel symbols and uses the list of loaded modules to find a matching variable definition. Because the Oracle Linux kernel can dynamically load modules with separate symbol namespaces, the same variable name might be used more than once in the active OS kernel. You can resolve these name conflicts by specifying the name of the kernel module that contains the variable to be accessed before the back quote in the symbol name. For example, you would refer to the address of the _bar function that's provided by a kernel module named foo as follows:

foo`_bar

You can apply any of the D operators to external variables, subject to the usual rules for operand types, except for operators that modify the variable's value, such as = or +=. When required, the D compiler loads the variable names that correspond to active kernel modules, so you don't need to declare these variables. For safety reasons, DTrace prevents you from damaging or corrupting the state of the software that you're observing.
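The following sketch reads the max_pfn variable both without and with a module qualifier. It assumes that the core kernel is identified by the name vmlinux, as in the earlier task_struct example.

```dtrace
BEGIN
{
  /* Unqualified: the compiler searches the known kernel symbols. */
  trace(`max_pfn);

  /* Qualified: names the object that defines the symbol. */
  trace(vmlinux`max_pfn);

  /* A statement such as `max_pfn = 0; would be rejected by the compiler. */
  exit(0);
}
```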

When you access external variables from a D program, you're accessing the internal implementation details of another program, such as the OS kernel or its device drivers. These implementation details don't form a stable interface upon which you can rely. Any D programs you write that depend on these details might not work when you next upgrade the corresponding piece of software. For this reason, external variables are typically used to debug performance or functionality problems by using DTrace.

Pointers

Pointers are memory addresses of data objects and reference memory used by the OS, by the user program, or by the D script. Pointers in D are data objects that store an integer virtual address value and associate it with a D type that describes the format of the data stored at the corresponding memory location.

You can explicitly declare a D variable to be of pointer type by first specifying the type of the referenced data and then appending an asterisk (*) to the type name. Doing so indicates you want to declare a pointer type, as shown in the following statement:

int *p;

The statement declares a D global variable named p that's a pointer to an integer. The declaration means that p is a 64-bit integer with a value that's the address of another integer located somewhere in memory. Because the compiled form of the D code is run at probe firing time inside the kernel itself, D pointers are typically pointers associated with the kernel's address space.

To create a pointer to a data object inside the kernel, you can compute its address by using the & operator. For example, the kernel source code declares an unsigned long max_pfn variable. You could trace the address of this variable by tracing the result of applying the & operator to the name of that object in D:

trace(&`max_pfn);

The * operator can be used to specify the object addressed by the pointer, and acts as the inverse of the & operator. For example, the following two D code fragments are equivalent in meaning:

q = &`max_pfn; trace(*q);

trace(`max_pfn); 

In this example, the first fragment creates a D global variable pointer q. Because the max_pfn object is of type unsigned long, the type of &`max_pfn is unsigned long *, a pointer to unsigned long. The type of q is implicit in the declaration. Tracing the value of *q follows the pointer back to the data object max_pfn. This fragment is therefore the same as the second fragment, which directly traces the value of the data object by using its name.

Pointer Safety

DTrace is a robust, safe environment for running D programs. You might write a buggy D program, but invalid D pointer accesses don't cause DTrace or the OS kernel to fail or crash in any way. Instead, the DTrace software detects any invalid pointer accesses, and returns a BADADDR fault; the current clause execution quits, an ERROR probe fires, and tracing continues unless the program called exit for the ERROR probe.

Pointers are required in D because they're an intrinsic part of the OS's implementation in C, but DTrace implements the same kind of safety mechanisms that are found in the Java programming language to prevent buggy programs from affecting themselves or each other. DTrace's error reporting is similar to the runtime environment for the Java programming language that detects a programming error and reports an exception.

To observe DTrace's error handling and reporting, you could write a deliberately bad D program using pointers. For example, in an editor, type the following D program and save it in a file named badptr.d:

BEGIN
{
  x = (int *)NULL;
  y = *x;
  trace(y);
}

The badptr.d program uses a cast expression to convert NULL to be a pointer to an integer. The program then dereferences the pointer by using the expression *x, assigns the result to another variable y, and then tries to trace y. When the D program is run, DTrace detects an invalid pointer access when the statement y = *x is processed and reports the following error:

dtrace: script '/tmp/badptr.d' matched 1 probe
dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): invalid address (0x0) in action #1 at BPF pc 156

Notice that the D program moves past the error and continues to run; the system and all observed processes remain unperturbed. You can also add an ERROR probe to any script to handle D errors. For details about the DTrace error mechanism, see ERROR Probe.
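As a sketch of such a handler, the following version of badptr.d adds an ERROR clause. It assumes the ERROR probe argument layout in which arg4 holds the fault type and arg5 holds a fault-specific value, which is the address for an invalid address fault; see ERROR Probe to confirm the layout.

```dtrace
BEGIN
{
  x = (int *)NULL;
  y = *x;          /* faults here; the rest of this clause is skipped */
  trace(y);
}

ERROR
{
  printf("fault type %d at address 0x%x\n", arg4, arg5);
  exit(1);
}
```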

Pointer and Array Relationship

A scalar array is represented by a variable that's associated with the address of its first storage location. A pointer is also the address of a storage location with a defined type. Thus, D permits the use of the array [] index notation with both pointer variables and array variables. For example, the following two D fragments are equivalent in meaning:

p = &a[0]; trace(p[2]);

trace(a[2]); 

In the first fragment, the pointer p is assigned to the address of the first element in scalar array a by applying the & operator to the expression a[0]. The expression p[2] traces the value of the third array element (index 2). Because p now contains the same address associated with a, this expression yields the same value as a[2], shown in the second fragment. One consequence of this equivalence is that D permits you to access any index of any pointer or array. If you access memory beyond the end of a scalar array's predefined size, you either get an unexpected result or DTrace reports an invalid address error.

The difference between pointers and arrays is that a pointer variable refers to a separate piece of storage that contains the integer address of some other storage; whereas, an array variable names the array storage itself, not the location of an integer that in turn contains the location of the array.

This difference is manifested in the D syntax if you try to assign pointers and scalar arrays. If x and y are pointer variables, the expression x = y is legal; it copies the pointer address in y to the storage location that's named by x. If x and y are scalar array variables, the expression x = y isn't legal. Arrays can't be assigned as a whole in D. If p is a pointer and a is a scalar array, the statement p = a is permitted. This statement is equivalent to the statement p = &a[0].

Pointer Arithmetic

As in C, pointer arithmetic in D isn't identical to integer arithmetic. Pointer arithmetic implicitly adjusts the underlying address by multiplying or dividing the operands by the size of the type referenced by the pointer.

The following D fragment illustrates this property:

int *x;

BEGIN
{
  trace(x);
  trace(x + 1);
  trace(x + 2);
}

This fragment creates an integer pointer x and then traces its value, its value incremented by one, and its value incremented by two. If you create and run this program, DTrace reports the integer values 0, 4, and 8.

Because x is a pointer to an int (size 4 bytes), incrementing x adds 4 to the underlying pointer value. This property is useful when using pointers to reference consecutive storage locations such as arrays. For example, if x was assigned to the address of an array a, the expression x + 1 would be equivalent to the expression &a[1]. Similarly, the expression *(x + 1) would reference the value a[1]. Pointer arithmetic is implemented by the D compiler whenever a pointer value is incremented by using the +, ++, or += operators. Pointer arithmetic is also applied in the following cases: when an integer is subtracted from a pointer on the left-hand side, when a pointer is subtracted from another pointer, or when the -- operator is applied to a pointer.

For example, the following D program would trace the result 2:

int *x, *y;
int a[5];

BEGIN
{
  x = &a[0];
  y = &a[2];
  trace(y - x);
}
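To see how the adjustment depends on the referenced type, the following sketch repeats the earlier experiment with three pointer types. The variable names are hypothetical. Assuming type sizes of 2, 4, and 8 bytes for short, int, and long long, the traced values would be expected to be 2, 4, and 8, because each global pointer starts at zero.

```dtrace
short *s;
int *i;
long long *l;

BEGIN
{
  /* Each addition is scaled by the size of the referenced type. */
  trace(s + 1);
  trace(i + 1);
  trace(l + 1);
  exit(0);
}
```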

Generic Pointers

Sometimes it's useful to represent or manipulate a generic pointer address in a D program without specifying the type of data referred to by the pointer. Generic pointers can be specified by using the type void *, where the keyword void represents the absence of specific type information, or by using the built-in type alias uintptr_t, which is aliased to an unsigned integer type of size that's appropriate for a pointer in the current data model. You can't apply pointer arithmetic to an object of type void *, and these pointers can't be dereferenced without casting them to another type first. You can cast a pointer to the uintptr_t type when you need to perform integer arithmetic on the pointer value.

Pointers to void can be used in any context where a pointer to another data type is required, such as an associative array tuple expression or the right-hand side of an assignment statement. Similarly, a pointer to any data type can be used in a context where a pointer to void is required. To use a pointer to a non-void type in place of another non-void pointer type, an explicit cast is required. You must always use explicit casts to convert pointers to integer types, such as uintptr_t, or to convert these integers back to the appropriate pointer type.
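
As an illustrative sketch of the difference between pointer arithmetic and integer arithmetic on a pointer value, the following program adds one to the same zero-valued pointer twice: once as an int *, and once after an explicit cast to uintptr_t. The variable name x is borrowed from the earlier fragment. The first trace would be expected to report 4 and the second 1, because the cast removes the type-based scaling:

```d
int *x;

BEGIN
{
  trace(x + 1);               /* pointer arithmetic: scaled by sizeof (int) */
  trace((uintptr_t)x + 1);    /* integer arithmetic: no scaling */
  exit(0);
}
```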

Pointers to DTrace Objects

The D compiler prohibits you from using the & operator to obtain pointers to DTrace objects such as associative arrays, built-in functions, and variables. You're prohibited from obtaining the address of these objects so that the DTrace runtime environment is free to relocate them as needed between probe firings. In this way, DTrace can more efficiently manage the memory required for programs. If you create composite structures, it's possible to construct expressions that retrieve the kernel address of DTrace object storage. Avoid creating such expressions in D programs. If you need to use such an expression, don't rely on the address being the same across probe firings.

Pointers and Address Spaces

A pointer is an address that provides a translation within some virtual address space to a piece of physical memory. DTrace runs D programs within the address space of the OS kernel itself. The Linux system manages many address spaces: one for the OS kernel itself, and one for each user process. Because each address space provides the illusion that it can access all the memory on the system, the same virtual address pointer value can be reused across address spaces, but translate to different physical memory. Therefore, when writing D programs that use pointers, you must be aware of the address space corresponding to the pointers you intend to use.

For example, suppose you use the syscall provider to instrument entry to a system call that takes a pointer to an integer or array of integers as an argument, such as pipe(). It would not be valid to dereference that pointer or array using the * or [] operators, because the address in question is an address in the address space of the user process that performed the system call. Applying the * or [] operators to this address in D would result in a kernel address space access, which would either produce an invalid address error or return unexpected data to the D program, depending on whether the address happened to match a valid kernel address.

To access user-process memory from a DTrace probe, you must apply one of the copyin, copyinstr, or copyinto functions. To avoid confusion, take care when writing D programs to name and comment variables storing user addresses appropriately. You can also store user addresses as uintptr_t so that you don't accidentally compile D code that dereferences them.
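
The following sketch shows one way to apply this advice to the pipe() example. It assumes that the traced process enters the kernel through the pipe system call; some C libraries and architectures use pipe2 instead, in which case the probe descriptions would need to change. The names self->fds and this->fds are illustrative. The user address is saved as a uintptr_t at entry, and copyin is used at return to copy the two integers into DTrace scratch memory before they're read:

```d
syscall::pipe:entry
{
  /* Save the user-space address as an integer, not as a pointer. */
  self->fds = (uintptr_t)arg0;
}

syscall::pipe:return
/self->fds != 0/
{
  /* Copy two ints from the user address space, then index the copy. */
  this->fds = (int *)copyin(self->fds, 2 * sizeof (int));

  /* The descriptors are only meaningful when the call returned 0. */
  printf("%s: pipe() returned %d, fds %d and %d\n", execname,
      (int)arg0, this->fds[0], this->fds[1]);
  self->fds = 0;
}
```

Dereferencing arg0 directly in the entry clause would instead be treated as a kernel address access, as described earlier.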

Structs and Unions

Collections of related variables can be grouped together into composite data objects called structs and unions. You define these objects in D by creating new type definitions for them. You can use any new types for any D variables, including associative array values. This section explores the syntax and semantics for creating and manipulating these composite types and the D operators that interact with them.

Structs

The D keyword struct, short for structure, is used to introduce a new type that's composed of a group of other types. The new struct type can be used as the type for D variables and arrays, enabling you to define groups of related variables under a single name. D structs are the same as the corresponding construct in C and C++. If you have programmed in the Java programming language, think of a D struct as a class that contains only data members and no methods.

Suppose you want to create a more sophisticated system call tracing program in D that records several things about each read() and write() system call that's run for an application, for example, the elapsed time, number of calls, and the largest byte count passed as an argument.

You could write a D clause to record these properties in four separate associative arrays, as shown in the following example:

int ts[string];       /* declare ts */
int calls[string];    /* declare calls */
int elapsed[string];  /* declare elapsed */
int maxbytes[string]; /* declare maxbytes */

syscall::read:entry, syscall::write:entry
/pid == $target/
{
  ts[probefunc] = timestamp;
  calls[probefunc]++;
  maxbytes[probefunc] = arg2 > maxbytes[probefunc] ?
        arg2 : maxbytes[probefunc];
}

syscall::read:return, syscall::write:return
/ts[probefunc] != 0 && pid == $target/
{
  elapsed[probefunc] += timestamp - ts[probefunc];
}

END
{
  printf("       calls max bytes elapsed nsecs\n");
  printf("------ ----- --------- -------------\n");
  printf("  read %5d %9d %d\n",
        calls["read"], maxbytes["read"], elapsed["read"]);
  printf(" write %5d %9d %d\n",
        calls["write"], maxbytes["write"], elapsed["write"]);
}

You can make the program easier to read and maintain by using a struct. A struct provides a logical grouping of data items that belong together. It also saves storage space because all data items can be stored with a single key.

First, declare a new struct type at the top of the D program source file:

struct callinfo {
  uint64_t ts;       /* timestamp of last syscall entry */
  uint64_t elapsed;  /* total elapsed time in nanoseconds */
  uint64_t calls;    /* number of calls made */
  size_t maxbytes;   /* maximum byte count argument */
};

The struct keyword is followed by an optional identifier that's used to refer back to the new type, which is now known as struct callinfo. The struct members are then enclosed within a set of braces {} and the entire declaration ends with a semicolon (;). Each struct member is defined by using the same syntax as a D variable declaration, with the type of the member listed first followed by an identifier naming the member and another semicolon (;).

The struct declaration defines the new type. It doesn't create any variables or allocate any storage in DTrace. Once the type is declared, you can use struct callinfo as a type throughout the remainder of the D program. Each variable of type struct callinfo stores a copy of the four variables that are described by the structure template. The members are arranged in memory in order, according to the member list, with padding space introduced between members, as required for data object alignment purposes.

You can use the member identifier names to access the individual member values using the “.” operator by writing an expression of the following form:


variable-name.member-name

The following example is an improved program that uses the new structure type. In a text editor, type the following D program and save it in a file named rwinfo.d:

struct callinfo {
  uint64_t ts; /* timestamp of last syscall entry */
  uint64_t elapsed; /* total elapsed time in nanoseconds */
  uint64_t calls; /* number of calls made */
  size_t maxbytes; /* maximum byte count argument */
};

struct callinfo i[string]; /* declare i as an associative array */

syscall::read:entry, syscall::write:entry
/pid == $target/
{
  i[probefunc].ts = timestamp;
  i[probefunc].calls++;
  i[probefunc].maxbytes = arg2 > i[probefunc].maxbytes ?
        arg2 : i[probefunc].maxbytes;
}

syscall::read:return, syscall::write:return
/i[probefunc].ts != 0 && pid == $target/
{
  i[probefunc].elapsed += timestamp - i[probefunc].ts;
}

END
{
  printf("       calls max bytes elapsed nsecs\n");
  printf("------ ----- --------- -------------\n");
  printf("  read %5d %9d %d\n",
        i["read"].calls, i["read"].maxbytes, i["read"].elapsed);
  printf(" write %5d %9d %d\n",
        i["write"].calls, i["write"].maxbytes, i["write"].elapsed);
}

Run the program to return the results for a command. For example, run the dtrace -q -s rwinfo.d -c date command. The date program runs and is traced until it exits and fires the END probe, which prints the results:

# dtrace -q -s rwinfo.d -c date
 ...
       calls max bytes elapsed nsecs 
------ ----- --------- ------------- 
 read     2       4096         10689 
 write    1         29          9817

Pointers to Structs

Referring to structs by using pointers is common in C and D. You can use the operator -> to access struct members through a pointer. If struct s has a member m, and you have a pointer to this struct named sp, where sp is a variable of type struct s *, you can either use the * operator to first dereference the sp pointer to access the member:

struct s *sp;
(*sp).m

Or, you can use the -> operator to achieve the same thing:

struct s *sp; 
sp->m

DTrace provides several built-in variables that are pointers to structs. For example, the pointer curpsinfo refers to struct psinfo and its content provides a snapshot of information about the state of the process associated with the thread that fired the current probe. The following table lists a few example expressions that use curpsinfo, including their types and their meanings.

Example Expression      Type      Meaning
curpsinfo->pr_pid       pid_t     Current process ID
curpsinfo->pr_fname     char []   Executable file name
curpsinfo->pr_psargs    char []   Initial command line arguments

The next example uses the pr_fname member to identify a process of interest. In an editor, type the following script and save it in a file named procfs.d:

syscall::write:entry
/ curpsinfo->pr_fname == "date" /
{
  printf("%s run by UID %d\n", curpsinfo->pr_psargs, curpsinfo->pr_uid);
}

This clause uses the expression curpsinfo->pr_fname to access and match the command name so that the script selects the correct write() requests before tracing the arguments. Notice that by using operator == with a left-hand argument that's an array of char and a right-hand argument that's a string, the D compiler infers that the left-hand argument can be promoted to a string and a string comparison is performed. Type the command dtrace -q -s procfs.d in one shell and then run several variations of the date command in another shell. The output that's displayed by DTrace might be similar to the following, indicating that curpsinfo->pr_psargs can show how the command is invoked and also any arguments that are included with the command:

# dtrace -q -s procfs.d 
date  run by UID 500
/bin/date  run by UID 500
date -R  run by UID 500
...
^C
#

Complex data structures are used often in C programs, so the ability to describe and reference structs from D also provides a powerful capability for observing the inner workings of the Oracle Linux OS kernel and its system interfaces.

Unions

Unions are another kind of composite type available in ANSI C and D and are related to structs. A union is a composite type where a set of members of different types are defined and the member objects all occupy the same region of storage. A union is therefore an object of variant type, where only one member is valid at any particular time, depending on how the union has been assigned. Typically, some other variable or piece of state is used to indicate which union member is currently valid. The size of a union is the size of its largest member. The memory alignment that's used for the union is the maximum alignment required by the union members.
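
As a hedged illustration, the following sketch declares a hypothetical union named union word (not part of any DTrace or kernel interface) whose members overlay the same eight bytes of storage, and then prints the size of the union by using the sizeof operator. Because the largest member is 8 bytes, the reported size would be expected to be 8:

```d
union word {
  uint64_t whole;     /* all eight bytes as one integer */
  uint32_t half[2];   /* the same storage viewed as two 32-bit values */
  uint8_t bytes[8];   /* the same storage viewed as individual bytes */
};

BEGIN
{
  printf("%d\n", sizeof (union word));
  exit(0);
}
```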

Member Sizes and Offsets

You can determine the size in bytes of any D type or expression, including a struct or union, by using the sizeof operator. The sizeof operator can be applied either to an expression or to the name of a type surrounded by parentheses, as illustrated in the following two examples:

sizeof expression 
sizeof (type-name)

For example, the expression sizeof (uint64_t) would return the value 8, and the expression sizeof (callinfo.ts) would also return 8, if inserted into the source code of the previous example program. The formal return type of the sizeof operator is the type alias size_t, which is defined as an unsigned integer that's the same size as a pointer in the current data model and is used to represent byte counts. When the sizeof operator is applied to an expression, the expression is validated by the D compiler, but the resulting object size is computed at compile time and no code for the expression is generated. You can use sizeof anywhere an integer constant is required.

You can use the companion operator offsetof to determine the offset in bytes of a struct or union member from the start of the storage that's associated with any object of the struct or union type. The offsetof operator is used in an expression of the following form:

offsetof (type-name, member-name)

Here, type-name is the name of any struct or union type or type alias, and member-name is the identifier naming a member of that struct or union. Similar to sizeof, offsetof returns a size_t and you can use it anywhere in a D program that an integer constant can be used.
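
As a minimal sketch, the following program applies both operators to the struct callinfo type from the earlier example. On a 64-bit data model, where size_t is 8 bytes and no padding is needed between the members, the three values would be expected to be 32, 16, and 24:

```d
struct callinfo {
  uint64_t ts;       /* timestamp of last syscall entry */
  uint64_t elapsed;  /* total elapsed time in nanoseconds */
  uint64_t calls;    /* number of calls made */
  size_t maxbytes;   /* maximum byte count argument */
};

BEGIN
{
  printf("size of struct callinfo: %d\n", sizeof (struct callinfo));
  printf("offset of calls:         %d\n",
      offsetof (struct callinfo, calls));
  printf("offset of maxbytes:      %d\n",
      offsetof (struct callinfo, maxbytes));
  exit(0);
}
```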

Bit-Fields

D also permits the definition of integer struct and union members of arbitrary numbers of bits, known as bit-fields. A bit-field is declared by specifying a signed or unsigned integer base type, a member name, and a suffix indicating the number of bits to be assigned for the field, as shown in the following example:

struct s 
{
  int a : 1;
  int b : 3;
  int c : 12;
};

The bit-field width is an integer constant that's separated from the member name by a trailing colon. The bit-field width must be positive and must be of a number of bits not larger than the width of the corresponding integer base type. Bit-fields that are larger than 64 bits can't be declared in D. D bit-fields provide compatibility with and access to the corresponding ANSI C capability. Bit-fields are typically used in situations when memory storage is at a premium or when a struct layout must match a hardware register layout.

A bit-field is a compiler construct that automates the layout of an integer and a set of masks to extract the member values. The same result can be achieved by defining the masks yourself and using the & operator. The C and D compilers try to pack bits as efficiently as possible, but they're free to do so in any order or fashion. Therefore, bit-fields aren't guaranteed to produce identical bit layouts across differing compilers or architectures. If you require stable bit layout, construct the bit masks yourself and extract the values by using the & operator.
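
The manual approach can be sketched as follows. The value 0xabc5 and the layout (a in bit 0, b in bits 1 to 3, c in bits 4 to 15) are assumptions made for this illustration only; they mirror the widths of struct s but don't describe how any particular compiler would lay out that struct. With these assumptions, the program would print a=1 b=2 c=0xabc:

```d
BEGIN
{
  this->reg = 0xabc5;   /* hypothetical packed value */

  /* Assumed layout: a in bit 0, b in bits 1-3, c in bits 4-15. */
  printf("a=%d b=%d c=0x%x\n",
      this->reg & 0x1,
      (this->reg >> 1) & 0x7,
      (this->reg >> 4) & 0xfff);
  exit(0);
}
```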

A bit-field member is accessed by specifying its name with the “.” or -> operators, similar to any other struct or union member. The bit-field is automatically promoted to the next largest integer type for use in any expressions. Because bit-field storage might not be aligned on a byte boundary or be a round number of bytes in size, you can't apply the sizeof or offsetof operators to a bit-field member. The D compiler also prohibits you from taking the address of a bit-field member by using the & operator.

DTrace String Processing

DTrace provides facilities for tracing and manipulating strings. This section describes the complete set of D language features for declaring and manipulating strings. Unlike ANSI C, strings in D have their own built-in type and operator support to enable you to easily and unambiguously use them in tracing programs.

String Representation

In DTrace, strings are represented as an array of characters ending in a null byte, which is a byte with a value of zero, usually written as '\0'. The visible part of the string is of variable length, depending on the location of the null byte, but DTrace stores each string in a fixed-size array so that each probe traces a consistent amount of data. Strings cannot exceed the length of the predefined string limit. However, the limit can be modified in your D program or on the dtrace command line by tuning the strsize option. The default string limit is 256 bytes.
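
For illustration, the limit could be raised to 1024 bytes (an arbitrary value chosen for this sketch) by placing an option pragma near the top of a D program:

```d
#pragma D option strsize=1024

BEGIN
{
  trace("strings in this program can hold up to 1024 bytes");
  exit(0);
}
```

The same option can be set on the command line with -x, for example dtrace -x strsize=1024 -s script.d, where script.d is a placeholder file name.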

The D language provides an explicit string type rather than using the type char * to refer to strings. The string type is equivalent to char *, in that it's the address of a sequence of characters, but the D compiler and D functions such as trace provide enhanced capabilities when applied to expressions of type string. For example, the string type removes the ambiguity of type char * when you need to trace the actual bytes of a string.

In the following D statement, if s is of type char *, DTrace traces the value of the pointer s, which means it traces an integer address value:

trace(s);

In the following D statement, by the definition of the * operator, the D compiler dereferences the pointer s and traces the single character at that location:

trace(*s);

These behaviors enable you to manipulate character pointers that refer to either single characters, or to arrays of byte-sized integers that aren't strings and don't end with a null byte.

In the next D statement, if s is of type string, the string type indicates to the D compiler that you want DTrace to trace a null-terminated string of characters whose address is stored in the variable s:

trace(s);

You can also perform lexical comparison of expressions of type string. See String Comparison.

String Constants

String constants are enclosed in pairs of double quotes ("") and are automatically assigned the type string by the D compiler. You can define string constants of any length, limited only by the amount of memory DTrace is permitted to consume on your system and by whatever limit you have set for the strsize DTrace runtime option. The terminating null byte (\0) is added automatically by the D compiler to any string constants that you declare. The size of a string constant object is the number of bytes associated with the string, plus one additional byte for the terminating null byte.

A string constant can't contain a literal newline character. To create strings containing newlines, use the \n escape sequence instead of a literal newline. String constants can also contain any of the special character escape sequences that are defined for character constants.

String Assignment

Unlike the assignment of char * variables, strings are copied by value and not by reference. The string assignment operator = copies the actual bytes of the string from the source operand up to and including the null byte to the variable on the left-hand side, which must be of type string.

You can use a declaration to create a string variable:

string s;

Or you can create a string variable by assigning it an expression of type string.

For example, the D statement:

s = "hello";

creates a variable s of type string and copies the six bytes of the string "hello" into it (five printable characters, plus the null byte).

String assignment is analogous to the C library function strcpy(), with the exception that if the source string exceeds the limit of the storage of the destination string, the resulting string is automatically truncated by a null byte at this limit.

You can also assign to a string variable an expression of a type that's compatible with strings. In this case, the D compiler automatically promotes the source expression to the string type and performs a string assignment. The D compiler permits any expression of type char * or of type char[n], a scalar array of char of any size, to be promoted to a string.
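
The copy-by-value behavior can be seen in a short sketch. The variable names s and t are illustrative. Because the assignment to t copies the bytes of s, changing s afterward doesn't affect t, and the program would be expected to print world hello:

```d
string s;
string t;

BEGIN
{
  s = "hello";
  t = s;          /* copies the bytes of s, including the null byte */
  s = "world";    /* t still holds its own copy of "hello" */
  printf("%s %s\n", s, t);
  exit(0);
}
```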

String Conversion

Expressions of other types can be explicitly converted to type string by using a cast expression or by applying the special stringof operator. The following two forms are equivalent in meaning:

s = (string) expression;

s = stringof (expression);

The expression is interpreted as the address of the string.

The stringof operator binds very tightly to the operand on its right-hand side. You can optionally surround the expression by using parentheses, for clarity.

Scalar type expressions, such as a pointer or integer, or a scalar array address can be converted to strings, in that the scalar is interpreted as the address of a char type. Expressions of other types, such as void, can't be converted to string. If you erroneously convert an invalid address to a string, the DTrace safety features prevent you from damaging the system or DTrace, but you might end up tracing a sequence of undecipherable characters.
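
As a brief sketch, both conversion forms can be applied to the char array members of curpsinfo that were described earlier. The clause-local names this->name and this->args are illustrative, and the script assumes that it's run with the -c or -p option so that $target is defined:

```d
syscall::write:entry
/pid == $target/
{
  this->name = stringof(curpsinfo->pr_fname);   /* stringof operator */
  this->args = (string)curpsinfo->pr_psargs;    /* equivalent cast form */
  printf("%s invoked as: %s\n", this->name, this->args);
}
```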

String Comparison

D overloads the binary relational operators and permits them to be used for string comparisons, as well as integer comparisons. The relational operators perform string comparison whenever both operands are of type string or when one operand is of type string and the other operand can be promoted to type string. See String Assignment for a description of which types can be promoted. Table 3-13 lists the relational operators that can be used to compare strings.

Table 3-13 D Relational Operators for Strings

Operator   Description
<          Left-hand operand is less than right-hand operand.
<=         Left-hand operand is less than or equal to right-hand operand.
>          Left-hand operand is greater than right-hand operand.
>=         Left-hand operand is greater than or equal to right-hand operand.
==         Left-hand operand is equal to right-hand operand.
!=         Left-hand operand is not equal to right-hand operand.

As with integers, each operator evaluates to a value of type int, which is equal to one if the condition is true or zero if it is false.

The relational operators compare the two input strings byte-by-byte, similarly to the C library routine strcmp(). Each byte is compared by using its corresponding integer value in the ASCII character set until a null byte is read or the maximum string length is reached. See the ascii(7) manual page for more information. Some example D string comparisons and their results are shown in the following table.

D string comparison      Result
"coffee" < "espresso"    Returns 1 (true)
"coffee" == "coffee"     Returns 1 (true)
"coffee" >= "mocha"      Returns 0 (false)

Note:

Identical Unicode strings might compare as being different if one or the other of the strings isn't normalized.
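
The comparisons in the preceding table can be checked with a short sketch such as the following, which prints the result of each expression as an integer. Based on the byte-by-byte ASCII ordering described earlier, the output would be expected to be 1 1 0:

```d
BEGIN
{
  printf("%d %d %d\n",
      "coffee" < "espresso",
      "coffee" == "coffee",
      "coffee" >= "mocha");
  exit(0);
}
```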

Aggregations

Aggregations enable you to accumulate data for statistical analysis. The aggregation is calculated at runtime, so that post-processing isn't required and processing is highly efficient and accurate. Aggregations function similarly to associative arrays, but are populated by aggregating functions. In D, the syntax for an aggregation is as follows:

@name[ keys ] = aggfunc( args );

The aggregation name is a D identifier that's prefixed with the special character @. All aggregations that are named in D programs are global variables. Aggregations can't have thread-local or clause-local scope. The aggregation names are kept in an identifier namespace that's separate from other D global variables. If you reuse names, remember that a and @a are not the same variable. The special aggregation name @ can be used to name an anonymous aggregation in D programs. The D compiler treats this name as an alias for the aggregation name @_.

Aggregations can be regular or indexed. Indexed aggregations use keys, where keys are a comma-separated list of D expressions, similar to the tuples of expressions used for associative arrays. Regular aggregations are treated similarly to indexed aggregations, but don't use keys for indexing.

The aggfunc is one of the DTrace aggregating functions, and args is a comma-separated list of arguments appropriate to that function. Most aggregating functions take a single argument that represents the new datum.
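
As a minimal sketch of this syntax, the following clause populates one indexed aggregation and one regular aggregation on every system call entry. The aggregation names @counts and @total are illustrative:

```d
syscall:::entry
{
  @counts[execname] = count();   /* indexed: one count per process name */
  @total = count();              /* regular: a single overall count */
}
```

When tracing ends, for example after you press Ctrl+C, dtrace prints both aggregations by using the default format.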

Aggregation Functions

The following functions are aggregating functions that can be used in a program to collect data and present it in a meaningful way.

  • avg: Stores the arithmetic average of the specified expressions in an aggregation.

  • count: Stores an incremented count value in an aggregation.

  • max: Stores the largest value among the specified expressions in an aggregation.

  • min: Stores the smallest value among the specified expressions in an aggregation.

  • sum: Stores the total value of the specified expression in an aggregation.

  • stddev: Stores the standard deviation of the specified expressions in an aggregation.

  • quantize: Stores a power-of-two frequency distribution of the values of the specified expressions in an aggregation. An optional increment can be specified.

  • lquantize: Stores the linear frequency distribution of the values of the specified expressions, sized by the specified range, in an aggregation.

  • llquantize: Stores the log-linear frequency distribution in an aggregation.
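
The following sketch applies three of these functions to the same datum. It assumes that arg0 in the syscall::read:return probe holds the number of bytes that read() returned; the aggregation names are illustrative:

```d
syscall::read:return
/arg0 > 0/
{
  @avgbytes[execname] = avg(arg0);    /* average bytes per successful read */
  @maxbytes[execname] = max(arg0);    /* largest single read */
  @dist[execname] = quantize(arg0);   /* power-of-two distribution */
}
```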

Printing Aggregations

By default, multiple aggregations are displayed in the order in which they're introduced in the D program. You can override this behavior by using the printa function to print the aggregations. The printa function also lets you precisely format the aggregation data by using a format string.

If an aggregation isn't formatted with a printa statement in a D program, the dtrace command snapshots the aggregation data and prints the results after tracing has completed, using the default aggregation format. If an aggregation is formatted with a printa statement, the default behavior is disabled. You can achieve the same results by adding the printa(@aggregation-name) statement to an END probe clause in a program.

The default output format for the avg, count, min, max, stddev, and sum aggregating functions displays an integer decimal value corresponding to the aggregated value for each tuple. The default output format for the quantize, lquantize, and llquantize aggregating functions displays an ASCII histogram with the results. Aggregation tuples are printed as though trace had been applied to each tuple element.
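
As a sketch of formatted output, the following program counts system calls by process name and function, and then prints the aggregation from the END clause. In a printa format string, each ordinary conversion consumes one tuple element and the %@d conversion prints the aggregated value. The aggregation name @calls and the column widths are illustrative:

```d
syscall:::entry
{
  @calls[execname, probefunc] = count();
}

END
{
  printa("%-16s %-24s %@d\n", @calls);
}
```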

Data Normalization

When aggregating data over some period, you might want to normalize the data based on some constant factor. This technique lets you compare disjoint data more easily. For example, when aggregating system calls, you might want to output system calls as a per-second rate instead of as an absolute value over the course of the run. The DTrace normalize function lets you normalize data in this way. The parameters to normalize are an aggregation and a normalization factor. The output of the aggregation shows each value divided by the normalization factor.
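
The per-second rate described here could be sketched as follows. The global variable start, the aggregation name @rate, and the 10-second run length are assumptions made for this illustration. The run length is fixed with a tick-10s probe so that the normalization factor, the elapsed time in whole seconds, isn't zero:

```d
BEGIN
{
  start = timestamp;
}

syscall:::entry
{
  @rate[execname] = count();
}

tick-10s
{
  exit(0);
}

END
{
  /* Convert elapsed nanoseconds to whole seconds and divide by that. */
  normalize(@rate, (timestamp - start) / 1000000000);
  printa(@rate);
}
```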

Speculation

DTrace includes a speculative tracing facility that can be used to tentatively trace data at one or more probe locations. You can then decide to commit the data to the principal buffer at another probe location. You can use speculation to trace data that only contains the output that's of interest; no extra processing is required and the DTrace overhead is minimized.

Speculation is achieved by:
  • Setting up a temporary speculation buffer
  • Instructing one or more clauses to trace to the speculation buffer
  • Committing the data in the speculation buffer to the primary buffer, or discarding the speculation buffer

You can choose to commit or discard speculation data when certain conditions are met, by using the appropriate functions within a clause. By using speculation, you can trace data for a set of probes until a condition is met and then either dispose of the data if it isn't useful, or keep it.

The following table describes DTrace speculation functions.

Table 3-14 DTrace Speculation Functions

Function      Args   Description
speculation   None   Returns an identifier for a new speculative buffer.
speculate     ID     Denotes that the remainder of the clause must be traced to the speculative buffer specified by ID.
commit        ID     Commits the speculative buffer that's associated with ID.
discard       ID     Discards the speculative buffer that's associated with ID.

Example 3-1 How to use speculation

The following example illustrates how to use speculation. All speculation functions must be used together for speculation to work correctly.

The speculation is created for the syscall::open:entry probe and the ID for the speculation is attached to a thread-local variable. The first argument of the open() system call is traced to the speculation buffer by using the printf function.

Three more clauses are included for the syscall::open:return probe. In the first of these clauses, the errno is traced to the speculative buffer. The predicate for the second of the clauses filters for a non-zero errno value and commits the speculation buffer. The predicate of the third of the clauses filters for a zero errno value and discards the speculation buffer.

The output of the program comes from the primary data buffer, so the program effectively reports the file name and error number only when an open() system call fails. If the call doesn't fail, the information that was traced into the speculation buffer is discarded.

syscall::open:entry
{
  /*
   * The call to speculation() creates a new speculation. If this fails,
   * dtrace will generate an error message indicating the reason for
   * the failed speculation(), but subsequent speculative tracing will be
   * silently discarded.
   */
  self->spec = speculation();
  speculate(self->spec);

  /*
   * Because this printf() follows the speculate(), it is being
   * speculatively traced; it will only appear in the primary data buffer if the
   * speculation is subsequently committed.
   */
  printf("%s", copyinstr(arg0));
}

syscall::open:return
/self->spec/
{
  /*
   * Trace the errno value into the speculation buffer.
   */
  speculate(self->spec);
  trace(errno);
}

syscall::open:return
/self->spec && errno != 0/
{
  /*
   * If errno is non-zero, commit the speculation.
   */
  commit(self->spec);
  self->spec = 0;
}

syscall::open:return
/self->spec && errno == 0/
{
  /*
   * If errno is not set, discard the speculation.
   */
  discard(self->spec);
  self->spec = 0;
}