3 D Program Syntax Reference
This reference describes how to write D programs that can be used with DTrace to enable probes and perform operations.
Program Structure
A D program consists of a set of clauses that describe the probes to enable, an optional predicate that controls when to run, and one or more statements that often describe some functionality to implement when the probe fires. D programs can also contain declarations of variables and definitions of new types. A probe clause declaration uses the following structure:
probe descriptions
/ predicate /
{
statements
}
- Probe Descriptions
Probe descriptions ideally express the full description for a probe and take the form:
provider:module:function:name
The field descriptors are defined as follows:
- provider: The name of the DTrace provider that the probe belongs to.
- module: If the probe corresponds to a specific program location, the name of the kernel module, library, or user-space program in which the probe is found. Some probes might be associated with a module name that isn't tied to a particular source location in cases where they relate to more abstract tracepoints.
- function: If the probe corresponds to a specific program location, the name of the program function in which the probe is found.
- name: The name that provides some idea of the probe's semantic meaning, such as BEGIN or END.
DTrace recognizes a form of shorthand when referencing probes. By convention, if you don't specify all the fields of a probe description, DTrace matches the request against all the probes that have matching values in the fields that you do specify. For example, you can reference the probe name BEGIN in a script to match any probe with the name field BEGIN, regardless of the value of the provider, module, and function fields. For example, you might see a probe referenced as:
BEGIN
If a probe is referenced in a D program and it doesn't use a full probe description, the fields are interpreted based on an order of precedence:
- A single component matches the probe name, expressed as:
name
- Two components match the function and probe name, expressed as:
function:name
- Three components match the module, function, and probe name, expressed as:
module:function:name
Although probes can also be referenced by their ID, this value can change over time. The number of probes on the system doesn't directly correlate to the ID, because new provider modules can be loaded at any time and some providers also offer the ability to create new probes on-the-fly. Avoid using the numerical probe ID to reference a probe.
Probe descriptions also support a pattern-matching syntax similar to the shell globbing pattern matching syntax that's described in the sh(1) manual page. For example, you can use the asterisk symbol (*) to perform a wildcard match, as in the following description:
sdt:::tcp*
If any fields are blank in the probe description, a wildcard match is performed on that field.
Unless you intend to match several probes, specify the full probe description to avoid unpredictable results.
Table 3-1 Probe Name Pattern Matching Characters
Symbol | Description |
---|---|
* | Matches any string, including the null string. |
? | Matches any single character. |
[] | Matches any one of the characters inside the square brackets. A pair of characters separated by - matches any character between the pair, inclusive. If the first character after the [ is !, any character not within the set is matched. |
\ | Interpret the next character as itself, without any special meaning. |
To successfully match and enable a probe, the complete probe description must match on every field. A probe description field that isn't a pattern must exactly match the corresponding field of the probe. Note that a description field that's empty matches any probe.
Several probes can be included in a comma-separated list. When several probes are included in the description, the same predicate and function sequences are applied when each probe is activated.
- Predicates
Predicates are expressions that appear between a pair of slashes (/ /) and that are evaluated at probe firing time to decide whether the associated functions must be processed. Predicates are the primary conditional construct used for building more complex control flow in a D program. You can omit the predicate section of the probe clause entirely for any probe so that the functions are always processed when the probe is activated.
Predicate expressions can use any of the D operators and can include any D data objects such as variables and constants. The predicate expression must evaluate to a value of integer or pointer type so that it can be considered as true or false. As with all D expressions, a zero value is interpreted as false and any non-zero value is interpreted as true.
- Statements
Statements are described by a list of expressions or functions that are separated by semicolons (;) and enclosed within braces ({}). An empty set of braces with no statements included causes the default action to be processed. The default action reports the probe activation.
A program can consist of several probe-clause declarations. Clauses run in program order.
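The pieces described above can be combined as in the following minimal sketch. It assumes that the syscall provider is available and that read, write, and open-style system call probes exist on the system; the process names and the aggregation name calls are purely illustrative.

```dtrace
/* Clause 1: a single component is matched against the probe name field. */
BEGIN
{
    printf("tracing system calls\n");
}

/*
 * Clause 2: two full probe descriptions in a comma-separated list share
 * the same predicate and the same statements.
 */
syscall::read:entry,
syscall::write:entry
/execname != "dtrace"/
{
    @calls[execname, probefunc] = count();
}

/*
 * Clause 3: a wildcard in the function field and an empty set of braces,
 * so the default action reports each matching probe activation.
 */
syscall::open*:entry
/execname == "cat"/
{
}
```

Because the predicate is evaluated each time a probe fires, clause 2 counts only the calls that aren't made by the dtrace utility itself.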
A program can be stored on the file system and can be run by the DTrace utility. You can transform a program into an executable script by prepending the file with an interpreter directive that calls the dtrace command along with any required options, as a single argument, to run the program. See the sh(1) manual page for more information on adding the interpreter line to the beginning of a script. The interpreter directive might look as follows:
#!/usr/sbin/dtrace -qs
A script can also include D pragma directives to set runtime and compiler options. See DTrace Runtime and Compile-time Options Reference for more information on including this information in a script.
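As an illustration, a complete executable script might look like the following sketch. It assumes that the dtrace command is installed at /usr/sbin/dtrace and uses the quiet option as a pragma instead of passing -q on the interpreter line; the message text is arbitrary.

```dtrace
#!/usr/sbin/dtrace -s

/* Suppress the default output so that only printf() output is shown. */
#pragma D option quiet

BEGIN
{
    printf("hello from a D script\n");
    exit(0);
}
```

After the file is marked executable, for example with chmod +x, it can be run directly, typically with root privileges.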
Types, Operators, and Expressions
D provides the ability to access and manipulate various data objects: variables and data structures can be created and changed, data objects that are defined in the OS kernel and user processes can be accessed, and integer, floating-point, and string constants can be declared. D provides a superset of the ANSI C operators that are used to manipulate objects and create complex expressions. This section describes the detailed set of rules for types, operators, and expressions.
Identifier Names and Keywords
D identifier names are composed of uppercase and lowercase letters, digits, and underscores, where the first character must be a letter or underscore. All identifier names beginning with an underscore (_) are reserved for use by the D system libraries. Avoid using these names in D programs. By convention, D programmers typically use mixed-case names for variables and all uppercase names for constants.
D language keywords are special identifiers that are reserved for use in the programming language syntax itself. These names are always specified in lowercase and must not be used for the names of D variables. The following table lists the keywords that are reserved for use by the D language.
Table 3-2 D Keywords
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
D reserves for use as keywords a superset of the ANSI C keywords. The keywords reserved for future use by the D language are marked with “*”. The D compiler produces a syntax error if you try to use a keyword that's reserved for future use. The keywords that are defined by D but not defined by ANSI C are marked with “+”. D provides the complete set of types and operators found in ANSI C. The major difference in D programming is the absence of control-flow constructs. Note that keywords associated with control-flow in ANSI C are reserved for future use in D.
Data Types and Sizes
D provides fundamental data types for integers and floating-point constants. Arithmetic can only be performed on integers in D programs. Floating-point constants can be used to initialize data structures, but floating-point arithmetic isn't permitted in D. D provides a 64-bit data model for use in writing programs.
The names of the integer types and their sizes in the 64-bit data model are shown in the following table. Integers are always represented in two's-complement form in the native byte-encoding order of a system.
Table 3-3 D Integer Data Types
Type Name | 64-bit Size |
---|---|
char | 1 byte |
short | 2 bytes |
int | 4 bytes |
long | 8 bytes |
long long | 8 bytes |
Integer types, including char, can be prefixed with the signed or unsigned qualifier. Integers are implicitly signed unless the unsigned qualifier is specified. The D compiler also provides the type aliases that are listed in the following table.
Table 3-4 D Integer Type Aliases
Type Name | Description |
---|---|
int8_t | 1-byte signed integer |
int16_t | 2-byte signed integer |
int32_t | 4-byte signed integer |
int64_t | 8-byte signed integer |
intptr_t | Signed integer of size equal to a pointer |
uint8_t | 1-byte unsigned integer |
uint16_t | 2-byte unsigned integer |
uint32_t | 4-byte unsigned integer |
uint64_t | 8-byte unsigned integer |
uintptr_t | Unsigned integer of size equal to a pointer |
These type aliases are equivalent to using the name of the corresponding base type listed in the previous table and are appropriately defined for each data model. For example, the uint8_t type name is an alias for the type unsigned char.
Note:
The predefined type aliases can't be used in files that are included by the preprocessor.
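One way to check these sizes on a particular system is with the sizeof operator, as in the following sketch. It only prints the sizes and makes no assumption about the results beyond the tables above.

```dtrace
BEGIN
{
    /* Base integer types. */
    printf("char=%d short=%d int=%d long=%d long long=%d\n",
        sizeof (char), sizeof (short), sizeof (int),
        sizeof (long), sizeof (long long));

    /* A few of the predefined type aliases. */
    printf("uint8_t=%d int64_t=%d uintptr_t=%d\n",
        sizeof (uint8_t), sizeof (int64_t), sizeof (uintptr_t));

    exit(0);
}
```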
D provides floating-point types for compatibility with ANSI C declarations and types. Floating-point operators aren't available in D, but floating-point data objects can be traced and formatted with the printf function. You can use the floating-point types that are listed in the following table.
Table 3-5 D Floating-Point Data Types
Type Name | 64-bit Size |
---|---|
float | 4 bytes |
double | 8 bytes |
long double | 16 bytes |
D also provides the special type string to represent ASCII strings. Strings are discussed in more detail in DTrace String Processing.
Constants
Integer constants can be written in decimal (12345), octal (012345), or hexadecimal (0x12345) format. Octal (base 8) constants must be prefixed with a leading zero. Hexadecimal (base 16) constants must be prefixed with either 0x or 0X. Integer constants are assigned the smallest type among int, long, and long long that can represent their value. If the value is negative, the signed version of the type is used. If the value is positive and too large to fit in the signed type representation, the unsigned type representation is used. You can apply one of the suffixes listed in the following table to any integer constant to explicitly specify its D type.
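Suffix | D type |
---|---|
u or U | unsigned version of the type selected by the compiler |
l or L | long |
ul or UL | unsigned long |
ll or LL | long long |
ull or ULL | unsigned long long |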
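As a rough illustration of the integer constant formats, the following sketch prints the same digits written in decimal, octal, and hexadecimal, and uses sizeof to show the type that a constant is assigned. The LL suffix is assumed here to select long long, as it does in ANSI C.

```dtrace
BEGIN
{
    /* Decimal 12345, octal 012345 (decimal 5349), hexadecimal 0x12345 (decimal 74565). */
    printf("%d %d %d\n", 12345, 012345, 0x12345);

    /* A small constant fits in an int; one that needs more than 32 bits does not. */
    printf("%d %d\n", sizeof (1), sizeof (0x100000000));

    /* An explicit suffix overrides the type that would otherwise be chosen. */
    printf("%d\n", sizeof (1LL));

    exit(0);
}
```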
Floating-point constants are always written in decimal format and must contain either a decimal point (12.345), an exponent (123e45), or both (123.34e-5). Floating-point constants are assigned the type double by default. You can apply one of the suffixes listed in the following table to any floating-point constant to explicitly specify its D type.
Suffix | D type |
---|---|
f or F | float |
l or L | long double |
Character constants are written as a single character or escape sequence that's inside a pair of single quotes ('a'). Character constants are assigned the int type rather than char and are equivalent to an integer constant with a value that's determined by that character's value in the ASCII character set. See the ascii(7) manual page for a list of characters and their values. You can also use any of the special escape sequences that are listed in the following table. D uses the same escape sequences as those found in ANSI C.
Table 3-6 Character Escape Sequences
Escape Sequence | Represents | Escape Sequence | Represents |
---|---|---|---|
\a | alert | \\ | backslash |
\b | backspace | \? | question mark |
\f | form feed | \' | single quote |
\n | newline | \" | double quote |
\r | carriage return | \0oo | octal value 0oo |
\t | horizontal tab | \xhh | hexadecimal value 0xhh |
\v | vertical tab | \0 | null character |
You can include more than one character specifier inside single quotes to create integers with individual bytes that are initialized according to the corresponding character specifiers. The bytes are read left-to-right from a character constant and assigned to the resulting integer in the order corresponding to the native endianness of the operating environment. Up to eight character specifiers can be included in a single character constant.
String constants of any length can be composed by enclosing them in a pair of double quotes ("hello"). A string constant can't contain a literal newline character. To create strings containing newlines, use the \n escape sequence instead of a literal newline. String constants can contain any of the special character escape sequences that are shown earlier for character constants. Similar to ANSI C, strings are represented as arrays of characters that end with a null character (\0) that's implicitly added to each string constant you declare. String constants are assigned the special D type string. The D compiler provides a set of special features for comparing and tracing character arrays that are declared as strings.
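The following sketch shows character and string constants in use. The variable name greeting is illustrative, and strlen is assumed to be available as a built-in D function.

```dtrace
BEGIN
{
    /* A character constant has type int, so it can be printed as a character or as its ASCII value. */
    printf("%c has the value %d\n", 'a', 'a');

    /* The \n escape sequence stands in for a literal newline inside a string constant. */
    printf("first line\nsecond line\n");

    /* A string constant has the D type string. */
    greeting = "hello";
    printf("%s is %d characters long\n", greeting, strlen(greeting));

    exit(0);
}
```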
Arithmetic Operators
Binary arithmetic operators are described in the following table. These operators all have the same meaning for integers that they do in ANSI C.
Table 3-7 Binary Arithmetic Operators
Operator | Description |
---|---|
+ | Integer addition |
- | Integer subtraction |
* | Integer multiplication |
/ | Integer division |
% | Integer modulus |
Arithmetic in D can only be performed on integer operands or on pointers. Arithmetic can't be performed on floating-point operands in D programs. The DTrace execution environment doesn't take any action on integer overflow or underflow. You must check for these conditions in situations where overflow and underflow can occur.
However, the DTrace execution environment does automatically check for and report division by zero errors resulting from improper use of the / and % operators. If a D program contains an invalid division operation that's detectable at compile time, a compile error is returned and the compilation fails. If the invalid division operation takes place at run time, processing of the current clause stops, and the ERROR probe is activated. If the D program has no clause for the ERROR probe, the error is printed and tracing continues. Otherwise, the actions in the clause assigned to the ERROR probe are processed. Errors that are detected by DTrace have no effect on other DTrace users or on the OS kernel. You therefore don't need to be concerned about causing any damage if a D program inadvertently contains one of these errors.
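A run-time division by zero can be provoked deliberately to see this behavior, as in the following sketch. The variable names are illustrative, and the use of arg4 to report the fault type is an assumption based on the dtrace provider's ERROR probe arguments rather than something stated in this section.

```dtrace
BEGIN
{
    divisor = 0;

    /* The divisor is a variable, so the error can only be detected at run time. */
    result = 100 / divisor;

    /* Not reached: processing of this clause stops at the failed division. */
    printf("result is %d\n", result);
}

ERROR
{
    /* Assumption: arg4 holds the fault type for the error. */
    printf("ERROR probe fired, fault type %d\n", arg4);
    exit(1);
}
```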
In addition to these binary operators, the + and - operators can also be used as unary operators, and these operators have higher precedence than any of the binary arithmetic operators. The order of precedence and associativity properties for all D operators is presented in Operator Precedence. You can control precedence by grouping expressions in parentheses (()).
Relational Operators
Binary relational operators are described in the following table. These operators all have the same meaning that they do in ANSI C.
Table 3-8 D Relational Operators
Operator | Description |
---|---|
< | Left-hand operand is less than right-hand operand |
<= | Left-hand operand is less than or equal to right-hand operand |
> | Left-hand operand is greater than right-hand operand |
>= | Left-hand operand is greater than or equal to right-hand operand |
== | Left-hand operand is equal to right-hand operand |
!= | Left-hand operand isn't equal to right-hand operand |
Relational operators are most often used to write D predicates. Each operator evaluates to a value of type int, which is equal to one if the condition is true, or zero if it's false.
Relational operators can be applied to pairs of integers, pointers, or strings. If pointers are compared, the result is equivalent to an integer comparison of the two pointers interpreted as unsigned integers. If strings are compared, the result is determined as if by performing a strcmp() on the two operands. The following table shows some example D string comparisons and their results.
D string comparison | Result |
---|---|
"coffee" < "espresso" | Returns 1 (true) |
"coffee" == "coffee" | Returns 1 (true) |
"coffee" >= "mocha" | Returns 0 (false) |
Relational operators can also be used to compare a data object associated with an enumeration type with any of the enumerator tags defined by the enumeration.
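In practice, relational operators appear most often in predicates. The following sketch, which assumes that the syscall provider's write probe is present and uses an arbitrary process name and size threshold, combines a string comparison with an integer comparison.

```dtrace
syscall::write:entry
/execname == "bash" && arg2 >= 512/
{
    /* arg2 is the byte count that was passed to write(). */
    printf("%s wrote %d bytes\n", execname, arg2);
}
```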
Logical Operators
Binary logical operators are listed in the following table. The first two operators are equivalent to the corresponding ANSI C operators.
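Table 3-9 D Logical Operators
Operator | Description |
---|---|
&& | Logical AND: true if both operands are true |
\|\| | Logical OR: true if one or both operands are true |
^^ | Logical XOR: true if exactly one operand is true |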
Logical operators are most often used in writing D predicates. The logical AND operator performs the following short-circuit evaluation: if the left-hand operand is false, the right-hand expression isn't evaluated. The logical OR operator also performs the following short-circuit evaluation: if the left-hand operand is true, the right-hand expression isn't evaluated. The logical XOR operator doesn't short-circuit. Both expression operands are always evaluated.
In addition to the binary logical operators, the unary ! operator can be used to perform a logical negation of a single operand: it converts a zero operand into a one and a non-zero operand into a zero. By convention, D programmers use ! when working with integers that are meant to represent Boolean values and == 0 when working with non-Boolean integers, although the expressions are equivalent.
The logical operators can be applied to operands of integer or pointer types. The logical operators interpret pointer operands as unsigned integer values. As with all logical and relational operators in D, operands are true if they have a non-zero integer value and false if they have a zero integer value.
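The following sketch evaluates each logical operator on one true and one false operand. The variable names a and b are illustrative.

```dtrace
BEGIN
{
    a = 1;    /* true */
    b = 0;    /* false */

    /* AND is 0, OR is 1, XOR is 1, and the negation of a is 0. */
    printf("%d %d %d %d\n", a && b, a || b, a ^^ b, !a);

    exit(0);
}
```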
Bitwise Operators
D provides the bitwise operators that are listed in the following table for manipulating individual bits inside integer operands. These operators all have the same meaning as in ANSI C.
Table 3-10 D Bitwise Operators
Operator | Description |
---|---|
~ | Unary operator that can be used to perform a bitwise negation of a single operand: it converts each zero bit in the operand into a one bit, and each one bit in the operand into a zero bit |
& | Bitwise AND |
\| | Bitwise OR |
^ | Bitwise XOR |
<< | Shift the left-hand operand left by the number of bits specified by the right-hand operand |
>> | Shift the left-hand operand right by the number of bits specified by the right-hand operand |
The shift operators are used to move bits left or right in a particular integer operand. Shifting left fills empty bit positions on the right-hand side of the result with zeroes. Shifting right using an unsigned integer operand fills empty bit positions on the left-hand side of the result with zeroes. Shifting right using a signed integer operand fills empty bit positions on the left-hand side with the value of the sign bit, also known as an arithmetic shift operation.
Shifting an integer value by a negative number of bits or by a number of bits larger than the number of bits in the left-hand operand itself produces an undefined result. The D compiler produces an error message if the compiler can detect this condition when you compile the D program.
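The following sketch exercises the bitwise and shift operators on small constants so that the results are easy to verify by hand. The variable names are illustrative.

```dtrace
BEGIN
{
    flags = 0xf0;

    printf("%x\n", flags & 0x30);    /* AND keeps the common bits: 0x30 */
    printf("%x\n", flags | 0x0f);    /* OR sets the low four bits: 0xff */
    printf("%x\n", flags ^ 0xff);    /* XOR flips the low eight bits: 0x0f */
    printf("%x\n", flags >> 4);      /* right shift of a positive value: 0x0f */
    printf("%x\n", 1 << 3);          /* left shift fills with zeroes: 0x08 */

    /* A signed operand is shifted arithmetically, so the sign is kept: -16 >> 2 is -4. */
    negative = -16;
    printf("%d\n", negative >> 2);

    exit(0);
}
```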
Assignment Operators
Binary assignment operators are listed in the following table. You can only modify D variables and arrays. Kernel data objects and constants can't be modified by using the D assignment operators. The assignment operators have the same meaning as they do in ANSI C.
Table 3-11 D Assignment Operators
Operator | Description |
---|---|
= | Set the left-hand operand equal to the right-hand expression value. |
+= | Increment the left-hand operand by the right-hand expression value. |
-= | Decrement the left-hand operand by the right-hand expression value. |
*= | Multiply the left-hand operand by the right-hand expression value. |
/= | Divide the left-hand operand by the right-hand expression value. |
%= | Modulo the left-hand operand by the right-hand expression value. |
\|= | Bitwise OR the left-hand operand with the right-hand expression value. |
&= | Bitwise AND the left-hand operand with the right-hand expression value. |
^= | Bitwise XOR the left-hand operand with the right-hand expression value. |
<<= | Shift the left-hand operand left by the number of bits specified by the right-hand expression value. |
>>= | Shift the left-hand operand right by the number of bits specified by the right-hand expression value. |
Aside from the assignment operator =, the other assignment operators are provided as shorthand for using the = operator with one of the other operators that were described earlier. For example, the expression x = x + 1 is equivalent to the expression x += 1. These assignment operators adhere to the same rules for operand types as the binary forms described earlier.
The result of any assignment operator is an expression equal to the new value of the left-hand expression. You can use the assignment operators or any of the operators described thus far in combination to form expressions of arbitrary complexity. You can use parentheses () to group terms in complex expressions.
Increment and Decrement Operators
D provides the special unary ++ and -- operators for incrementing and decrementing pointers and integers. These operators have the same meaning as they do in ANSI C. These operators can be applied to variables and to the individual elements of a struct, union, or array. The operators can be applied either before or after the variable name. If the operator appears before the variable name, the variable is first changed and then the resulting expression is equal to the new value of the variable. For example, the following two code fragments produce identical results:
x += 1; y = x;
y = ++x;
If the operator appears after the variable name, the variable is changed after its current value is returned for use in the expression. For example, the following two code fragments produce identical results:
y = x; x -= 1;
y = x--;
You can use the increment and decrement operators to create new variables without declaring them. If a variable declaration is omitted and the increment or decrement operator is applied to a variable, the variable is implicitly declared to be of type int64_t.
To use the increment and decrement operators on elements of an array or struct, place the operator after or before the full reference to the element:
int foo[5];
struct { int a; } bar;
bar.a++;
foo[1]++;
--foo[1];
The increment and decrement operators can be applied to integer or pointer variables. When applied to integer variables, the operators increment or decrement the corresponding value by one. When applied to pointer variables, the operators increment or decrement the pointer address by the size of the data type that's referenced by the pointer.
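As a small sketch of implicit declaration, the following program counts read system calls in a variable that's never declared explicitly. The variable name reads is illustrative, and the syscall provider is assumed to be available.

```dtrace
syscall::read:entry
{
    /* reads isn't declared anywhere, so it's implicitly an int64_t global variable. */
    reads++;
}

END
{
    printf("read was entered %d times\n", reads);
}
```

A global counter like this is convenient for a quick test. For counts gathered on a busy multiprocessor system, an aggregation such as count() is usually the more reliable choice.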
Conditional Expressions
D doesn't provide the facility to use if-then-else constructs. Instead, conditional expressions, by using the ternary operator (?:), can be used to approximate some of this functionality. The ternary operator associates a triplet of expressions, where the first expression is used to conditionally evaluate one of the other two.
For example, the following D statement could be used to set a variable x to one of two strings, depending on the value of i:
x = i == 0 ? "zero" : "non-zero";
In the previous example, the expression i == 0 is first evaluated to determine whether it's true or false. If the expression is true, the second expression is evaluated and its value is returned. If the expression is false, the third expression is evaluated and its value is returned.
As with any D operator, you can use several ?: operators in a single expression to create more complex expressions. For example, the following expression would take a char variable c containing one of the characters 0-9, a-f, or A-F, and return the value of this character when interpreted as a digit in a hexadecimal (base 16) integer:
hexval = (c >= '0' && c <= '9') ? c - '0' : (c >= 'a' && c <= 'f') ? c + 10 - 'a' : c + 10 - 'A';
To be evaluated for its truth value, the first expression that's used with ?: must be a pointer or integer. The second and third expressions can be of any compatible types. You can't construct a conditional expression where, for example, one path returns a string and another path returns an integer. The second and third expressions must be true expressions that have a value. Therefore, data reporting functions can't be used in these expressions because those functions don't return a value. To conditionally trace data, use a predicate instead.
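The difference between choosing a value and choosing whether to trace can be sketched as follows. The example assumes the syscall provider's read probe and an arbitrary 4096-byte threshold; the aggregation name requests is illustrative.

```dtrace
/* The ternary operator selects a value: both results are strings, so the types are compatible. */
syscall::read:entry
{
    @requests[arg2 > 4096 ? "large" : "small"] = count();
}

/* A predicate decides whether the clause runs at all, which is how data is traced conditionally. */
syscall::read:entry
/arg2 > 4096/
{
    trace(arg2);
}
```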
Type Conversions
When expressions are constructed by using operands of different but compatible types, type conversions are performed to determine the type of the resulting expression. The D rules for type conversions are the same as the arithmetic conversion rules for integers in ANSI C. These rules are sometimes referred to as the usual arithmetic conversions.
Each integer type is ranked in the order char, short, int, long, long long, with each unsigned type assigned a rank higher than its signed equivalent, but below the next integer type. When you construct an expression using two integer operands such as x + y and the operands are of different integer types, the operand type with the highest rank is used as the result type.
If a conversion is required, the operand with the lower rank is first promoted to the type of the higher rank. Promotion doesn't change the value of the operand: it only extends the value to a larger container according to its sign. If an unsigned operand is promoted, the unused high-order bits of the resulting integer are filled with zeroes. If a signed operand is promoted, the unused high-order bits are filled by performing sign extension. If a signed type is converted to an unsigned type, the signed type is first sign-extended and then assigned the new, unsigned type that's determined by the conversion.
Integers and other types can also be explicitly cast from one type to another. Pointers and integers can be cast to any integer or pointer types, but not to other types.
An integer or pointer cast is formed using an expression such as the following:
y = (int)x;
In this example, the destination type is within parentheses and used to prefix the source expression. Integers are cast to types of higher rank by performing promotion. Integers are cast to types of lower rank by zeroing the excess high-order bits of the integer.
Because D doesn't include floating-point arithmetic, no floating-point operand conversion or casting is permitted and no rules for implicit floating-point conversion are defined.
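The casting rules can be sketched with a few explicit casts. The variable names are illustrative, and the values in the comments follow from the promotion and truncation rules described above.

```dtrace
BEGIN
{
    wide = 0x12345678;           /* an int */

    /* Cast to a lower rank: the excess high-order bits are dropped, leaving 0x5678. */
    narrow = (short)wide;

    /* Promotion of a signed value sign-extends it, so the value stays -1. */
    neg8 = (int8_t)-1;

    /* Promotion of an unsigned value fills the high-order bits with zeroes, so the value stays 255. */
    pos8 = (uint8_t)0xff;

    printf("%x %d %d\n", narrow, (int64_t)neg8, (int64_t)pos8);
    exit(0);
}
```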
Operator Precedence
D includes complex rules for operator precedence and associativity. The rules provide precise compatibility with the ANSI C operator precedence rules. The entries in the following table are in order from highest precedence to lowest precedence.
Table 3-12 D Operator Precedence and Associativity
Operators | Associativity |
---|---|
() [] -> . | Left to right |
! ~ ++ -- + - * & (type) sizeof stringof offsetof xlate | Right to left (Note that these are the unary operators) |
* / % | Left to right |
+ - | Left to right |
<< >> | Left to right |
< <= > >= | Left to right |
== != | Left to right |
& | Left to right |
^ | Left to right |
\| | Left to right |
&& | Left to right |
^^ | Left to right |
\|\| | Left to right |
?: | Right to left |
= += -= *= /= %= &= ^= \|= <<= >>= | Right to left |
, | Left to right |
The comma (,) operator that's listed in the table is for compatibility with the ANSI C comma operator. It can be used to evaluate a set of expressions in left-to-right order and return the value of the rightmost expression. This operator is provided for compatibility with C and usage isn't recommended.
The () entry listed in the table of operator precedence represents a function call. A comma is also used in D to list arguments to functions and to form lists of associative array keys. Note that this comma isn't the same as the comma operator and doesn't guarantee left-to-right evaluation. The D compiler provides no guarantee regarding the order of evaluation of arguments to a function or keys to an associative array. Be careful of using expressions with interacting side-effects, such as the pair of expressions i and i++, in these contexts.
The [] entry listed in the table of operator precedence represents an array or associative array reference. Note that aggregations are also treated as associative arrays. The [] operator can also be used to index into fixed-size C arrays.
The following table provides further explanation for the function of several miscellaneous operators that are provided by the D language.
Operators | Description |
---|---|
sizeof | Computes the size of an object. |
offsetof | Computes the offset of a type member. |
stringof | Converts the operand to a string. |
xlate | Translates a data type. |
unary & | Computes the address of an object. |
unary * | Dereferences a pointer to an object. |
-> and . | Accesses a member of a structure or union type. |
Type and Constant Definitions
This section describes how to declare type aliases and named constants in D. It also discusses D type and namespace management for program and OS types and identifiers.
typedefs
The typedef keyword is used to declare an identifier as an alias for an existing type. The typedef declaration is used outside of probe clauses in the following form:
typedef existing-type new-type ;
where existing-type is any type declaration and new-type is an identifier to be used as the alias for this type. For example, the D compiler uses the following declaration internally to create the uint8_t type alias:
typedef unsigned char uint8_t;
You can use type aliases anywhere that a normal type can be used, such as the type of a variable or associative array value or tuple member. You can also combine typedef with more elaborate declarations such as the definition of a new struct, as shown in the following example:
typedef struct foo {
int x;
int y;
} foo_t;
In the previous example, struct foo is defined as the same type as its alias, foo_t. Linux C system headers often use the suffix _t to denote a typedef alias.
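Once defined, the alias can be used like any other type. The following sketch declares a global variable of type foo_t and sets its members in a clause. The variable name point is illustrative.

```dtrace
typedef struct foo {
    int x;
    int y;
} foo_t;

/* A global variable declared with the alias instead of "struct foo". */
foo_t point;

BEGIN
{
    point.x = 3;
    point.y = 4;
    printf("x=%d y=%d\n", point.x, point.y);
    exit(0);
}
```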
Enumerations
Defining symbolic names for constants in a program eases readability and simplifies the process of maintaining the program in the future. One method is to define an enumeration, which associates a set of integers with a set of identifiers called enumerators that the compiler recognizes and replaces with the corresponding integer value. An enumeration is defined by using a declaration such as the following:
enum colors {
RED,
GREEN,
BLUE
};
The first enumerator in the enumeration, RED, is assigned the value zero and each subsequent identifier is assigned the next integer value.
You can also specify an explicit integer value for any enumerator by suffixing it with an equal sign and an integer constant, as shown in the following example:
enum colors {
RED = 7,
GREEN = 9,
BLUE
};
The enumerator BLUE is assigned the value 10 by the compiler because it has no value specified and the previous enumerator is set to 9. When an enumeration is defined, the enumerators can be used anywhere in a D program that an integer constant is used. In addition, the enumeration enum colors is also defined as a type that's equivalent to an int.
The D compiler permits a variable of enum type to be used anywhere an int can be used and permits any integer value to be assigned to a variable of enum type. You can also omit the enum name in the declaration, if the type name isn't needed.
Enumerators are visible in all the following clauses and declarations in a program. Therefore, you can't define the same enumerator identifier in more than one enumeration. However, you can define more than one enumerator with the same value in either the same or different enumerations. You can also assign integers that have no corresponding enumerator to a variable of the enumeration type.
The D enumeration syntax is the same as the corresponding syntax in ANSI C. D also provides access to enumerations that are defined in the OS kernel and its loadable modules. Note that these enumerators aren't globally visible in a D program. Kernel enumerators are only visible if you specify one as an argument in a comparison with an object of the corresponding enumeration type. This feature protects D programs against inadvertent identifier name conflicts with the large collection of enumerations that are defined in the OS kernel.
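The enumerator values described earlier can be confirmed with a short program, sketched here with the same enum colors declaration. The variable name favorite is illustrative.

```dtrace
enum colors {
    RED = 7,
    GREEN = 9,
    BLUE            /* no explicit value: one more than GREEN, so 10 */
};

/* A variable of the enumeration type can hold any integer value. */
enum colors favorite;

BEGIN
{
    favorite = BLUE;
    printf("RED=%d GREEN=%d BLUE=%d favorite=%d\n", RED, GREEN, BLUE, favorite);
    exit(0);
}
```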
Inlines
D named constants can also be defined by using inline directives, which provide a more general means of creating identifiers that are replaced by predefined values or expressions during compilation. Inline directives are a more powerful form of lexical replacement than the #define directive provided by the C preprocessor because the replacement is assigned an actual type and is performed by using the compiled syntax tree and not a set of lexical tokens. An inline directive is specified by using a declaration of the following form:
inline type name = expression;
where type is a type declaration of an existing type, name is any valid D identifier that isn't previously defined as an inline or global variable, and expression is any valid D expression. After the inline directive is processed, the D compiler substitutes the compiled form of expression for each subsequent instance of name in the program source.
For example, the following D program would trace the string "hello" and integer value 123:
inline string hello = "hello";
inline int number = 100 + 23;
BEGIN
{
trace(hello);
trace(number);
}
An inline name can be used anywhere a global variable of the corresponding type is used. If the inline expression can be evaluated to an integer or string constant at compile time, then the inline name can also be used in contexts that require constant expressions, such as scalar array dimensions.
The inline expression is validated for syntax errors as part of evaluating the directive. The expression result type must be compatible with the type that's defined by the inline, according to the same rules used for the D assignment operator (=). An inline expression can't reference the inline identifier itself: recursive definitions aren't permitted.
The DTrace software packages install several D source files in the system directory /usr/lib64/dtrace/installed-version, which contain inline directives that you can use in D programs.
For example, the signal.d library includes directives of the following form:
inline int SIGHUP = 1;
inline int SIGINT = 2;
inline int SIGQUIT = 3;
...
These inline definitions provide you with access to the current set of Oracle Linux signal names, as described in the sigaction(2) manual page. Similarly, the errno.d library contains inline directives for the C errno constants that are described in the errno(3) manual page.
By default, the D compiler includes all of the provided D library files automatically so that you can use these definitions in any D program.
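Because the library files are included automatically, a program can refer to these names without declaring them. The following sketch assumes only the three signal.d definitions quoted above and prints their values.

```d
BEGIN
{
	/* SIGHUP, SIGINT, and SIGQUIT come from the signal.d library inlines. */
	printf("SIGHUP=%d SIGINT=%d SIGQUIT=%d\n", SIGHUP, SIGINT, SIGQUIT);
	exit(0);
}
```

The same names can be used in predicates or comparisons wherever an int value is expected.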
Type Namespaces
In traditional languages such as ANSI C, type visibility is determined by whether a type is nested inside a function or other declaration. Types declared at the outer scope of a C program are associated with a single global namespace and are visible throughout the entire program. Types that are defined in C header files are typically included in this outer scope. Unlike these languages, D provides access to types from several outer scopes.
D is a language that provides dynamic observability across different layers of a software
stack, including the OS kernel, an associated set of loadable kernel modules, and user
processes that are running on the system. A single D program can instantiate probes to
gather data from several kernel modules or other software entities that are compiled into
independent binary objects. Therefore, more than one data type of the same name, sometimes
with different definitions, might be present in the universe of types that are available to
DTrace and the D compiler. To manage this situation, the D compiler associates each type
with a namespace, which is identified by the containing program object. Types from a
particular kernel level object, such as the main kernel or a kernel module, can be accessed
by specifying the object name and the back quote (`) scoping operator in any type name.
For a kernel module named foo that contains the following C type declaration:
typedef struct bar {
int x;
} bar_t;
The types struct bar and bar_t could be accessed from D using the following type names:
struct foo`bar
foo`bar_t
For example, the kernel includes a task_struct that's described in include/linux/sched.h. The definition of this struct depends on kernel configuration at build. You can find out information about the struct, such as its size, by referencing it as follows:
sizeof(struct vmlinux`task_struct)
The back quote operator can be used in any context where a type name is appropriate, including when specifying the type for D variable declarations or cast expressions in D probe clauses.
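A minimal sketch of a complete program built around that expression follows. It assumes that type information for the core kernel is available to DTrace on the system; the message text is illustrative.

```d
BEGIN
{
	/* The namespace prefix selects the type as defined by the core kernel. */
	printf("struct task_struct occupies %d bytes\n",
	    sizeof(struct vmlinux`task_struct));
	exit(0);
}
```

The reported size varies between kernels because the structure definition depends on the build configuration.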
The D compiler also provides two special, built-in type namespaces that use the names C and D. The C type namespace is initially populated with the standard ANSI C intrinsic types, such as int. In addition, type definitions that are acquired by using the C preprocessor (cpp), by running the dtrace -C command, are processed by, and added to, the C scope. So, you can include C header files containing type declarations that are already visible in another type namespace without causing a compilation error.
The D type namespace is initially populated with the D type intrinsics, such as int and string, and the built-in D type aliases, such as uint64_t. Any new type declarations that appear in the D program source are automatically added to the D type namespace. If you create a complex type such as a struct in a D program consisting of member types from other namespaces, the member types are copied into the D namespace by the declaration.
When the D compiler encounters a type declaration that doesn't specify an explicit namespace using the back quote operator, the compiler searches the set of active type namespaces to find a match by using the specified type name. The C namespace is always searched first, followed by the D namespace. If the type name isn't found in either the C or D namespace, the type namespaces of the active kernel modules are searched in load address order, which doesn't guarantee any ordering properties among the loadable modules. To avoid type name conflicts with other kernel modules, use the scoping operator when accessing types that are defined in loadable kernel modules.
The D compiler uses the compressed ANSI C debugging information that's provided with the core Linux kernel modules to access the types that are associated with the OS source code, without the need to access the corresponding C include files. Note that this symbolic debugging information might not be available for all kernel modules on the system. The D compiler reports an error if you try to access a type within the namespace of a module that lacks the compressed C debugging information that's intended for use with DTrace.
Variables
D provides several variable types: scalar variables, associative arrays, scalar arrays, and multidimensional scalar arrays. Variables can be created by declaring them explicitly, but are most often created implicitly on first use. Variables can be restricted to clause or thread scope to avoid name conflicts and to control the lifetime of a variable explicitly.
- Scalar Variables
-
Scalar variables are used to represent individual, fixed-size data objects, such as integers and pointers. Scalar variables can also be used for fixed-size objects that are composed of one or more primitive or composite types. D provides the ability to create arrays of objects and composite structures. DTrace also represents strings as fixed-size scalars by permitting them to grow to a predefined maximum length.
To create a scalar variable, you can write an assignment expression of the following form:
name = expression ;
where name is any valid D identifier and expression is any value or expression that the variable contains.
DTrace includes several built-in scalar variables that can be referenced within D programs. The values of these variables are automatically populated by DTrace. See DTrace Built-in Variable Reference for a complete list of these variables.
- Associative Arrays
-
Associative arrays are used to represent collections of data elements that can be retrieved by specifying a key. Associative arrays differ from normal, fixed-size arrays in that they have no predefined limit on the number of elements and can use any expression as a key. Furthermore, elements in an associative array aren't stored in consecutive storage locations.
To create an associative array, you can write an assignment expression of the following form:
name [ key ] = expression ;
where name is any valid D identifier, key is a comma-separated list of one or more expressions, often as string values, and expression is the value that's contained by the array for the specified key.
The type of each object that's contained in the array is also fixed for all elements in the array. You can use any of the assignment operators that are defined in Types, Operators, and Expressions to change associative array elements, subject to the operand rules defined for each operator. The D compiler produces an appropriate error message if you try an incompatible assignment. You can use any type with an associative array key or value that can be used with a scalar variable.
You can reference values in an associative array by specifying the array name and the appropriate key.
You can remove the elements of an associative array by assigning 0 to them. When you remove the elements in the array, the storage that's used for that element is deallocated and made available to the system for use.
- Scalar Arrays
-
Scalar arrays are a fixed-length group of consecutive memory locations that each store a value of the same type. Scalar arrays are accessed by referring to each location with an integer, starting from zero. Scalar arrays aren't used as often in D as associative arrays.
A D scalar array of 5 integers is declared by using the type int and suffixing the declaration with the number of elements in square brackets, for example:
int s[5];
The D expression s[0] refers to the first array element, s[1] refers to the second, and so on. DTrace performs bounds checking on the indexes of scalar arrays at compile time to help catch bad index references early.
Note:
Scalar arrays and associative arrays are syntactically similar. You can declare an associative array of integers referenced by an integer key as follows:
int a[int];
You can also reference this array using the expression a[0], but from a storage and implementation perspective, the two arrays are different. The scalar array s consists of five consecutive memory locations numbered from zero, and the index refers to an offset in the storage that's allocated for the array. However, the associative array a has no predefined size and doesn't store elements in consecutive memory locations. In addition, associative array keys have no relationship to the corresponding value storage location. You can access associative array elements a[0] and a[-5] and only two words of storage are allocated by DTrace. Furthermore, these elements don't have to be consecutive. Associative array keys are abstract names for the corresponding values and have no relationship to the value storage locations.
If you create an array using an initial assignment and use a single integer expression as the array index, for example, a[0] = 2, the D compiler always creates a new associative array, even though in this expression a could also be interpreted as an assignment to a scalar array. Scalar arrays must be predeclared in this situation so that the D compiler can recognize the definition of the array size and infer that the array is a scalar array.
- Multidimensional Scalar Arrays
-
Multidimensional scalar arrays are used infrequently in D, but are provided for compatibility with ANSI C and for observing and accessing OS data structures that are created by using this capability in C. A multidimensional array is declared as a consecutive series of scalar array sizes within square brackets [] following the base type. For example, to declare a fixed-size, two-dimensional array of integers that's 12 rows by 34 columns, you would write the following declaration:
int s[12][34];
A multidimensional scalar array is accessed by using similar notation. For example, to access the value stored at row 0 and column 1, you would write the D expression as follows:
s[0][1]
Storage locations for multidimensional scalar array values are computed by multiplying the row number by the total number of columns declared and then adding the column number. For example, with the declaration above, s[2][5] is located 2 x 34 + 5 = 73 elements from the start of the array.
Be careful not to confuse the multidimensional array syntax with the D syntax for associative array accesses (that is, s[0][1] isn't the same as s[0,1]). If you use an incompatible key expression with an associative array or try an associative array access of a scalar array, the D compiler reports an appropriate error message and refuses to compile the program.
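The contrast between scalar arrays and associative arrays described in this list can be sketched in one short program. The names s and a follow the text; the stored values and the printf format are illustrative assumptions.

```d
int s[5];	/* scalar array: predeclared so the compiler knows its size */

BEGIN
{
	s[0] = 10;	/* first of five consecutive int locations */
	s[4] = 50;	/* last valid index */

	a[0] = 2;	/* associative array, created by this first assignment */
	a[-5] = 3;	/* keys are abstract names, not storage offsets */

	printf("s[0]=%d s[4]=%d a[0]=%d a[-5]=%d\n", s[0], s[4], a[0], a[-5]);

	a[0] = 0;	/* assigning zero removes the element and frees its storage */
	a[-5] = 0;
	exit(0);
}
```

Without the int s[5] declaration, the assignment to s[0] would create an associative array instead, which is why scalar arrays must be predeclared.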
Variable Scope
Variable scoping is used to define where variable names are valid within a program and to avoid variable naming collisions. By using scoped variables you can control the availability of the variable instance to the whole program, a particular thread, or a specific clause.
The following table lists and describes the three primary variable scopes that are available. Note that external variables provide a fourth scope that falls outside of the control of the D program.
Scope | Syntax | Initial Value | Thread-safe? | Description |
---|---|---|---|---|
Global | name | 0 | No | Any probe that fires on any thread accesses the same instance of the variable. |
Thread-local | self->name | 0 | Yes | Any probe that fires on a thread accesses the thread-specific instance of the variable. |
Clause-local | this->name | Not defined | Yes | Any probe that fires accesses an instance of the variable specific to that particular firing of the probe. |
Note:
- Scalar variables and associative arrays have a global scope and aren't multi-processor safe (MP-safe). Because the value of such variables can be changed by more than one processor, a variable can become corrupted if more than one probe changes it.
- Aggregations are MP-safe even though they have a global scope because independent copies are updated locally before a final aggregation produces the global result.
Global Variables
Global variables are used to declare variable storage that's persistent across the entire D program. Global variables provide the broadest scope.
Global variables of any type can be defined in a D program, including associative arrays. The following are some example global variable definitions:
x = 123; /* integer value */
s = "hello"; /* string value */
a[123, 'a'] = 456; /* associative array */
Global variables are created automatically on their first assignment and use the type appropriate for the right side of the first assignment statement. Except for scalar arrays, you don't need to explicitly declare global variables before using them. To create a declaration anyway, you must place it outside of program clauses, for example:
int x; /* declare int x as a global variable */
int a[unsigned long long, char]; /* declare a as a global associative array */
syscall::read:entry
{
	x = 123;
	a[123, 'a'] = 456;
}
D variable declarations can't assign initial values. You can use a BEGIN probe clause to assign any initial values. All global variable storage is filled with zeroes by DTrace before you first reference the variable.
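A minimal sketch of this initialization pattern follows. The variable name remaining and the limit of five are hypothetical; the program reports a handful of read() calls and then exits.

```d
int remaining;	/* declaration only: an initializer isn't permitted here */

BEGIN
{
	remaining = 5;	/* the initial value is assigned in the BEGIN clause */
}

syscall::read:entry
/remaining > 0/
{
	printf("%s called read()\n", execname);
	remaining--;
}

syscall::read:entry
/remaining == 0/
{
	exit(0);
}
```

Because global variables aren't MP-safe, concurrent firings on several CPUs might cause slightly more than five lines to be printed.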
Thread-Local Variables
Thread-local variables are used to declare variable storage that's local to each OS thread. Thread-local variables are useful in situations where you want to enable a probe and mark every thread that fires the probe with some tag or other data.
Thread-local variables are referenced by applying the -> operator to the special identifier self, for example:
syscall::read:entry
{
self->read = 1;
}
This D fragment example enables the probe on the read() system call and associates a thread-local variable named read with each thread that fires the probe. Similar to global variables, thread-local variables are created automatically on their first assignment and assume the type that's used on the right-hand side of the first assignment statement, which is int in this example.
Each time the self->read variable is referenced in the D program, the data object that's referenced is the one associated with the OS thread that was executing when the corresponding DTrace probe fired. You can think of a thread-local variable as an associative array that's implicitly indexed by a tuple that describes the thread's identity in the system. A thread's identity is unique over the lifetime of the system: if the thread exits and the same OS data structure is used to create a new thread, the new thread doesn't reuse the same DTrace thread-local storage identity.
When you have defined a thread-local variable, you can reference it for any thread in the system, even if the variable in question hasn't been previously assigned for that particular thread. If a thread's copy of the thread-local variable hasn't yet been assigned, the data storage for the copy is defined to be filled with zeroes. As with associative array elements, underlying storage isn't allocated for a thread-local variable until a non-zero value is assigned to it. Also, as with associative array elements, assigning zero to a thread-local variable causes DTrace to deallocate the underlying storage. Always assign zero to thread-local variables that are no longer in use.
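A common use of these rules is to measure elapsed time per thread. The following sketch records a timestamp when a thread enters read(), reports the elapsed time on return, and then assigns zero so that the storage is released. The variable name ts and the output format are illustrative.

```d
syscall::read:entry
{
	self->ts = timestamp;	/* allocates storage for this thread only */
}

syscall::read:return
/self->ts/			/* zero for threads that never fired the entry probe */
{
	printf("%s: read() took %d ns\n", execname, timestamp - self->ts);
	self->ts = 0;		/* assigning zero deallocates the storage */
}
```

The predicate relies on unassigned thread-local variables reading as zero, so threads that were already inside read() when tracing began are skipped.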
Thread-local variables of any type can be defined in a D program, including associative arrays. The following are some example thread-local variable definitions:
self->x = 123; /* integer value */
self->s = "hello"; /* string value */
self->a[123, 'a'] = 456; /* associative array */
You don't need to explicitly declare thread-local variables before using them. To create a declaration anyway, you must place it outside of program clauses by prepending the keyword self, for example:
self int x; /* declare int x as a thread-local variable */
syscall::read:entry
{
self->x = 123;
}
Thread-local variables are kept in a separate namespace from global variables so that you can reuse names. Remember that x and self->x aren't the same variable if you overload names in a program.
Clause-Local Variables
Clause-local variables are used to restrict the storage of a variable to the particular firing of a probe. Clause-local is the narrowest scope. When a probe fires on a CPU, the D script is run in program order. Each clause-local variable is instantiated with an undefined value the first time it's used in the script. The same instance of the variable is used in all clauses until the D script has completed running for that particular firing of the probe.
Clause-local variables can be referenced and assigned by prefixing with this->:
BEGIN
{
this->secs = timestamp / 1000000000;
...
}
To declare a clause-local variable explicitly before using it, use the this keyword:
this int x; /* an integer clause-local variable */
this char c; /* a character clause-local variable */
BEGIN
{
this->x = 123;
this->c = 'D';
}
Note that if a program contains several clauses for a single probe, clause-local variables remain intact as the clauses are run sequentially, so their values persist across the clauses that enable the same probe. However, their values are undefined in the first clause processed for a specified probe. To avoid unexpected results, assign each clause-local variable an appropriate value before using it.
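The following sketch shows a value carried between two clauses that enable the same probe. The variable name bytes and the 65536-byte threshold are hypothetical choices; arg2 is the third argument passed to the write() system call.

```d
syscall::write:entry
{
	this->bytes = arg2;	/* assign before use: the initial value is undefined */
}

syscall::write:entry
/this->bytes >= 65536/
{
	/* The value set by the previous clause is still available for this firing. */
	printf("%s requested a write of %d bytes\n", execname, this->bytes);
}
```

The clauses run in program order for each firing, so the second clause always sees the value assigned by the first.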
Clause-local variables can be defined using any scalar variable type, but associative arrays can't be defined using clause-local scope. The scope of clause-local variables only applies to the corresponding variable data, not to the name and type identity defined for the variable. When a clause-local variable is defined, this name and type signature can be used in any later D program clause.
You can use clause-local variables to accumulate intermediate results of calculations or as temporary copies of other variables. Access to a clause-local variable is much faster than access to an associative array. Therefore, if you need to reference an associative array value several times in the same D program clause, it's more efficient to copy it into a clause-local variable first and then reference the local variable repeatedly.
External Variables
The D language uses the back quote character (`) as a special scoping operator for accessing symbols or variables that are defined in the OS, outside of the D program itself.
DTrace instrumentation runs inside the Oracle Linux OS kernel. So, in addition to accessing special DTrace variables and probe arguments, you can also access kernel data structures, symbols, and types. These capabilities enable advanced DTrace users, administrators, service personnel, and driver developers to examine low-level behavior of the OS kernel and device drivers.
For example, the Oracle Linux kernel contains a C declaration of a system variable named max_pfn. This variable is declared in C in the kernel source code as follows:
unsigned long max_pfn
To trace the value of this variable in a D program, you can write the following D statement:
trace(`max_pfn);
DTrace associates each kernel symbol with the type that's used for the symbol in the corresponding OS C code, which provides source-based access to the local OS data structures.
Kernel symbol names are kept in a separate namespace from D variable and function
identifiers, so you don't need to be concerned about these names conflicting with other D
variables. When you prefix a variable with a back quote, the D compiler searches the known
kernel symbols and uses the list of loaded modules to find a matching variable definition.
Because the Oracle Linux kernel can dynamically load modules with separate symbol
namespaces, the same variable name might be used more than once in the active OS kernel. You
can resolve these name conflicts by specifying the name of the kernel module that contains
the variable to be accessed before the back quote in the symbol name. For example, you would
refer to the address of the _bar function that's provided by a kernel module named foo as follows:
foo`_bar
You can apply any of the D operators to external variables, except for those that modify values, such as = or +=, subject to the usual rules for operand types. When required, the D compiler loads the variable names that correspond to active kernel modules, so you don't need to declare these variables. For safety reasons, DTrace prevents you from damaging or corrupting the state of the software that you're observing.
When you access external variables from a D program, you're accessing the internal implementation details of another program, such as the OS kernel or its device drivers. These implementation details don't form a stable interface upon which you can rely. Any D programs you write that depend on these details might not work when you next upgrade the corresponding piece of software. For this reason, external variables are typically used to debug performance or functionality problems by using DTrace.
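As a sketch of read-only access to an external variable, the following program prints the kernel variable max_pfn and the result of an arithmetic expression that uses it. The division by 1024 is arbitrary and only demonstrates that non-modifying operators are accepted.

```d
BEGIN
{
	/* Reading an external variable and using it in an expression is permitted. */
	printf("max_pfn = %d\n", `max_pfn);
	printf("max_pfn / 1024 = %d\n", `max_pfn / 1024);

	/* An assignment to max_pfn, with = or +=, would be rejected by the D compiler. */
	exit(0);
}
```

The printed values depend on the running kernel and might change between kernel versions, in line with the stability caveat above.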
Pointers
Pointers are memory addresses of data objects and reference memory used by the OS, by the user program, or by the D script. Pointers in D are data objects that store an integer virtual address value and associate it with a D type that describes the format of the data stored at the corresponding memory location.
You can explicitly declare a D variable to be of pointer type by first specifying the type of the referenced data and then appending an asterisk (*) to the type name. Doing so indicates you want to declare a pointer type, as shown in the following statement:
int *p;
The statement declares a D global variable named p that's a pointer to an integer. The declaration means that p is a 64-bit integer with a value that's the address of another integer located somewhere in memory. Because the compiled form of the D code is run at probe firing time inside the kernel itself, D pointers are typically pointers associated with the kernel's address space.
To create a pointer to a data object inside the kernel, you can compute its address by using the & operator. For example, the kernel source code declares an unsigned long max_pfn variable. You could trace the address of this variable by tracing the result of applying the & operator to the name of that object in D:
trace(&`max_pfn);
The * operator can be used to specify the object addressed by the pointer, and acts as the inverse of the & operator. For example, the following two D code fragments are equivalent in meaning:
q = &`max_pfn; trace(*q);
trace(`max_pfn);
In this example, the first fragment creates a D global variable pointer q. Because the max_pfn object is of type unsigned long, the type of &`max_pfn is unsigned long *, a pointer to unsigned long. The type of q is implicit in the declaration. Tracing the value of *q follows the pointer back to the data object max_pfn. This fragment is therefore the same as the second fragment, which directly traces the value of the data object by using its name.
Pointer Safety
DTrace is a robust, safe environment for running D programs. You might write a buggy D program, but invalid D pointer accesses don't cause DTrace or the OS kernel to fail or crash in any way. Instead, the DTrace software detects any invalid pointer accesses, and returns a BADADDR fault; the current clause execution quits, an ERROR probe fires, and tracing continues unless the program called exit for the ERROR probe.
Pointers are required in D because they're an intrinsic part of the OS's implementation in C, but DTrace implements the same kind of safety mechanisms that are found in the Java programming language to prevent buggy programs from affecting themselves or each other. DTrace's error reporting is similar to the runtime environment for the Java programming language that detects a programming error and reports an exception.
To observe DTrace's error handling and reporting, you could write a deliberately bad D program using pointers. For example, in an editor, type the following D program and save it in a file named badptr.d:
BEGIN
{
x = (int *)NULL;
y = *x;
trace(y);
}
The badptr.d program uses a cast expression to convert NULL to be a pointer to an integer. The program then dereferences the pointer by using the expression *x, assigns the result to another variable y, and then tries to trace y. When the D program is run, DTrace detects an invalid pointer access when the statement y = *x is processed and reports the following error:
dtrace: script '/tmp/badptr.d' matched 1 probe
dtrace: error on enabled probe ID 2 (ID 1: dtrace:::BEGIN): invalid address (0x0) in action #1 at BPF pc 156
Notice that the D program moves past the error and continues to run; the system and all observed processes remain unperturbed. You can also add an ERROR probe to any script to handle D errors. For details about the DTrace error mechanism, see ERROR Probe.
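A minimal sketch of such a handler follows. It extends badptr.d with an ERROR clause that prints the fault details and stops tracing. It assumes the ERROR probe argument layout in which arg4 holds the fault type and arg5 holds the faulting address; confirm this against the ERROR Probe reference before relying on it.

```d
BEGIN
{
	x = (int *)NULL;
	y = *x;		/* invalid access: this clause stops running here */
	trace(y);
}

ERROR
{
	/* Assumed layout: arg4 is the fault type, arg5 the offending address. */
	printf("fault type %d at address 0x%x\n", arg4, arg5);
	exit(1);
}
```

Calling exit in the ERROR clause ends tracing after the first fault instead of letting the program continue.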
Pointer and Array Relationship
A scalar array is represented by a variable that's associated with the address of its first storage location. A pointer is also the address of a storage location with a defined type. Thus, D permits the use of the array [] index notation with both pointer variables and array variables. For example, the following two D fragments are equivalent in meaning:
p = &a[0]; trace(p[2]);
trace(a[2]);
In the first fragment, the pointer p is assigned to the address of the first element in scalar array a by applying the & operator to the expression a[0]. The expression p[2] traces the value of the third array element (index 2). Because p now contains the same address associated with a, this expression yields the same value as a[2], shown in the second fragment. One consequence of this equivalence is that D permits you to access any index of any pointer or array. If you access memory beyond the end of a scalar array's predefined size, you either get an unexpected result or DTrace reports an invalid address error.
The difference between pointers and arrays is that a pointer variable refers to a separate piece of storage that contains the integer address of some other storage; whereas, an array variable names the array storage itself, not the location of an integer that in turn contains the location of the array.
This difference is manifested in the D syntax if you try to assign pointers and scalar arrays. If x and y are pointer variables, the expression x = y is legal; it copies the pointer address in y to the storage location that's named by x. If x and y are scalar array variables, the expression x = y isn't legal. Arrays can't be assigned as a whole in D. If p is a pointer and a is a scalar array, the statement p = a is permitted. This statement is equivalent to the statement p = &a[0].
Pointer Arithmetic
As in C, pointer arithmetic in D isn't identical to integer arithmetic. Pointer arithmetic implicitly adjusts the underlying address by multiplying or dividing the operands by the size of the type referenced by the pointer.
The following D fragment illustrates this property:
int *x;
BEGIN
{
trace(x);
trace(x + 1);
trace(x + 2);
}
This fragment creates an integer pointer x and then traces its value, its value incremented by one, and its value incremented by two. If you create and run this program, DTrace reports the integer values 0, 4, and 8.
Because x is a pointer to an int (size 4 bytes), incrementing x adds 4 to the underlying pointer value. This property is useful when using pointers to reference consecutive storage locations such as arrays. For example, if x was assigned to the address of an array a, the expression x + 1 would be equivalent to the expression &a[1]. Similarly, the expression *(x + 1) would reference the value a[1]. Pointer arithmetic is implemented by the D compiler whenever a pointer value is incremented by using the +, ++, or += operators. Pointer arithmetic is also applied in the following cases: when an integer is subtracted from a pointer on the left-hand side, when a pointer is subtracted from another pointer, or when the -- operator is applied to a pointer.
For example, the following D program would trace the result 2:
int *x, *y;
int a[5];
BEGIN
{
x = &a[0];
y = &a[2];
trace(y - x);
}
Generic Pointers
Sometimes it's useful to represent or manipulate a generic pointer address in a D program without specifying the type of data referred to by the pointer. Generic pointers can be specified by using the type void *, where the keyword void represents the absence of specific type information, or by using the built-in type alias uintptr_t, which is aliased to an unsigned integer type of size that's appropriate for a pointer in the current data model. You can't apply pointer arithmetic to an object of type void *, and these pointers can't be dereferenced without casting them to another type first. You can cast a pointer to the uintptr_t type when you need to perform integer arithmetic on the pointer value.
Pointers to void can be used in any context where a pointer to another data type is required, such as an associative array tuple expression or the right-hand side of an assignment statement. Similarly, a pointer to any data type can be used in a context where a pointer to void is required. To use a pointer to a non-void type in place of another non-void pointer type, an explicit cast is required. You must always use explicit casts to convert pointers to integer types, such as uintptr_t, or to convert these integers back to the appropriate pointer type.
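The difference between pointer arithmetic and integer arithmetic on a uintptr_t value can be sketched as follows. The program reuses the zero-filled int pointer x from the pointer arithmetic example and never dereferences it; the printf format is an illustrative choice.

```d
int *x;		/* global storage is zero-filled, so x starts as address 0 */

BEGIN
{
	/* Pointer arithmetic scales by the size of int before the cast is applied. */
	printf("pointer arithmetic: x + 1 = %d, x + 2 = %d\n",
	    (uintptr_t)(x + 1), (uintptr_t)(x + 2));

	/* After the explicit cast, + is ordinary integer addition. */
	printf("integer arithmetic: (uintptr_t)x + 1 = %d\n", (uintptr_t)x + 1);
	exit(0);
}
```

With a 4-byte int, the first line is expected to show the scaled addresses described earlier, while the second shows the address advanced by a single byte.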
Pointers to DTrace Objects
The D compiler prohibits you from using the &
operator to obtain
pointers to DTrace objects such as associative arrays, built-in functions, and variables.
You're prohibited from obtaining the address of these variables so that the DTrace runtime
environment is free to relocate them as needed between probe firings. In this way, DTrace
can more efficiently manage the memory required for programs. If you create composite
structures, it's possible to construct expressions that retrieve the kernel address of
DTrace object storage. Avoid creating such expressions in D programs. If you need to use
such an expression, don't rely on the address being the same across probe firings.
Pointers and Address Spaces
A pointer is an address that provides a translation within some virtual address space to a piece of physical memory. DTrace runs D programs within the address space of the OS kernel itself. The Linux system manages many address spaces: one for the OS kernel itself, and one for each user process. Because each address space provides the illusion that it can access all the memory on the system, the same virtual address pointer value can be reused across address spaces, but translate to different physical memory. Therefore, when writing D programs that use pointers, you must be aware of the address space corresponding to the pointers you intend to use.
For example, if you use the syscall
provider to instrument entry to a
system call that takes a pointer to an integer or array of integers as an argument, such as
pipe()
, it would not be valid to dereference that pointer or array using
the *
or []
operators because the address in question is
an address in the address space of the user process that performed the system call. Applying
the *
or []
operators to this address in D would result in
kernel address space access, which would result in an invalid address error or in returning
unexpected data to the D program, depending on whether the address happened to match a valid
kernel address.
To access user-process memory from a DTrace probe, you must apply one of the copyin
, copyinstr
, or
copyinto
functions. To avoid confusion, take care when writing D programs to name and comment
variables storing user addresses appropriately. You can also store user addresses as
uintptr_t
so that you don't accidentally compile D code that dereferences
them.
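The following sketch shows one way to apply this advice to the pipe() case described earlier. It assumes that the syscall::pipe:entry and syscall::pipe:return probes exist on the system (some platforms implement pipe() through a different system call, such as pipe2) and that arg0 of the return probe holds the return value. The variable names ufds and fds are hypothetical.

```d
syscall::pipe:entry
{
    /* arg0 is a user-space address: keep it as an integer, never dereference it */
    self->ufds = (uintptr_t)arg0;
}

syscall::pipe:return
/self->ufds != 0 && arg0 == 0/
{
    /* copy the two descriptors from user memory into DTrace scratch memory */
    this->fds = (int *)copyin(self->ufds, 2 * sizeof (int));
    printf("pipe fds: %d %d\n", this->fds[0], this->fds[1]);
}

syscall::pipe:return
/self->ufds != 0/
{
    /* release the thread-local variable once the call has returned */
    self->ufds = 0;
}
```

The pointer returned by copyin refers to kernel-side scratch memory, so applying the [] operator to it is valid, whereas applying it to the original user address isn't.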
Structs and Unions
Collections of related variables can be grouped together into composite data objects called structs and unions. You define these objects in D by creating new type definitions for them. You can use any new types for any D variables, including associative array values. This section explores the syntax and semantics for creating and manipulating these composite types and the D operators that interact with them.
Structs
The D keyword struct
, short for structure, is used to introduce a
new type that's composed of a group of other types. The new struct
type can
be used as the type for D variables and arrays, enabling you to define groups of related
variables under a single name. D structs are the same as the corresponding construct in C
and C++. If you have programmed in the Java programming language, think of a D struct as a
class that contains only data members and no methods.
Suppose you want to create a more sophisticated system call tracing program in D that
records several things about each read()
and write()
system call that's run for an application, for example, the elapsed time, number of calls,
and the largest byte count passed as an argument.
You could write a D clause to record these properties in four separate associative arrays, as shown in the following example:
uint64_t ts[string]; /* declare ts */
uint64_t calls[string]; /* declare calls */
uint64_t elapsed[string]; /* declare elapsed */
size_t maxbytes[string]; /* declare maxbytes */
syscall::read:entry, syscall::write:entry
/pid == $target/
{
ts[probefunc] = timestamp;
calls[probefunc]++;
maxbytes[probefunc] = arg2 > maxbytes[probefunc] ?
arg2 : maxbytes[probefunc];
}
syscall::read:return, syscall::write:return
/ts[probefunc] != 0 && pid == $target/
{
elapsed[probefunc] += timestamp - ts[probefunc];
}
END
{
printf(" calls max bytes elapsed nsecs\n");
printf("------ ----- --------- -------------\n");
printf(" read %5d %9d %d\n",
calls["read"], maxbytes["read"], elapsed["read"]);
printf(" write %5d %9d %d\n",
calls["write"], maxbytes["write"], elapsed["write"]);
}
You can make the program easier to read and maintain by using a struct. A struct provides a logical grouping of data items that belong together. It also saves storage space because all data items can be stored with a single key.
First, declare a new struct
type at the top
of the D program source file:
struct callinfo {
uint64_t ts; /* timestamp of last syscall entry */
uint64_t elapsed; /* total elapsed time in nanoseconds */
uint64_t calls; /* number of calls made */
size_t maxbytes; /* maximum byte count argument */
};
The struct
keyword is followed by an optional identifier that's used to
refer back to the new type, which is now known as struct callinfo
. The
struct members are then enclosed within a set of braces {}
and the entire
declaration ends with a semicolon (;
). Each struct member is defined by
using the same syntax as a D variable declaration, with the type of the member listed first
followed by an identifier naming the member and another semicolon (;
).
The struct
declaration defines the new type. It doesn't create any
variables or allocate any storage in DTrace. Once the type is declared, you can use struct
callinfo
as a type throughout the remainder of the D program. Each variable of
type struct callinfo
stores a copy of the four variables that are described
by our structure template. The members are arranged in memory in order, according to the
member list, with padding space introduced between members, as required for data object
alignment purposes.
You can use the member identifier names to access the individual
member values using the “.
” operator by
writing an expression of the following form:
variable-name.member-name
The following example is an improved program that uses the new
structure type. In a text editor, type the following D program
and save it in a file named rwinfo.d
:
struct callinfo {
uint64_t ts; /* timestamp of last syscall entry */
uint64_t elapsed; /* total elapsed time in nanoseconds */
uint64_t calls; /* number of calls made */
size_t maxbytes; /* maximum byte count argument */
};
struct callinfo i[string]; /* declare i as an associative array */
syscall::read:entry, syscall::write:entry
/pid == $target/
{
i[probefunc].ts = timestamp;
i[probefunc].calls++;
i[probefunc].maxbytes = arg2 > i[probefunc].maxbytes ?
arg2 : i[probefunc].maxbytes;
}
syscall::read:return, syscall::write:return
/i[probefunc].ts != 0 && pid == $target/
{
i[probefunc].elapsed += timestamp - i[probefunc].ts;
}
END
{
printf(" calls max bytes elapsed nsecs\n");
printf("------ ----- --------- -------------\n");
printf(" read %5d %9d %d\n",
i["read"].calls, i["read"].maxbytes, i["read"].elapsed);
printf(" write %5d %9d %d\n",
i["write"].calls, i["write"].maxbytes, i["write"].elapsed);
}
Run the program to return the results for a command. For example, run the dtrace
-q -s rwinfo.d -c date command. The date program runs and is traced until
it exits and fires the END
probe, which prints the results:
# dtrace -q -s rwinfo.d -c date
...
calls max bytes elapsed nsecs
------ ----- --------- -------------
read 2 4096 10689
write 1 29 9817
Pointers to Structs
Referring to structs by using pointers is common in C and D. You can use the operator
->
to access struct members through a pointer. If struct
s
has a member m
, and you have a pointer to this struct named
sp
, where sp
is a variable of type struct s
*
, you can either use the *
operator to first dereference the
sp
pointer to access the member:
struct s *sp;
(*sp).m
Or, you can use the ->
operator to achieve the same thing:
struct s *sp;
sp->m
DTrace provides several built-in variables that are pointers to
structs. For example, the pointer curpsinfo
refers to struct
psinfo
and its content provides a snapshot of information about the
state of the process associated with the thread that fired the
current probe. The following table lists a few example
expressions that use curpsinfo
, including
their types and their meanings.
Example Expression | Type | Meaning |
---|---|---|
curpsinfo->pr_pid | pid_t | Current process ID |
curpsinfo->pr_fname | char [] | Executable file name |
curpsinfo->pr_psargs | char [] | Initial command line arguments |
The next example uses the pr_fname
member to
identify a process of interest. In an editor, type the following
script and save it in a file named procfs.d
:
syscall::write:entry
/ curpsinfo->pr_fname == "date" /
{
    printf("%s run by UID %d\n", curpsinfo->pr_psargs, curpsinfo->pr_uid);
}
This clause uses the expression curpsinfo->pr_fname
to access and
match the command name so that the script selects the correct write()
requests before tracing the arguments. Notice that by using operator ==
with a left-hand argument that's an array of char
and a right-hand argument
that's a string, the D compiler infers that the left-hand argument can be promoted to a
string and a string comparison is performed. Type the command dtrace -q -s
procfs.d in one shell and then run several variations of the
date command in another shell. The output that's displayed by
DTrace might be similar to the following, indicating that
curpsinfo->pr_psargs
can show how the command is invoked and also any
arguments that are included with the command:
# dtrace -q -s procfs.d
date run by UID 500
/bin/date run by UID 500
date -R run by UID 500
...
^C
#
Complex data structures are used often in C programs, so the ability to describe and reference structs from D also provides a powerful capability for observing the inner workings of the Oracle Linux OS kernel and its system interfaces.
Unions
Unions are another kind of composite type available in ANSI C and D and are related to structs. A union is a composite type where a set of members of different types are defined and the member objects all occupy the same region of storage. A union is therefore an object of variant type, where only one member is valid at any particular time, depending on how the union has been assigned. Typically, some other variable, or piece of state is used to indicate which union member is currently valid. The size of a union is the size of its largest member. The memory alignment that's used for the union is the maximum alignment required by the union members.
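As a small illustration of the size rule, the following sketch declares a union and reports its size. The union tag u and its member names are hypothetical and exist only for this example.

```d
union u {
    uint8_t b;      /* 1 byte */
    uint32_t w;     /* 4 bytes */
    uint64_t q;     /* 8 bytes, the largest member */
};

BEGIN
{
    /* all three members share the same storage */
    printf("sizeof (union u) = %d\n", sizeof (union u));
    exit(0);
}
```

Because the largest member is a uint64_t, the reported size should be 8 bytes.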
Member Sizes and Offsets
You can determine the size in bytes of any D type or expression,
including a struct
or
union
, by using the sizeof
operator. The sizeof
operator can be applied
either to an expression or to the name of a type surrounded by
parentheses, as illustrated in the following two examples:
sizeof expression
sizeof (type-name)
For example, the expression sizeof (uint64_t)
would return the value
8
, and the expression sizeof (callinfo.ts)
would also
return 8
, if inserted into the source code of the previous example program.
The formal return type of the sizeof
operator is the type alias
size_t
, which is defined as an unsigned integer that's the same size as a
pointer in the current data model and is used to represent byte counts. When the
sizeof
operator is applied to an expression, the expression is validated
by the D compiler, but the resulting object size is computed at compile time and no code for
the expression is generated. You can use sizeof
anywhere an integer
constant is required.
You can use the companion operator offsetof
to determine the offset in
bytes of a struct or union member from the start of the storage that's associated with any
object of the struct
or union
type. The
offsetof
operator is used in an expression of the following form:
offsetof (type-name, member-name)
Here, type-name is the name of any
struct
or union
type or
type alias, and member-name is the
identifier naming a member of that struct or union. Similar to
sizeof
, offsetof
returns a
size_t
and you can use it anywhere in a D
program that an integer constant can be used.
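A short sketch of both operators, reusing the struct callinfo type from the earlier example. The reported values depend on the data model, so the comments describe what is measured and not a guaranteed result.

```d
struct callinfo {
    uint64_t ts;        /* timestamp of last syscall entry */
    uint64_t elapsed;   /* total elapsed time in nanoseconds */
    uint64_t calls;     /* number of calls made */
    size_t maxbytes;    /* maximum byte count argument */
};

BEGIN
{
    /* size of the whole struct, including any alignment padding */
    printf("size: %d\n", sizeof (struct callinfo));

    /* distance in bytes from the start of the struct to the calls member */
    printf("offset of calls: %d\n", offsetof (struct callinfo, calls));
    exit(0);
}
```

Both values are computed at compile time, so neither expression generates any code that runs when the probe fires.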
Bit-Fields
D also permits the definition of integer struct and union members of arbitrary numbers of bits, known as bit-fields. A bit-field is declared by specifying a signed or unsigned integer base type, a member name, and a suffix indicating the number of bits to be assigned for the field, as shown in the following example:
struct s
{
    int a : 1;
    int b : 3;
    int c : 12;
};
The bit-field width is an integer constant that's separated from the member name by a trailing colon. The bit-field width must be positive and must be of a number of bits not larger than the width of the corresponding integer base type. Bit-fields that are larger than 64 bits can't be declared in D. D bit-fields provide compatibility with and access to the corresponding ANSI C capability. Bit-fields are typically used in situations when memory storage is at a premium or when a struct layout must match a hardware register layout.
A bit-field is a compiler construct that automates the layout of an integer and a set of
masks to extract the member values. The same result can be achieved by defining the masks
yourself and using the &
operator. The C and D compilers try to pack
bits as efficiently as possible, but they're free to do so in any order or fashion.
Therefore, bit-fields aren't guaranteed to produce identical bit layouts across differing
compilers or architectures. If you require stable bit layout, construct the bit masks
yourself and extract the values by using the &
operator.
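The manual approach might look like the following sketch, which extracts three fields with the same widths as the members of struct s. The layout used here (a in bit 0, b in bits 1 to 3, c in bits 4 to 15) and the sample value are assumptions made for the example, not a layout that any compiler guarantees.

```d
BEGIN
{
    this->reg = 0x1a5;                      /* hypothetical packed value */

    this->a = this->reg & 0x1;              /* 1 bit at bit 0 */
    this->b = (this->reg >> 1) & 0x7;       /* 3 bits starting at bit 1 */
    this->c = (this->reg >> 4) & 0xfff;     /* 12 bits starting at bit 4 */

    printf("a=%d b=%d c=%d\n", this->a, this->b, this->c);
    exit(0);
}
```

Because the shifts and masks are written out explicitly, the extraction doesn't depend on how a particular compiler chooses to pack bit-fields.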
A bit-field member is accessed by specifying its name with the “.
” or
->
operators, similar to any other struct or union member. The
bit-field is automatically promoted to the next largest integer type for use in any
expressions. Because bit-field storage might not be aligned on a byte boundary or be a round
number of bytes in size, you can't apply the sizeof
or
offsetof
operators to a bit-field member. The D compiler also prohibits
you from taking the address of a bit-field member by using the &
operator.
DTrace String Processing
DTrace provides facilities for tracing and manipulating strings. This section describes the complete set of D language features for declaring and manipulating strings. Unlike ANSI C, strings in D have their own built-in type and operator support to enable you to easily and unambiguously use them in tracing programs.
String Representation
In DTrace, strings are represented as an array of characters ending in a null byte, which
is a byte with a value of zero, usually written as '\0'
. The visible part
of the string is of variable length, depending on the location of the null byte, but DTrace
stores each string in a fixed-size array so that each probe traces a consistent amount of
data. Strings cannot exceed the length of the predefined string limit. However, the limit
can be modified in your D program or on the dtrace command line by
tuning the strsize
option. The default string limit is 256 bytes.
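For instance, a script can raise the limit with a pragma, as in the following sketch. The value 1024 is an arbitrary choice for illustration, and the script assumes it's run with the -c or -p option so that $target is defined.

```d
#pragma D option strsize=1024
#pragma D option quiet

syscall::open:entry
/pid == $target/
{
    /* path names longer than the default limit are no longer truncated at 256 bytes */
    printf("%s\n", copyinstr(arg0));
}
```

The same option can be set on the command line with dtrace -x strsize=1024.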
The D language provides an explicit string
type rather than using the
type char *
to refer to strings. The string type is equivalent to
char *
, in that it's the address of a sequence of characters, but the D
compiler and D functions such as trace
provide enhanced capabilities when
applied to expressions of type string. For example, the string type removes the ambiguity of
type char *
when you need to trace the actual bytes of a string.
In the following D statement, if s
is of type
char *
, DTrace traces the value of the
pointer s
, which means it traces an integer
address value:
trace(s);
In the following D statement, by the definition of the
*
operator, the D compiler dereferences the
pointer s
and traces the single character at
that location:
trace(*s);
These behaviors enable you to manipulate character pointers that refer to either single characters, or to arrays of byte-sized integers that aren't strings and don't end with a null byte.
In the next D statement, if s
is of type
string
, the string type indicates to the D
compiler that you want DTrace to trace a null terminated string
of characters whose address is stored in the variable
s
:
trace(s);
You can also perform lexical comparison of expressions of type string. See String Comparison.
String Constants
String constants are enclosed in pairs of double quotes (""
) and are
automatically assigned the type string
by the D compiler. You can define
string constants of any length, limited only by the amount of memory DTrace is permitted to
consume on your system and by whatever limit you have set for the strsize
DTrace runtime option. The terminating null byte (\0
) is added
automatically by the D compiler to any string constants that you declare. The size of a
string constant object is the number of bytes associated with the string, plus one
additional byte for the terminating null byte.
A string constant can't contain a literal newline character. To create strings containing
newlines, use the \n
escape sequence instead of a literal newline. String
constants can also contain any of the special character escape sequences that are defined
for character constants.
String Assignment
Unlike the assignment of char *
variables, strings are copied by value
and not by reference. The string assignment operator =
copies the actual
bytes of the string from the source operand up to and including the null byte to the
variable on the left-hand side, which must be of type string
.
You can use a declaration to create a string variable:
string s;
Or you can create a string variable by assigning it an expression of type
string
.
For example, the D statement:
s = "hello";
creates a variable s
of type string
and copies the six
bytes of the string "hello"
into it (five printable characters, plus the
null byte).
String assignment is analogous to the C library function strcpy()
, with
the exception that if the source string exceeds the limit of the storage of the destination
string, the resulting string is automatically truncated by a null byte at this limit.
You can also assign to a string variable an expression of a type that's compatible with
strings. In this case, the D compiler automatically promotes the source expression to the
string type and performs a string assignment. The D compiler permits any expression of type
char *
or of type char[n]
, a scalar array of
char
of any size, to be promoted to a string.
String Conversion
Expressions of other types can be explicitly converted to type
string
by using a cast expression or by
applying the special stringof
operator, which
are equivalent in meaning, as shown in the following example:
s = (string) expression;
s = stringof (expression);
The expression is interpreted as an address to the string.
The stringof
operator binds very tightly to the operand on its right-hand
side. You can optionally surround the expression by using parentheses, for clarity.
Scalar type expressions, such as a pointer or integer, or a scalar array address can be
converted to strings, in that the scalar is interpreted as an address to a char type.
Expressions of other types such as void
may not be converted to
string
. If you erroneously convert an invalid address to a string, the
DTrace safety features prevent you from damaging the system or DTrace, but you might end up
tracing a sequence of undecipherable characters.
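As a brief sketch, both forms can be applied to the pr_fname member used elsewhere in this reference, which is an array of char. The script assumes it's run with the -c or -p option so that $target is defined.

```d
syscall::write:entry
/pid == $target/
{
    /* both statements convert the char array to type string before tracing it */
    trace(stringof(curpsinfo->pr_fname));
    trace((string)curpsinfo->pr_fname);
}
```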
String Comparison
D overloads the binary relational operators and permits them to
be used for string comparisons, as well as integer comparisons.
The relational operators perform string comparison whenever both
operands are of type string
or when one
operand is of type string
and the other
operand can be promoted to type string
. See
String Assignment for a detailed description.
See also Table 3-13, which lists the
relational operators that can be used to compare strings.
Table 3-13 D Relational Operators for Strings
Operator | Description |
---|---|
< | Left-hand operand is less than right-hand operand. |
<= | Left-hand operand is less than or equal to right-hand operand. |
> | Left-hand operand is greater than right-hand operand. |
>= | Left-hand operand is greater than or equal to right-hand operand. |
== | Left-hand operand is equal to right-hand operand. |
!= | Left-hand operand is not equal to right-hand operand. |
As with integers, each operator evaluates to a value of type
int
, which is equal to one if the condition
is true or zero if it is false.
The relational operators compare the two input strings
byte-by-byte, similarly to the C library routine
strcmp()
. Each byte is compared by using its
corresponding integer value in the ASCII character set until a
null byte is read or the maximum string length is reached. See
the ascii(7)
manual page for more
information. Some example D string comparisons and their results
are shown in the following table.
D string comparison | Result |
---|---|
|
Returns 1 (true) |
|
Returns 1 (true) |
|
Returns 0 (false) |
Note:
Identical Unicode strings might compare as being different if one or the other of the strings isn't normalized.
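The byte-by-byte behavior can be checked with a short script such as the following sketch. The string values are arbitrary and chosen only for illustration.

```d
BEGIN
{
    /* "abc" and "abd" first differ at the third byte, where 'c' is less than 'd' */
    printf("%d %d %d\n", "abc" < "abd", "abc" == "abc", "b" <= "a");
    exit(0);
}
```

Under ASCII ordering the script should print 1 1 0, because the first two comparisons are true and the third is false.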
Aggregations
Aggregations enable you to accumulate data for statistical analysis. The aggregation is calculated at runtime, so that post-processing isn't required and processing is highly efficient and accurate. Aggregations function similarly to associative arrays, but are populated by aggregating functions. In D, the syntax for an aggregation is as follows:
@name[ keys ] = aggfunc( args );
The aggregation name is a D identifier that's prefixed with the
special character @
. All aggregations that are named in D programs
are global variables. Aggregations can't have thread-local or clause-local scope.
The aggregation names are kept in an identifier namespace that's separate from other
D global variables. If you reuse names, remember that a
and
@a
are not the same variable. The special aggregation
name @
can be used to name an anonymous aggregation in D programs.
The D compiler treats this name as an alias for the aggregation name
@_
.
Aggregations can be regular or indexed. Indexed aggregations use keys, where keys are a comma-separated list of D expressions, similar to the tuples of expressions used for associative arrays. Regular aggregations are treated similarly to indexed aggregations, but don't use keys for indexing.
The aggfunc is one of the DTrace aggregating functions, and args is a comma-separated list of arguments appropriate to that function. Most aggregating functions take a single argument that represents the new datum.
Aggregation Functions
The following functions are aggregating functions that can be used in a program to collect data and present it in a meaningful way.
- avg: Stores the arithmetic average of the specified expressions in an aggregation.
- count: Stores an incremented count value in an aggregation.
- max: Stores the largest value among the specified expressions in an aggregation.
- min: Stores the smallest value among the specified expressions in an aggregation.
- sum: Stores the total value of the specified expression in an aggregation.
- stddev: Stores the standard deviation of the specified expressions in an aggregation.
- quantize: Stores a power-of-two frequency distribution of the values of the specified expressions in an aggregation. An optional increment can be specified.
- lquantize: Stores the linear frequency distribution of the values of the specified expressions, sized by the specified range, in an aggregation.
- llquantize: Stores the log-linear frequency distribution in an aggregation.
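The following sketch combines an indexed aggregation with three of these functions to summarize read() latency per program. The aggregation names @calls, @avgns, and @dist and the thread-local variable ts are hypothetical.

```d
syscall::read:entry
{
    self->ts = timestamp;
}

syscall::read:return
/self->ts/
{
    /* each aggregation is keyed by the name of the running program */
    @calls[execname] = count();
    @avgns[execname] = avg(timestamp - self->ts);
    @dist[execname] = quantize(timestamp - self->ts);
    self->ts = 0;
}
```

When tracing stops, dtrace prints each aggregation by using the default format: integer values for count and avg, and a histogram for quantize.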
Printing Aggregations
By default, multiple aggregations are displayed in the order in which they're introduced in
the D program. You can override this behavior by using the printa
function to print the
aggregations. The printa
function also lets you precisely format the
aggregation data by using a format string.
If an aggregation isn't formatted with a printa
statement in a D program,
the dtrace command snapshots the aggregation data and prints the
results after tracing has completed, using the default aggregation format. If an aggregation
is formatted with a printa
statement, the default behavior is disabled. You
can achieve the same results as the default behavior by adding the
printa(@aggregation-name)
statement to an
END
probe clause in a program.
The default output format for the avg
, count
,
min
, max
, stddev
, and
sum
aggregating functions displays an integer decimal value corresponding
to the aggregated value for each tuple. The default output format for the
quantize
, lquantize
, and llquantize
aggregating functions displays an ASCII histogram with the results. Aggregation tuples are
printed as though trace
had been applied to each tuple element.
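As an illustration, the following sketch formats a two-key aggregation with printa. The aggregation name @calls and the column widths are arbitrary choices for the example.

```d
#pragma D option quiet

syscall:::entry
{
    @calls[execname, probefunc] = count();
}

END
{
    /* %s conversions consume the tuple keys in order; %@d prints the aggregated value */
    printa("%-16s %-16s %@8d\n", @calls);
}
```

Because @calls is formatted with printa, dtrace doesn't print it a second time in the default format when tracing ends.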
Data Normalization
When aggregating data over some period, you might want to normalize the data based on some
constant factor. This technique lets you compare disjointed data more easily. For example,
when aggregating system calls, you might want to output system calls as a per-second rate
instead of as an absolute value over the course of the run. The DTrace normalize
function lets you
normalize data in this way. The parameters to normalize
are an aggregation
and a normalization factor. The output of the aggregation shows each value divided by the
normalization factor.
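A per-second rate can be sketched as follows. The global variable start and the aggregation name @func are hypothetical, and the script assumes that tracing runs for at least one second so that the normalization factor isn't zero.

```d
#pragma D option quiet

BEGIN
{
    /* record when tracing started */
    start = timestamp;
}

syscall:::entry
{
    @func[execname] = count();
}

END
{
    /* timestamp is in nanoseconds, so this divides each count by elapsed seconds */
    normalize(@func, (timestamp - start) / 1000000000);
}
```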
Speculation
DTrace includes a speculative tracing facility that can be used to tentatively trace data at one or more probe locations. You can then decide to commit the data to the principal buffer at another probe location. You can use speculation to trace data that only contains the output that's of interest; no extra processing is required and the DTrace overhead is minimized.
Using speculation involves the following steps:
- Setting up a temporary speculation buffer.
- Instructing one or more clauses to trace to the speculation buffer.
- Committing the data in the speculation buffer to the primary buffer, or discarding the speculation buffer.
You can choose to commit or discard speculation data when certain conditions are met, by using the appropriate functions within a clause. By using speculation, you can trace data for a set of probes until a condition is met and then either dispose of the data if it isn't useful, or keep it.
The following table describes DTrace speculation functions.
Table 3-14 DTrace Speculation Functions
Function | Args | Description |
---|---|---|
speculation | None | Returns an identifier for a new speculative buffer. |
speculate | ID | Denotes that the remainder of the clause must be traced to the speculative buffer specified by ID. |
commit | ID | Commits the speculative buffer that's associated with ID. |
discard | ID | Discards the speculative buffer that's associated with ID. |
Example 3-1 How to use speculation
The following example illustrates how to use speculation. All speculation functions must be used together for speculation to work correctly.
The speculation is created for the syscall::open:entry
probe and the ID
for the speculation is attached to a thread-local variable. The first argument of the
open()
system call is traced to the speculation buffer by using the
printf
function.
Three more clauses are included for the syscall::open:return
probe. In the
first of these clauses, the errno
is traced to the speculative buffer. The
predicate for the second of the clauses filters for a non-zero errno
value
and commits the speculation buffer. The predicate of the third of the clauses filters for a
zero errno
value and discards the speculation buffer.
The output of the program is returned from the primary data buffer, so the program
effectively returns the file name and error number when an open()
system
call fails. If the call doesn't fail, the information that was traced into the speculation
buffer is discarded.
syscall::open:entry
{
/*
* The call to speculation() creates a new speculation. If this fails,
* dtrace will generate an error message indicating the reason for
* the failed speculation(), but subsequent speculative tracing will be
* silently discarded.
*/
self->spec = speculation();
speculate(self->spec);
/*
* Because this printf() follows the speculate(), it is being
* speculatively traced; it will only appear in the primary data buffer if the
* speculation is subsequently committed.
*/
printf("%s", copyinstr(arg0));
}
syscall::open:return
/self->spec/
{
/*
* Trace the errno value into the speculation buffer.
*/
speculate(self->spec);
trace(errno);
}
syscall::open:return
/self->spec && errno != 0/
{
/*
* If errno is non-zero, commit the speculation.
*/
commit(self->spec);
self->spec = 0;
}
syscall::open:return
/self->spec && errno == 0/
{
/*
* If errno is not set, discard the speculation.
*/
discard(self->spec);
self->spec = 0;
}