Collections of related variables can be grouped together into composite data objects called structs and unions. You can define these objects in D by creating new type definitions for them. You can use your new types for any D variables, including associative array values. This chapter explores the syntax and semantics for creating and manipulating these composite types and the D operators that interact with them.
The D keyword struct, short for structure, is
used to introduce a new type composed of a group of other types. The new
struct type can be used as the type for D variables and arrays,
enabling you to define groups of related variables under a single name. D structs are the
same as the corresponding construct in C and C++. If you have programmed in the Java
programming language, think of a D struct as a class that contains only data members and no
methods.
Suppose you want to create a more sophisticated system call
tracing program in D that records a number of things about each
read() and write() system
call executed by your shell, such as the elapsed time, number of
calls, and the largest byte count passed as an argument. You
could write a D clause to record these properties in three
separate associative arrays as shown in the following example:
int maxbytes[string];    /* declare maxbytes as an associative array */
syscall::read:entry, syscall::write:entry
/pid == 12345/
{
    ts[probefunc] = timestamp;
    calls[probefunc]++;
    maxbytes[probefunc] = arg2 > maxbytes[probefunc] ?
        arg2 : maxbytes[probefunc];
}
However, this clause is inefficient because DTrace must create three separate
associative arrays and store separate copies of the identical tuple values corresponding to
probefunc for each one. Instead, you can conserve space and make your
program easier to read and maintain by using a struct. First, declare a new
struct type at the top of the D program source file:
struct callinfo {
    uint64_t ts;       /* timestamp of last syscall entry */
    uint64_t elapsed;  /* total elapsed time in nanoseconds */
    uint64_t calls;    /* number of calls made */
    size_t maxbytes;   /* maximum byte count argument */
};
The struct keyword is followed by an optional identifier used to
refer back to our new type, which is now known as struct callinfo. The
struct members are then enclosed in a set of braces {} and the entire
declaration is terminated by a semicolon (;). Each struct member is
defined using the same syntax as a D variable declaration, with the type of the member
listed first followed by an identifier naming the member and another semicolon
(;).
The struct declaration itself simply defines
the new type; it does not create any variables or allocate any
storage in DTrace. Once declared, you can use struct
callinfo as a type throughout the remainder of your D
program, and each variable of type struct
callinfo will store a copy of the four variables
described by our structure template. The members will be
arranged in memory in order according to the member list, with
padding space introduced between members as required for data
object alignment purposes.
You can use the member identifier names to access the individual
member values using the “.” operator by
writing an expression of the form:
variable-name.member-name
The following example is an improved program using the new structure type. Go to your
editor, type in the following D program and save it in a file named
rwinfo.d:
struct callinfo {
    uint64_t ts;       /* timestamp of last syscall entry */
    uint64_t elapsed;  /* total elapsed time in nanoseconds */
    uint64_t calls;    /* number of calls made */
    size_t maxbytes;   /* maximum byte count argument */
};
struct callinfo i[string];    /* declare i as an associative array */
syscall::read:entry, syscall::write:entry
/pid == $1/
{
    i[probefunc].ts = timestamp;
    i[probefunc].calls++;
    i[probefunc].maxbytes = arg2 > i[probefunc].maxbytes ?
        arg2 : i[probefunc].maxbytes;
}
syscall::read:return, syscall::write:return
/i[probefunc].ts != 0 && pid == $1/
{
    i[probefunc].elapsed += timestamp - i[probefunc].ts;
}
END
{
    printf("        calls  max bytes  elapsed nsecs\n");
    printf("------  -----  ---------  -------------\n");
    printf("  read  %5d  %9d  %d\n",
        i["read"].calls, i["read"].maxbytes, i["read"].elapsed);
    printf(" write  %5d  %9d  %d\n",
        i["write"].calls, i["write"].maxbytes, i["write"].elapsed);
}
After you have typed in the program, run dtrace -q -s rwinfo.d,
specifying one of your shell processes. Then type in a few commands in your shell and, when
you have finished entering your shell commands, type Ctrl-C to fire the
END probe and print the results, for example:
# dtrace -q -s rwinfo.d `pgrep -n bash`
^C
        calls  max bytes  elapsed nsecs
------  -----  ---------  -------------
  read     25       1024  8775036488
 write     33         22  1859173
#
Referring to structs using pointers is very common in C and D. You can use the operator
-> to access struct members through a pointer. If struct
s has a member m and you have a pointer to this struct named
sp (that is, sp is a variable of type
struct s *), you can either use the * operator to
first dereference the sp pointer to access the member:
struct s *sp;
(*sp).m
or you can use the -> operator as a shorthand for this notation. The
following two D fragments are equivalent if sp is a pointer to a struct:
(*sp).m
sp->m
DTrace provides several built-in variables which are pointers to structs. For example,
the pointer curpsinfo refers to the struct
psinfo and its content provides a snapshot of information
about the state of the process associated with the thread that fired the current probe. Here
are a few example expressions using curpsinfo, including their types and
their meanings:
| Example Expression | Type | Meaning |
|---|---|---|
| curpsinfo->pr_pid | pid_t | Current process ID |
| curpsinfo->pr_fname | char [] | Executable file name |
| curpsinfo->pr_psargs | char [] | Initial command-line arguments |
For more information, see Section 11.5.4, “psinfo_t”.
The next example uses the pr_fname member to identify a process of
interest. In an editor, type in this script and save it in a file named
procfs.d:
syscall::write:entry
/ curpsinfo->pr_fname == "date" /
{
    printf("%s run by UID %d\n", curpsinfo->pr_psargs, curpsinfo->pr_uid);
}
This clause uses the expression curpsinfo->pr_fname to access and
match the command name so that the script selects the correct write()
requests before tracing the arguments. Notice that by using operator ==
with a left-hand argument that is an array of char and a right-hand argument that is a
string, the D compiler infers that the left-hand argument should be promoted to a string and
a string comparison should be performed. Enter the command dtrace -q -s
procfs.d in one shell, and run the date command several times in
another shell. You see output from DTrace similar to the following example:
# dtrace -q -s procfs.d
date run by UID 500
/bin/date run by UID 500
date -R run by UID 500
...
^C
#
Complex data structures are used frequently in C programs, so the ability to describe and reference structs from D also provides a powerful capability for observing the inner workings of the Oracle Linux operating system kernel and its system interfaces.
Unions are another kind of composite type supported by ANSI C and D, and are closely related to structs. A union is a composite type where a set of members of different types are defined and the member objects all occupy the same region of storage. A union is therefore an object of variant type, where only one member is valid at any given time, depending on how the union has been assigned. Typically, some other variable or piece of state is used to indicate which union member is currently valid. The size of a union is the size of its largest member, and the memory alignment used for the union is the maximum alignment required by the union members.
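As a minimal sketch of the size rule described above, the following D script declares a hypothetical union and a hypothetical wrapper struct whose kind member records which union member is currently valid. The names value, tagged, and kind are illustrative assumptions and do not come from DTrace itself. The script only declares the types and prints the size of the union.

```d
/* Hypothetical union: all three members share the same storage. */
union value {
    uint8_t  byte;    /* 1 byte */
    uint32_t word;    /* 4 bytes */
    uint64_t quad;    /* 8 bytes: the largest member */
};

/*
 * Hypothetical wrapper: kind is the separate piece of state that
 * indicates which member of v is currently valid.
 */
struct tagged {
    int kind;
    union value v;
};

BEGIN
{
    /* The union is as large as its largest member, the uint64_t. */
    printf("sizeof (union value) = %d\n", sizeof (union value));
    exit(0);
}
```

If the script is saved in a file and run with dtrace -q -s, it should report a size of 8, because uint64_t is the largest member.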
You can determine the size in bytes of any D type or expression,
including a struct or
union, using the sizeof
operator. The sizeof operator can be applied
either to an expression or to the name of a type surrounded by
parentheses, as illustrated by the following two examples:
sizeof expression
sizeof (type-name)
For example, the expression sizeof (uint64_t) would return the value
8, and the expression sizeof (i["read"].ts) would also return 8 if
inserted into the source code of our example program above. The formal return type of the
sizeof operator is the type alias size_t, which is
defined to be an unsigned integer of the same size as a pointer in the current data model,
and is used to represent byte counts. When the sizeof operator is applied
to an expression, the expression is validated by the D compiler but the resulting object
size is computed at compile time and no code for the expression is generated. You can use
sizeof anywhere an integer constant is required.
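The two forms of sizeof can be sketched in a short D script. This is an illustrative example rather than part of the rwinfo.d program: it repeats the struct callinfo declaration so that it is self-contained, and the sizes noted in the comments assume a 64-bit data model, where size_t is 8 bytes.

```d
struct callinfo {
    uint64_t ts;       /* timestamp of last syscall entry */
    uint64_t elapsed;  /* total elapsed time in nanoseconds */
    uint64_t calls;    /* number of calls made */
    size_t maxbytes;   /* maximum byte count argument */
};

BEGIN
{
    /* sizeof applied to a type name in parentheses */
    printf("uint64_t:        %d\n", sizeof (uint64_t));
    printf("struct callinfo: %d\n", sizeof (struct callinfo));

    /* sizeof applied to an expression; timestamp is a uint64_t */
    printf("timestamp:       %d\n", sizeof (timestamp));
    exit(0);
}
```

Under the 64-bit assumption, the three values should be 8, 32, and 8. No code is generated to evaluate timestamp, because the size is computed at compile time.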
You can use the companion operator offsetof to determine the offset
in bytes of a struct or union member from the start of the storage associated with any
object of the struct or union type. The
offsetof operator is used in an expression of the following form:
offsetof (type-name, member-name)
Here type-name is the name of any struct
or union type or type alias, and member-name
is the identifier naming a member of that struct or union. Similar to
sizeof, offsetof returns a
size_t and you can use it anywhere in a D program that an integer constant can
be used.
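The following sketch prints member offsets for struct callinfo and for a second, hypothetical struct named padded that exists only to show alignment padding. The offsets noted in the comments assume a 64-bit data model and a typical ABI where uint64_t is aligned on an 8-byte boundary.

```d
struct callinfo {
    uint64_t ts;       /* expected offset 0 */
    uint64_t elapsed;  /* expected offset 8 */
    uint64_t calls;    /* expected offset 16 */
    size_t maxbytes;   /* expected offset 24 */
};

/* Hypothetical struct used only to illustrate padding. */
struct padded {
    char flag;         /* 1 byte, followed by padding */
    uint64_t value;    /* expected offset 8, not 1 */
};

BEGIN
{
    printf("ts:       %d\n", offsetof(struct callinfo, ts));
    printf("elapsed:  %d\n", offsetof(struct callinfo, elapsed));
    printf("calls:    %d\n", offsetof(struct callinfo, calls));
    printf("maxbytes: %d\n", offsetof(struct callinfo, maxbytes));
    printf("value:    %d\n", offsetof(struct padded, value));
    exit(0);
}
```

The offset reported for value shows the padding that the compiler inserts after the one-byte flag member so that value is properly aligned.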
D also permits the definition of integer struct and union members of arbitrary numbers of bits, known as bit-fields. A bit-field is declared by specifying a signed or unsigned integer base type, a member name, and a suffix indicating the number of bits to be assigned for the field, as shown in the following example:
struct s
{
    int a : 1;
    int b : 3;
    int c : 12;
};
The bit-field width is an integer constant separated from the member name by a trailing colon. The bit-field width must be positive and must be of a number of bits not larger than the width of the corresponding integer base type. Bit-fields larger than 64 bits may not be declared in D. D bit-fields provide compatibility with and access to the corresponding ANSI C capability. Bit-fields are typically used in situations when memory storage is at a premium or when a struct layout must match a hardware register layout.
A bit-field is a compiler construct that automates the layout of
an integer and a set of masks to extract the member values. The
same result can be achieved by simply defining the masks
yourself and using the & operator. C and
D compilers try to pack bits as efficiently as possible, but
they are free to do so in any order or fashion they desire, so
bit-fields are not guaranteed to produce identical bit layouts
across differing compilers or architectures. If you require
stable bit layout, you should construct the bit masks yourself
and extract the values using the &
operator.
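As a sketch of the manual approach, the following D script extracts three fields from an integer by shifting and masking. The layout is a hypothetical one chosen for illustration (field a in bit 0, b in bits 1 to 3, c in bits 4 to 15), mirroring the widths of struct s above. It is not the layout that any particular compiler would choose for that struct.

```d
BEGIN
{
    /* Hypothetical packed value holding a = 1, b = 5, c = 0x123. */
    this->v = 0x123b;

    this->a = this->v & 0x1;            /* bit 0 */
    this->b = (this->v >> 1) & 0x7;     /* bits 1-3 */
    this->c = (this->v >> 4) & 0xfff;   /* bits 4-15 */

    printf("a=%d b=%d c=%d\n", this->a, this->b, this->c);
    exit(0);
}
```

The script prints a=1 b=5 c=291. Because the shifts and masks are written out explicitly, the extraction does not depend on how a compiler packs bit-fields.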
A bit-field member is accessed by simply specifying its name in combination with the
“.” or -> operators like any other struct or union
member. The bit-field is automatically promoted to the next largest integer type for use in
any expressions. Because bit-field storage may not be aligned on a byte boundary or be a
round number of bytes in size, you may not apply the sizeof or
offsetof operators to a bit-field member. The D compiler also prohibits
you from taking the address of a bit-field member using the &
operator.