2.12 Structs and Unions

2.12.1 Structs
2.12.2 Pointers to Structs
2.12.3 Unions
2.12.4 Member Sizes and Offsets
2.12.5 Bit-Fields

Collections of related variables can be grouped together into composite data objects called structs and unions. You can define these objects in D by creating new type definitions for them. You can use your new types for any D variables, including associative array values. This chapter explores the syntax and semantics for creating and manipulating these composite types and the D operators that interact with them.

2.12.1 Structs

The D keyword struct, short for structure, is used to introduce a new type composed of a group of other types. The new struct type can be used as the type for D variables and arrays, enabling you to define groups of related variables under a single name. D structs are the same as the corresponding construct in C and C++. If you have programmed in the Java programming language, think of a D struct as a class that contains only data members and no methods.

Suppose you want to create a more sophisticated system call tracing program in D that records a number of things about each read() and write() system call executed by your shell, such as the elapsed time, number of calls, and the largest byte count passed as an argument. You could write a D clause to record these properties in three separate associative arrays as shown in the following example:

int maxbytes; /* declare maxbytes */ 
syscall::read:entry, syscall::write:entry
/pid == 12345/
{
  ts[probefunc] = timestamp;
  calls[probefunc]++;
  maxbytes[probefunc] = arg2 > maxbytes[probefunc] ?
        arg2 : maxbytes[probefunc];
}

However, this clause is inefficient because DTrace must create three separate associative arrays and store separate copies of the identical tuple values corresponding to probefunc for each one. Instead, you can conserve space and make your program easier to read and maintain by using a struct. First, declare a new struct type at the top of the D program source file:

struct callinfo {
  uint64_t ts;       /* timestamp of last syscall entry */
  uint64_t elapsed;  /* total elapsed time in nanoseconds */
  uint64_t calls;    /* number of calls made */
  size_t maxbytes;   /* maximum byte count argument */
};

The struct keyword is followed by an optional identifier used to refer back to our new type, which is now known as struct callinfo. The struct members are then enclosed in a set of braces {} and the entire declaration is terminated by a semicolon (;). Each struct member is defined using the same syntax as a D variable declaration, with the type of the member listed first followed by an identifier naming the member and another semicolon (;).

The struct declaration itself simply defines the new type; it does not create any variables or allocate any storage in DTrace. Once declared, you can use struct callinfo as a type throughout the remainder of your D program, and each variable of type struct callinfo will store a copy of the four variables described by our structure template. The members will be arranged in memory in order according to the member list, with padding space introduced between members as required for data object alignment purposes.

You can use the member identifier names to access the individual member values using the “.” operator by writing an expression of the form:

variable-name.member-name

The following example is an improved program using the new structure type. Go to your editor, type in the following D program and save it in a file named rwinfo.d:

struct callinfo {
  uint64_t ts; /* timestamp of last syscall entry */
  uint64_t elapsed; /* total elapsed time in nanoseconds */
  uint64_t calls; /* number of calls made */
  size_t maxbytes; /* maximum byte count argument */
};

struct callinfo i[string]; /* declare i as an associative array */

syscall::read:entry, syscall::write:entry
/pid == $1/
{
  i[probefunc].ts = timestamp;
  i[probefunc].calls++;
  i[probefunc].maxbytes = arg2 > i[probefunc].maxbytes ?
        arg2 : i[probefunc].maxbytes;
}

syscall::read:return, syscall::write:return
/i[probefunc].ts != 0 && pid == $1/
{
  i[probefunc].elapsed += timestamp - i[probefunc].ts;
}

END
{
  printf("       calls max bytes elapsed nsecs\n");
  printf("------ ----- --------- -------------\n");
  printf("  read %5d %9d %d\n",
  i["read"].calls, i["read"].maxbytes, i["read"].elapsed);
  printf(" write %5d %9d %d\n",
  i["write"].calls, i["write"].maxbytes, i["write"].elapsed);
}

After you have typed in the program, run dtrace -q -s rwinfo.d, specifying one of your shell processes. Then type in a few commands in your shell and, when you have finished entering your shell commands, type Ctrl-C to fire the END probe and print the results, for example:

# dtrace -q -s rwinfo.d `pgrep -n bash`
^C
       calls max bytes elapsed nsecs
------ ----- --------- -------------
  read    25      1024 8775036488
 write    33        22 1859173
#

2.12.2 Pointers to Structs

Referring to structs using pointers is very common in C and D. You can use the operator -> to access struct members through a pointer. If struct s has a member m and you have a pointer to this struct named sp (that is, sp is a variable of type struct s *), you can either use the * operator to first dereference sp pointer to access the member:

struct s *sp;
(*sp).m

or you can use the -> operator as a shorthand for this notation. The following two D fragments are equivalent if sp is a pointer to a struct:

(*sp).m 
sp->m

DTrace provides several built-in variables which are pointers to structs. For example, the pointer curpsinfo refers to the struct psinfo and its content provides a snapshot of information about the state of the process associated with the thread that fired the current probe. Here are few example expressions using curpsinfo, including their types and their meanings:

Example Expression

Type

Meaning

curpsinfo->pr_pid

pid_t

Current process ID

curpsinfo->pr_fname

char []

Executable file name

curpsinfo->pr_psargs

char []

Initial command-line arguments

For more information, see Section 11.5.4, “psinfo_t”.

The next example uses the pr_fname member to identify a process of interest. In an editor, type in this script and save it in a file named procfs.d:

syscall::write:entry
/ curpsinfo->pr_fname == "date" /
{
  printf("%s run by UID %d\n", curpsinfo->pr_psargs, curpsinfo->pr_uid);
}

This clause uses the expression curpsinfo->pr_fname to access and match the command name so that the script selects the correct write() requests before tracing the arguments. Notice that by using operator == with a left-hand argument that is an array of char and a right-hand argument that is a string, the D compiler infers that the left-hand argument should be promoted to a string and a string comparison should be performed. Enter the command dtrace -q -s procs.d in one shell, and the date command several times in another shell. You see output from DTrace similar to the following example:

# dtrace -q -s procfs.d 
date  run by UID 500
/bin/date  run by UID 500
date -R  run by UID 500
...
^C
#

Complex data structures are used frequently in C programs, so the ability to describe and reference structs from D also provides a powerful capability for observing the inner workings of the Oracle Linux operating system kernel and its system interfaces.

2.12.3 Unions

Unions are another kind of composite type supported by ANSI C and D, and are closely related to structs. A union is a composite type where a set of members of different types are defined and the member objects all occupy the same region of storage. A union is therefore an object of variant type, where only one member is valid at any given time, depending on how the union has been assigned. Typically, some other variable or piece of state is used to indicate which union member is currently valid. The size of a union is the size of its largest member, and the memory alignment used for the union is the maximum alignment required by the union members.

2.12.4 Member Sizes and Offsets

You can determine the size in bytes of any D type or expression, including a struct or union, using the sizeof operator. The sizeof operator can be applied either to an expression or to the name of a type surrounded by parentheses, as illustrated by the following two examples:

sizeof expression 
sizeof (type-name)

For example, the expression sizeof (uint64_t) would return the value 8, and the expression sizeof (callinfo.ts) would also return 8 if inserted into the source code of our example program above. The formal return type of the sizeof operator is the type alias size_t, which is defined to be an unsigned integer of the same size as a pointer in the current data model, and is used to represent byte counts. When the sizeof operator is applied to an expression, the expression is validated by the D compiler but the resulting object size is computed at compile time and no code for the expression is generated. You can use sizeof anywhere an integer constant is required.

You can use the companion operator offsetof to determine the offset in bytes of a struct or union member from the start of the storage associated with any object of the struct or union type. The offsetof operator is used in an expression of the following form:

offsetof (type-name, member-name)

Here type-name is the name of any struct or union type or type alias, and member-name is the identifier naming a member of that struct or union. Similar to sizeof, offsetof returns a size_t and you can use it anywhere in a D program that an integer constant can be used.

2.12.5 Bit-Fields

D also permits the definition of integer struct and union members of arbitrary numbers of bits, known as bit-fields. A bit-field is declared by specifying a signed or unsigned integer base type, a member name, and a suffix indicating the number of bits to be assigned for the field, as shown in the following example:

struct s 
{
  int a : 1;
  int b : 3;
  int c : 12;
};

The bit-field width is an integer constant separated from the member name by a trailing colon. The bit-field width must be positive and must be of a number of bits not larger than the width of the corresponding integer base type. Bit-fields larger than 64 bits may not be declared in D. D bit-fields provide compatibility with and access to the corresponding ANSI C capability. Bit-fields are typically used in situations when memory storage is at a premium or when a struct layout must match a hardware register layout.

A bit-field is a compiler construct that automates the layout of an integer and a set of masks to extract the member values. The same result can be achieved by simply defining the masks yourself and using the & operator. C and D compilers try to pack bits as efficiently as possible, but they are free to do so in any order or fashion they desire, so bit-fields are not guaranteed to produce identical bit layouts across differing compilers or architectures. If you require stable bit layout, you should construct the bit masks yourself and extract the values using the & operator.

A bit-field member is accessed by simply specifying its name in combination with the “.” or -> operators like any other struct or union member. The bit-field is automatically promoted to the next largest integer type for use in any expressions. Because bit-field storage may not be aligned on a byte boundary or be a round number of bytes in size, you may not apply the sizeof or offsetof operators to a bit-field member. The D compiler also prohibits you from taking the address of a bit-field member using the & operator.