Solaris Dynamic Tracing Guide

Chapter 7 Structs and Unions

Collections of related variables can be grouped together into composite data objects called structs and unions. You can define these objects in D by creating new type definitions for them. You can use your new types for any D variables, including associative array values. This chapter explores the syntax and semantics for creating and manipulating these composite types and the D operators that interact with them. The syntax for structs and unions is illustrated using several example programs that demonstrate the use of the DTrace fbt and pid providers.

Structs

The D keyword struct, short for structure, is used to introduce a new type composed of a group of other types. The new struct type can be used as the type for D variables and arrays, enabling you to define groups of related variables under a single name. D structs are the same as the corresponding construct in C and C++. If you have programmed in the Java programming language, think of a D struct as a class, but one with data members only and no methods.

Let's suppose you want to create a more sophisticated system call tracing program in D that records a number of things about each read(2) and write(2) system call executed by your shell, such as the elapsed time, number of calls, and the largest byte count passed as an argument. You could write a D clause to record these properties in three separate associative arrays as shown in the following example:

syscall::read:entry, syscall::write:entry
/pid == 12345/
{
	ts[probefunc] = timestamp;
	calls[probefunc]++;
	maxbytes[probefunc] = arg2 > maxbytes[probefunc] ?
	    arg2 : maxbytes[probefunc];
}

However, this clause is inefficient because DTrace must create three separate associative arrays and store separate copies of the identical tuple values corresponding to probefunc for each one. Instead, you can conserve space and make your program easier to read and maintain by using a struct. First, declare a new struct type at the top of the program source file:

struct callinfo {
	uint64_t ts;      /* timestamp of last syscall entry */
	uint64_t elapsed; /* total elapsed time in nanoseconds */
	uint64_t calls;   /* number of calls made */
	size_t maxbytes;  /* maximum byte count argument */
};

The struct keyword is followed by an optional identifier used to refer back to our new type, which is now known as struct callinfo. The struct members are then enclosed in a set of braces { } and the entire declaration is terminated by a semicolon (;). Each struct member is defined using the same syntax as a D variable declaration, with the type of the member listed first followed by an identifier naming the member and another semicolon (;).

The struct declaration itself simply defines the new type; it does not create any variables or allocate any storage in DTrace. Once declared, you can use struct callinfo as a type throughout the remainder of your D program, and each variable of type struct callinfo will store a copy of the four variables described by our structure template. The members will be arranged in memory in order according to the member list, with padding space introduced between members as required for data object alignment purposes.

You can use the member identifier names to access the individual member values using the “.” operator by writing an expression of the form:

variable-name.member-name

The following example is an improved program using the new structure type. Go to your editor and type in the following D program and save it in a file named rwinfo.d:

Example 7–1 `rwinfo.d`: Gather read(2) and write(2) Statistics

struct callinfo {
	uint64_t ts;      /* timestamp of last syscall entry */
	uint64_t elapsed; /* total elapsed time in nanoseconds */
	uint64_t calls;   /* number of calls made */
	size_t maxbytes;  /* maximum byte count argument */
};

struct callinfo i[string];	/* declare i as an associative array */

syscall::read:entry, syscall::write:entry
/pid == $1/
{
	i[probefunc].ts = timestamp;
	i[probefunc].calls++;
	i[probefunc].maxbytes = arg2 > i[probefunc].maxbytes ?
		arg2 : i[probefunc].maxbytes;
}

syscall::read:return, syscall::write:return
/i[probefunc].ts != 0 && pid == $1/
{
	i[probefunc].elapsed += timestamp - i[probefunc].ts;
}

END
{
	printf("        calls  max bytes  elapsed nsecs\n");
	printf("------  -----  ---------  -------------\n");
	printf("  read  %5d  %9d  %d\n",
	    i["read"].calls, i["read"].maxbytes, i["read"].elapsed);
	printf(" write  %5d  %9d  %d\n",
	    i["write"].calls, i["write"].maxbytes, i["write"].elapsed);
}

After you type in the program, run dtrace -q -s rwinfo.d, specifying the process ID of one of your shell processes as an argument. Then type a few commands in that shell and, when you are done, press Control-C in the dtrace terminal to fire the END probe and print the results:

# dtrace -q -s rwinfo.d `pgrep -n ksh`
^C
        calls  max bytes  elapsed nsecs
------  -----  ---------  -------------
  read     36       1024  3588283144
 write     35         59  14945541
#

Pointers to Structs

Referring to structs using pointers is very common in C and D. You can use the operator -> to access struct members through a pointer. If a struct s has a member m and you have a pointer to this struct named sp (that is, sp is a variable of type struct s *), you can either use the * operator to first dereference the sp pointer in order to access the member:

struct s *sp;

(*sp).m

or you can use the -> operator as a shorthand for this notation. The following two D fragments are equivalent in meaning if sp is a pointer to a struct:

(*sp).m				sp->m

DTrace provides several built-in variables which are pointers to structs, including curpsinfo and curlwpsinfo. These pointers refer to the structs psinfo and lwpsinfo respectively, and their content provides a snapshot of information about the state of the current process and lightweight process (LWP) associated with the thread that has fired the current probe. A Solaris LWP is the kernel's representation of a user thread, upon which the Solaris threads and POSIX threads interfaces are built. For convenience, DTrace exports this information in the same form as the /proc filesystem files /proc/pid/psinfo and /proc/pid/lwp/lwpid/lwpsinfo. The /proc structures are used by observability and debugging tools such as ps(1), pgrep(1), and truss(1). They are defined in the system header file <sys/procfs.h> and described in the proc(4) man page. Here are a few example expressions using curpsinfo, their types, and their meanings:

`curpsinfo->pr_pid`	`pid_t`	current process ID
`curpsinfo->pr_fname`	`char []`	executable file name
`curpsinfo->pr_psargs`	`char []`	initial command line arguments

You should review the complete structure definition later by examining the <sys/procfs.h> header file and the corresponding descriptions in proc(4). The next example uses the pr_psargs member to identify a process of interest by matching command-line arguments.

Structs are used frequently to create complex data structures in C programs, so the ability to describe and reference structs from D also provides a powerful capability for observing the inner workings of the Solaris operating system kernel and its system interfaces. In addition to using the aforementioned curpsinfo struct, the next example examines some kernel structs as well by observing the relationship between the ksyms(7D) driver and read(2) requests. The driver makes use of two common structs, known as uio(9S) and iovec(9S), to respond to requests to read from the character device file /dev/ksyms.

The uio struct, accessed using the name struct uio or type alias uio_t, is described in the uio(9S) man page and is used to describe an I/O request that involves copying data between the kernel and a user process. The uio in turn contains an array of one or more iovec(9S) structures which each describe a piece of the requested I/O, in the event that multiple chunks are requested using the readv(2) or writev(2) system calls. One of the kernel device driver interface (DDI) routines that operates on struct uio is the function uiomove(9F), which is one of a family of functions kernel drivers use to respond to user process read(2) requests and copy data back to user processes.

The ksyms driver manages a character device file named /dev/ksyms, which appears to be an ELF file containing information about the kernel's symbol table, but is in fact an illusion created by the driver using the set of modules that are currently loaded into the kernel. The driver uses the uiomove(9F) routine to respond to read(2) requests. The next example illustrates that the arguments and calls to read(2) from /dev/ksyms match the calls by the driver to uiomove(9F) to copy the results back into the user address space at the location specified to read(2).

We can use the strings(1) utility with the -a option to force a bunch of reads from /dev/ksyms. Try running strings -a /dev/ksyms in your shell and see what output it produces. In an editor, type in the first clause of the example script and save it in a file named ksyms.d:

syscall::read:entry
/curpsinfo->pr_psargs == "strings -a /dev/ksyms"/
{
	printf("read %u bytes to user address %x\n", arg2, arg1);
}

This first clause uses the expression curpsinfo->pr_psargs to access and match the command-line arguments of our strings(1) command so that the script selects the correct read(2) requests before tracing the arguments. Notice that by using operator == with a left-hand argument that is an array of char and a right-hand argument that is a string, the D compiler infers that the left-hand argument should be promoted to a string and a string comparison should be performed. Type in and execute the command dtrace -q -s ksyms.d in one shell, and then type in the command strings -a /dev/ksyms in another shell. As strings(1) executes, you will see output from DTrace similar to the following example:

# dtrace -q -s ksyms.d
read 8192 bytes to user address 80639fc
read 8192 bytes to user address 80639fc
read 8192 bytes to user address 80639fc
read 8192 bytes to user address 80639fc
...
^C
#

This example can be extended using a common D programming technique to follow a thread from this initial read(2) request deeper into the kernel. Upon entry to the kernel in syscall::read:entry, the next script sets a thread-local flag variable indicating this thread is of interest, and clears this flag on syscall::read:return. Once the flag is set, it can be used as a predicate on other probes to instrument kernel functions such as uiomove(9F). The DTrace function boundary tracing (fbt) provider publishes probes for entry and return to functions defined within the kernel, including those in the DDI. Type in the following source code which uses the fbt provider to instrument uiomove(9F) and again save it in the file ksyms.d:

Example 7–2 `ksyms.d`: Trace read(2) and uiomove(9F) Relationship

/*
 * When our strings(1) invocation starts a read(2), set a watched flag on
 * the current thread.  When the read(2) finishes, clear the watched flag.
 */
syscall::read:entry
/curpsinfo->pr_psargs == "strings -a /dev/ksyms"/
{
	printf("read %u bytes to user address %x\n", arg2, arg1);
	self->watched = 1;
}

syscall::read:return
/self->watched/
{
	self->watched = 0;
}

/*
 * Instrument uiomove(9F).  The prototype for this function is as follows:
 * int uiomove(caddr_t addr, size_t nbytes, enum uio_rw rwflag, uio_t *uio);
 */
fbt::uiomove:entry
/self->watched/
{
	this->iov = args[3]->uio_iov;

	printf("uiomove %u bytes to %p in pid %d\n",
	    this->iov->iov_len, this->iov->iov_base, pid);
}

The final clause of the example uses the thread-local variable self->watched to identify when a kernel thread of interest enters the DDI routine uiomove(9F). Once there, the script uses the built-in args array to access the fourth argument (args[3]) to uiomove(), which is a pointer to the struct uio representing the request. The D compiler automatically associates each member of the args array with the type corresponding to the C function prototype for the instrumented kernel routine. The uio_iov member contains a pointer to the struct iovec for the request. A copy of this pointer is saved for use in our clause in the clause-local variable this->iov. In the final statement, the script dereferences this->iov to access the iovec members iov_len and iov_base, which represent the length in bytes and destination base address for uiomove(9F), respectively. These values should match the input parameters to the read(2) system call issued on the driver. Go to your shell and run dtrace -q -s ksyms.d and then again enter the command strings -a /dev/ksyms in another shell. You should see output similar to the following example:

# dtrace -q -s ksyms.d
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
...
^C
#

The addresses and process IDs will be different in your output, but you should observe that the input arguments to read(2) match the parameters passed to uiomove(9F) by the ksyms driver.

Unions

Unions are another kind of composite type supported by ANSI-C and D, and are closely related to structs. A union is a composite type where a set of members of different types are defined and the member objects all occupy the same region of storage. A union is therefore an object of variant type, where only one member is valid at any given time, depending on how the union has been assigned. Typically, some other variable or piece of state is used to indicate which union member is currently valid. The size of a union is the size of its largest member, and the memory alignment used for the union is the maximum alignment required by the union members.

The Solaris kstat framework defines a struct containing a union that is used in the following example to illustrate and observe C and D unions. The kstat framework is used to export a set of named counters representing kernel statistics such as memory usage and I/O throughput. The framework is used to implement utilities such as mpstat(1M) and iostat(1M). This framework uses struct kstat_named to represent a named counter and its value. The struct is defined as follows:

struct kstat_named {
	char name[KSTAT_STRLEN]; /* name of counter */
	uchar_t data_type;	/* data type */
	union {
		char c[16];
		int32_t i32;
		uint32_t ui32;
		long l;
		ulong_t ul;
		...
	} value;	/* value of counter */
};

The example declaration is shortened for illustrative purposes. The complete structure definition can be found in the <sys/kstat.h> header file and is described in kstat_named(9S). The declaration above is valid in both ANSI-C and D, and defines a struct containing as one of its members a union value with members of various types, depending on the type of the counter. Notice that since the union itself is declared inside of another type, struct kstat_named, a formal name for the union type is omitted. This declaration style is known as an anonymous union. The member named value is of a union type described by the preceding declaration, but this union type itself has no name because it does not need to be used anywhere else. The struct member data_type is assigned a value that indicates which union member is valid for each object of type struct kstat_named. A set of C preprocessor tokens is defined for the values of data_type. For example, the token KSTAT_DATA_CHAR is equal to zero and indicates that the member value.c is where the value is currently stored.

Example 7–3 demonstrates accessing the kstat_named.value union by tracing a user process. The kstat counters can be sampled from a user process using the kstat_data_lookup(3KSTAT) function, which returns a pointer to a struct kstat_named. The mpstat(1M) utility calls this function repeatedly as it executes in order to sample the latest counter values. Go to your shell and try running mpstat 1 and observe the output. Press Control-C in your shell to abort mpstat after a few seconds. To observe counter sampling, we would like to enable a probe that fires each time the mpstat command calls the kstat_data_lookup(3KSTAT) function in libkstat. To do so, we're going to make use of a new DTrace provider: pid. The pid provider permits you to dynamically create probes in user processes at C symbol locations such as function entry points. You can ask the pid provider to create probes at user function entry and return sites by writing probe descriptions of the form:

pidprocess-ID:object-name:function-name:entry
pidprocess-ID:object-name:function-name:return

For example, if you wanted to create a probe in process ID 12345 that fires on entry to kstat_data_lookup(3KSTAT), you would write the following probe description:

pid12345:libkstat:kstat_data_lookup:entry

The pid provider inserts dynamic instrumentation into the specified user process at the program location corresponding to the probe description. The probe implementation forces each user thread that reaches the instrumented program location to trap into the operating system kernel and enter DTrace, firing the corresponding probe. So although the instrumentation location is associated with a user process, the DTrace predicates and actions you specify still execute in the context of the operating system kernel. The pid provider is described in further detail in Chapter 30, pid Provider.

Instead of having to edit your D program source each time you wish to apply your program to a different process, you can insert identifiers called macro variables into your program that are evaluated at the time your program is compiled and replaced with the additional dtrace command-line arguments. Macro variables are specified using a dollar sign $ followed by an identifier or digit. If you execute the command dtrace -s script foo bar baz, the D compiler will automatically define the macro variables $1, $2, and $3 to be the tokens foo, bar, and baz respectively. You can use macro variables in D program expressions or in probe descriptions. For example, the following probe descriptions instrument whatever process ID is specified as an additional argument to dtrace:

pid$1:libkstat:kstat_data_lookup:entry
{
	self->ksname = arg1;
}

pid$1:libkstat:kstat_data_lookup:return
/self->ksname != NULL && arg1 != NULL/
{
	this->ksp = (kstat_named_t *)copyin(arg1, sizeof (kstat_named_t));
	printf("%s has ui64 value %u\n", copyinstr(self->ksname),
	    this->ksp->value.ui64);
}

pid$1:libkstat:kstat_data_lookup:return
/self->ksname != NULL && arg1 == NULL/
{
	self->ksname = NULL;
}

Macro variables and reusable scripts are described in further detail in Chapter 15, Scripting. Now that we know how to instrument user processes using their process ID, let's return to sampling unions. Go to your editor and type in the source code for our complete example and save it in a file named kstat.d:

Example 7–3 `kstat.d`: Trace Calls to kstat_data_lookup(3KSTAT)

pid$1:libkstat:kstat_data_lookup:entry
{
	self->ksname = arg1;
}

pid$1:libkstat:kstat_data_lookup:return
/self->ksname != NULL && arg1 != NULL/
{
	this->ksp = (kstat_named_t *) copyin(arg1, sizeof (kstat_named_t));
	printf("%s has ui64 value %u\n",
	    copyinstr(self->ksname), this->ksp->value.ui64);
}

pid$1:libkstat:kstat_data_lookup:return
/self->ksname != NULL && arg1 == NULL/
{
	self->ksname = NULL;
}

Now go to one of your shells and execute the command mpstat 1 to start mpstat(1M) running in a mode where it samples statistics and reports them once per second. Once mpstat is running, execute the command dtrace -q -s kstat.d `pgrep mpstat` in your other shell. You will see output corresponding to the statistics that are being accessed. Press Control-C to abort dtrace and return to the shell prompt.

# dtrace -q -s kstat.d `pgrep mpstat`
cpu_ticks_idle has ui64 value 41154176
cpu_ticks_user has ui64 value 1137
cpu_ticks_kernel has ui64 value 12310
cpu_ticks_wait has ui64 value 903
hat_fault has ui64 value 0
as_fault has ui64 value 48053
maj_fault has ui64 value 1144
xcalls has ui64 value 123832170
intr has ui64 value 165264090
intrthread has ui64 value 124094974
pswitch has ui64 value 840625
inv_swtch has ui64 value 1484
cpumigrate has ui64 value 36284
mutex_adenters has ui64 value 35574
rw_rdfails has ui64 value 2
rw_wrfails has ui64 value 2
...
^C
#

If you capture the output in each terminal window and subtract each value from the value reported by the previous iteration through the statistics, you should be able to correlate the dtrace output with the mpstat output. The example program records the counter name pointer on entry to the lookup function, and then performs most of the tracing work on return from kstat_data_lookup(3KSTAT). The D built-in functions copyinstr() and copyin() copy the function results from the user process back into DTrace when arg1 (the return value) is not NULL. Once the kstat data has been copied, the example reports the ui64 counter value from the union. This simplified example assumes that mpstat samples counters that use the value.ui64 member. As an exercise, try recoding kstat.d to use multiple predicates and print out the union member corresponding to the data_type member. You can also try to create a version of kstat.d that computes the difference between successive data values and actually produces output similar to mpstat.

Member Sizes and Offsets

You can determine the size in bytes of any D type or expression, including a struct or union, using the sizeof operator. The sizeof operator can be applied either to an expression or to the name of a type surrounded by parentheses, as illustrated by the following two examples:

sizeof expression				sizeof (type-name)

For example, the expression sizeof (uint64_t) would return the value 8, and the expression sizeof (i["read"].ts) would also return 8 if inserted into the source code of our rwinfo.d example program above, because the ts member is declared as a uint64_t. The formal return type of the sizeof operator is the type alias size_t, which is defined to be an unsigned integer of the same size as a pointer in the current data model, and is used to represent byte counts. When the sizeof operator is applied to an expression, the expression is validated by the D compiler but the resulting object size is computed at compile time and no code for the expression is generated. You can use sizeof anywhere an integer constant is required.

You can use the companion operator offsetof to determine the offset in bytes of a struct or union member from the start of the storage associated with any object of the struct or union type. The offsetof operator is used in an expression of the following form:

offsetof (type-name, member-name)

Here type-name is the name of any struct or union type or type alias, and member-name is the identifier naming a member of that struct or union. Similar to sizeof, offsetof returns a size_t and can be used anywhere in a D program that an integer constant can be used.

Bit-Fields

D also permits the definition of integer struct and union members of arbitrary numbers of bits, known as bit-fields. A bit-field is declared by specifying a signed or unsigned integer base type, a member name, and a suffix indicating the number of bits to be assigned for the field, as shown in the following example:

struct s {
	int a : 1;
	int b : 3;
	int c : 12;
};

The bit-field width is an integer constant separated from the member name by a trailing colon. The bit-field width must be positive and must be of a number of bits not larger than the width of the corresponding integer base type. Bit-fields larger than 64 bits may not be declared in D. D bit-fields provide compatibility with and access to the corresponding ANSI-C capability. Bit-fields are typically used in situations when memory storage is at a premium or when a struct layout must match a hardware register layout.

A bit-field is a compiler construct that automates the layout of an integer and a set of masks to extract the member values. The same result can be achieved by simply defining the masks yourself and using the & operator. C and D compilers try to pack bits as efficiently as possible, but they are free to do so in any order or fashion they desire, so bit-fields are not guaranteed to produce identical bit layouts across differing compilers or architectures. If you require stable bit layout, you should construct the bit masks yourself and extract the values using the & operator.

A bit-field member is accessed by simply specifying its name in combination with the “.” or -> operators like any other struct or union member. The bit-field is automatically promoted to the next largest integer type for use in any expressions. Because bit-field storage may not be aligned on a byte boundary or be a round number of bytes in size, you may not apply the sizeof or offsetof operators to a bit-field member. The D compiler also prohibits you from taking the address of a bit-field member using the & operator.