Oracle® Solaris 11.4 DTrace (Dynamic Tracing) Guide

Updated: September 2020
 
 

Structs and Unions in DTrace

Collections of related variables can be grouped together into composite data objects called structs and unions. You can define these objects in D by creating new type definitions for them. You can use your new types for any D variables, including associative array values. This section explores the syntax and semantics for creating and manipulating these composite types and the D operators that interact with them. The syntax for structs and unions is illustrated using several example programs that demonstrate the use of the DTrace function boundary tracing (fbt) and pid providers.

Structs in DTrace

The D keyword struct, short for structure, is used to introduce a new type composed of a group of other types. The struct type can be used as the type for D variables and arrays, enabling you to define groups of related variables under a single name. D structs are the same as the corresponding construct in C and C++. If you have programmed in the Java programming language, think of a D struct as a class, but one with data members only and no methods.

Suppose you want to create a more sophisticated system call tracing program in D that records a number of things about each read and write system call executed by your shell, such as the elapsed time, number of calls, and the largest byte count passed as an argument. You could write a D clause to record these properties in three separate associative arrays as shown in the following example:

int maxbytes[string];    /* declare maxbytes as an associative array */

syscall::read:entry, syscall::write:entry
/pid == 12345/
{
        ts[probefunc] = timestamp;
        calls[probefunc]++;
        maxbytes[probefunc] = arg2 > maxbytes[probefunc] ?
            arg2 : maxbytes[probefunc];
}

However, this clause is inefficient because DTrace must create three separate associative arrays and store separate copies of the identical tuple values corresponding to probefunc for each one. Use a struct type to conserve space and make the program easier to read and maintain. For example, declare a struct type at the top of the program source file as follows:

struct callinfo {
        uint64_t ts;      /* timestamp of last syscall entry */
        uint64_t elapsed; /* total elapsed time in nanoseconds */
        uint64_t calls;   /* number of calls made */
        size_t maxbytes;  /* maximum byte count argument */
};

The struct keyword is followed by an optional identifier used to refer back to the new type, which is now known as struct callinfo. The struct members are then enclosed in a set of braces {} and the entire declaration is terminated by a semicolon (;). Each struct member is defined using the same syntax as a D variable declaration, with the type of the member listed first followed by an identifier naming the member and another semicolon (;).

The struct declaration itself simply defines the new type; it does not create any variables or allocate any storage in DTrace. Once declared, you can use struct callinfo as a type throughout the remainder of your D program, and each variable of type struct callinfo will store a copy of the four variables described by the structure template. The members will be arranged in memory in order according to the member list, with padding space introduced between members as required for data object alignment purposes.

You can use the member identifier names to access the individual member values using the "." operator by writing an expression of the form:

variable-name.member-name 
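Because D struct rules match those of C, the member-access form can be sketched in plain C. The following is a minimal illustration rather than part of any DTrace script; the function record_call and its parameters are hypothetical names introduced here. It applies the same three updates as the earlier clause, but to the members of a single struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members as the struct callinfo declared in the text. */
struct callinfo {
        uint64_t ts;      /* timestamp of last syscall entry */
        uint64_t elapsed; /* total elapsed time in nanoseconds */
        uint64_t calls;   /* number of calls made */
        size_t maxbytes;  /* maximum byte count argument */
};

/*
 * Hypothetical helper: record one call by updating individual members
 * with the "." operator, then return the updated copy.
 */
struct callinfo
record_call(struct callinfo info, uint64_t now, size_t nbytes)
{
        info.ts = now;
        info.calls++;
        info.maxbytes = nbytes > info.maxbytes ? nbytes : info.maxbytes;
        return (info);
}
```

Starting from a zeroed struct, calling record_call three times with byte counts 59, 1024, and 8 leaves calls at 3 and maxbytes at 1024.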

The following example is an improved program using the new structure type.

Example 9  Gathering read and write System Call Statistics

In a text editor type the following program and save as rwinfo.d.

struct callinfo {
        uint64_t ts;      /* timestamp of last syscall entry */
        uint64_t elapsed; /* total elapsed time in nanoseconds */
        uint64_t calls;   /* number of calls made */
        size_t maxbytes;  /* maximum byte count argument */
};

struct callinfo i[string];      /* declare i as an associative array */

syscall::read:entry, syscall::write:entry
/pid == $1/
{
        i[probefunc].ts = timestamp;
        i[probefunc].calls++;
        i[probefunc].maxbytes = arg2 > i[probefunc].maxbytes ?
                arg2 : i[probefunc].maxbytes;
}

syscall::read:return, syscall::write:return
/i[probefunc].ts != 0 && pid == $1/
{
        i[probefunc].elapsed += timestamp - i[probefunc].ts;
}

END
{
        printf("        calls  max bytes  elapsed nsecs\n");
        printf("------  -----  ---------  -------------\n");
        printf("  read  %5d  %9d  %d\n",
            i["read"].calls, i["read"].maxbytes, i["read"].elapsed);
        printf(" write  %5d  %9d  %d\n",
            i["write"].calls, i["write"].maxbytes, i["write"].elapsed);
}

After you type in the program, run dtrace -q -s rwinfo.d, passing the process ID of one of your shell processes as the argument that replaces $1. Then enter a few commands in that shell. When you are done, press Control-C in the dtrace terminal to fire the END probe and print the results:

# dtrace -q -s rwinfo.d `pgrep -n bash`
^C
        calls  max bytes  elapsed nsecs
------  -----  ---------  -------------
  read     36       1024  3588283144
 write     35         59  14945541

Pointers to Structs

Referring to structs using pointers is very common in C and D. You can use the operator -> to access struct members through a pointer. If a struct s has a member m and you have a pointer to this struct named sp (that is, sp is a variable of type struct s *), you can either use the * operator to first dereference the sp pointer in order to access the member:

struct s *sp;
(*sp).m

or you can use the -> operator as a shorthand for this notation. The following two D fragments are equivalent in meaning if sp is a pointer to a struct:

(*sp).m                         sp->m
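The same equivalence holds in C, so it can be checked with a short C sketch. This is an illustration only; struct s with a single int32_t member and the two function names are hypothetical choices made for the example.

```c
#include <stdint.h>

/* Hypothetical struct with one member, m. */
struct s {
        int32_t m;
};

/* Read member m by dereferencing the pointer first, then applying ".". */
int32_t
read_with_dot(struct s *sp)
{
        return ((*sp).m);
}

/* Read member m with the "->" shorthand. */
int32_t
read_with_arrow(struct s *sp)
{
        return (sp->m);
}
```

For any valid pointer sp, both functions return the same value because both expressions name the same member of the same object.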

DTrace provides several built-in variables that are pointers to structs, including curpsinfo and curlwpsinfo. These pointers refer to the structs psinfo and lwpsinfo respectively, and their content provides a snapshot of information about the state of the current process and lightweight process (LWP) associated with the thread that has fired the current probe. An Oracle Solaris LWP is the kernel's representation of a user thread, upon which the Oracle Solaris threads and POSIX threads interfaces are built. For convenience, DTrace exports this information in the same form as the /proc filesystem files /proc/pid/psinfo and /proc/pid/lwp/lwpid/lwpsinfo. The /proc structures are defined in the system header file sys/procfs.h and are used by observability and debugging tools such as ps, pgrep, and truss. For more information, see the ps(1), pgrep(1), truss(1), and proc(5) man pages. The following table lists example expressions using curpsinfo, their types, and their meanings:

Table 14  curpsinfo in Expressions

Expression            Type     Description
curpsinfo->pr_pid     pid_t    Current process ID
curpsinfo->pr_fname   char []  Executable file name
curpsinfo->pr_psargs  char []  Initial command line arguments

You should review the complete structure definition later by examining the sys/procfs.h header file. For more information, see proc(5). The next example uses the pr_psargs member to identify a process of interest by matching command-line arguments.

Structs are used frequently to create complex data structures in C programs, so the ability to describe and reference structs from D also provides a powerful capability for observing the inner workings of the Oracle Solaris operating system kernel and its system interfaces. In addition to using the curpsinfo struct, the next example examines some kernel structs by observing the relationship between the ksyms driver and read() requests. For more information, see the ksyms(4D) man page. The driver makes use of two common structs, known as uio and iovec, to respond to requests to read from the character device file /dev/ksyms. For more information, see the uio(9S) and iovec(9S) man pages.

The uio struct, accessed using the name struct uio or type alias uio_t, is described in the uio(9S) man page and is used to describe an I/O request that involves copying data between the kernel and a user process. The uio in turn contains an array of one or more iovec structures, each of which describes a piece of the requested I/O, in the event that multiple chunks are requested using the readv or writev system calls. For more information, see the readv(2) and writev(2) man pages. One of the kernel device driver interface (DDI) routines that operates on struct uio is the function uiomove(), which is one of a family of functions kernel drivers use to respond to user process read requests and copy data back to user processes.

The ksyms driver manages a character device file named /dev/ksyms, which appears to be an ELF file containing information about the kernel's symbol table, but is in fact an illusion created by the driver using the set of modules that are currently loaded into the kernel. The driver uses the uiomove routine to respond to read requests. The next example illustrates that the arguments and calls to read from /dev/ksyms match the calls by the driver to uiomove to copy the results back into the user address space at the location specified to read. For more information, see the uiomove(9F) man page.

Use the strings utility with the -a option to force a series of reads from /dev/ksyms. Try running strings -a /dev/ksyms in your shell and see what output it produces. For more information, see the strings(1) man page.

In an editor, type in the first clause of the example script and save it in a file named ksyms.d:

syscall::read:entry
/curpsinfo->pr_psargs == "strings -a /dev/ksyms"/
{
        printf("read %u bytes to user address %x\n", arg2, arg1);
}

This first clause uses the expression curpsinfo->pr_psargs to access and match the command-line arguments of the strings command so that the script selects the correct read requests before tracing the arguments. Notice that by using operator == with a left argument that is an array of char and a right argument that is a string, the D compiler infers that the left argument should be promoted to a string and a string comparison should be performed. Type in and execute the command dtrace -q -s ksyms.d in one shell, and then type in the command strings -a /dev/ksyms in another shell. As strings executes, you will see output from DTrace similar to the following example:

# dtrace -q -s ksyms.d
read 8192 bytes to user address 80639fc
read 8192 bytes to user address 80639fc
read 8192 bytes to user address 80639fc
read 8192 bytes to user address 80639fc
...
^C
#

This example can be extended using a common D programming technique to follow a thread from this initial read request deeper into the kernel. Upon entry to the kernel in syscall::read:entry, the next script sets a thread-local flag variable indicating this thread is of interest, and clears this flag on syscall::read:return. Once the flag is set, it can be used as a predicate on other probes to instrument kernel functions such as uiomove(). The DTrace function boundary tracing (fbt) provider publishes probes for entry and return to functions defined within the kernel, including those in the DDI. Type in the following source code which uses the fbt provider to instrument uiomove() and again save it in the file ksyms.d:

Example 10  Tracing the read and uiomove() Relationship
/*
 * When strings(1) invocation starts a read(2), set a watched flag on
 * the current thread.  When the read(2) finishes, clear the watched flag.
 */
syscall::read:entry
/curpsinfo->pr_psargs == "strings -a /dev/ksyms"/
{
        printf("read %u bytes to user address %x\n", arg2, arg1);
        self->watched = 1;
}

syscall::read:return
/self->watched/
{
        self->watched = 0;
}

/*
 * Instrument uiomove(9F).  The prototype for this function is as follows:
 * int uiomove(caddr_t addr, size_t nbytes, enum uio_rw rwflag, uio_t *uio);
 */
fbt::uiomove:entry
/self->watched/
{
        this->iov = args[3]->uio_iov;
        printf("uiomove %u bytes to %p in pid %d\n",
            this->iov->iov_len, this->iov->iov_base, pid);
}

The final clause of the example uses the thread-local variable self->watched to identify when a kernel thread of interest enters the DDI routine uiomove. Once there, the script uses the built-in args array to access the fourth argument (args[3]) to uiomove, which is a pointer to the struct uio representing the request. The D compiler automatically associates each member of the args array with the type corresponding to the C function prototype for the instrumented kernel routine. The uio_iov member contains a pointer to the struct iovec for the request. A copy of this pointer is saved in the clause-local variable this->iov for use within the clause. In the final statement, the script dereferences this->iov to access the iovec members iov_len and iov_base, which represent the length in bytes and destination base address for uiomove, respectively. These values should match the input parameters to the read system call issued on the driver. For more information, see the read(2) and uiomove(9F) man pages.

Go to your shell and run dtrace -q -s ksyms.d, and then again enter the command strings -a /dev/ksyms in another shell. You should see output similar to the following example:

# dtrace -q -s ksyms.d
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
read 8192 bytes to user address 80639fc
uiomove 8192 bytes to 80639fc in pid 101038
...
^C
#

The addresses and process IDs will be different in your output, but you should observe that the input arguments to read match the parameters passed to uiomove by the ksyms driver.

Union Types in DTrace

Unions are another kind of composite type supported by ANSI-C and D, and are closely related to structs. A union is a composite type where a set of members of different types are defined and the member objects all occupy the same region of storage. A union is therefore an object of variant type, where only one member is valid at any given time, depending on how the union has been assigned. Typically, some other variable or piece of state is used to indicate which union member is currently valid. The size of a union is the size of its largest member, and the memory alignment used for the union is the maximum alignment required by the union members.
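The storage-sharing rule is the same in C, so a short C sketch can make it concrete. This is an illustration under stated assumptions: union value and both function names are hypothetical, and the member types are chosen so that the largest member (16 bytes) is also a multiple of the strictest alignment, which means no padding is added.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical union used only for this illustration. */
union value {
        char c[16];
        int32_t i32;
        uint64_t ui64;
};

/* Returns 1 if every member starts at the same address as the union. */
int
members_share_storage(void)
{
        union value v;

        return ((void *)&v == (void *)v.c &&
            (void *)&v == (void *)&v.i32 &&
            (void *)&v == (void *)&v.ui64);
}

/* Size of the union; here the largest member, c[16], determines it. */
size_t
value_size(void)
{
        return (sizeof (union value));
}
```

With these members, members_share_storage() returns 1 and value_size() returns 16, the size of the char array.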

The Oracle Solaris kstat framework defines a struct containing a union that is used in the following example to illustrate and observe C and D unions. The kstat framework is used to export a set of named counters representing kernel statistics such as memory usage and I/O throughput. The framework is used to implement utilities such as mpstat and iostat. The framework uses struct kstat_named to represent a named counter and its value. The struct is defined as follows:

struct kstat_named {
        char name[KSTAT_STRLEN]; /* name of counter */
        uchar_t data_type;      /* data type */
        union {
                char c[16];
                int32_t i32;
                uint32_t ui32;
                long l;
                ulong_t ul;
                ...
        } value;        /* value of counter */
};

The preceding declaration is shortened for illustrative purposes. The complete structure definition can be found in the <sys/kstat.h> header file and is described in the kstat_named(9S) man page. The declaration is valid in both ANSI C and D, and defines a struct containing as one of its members a union value with members of various types, depending on the type of the counter. Notice that since the union itself is declared inside of another type, struct kstat_named, a formal name for the union type is omitted. This declaration style is known as an anonymous union. The member named value is of a union type described by the preceding declaration, but this union type itself has no name because it does not need to be used anywhere else. The struct member data_type is assigned a value that indicates which union member is valid for each object of type struct kstat_named. A set of C preprocessor tokens is defined for the values of data_type. For example, the token KSTAT_DATA_CHAR is equal to zero and indicates that the member value.c is where the value is currently stored. For more information, see the kstat2_named(9S) man page.
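The pattern of a type tag selecting the valid union member can be sketched in C as follows. This is a simplified stand-in, not the real kstat definition: struct ex_named, the EX_DATA_* values (other than zero meaning character data, which follows the text), and the helper functions are all hypothetical names invented for this illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tags; the real KSTAT_DATA_* tokens are in <sys/kstat.h>. */
enum { EX_DATA_CHAR = 0, EX_DATA_INT32 = 1, EX_DATA_UINT64 = 2 };

/* Hypothetical, simplified stand-in for struct kstat_named. */
struct ex_named {
        char name[32];
        unsigned char data_type;  /* says which member of value is valid */
        union {
                char c[16];
                int32_t i32;
                uint64_t ui64;
        } value;
};

/* Initialize a record that holds a 64-bit unsigned counter. */
void
ex_set_u64(struct ex_named *np, const char *name, uint64_t v)
{
        memset(np, 0, sizeof (*np));
        strncpy(np->name, name, sizeof (np->name) - 1);
        np->data_type = EX_DATA_UINT64;
        np->value.ui64 = v;
}

/* Format the counter using the union member selected by data_type. */
int
ex_format(const struct ex_named *np, char *buf, size_t len)
{
        switch (np->data_type) {
        case EX_DATA_CHAR:
                return (snprintf(buf, len, "%s=%.16s", np->name,
                    np->value.c));
        case EX_DATA_INT32:
                return (snprintf(buf, len, "%s=%d", np->name,
                    (int)np->value.i32));
        case EX_DATA_UINT64:
                return (snprintf(buf, len, "%s=%llu", np->name,
                    (unsigned long long)np->value.ui64));
        default:
                return (-1);    /* unknown tag: no member is trusted */
        }
}
```

The reader of the union never guesses which member to use; it always consults data_type first and refuses to interpret the storage when the tag is not recognized.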

The kstat counters can be sampled from a user process using the kstat_data_lookup() function, which returns a pointer to a struct kstat_named. For more information, see the kstat_lookup(3KSTAT) man page. The mpstat utility calls this function repeatedly as it executes in order to sample the latest counter values. Go to your shell and try running mpstat 1 and observe the output. Press Control-C in your shell to abort mpstat after a few seconds. To observe counter sampling, enable a probe that fires each time the mpstat command calls the kstat_data_lookup() function in libkstat. To do so, make use of a new DTrace provider: pid. The pid provider enables you to dynamically create probes in user processes at C symbol locations such as function entry points. You can ask the pid provider to create probes at the entry and return sites of a user function by writing probe descriptions of the form:

pidprocess-ID:object-name:function-name:entry
pidprocess-ID:object-name:function-name:return

For example, if you wanted to create a probe in process ID 12345 that fires on entry to kstat_data_lookup, you would write the following probe description:

pid12345:libkstat:kstat_data_lookup:entry

The pid provider inserts dynamic instrumentation into the specified user process at the program location corresponding to the probe description. The probe implementation forces each user thread that reaches the instrumented program location to trap into the operating system kernel and enter DTrace, firing the corresponding probe. So although the instrumentation location is associated with a user process, the DTrace predicates and actions you specify still execute in the context of the operating system kernel. The pid provider is described in further detail in pid Provider.

To apply your D program to different processes, use macro variables. Macro variables are evaluated at compile time and are replaced with additional dtrace command-line arguments. Macro variables are specified using a dollar sign $ followed by an identifier or digit. If you execute the command dtrace -s script foo bar baz, the D compiler will automatically define the macro variables $1, $2, and $3 to be the tokens foo, bar, and baz respectively. You can use macro variables in D program expressions or in probe descriptions.

For more information about macro variables and reusable scripts, see Scripting in DTrace. Now that you know how to instrument user processes using their process ID, return to sampling unions.

Example 11  Tracing Calls to kstat_data_lookup

Type the following source code in a text editor and save it as kstat.d:

pid$1:libkstat:kstat_data_lookup:entry
{
        self->ksname = arg1;
}

pid$1:libkstat:kstat_data_lookup:return
/self->ksname != NULL && arg1 != NULL/
{
        this->ksp = (kstat_named_t *) copyin(arg1, sizeof (kstat_named_t));
        printf("%s has ui64 value %u\n",
            copyinstr(self->ksname), this->ksp->value.ui64);
}

pid$1:libkstat:kstat_data_lookup:return
/self->ksname != NULL/
{
        self->ksname = NULL;
}

Now go to one of your shells and execute the command zonestat 1 to start zonestat running in a mode where it samples statistics and reports them once per second. Once zonestat is running, execute the command dtrace -q -s kstat.d `pgrep zonestatd` in your other shell. You will see output corresponding to the statistics that are being accessed. Press Control-C to abort dtrace and return to the shell prompt.

# dtrace -q -s kstat.d `pgrep zonestatd`
hat_fault has ui64 value 0
as_fault has ui64 value 48053
maj_fault has ui64 value 1144
xcalls has ui64 value 123832170
intr has ui64 value 165264090
intrthread has ui64 value 124094974
pswitch has ui64 value 840625
inv_swtch has ui64 value 1484
cpumigrate has ui64 value 36284
mutex_adenters has ui64 value 35574
rw_rdfails has ui64 value 2
rw_wrfails has ui64 value 2
...
^C
#

If you capture the output in each terminal window and subtract each value from the value reported by the previous iteration through the statistics, you should be able to correlate the dtrace output with the mpstat output.

The example program records the counter name pointer on entry to the lookup function, and then performs most of the tracing work on return from kstat_data_lookup. The D built-in functions copyinstr() and copyin() copy the function results from the user process back into DTrace when arg1 (the return value) is not NULL. Once the kstat data has been copied, the example reports the ui64 counter value from the union. This simplified example assumes that the traced process samples counters that use the value.ui64 member.

As an exercise, try recoding kstat.d to use multiple predicates and print out the union member corresponding to the data_type member. You can also try to create a version of kstat.d that computes the difference between successive data values and actually produces output similar to mpstat.

Member Sizes and Offsets

You can determine the size in bytes of any D type or expression, including a struct or union, using the sizeof operator. The sizeof operator can be applied either to an expression or to the name of a type surrounded by parentheses, as illustrated by the following two examples:

sizeof expression
sizeof (type-name)

For example, the expression sizeof (uint64_t) would return the value 8, and the expression sizeof (i["read"].ts) would also return 8 if inserted into the source code of the rwinfo.d program shown earlier. The formal return type of the sizeof operator is the type alias size_t, which is defined to be an unsigned integer of the same size as a pointer in the current data model, and is used to represent byte counts. When the sizeof operator is applied to an expression, the expression is validated by the D compiler but the resulting object size is computed at compile time and no code for the expression is generated. You can use sizeof anywhere an integer constant is required.

You can use the companion operator offsetof to determine the offset in bytes of a struct or union member from the start of the storage associated with any object of the struct or union type. The offsetof operator is used in an expression of the following form:

offsetof (type-name, member-name)

Here type-name is the name of any struct or union type or type alias, and member-name is the identifier naming a member of that struct or union. Similar to sizeof, offsetof returns a size_t and can be used anywhere in a D program that an integer constant can be used.
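C provides a sizeof operator and an offsetof macro with the same meaning, so the layout of struct callinfo can be inspected with a short C sketch. This is an illustration only; the two function names are hypothetical, and the exact total size depends on the width of size_t in the data model used to compile it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members as the struct callinfo used earlier in this section. */
struct callinfo {
        uint64_t ts;      /* timestamp of last syscall entry */
        uint64_t elapsed; /* total elapsed time in nanoseconds */
        uint64_t calls;   /* number of calls made */
        size_t maxbytes;  /* maximum byte count argument */
};

/* Store the byte offset of each member, in declaration order. */
void
callinfo_offsets(size_t off[4])
{
        off[0] = offsetof(struct callinfo, ts);
        off[1] = offsetof(struct callinfo, elapsed);
        off[2] = offsetof(struct callinfo, calls);
        off[3] = offsetof(struct callinfo, maxbytes);
}

/* Size of the whole struct, including any trailing padding. */
size_t
callinfo_size(void)
{
        return (sizeof (struct callinfo));
}
```

The three uint64_t members occupy offsets 0, 8, and 16, and maxbytes follows at offset 24. In a 64-bit data model, where size_t is 8 bytes, the struct is 32 bytes in total.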

Bit Fields

D also permits the definition of integer struct and union members of arbitrary numbers of bits, known as bit fields. A bit field is declared by specifying a signed or unsigned integer base type, a member name, and a suffix indicating the number of bits to be assigned for the field, as shown in the following example:

struct s {
        int a : 1;
        int b : 3;
        int c : 12;
};

The bit field width is an integer constant separated from the member name by a trailing colon. The bit field width must be positive and must not exceed the width of the corresponding integer base type. Bit fields larger than 64 bits may not be declared in D. D bit fields provide compatibility with and access to the corresponding ANSI C capability. Bit fields are typically used in situations when memory storage is at a premium or when a struct layout must match a hardware register layout.

A bit field is a compiler construct that automates the layout of an integer and a set of masks to extract the member values. The same result can be achieved by simply defining the masks yourself and using the & operator. C and D compilers try to pack bits as efficiently as possible, but they are free to do so in any order. Therefore, bit fields are not guaranteed to produce identical bit layouts across differing compilers or architectures. If you require stable bit layout, you should construct the bit masks yourself and extract the values using the & operator.
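The manual alternative can be sketched in C as follows. This is an illustration under stated assumptions: the layout (a in bit 0, b in bits 1 through 3, c in bits 4 through 15 of a 16-bit word) is a hypothetical choice made for this example, the fields are treated as unsigned values, and the macro and function names are invented here.

```c
#include <stdint.h>

/* Hypothetical fixed layout: a = bit 0, b = bits 1-3, c = bits 4-15. */
#define A_SHIFT 0
#define A_MASK  0x0001u
#define B_SHIFT 1
#define B_MASK  0x000eu
#define C_SHIFT 4
#define C_MASK  0xfff0u

/* Pack three field values into one 16-bit word using shifts and masks. */
uint16_t
pack_fields(unsigned a, unsigned b, unsigned c)
{
        return ((uint16_t)(((a << A_SHIFT) & A_MASK) |
            ((b << B_SHIFT) & B_MASK) |
            ((c << C_SHIFT) & C_MASK)));
}

/* Extract each field with the & operator, then shift it down to bit 0. */
unsigned
get_a(uint16_t w)
{
        return ((w & A_MASK) >> A_SHIFT);
}

unsigned
get_b(uint16_t w)
{
        return ((w & B_MASK) >> B_SHIFT);
}

unsigned
get_c(uint16_t w)
{
        return ((w & C_MASK) >> C_SHIFT);
}
```

Because the masks and shifts are written out explicitly, the position of each field is the same no matter which compiler or architecture builds the code. For example, pack_fields(1, 5, 0xabc) yields 0xabcb, and the three accessors recover 1, 5, and 0xabc from that word.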

A bit field member is accessed by simply specifying its name in combination with the "." or -> operators like any other struct or union member. The bit field is automatically promoted to the next largest integer type for use in any expressions. Because bit field storage may not be aligned on a byte boundary or be a round number of bytes in size, you may not apply the sizeof or offsetof operators to a bit field member. The D compiler also prohibits you from taking the address of a bit field member using the & operator.