14 Statically Defined Tracing of Kernel Modules

DTrace provides a facility for developers to define customized probes in kernel modules. These static probes appear as additional probes of the sdt provider and impose little to no overhead if the sdt module is not loaded. For example, for x86_64, the overhead is a single-byte NOP, followed by a 4-byte NOP. This chapter provides a full example of how to define and use static probes in a kernel module.

The general principles for naming probes and choosing insertion points are the same for kernel modules as they are for user-space applications. You should define probes that have a semantic meaning that is readily understood by your DTrace user community. Typically, you might name probes for the routine in which you place them and their position in that routine. For example, if your probes provide information about data values on entry to or return from a routine named foo, you might name them foo-entry and foo-return. The data values that are returned by such probes could present the routine as a black box, rather than return intermediate values from the internal implementation of the module. To gather data from deeper within a module, you might insert additional probes with names such as foo-stage1 or foo-post-hardware-init.

In one respect, using static probes with kernel modules can be simpler than for user-space applications. You do not need to modify the build files unless you want to conditionally compile a module to include the probes. Inserting the probes in the source code is slightly more complex, as you cannot use the dtrace -h command to generate the probe macros. However, using a DTRACE_PROBE macro to insert a probe is a relatively simple change to make to the source code.

You can insert sdt static probes in any Oracle Linux kernel module for which you have the source files and the necessary build infrastructure, but note that DTrace supports statically defined tracing of 64-bit kernel modules only.

For more information about the sdt provider, see sdt provider.

For an introduction to the concepts of statically defined tracing as applied to user-space applications, see Statically Defined Tracing of User Applications.

Inserting Static Probe Points

You can embed static probes within the source code for which you want to capture the current state of a module and its data.

The following example pseudo character device driver consists of three source files:

revdev.h

Is the header file for the module.

rev_mod.c

Defines the module's properties and its init and exit routines.

rev_dev.c

Defines the driver's open, read, release, unlocked_ioctl, and write routines. The static probes are inserted in the read, unlocked_ioctl, and write routines, although probes could also be inserted in the other routines, if required.

revdev.h Example

The module header file revdev.h must be prepared, as indicated in bold font in the following example, by adding lines to include linux/sdt.h and to define probe macros.

#include <asm/uaccess.h>
#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/types.h>
#include <linux/sdt.h>

#define DEVICE "revdev"

#define REVDEV_IOCTL_ENTRY_PROBE(name, file, cmd, arg) \
         DTRACE_PROBE3(ioctl__##name, struct file *, file, \
                       unsigned int, cmd, unsigned long, arg)
#define REVDEV_IOCTL_RETURN_PROBE(name, str) \
        DTRACE_PROBE1(ioctl__##name, struct char *, str)
#define REVDEV_READ_ENTRY_PROBE(name,fp,buf) \
        DTRACE_PROBE2(read__##name, file *, fp, char *, buf)
#define REVDEV_READ_RETURN_PROBE(name,buf,n) \
        DTRACE_PROBE2(read__##name, char *, buf, size_t, n)
#define REVDEV_WRITE_ENTRY_PROBE(name,fp,buf,n) \
        DTRACE_PROBE3(write__##name, file *, fp, char *, buf, size_t, n)
#define REVDEV_WRITE_RETURN_PROBE(name,buf,n) \
        DTRACE_PROBE2(write__##name, char *, buf, size_t, n)

The DTRACE_PROBE macros that are defined in /lib/modules/`uname -r`/build/include/linux/sdt.h support from zero to eight arguments.

You can define your own macros for the inserted probes, as shown in the preceding example. Unlike user-space static probes, you cannot use the dtrace -h command to create a header file that includes suitable probe definitions. You do not need to create a provider definition file for the probes.

The probes are named according to the first argument of the DTRACE_PROBE macro. The suffix N in the macro name DTRACE_PROBEN refers the number of arguments that are passed to the probe. The first argument to the probe macro is the probe name. As described in Declaring Probes, two consecutive underscores are converted to a single dash. The remaining macro arguments are pairs of arguments that define the DTrace argn variables that are assigned when the probe fires. Each pair of arguments defines the variable type and a variable name, for example:

#define REVDEV_WRITE_ENTRY_PROBE(name, fp, buf, n) \
        DTRACE_PROBE3(write__##name, file *, fp, char *, buf, size_t, n)

The values of fp, buf, and n are made available by the arg0, arg1, and arg2 variables in DTrace when the probe fires.

The provider, module, and function elements of the complete probe are named for sdt, the driver module name (without the .ko), and the driver routine.

The probes inherit the stability attributes of the sdt provider.

rev_mod.c Example

No changes are made in the following example, which does not insert any probes in the module's init and exit routines. Note that there is no restriction on inserting probes in these routines.

#include "revdev.h"

MODULE_AUTHOR("DTrace Example");
MODULE_DESCRIPTION("Using DTrace SDT probes with a device driver");
MODULE_VERSION("v1.0");
MODULE_LICENSE("GPL");

extern const struct file_operations revdev_fops;

static struct miscdevice revdev = {
    .minor = 0,
    .name = DEVICE,
    .fops = &revdev_fops,
};

DEFINE_MUTEX(revdev_mutex);

static int revdev_entry(void){ /* Register device */
    int retval;
    retval = misc_register(&revdev);
    if (retval < 0) {
        printk(KERN_ERR "revdev: Could not register device");
        return retval;
    }
    mutex_init(&revdev_mutex);
    return 0;
}

static void revdev_exit(void){
    misc_deregister(&revdev);
}

/* Define module init and exit calls */
module_init(revdev_entry);
module_exit(revdev_exit);

rev_dev.c Example

No existing lines of code are modified in this example. Only line insertions are required for the entry and return probes in each of the read, unlocked_ioctl, and write routines.

The changes in this example appear in bold font.

#include "revdev.h"

static struct device_buffer {
    char data[80];
} devbuf;

static char *oddeven[] = { "Even", "Odd" };

extern struct mutex revdev_mutex;

static long revdev_ioctl(struct file *file, unsigned int cmd,
                         unsigned long arg) {
    char *cp;
    REVDEV_IOCTL_ENTRY_PROBE(entry, file, cmd, arg);
    cp = oddeven[arg%2];
    REVDEV_IOCTL_RETURN_PROBE(return, cp);
    return -EAGAIN;
}

static int revdev_open(struct inode *inode, struct file *fp){
    if (!mutex_trylock(&revdev_mutex)){
        printk(KERN_INFO "revdev: Device already in use");
        return -EBUSY;
    }
    return 0;
}

static void revstr(char *s) { /* After Kernighan and Ritchie */
    int i, j, t;
    for (i = 0, j = strlen(s)-1; i < j; i++, j--)
        t = s[i], s[i] = s[j], s[j] = t;
}

static ssize_t revdev_read(struct file *fp, char* buf, size_t n, loff_t *o){
    int retval;
    REVDEV_READ_ENTRY_PROBE(entry, fp, devbuf.data);
    revstr(devbuf.data);
    n = strlen(devbuf.data);
    retval = copy_to_user(buf, devbuf.data, n);
    REVDEV_READ_RETURN_PROBE(return, buf, n);
    if (retval != 0) return -EINVAL;
    return 0;
}

static ssize_t revdev_write(struct file *fp, const char* buf, size_t n, loff_t *o){
    int retval;
    REVDEV_WRITE_ENTRY_PROBE(entry, fp, buf, n);
    retval = copy_from_user(devbuf.data, buf, n);
    devbuf.data[n-retval] = '\0';
    REVDEV_WRITE_RETURN_PROBE(return, devbuf.data, n);
    if (retval != 0) return -EINVAL;
    return 0;
}

static int revdev_close(struct inode *inode, struct file *fp){
    mutex_unlock(&revdev_mutex);
    return 0;
}

const struct file_operations revdev_fops = {
    .owner = THIS_MODULE,
    .read = revdev_read,
    .write = revdev_write,
    .unlocked_ioctl = revdev_ioctl,
    .open = revdev_open,
    .release = revdev_close,
};

Building Modules With Static Probes

Note:

The following example requires that you link the module against a UEK version that supports the DTrace modules, which can be either UEK R5 or UEK R4 for Oracle Linux 7 or UEK R4 for Oracle Linux 6.

A bug in the current implementation means that a module containing SDT probes must be built from two or more source files.

The following Kbuild and Makefile are used to build the example pseudo driver module revdev.ko and a test program named testrevdev.

Kbuild Example

bj-m            += revdev.o

revdev-y        := rev_dev.o rev_mod.o

Makefile Example

Note:

All of the command lines in the Makefile, such as those beginning with gcc in the following example, must start with tabs.

KERNEL_DIR = /lib/modules/`uname -r`/build

modules:: testrevdev

install:: modules_install

testrevdev: testrevdev.c
  gcc -o testrevdev testrevdev.c

%::
        $(MAKE) -C $(KERNEL_DIR) M=`pwd` $@

The source file for testrevdev is testrevdev.c.

testrevdev.c Example

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DEVICE_FILE "/dev/revdev"

int main() {
    char buf[81];
    int i, fd, n;

    if ((fd = open(DEVICE_FILE, O_RDWR)) != 0) {
        perror("open");
        exit(1);
    }

    i=0;
    while (1) {
        (i++)%20;
        printf("Write: ");
        scanf(" %80[^\n]", buf);
        n = strlen(buf);
	if (!strncmp(buf, "exit", 4))
            break;
	else if (!strncmp(buf, "ioctl", 5))
            ioctl(fd,128,i);
        else {
            write(fd, buf, n);
            read(fd, buf, n);
            buf[n]='\0';
            printf(" Read: %s\n", buf);
        }
    }

    close(fd);
    exit(0);
}

When run, testrevdev reads a string that you enter, writes the string to the revdev device, and then reads the reversed string from the device.

If the input string begins with ioctl, the program calls ioctl on the open file descriptor, which invokes the device's unlocked_ioctl routine. An input string that begins with exit terminates the program.

To build the module and test program, use the make command:

# make
make -C /lib/modules/`uname -r`/build M=`pwd` modules
make[1]: Entering directory `/usr/src/kernels/4.1.12-version.el6uek.x86_64'
  CC [M]  /root/revdev/rev_dev.o
  CC [M]  /root/revdev/rev_mod.o
  SDTSTB  /root/revdev/revdev.sdtstub.S
  AS [M]  /root/revdev/revdev.sdtstub.o
  LD [M]  /root/revdev/revdev.o
  Building modules, stage 2.
  MODPOST 1 modules
  SDTINF  /root/revdev/revdev.sdtinfo.c
  CC      /root/revdev/revdev.mod.o
  CTF
  LD [M]  /root/revdev/revdev.ko
make[1]: Leaving directory `/usr/src/kernels/4.1.12-version.el6uek.x86_64'

Using DTrace to Test Modules With Static Probes

You can use DTrace to display information when one of the embedded static probes in a module fires.

To test the example module revdev.ko:

  1. Set up a udev rule to create the /dev/revdev device file:

    # echo "KERNEL==\"revdev\", MODE=\"0660\"" > /etc/udev/rules.d/10-revdev.rules
  2. Load the revdev.ko module:

    # insmod revdev.ko

    You can use dtrace to test that the probes are now available:

    # dtrace -l -m revdev
       ID   PROVIDER            MODULE                          FUNCTION NAME
        4        sdt            revdev                      revdev_ioctl ioctl-return
        5        sdt            revdev                      revdev_ioctl ioctl-entry
        6        sdt            revdev                      revdev_write write-return
        7        sdt            revdev                      revdev_write write-entry
        8        sdt            revdev                       revdev_read read-return
        9        sdt            revdev                       revdev_read read-entry
  3. Enter the following DTrace script (traceflow):

    #!/usr/sbin/dtrace -qs
    #pragma D option nspec=10
    
    self int indent;
    
    syscall:::entry
    /execname == "testrevdev"/
    {
      self->specflag = 0;
      self->spec = speculation();
      self->indent += 2;
      speculate(self->spec);
    }
    
    syscall:::entry
    /self->spec/
    {
      speculate(self->spec);
      printf("%*s ", self->indent, "->");
      printf("%s() entry\n",probefunc);
      self->indent += 2;
    }
    
    syscall:::return
    /self->spec/
    {
      speculate(self->spec);
      self->indent -= 2;
      printf("%*s ", self->indent, "<-");
      printf("%s() return\n",probefunc);
    }
    
    syscall:::return
    /self->spec && self->specflag == 0/
    {
      discard(self->spec);
      self->indent -= 2;
      self->spec = 0;
    }
    
    syscall:::return
    /self->spec && self->specflag == 1/
    {
      commit(self->spec);
      self->indent -= 2;
      self->spec = 0;
    }
    
    sdt:revdev::ioctl-entry
    /self->spec/
    {
      speculate(self->spec);
      self->specflag = 1;
      printf("%*s ", self->indent, "=>");
      printf("%s() entry file: %s cmd: %d arg: %d\n",
             probefunc, d_path(&(((struct file *)arg0)->f_path)), arg1, arg2);
    }
    
    sdt:revdev::ioctl-return
    /self->spec/
    {
      speculate(self->spec);
      printf("%*s ", self->indent, "<=");
      printf("%s() return cpstr: %s\n", probefunc, stringof((char*)arg0));
    }
    
    sdt:revdev::read-entry
    /self->spec/
    {
      speculate(self->spec);
      self->specflag = 1;
      printf("%*s ", self->indent, "=>");
      printf("%s() entry file: %s devbuf: %s\n",
             probefunc, d_path(&(((struct file *)arg0)->f_path)),
             stringof((char *)arg1));
    }
    
    sdt:revdev::read-return
    /self->spec/
    {
      speculate(self->spec);
      printf("%*s ", self->indent, "<=");
      printf("%s() return string: %s len: %d\n",
             probefunc, stringof((char *)arg0), arg1);
    }
    
    sdt:revdev::write-entry
    /self->spec/
    {
      speculate(self->spec);
      self->specflag = 1;
      printf("%*s ", self->indent, "=>");
      printf("%s() entry file: %s string: %s len: %d\n",
             probefunc, d_path(&(((struct file *)arg0)->f_path)),
             stringof((char *)arg1), arg2);
    }
    
    sdt:revdev::write-return
    /self->spec/
    {
      speculate(self->spec);
      printf("%*s ", self->indent, "<=");
      printf("%s() return string: %s len: %d\n",
             probefunc, stringof((char *)arg0), arg1);
    }

    When one of the inserted probes fires, traceflow displays information about data values in the module by using the probe argument variables (arg0, arg1, arg2,...).

    Note:

    Argument variables that return pointer types, such as file * and char *, must be explicitly cast.

    The script uses d_path and stringof to create printable file paths and strings. For example, (struct file *)arg0 casts the value of arg0 to a file pointer (struct file *). The f_path member of the struct file contains the path structure (struct path) for a file. As d_path takes a path pointer (struct path *) as its argument, the & operator is used to return a pointer to the struct path.

    See d_path and String Conversion for more information.

  4. Make traceflow executable:

    # chmod +x traceflow
  5. In one window, run traceflow:

    # ./traceflow
  6. In another window, run testrevdev and enter input, for example:

    # ./testrevdev
    Write: hello
     Read: olleh
    Write: world
     Read: dlrow
    Write: ioctl
    Write: ioctl
    Write: exit

    In the window that traceflow is running, you should see output similar to the following , as DTrace responds to the probes in revdev.ko that are firing:

    # ./traceflow 
    -> write() entry
      => revdev_write() entry file: /dev/revdev string: hello len: 5
      <= revdev_write() return string: hello len: 5
    <- write() return
    -> read() entry
      => revdev_read() entry file: /dev/revdev devbuf: hello
      <= revdev_read() return string: olleh len: 5
    <- read() return
    -> write() entry
      => revdev_write() entry file: /dev/revdev string: world len: 5
      <= revdev_write() return string: world len: 5
    <- write() return
    -> read() entry
      => revdev_read() entry file: /dev/revdev devbuf: world
      <= revdev_read() return string: dlrow len: 5
    <- read() return
    -> ioctl() entry
      => revdev_ioctl() entry file: /dev/revdev cmd: 128 arg: 3
      <= revdev_ioctl() return cpstr: Odd
    <- ioctl() return
    -> ioctl() entry
      => revdev_ioctl() entry file: /dev/revdev cmd: 128 arg: 4
      <= revdev_ioctl() return cpstr: Even
    <- ioctl() return