dtrace_aggregate_walk_valvarrevsorted() Function

This function walks the aggregations in reverse order, sorted first by value and then by aggregation variable ID. Printing with this function produces output similar to the following:

pollsys                                         120515277
portfs                                               2583
pset                                                 1911
p_online                                             1051
pollsys                                        4159836122
p_online                                             9685
portfs                                               6948
pset                                                 3369
pollsys                                              7161
portfs                                               1668
pset                                                 1165
p_online                                              968

The aggregation walk functions can be used in two ways. First, they can be passed as the third argument to the dtrace_aggregate_print() function to control the order in which the aggregation data is printed. For example, you can replace the call to the dtrace_aggregate_print() function with the following code to print the data by using the dtrace_aggregate_walk_keysorted() function:

if (dtrace_aggregate_print(g_dtp, stdout,
    dtrace_aggregate_walk_keysorted) == -1)
        fatal("failed to print aggregation");

The second way is to call an aggregation walk function directly and supply your own function to handle each aggregation record. This method is useful if you want custom output for the data or if you want to do something other than output the data.

The DTrace stddev() aggregating action calculates the standard deviation over a set of samples, but because of the limitations of DTrace, the values are reported as integers. In some cases, this level of precision might be insufficient. For example, consider the following D program:

BEGIN
{
        @c["foo"] = stddev(1);
        @c["foo"] = stddev(2);
        @c["foo"] = stddev(3);
        @c["foo"] = stddev(4);
        @c["foo"] = stddev(5);
        @c["bar"] = stddev(6);
        @c["bar"] = stddev(8);
        @c["bar"] = stddev(10);
        @c["bar"] = stddev(12);
        @c["bar"] = stddev(14);
        @c["baz"] = stddev(17);
        @c["baz"] = stddev(20);
        @c["baz"] = stddev(23);
        @c["baz"] = stddev(26);
        @c["baz"] = stddev(29);
        exit(0);
}

The default dtrace output for this program would appear as follows:

foo                    1
bar                    2
baz                    4

The values in the output are truncated to integers: the actual values are approximately 1.414, 2.828, and 4.243, so the value for bar is reported as 2 rather than 3. To get a better approximation of the correct values, you can write a custom DTrace consumer that uses the aggregation walk functions to process the raw data.

An aggregation stores a small amount of data in the aggregation buffer, and the consumer periodically copies this data from the kernel. Some aggregations, such as count(), min(), max(), and sum(), store a single value. For example, count() stores a running count, and sum() stores a running sum. No further processing of this data is needed before the output is displayed. The avg() aggregation stores two values: a count of the data points and the sum of those data points. Before the output is displayed, the final sum is divided by the final count to yield the average. Similarly, stddev() stores three values: the count, the sum, and the sum of the squares of the values. The standard deviation is computed from these three values. For more information, see DTrace Buffers and Buffering.

If you know how the stddev() aggregation is implemented, you can write a function that extracts the raw aggregation data and uses it to calculate the standard deviation with more precision. Because the data stored by the stddev() aggregation is a superset of the data stored for the avg() aggregation, you can also report the count of data points and their average.

Example 20-2 Using the walk() Function

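This example shows a walk() function that reads the raw stddev() data for one aggregation entry and prints the entry's name, count, average, and standard deviation.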

#include <dtrace.h>
#include <math.h>
#include <stdio.h>

static int
walk(const dtrace_aggdata_t *data, void *arg)
{
        dtrace_aggdesc_t *aggdesc = data->dtada_desc;
        dtrace_recdesc_t *namerec, *datarec;
        char *name;
        uint64_t *raw;
        uint64_t count, sum, sumsquares;
        double avg, avgsquares, stddev;

        /* Record 1 describes the key, which is printed as the name. */
        namerec = &aggdesc->dtagd_rec[1];
        name = data->dtada_data + namerec->dtrd_offset;

        /* Record 2 describes the raw data: count, sum, sum of squares. */
        datarec = &aggdesc->dtagd_rec[2];
        raw = (uint64_t *)(data->dtada_data + datarec->dtrd_offset);
        count = raw[0];
        sum = raw[1];
        sumsquares = raw[2];    /* assumes the value fits in 64 bits */

        avg = (double)sum / count;
        avgsquares = (double)sumsquares / count;
        stddev = sqrt(avgsquares - avg * avg);

        printf("%10s %10llu %11.3f %11.3f\n", name,
            (unsigned long long)count, avg, stddev);
        return (DTRACE_AGGWALK_NEXT);
}

The walk() function is passed as an argument to the dtrace_aggregate_walk_keysorted() function, as shown in the following example:

printf("%10s %10s %11s %11s\n", "NAME", "COUNT", "AVG", "STDDEV");

if (dtrace_aggregate_walk_keysorted(g_dtp, walk, NULL) == -1)
        fatal("aggregation walk failed");

When the D program is run, the consumer generates the following output:

      NAME      COUNT         AVG      STDDEV
       bar          5      10.000       2.828
       baz          5      23.000       4.243
       foo          5       3.000       1.414

The walk() function uses the dtrace_aggdata_t, dtrace_aggdesc_t, and dtrace_recdesc_t data structures. The function is passed a pointer to a dtrace_aggdata_t structure, which describes the data for a single entry in an aggregation.

struct dtrace_aggdata {
      dtrace_hdl_t *dtada_handle;                         /* handle to DTrace library */
      dtrace_aggdesc_t *dtada_desc;                       /* aggregation description */ 
      dtrace_eprobedesc_t *dtada_edesc;                   /* enabled probe description */
      dtrace_probedesc_t *dtada_pdesc;                    /* probe description */ 
      caddr_t dtada_data;                                 /* pointer to raw data */
      uint64_t dtada_normal;                              /* the normal -- 1 for denorm */
      size_t dtada_size;                                  /* total size of the data */
      caddr_t dtada_delta;                                /* delta data, if available */
      caddr_t *dtada_percpu;                              /* per CPU data, if avail */
      caddr_t *dtada_percpu_delta;                        /* per CPU delta, if avail */
};

Example 20-2 uses two of these fields. It first pulls the aggregation description out of the structure by using the dtada_desc member. It later accesses the raw data stored in the aggregation by using the dtada_data member.

The aggregation description is contained in the dtrace_aggdesc_t data structure.

typedef struct dtrace_aggdesc {
        DTRACE_PTR(char, dtagd_name);           /* not filled in by kernel */
        dtrace_aggvarid_t dtagd_varid;          /* not filled in by kernel */
        int dtagd_flags;                        /* not filled in by kernel */
        dtrace_aggid_t dtagd_id;                /* aggregation ID */
        dtrace_epid_t dtagd_epid;               /* enabled probe ID */
        uint32_t dtagd_size;                    /* size in bytes */
        int dtagd_nrecs;                        /* number of records */
        uint32_t dtagd_pad;                     /* explicit padding */
        dtrace_recdesc_t dtagd_rec[1];          /* record descriptions */
} dtrace_aggdesc_t;

Example 20-2 uses only the dtagd_rec array. This is an array of descriptions of the records for this entry. You can use these record descriptions to access the name and the data associated with this entry. Although the declaration of the dtagd_rec array contains only a single entry, the structure is dynamically allocated so that the array holds dtagd_nrecs entries.

The record descriptions are contained in the dtrace_recdesc_t data structure:

typedef struct dtrace_recdesc {
        dtrace_actkind_t dtrd_action;                 /* kind of action */
        uint32_t dtrd_size;                           /* size of record */
        uint32_t dtrd_offset;                         /* offset in ECB's data */
        uint16_t dtrd_alignment;                      /* required alignment */
        uint16_t dtrd_format;                         /* format, if any */
        uint64_t dtrd_arg;                            /* action argument */ 
        uint64_t dtrd_uarg;                           /* user argument */
} dtrace_recdesc_t;

Example 20-2 uses only the dtrd_offset member. Because the consumer deals only with the stddev() action type, it does not need any of the other members. If you wanted the consumer to also handle other aggregations, such as min() and max(), it would have to examine the dtrd_action member to determine which kind of data the current aggregation entry contains.

The definitions of these data structures are in the /usr/include/dtrace.h and /usr/include/sys/dtrace.h files. The following diagram shows the interaction between these data structures in this consumer.

Interactions Between the Data Structures

This figure shows that sumsquares is stored in two pieces. To avoid overflow, the stddev() aggregation stores and operates on the sum of the squares as a 128-bit value. Note that Example 20-2 only approximates how the stddev() aggregation works, because it assumes that the sum of the squares never exceeds the maximum 64-bit value.