Programming Utilities Guide

Chapter 1 Tracing Program Execution With the TNF Utilities

This chapter explains how to use the Trace Normal Form (TNF) utilities to collect data from your program.

The TNF utilities consist of a library and a group of programs that create, manipulate, and read Trace Normal Form binary files. Use the TNF utilities to:

These utilities are equally useful to both the novice programmer who is tracking events in an executing program, and to the sophisticated programmer who is tracking the interaction among multiple threads and the kernel in a multithreaded program.

Read the first part of this chapter for an overview of the TNF utilities. Detailed information follows the overview. Novice programmers are encouraged to stop reading when they've learned enough to accomplish their tasks. Sophisticated programmers should read the entire chapter.

Defining Types of Users

Aside from the differences between novice and experienced programmers, users can be defined by the way they use the TNF utilities.

Using Existing Probe Points

If you use the probe points that are shipped with your program or library, concentrate your reading on the prex and tnfdump sections. You can skip the instructions for inserting probes.

Debugging a Program

If you are debugging a program, read as far into the chapter as is useful for you. How much you read depends upon how complicated your tracing goals are.

Inserting Probe Points in a Library

If you are inserting probe points in a library that will be shipped, all the sections in this chapter are probably helpful. Also, be sure to give your customers information about your probe points.

Tracing Kernel Execution

A small number of TNF probes in the kernel record system calls, thread state transitions, page faults, swapping, and I/O. For the most part, you use the same procedures to trace the kernel as you use to trace a user-level process.

How TNF Works

The TNF utilities consist of the libtnfprobe library and the prex(1) and tnfdump(1) commands:

You insert commands in your code at points (probe points) where you want to collect data. You start running your executable object with prex, which lets you manipulate the probe points. The information collected is written to a trace file in a format called Trace Normal Form (TNF). tnfdump converts the binary TNF trace file to an ASCII file.

You can insert probe points anywhere in C or C++ code, including .init() sections, .fini() sections, multithreaded code, shared objects, and shared objects opened by dlopen(3X).

You can also connect functions called probe functions to these probe points. Probe functions perform actions based on the information collected. (Currently, the only available probe function is a debug function that writes output to stderr.)

In a typical TNF session, you do the following:

  1. Insert probe points in your programs. (Your source code might already have probe points built into it.)

  2. Start your program with prex (this automatically preloads libtnfprobe). (Alternatively, you can link with libtnfprobe when you start your program. If you do this, you can attach prex to the running program. See "Attaching prex to a Running Process".)

  3. Manipulate the probe points through prex to gather the information that you want from the program.

  4. Convert the binary information trace file produced by the program to an ASCII file with tnfdump. (See "Converting the Binary File to Readable Format".)

  5. Examine the information in the ASCII file.

Inserting Probe Points

The details of inserting probe points are covered fully in "Advanced Topics".

"A Sample C Program" shows the design and placement of some probe points. "A Sample prex Session" and "Reading the Trace File" show how probe points are manipulated and what information they provide.

Starting prex

You can either load your program with prex or attach prex to an already running process. The following two sections explain both procedures.

Starting Your Program With prex

When you start your program with prex, it preloads the libtnfprobe library into the target program, so the program does not have to be explicitly linked with libtnfprobe.

prex starts your program running, then stops the program before any user code is executed. Execution is stopped even before the .init() sections are executed, so you can place probe points in the .init() sections to trace initialization code.

In the simplest case, with an executable named a.out that takes no arguments and that contains probe points, the following command can be used:

$ prex a.out

Note -

(prex uses the $PATH environment variable to find the executable.)

To specify options, use:

prex [-o outfilename] [-s kbytes_size] 
   [-l sharedobjs] cmd [cmdargs...]

The -o, -s, and -l options are explained in "Command Line Options for prex ".

Attaching prex to a Running Process

If the program named a.out is running and the ps(1) command shows that a.out has a process ID (PID) of 2374, then the following command attaches prex to a.out:

$ prex -p 2374

If you are attaching prex to your already-running program, libtnfprobe must be linked to your program first. If you have not linked with libtnfprobe, prex exits with the error message that libtnfprobe is not linked to your program.

Linking With libtnfprobe

Linking with the library increases the size of your program by about 33 kilobytes (25 kilobytes shareable) and by the size of the trace file that is allocated (controllable through a command line option to prex). If you are never going to attach to a running program using prex, don't link with libtnfprobe.

However, if a program is not easy to restart and is a long-running program that you might someday want to gather information about, then do link with libtnfprobe. A window server is a good example of this--not the sort of program you want to stop and restart often--so you probably want to link with libtnfprobe before you start it running.

You can link with libtnfprobe in one of two ways:

If you are compiling your program, include -ltnfprobe on the cc line (if using -lthread, always put -ltnfprobe before -lthread).

$ cc -ltnfprobe -lthread -o cookie cookie.c

If your program is already compiled or you don't want to build your program with an explicit dependency on libtnfprobe, use the following command:

$ executable_object_name

See the Linker and Libraries Guide for an explanation of LD_PRELOAD.

Halting and Continuing prex

After you use prex to load the program, or attach prex to a running program, use Control-c to stop your program and pass control to prex.

When you see the prex prompt, prex is running and your program is halted.


Enter prex commands to control the probe points. For example, you can list probe points, turn on tracing, and debug your program.

To continue running your program, enter the continue command.

prex> continue

To stop your program and return to prex, press Control-c.

$ ^c

Table 1-1 lists the standard prex command line options.

Command Line Options for prex

Table 1-1 prex Command Line Options




The trace file contains information gathered by the prex command. The trace_file location is assumed to be relative to the directory from which you started prex. When no trace_file is specified, the default location is $TMPDIR/trace-pid where pid is the process ID of your program. If $TMPDIR is not set, /tmp is used.


When a program is traced, the trace file that is set at the start (the default or the -o name) is used for the life of the program.

-l libraries

The libraries argument contains the names of libraries to preload for your programs. It follows the LD_PRELOAD (see ld(1)) rules on how libraries should be specified and where they are found. Enclose the space-separated list of library names in double quotes.


When prex loads the program, the default library is You cannot use this option when attaching prex to an already-running process.

-s size

size is the size of the trace file in kilobytes (210 bytes). The default size of a trace file is four megabytes. The minimum size that can be specified is 128 kilobytes.

Running prex

After you have prex connected to your program, use prex commands to change the parameters of the data to be collected. You can use the prex list command to see all the probes in your program. As your program runs, each time a probe point is executed, information is recorded in the trace file. You look at the collected information with tnfdump(), described in "Reading the tnfdump File".

You can attach and detach prex from your program repeatedly, creating multiple sessions. For example, in your first session you could start the target with prex, set up the probes, and type quit resume.

This will start prex, define the information you want to collect, quit prex, and start collecting information by resuming execution of your program. The commands to do this are shown in Table 1-2.

In a following session, you could attach prex to the same program, reconfigure the probes, and again type continue. You can do this any number of times.

Note -

All traces are written to the trace file name specified in the first session --this file name cannot be changed.

Table 1-2 Basic prex Commands



% prex a.out

Attach prex to your program and start prex.

prex> enable $all

Enable all the probes. 

prex> quit resume

Quit prex and resume execution of program.

Components of the prex Command

The programmer who has inserted the probe points in the source code has assigned various attributes and (optional) values to each probe point. For example, each probe point has a "name" attribute whose value is the name the programmer chose for it.

Look at Table 1-3 for a list of the predefined attributes and the meanings of their values. The prex command lets you select a group of probe points by matching on their attributes or values. You can then trace or debug the selected probe points.

Table 1-5 lists the prex commands and their actions.

In code Example 1-1, the TNF_PROBE macro defines the probe point named work_start. This probe point has two arguments whose values will be logged--state and message. Each time the probe named work_start is encountered, a time stamp and the values of the variables state and message are logged to the trace file.

Example 1-1 prex Attributes and Values

	#include <tnf/probe.h>
	work(int state, char *message)
    TNF_PROBE_2(work_start, "work_module work"
               "sunw%debug in function work",
               tnf_long, int_input, state,
               tnf_string, string_input, message);

For more information on TNF_PROBE macros, including this example, see "Using the TNF_PROBE Macros".

Table 1-3 lists the predefined attributes.


Attributes are characteristics by which you identify probe points in your code. Table 1-3 lists the predefined attributes. You can add more attributes with the TNF_PROBE macros. See "Using the TNF_PROBE Macros".

Table 1-3 Predefined Attributes





A probe point performs the action that it is set up for only if it is enabled. For example, even when the tracing state is set, tracing occurs only if the probe point has been enabled. 

OFF (default) 


The name of the file containing the probe point. 


Shows the list of probe functions connected to this probe. Currently, only the debug function is available. 

<no value> (default) 


The groups to which the probe point belongs. If any key in a probe point is enabled, then that probe point is enabled.  




The line number in the code on which the probe point occurs. 



The name of the probe point. 



The name of the shared object or executable that the probe is in. Useful for selecting all the probes in a particular module. 



The names of the probe point arguments (arg_name_n, see Failed Cross Reference Format.)




When tracing for a probe point is on, a line is written to the trace file each time the probe point is executed in your program. 

ON (default) 

Grammar for Matching Probes

Select probes based on their attributes and values. The selection specification is called a selector_list. Each attribute or value can be described as one of the following:

A selector_list consists of one or more selector= selector items. If the initial selector= is not provided, it defaults to keys=. For example, the enable command is specified as :

enable selector_list

An example of this command is:

enable name=/first/ file='sampleZ.c'

This enables all probe points with either the name attribute containing the value first (a regular expression match), or the file attribute equal to the value sampleZ.c (Note that the trace is a disjunction and not a conjunction.)

Use $set_name as a shorthand alias for a selector_list specification. In the following example, the set_name is myprobes.

create $myprobes name=/first/ file='sampleZ.c'
enable $myprobes

This does the same thing as the previous example. The set_name follows the identifier naming rules. The $all default set selects all probes in the program.

Reserved Words

Table 1-4 lists the reserved words. Enclose them in single quotes if you use them to select attributes or values.

Table 1-4 Reserved Words

For example:

enable 'trace'='on'

means enable all probes whose trace attribute has the value of ON. Both trace and on are reserved words, so they both have to be enclosed in single quotation marks.

Table 1-5 prex Commands



clear $set_name

clear selector_list

Disconnect connected probe functions. 


connect &debug $set_name

connect &debug selector_list

Connect the debug function to the probe points. This does not enable the probe points. The debug function sends its output to stderr and not to the trace file.



Resume execution of your program, leaving prex attached.

create $set_name selector_list

Create a set with probe points matching selector_list. Also creates an alias, $set_name, for the selector_list.

enable $set_name

enable selector_list

disable $set_name

disable selector_list

Control whether the probe points perform the action they are set up for. Probe points are disabled by default; prex does not turn on tracing. The cheapest way (in terms of execution time at the probe point) to stop a probe point from tracing is to use the disable command.

The enable and disable commands are a master switch. If a probe point is not enabled, even if the probe point is connected to the debug() function and trace is on, no information is sent to stderr or to the trace file.



List all available prex commands.

list attributes probes selector_list

list attributes probes $set_name

List whether specified probe points are enabled or disabled, whether they have tracing on or off, and what the connected probe function is. Attributes are selectors as described in "Attributes ". For example,

list name file probes $all

lists only name and file values for matching probe points, while the command

list probes $all

lists all default attributes and their values (name, enable, trace, file, line, and funcs).


list fcns

List the defined functions (currently, only &debug is defined).

list history

List the control command history. The history of the commands used with connect, clear, trace, untrace, enable, and disable are executed whenever a new shared object is brought into your program through dlopen(). See "dlopen() and dlclose() and History".

list sets

List the defined sets. 

list values attributes

List the unique values associated with the specified attributes. For example, list values keys lists all the unique keys in the program.

source filename

Source a file of prex commands. filename is a quoted string.

trace $set_name

>trace selector_list

untrace $set_name

untrace selector_list

Control the tracing action of the probe points. trace and untrace determine whether a probe point generates a trace record when it is executed. Neither trace nor untrace enables the probe points.


The default mode is that tracing is on. 


The untrace command is useful when you are interested in getting only debug output. If you are using this, your probe should be enabled with tracing off and debug on.


Tracing disturbs your program less than the debug function does because trace writes to an mmap'd file while debug writes to stderr.



Quit prex; if your program was loaded with prex, it will be killed; if your program was attached to prex, it will be resumed.

quit kill

Quit prex and kill your program.

quit resume

Quit prex and resume execution of your program.

quit suspend

Quit prex and leave your program suspended.

Tracing, Enabling, and Connecting

To trace at a probe point, the probe point must have trace on and it must be enabled. To debug at a probe point, the probe point must be connected to the debug function and must be enabled.

Table 1-6 shows which combinations of tracing, enabling, and connecting result in which actions.

Table 1-6 Tracing, Enabling, and Connecting Actions

Enabled or Disabled 

Tracing State (On/Off) 

Debug State (Connected/Cleared) 

Results In 




Tracing and Debugging 




Tracing only 




Debugging only 





















Controlling prex From a Script

Enter commands to prex either from the prex command line or from a file containing prex commands.

When you start prex, it searches for a file named .prexrc first in $HOME/ and next in the directory from which you started prex. Commands are read from all files that are found. Therefore it is possible to have a .prexrc file in the current directory that overrides defaults that are set up by the .prexrc file in the home directory.

After reading any .prexrc files that are found, input is expected from the prex command line. To set up an experiment entirely using .prexrc files, the last statement in it can be quit resume, which quits prex but lets your program resume.

Also, when prex is running you can use the source filename command to specify a file from which prex reads commands. This file can have any name you like.

As with commands entered from the prex command line, all commands in the script should be in ASCII. The following rules apply:

A Sample C Program

This program, cookie.c, asks for a cookie. The response must be entered in uppercase, or it is incorrect. You can also find the prime factor of a number. In the process of doing these two things, you can see how the trace function works.

To compile this program and get an executable called cookie(), use

$ cc -o cookie cookie.c 

Five probe points are defined (and highlighted) in this program. They are named start (line 17), inloop (line 33), factor_start (line 60), found_a_factor (line 65), and factor_end (line 72). More information about these probe points is gathered and explained in "A Sample prex Session".

Example 1-2 Code for cookie.c()

#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <tnf/probe.h>


static void find_factor(int n);

main(int argc, char **argv)
    boolean_t	shouldexit = B_FALSE;
    int 	sum = 0, max_loop = 5;
    int		i;

    TNF_PROBE_0(start, "cookie main", 
         "sunw%debug starting main");

    while (!shouldexit) {
	      char	response[MAX_RESPONSE_SIZE];
	      int	factor_input;

      	(void) printf("give me a COOKIE! ");
	      (void) scanf("%s", response);

      	if (!strcmp(response, "COOKIE")) {
	           (void) printf("thanks!\n");
	           shouldexit = B_TRUE;

      	else if (!strcmp(response, "loop")) {
       	    for (i = 0; i < max_loop; i++) {
	             	TNF_PROBE_2(inloop, "cookie main
                           loop","sunw%debug in the loop",
			                       	tnf_long, loop_count, 	i,
		                       		tnf_long, total_iterations, 	sum);
	         		max_loop += 2 ;

	    	else if (!strcmp(response, "factor")) {
         			(void) printf("number you want factored? ");
	         		(void) scanf("%d", &factor_input);

	    	else {
	         		(void) printf("not a %s, ", response);
    return 0;

}   /* end main */
static void
find_factor(int n)
    int i;

    (void) printf("\tfactors of %d = ", n);
    TNF_PROBE_1(factor_start, "factor", "",
      		tnf_long, input_number, n);

    for (i=2; i <= n; i++) {
      		while (n % i == 0) {
		        	TNF_PROBE_2(found_a_factor, "cookie find_factor",
               "", tnf_long, searching_for, n,
	           			tnf_long, factor, i);
	        		(void) printf("%d ", i);
	        		n /= i;
    TNF_PROBE_0(factor_end, "factor", "");
    (void) printf("\n");

A Sample prex Session

This sample of a prex session is designed to show the different capabilities of prex. The data collected when cookie is run is shown in "Reading the tnfdump File".

% prex cookie     /* prex is loading the executable cookie. */
Target process stopped
Type "continue" to resume the target, "help" for help ...
prex> list sets
$all      'keys'=/.*/    /* One set is defined--$all 
                            (which is all the probes). */
prex> list fcns
&debug    tnf_probe_debug      /* The debug function is 
                                  the only one available. */
prex> list probes $all
name=inloop enable=off trace=on file=cookie.c line=35 funcs=<no value>
name=factor_end enable=off trace=on file=cookie.c line=72 funcs=<no value>
name=factor_start enable=off trace=on file=cookie.c line=61 funcs=<no value>
name=found_a_factor enable=off trace=on file=cookie.c line=67 funcs=<no value>
name=start enable=off trace=on file=cookie.c line=17 funcs=<no value>
prex>/*  The line number shows the end of each of the five probes. */
prex> create $factor /factor/    /* Create a new set that
                            matches any probe whose "keys" */
prex>       /* attribute contains the string "factor". */
prex> list sets
$all      'keys'=/.*/
$factor   'keys'=/factor/        
/* A new set named "factor" 
                                   is created and now shows
                                   up in */
  /* the list of sets. */
prex> list probes $factor            
/* This line tells you which probes matched the set
name=factor_end enable=off trace=on file=cookie.c line=72 funcs=<no value>
name=factor_start enable=off trace=on file=cookie.c line=61 funcs=<no value>
name=found_a_factor enable=off trace=on file=cookie.c line=67 funcs=<no value>
prex> list probes $all          /* Check to see if any 
                                 probes are enabled. */
name=inloop enable=off trace=on file=cookie.c line=35 funcs=<no value>
name=factor_end enable=off trace=on file=cookie.c line=72 funcs=<no value>
name=factor_start enable=off trace=on file=cookie.c line=61 funcs=<no value>
name=found_a_factor enable=off trace=on file=cookie.c line=67 funcs=<no value>
name=start enable=off trace=on file=cookie.c line=17 funcs=<no value>
prex>                  /* None are enabled, 
                      but all have trace on. */

prex> enable $all             /* Enable all the probes.*/
prex> list probes $all        /* Check again to see if any 
                              probes are enabled. */  
name=inloop enable=on trace=on file=cookie.c line=35 funcs=<no value>
name=factor_end enable=on trace=on file=cookie.c line=72 funcs=<no value>
name=factor_start enable=on trace=on file=cookie.c line=61 funcs=<no value>
name=found_a_factor enable=on trace=on file=cookie.c line=67 funcs=<no value>
name=start enable=on trace=on file=cookie.c line=17 funcs=<no value
prex> list values name              /* Find out what the probe names are. */
name =                              
prex> list values /.*/          /* List all predefined
                              attributes with their values. */
enable =              /* Only unique attributes are listed.*/
file =
funcs =
keys =
line =
name =
object =
slots =
sunw%debug =           /* The user-defined macro, sunw%debug, 
                       is also listed. */
	   in     /* This macro is defined in line 17 of cookie.c. */
trace =
prex> list values object
object =
prex> connect &debug name=inloop
prex> list /.*/ probes $all         /* List all the
                           information about all the probes */
enable=on trace=on object=cookie funcs=<no value> name=inloop slots=loop_count
total_iterations keys=cookie main loop file=cookie.c line=35 sunw%debug=in the loop 
enable=on trace=on object=cookie funcs=<no value> name=factor_end slots=<no value>
keys=factor file=cookie.c line=72
enable=on trace=on object=cookie funcs=<no value> name=factor_start slots=input_number
keys=factor file=cookie.c line=61 
enable=on trace=on object=cookie funcs=<no value> name=found_a_factor slots=searching_for
factor keys=cookie find_factor file=cookie.c line=67 
enable=on trace=on object=cookie funcs=<no value> name=start slots=<no value> keys=cookie
main file=cookie.c line=17 sunw%debug=starting main 
prex> continue
give me a COOKIE! loop       
 /* An example of loop counts*/
probe inloop; sunw%debug "in the loop"; loop_count=0; total_iterations=0;
probe inloop; sunw%debug "in the loop"; loop_count=1; total_iterations=1;
probe inloop; sunw%debug "in the loop"; loop_count=2; total_iterations=2;
probe inloop; sunw%debug "in the loop"; loop_count=3; total_iterations=3;
probe inloop; sunw%debug "in the loop"; loop_count=4; total_iterations=4;
give me a COOKIE! factor	
number you want factored? 25
	factors of 25 = 5 5
give me a COOKIE! factor
number you want factored? 43645729
	factors of 43645729 = 43645729
give me a COOKIE! ^C Target process stopped
Type "continue" to resume the target, "help" for help...
prex> continue
give me a COOKIE! biscuit
not a biscuit, give me a COOKIE!  cookie
not a cookie, give me a COOKIE! COOKIE
prex: target process finished

Reading the Trace File

The binary trace file you create with prex contains information determined by the prex commands you chose (see "Running prex").

By default, this file is created in /$TMPDIR/trace-pid, where pid is the process ID of the target program. If $TMPDIR is not set, the file is created in /tmp/trace-pid. You can override these default locations with the -o option to the prex command (see "Command Line Options for prex " for a full explanation).

When the trace file has been filled, newer events overwrite the older events. The default size of a trace file is four megabytes. This can be changed with the -s option for prex.

Note -

After a program is tracing to a file, there is no way to clear the trace file or to give it a different name for the lifetime of that program. If you disconnect from the target and attach later with a different trace file name, the newer name is ignored.

Converting the Binary File to Readable Format

To convert the binary trace file to an ASCII file, use the tnfdump command and the name of the binary trace file. Because tnfdump output goes to stdout by default, you probably want to redirect it into a file.

$ tnfdump filename > newfile

The- -r option to tnfdump provides detailed (raw) TNF output. Reading this output requires an understanding of TNF that is beyond the scope of this chapter.

Reading the tnfdump File

The following table shows output from the prex cookie command described in "A Sample prex Session". The output of the tnfdump file is very wide--open a very wide window to display it:

probe	 tnf_name: "start" tnf_string: "keys cookie main;file cookie.c;line 17;sunw%debug
starting main"
probe	 tnf_name: "factor_start" tnf_string: "keys factor;file cookie.c;line 61;"
probe	 tnf_name: "found_a_factor" tnf_string: "keys cookie find_factor;file cookie.c;line
probe	 tnf_name: "factor_end" tnf_string: "keys factor;file cookie.c;line 72;"
---------------- ---------------- ----- ----- ---------- --- -------------------------
    Elapsed (ms)  Delta (ms)     PID  LWPID  TID  CPU Probe Name        Data / Description . . .
---------------- ---------------- ----- ----- ---------- --- -------------------------
	 0.000000	0.000000	5354	1	0	- start	
	4551.625000	4551.625000	5354	1	0	- factor_start	input_number: 25
	4571.278000	19.653000	5354	1	0	- found_a_factor	searching_for: 25 factor: 5
	4571.543000	0.265000	5354	1	0	- found_a_factor	searching_for: 5 factor: 5
	4571.732000	0.189000	5354	1	0	- factor_end	
	 23151.434000	18579.702000	5354	1	0	- factor_start	input_number: 101247
	 23151.509000	0.075000	5354	1	0	- found_a_factor	searching_for: 101247 factor: 3
	 23228.090000	76.581000	5354	1	0	- found_a_factor	searching_for: 33749 factor: 33749
	 23228.250000	0.160000	5354	1	0	- factor_end	
	 89041.868000	65813.618000	5354	1	0	- factor_start	input_number: -1690908149
	 89041.920000	0.052000	5354	1	0	- factor_end	
	108271.852000	19229.932000	5354	1	0	- factor_start	input_number: 43645729
	208857.756000	 100585.904000	5354	1	0	- found_a_factor	searching_for: 43645729 factor:
	208857.960000	0.204000	5354	1	0	- factor_end	
	334511.548000	 125653.588000	5354	1	0	- factor_start	input_number: 12
	334511.618000	0.070000	5354	1	0	- found_a_factor	searching_for: 12 factor: 2
	334511.689000	0.071000	5354	1	0	- found_a_factor	searching_for: 6 factor: 2
	334511.750000	0.061000	5354	1	0	- found_a_factor	searching_for: 3 factor: 3
	334511.808000	0.058000	5354	1	0	- factor_end	

Looking at the tnfdump display, you can see how long it takes to find a factor by subtracting the factor_start time from the factor_end time. Factoring 43645729 took 208857.960000 - 108271.852000, or 100586.11 milliseconds. Factoring 12 took 334511.808000 - 334511.548000, or .260000 milliseconds.

Note -

Results are reported with nanosecond precision for all hardware platforms. Accuracy, however, depends on the hardware platform used.

Kernel Tracing

Starting with the Solaris 2.5 release, the SunOS kernel has a small number of TNF probes built into it. The probes record kernel events such as system calls, thread state transitions, page faults, swapping, and I/O. You can use these probes to obtain detailed traces of kernel activity under your application workloads. The probes have negligible impact on the performance of the running kernel when the probes are disabled.

The method for tracing the kernel is similar to tracing a user-level process, although there are some differences. To understand the following discussion, you should have read the sections "Running prex" and "Reading the Trace File".You need super-user privileges to trace the kernel.

Use the prex utility to control kernel probes. The standard prex commands to list and manipulate probes are available to you, along with commands to set up and manage kernel tracing.

Kernel probes write trace records into a kernel trace buffer. You must copy the buffer into a TNF file for post-processing; use the tnfxtract utility for this.

You use the tnfdump utility to examine a kernel trace file. This is exactly the same as examining a user-level trace file.

The steps you typically follow to take a kernel trace are:

  1. Become superuser (su).

  2. Allocate a kernel trace buffer of the desired size (prex).

  3. Select the probes you want to trace and enable (prex).

  4. Turn kernel tracing on (prex).

  5. Run your application.

  6. Turn kernel tracing off (prex).

  7. Extract the kernel trace buffer (tnfxtract).

  8. Disable all probes (prex).

  9. Deallocate the kernel trace buffer (prex).

  10. Examine the trace file (tnfdump).

    A convenient way to follow these steps is to use two shell windows; run an interactive prex session in one, and run your application and tnfxtract in the other.You might find it simpler to use the ktrace shell script shown in " Shell Script for Kernel Tracing".

Controlling Kernel Tracing (prex)

Start prex on the kernel with the -k flag (make sure you are root). After prex successfully attaches to the kernel, it prompts you for commands.

# prex -k

Note -

Only one prex session can be attached to the kernel at any given time.

Buffer Allocation

The first step in taking a kernel trace is to allocate a kernel trace buffer. The trace buffer is circular, which means that newer data overwrites older data after the buffer fills up.

The default size for the buffer is 384 kilobytes, but you can override this when you allocate the buffer. When deciding on the buffer size, keep in mind that the buffer occupies both physical memory and kernel virtual memory, so it will impact the system you are tracing--the more RAM you have, the less the impact of measurement on your experiment, and the bigger the buffer you can allocate.

Use the buffer command in prex to allocate the buffer. For example, you could allocate a 512 kilobyte buffer as follows:

prex> buffer			      # do we have a buffer?
No trace buffer allocated 	
prex> buffer alloc 512k# allocate 512KB buffer
Buffer of size 524288 bytes allocated 	

Note -

The minimum buffer size is 128 kilobytes; prex allocates a buffer of this size if you request anything smaller.

Selecting and Enabling Probes

Use the standard prex list command to list kernel probes and their attributes. For example, to list the name and keys attributes of the probe named pagein:

prex> list name=pagein keys
name=pagein keys=vm pageio io

To list the name and keys attributes of all probes in the io group:

prex> list name keys probes io 
name=biodone keys=io blockio
name=physio_start keys=io rawio
name=pagein keys=vm pageio io
name=pageout keys=vm pageio io
name=physio_end keys=io rawio
name=strategy keys=io blockio

The next step is to trace and enable the probes you need. For example, to select all probes whose keys specify thread, vm, and io, and prepare them for tracing:

prex> trace thread vm io    	# attach trace function to probes 	
prex> enable thread vm io	   # enable probes

Note -

Unlike user-level probes, kernel probes are not automatically `traced' (in prex terminology) when prex attaches to the kernel. You must explicitly issue the trace and enable commands as shown above.

At this point the relevant probes are ready for tracing, but kernel tracing has not been globally enabled so no trace records are being written.

Process Filtering

If you want to trace all system activity (for example, on a busy server), you can proceed to "Enabling and Disabling Tracing". This is the most common use of kernel tracing, and usually the most informative.

However, you also have the option of restricting trace data generation to selected processes; this can reduce the amount of data you need to collect and analyze.

prex uses two abstractions to allow you to do process filtering.

Use the prex pfilter command to control process filtering.

prex> pfilter             			# are we filtering?
Process filtering is off 	
Process filter set is empty. 	
prex> pfilter add 408     		# add PID 408 to filter set 	
prex> pfilter 	
Process filtering is off 	
Process filter set is {408} 	
prex> pfilter on          		# enable process filter mode 	

Note -

System threads (such as interrupt threads) are treated as belonging to process 0.

Enabling and Disabling Tracing

The final step in starting up tracing is to globally enable kernel tracing. When you do this, the probes you have enabled write records into the kernel trace buffer as they are encountered.

prex> ktrace			        # check tracing status 	
Tracing is off 	
prex> ktrace on			     # enable kernel tracing 	

To trace the kernel behavior of your application (perhaps with user-level tracing enabled as well), start it running at this time. When your application finishes, or when you decide you have enough trace data to sample its behavior, globally disable kernel tracing.

prex> ktrace off		     # disable kernel tracing 	

The trace buffer is still present in the kernel. Copy it out into a TNF file using tnfxtract. The procedure that you follow is explained in the subsection " Extracting Kernel Trace Data (tnfxtract)".

Resetting Kernel Tracing

After you have copied the trace buffer into a TNF file, reset kernel tracing by disabling all probes and deallocating the buffer. This is important because it brings the performance of the kernel back to where it was before you started tracing.

Use the standard prex commands to disable and untrace all probes.

prex> disable $all      # disable all probes 	
prex> untrace $all      # untrace all probes 	

Finally, use the prex buffer command to deallocate the trace buffer.

Caution - Caution -

Do not deallocate the trace buffer until you have copied it out into a trace file. Otherwise, you will lose the trace data you collected for your experiment.

prex> buffer dealloc	   # deallocate buffer 	
buffer deallocated

You can now quit prex and examine the trace data that you have collected.

prex> quit 	

Extracting Kernel Trace Data (tnfxtract)

Use the tnfxtract utility to make a copy (or snapshot) of the active kernel trace buffer into an external TNF trace file. You typically run the utility after tracing has been disabled, although you can also run it concurrently with tracing. The utility ensures that it reads and writes only consistent TNF data.

Specify a file name that tnfxtract can use to hold the extracted kernel trace data. This file will be overwritten and truncated to the size of the kernel trace buffer. For example, to extract it into a temporary file named ktrace.tnf:

# tnfxtract /tmp/ktrace.tnf 	
# ls -l /tmp/ktrace.tnf 	
-rw------- 1 root other 524288 Aug 15 16:00 /tmp/ktrace.tnf 	

You might find it convenient to change the permissions on the trace file so that it is world-readable. This makes it easier for you to run analysis tools over the file when you are not superuser.

Note -

You can also use tnfxtract on a kernel crash dump; see tnfxtract(1) for details.

Examining Kernel Trace Data (tnfdump)

Use tnfdump to get a time-ordered ASCII dump of a kernel trace file on stdout (exactly as in user-level tracing). Because the trace output might be very large, you typically run it through a pager (such as more) or redirect it to an output file.

# tnfdump /tmp/ktrace.tnf | more

Note -

tnfdump accepts multiple TNF files as input; it produces a time-sorted output of all event records in its input files, so you can use it to combine multiple user-level trace files along with the kernel trace file.

Available Kernel Probes (tnf_probes)

The SunOS kernel probes supply information approximately at the level of statistics-based Solaris performance monitoring tools such as vmstat, iostat, and sar. However, they provide more detail -- each probe records a high-resolution timestamp, a thread (LWP, thread, and process identifiers), a CPU, and individual probes identify the system resources (such as disks, files, CPUs, and so on) associated with an event.

You can use kernel tracing, along with user-level tracing, to correlate the events recorded by your application or library probes with the events recorded by the kernel probes. Thus you can get a detailed view of how your code uses kernel services, and how the demands your application places on system resources interact with the demands placed by other clients.

The text below summarizes the SunOS kernel probes; see the tnf_probes(4) manual page for detailed information on the event record fields.

Thread Probes

The thread_create probe traces kernel thread creation. It records the process identifier, the kernel thread identifier, and the kernel address of the start routine of the new thread.

The thread_exit probe records the termination of the current thread.

Probes that trace thread-state transitions are called microstate probes. They record a thread state and, optionally, a thread identifier. If a thread identifier is provided, the state change applies to the indicated thread. Otherwise, it applies to the writing thread.

Thread state values use the microstate constants defined in <sys/msacct.h>. The states are recorded as shown in Table 1-7.

Table 1-7 Thread Microstate Constants




Running in user mode 


Running in system mode 


Initial state for user text fault 


Initial state for user data fault 


Initial state for other trap 


Asleep, waiting for user-mode lock 


Asleep for any other reason 


Waiting for a CPU (runnable) 


Stopped (/proc, jobcontrol, lwp_stop)

Note -

The kernel does not trace transitions between the system and user states that are implied by system calls. This is to reduce trace volume. You need to enable the system call probes to get this information; for convenience, they are automatically enabled for you when you enable probes with the thread key.

System Call Probes

System call entry and exit probes identify the system operations explicitly requested by user code.

The syscall_start probe marks the start of a system call, and records the system call number. The probe does not capture system call arguments, because this is fairly expensive. (Some of this information can be captured by interposing on the C library entry points at the user level.) The syscall_start probe also implicitly marks the current thread as entering the system state.

The syscall_end probe marks the end of a system call, and records the two return values of the call, as well as the errno value. The syscall_end probe also implicitly marks the current thread as entering the user state.

Note -

System call implementation at this level can change from release to release. Do not rely on a consistent mapping of system calls to numbers.

VM Probes

The virtual memory subsystem (VM) probes provide information on page faults, page I/O, the page daemon, and the swapper.

Page Faults

Page fault probes relate virtual addresses with fault types and with files (vnodes).

The address_fault probe traces address-space faults; it records the faulting virtual address, the type of fault, and the desired access protection.

The fault type and access type values use the constants defined in <vm/seg_enum.h>. Fault types are invalid page (minor fault), protection fault, or software requests to lock and unlock pages for physical I/O (softlock and softunlock). Access types are read, write, execute and create.

The major_fault probe traces major page faults; it records the vnode and offset (which together identify a file system page) from which the fault should be resolved. This data can be correlated with the immediately preceding address_fault event for the current thread to obtain the faulting virtual address.

The anon_private probe traces copy-on-write faults.

The anon_zero probe traces zero-fill faults.

The page_unmap probe marks the dissociation between a physical page and a file system page (for example, when a page is renamed or destroyed).

Page I/O

The pagein probe traces the initiation of a pagein request; it records the vnode, offset, and size of the pagein.

The pageout probe traces the completion of a pageout request; it records the number of pages paged out, the number of pages freed, and the number of pages reclaimed after the pageout.

Page Daemon

Iterations of the page daemon (page stealer) are traced by two probes: pageout_scan_start and pageout_scan_end. The probes report the number of free pages needed before the scan, the number of pages examined during the scan, and the free page counts before and after the scan. Potentially, more pages are freed when pageout requests queued by the scan are completed.


Three probes trace the activity of the swapper.

Local I/O Probes

The strategy probe traces the initiation of local block device I/O by the kernel. It records the device number, logical block number, size, buffer pointer, and buffer flags associated with the transfer. The flag values are the buffer status flags as defined in <sys/buf.h>.

The biodone probe traces the completion of a buffered I/O transfer, that is, calls to the kernel biodone(9f) routine. It records the device number, logical block number, and buffer pointer associated with the transfer.

Physical (raw) I/O is traced by two probes in physio(9f): physio_start and physio_end. These probes record the device number, offset, size, and direction of the I/O transfer.

Other Probes

The thread_queue probe traces thread scheduling; it records the thread identifier of the scheduled thread, the CPU associated with the dispatch queue on which it is placed, the thread's dispatch priority, and the current number of runnable threads on the dispatch queue.

Shell Script for Kernel Tracing

Example 1-3 ktrace Script

# ktrace
#	Reset and enable kernel tracing
#	Run command in background, if specified
#	Sleep for duration, if specified
#	Wait for command completion, timeout, or keyboard interrupt
#	(Note: keyboard interrupt kills command)
#	Disable kernel tracing
#	Extract kernel trace buffer
#	Reset kernel tracing, if specified



# usage message
usage() {
	echo ""
	echo $1
	echo "
Usage: ktrace	[-o <output_file>]	# default /tmp/ktrace.tnf
		[-s <buffer_size>]	# default 512k
		-e <events>		# kernel probes (keys) to enable
		[-r]			# don't reset kernel tracing
					# default is to reset after command
		<cmd> | -t <seconds>

	# ktrace -e `thread vm io' -t 10
	# ktrace -e `thread' -s 256k myapp ...
	exit 1
# failure message
fail() {
	while [ $# -gt 0 ]
		echo $1
	echo "ktrace failed"
	exit 2

# Reset kernel tracing
ktrace_reset() {
	if [ $1 = "y" ]; then
		echo "Resetting kernel tracing"
		prex -k >/dev/null 2>&1 <<-EOF
			ktrace off
			untrace /.*/
			disable /.*/
			buffer dealloc
		test $? -ne 0 && fail "Could not reset kernel tracing" \
			"`su root' and retry"

# Enable kernel tracing
ktrace_on() {
	echo "Enabling kernel tracing"
	prex -k >/dev/null 2>&1 <<-EOF
		buffer alloc $buffer_size
		trace $events
		enable $events
		ktrace on
	test $? -ne 0 && fail "Could not enable kernel tracing" \
		"Check syntax of `-e' argument" \
		"Check buffer size is not too high"

# Disable kernel tracing
ktrace_off() {
	prex -k >/dev/null 2>&1 <<-EOF
		ktrace off
	test $? -ne 0 && fail "Could not disable kernel tracing"
	echo "Kernel tracing disabled"

# Extract kernel trace buffer
ktrace_xtract() {
	echo "Extracting kernel trace buffer"
	tnfxtract $output_file || fail "Could not extract kernel trace
	ls -l $output_file

# Run command, sleep for duration, or wait for command to complete
run_command() {
	trap `interrupt' 0 1 2 3 15
	if [ "$command" ]; then
		$command &
		echo "`$command' pid is $child"

	if [ "$duration" ]; then
		sleep $duration &
		wait $alarm
		# XXX test -z "$child" || kill -15 $child
		wait $child
	trap 0 1 2 3 15

# Keyboard interrupt
interrupt() {
	test -z "$alarm" || kill -15 $alarm
	test -z "$child" || kill -15 $child

# Parse options
while getopts o:s:t:e:r opt
	case $opt in
	 o)	output_file="$OPTARG";;
	 s)	buffer_size="$OPTARG";;
	 t)	duration="$OPTARG";;
	 e)	events="$OPTARG";;
	 r)	do_reset="n";;
	 \?)	usage;;

shift `expr $OPTIND - 1`

# Get command to run
test $# -gt 0 && command="$*"

# Check and normalize options
test -z "$events" && usage "No kernel events specified"
test -z "$command" && test -z "$duration" && \
	usage "No command or time duration specified"

# Perform experiment
ktrace_reset y				# Reset kernel tracing
ktrace_on				# Enable kernel tracing
run_command				# Run command, wait, or sleep
ktrace_off				# Disable kernel tracing
ktrace_xtract				# Extract buffer
ktrace_reset $do_reset			# Reset kernel tracing again
exit 0

Advanced Topics

Inserting Probe Points

Insert probe points in your code to:

Using the TNF_PROBE Macros

In the simplest case, TNF_PROBE_0, you give no argument types:

TNF_PROBE_0 (name, keys, detail);

The variables are:

Table 1-8 gives the values that are matched on for the command shown above:

Table 1-8 Examples of User-Defined Attributes



Values prex matches on


entering function A

entering or function or A


no file

no or file



/.*/ (regular expression)


red blue

red or blue

libtnfprobe reserves all attribute names that are not prefixed by a vendor symbol (it reserves all attributes that do not have the % character in them). The code for cookie.c in "A Sample C Program" contains the following use of TNF_PROBE_0:

TNF_PROBE_0(start, "cookie main", "sunw%debug starting main");

Note -

Compiling with the preprocessor option -DNPROBE (see cc(1)), or with the preprocessor control statement #define NPROBE ahead of the #include <tnf/probe.h> statement, stops probe points as well as TNF type extension code from being compiled into the program.


The numbers 1 through 5 in the argument names are used here to illustrate the number of variables you give to the probe point. For example, the syntax for TNF_PROBE_1 is:

TNF_PROBE_1(name, keys, detail,
arg_type_1, arg_name_1, arg_value_1);

and the syntax for TNF_PROBE_5 is:

TNF_PROBE_5(name, keys, detail,
arg_type_1, arg_name_1, arg_value_1
arg_type_2, arg_name_2, arg_value_2
arg_type_3, arg_name_3, arg_value_3
arg_type_4, arg_name_4, arg_value_4
arg_type_5, arg_name_5, arg_value_5);

The arguments are:

Table 1-9 Predefined Types


Associated C Type and Semantics 


int, long 


unsigned int, unsigned long 


long long (if implemented in compilation system) 


unsigned long long (if implemented in compilation system) 






char * 


void * 

For example, the cookie.c program on "A Sample C Program" uses TNF_PROBE_2 as follows:

TNF_PROBE_2(inloop, "cookie main loop","sunw%debug in the loop",
				tnf_long, loop_count, 		i,
				tnf_long, total_iterations, 	sum);

Table 1-10 explains some of the macro definitions in cookie.c.

Table 1-10 TNF Macro Definitions in cookie.c


A probe with no argument types 


The name of the probe 

"cookie main", 

The list of groups the probe belongs to cookie and main (the values of the keys attribute)

"sunw%debug starting main"); 

User-defined attribute=sunw%debug; value=starting main (used by debug probe function)


A probe with two variables 


The name of the probe 

"cookie main loop", 

The keys - cookie, main, and loop

"sunw%debug in the loop", 

Values for the debug probe function 


The type of the first variable 


The name of the first variable (value of the slots attribute)


The first variable 


The type of the second variable 


The name of the second variable (value of the slots attribute)


The second variable 



Example--Timing Functions

In Example 1-4, probe points are placed at the entry and exit of a function to see how much time is spent in the function. The probe at the function entry also logs the arguments to the function.

When prex encounters a probe point at run time that is enabled for tracing, it writes a record to the trace file. Each probe point logs the time when it was encountered and also references a tag record containing information like the file name, line number, name, keys, and detail of the probe point. These tag records are written only once, and are never overwritten in the trace file.

The first probe point, work_args, also logs the value of the two arguments of the probe point (state and message).

Example 1-4 Probe Points at Entry and Exit of Function

#include <tnf/probe.h>

work(int state, char *message)
    TNF_PROBE_2(work_start, "work_module work"
               "sunw%debug in function work",
               tnf_long, int_input, state,
               tnf_string, string_input, message);
    TNF_PROBE_0(work_end, "work_module work", "");

Defining User Types for Probe Points

To trace a structure in your program, define a new type with the TNF_DECLARE_RECORD and TNF_DEFINE_RECORD_n macros. These are parts of a compile time interface for extending the types sent in to probe points.

TNF_DECLARE_RECORD(c_type, tnf_type);
TNF_DEFINE_RECORD_1(c_type, tnf_type, 
TNF_DEFINE_RECORD_2(c_type, tnf_type,  
TNF_DEFINE_RECORD_3(c_type, tnf_type, 
TNF_DEFINE_RECORD_4(c_type, tnf_type, 
TNF_DEFINE_RECORD_5(c_type, tnf_type, 

Create only one TNF_DECLARE_RECORD and one TNF_DEFINE_RECORD for each new type you define. The TNF_DECLARE_RECORD should precede the TNF_DEFINE_RECORD. It can be in a header file that multiple source files share if those source files need to use the tnf_type being defined. The TNF_DEFINE_RECORD should appear in only one of the source files.

The TNF_DEFINE_RECORD macro interface defines a function as well as several data structures. Therefore, use this interface in a source file (.c file or .cc file) at file scope and not inside a function.

Note -

Do not put a semicolon after the TNF_DEFINE_RECORD statement; it will generate a compiler warning.

The variables are:

Examples--Defining TNF Types

Example 1-5 shows how a new TNF type is defined and used in a probe.

Example 1-5 is assumed to be part of a fictitious library called that uses the prefix pal for all its symbols.

Example 1-5 Defining a new TNF type

#include <tnf/probe.h>

typedef struct pal_header {
        long    size;
        char * descriptor;
        struct pal_header *next;
} pal_header_t;

TNF_DECLARE_RECORD(pal_header_t, pal_tnf_header);
TNF_DEFINE_RECORD_2(pal_header_t, pal_tnf_header,
                        tnf_long,   size,
                        tnf_string, descriptor)

 * Note: name space prefixed by pal_tnf_header should not be  * used by this client any more.

pal_free(pal_header_t *header_p)
        int state;

        TNF_PROBE_2(pal_free_start, "palloc pal_free",
                "sunw%debug entering pal_free",
                tnf_long,       state_var, state,
                pal_tnf_header, header_var, header_p);
        . . .

It is possible to make a tnf_type definition recursive or mutually recursive, such as in a structure that uses the next field to point to itself (a linked list).

When such a structure is sent in to a TNF_PROBE, then the entire linked list is logged to the trace file (until the next field is NULL). But, when the list is circular, it results in an infinite loop. To break the recursion, either omit the next field from the tnf_type, or define the type of the next member as tnf_opaque.

Performance Issues

Don't place probe points in sections of code that are traversed frequently, as in a mutex lock that is used often.

Estimate about 30 words of working set memory (10 words data and 20 words text) for each probe and about 200ns for each disabled probe on a SPARCStation10. You can control the performance degradation of the application by controlling the number and placement of probes.

If you are shipping a library with probe points, it is important to run benchmarks to ensure that the performance is still at an acceptable level. Reduce the number of probes or change their positions to increase performance.


dbx, truss, and prex all use /proc to control the target process. /proc allows only one client to control a target process safely. Because of this, you cannot run programs like dbx and prex simultaneously on the same target program. If you try to run prex on a target while dbx or truss is running on the same target, prex displays the message "Cannot attach to target."

You can, however, interleave prex and dbx execution by following these steps:

  1. Start prex.

  2. Set up the state of the probe points.

  3. Give the command quit suspend.

  4. Start dbx.

  5. Attach to the suspended program.

    The target will not execute any code between prex and dbx.

    You can also suspend the target by sending it a SIGSTOP signal, then type "quit resume" to prex. If you do this, you should also send a SIGCONT signal after invoking dbx on the stopped process (or else dbx hangs).

dlopen() and dlclose() and History

Probes in shared objects that are brought in by a dlopen(3X) are automatically set up according to the prex command history. When a shared object is removed by a dlclose(3X), prex refreshes its understanding of the probes in the target program. This implies that there is more work to do for dlopen and dlclose, so they take slightly longer.

If you are not interested in this feature and don't want dlopen and dlclose to be perturbed, detach prex from the target.


prex does not interfere with signals that are delivered directly to the target program. However, prex receives all terminal-generated signals, such as Control-c (SIGINT) and Control-z (SIGSTOP), and does not forward them to the target program.

Use the kill(1) command from a shell to signal the target program.

Failure of Event-Writing Operations

A few failure points, like system call failures, are possible when writing events to trace files. These failures result in a failure code being set in the target process. The target process continues normally (but no trace records are written).

Whenever a user types Control-c to prex to get to a prex prompt, prex checks the failure code in the target and informs the user if there was a tracing failure.

Target Executing a fork() or exec()

If your program does a fork(), any probes that the child encounters are logged to the same trace file. Events are annotated with a process ID, so it is possible to determine which process a particular event came from.

A thread in a multithreaded program doing a fork while the other threads are still running can cause a race conditon. For the trace file to stay uncorrupted, make sure that the other threads are quiescent when doing a fork, or else use fork1(2).

If the target program itself (not any children it might fork) does an exec(2), prex detaches from the target and exits. The user can reconnect prex with:

$ prex -p pid