-A P P E N D I X

A P P E N D I X A

Profiling Programs With prof, gprof, and tcov

The tools discussed in this appendix are standard utilities for timing programs and obtaining performance data to analyze, and are called traditional profiling tools. The profiling tools prof and gprof are provided with the SolarisOperating System. tcov is a code coverage tool provided with the Suncompilers and tools.

Note - If you want to track how many times a function is called or how often a line of source code is executed, use the traditional profiling tools. If you want a detailed analysis of where your program is spending time, you can get more accurate information using the Collector and Performance Analyzer. See Chapter 3 and the online help for information on using these tools.

TABLE A-1 describes the information that is generated by these standard performance profiling tools.


Command	Output
`prof`	Generates a statistical profile of the CPU time used by a program and an exact count of the number of times each function is entered.
`gprof`	Generates a statistical profile of the CPU time used by a program, along with an exact count of the number of times each function is entered and the number of times each arc (caller-callee pair) in the program's call graph is traversed.
`tcov`	Generates exact counts of the number of times each statement in a program is executed.

Not all the traditional profiling tools work on modules written in programming languages other than C. See the sections on each tool for more information about languages.

This appendix covers the following topics:

Using prof to Generate a Program Profile

Using gprof to Generate a Call Graph Profile

Using tcov for Statement-Level Analysis

Using tcov Enhanced for Statement-Level Analysis

Using `prof` to Generate a Program Profile

prof generates a statistical profile of the CPU time used by a program and counts the number of times each function in a program is entered. Different or more detailed data is provided by the gprof call-graph profile and the tcov code coverage tools.

To generate a profile report using prof:

1. Compile your program with the -p compiler option.

2. Run your program.

Profiling data is sent to a profile file called mon.out. This file is overwritten each time you run the program.

3. Run prof to generate a profile report.

The syntax of the prof command is as follows.

% prof program-name

Here, program-name is the name of the executable. The profile report is written to stdout. It is presented as a series of rows for each function under these column headings:

%Time--The percentage of the total CPU time consumed by this function.

Seconds--The total CPU time accounted for by this function.

Cumsecs--A running sum of the number of seconds accounted for by this function and those listed before it.

#Calls--The number of times this function is called.

msecs/call--The average number of milliseconds this function consumes each time it is called.

Name--The name of the function.

The use of prof is illustrated in the following example.

% cc -p -o index.assist index.assist.c

% index.assist

% prof index.assist

The profile report from prof is shown in the table below:

`%Time`	`Seconds`	`Cumsecs`	`#Calls`	`msecs/call`	`Name`
`19.4`	`3.28`	`3.28`	`11962`	`0.27`	`compare_strings`
`15.6`	`2.64`	`5.92`	`32731`	`0.08`	`_strlen`
`12.6`	`2.14`	`8.06`	`4579`	`0.47`	`__doprnt`
`10.5`	`1.78`	`9.84`			`mcount`
`9.9`	`1.68`	`11.52`	`6849`	`0.25`	`_get_field`
`5.3`	`0.90`	`12.42`	`762`	`1.18`	`_fgets`
`4.7`	`0.80`	`13.22`	`19715`	`0.04`	`_strcmp`
`4.0`	`0.67`	`13.89`	`5329`	`0.13`	`_malloc`
`3.4`	`0.57`	`14.46`	`11152`	`0.05`	`_insert_index_entry`
`3.1`	`0.53`	`14.99`	`11152`	`0.05`	`_compare_entry`
`2.5`	`0.42`	`15.41`	`1289`	`0.33`	`lmodt`
`0.9`	`0.16`	`15.57`	`761`	`0.21`	`_get_index_terms`
`0.9`	`0.16`	`15.73`	`3805`	`0.04`	`_strcpy`
`0.8`	`0.14`	`15.87`	`6849`	`0.02`	`_skip_space`
`0.7`	`0.12`	`15.99`	`13`	`9.23`	`_read`
`0.7`	`0.12`	`16.11`	`1289`	`0.09`	`ldivt`
`0.6`	`0.10`	`16.21`	`1405`	`0.07`	`_print_index`
`.` `.` `.` `(The rest of the output is insignificant)`

The profile report shows that most of the program execution time is spent in the compare_strings() function; after that, most of the CPU time is spent in the _strlen() library function. To make this program more efficient, the user would concentrate on the compare_strings() function, which consumes nearly 20% of the total CPU time, and improve the algorithm or reduce the number of calls.

It is not obvious from the prof profile report that compare_strings() is heavily recursive, but you can deduce this by using the call graph profile described in Using gprof to Generate a Call Graph Profile. In this particular case, improving the algorithm also reduces the number of calls.

Note - For the Solaris Operating System versions 7 and 8, the profile of CPU time is accurate for programs that use multiple CPUs, but the fact that the counts are not locked may affect the accuracy of the counts for functions.

Using `gprof` to Generate a Call Graph Profile

While the flat profile from prof can provide valuable data for performance improvements, a more detailed analysis can be obtained by using a call graph profile to display a list identifying which modules are called by other modules, and which modules call other modules. Sometimes removing calls altogether can result in performance improvements.

Note - gprof attributes the time spent within a function to the callers in proportion to the number of times that each arc is traversed. Because all calls are not equivalent in performance, this behavior might lead to incorrect assumptions. See the performance analyzer tutorial on developers.sun.com for an example.

Like prof, gprof generates a statistical profile of the CPU time that is used by a program and it counts the number of times that each function is entered. gprof also counts the number of times that each arc in the program's call graph is traversed. An arc is a caller-callee pair.

Note - The profile of CPU time is accurate for programs that use multiple CPUs for the Solaris Operating System versions 7 and 8, but the fact that the counts are not locked may affect the accuracy of the counts for functions.

To generate a profile report using gprof:

1. Compile your program with the appropriate compiler option.

For C programs, use the -xpg option.

For Fortran programs, use the -pg option.

2. Run your program.

Profiling data is sent to a profile file called gmon.out. This file is overwritten each time you run the program.

3. Run gprof to generate a profile report.

The syntax of the prof command is as follows.

% gprof program-name

Here, program-name is the name of the executable. The profile report is written to stdout, and can be large. The report consists of two major items:

The full call graph profile, which shows information about the callers and callees of each function in the program. The format is illustrated in the example given below.

The flat profile, which is similar to the summary the prof command supplies.

The profile report from gprof contains an explanation of what the various parts of the summary mean and identifies the granularity of the sampling, as shown in the following example.

granularity: each sample hit covers 4 byte(s) for 0.07% of 14.74 seconds

The 4 bytes means resolution to a single instruction. The 0.07% of 14.74 seconds means that each sample, representing ten milliseconds of CPU time, accounts for 0.07% of the run.

The use of gprof is illustrated in the following example.

% cc -xpg -o index.assist index.assist.c

% index.assist

% gprof index.assist > g.output

The following table is part of the call graph profile.

				`called/total parents`
`index`	`%time`	`self`	`descendants`	`called+self`	`name`	`index`
				`called/total children`
`-----------------------------------------------`
		`0.00`	`14.47`	`1/1`	`start`	`[1]`
`[2]`	`98.2`	`0.00`	`14.47`	`1`	`_main`	`[2]`
		`0.59`	`5.70`	`760/760`	`_insert_index_entry`	`[3]`
		`0.02`	`3.16`	`1/1`	`_print_index`	`[6]`
		`0.20`	`1.91`	`761/761`	`_get_index_terms`	`[11]`
		`0.94`	`0.06`	`762/762`	`_fgets`	`[13]`
		`0.06`	`0.62`	`761/761`	`_get_page_number`	`[18]`
		`0.10`	`0.46`	`761/761`	`_get_page_type`	`[22]`
		`0.09`	`0.23`	`761/761`	`_skip_start`	`[24]`
		`0.04`	`0.23`	`761/761`	`_get_index_type`	`[26]`
		`0.07`	`0.00`	`761/820`	`_insert_page_entry`	`[34]`
`-----------------------------------------------`
				`10392`	`_insert_index_entry`	`[3]`
		`0.59`	`5.70`	`760/760`	`_main`	`[2]`
`[3]`	`42.6`	`0.59`	`5.70`	`760+10392`	`_insert_index_entry`	`[3]`
		`0.53`	`5.13`	`11152/11152`	`_compare_entry`	`[4]`
		`0.02`	`0.01`	`59/112`	`_free`	`[38]`
		`0.00`	`0.00`	`59/820`	`_insert_page_entry`	`[34]`
				`10392`	`_insert_index_entry`	`[3]`
`-----------------------------------------------`

In this example there are 761 lines of data in the input file to the index.assist program. The following conclusions can be made:

fgets() is called 762 times. The last call to fgets() returns an end-of-file.

The insert_index_entry() function is called 760 times from main().

In addition to the 760 times that insert_index_entry() is called from main(), insert_index_entry() also calls itself 10,392 times. insert_index_entry() is heavily recursive.

compare_entry(), which is called from insert_index_entry(), is called 11,152 times, which is equal to 760+10,392 times. There is one call to compare_entry() for every time that insert_index_entry() is called. This is correct. If there were a discrepancy in the number of calls, you would suspect some problem in the program logic.

insert_page_entry() is called 820 times in total: 761 times from main() while the program is building index nodes, and 59 times from insert_index_entry(). This frequency indicates that there are 59 duplicated index entries, so their page number entries are linked into a chain with the index nodes. The duplicate index entries are then freed; hence the 59 calls to free().

Using `tcov` for Statement-Level Analysis

The tcov utility gives information on how often a program executes segments of code. It produces a copy of the source file, annotated with execution frequencies. The code can be annotated at the basic block level or the source line level. A basic block is a linear segment of source code with no branches. The statements in a basic block are executed the same number of times, so a count of basic block executions also tells you how many times each statement in the block was executed. The tcov utility does not produce any time-based data.

Note - Although tcov works with both C and C++ programs, it does not support files that contain #line or #file directives. tcov does not enable test coverage analysis of the code in the #include header files.

To generate annotated source code using tcov:

1. Compile your program with the appropriate compiler option.

For C programs, use the -xa option.

For Fortran and C++ programs, use the -a option.

If you compile with the -a or -xa option you must also link with it. The compiler creates a coverage data file with the suffix .d for each object file. The coverage data file is created in the directory specified by the environment variable TCOVDIR. If TCOVDIR is not set, the coverage data file is created in the current directory.

Note - Programs compiled with -xa (C) or -a (other compilers) run more slowly than they normally would, because updating the .d file for each execution takes considerable time.

2. Run your program.

When your program completes, the coverage data files are updated.

3. Run tcov to generate annotated source code.

The syntax of the tcov command is as follows.

% tcov options source-file-list

Here, source-file-list is a list of the source code filenames. For a list of options, see the tcov(1) man page. The default output of tcov is a set of files, each with the suffix .tcov, which can be changed with the -o filename option.

A program compiled for code coverage analysis can be run multiple times (with potentially varying input); tcov can be used on the program after each run to compare behavior.

The following example illustrates the use of tcov.

% cc -xa -o index.assist index.assist.c

% index.assist

% tcov index.assist.c

This small fragment of the C code from one of the modules of index.assist shows the insert_index_entry() function, which is called recursively. The numbers to the left of the C code show how many times each basic block was executed. The insert_index_entry() function is called 11,152 times.

			struct index_entry *

11152		->	insert_index_entry(node, entry)

			struct index_entry *node;

			struct index_entry *entry;

				int result;

				int level;

				result = compare_entry(node, entry);

				if (result == 0) {						/* exact match */

										/* Place the page entry for the duplicate */

										/* into the list of pages for this node */

59		->				insert_page_entry(node, entry->page_entry);

						free(entry);

						return(node);

11093		->		if (result > 0)						/* node greater than new entry -- */

											/* move to lesser nodes */

3956		->				if (node->lesser != NULL)

3626		->						insert_index_entry(node->lesser, entry);

						else {

330		->						node->lesser = entry;

								return (node->lesser);

					else					/* node less than new entry -- */

										/* move to greater nodes */

7137		->				if (node->greater != NULL)

6766		->						insert_index_entry(node->greater, entry);

						else {

371		->						node->greater = entry;

								return (node->greater);

The tcov utility places a summary like the following at the end of the annotated program listing. The statistics for the most frequently executed basic blocks are listed in order of execution frequency. The line number is the number of the first line in the block.

The following is the summary for the index.assist program:

	`Top 10 Blocks`

	`Line`	`Count`
	`240`	`21563`
	`241`	`21563`
	`245`	`21563`
	`251`	`21563`
	`250`	`21400`
	`244`	`21299`
	`255`	`20612`
	`257`	`16805`
	`123`	`12021`
	`124`	`11962`

`77`	`Basic blocks in this file`
`55`	`Basic blocks executed`
`71.43`	`Percent of the file executed`

	`439144`	`Total basic block executions`
	`5703.17`	`Average executions per basic block`

Creating `tcov` Profiled Shared Libraries

It is possible to create a tcov profiled shareable library and use it in place of the corresponding library in binaries which have already been linked. Include the -xa (C) or -a (other compilers) option when creating the shareable libraries, as shown in this example.

% cc -G -xa -o foo.so.1 foo.o

This command includes a copy of the tcov profiling functions in the shareable libraries, so that clients of the library do not need to relink. If a client of the library is already linked for profiling, then the version of the tcov functions used by the client is used to profile the shareable library.

Locking Files

tcov uses a simple file-locking mechanism for updating the block coverage database in the .d files. It employs a single file, tcov.lock, for this purpose. Consequently, only one executable compiled with -xa (C) or -a (other compilers) should be running on the system at a time. If the execution of the program compiled with the -xa (or -a) option is manually terminated, then the tcov.lock file must be deleted manually.

Files compiled with the -xa or -a option call the profiling tool functions automatically when a program is linked for tcov profiling. At program exit, these functions combine the information collected at runtime for file xyz.f (for example) with the existing profiling information stored in file xyz.d. To ensure this information is not corrupted by several people simultaneously running a profiled binary, a xyz.d.lock lock file is created for xyz.d for the duration of the update. If there are any errors in opening or reading xyz.d or its lock file, or if there are inconsistencies between the runtime information and the stored information, the information stored in xyz.d is not changed.

If you edit and recompile xyz.f the number of counters in xyz.d can change. This is detected if an old profiled binary is run.

If too many people are running a profiled binary, some of them cannot obtain a lock. An error message is displayed after a delay of several seconds. The stored information is not updated. This locking is safe across a network. Since locking is performed on a file-by-file basis, other files may be correctly updated.

The profiling functions attempt to deal with automounted file systems that have become inaccessible. They still fail if the file system containing a coverage data file is mounted with different names on different machines, or if the user running the profiled binary does not have permission to write to either the coverage data file or the directory containing it. Be sure all the directories are uniformly named and writable by anyone expected to run the binary.

Errors Reported by `tcov` Runtime Functions

The following error messages may be reported by the tcov runtime functions:

The user running the binary lacks permission to read or write to the coverage data file. The problem also occurs if the coverage data file has been deleted.

tcov_exit: Could not open coverage data file 'coverage-data-file-name' because 'system-error-message-string'.

The user running the binary lacks permission to write to the directory containing the coverage data file. The problem also occurs if the directory containing the coverage data file is not mounted on the machine where the binary is being run.

tcov_exit: Could not write coverage data file 'coverage-data-file-name' because 'system-error-message-string'.

Too many users are trying to update a coverage data file at the same time. The problem also occurs if a machine has crashed while a coverage data file is being updated, leaving behind a lock file. In the event of a crash, the longer of the two files should be used as the post-crash coverage data file. Manually remove the lock file.

tcov_exit: Failed to create lock file 'lock-file-name' for coverage data file 'coverage-data-file-name' after 5 tries. Is someone else running this executable?

No memory is available, and the standard I/O package will not work. You cannot update the coverage data file at this point.

tcov_exit: Stdio failure, probably no memory left.

The lock file name is longer by six characters than the coverage data file name. Therefore, the derived lock file name may not be legal.

tcov_exit: Coverage data file path name too long (length characters) 'coverage-data-file-name'.

A library or binary that has tcov profiling enabled is simultaneously being run, edited, and recompiled. The old binary expects a coverage data file of a certain size, but the editing often changes that size. If the compiler creates a new coverage data file at the same time that the old binary is trying to update the old coverage data file, the binary may see an apparently empty or corrupt coverage file.

tcov_exit: Coverage data file 'coverage-data-file-name' is too short. Is it out of date?

Using `tcov` Enhanced for Statement-Level Analysis

Like the original tcov, tcov Enhanced gives line-by-line information on how a program executes. It produces a copy of the source file, annotated to show which lines are used and how often. It also gives a summary of information about basic blocks. tcov Enhanced works with both C and C++ source files.

tcov Enhanced overcomes some of the shortcomings of the original tcov. The improved features of tcov Enhanced are:

It provides more complete support for C++.

It supports code found in #include header files and corrects a flaw that obscured coverage numbers for template classes and functions.

Its runtime is more efficient than the original tcov runtime.

It is supported for all the platforms that the compilers support.

To generate annotated source code using tcov Enhanced:

1. Compile your program with the -xprofile=tcov compiler option.

Unlike tcov, tcov Enhanced does not generate any files at compile time.

2. Run your program.

A directory is created to store the profile data, and a single coverage data file called tcovd is created in that directory. By default, the directory is created in the location where you run the program program-name, and it is called program-name.profile. The directory is also known as the profile bucket. The defaults can be changed using environment variables (see tcov Directories and Environment Variables).

3. Run tcov to generate annotated source code.

The syntax of the tcov command is as follows.

% tcov option-list source-file-list

Here, source-file-list is a list of the source code filenames, and option-list is a list of options, which can be obtained from the tcov(1) man page. You must include the -x option to enable tcov Enhanced processing.

The default output of tcov Enhanced is a set of annotated source files whose names are derived by appending .tcov to the corresponding source file name.

The following example illustrates the syntax of tcov Enhanced.

% cc -xprofile=tcov -o index.assist index.assist.c

% index.assist

% tcov -x index.assist.profile index.assist.c

The output of tcov Enhanced is identical to the output from the original tcov.

Creating Profiled Shared Libraries for `tcov` Enhanced

You can create profiled shared libraries for use with tcov Enhanced by including the -xprofile=tcov compiler option, as shown in the following example.

% cc -G -xprofile=tcov -o foo.so.1 foo.o

Locking Files

tcov Enhanced uses a simple file-locking mechanism for updating the block coverage data file. It employs a single file created in the same directory as the tcovd file. The file name is tcovd.temp.lock. If execution of the program compiled for coverage analysis is manually terminated, then the lock file must be deleted manually.

The locking scheme does an exponential back-off if there is a contention for the lock. If, after five tries, the tcov runtime cannot acquire the lock, it exits, and the data is lost for that run. In this case, the following message is displayed.

tcov_exit: temp file exists, is someone else running this executable?

`tcov` Directories and Environment Variables

When you compile a program for tcov and run the program, the running program generates a profile bucket. If a previous profile bucket exists, the program uses that profile bucket. If a profile bucket does not exist, it creates the profile bucket.

The profile bucket specifies the directory where the profile output is generated. The name and location of the profile output are controlled by defaults that you can modify with environment variables.

Note - tcov uses the same defaults and environment variables that are used by the compiler options that you use to gather profile feedback: -xprofile=collect and -xprofile=use. For more information about these compiler options, see the documentation for the relevant compiler.

The default profile bucket is named after the executable with a .profile extension and is created in the directory where the executable is run. Therefore, if you run a program called /usr/bin/xyz from /home/userdir, the default behavior is to create a profile bucket called xyz.profile in /home/userdir.

A UNIX process can change its current working directory during the execution of a program. The current working directory used to generate the profile bucket is the current working directory of the program at exit. In the rare case where a program actually does change its current working directory during execution, you can use the environment variables to control where the profile bucket is generated.

You can set the following environment variables to modify the defaults:

SUN_PROFDATA

Can be used to specify the name of the profile bucket at runtime. The value of this variable is always appended to the value of SUN_PROFDATA_DIR if both variables are set. Doing this may be useful if the name of the executable is not the same as the value in argv[0] (for example, the invocation of the executable was through a symbolic link with a different name).

SUN_PROFDATA_DIR

Can be used to specify the name of the directory that contains the profile bucket. It is used at runtime and by the tcov command.

TCOVDIR

TCOVDIR is supported as a synonym for SUN_PROFDATA_DIR to maintain backward compatibility. Any setting of SUN_PROFDATA_DIR causes TCOVDIR to be ignored. If both SUN_PROFDATA_DIR and TCOVDIR are set, a warning is displayed when the profile bucket is generated.

TCOVDIR is used at runtime and by the tcov command.

Using prof to Generate a Program Profile

Using gprof to Generate a Call Graph Profile

Using tcov for Statement-Level Analysis

Creating tcov Profiled Shared Libraries

Locking Files

Errors Reported by tcov Runtime Functions

Using tcov Enhanced for Statement-Level Analysis

Creating Profiled Shared Libraries for tcov Enhanced

Locking Files

tcov Directories and Environment Variables

Using `prof` to Generate a Program Profile

Using `gprof` to Generate a Call Graph Profile

Using `tcov` for Statement-Level Analysis

Creating `tcov` Profiled Shared Libraries

Errors Reported by `tcov` Runtime Functions

Using `tcov` Enhanced for Statement-Level Analysis

Creating Profiled Shared Libraries for `tcov` Enhanced

`tcov` Directories and Environment Variables