Analyzing Program Performance with Sun WorkShop HomeContentsPreviousNextIndex


Appendix A

Traditional Profiling Tools

The tools discussed in this appendix are standard utilities for timing programs and obtaining performance data to analyze. The profiling tools prof and gprof are provided with Solaris versions 2.6, 7, and 8 of the Solaris Operating Environment SPARC Platform Edition and Solaris Intel Platform Edition. tcov is a code coverage tool.


Note – If you want to track execution counts (how many times a function is called, how often a line of source code executes), use the traditional profiling tools. If you want a detailed analysis of where your program is spending time, you can get more accurate information using the Sampling Collector and Analyzer. See Chapter 3 and Chapter 4 for information on using these applications.

Not all the traditional profiling tools work on modules written in programming languages other than C. See the sections on each tool for more information about languages.

This chapter covers the following topics:

Basic Concepts

prof, gprof, and tcov extend the Sun WorkShop development environment by enabling you to collect and use performance data.

TABLE A-1 describes the information generated by these standard performance profiling tools.

TABLE A-1   Performance Profiling Tools 
Command Output
prof Generates a statistical profile of the CPU time used by a program, along with an exact count of the number of times each function is entered. This tool is included with the Solaris operating environment.
gprof Generates a statistical profile of the CPU time used by a program, along with an exact count of the number of times each function is entered and the number of times each arc (caller-callee pair) in the program's call graph is traversed. This tool is included with the Solaris operating environment.
tcov Generates exact counts of the number of times each statement in a program is executed. There are two versions of tcov: the original tcov and an enhanced tcov. They differ in the runtime support that generates the raw data used in the output.


Using prof to Generate a Program Profile

prof generates a statistical profile of the CPU time used by a program and counts the number of times each function in a program is entered. Different, or more detailed data, is provided by the gprof call-graph profile and the tcov code coverage tools.

Using prof involves three basic steps:

  1. Compiling the program for prof

  2. Running the program to produce a profile data file

  3. Using prof to generate a report that summarizes the data

To compile a program for profiling with prof, use the -p option to the compiler. For example, to compile a C source file named index.assist.c for profiling, you use this compiler command:

% cc -p -o index.assist index.assist.c 

The compiler produces a program called index.assist.

Now you run the index.assist program. Each time you run the program, profiling data is sent to a profile file called mon.out. Every time you run the program a new mon.out file is created that overwrites the old version.

Finally, you would use the prof command to generate a report:

% index.assist 
% ls mon.out 
mon.out
% prof index.assist

Output Example

The prof output resembles this example.

%Time
Seconds
Cumsecs
#Calls
msecs/
call
Name 
19.4
3.28
3.28
11962
0.27
compare_strings 
15.6
2.64
5.92
32731
0.08
_strlen 
12.6
2.14
8.06
4579
0.47
__doprnt 
10.5
1.78
9.84
 
mcount
9.9
1.68
11.52
6849
0.25
_get_field 
5.3
0.90
12.42
762
1.18
_fgets 
4.7
0.80
13.22
19715
0.04
_strcmp 
4.0
0.67
13.89
5329
0.13
_malloc 
3.4
0.57
14.46
11152
0.05
_insert_index_entry 
3.1
0.53
14.99
11152
0.05
_compare_entry 
2.5
0.42
15.41
1289
0.33
lmodt 
0.9
0.16
15.57
761
0.21
_get_index_terms 
0.9
0.16
15.73
3805
0.04
_strcpy 
0.8
0.14
15.87
6849
0.02
_skip_space 
0.7
0.12
15.99
13
9.23
_read 
0.7
0.12
16.11
1289
0.09
ldivt 
0.6
0.10
16.21
1405
0.07
_print_index 
.  
.
.  (The rest of the output is insignificant) 


Sample prof Output

The sample display shows that most of the program execution time is spent in the compare_strings() routine; after that, most of the CPU time is spent in the _strlen() library routine. To make improvements to this program, concentrate on the compare_strings() function.

The results of the profiling sample run are listed under these column headings:

What results can be derived from the profile data? The compare_strings() function consumes nearly 20% of the total CPU time. To improve the runtime of index.assist, you can either improve the algorithm that compare_strings() uses, or cut down the number of calls to compare_strings().

It is not obvious from the flat call graph that compare_strings() is heavily recursive, but you can deduce this by using the call graph profile described in Using gprof to Generate a Call Graph Profile. In this particular case, improving the algorithm also reduces the number of calls.


Note – For Solaris 7 and 8 platforms, the profile of CPU time is accurate for programs that use multiple CPUs, but the fact that the counts are not locked may affect the accuracy of the counts for functions.

Using gprof to Generate a Call Graph Profile

While the flat profile from prof can provide valuable data for performance improvements, a more detailed analysis can be obtained by using a call graph profile to display a list identifying which modules are called by other modules, and which modules call other modules. Sometimes removing calls altogether can result in performance improvements.


Note – gprof allocates the time within a function to the callers in proportion to the number of times that each arc is traversed. Because all calls are not equivalent in performance, this behavior might lead to incorrect assumptions. See "The gprof Fallacy" on page 19 for an example.

Like prof, gprof generates a statistical profile of the CPU time that is used by a program and it counts the number of times that each function is entered. gprof also counts the number of times that each arc in the program's call graph is traversed. An arc is a caller-callee pair.


Note – For Solaris 7 and 8 platforms, the profile of CPU time is accurate for programs that use multiple CPUs, but the fact that the counts are not locked may affect the accuracy of the counts for functions.

Using gprof involves three basic steps:

  1. Compiling the program for gprof

  2. Running the program to produce a profile data file

  3. Using gprof to generate a report that summarizes the data

To compile the program for call graph profiling, use the -xpg option for the C compiler or the -pg option for the Fortran compiler. For example:

% cc -xpg -o index.assist index.assist.c 

Now run the index.assist program. Each time you run a program compiled for gprof, call-graph profile data is sent to a file called gmon.out. This file is overwritten each time you run the program.

Finally, use the gprof command to generate a report of the results of the profile. The output from gprof can be large. You might find the report easier to read if you redirect it to a file. To redirect the output from gprof to a file named g.output, use this sequence of commands:

% index.assist 
% ls gmon.out 
gmon.out 
% gprof index.assist > g.output

The output from gprof consists of two major items:

The output from gprof contains an explanation of what the various parts of the summary mean. gprof also identifies the granularity of the sampling:

granularity: each sample hit covers 4 byte(s) for 0.07% of 14.74 
seconds

The "4 bytes" means resolution to a single instruction. The "0.07% of 14.74 seconds" means that each sample, representing ten milliseconds of CPU time, accounts for 0.07% of the run.

The following example is part of the call graph profile.





called/total parents

index %time self descendants called+self name index




called/total children

-----------------------------------------------


0.00 14.47 1/1 start [1]
[2] 98.2 0.00 14.47 1 _main [2]


0.59 5.70 760/760 _insert_index_entry [3]


0.02 3.16 1/1 _print_index [6]


0.20 1.91 761/761 _get_index_terms [11]


0.94 0.06 762/762 _fgets [13]


0.06 0.62 761/761 _get_page_number [18]


0.10 0.46 761/761 _get_page_type [22]


0.09 0.23 761/761 _skip_start [24]


0.04 0.23 761/761 _get_index_type [26]


0.07 0.00 761/820 _insert_page_entry [34]
-----------------------------------------------




10392 _insert_index_entry [3]


0.59 5.70 760/760 _main [2]
[3] 42.6 0.59 5.70 760+10392 _insert_index_entry [3]


0.53 5.13 11152/11152 _compare_entry [4]


0.02 0.01 59/112 _free [38]


0.00 0.00 59/820 _insert_page_entry [34]




10392 _insert_index_entry [3]
-----------------------------------------------


If you assume that there are 761 lines of data in the input file to the index.assist program, you can conclude:

Using tcov for Statement-Level Analysis

tcov gives line-by-line information on how a program executes. It produces a copy of the source file, annotated to show which lines are used and how often. It also summarizes information about basic blocks. tcov does not product any time-based data.

Using tcov involves three basic steps:

  1. Compiling the program to produce a tcov experiment

  2. Running the experiment

  3. Using tcov to create summaries of execution counts for each statement in the program

Compiling for tcov

To compile a program for code coverage, use the -xa option to the C compiler. For example, you compile a program named index.assist.c for use with tcov with this command:

% cc -xa -o index.assist index.assist.c

Use the -a compiler option instead of -xa with the C++ or Fortran compilers.

The C compiler generates an index.assist.d file, containing database entries for the basic blocks present in index.assist.c. When the program index.assist is run to completion, the compiler updates the index.assist.d file.


Note – Although tcov works with both C and C++ programs, it does not support files that contain #line or #file directives. tcov does not enable test coverage analysis of the code in the #include header files. Applications compiled with -xa (C), -a (other compilers), and +d (C++) run more slowly than they normally would. The +d option inhibits expansion of C++ inline functions, and updating the .d file for each execution takes considerable time.

The index.assist.d file is created in the directory specified by the environment variable TCOVDIR. If TCOVDIR is not set, index.assist.d is created in the current directory.

Having compiled index.assist.c, you can run index.assist:

% index.assist 
% ls *.d
index.assist.d

Now you can run tcov to produce a file containing the summaries of execution counts for each statement in the program. tcov uses the index.assist.d file to generate an index.assist.tcov file containing an annotated list of your code. The output shows the number of times each source statement is executed. At the end of the file, there is a short summary.

% tcov index.assist.c 
% ls *.tcov
index.assist.tcov

This small fragment of the C code from one of the modules of index.assist shows the insert_index_entry() function, which is called so recursively.

	 	 	 struct index_entry * 
11152	 	 ->	 insert_index_entry(node, entry) 
	 	 	 struct index_entry *node; 
	 	 	 struct index_entry *entry; 
	 	 	 { 
	 	 	 	 int result; 
	 	 	 	 int level; 
 
	 	 	 	 result = compare_entry(node, entry); 
	 	 	 	 if (result == 0) {	 	 	 	 	 	 	 	 /* exact match */ 
	 	 	 	 	 	 	 	 	 	 	 	 	 /* Place the page entry for the duplicate */ 
	 	 	 	 	 	 	 	 	 	 	 	 /* into the list of pages for this node */ 
59	 	 ->	 	 	 	 insert_page_entry(node, entry->page_entry); 
	 	 	 	 	 	 free(entry); 
	 	 	 	 	 	 return(node); 
	 	 	 	 } 
 
11093	 	 ->	 	 if (result > 0)	 	 	 	 	 	 /* node greater than new entry -- */ 
	 	 	 	 	 	 	 	 	 	 	 /* move to lesser nodes */ 
3956	 	 ->	 	 	 	 if (node->lesser != NULL) 
3626	 	 ->	 	 	 	 	 	 insert_index_entry(node->lesser, entry); 
	 	 	 	 	 	 else { 
330	 	 ->	 	 	 	 	 	 node->lesser = entry; 
	 	 	 	 	 	 	 	 return (node->lesser); 
	 	 	 	 	 	 }
	 	 	 	 	 else	 	 	 	 	 /* node less than new entry -- */ 
	 	 	 	 	 	 	 	 	 	 /* move to greater nodes */ 
7137	 	 ->	 	 	 	 if (node->greater != NULL) 
6766	 	 ->	 	 	 	 	 	 insert_index_entry(node->greater, entry); 
	 	 	 	 	 	 else { 
371	 	 ->	 	 	 	 	 	 node->greater = entry; 
	 	 	 	 	 	 	 	 return (node->greater); 
	 	 	 	 	 	 } 
	 	 	 } 

The numbers to the left of the C code show how many times each statement was executed. The insert_index_entry function is called 11,152 times.

tcov places a summary like the following at the end of the annotated program listing for index.assist.tcov:

  Top 10 Blocks 
Line
 Count 
240
 21563 
241
 21563 
245
 21563 
251
 21563 
250
 21400 
244
 21299 
255
 20612 
257
 16805 
123
 12021 
124
 11962 
77
Basic blocks in this file 
55
Basic blocks executed
71.43
Percent of the file executed
439144
Total basic block executions
5703.17
Average executions per basic block


A program compiled for code coverage analysis can be run multiple times (with potentially varying input); tcov can be used on the program after each run to compare behavior.

Creating tcov Profiled Shared Libraries

It is possible to create a tcov profiled shareable library and use it in place of one where binaries have already been linked. Include the -xa (C) or -a (other compilers) option when creating the shareable libraries. For example:

% cc -G -xa -o foo.so.1 foo.o 

This command includes a copy of the tcov profiling subroutines in the shareable libraries, so that clients of the library do not need to relink. If a client of the library is also linked for profiling, then the version of the tcov subroutines used by the client is used to profile the shareable library.

Locking Files

tcov uses a simple file-locking mechanism for updating the block coverage database in the .d files. It employs a single file, tcov.lock, for this purpose. Consequently, only one executable compiled with -xa (C) or -a (other compilers) should be running on the system at a time. If the execution of the program compiled with the -xa (or -a) option is manually terminated, then the tcov.lock file must be deleted manually.

Files compiled with the -xa or -a option call the profiling tool subroutines automatically when a program is linked for tcov profiling. At program exit, these subroutines combine the information collected at runtime for file xyz.f (for example) with the existing profiling information stored in file xyz.d. To ensure this information is not corrupted by several people simultaneously running a profiled binary, a xyz.d.lock lock file is created for xyz.d for the duration of the update. If there are any errors in opening or reading xyz.d or its lock file, or if there are inconsistencies between the runtime information and the stored information, the information stored in xyz.d is not changed.

An edit and recompile of xyz.d may change the number of counters in xyz.d. This is detected if an old profiled binary is run.

If too many people are running a profiled binary, some of them cannot obtain a lock. An error message similar to the following is displayed after a delay of several seconds:

tcov_exit: Failed to create lock file '/tmp_mnt/net/rbbb/export/
home/src/newpattern/foo.d.lock' for coverage data file '/tmp_mnt/
net/rbbb/export/home/src/newpattern/foo.d' after 5 tries. Is 
somebody else running this binary?

The stored information is not updated. This locking is safe across a network. Since locking is performed on a file-by-file basis, other files may be correctly updated.

The profiling subroutines attempt to deal with automounted file systems that have become inaccessible. They still fail if the file system containing a coverage data file is mounted with different names on different machines, or if the user running the profiled binary does not have permission to write to either the coverage data file or the directory containing it. Be sure all the directories are uniformly named and writable by anyone expected to run the binary.

Errors Reported by tcov Runtime Routines

The following error messages may be reported by the tcov runtime routines:

Using tcov Enhanced for Statement-Level Analysis

Like the original tcov, tcov Enhanced gives line-by-line information on how a program executes. It produces a copy of the source file, annotated to show which lines are used and how often. It also gives a summary of information about basic blocks. tcov Enhanced works with both C and C++ source files.

Advantages of tcov Enhanced

tcov Enhanced overcomes some of the shortcomings of the original tcov:

Compiling for tcov Enhanced

To use tcov Enhanced, follow the same basic steps as the original tcov:

  1. Compile a program for a tcov Enhanced experiment.

  2. Run the experiment.

  3. Analyze the results using tcov.

For the original tcov, you compile with the -xa option. To compile a program for code coverage with tcov Enhanced, you use the -xprofile=tcov option for all compilers. Using a program named index.assist as an example, you would compile for use with tcov Enhanced with this command:

% cc -xprofile=tcov -o index.assist index.assist.c 

tcov Enhanced, unlike tcov, does not produce a .d file. The coverage data file is not created until the program is run. Then one coverage data file is produced as opposed to one file for each module compiled for coverage analysis.

After you compile index.assist.c, you run index.assist to create the profile file:

% index.assist 
% ls -dF *.profile 
index.assist.profile/ 
% ls *.profile 
tcovd

By default, the name of the directory where the tcovd file is stored is derived from the name of the executable. Furthermore, that directory is created in the directory the executable was run in (the original tcov created the .d files in the directory where the modules were compiled).

The directory where the tcovd file is stored is also known as the profile bucket. The profile bucket can be overridden by using the SUN_PROFDATA environment variable. Doing this may be useful if the name of the executable is not the same as the value in argv[0] (for example, the invocation of the executable was through a symbolic link with a different name).

You can also override the directory where the profile bucket is created. To specify a location different from the run directory, specify the path using the SUN_PROFDATA_DIR environment variable. Absolute or relative pathnames can be specified in this variable. Relative pathnames are relative to the program's current working directory at program completion.

TCOVDIR is supported as a synonym for SUN_PROFDATA_DIR for backward compatibility. Any setting of SUN_PROFDATA_DIR causes TCOVDIR to be ignored. If both SUN_PROFDATA_DIR and TCOVDIR are set, a warning is displayed when the profile bucket is generated. SUN_PROFDATA_DIR takes precedence over TCOVDIR. The variables are used at runtime by a program compiled with -xprofile=tcov, and are used by the tcov command.


Note – This scheme is also used by the profile feedback mechanism.

Now that some coverage data has been produced, you could generate a report that relates the raw data back to the source files:

% tcov -x index.profile index.assist.c 
% ls *.tcov 
index.assist.c.tcov

The output of this report is identical to the one from the previous example (for the original tcov).

Creating Profiled Shared Libraries

Creating shared libraries for use with tcov Enhanced is accomplished by using the analogous compiler options:

% cc -G -xprofile=tcov -o foo.so.1 doo.o 

Locking Files

tcov Enhanced uses a simple file-locking mechanism for updating the block coverage data file. It employs a single file created in the same directory as the tcovd file. The file name is tcovd.temp.lock. If execution of the program compiled for coverage analysis is manually terminated, then the lock file must be deleted manually.

The locking scheme does an exponential back-off if there is a contention for the lock. If, after five tries, the tcov runtime cannot acquire the lock, it exits, and the data is lost for that run. In this case, the following message is displayed:

tcov_exit: temp file exists, is someone else running this 
executable?

tcov Directories and Environment Variables

When you compile a program for tcov and run the program, the running program generates a profile bucket. If a previous profile bucket exists, the program uses that profile bucket. If a profile bucket does not exist, it creates the profile bucket.

The profile bucket specifies the directory where the profile output is generated. The name and location of the profile output are controlled by defaults that you can modify with environment variables.


Note – tcov uses the same defaults and environment variables that are used by the compiler options that you use to gather profile feedback: -xprofile=collect and -xprofile=use. For more information about these compiler options, see the documentation for your compiler.

The default profile bucket the program creates is named after the executable with a .profile extension and is created in the directory where the executable is run. Therefore, if you are in /home/userdir, and run a program called /usr/bin/xyz, the default behavior is to create a profile bucket called xyz.profile in /home/userdir.

You set the following environment variables to modify the defaults:

Overriding the Default Definitions

To override the default, use the environment variables to change the profile bucket:

1. Change the name of the profile bucket by using the SUN_PROFDATA environment variable.

2. Change the directory where the profile-bucket is placed by using the SUN_PROFDATA_DIR environment variable.

The environment variables override the default location and name of the profile bucket. Both can be overridden independently. For example, if you only choose to set SUN_PROFDATA_DIR, the profile bucket will go into the directory where you set SUN_PROFDATA_DIR. The default name (which is the executable name followed by a .profile extension) will still be the name used for the profile bucket.

Absolute and Relative Pathnames

There are two forms of directories you can specify by using SUN_PROFDATA_DIR on the Profile Feedback compile line: absolute pathnames (which start with a `/'), and relative pathnames. If you use an absolute pathname, the profile bucket is dropped into that directory. If you specify a relative pathname, then it is relative to the current working directory where the executable is being run.

For example, if you are in /home/userdir and run a program called /usr/bin/xyz with SUN_PROFDATA_DIR set to .. , then the profile bucket is called /home/userdir/../xyz.profile. The value specified in the environment variable was relative, and therefore, it was relative to /home/userdir. Also, the default profile bucket name is used, which is named after the executable.

TCOVDIR and SUN_PROFDATA_DIR

The previous version of tcov (enabled by compiling with the -xa or -a flag) used an environment variable called TCOVDIR. TCOVDIR specified the directory where the tcov counter files go to instead of next to the source files. To retain compatibility with this environment variable, the new SUN_PROFDATA_DIR environment variable behaves like the TCOVDIR environment variable. If both variables are set, a warning is output and SUN_PROFDATA_DIR takes precedence over TCOVDIR.

For -xprofile=tcov

By default the profile bucket is called <argv[0]>.profile in the current directory.

If you set SUN_PROFDATA, the profile bucket is called $SUN_PROFDATA, wherever it is located.

If you set SUN_PROFDATA_DIR, the profile bucket is placed in the specified directory.

SUN_PROFDATA and SUN_PROFDATA_DIR are independent. If both are specified, the profile bucket name is generated by using SUN_PROFDATA_DIR to find the profile bucket and, SUN_PROFDATA is used to name the profile bucket in that directory.

A UNIX process can change its current working directory during the execution of a program. The current working directory used to generate the profile bucket is the current working directory of the program at exit. In the rare case where a program actually does change its current working directory during execution, you can use the environment variables to control where the profile bucket is generated.

For the tcov Program

The -xprofile=bucket option specifies the name of the profile bucket to use for the tcov profile. SUN_PROFDATA_DIR or TCOVDIR are prepended to this argument, if they are set.


Sun Microsystems, Inc.
Copyright information. All rights reserved.
Feedback
Library   |   Contents   |   Previous   |   Next   |   Index