While the flat profile from prof can provide valuable data for performance improvements, a more detailed analysis can be obtained by using a call graph profile to display a list identifying which modules are called by other modules, and which modules call other modules. Sometimes removing calls altogether can result in performance improvements.
gprof allocates the time within a function to the callers in proportion to the number of times each arc is traversed. Because all calls are not equivalent in performance, this behavior might lead to incorrect assumptions.
Like prof, gprof generates a statistical profile of the CPU time used by a program and it counts the number of times each function is entered. gprof also counts the number of times each arc in the program's call graph is traversed. An arc is a caller-callee pair.
For Solaris 2.6 and Solaris 7, the profile of CPU time is accurate for programs that use multiple CPUs, but the fact that the counts are not locked may affect the accuracy of the counts for functions.
Using gprof involves three basic steps:
Compiling the program for gprof
Running the program to produce a profile data file
Using gprof to generate a report that summarizes the data
To compile the program for call graph profiling, use the -xpg option for the C compiler or the -pg option for the Fortran compiler. For example:
% cc -xpg -o index.assist index.assist.c |
Now you could run the index.assist program. Each time that you run a program compiled for gprof, call-graph profile data is sent to a file called gmon.out. This file is recreated each time that you run the program.
Use the gprof command to generate a report of the results of the profile. The output from gprof can be large. You might find the report easier to read if you redirect it to a file. To redirect the output from gprof to a file named /tmp/g.output, you would use this sequence of commands:
% index.assist % ls gmon.outgmon.out % gprof index.assist > /tmp/g.output |
The output from gprof consists of two major items:
The full call graph profile, which shows fragments of output from a profiling run.
The "flat" profile, similar to the summary the prof command supplies.
The output from gprof contains an explanation of what the various parts of the summary mean. gprof also identifies the granularity of the sampling:
granularity: each sample hit covers 4 byte(s) for 0.07% of 14.74 seconds
The "4 bytes" means resolution to a single instruction. The "0.07% of 14.74 seconds" means that each sample, representing ten milliseconds of CPU time, accounts for 0.07% of the run.
This is part of the call graph profile.
|
|
|
|
called/total parents |
|
|
index |
%time |
self |
descendents |
called+self |
name |
index |
|
|
|
|
called/total children |
|
|
----------------------------------------------- |
||||||
|
|
0.00 |
14.47 |
1/1 |
start |
[1] |
[2] |
98.2 |
0.00 |
14.47 |
1 |
_main |
[2] |
|
|
0.59 |
5.70 |
760/760 |
_insert_index_entry |
[3] |
|
|
0.02 |
3.16 |
1/1 |
_print_index |
[6] |
|
|
0.20 |
1.91 |
761/761 |
_get_index_terms |
[11] |
|
|
0.94 |
0.06 |
762/762 |
_fgets |
[13] |
|
|
0.06 |
0.62 |
761/761 |
_get_page_number |
[18] |
|
|
0.10 |
0.46 |
761/761 |
_get_page_type |
[22] |
|
|
0.09 |
0.23 |
761/761 |
_skip_start |
[24] |
|
|
0.04 |
0.23 |
761/761 |
_get_index_type |
[26] |
|
|
0.07 |
0.00 |
761/820 |
_insert_page_entry |
[34] |
----------------------------------------------- |
||||||
|
|
|
|
10392 |
_insert_index_entry |
[3] |
|
|
0.59 |
5.70 |
760/760 |
_main |
[2] |
[3] |
42.6 |
0.59 |
5.70 |
760+10392 |
_insert_index_entry |
[3] |
|
|
0.53 |
5.13 |
11152/11152 |
_compare_entry |
[4] |
|
|
0.02 |
0.01 |
59/112 |
_free |
[38] |
|
|
0.00 |
0.00 |
59/820 |
_insert_page_entry |
[34] |
|
|
|
|
10392 |
_insert_index_entry |
[3] |
----------------------------------------------- |
Assuming there are 761 lines of data in the input file to the index.assist program, the following conclusions can be drawn:
fgets is called 762 times. The last call to fgets returns an end-of-file.
The insert_index_entry function is called 760 times from main.
In addition to the 760 times insert_index_entry is called from main, insert_index_entry also calls itself 10,392 times. insert_index_entry is heavily recursive.
compare_entry (which is called from insert_index_entry) is called 11,152 times, which is equal to 760+10,392 times. There is one call to compare_entry for every time insert_index_entry is called. This is as it should be. If there were a discrepancy in the number of calls, you could suspect some problem in the program logic.
insert_page_entry is called 820 times in total: 761 times from main while the program is building index nodes, and 59 times from insert_index_entry. This frequency indicates there are 59 duplicated index entries, so their page number entries are linked into a chain with the index nodes. The duplicate index entries are then freed; hence the 59 calls to free().