Analyzing Program Performance With Sun WorkShop

Using gprof to Generate a Call Graph Profile

While the flat profile from prof can provide valuable data for performance improvements, a more detailed analysis can be obtained by using a call graph profile to display a list identifying which modules are called by other modules, and which modules call other modules. Sometimes removing calls altogether can result in performance improvements.


Note -

gprof allocates the time within a function to the callers in proportion to the number of times each arc is traversed. Because all calls are not equivalent in performance, this behavior might lead to incorrect assumptions.


Like prof, gprof generates a statistical profile of the CPU time used by a program and it counts the number of times each function is entered. gprof also counts the number of times each arc in the program's call graph is traversed. An arc is a caller-callee pair.


Note -

For Solaris 2.6 and Solaris 7, the profile of CPU time is accurate for programs that use multiple CPUs, but the fact that the counts are not locked may affect the accuracy of the counts for functions.


Using gprof involves three basic steps:

  1. Compiling the program for gprof

  2. Running the program to produce a profile data file

  3. Using gprof to generate a report that summarizes the data

To compile the program for call graph profiling, use the -xpg option for the C compiler or the -pg option for the Fortran compiler. For example:

% cc -xpg -o index.assist index.assist.c

Now you could run the index.assist program. Each time that you run a program compiled for gprof, call-graph profile data is sent to a file called gmon.out. This file is recreated each time that you run the program.

Use the gprof command to generate a report of the results of the profile. The output from gprof can be large. You might find the report easier to read if you redirect it to a file. To redirect the output from gprof to a file named /tmp/g.output, you would use this sequence of commands:

% index.assist % ls gmon.outgmon.out % gprof index.assist > /tmp/g.output

The output from gprof consists of two major items:

The output from gprof contains an explanation of what the various parts of the summary mean. gprof also identifies the granularity of the sampling:

granularity: each sample hit covers 4 byte(s) for 0.07% of 14.74 seconds

The "4 bytes" means resolution to a single instruction. The "0.07% of 14.74 seconds" means that each sample, representing ten milliseconds of CPU time, accounts for 0.07% of the run.

This is part of the call graph profile.

 

 

 

 

called/total parents 

 

 

index 

%time 

self 

descendents 

called+self  

name 

index 

 

 

 

 

called/total children  

 

 

-----------------------------------------------  

 

 

0.00 

14.47 

1/1 

start 

[1]  

[2] 

98.2 

0.00 

14.47 

_main 

[2]  

 

 

0.59 

5.70 

760/760 

_insert_index_entry 

[3]  

 

 

0.02 

3.16 

1/1 

_print_index 

[6]  

 

 

0.20 

1.91 

761/761 

_get_index_terms 

[11]  

 

 

0.94 

0.06 

762/762 

_fgets 

[13]  

 

 

0.06 

0.62 

761/761 

_get_page_number 

[18]  

 

 

0.10 

0.46 

761/761 

_get_page_type 

[22]  

 

 

0.09 

0.23 

761/761 

_skip_start 

[24]  

 

 

0.04 

0.23 

761/761 

_get_index_type 

[26]  

 

 

0.07 

0.00 

761/820 

_insert_page_entry 

[34]  

-----------------------------------------------  

 

 

 

 

10392 

_insert_index_entry 

[3]  

 

 

0.59 

5.70 

760/760 

_main 

[2]  

[3] 

42.6 

0.59 

5.70 

760+10392 

_insert_index_entry 

[3]  

 

 

0.53 

5.13 

11152/11152 

_compare_entry 

[4]  

 

 

0.02 

0.01 

59/112 

_free 

[38]  

 

 

0.00 

0.00 

59/820 

_insert_page_entry 

[34]  

 

 

 

 

10392 

_insert_index_entry 

[3] 

-----------------------------------------------  

Assuming there are 761 lines of data in the input file to the index.assist program, the following conclusions can be drawn: