Fortran Programming Guide

The `gprof` Profiling Command

The gprof(1) command provides a detailed postmortem analysis of program timing at the subprogram level, including how many times a subprogram was called, who called it, whom it called, and how much time was spent in the routine and by the routines it called.

To enable gprof profiling, compile and link the program with the -pg option:

demo% f77 -o Myprog -fast -pg Myprog.f ...
demo% Myprog
demo% gprof Myprog

The program must complete normally for gprof to obtain meaningful timing information.

At program termination, the file gmon.out is automatically written in the working directory. This file contains the profiling data that will be interpreted by gprof.

Invoking gprof produces a report on standard output. An example is shown on the next pages. Not only the routines in your program are listed but also the library procedures and the routines they call.

The report is mostly two profiles of how the total time is distributed across the program procedures: the call graph and the flat profile. They are preceded by an explanation of the column labels, followed by an index. (The gprof -b option eliminates the explanatory text; see the gprof(1) man page for other options that can be used to limit the amount of output generated.)

In the graph profile, each procedure (subprogram, procedure) is presented in a call-tree representation. The line representing a procedure in its call-tree is called the function line, and is identified by an index number in the leftmost column, within square brackets; the lines above it are the parent lines; the lines below it, the descendant lines.

---------------------------------------------------------------
				parent line									caller 1
				parent line									caller 2
					....
[n]   time				function line									function name
				descendant line									called 1
				descendant line									called 2
					....
---------------------------------------------------------------

The call graph profile is followed by a flat profile that provides a routine-by-routine overview. An (edited) example of gprof output follows.

Note -

User-defined functions appear with their Fortran names followed by an underscore. Library routines appear with leading underscores.

The call graph profile:

granularity: each sample hit covers 2 byte(s) for 0.08% of 12.78 seconds

                                  called/total       parents 
index  %time    self descendents  called+self    name          index
                                  called/total       children

                0.00       12.66       1/1           main [1]
[3]     99.1    0.00       12.66       1         MAIN_ [3]
                0.92       10.99    1000/1000        diffr_ [4]
                0.62        0.00    2000/2001        code_ [9]
                0.11        0.00    1000/1000        shock_ [11]
                0.02        0.00    1000/1000        bndry_ [14]
                0.00        0.00       1/1           init_ [24]
                0.00        0.00       2/2           output_ [40]
                0.00        0.00       1/1           input_ [47]

-----------------------------------------------

                0.92       10.99    1000/1000        MAIN_ [3]
[4]     93.2    0.92       10.99    1000         diffr_ [4]
                1.11        4.52    3000/3000        deriv_ [7]
                1.29        2.91    3000/6000        cheb1_ [5]
                1.17        0.00    3000/3000        dissip_ [8]

-----------------------------------------------

                1.29        2.91    3000/6000        deriv_ [7]
                1.29        2.91    3000/6000        diffr_ [4]
[5]     65.7    2.58        5.81    6000         cheb1_ [5]
                5.81        0.00    6000/6000        fftb_ [6]
                0.00        0.00     128/321         cos [21]
                0.00        0.00     128/192         __sin [279]

-----------------------------------------------

                5.81        0.00    6000/6000        cheb1_ [5]
[6]     45.5    5.81        0.00    6000         fftb_ [6]
                0.00        0.00      64/321         cos [21]
                0.00        0.00      64/192         __sin [279]

-----------------------------------------------
...

The flat profile overview:

granularity: each sample hit covers 2 byte(s) for 0.08% of 12.84 seconds

   %  cumulative    self              self    total          
 time   seconds   seconds    calls  ms/call  ms/call name    
 45.2       5.81     5.81    6000     0.97     0.97  fftb_   [6]
 20.1       8.39     2.5     6000     0.43     1.40  cheb1_  [5]
  9.1       9.56     1.17    3000     0.39     0.39  dissip_ [8]
  8.6      10.67     1.11    3000     0.37     1.88  deriv_  [7]
  7.1      11.58     0.92    1000     0.92    11.91  diffr_  [4]
  4.8      12.20     0.62    2001     0.31     0.31  code_   [9]
  2.5      12.53     0.33    69000     0.00     0.00  __exp  [10]
  0.9      12.64     0.11     1000     0.11     0.11  shock_ [11]
...

Function Line.

The function line [5] in the preceding example reveals that:

cheb1 was called 6000 times-- 3000 from deriv, 3000 from diffr.
2.58 seconds were spent in cheb1 itself.
5.81 seconds were spent in routines called by cheb1.
65.7% of the execution time of the program was within cheb1.

Parent Lines.

The parent lines above [5] indicate that cheb1 was called from two routines, deriv and diffr. The timings on these lines show how much time was spent in cheb1 when it was called from each of these routines.

Descendant Lines.

The lines below the function line indicate the routines called from cheb1, fftb,sin, and cos. The library sine function is called indirectly.

Flat Profile.

Function names appear on the right. The profile is sorted by percentage of total execution time.

Overhead Considerations

Profiling (compiling with the -pg option) may greatly increase the running time of a program. This is due to the extra overhead required to clock program performance and subprogram calls. Profiling tools like gprof attempt to subtract an approximate overhead factor when computing relative runtime percentages. All other timings shown may not be accurate due to UNIX and hardware timekeeping inaccuracies.

Programs with short execution times are the most difficult to profile because the overhead may be a significant fraction of the total execution time. The best practice is to choose input data for the profiling run that will result in a realistic test of the program's performance. If this is not possible, consider enclosing the main computational part of the program within a loop that effectively runs the program N times. Estimate actual performance by dividing the profile results by N.

The Fortran library includes two routines that return the total time used by the calling process. See the man pages for dtime(3F) and etime(3F).

The gprof Profiling Command

Overhead Considerations

The `gprof` Profiling Command