The gprof(1) command provides a detailed postmortem analysis of program timing at the subprogram level, including how many times a subprogram was called, who called it, whom it called, and how much time was spent in the routine and by the routines it called.
To enable gprof profiling, compile and link the program with the -pg option:
demo% f77 -o Myprog -fast -pg Myprog.f ... demo% Myprog demo% gprof Myprog
The program must complete normally for gprof to obtain meaningful timing information.
At program termination, the file gmon.out is automatically written in the working directory. This file contains the profiling data that will be interpreted by gprof.
Invoking gprof produces a report on standard output. An example is shown on the next pages. Not only the routines in your program are listed but also the library procedures and the routines they call.
The report is mostly two profiles of how the total time is distributed across the program procedures: the call graph and the flat profile. They are preceded by an explanation of the column labels, followed by an index. (The gprof -b option eliminates the explanatory text; see the gprof(1) man page for other options that can be used to limit the amount of output generated.)
In the graph profile, each procedure (subprogram, procedure) is presented in a call-tree representation. The line representing a procedure in its call-tree is called the function line, and is identified by an index number in the leftmost column, within square brackets; the lines above it are the parent lines; the lines below it, the descendant lines.
--------------------------------------------------------------- parent line caller 1 parent line caller 2 .... [n] time function line function name descendant line called 1 descendant line called 2 .... ---------------------------------------------------------------
The call graph profile is followed by a flat profile that provides a routine-by-routine overview. An (edited) example of gprof output follows.
User-defined functions appear with their Fortran names followed by an underscore. Library routines appear with leading underscores.
granularity: each sample hit covers 2 byte(s) for 0.08% of 12.78 seconds called/total parents index %time self descendents called+self name index called/total children 0.00 12.66 1/1 main [1] [3] 99.1 0.00 12.66 1 MAIN_ [3] 0.92 10.99 1000/1000 diffr_ [4] 0.62 0.00 2000/2001 code_ [9] 0.11 0.00 1000/1000 shock_ [11] 0.02 0.00 1000/1000 bndry_ [14] 0.00 0.00 1/1 init_ [24] 0.00 0.00 2/2 output_ [40] 0.00 0.00 1/1 input_ [47] ----------------------------------------------- 0.92 10.99 1000/1000 MAIN_ [3] [4] 93.2 0.92 10.99 1000 diffr_ [4] 1.11 4.52 3000/3000 deriv_ [7] 1.29 2.91 3000/6000 cheb1_ [5] 1.17 0.00 3000/3000 dissip_ [8] ----------------------------------------------- 1.29 2.91 3000/6000 deriv_ [7] 1.29 2.91 3000/6000 diffr_ [4] [5] 65.7 2.58 5.81 6000 cheb1_ [5] 5.81 0.00 6000/6000 fftb_ [6] 0.00 0.00 128/321 cos [21] 0.00 0.00 128/192 __sin [279] ----------------------------------------------- 5.81 0.00 6000/6000 cheb1_ [5] [6] 45.5 5.81 0.00 6000 fftb_ [6] 0.00 0.00 64/321 cos [21] 0.00 0.00 64/192 __sin [279] ----------------------------------------------- ...
granularity: each sample hit covers 2 byte(s) for 0.08% of 12.84 seconds % cumulative self self total time seconds seconds calls ms/call ms/call name 45.2 5.81 5.81 6000 0.97 0.97 fftb_ [6] 20.1 8.39 2.5 6000 0.43 1.40 cheb1_ [5] 9.1 9.56 1.17 3000 0.39 0.39 dissip_ [8] 8.6 10.67 1.11 3000 0.37 1.88 deriv_ [7] 7.1 11.58 0.92 1000 0.92 11.91 diffr_ [4] 4.8 12.20 0.62 2001 0.31 0.31 code_ [9] 2.5 12.53 0.33 69000 0.00 0.00 __exp [10] 0.9 12.64 0.11 1000 0.11 0.11 shock_ [11] ...
Function Line.
The function line [5] in the preceding example reveals that:
cheb1 was called 6000 times-- 3000 from deriv, 3000 from diffr.
2.58 seconds were spent in cheb1 itself.
5.81 seconds were spent in routines called by cheb1.
65.7% of the execution time of the program was within cheb1.
Parent Lines.
The parent lines above [5] indicate that cheb1 was called from two routines, deriv and diffr. The timings on these lines show how much time was spent in cheb1 when it was called from each of these routines.
Descendant Lines.
The lines below the function line indicate the routines called from cheb1, fftb,sin, and cos. The library sine function is called indirectly.
Flat Profile.
Function names appear on the right. The profile is sorted by percentage of total execution time.
Profiling (compiling with the -pg option) may greatly increase the running time of a program. This is due to the extra overhead required to clock program performance and subprogram calls. Profiling tools like gprof attempt to subtract an approximate overhead factor when computing relative runtime percentages. All other timings shown may not be accurate due to UNIX and hardware timekeeping inaccuracies.
Programs with short execution times are the most difficult to profile because the overhead may be a significant fraction of the total execution time. The best practice is to choose input data for the profiling run that will result in a realistic test of the program's performance. If this is not possible, consider enclosing the main computational part of the program within a loop that effectively runs the program N times. Estimate actual performance by dividing the profile results by N.
The Fortran library includes two routines that return the total time used by the calling process. See the man pages for dtime(3F) and etime(3F).