Analyzing Program Performance With Sun WorkShop

Using prof to Generate a Program Profile

prof generates a statistical profile of the CPU time used by a program and counts the number of times each function in a program is entered. Different, or more detailed data, is provided by the call-graph profile and the code coverage tools.

Using prof involves three basic steps:

  1. Compiling the program for prof

  2. Running the program to produce a profile data file

  3. Using prof to generate a report that summarizes the data

To compile a program for profiling with prof, use the -p option to the compiler. For example, to compile a C source file named index.assist.c for profiling, you would use this compiler command:

% cc -p -o index.assist index.assist.c

The compiler would produce a program called index.assist.

Now you could run the index.assist program. Each time you run the program, profiling data would be sent to a profile file called mon.out. Every time you run the program a new mon.out file would be created, overwriting the old version.

Finally, you would use the prof command to generate a report:


% index.assist
% ls mon.outmon.out
%prof index.assist

The prof output would resemble this example.

 %Time Seconds Cumsecs #Calls msecs/call Name
 19.4 3.28 3.28 11962 0.27 compare_strings
 15.6 2.64 5.92 32731 0.08 _strlen
 12.6 2.14 8.06 4579 0.47 __doprnt
 10.5 1.78 9.84    mcount
 9.9 1.68 11.52 6849 0.25 _get_field
 5.3 0.90 12.42 762 1.18 _fgets
 4.7 0.80 13.22 19715 0.04 _strcmp
 4.0 0.67 13.89 5329 0.13 _malloc
 3.4 0.57 14.46 11152 0.05 _insert_index_entry
 3.1 0.53 14.99 11152 0.05 _compare_entry
 2.5 0.42 15.41 1289 0.33 lmodt
 0.9 0.16 15.57 761 0.21 _get_index_terms
 0.9 0.16 15.73 3805 0.04 _strcpy
 0.8 0.14 15.87 6849 0.02 _skip_space
 0.7 0.12 15.99 13 9.23 _read
 0.7 0.12 16.11 1289 0.09 ldivt
 0.6 0.10 16.21 1405 0.07 _print_index
 . . (The rest of the output is insignificant)

Sample prof Output

This display points out that most of the program running time is spent in the compare_strings routine; after that, most of the time is spent in the _strlen library routine. To make improvements to this program, concentrate on the compare_strings function.

Let's interpret the results of the profiling run-through. The results are listed under these column headings:

%Time--The percentage of the total CPU time consumed by this routine of the program.

Seconds--The total CPU time accounted for by this function.

Cumsecs--A running sum of the number of seconds accounted for by this function and those listed above it.

#Calls--The number of times this routine is called.

msecs/call--The average number of milliseconds this routine consumes each time it is called.

Name--The name of the routine.

What results can be derived from the profile data? The compare_strings function consumes nearly 20% of the total time. To improve the runtime of index.assist, you could either improve the algorithm that compare_strings uses, or cut down the number of calls to compare_strings.

It is not obvious from the flat call graph that compare_strings is heavily recursive, but you can deduce this by using the call graph profile described in "Using gprof to Generate a Call Graph Profile". In this particular case, improving the algorithm also reduces the number of calls.


Note -

For Solaris 2.6 and Solaris 7, the profile of CPU time is accurate for programs that use multiple CPUs, but the fact that the counts are not locked may affect the accuracy of the counts for functions.