Hardware counter overflow profiling data includes a counter ID and the overflow value. The value can be larger than the value at which the counter is set to overflow because the processor executes some instructions between the overflow and the recording of the event. The value is especially likely to be larger for cycle and instruction counters, which are incremented much more frequently than counters such as floating-point operations or cache misses. The delay in recording the event also means that the program counter address recorded with call stack does not correspond exactly to the overflow event. See Attribution of Hardware Counter Overflows for more information. See also the discussion of Traps. Traps and trap handlers can cause significant differences between reported User CPU time and time reported by the cycle counter.
Experiments recorded on machines that dynamically change their operating clock frequency might show inaccuracies in the conversion of cycle-based count to time.
The amount of time spent collecting data can be a substantial fraction of the execution time of the program. The collection run might spend most of its time handling overflows and writing data instead of running the program.
A substantial fraction of the counts can come from the collection process. These counts are attributed to the collector function collector_record_counters . If you see high counts for this function, the overflow value is too small.
The collection of data can alter the behavior of the program. For example, if you are collecting data on cache misses, the majority of the misses could come from flushing the collector instructions and profiling data from the cache and replacing it with the program instructions and data. The program would appear to have a lot of cache misses, but without data collection few cache misses might actually have occurred.