Understanding Hardware Counter CPU Cycles Profiling Metrics

This part of the tutorial requires an experiment with data from the cycles counter. If your system does not support this counter, your experiment does not contain the data used in this section. In that case, skip to the next section, Understanding Cache Contention and Cache Profiling Metrics.

Select the Overview page and enable the derived metric Cycles Per Instruction and the General Hardware Counter metric, CPU Cycles Time.

You should keep Total CPU Time and Instructions Executed selected.
Return to the Source view at computeB().

Cycles Per Instruction is computed from the underlying cycles and instruction counts.

The CPU Cycles metric is displayed as estimated time. The time value is computed by dividing the cycle count by a fixed frequency. On systems where the processors run at a fixed frequency, CPU Cycles and Total CPU Time are roughly equivalent. On systems where the processors run at variable frequencies, the two metrics will differ in value. In all cases, CPI and IPC are computed from the underlying cycle counts, not from the estimated time.
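The arithmetic behind these metrics can be sketched in a few lines of Python. This is only an illustration of the relationships described above, not the tool's implementation: the function names, the 3.0 GHz frequency, and the counter values are all hypothetical and chosen to give round numbers.

```python
def cycles_to_seconds(cycles: int, frequency_hz: float) -> float:
    """Estimated time: the cycle count divided by a fixed frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return cycles / frequency_hz


def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """CPI: raw cycle count divided by raw instruction count."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def instructions_per_cycle(cycles: int, instructions: int) -> float:
    """IPC: the reciprocal of CPI, also taken from the raw counts."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter values, not taken from the tutorial's experiment.
cycles = 36_000_000_000
instructions = 24_000_000_000
frequency_hz = 3.0e9  # assumed fixed frequency used for the time estimate

print("estimated time (s):", cycles_to_seconds(cycles, frequency_hz))
print("CPI:", cycles_per_instruction(cycles, instructions))
print("IPC:", instructions_per_cycle(cycles, instructions))
```

Note that only the time estimate depends on the assumed frequency. CPI and IPC use the raw counts, so they are unaffected when the processors run at variable frequencies.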

In the screen shots, the Incl. CPU Cycles and the Incl. Total CPU Time are about 12 seconds for each of the compute*() functions except computeB(). You should also see in your experiment that the Incl. Cycles Per Instruction (CPI) is much higher for computeB() than it is for the other compute*() functions. This indicates that more CPU cycles are needed to execute the same number of instructions, and computeB() is therefore less efficient than the others.
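If you export per-function counts and want to make the same comparison outside the tool, the check might look like the following sketch. All values here are hypothetical and are not the numbers from the screen shots; compute_x and compute_y are placeholder names for the other functions, and the threshold factor of 2.0 is an arbitrary assumption.

```python
from statistics import median

# Hypothetical (cycles, instructions) per function; not measured data.
counts = {
    "compute_x": (30_000_000_000, 20_000_000_000),
    "computeB": (90_000_000_000, 20_000_000_000),
    "compute_y": (30_000_000_000, 20_000_000_000),
}


def cpi_by_function(counts):
    """Return CPI for each function from its raw cycle and instruction counts."""
    return {name: cyc / ins for name, (cyc, ins) in counts.items()}


def high_cpi_functions(counts, factor=2.0):
    """Return, sorted by name, functions whose CPI exceeds factor times the median CPI."""
    cpis = cpi_by_function(counts)
    baseline = median(cpis.values())
    return sorted(name for name, value in cpis.items() if value > factor * baseline)


print(high_cpi_functions(counts))
```

With these made-up counts, every function executes the same number of instructions, so a higher CPI corresponds directly to more cycles spent on the same amount of work.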

The data you have seen so far shows that computeB() differs from the other compute*() functions, but does not show why. The next part of this tutorial explores why computeB() is different.