Oracle® Solaris Studio 12.4: Performance Analyzer

Updated: January 2015

Clock Profiling

The event-specific data for clock profiling consists of an array of profiling interval counts. On Oracle Solaris, an interval counter is provided for each of the ten microstates that the kernel maintains for each thread. At the end of each profiling interval, the interval counter for the current microstate is incremented by 1, and another profiling signal is scheduled. The array is recorded and reset only when the Solaris thread enters user mode. The array data is recorded on entry to user mode before the array is reset; resetting the array consists of setting the array element for the User-CPU state to 1 and the array elements for all the other states to 0. Thus, for each of the ten microstates maintained by the kernel for each Solaris thread, the recorded array contains an accumulation of the intervals that expired in that microstate since the previous entry into user mode. On the Linux operating system, microstates do not exist; the only interval counter is User CPU Time.
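
As an illustration of the bookkeeping just described, the following minimal C sketch models the per-thread array of interval counts. The enum names and the tick and record_and_reset functions are hypothetical stand-ins for work that is actually done by the kernel and the Collector:

    #include <stdio.h>
    #include <string.h>

    /*
     * Illustrative sketch only: models the per-thread array of microstate
     * interval counts described above. The enum names are hypothetical;
     * the real bookkeeping is done by the kernel.
     */
    enum { MS_USER, MS_SYSTEM, MS_TRAP, MS_TFAULT, MS_DFAULT,
           MS_KFAULT, MS_USER_LOCK, MS_SLEEP, MS_STOPPED, MS_WAIT_CPU,
           MS_COUNT };

    typedef struct {
        unsigned counts[MS_COUNT];   /* intervals accumulated per microstate */
    } clock_packet_t;

    /* At each profiling tick, charge one interval to the current microstate. */
    static void tick(clock_packet_t *p, int microstate)
    {
        p->counts[microstate]++;
    }

    /*
     * On entry to user mode: record the packet, then reset it so that the
     * User-CPU element is 1 and all other elements are 0.
     */
    static void record_and_reset(clock_packet_t *p)
    {
        for (int i = 0; i < MS_COUNT; i++)
            printf("%d:%u ", i, p->counts[i]);
        printf("\n");
        memset(p->counts, 0, sizeof p->counts);
        p->counts[MS_USER] = 1;
    }

    int main(void)
    {
        clock_packet_t p = { {0} };
        tick(&p, MS_USER);      /* interval expired in user mode */
        tick(&p, MS_SYSTEM);    /* interval expired in a system call */
        tick(&p, MS_SLEEP);     /* interval expired while asleep */
        record_and_reset(&p);   /* thread re-enters user mode */
        return 0;
    }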

The call stack is recorded at the same time as the data. If the Solaris thread is not in user mode at the end of the profiling interval, the call stack cannot change until the thread enters user mode again. Thus, the call stack always accurately records the position of the program counter at the end of each profiling interval.

The metrics to which each of the microstates contributes on Oracle Solaris are shown in Table 6–1.

Table 6-1  How Kernel Microstates Contribute to Metrics

Kernel Microstate   Description                                  Metric Name
LMS_USER            Running in user mode                         User CPU Time
LMS_SYSTEM          Running in system call or page fault         System CPU Time
LMS_TRAP            Running in any other trap                    System CPU Time
LMS_TFAULT          Asleep in user text page fault               Text Page Fault Time
LMS_DFAULT          Asleep in user data page fault               Data Page Fault Time
LMS_KFAULT          Asleep in kernel page fault                  Other Wait Time
LMS_USER_LOCK       Asleep waiting for user-mode lock            User Lock Time
LMS_SLEEP           Asleep for any other reason                  Other Wait Time
LMS_STOPPED         Stopped (/proc, job control, or lwp_stop)    Other Wait Time
LMS_WAIT_CPU        Waiting for CPU                              Wait CPU Time

Accuracy of Timing Metrics

Timing data is collected on a statistical basis and is therefore subject to all the errors of any statistical sampling method. For very short runs in which only a few profile packets are recorded, the call stacks might not represent the parts of the program that consume the most resources. Run your program long enough, or enough times, to accumulate hundreds of profile packets for any function or source line you are interested in.
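
For example, assuming the default profiling interval of approximately 10 milliseconds, 100 profile packets attributed to a function correspond to about 1 second of time, and under simple Poisson assumptions the relative statistical error in a count of N packets is roughly 1/sqrt(N): about 10% at 100 packets and about 3% at 1,000 packets.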

In addition to statistical sampling errors, specific errors arise from the way the data is collected and attributed and the way the program progresses through the system. The following are some of the circumstances in which inaccuracies or distortions can appear in the timing metrics:

  • When a thread is created, the time spent before the first profile packet is recorded is less than the profiling interval, but the entire profiling interval is ascribed to the microstate recorded in the first profile packet. If many threads are created, the error can be many times the profiling interval. (A worked example of this error and the following one appears after this list.)

  • When a thread is destroyed, some time is spent after the last profile packet is recorded. If many threads are destroyed, the error can be many times the profiling interval.

  • Rescheduling of threads can occur during a profiling interval. As a consequence, the recorded state of the thread might not represent the microstate in which it spent most of the profiling interval. The errors are likely to be larger when there are more threads to run than there are processors to run them.

  • A program can behave in a way that is correlated with the system clock. In this case, the profiling interval always expires when the thread is in a state that might represent only a small fraction of the time actually spent, and the call stacks recorded for a particular part of the program are overrepresented. On a multiprocessor system, the profiling signal can itself induce a correlation: processors that are interrupted by the profiling signal while they are running threads for the program are likely to be in the Trap-CPU microstate when the microstate is recorded.

  • The kernel records the microstate value when the profiling interval expires. When the system is under heavy load, that value might not represent the true state of the process. On Oracle Solaris, this situation is likely to result in overaccounting of the Trap-CPU or Wait-CPU microstate.

  • When the system clock is being synchronized with an external source, the timestamps recorded in profile packets do not reflect the profiling interval but include any adjustment that was made to the clock. The clock adjustment can make it appear that profile packets are lost. The time period involved is usually several seconds, and the adjustments are made in increments.

  • Experiments recorded on machines that dynamically change their operating clock frequency might reflect inaccuracies in profiling.
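
To see the scale of the thread creation and destruction errors, consider a hypothetical program, assuming the default profiling interval of approximately 10 milliseconds: each thread can accumulate up to roughly one interval of misattributed time at creation and up to roughly one interval of unrecorded time at destruction, so a program that creates and destroys 1,000 short-lived threads could show a total error on the order of 20 seconds of thread time.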

In addition to the inaccuracies just described, timing metrics are distorted by the process of collecting data. The time spent recording profile packets never appears in the metrics for the program because the recording is initiated by the profiling signal. (This is another instance of correlation.) The user CPU time spent in the recording process is distributed over whatever microstates are recorded. The result is an underaccounting of the User CPU Time metric and an overaccounting of other metrics. The amount of time spent recording data is typically less than a few percent of the CPU time for the default profiling interval.
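
As a rough illustration using hypothetical numbers: if recording a profile packet costs on the order of 50 microseconds and the profiling interval is approximately 10 milliseconds, the collection overhead is about 0.5 percent of CPU time, consistent with the "less than a few percent" figure above.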

Comparisons of Timing Metrics

If you compare timing metrics obtained from the profiling done in a clock-based experiment with times obtained by other means, you should be aware of the following issues.

For a single-threaded application, the total thread time recorded for a process is usually accurate to a few tenths of a percent compared with the values returned by gethrtime(3C) for the same process. The CPU time can vary by several percentage points from the values returned by gethrvtime(3C) for the same process. Under heavy load, the variation might be even more pronounced. However, the CPU time differences do not represent a systematic distortion, and the relative times reported for different functions, source lines, and so on are not substantially distorted.
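
As a point of comparison, the following minimal sketch (Oracle Solaris only; the workload function and its iteration count are hypothetical) brackets a computation with gethrtime(3C) and gethrvtime(3C), whose results can then be compared against the Total Thread Time and User CPU Time that the Performance Analyzer reports for the same run:

    #include <stdio.h>
    #include <sys/time.h>   /* gethrtime(3C), gethrvtime(3C) on Oracle Solaris */

    /* Hypothetical CPU-bound workload, used only for illustration. */
    static volatile double sink;
    static void workload(void)
    {
        for (long i = 0; i < 50000000; i++)
            sink += (double)i * 0.5;
    }

    int main(void)
    {
        hrtime_t wall0 = gethrtime();    /* high-resolution real time */
        hrtime_t cpu0  = gethrvtime();   /* high-resolution LWP virtual (CPU) time */

        workload();

        hrtime_t wall = gethrtime()  - wall0;
        hrtime_t cpu  = gethrvtime() - cpu0;

        printf("wall time: %.3f s\n", wall / 1e9);  /* hrtime_t is in nanoseconds */
        printf("CPU time:  %.3f s\n", cpu  / 1e9);
        return 0;
    }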

The thread times that are reported in the Performance Analyzer can differ substantially from the times that are reported by vmstat, because vmstat reports times that are summed over CPUs. If the target process has more LWPs than the system on which it is running has CPUs, the Performance Analyzer shows more wait time than vmstat reports.
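
For example, a process with 16 runnable LWPs on a 4-CPU machine accumulates thread time at up to 16 seconds per second of wall-clock time, while vmstat can account for at most 4 CPU-seconds per second; much of the difference appears in the Performance Analyzer as Wait CPU time.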

The microstate timings that appear in the Statistics view of the Performance Analyzer and in the er_print statistics display are based on the process file system (/proc) usage reports, in which the times spent in the microstates are recorded to high accuracy. See the proc(4) man page for more information. You can compare these timings with the metrics for the <Total> function, which represents the program as a whole, to gain an indication of the accuracy of the aggregated timing metrics. However, the values displayed in the Statistics view can include other contributions that are not included in the timing metric values for <Total>. These contributions come from periods of time in which data collection is paused.
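
The following minimal sketch (Oracle Solaris only, assuming the prusage_t layout documented in the proc(4) man page) reads these high-accuracy microstate timings for the current process directly from /proc/self/usage; error handling is deliberately minimal:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <procfs.h>   /* prusage_t; Oracle Solaris only */

    /* Convert a timestruc_t to seconds for printing. */
    static double ts(timestruc_t t)
    {
        return t.tv_sec + t.tv_nsec / 1e9;
    }

    int main(void)
    {
        prusage_t pru;
        int fd = open("/proc/self/usage", O_RDONLY);
        if (fd == -1 || read(fd, &pru, sizeof pru) != (ssize_t)sizeof pru) {
            perror("/proc/self/usage");
            return 1;
        }
        close(fd);

        printf("user CPU:        %.6f s\n", ts(pru.pr_utime));
        printf("system CPU:      %.6f s\n", ts(pru.pr_stime));
        printf("other trap CPU:  %.6f s\n", ts(pru.pr_ttime));
        printf("text fault wait: %.6f s\n", ts(pru.pr_tftime));
        printf("data fault wait: %.6f s\n", ts(pru.pr_dftime));
        printf("user lock wait:  %.6f s\n", ts(pru.pr_ltime));
        printf("other sleep:     %.6f s\n", ts(pru.pr_slptime));
        printf("wait-CPU:        %.6f s\n", ts(pru.pr_wtime));
        printf("stopped:         %.6f s\n", ts(pru.pr_stoptime));
        return 0;
    }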

User CPU time and hardware counter cycle time differ because the hardware counters are turned off when the CPU mode has been switched to system mode. For more information, see Traps.