ChorusOS 4.0 Introduction

Performance Profiler Description

This section provides information about the performance profiling system's design, to help you understand the sequence of events that occurs before the generation of a performance profiling report.

The performance profiling tool set consists of:

The Performance Profiling Library

When the performance profiling compiler option (generally -p) is used, the compiler provides each function entry point with a call to a routine, normally called mcount. For each function, the compiler also sets up a static counter, and passes the address of this counter to mcount. The counter is initialized at zero.

What is done by mcount is defined by the application. Low-end performance profilers simply count the number of times the routine is called. ChorusOS Profiler provides a sophisticated mcount routine within the profiled library that constructs the runtime call graph. Note that you can supply your own mcount routine, for example to assert predicates when debugging a component.

The Performance Profiler Server

The profiler server, PROF, is a supervisor actor that can locate and modify static data within the memory context of the profiled actors, using the embedded symbol tables. The profiler server also dynamically creates and deletes the memory regions that are used to construct the call graph and count the profiling ticks (see below).

The Performance Profiling Clock

While the performance profiler is active, the system is regularly interrupted by the profiling clock, which by default is the system clock. At each clock tick, the instruction pointer is sampled, the active procedure is located and a counter associated with the interrupted procedure is incremented. A high rate performance profiling clock could use a significant amount of system time, which could lead to the system appearing to run more slowly. A rapid sampling clock could jeopardize the system's real-time requirements.

Notes About Accuracy

Significant disruptions in the real-time capabilities of the profiled programs must be expected, because performance profiling is performed by software (rather than by hardware with an external bus analyzer or equivalent device). Performance profiling using software slows down the processor, and the profiled applications may behave differently when being profiled compared to when running at full processor speed.

When profiling, the processor can spend more than fifty percent of the processing time profiling clock interrupts. Similarly, the time spent recording the call graph is significant, and tends to bias the profiling results in a non-linear manner.

The accuracy of the reported percentage of time spent is about five percent when the number of profiling ticks is in the order of magnitude of ten times the number of bytes in the profiled programs. In other words, in order to profile a program of 1 million bytes with any degree of accuracy, at least 10 millions ticks should be used. This level of accuracy is usually sufficient to plan code optimizations, which is the primary goal of the profiler, but the operator should beware of using all the fractional digits of the reported figures.

If more accuracy is needed, the operator can experiment with different combinations of the rate of the profiling clock, the type of profiling clock and the time spent profiling.