This section provides information on the design of the performance profiling system. This information can help you understand the sequence of events that occur before the generation of a performance profiling report.
The performance profiling tool set consists of:
The profiler server, PROF (a supervisor process). This process first interprets the performance profiling requests issued by the profctl utility, and then executes the performance profiling function at a selected profiling clock rate on the target. See the PROF(1CC) man page for more details.
The profctl target utility. This utility sends performance profiling requests to the profiler server on the target. See the profctl(1CC) man page for more information.
The profrpg host utility. This command interprets profiling data and produces coherent profiling reports on the development host. See the profrpg(1CC) man page for more information.
When the performance profiling compiler option (-p) is used, the compiler provides each function entry point with a call to a routine, usually known as mcount. For each function, the compiler also sets up a static counter and passes the address of this counter to mcount. The counter is initialized at zero.
The scope of the action performed by mcount is defined by the application. Low-end performance profilers count the number of times the routine is called, and do not do much more than that. The ChorusOS profiler supports an advanced mcount routine within the profiled library (for constructing the runtime call graph).
You can supply your own mcount routine, to assert predicates when debugging a component, for example.
The profiler server, PROF, is a supervisor process that can locate and modify static data within the memory context of the profiled processes (using the embedded symbol tables). The profiler server also dynamically creates and deletes the memory regions that are used to construct the call graph and count the profiling ticks (see the following section).
While the performance profiler is active, the system is regularly interrupted by the profiling clock (which by default is the system clock). At each clock tick, the instruction pointer is sampled, the active procedure is located, and a counter associated with the interrupted procedure is incremented. A high-rate performance profiling clock can consume a significant amount of system time, which may make the system appear to run more slowly. A rapid sampling clock could jeopardize the system's real-time requirements.
Significant disruptions in the real-time capabilities of the profiled programs must be expected, because performance profiling is implemented in software (rather than in hardware with an external bus analyzer or equivalent device). Performance profiling in software slows down the processor, so an application can behave differently when being profiled than when running at full processor speed.
When profiling, a processor can spend more than fifty percent of its processing time servicing profiling clock interrupts. Similarly, the time spent recording the call graph is significant and can bias the profiling results in a non-linear manner.
The accuracy of the reported percentage of time spent is about five percent when the number of profiling ticks is on the order of ten times the number of bytes in the profiled programs. For example, to profile a program of 1 million bytes with any degree of accuracy, at least 10 million ticks should be collected. This level of accuracy is usually sufficient to plan code optimizations (which is the primary goal of the profiler). However, be wary of relying on all the fractional digits of the reported figures.
If greater accuracy is required, experiment with different combinations of the profiling clock rate, the type of profiling clock, and the time spent profiling.