ChorusOS 4.0 Introduction

Chapter 11 Performance Profiling

This chapter explains how to analyze the performance of a ChorusOS operating system and its applications by generating a performance profile report.

Introduction to Performance Profiling

The ChorusOS operating system performance profiling system contains a set of tools that facilitate the analysis and optimization of the performance of the ChorusOS operating system and applications. These tools apply only to system components that share the system address space, that is, the ChorusOS operating system components and supervisor application actors. The tool set is composed of a profiling server, libraries for building profiled actors, a target controller and a host utility.

Software performance profiling consists of collecting data about the dynamic behavior of the software, to gain knowledge of the time distribution within the software. For example, the performance profiling system is able to report the time spent within each procedure, as well as providing a dynamically constructed call graph.

The typical steps of an optimization project are:

  1. To benchmark a set of typical applications, using the ChorusOS operating system and applications at peak performance. The selection of these applications is critical, as the system will eventually be tuned for this type of application.

  2. To evaluate and record the output of the benchmarks.

  3. To use the performance profiling system to collect raw data about the dynamic behavior of the applications.

  4. To generate, evaluate and record the performance profiling reports.

  5. To plan and implement optimizations such as rewriting certain time-critical routines in assembly language, using in-line functions or tuning algorithms.

The performance profiling tools provide two different classes of service, depending on the way in which the software being measured has been prepared:

Note -

The standard (binary) version of the ChorusOS operating system is not compiled with the performance profiling option: profiling the system will only generate a simple form. Non-profiled components (or components for which a simple report form is sufficient) do not need to be compiled with the performance profiling option.

In order to obtain a full form for ChorusOS operating system components, a source product distribution is needed. In this case, it is necessary to regenerate the system components with the performance profiling option set.

Preparing to Create a Performance Profile

Configuring the System

In order to perform system performance profiling using the ChorusOS Profiler, a ChorusOS target system must include the ACTOR_EXTENDED_MNGT and NFS_CLIENT feature options.

Launch the performance profiling server (the PROF actor) dynamically, using:

% rsh -n target arun PROF &

Compiling the Application

If you require full report forms, the profiled components must be compiled using the performance profiling compiler options (usually, the -p option).

If you are using the imake environment provided with the ChorusOS operating system, you can set the profiling option in the Project.tmpl file if you want to profile the whole project hierarchy, or in each Imakefile of the directories that you want to profile if you want to profile only a subset of your project hierarchy. In either case, add the following line:


You can also add the performance profiling option dynamically by calling make with the compiler profiling option:

% make PROF=-p  

in the directory of the program that is to be performance profiled.

Launching the Performance Profiled Application

In this section, it is assumed that the application consists of a single supervisor actor, the_actor. It is also assumed that the target system is named trumpet, and that the target tree is mounted under the $CHORUS_ROOT host directory.

In order to be performance profiled, an application may be either:

Running a Performance Profiling Session

Starting the Performance Profiling Session

Performance profiling is initiated by running the profctl utility on the target system, using the -start option. This utility (see "Security" for more details) takes the components to be profiled as arguments.

If the_actor was part of the system image:

% rsh trumpet arun profctl -start -b the_actor

Otherwise, if the_actor was loaded dynamically:

% rsh trumpet arun profctl -start -a the_actor aid

where aid is the numeric identifier of the actor (as returned by the arun or aps commands).

Note -

Several components may be specified to the profctl utility. See "Security" for more details.

Run the application.

Stopping the Performance Profiling Session

Performance profiling is stopped by running the profctl utility again, using the -stop option:

% rsh trumpet arun profctl -stop

When performance profiling is stopped, a raw data file is generated for each profiled component within the /tmp directory of the target file system. The name of the file consists of the component name, to which the suffix .prof is added. For example, if only the_actor was profiled, the file $CHORUS_ROOT/tmp/ would be created.

Generating Performance Profiling Reports

Performance profiling reports are generated by the profrpg host utility (see "Security" for details on reporting options).

Use the report generator to produce a report for each profiled component, as follows:

% cd $CHORUS_ROOT/tmp

% profrpg the_actor > the_actor.rpg 

In order to track the benefits of optimization, the reports should be archived.

Analyzing Performance Profiling Reports

Performance profiling is applied to a user-selected set of components (ChorusOS operating system kernel, supervisor actors). The result of the performance profiling consists of a set of reports, one per profiled component.

A performance profiling report consists of two parts:

For each function, the performance profile report displays the following information:

Shown below is an example of a profiling report.

  memcpy 4 K=18.834
  memcpy 16 K=51.936
  memcpy 64 K=185.579
  memcpy 256 K=801.300
  thread switch=5.777
  threadCreate (active)=8.062
  threadCreate (active, preempt)=10.071
  threadPriority (self)=3.789
  threadPriority (self, high)=3.195
  threadResume (preempt)=6.999
  threadResume (awake)=4.014
  ipcCall (null, timeout)=35.732
  ipcSend (null, funcmode)=7.723
  ipcCall (null, funcmode)=31.762
  ipcSend (null, funcumode)=7.924
  ipcCall (null, funcumode)=31.864
  ipcSend (annex)=8.294
  ipcReceive (annex)=7.086
  ipcCall (annex)=33.708
  ipcSend (body 4b)=8.020
  ipcReceive (body 4b)=6.822
  ipcCall (body 4b)=32.558
  ipcSend (annex, body 4b)=8.684
  ipcReceive (annex, body 4b)=7.495
  ipcCall (annex, body 4b)=34.849

Performance Profiler Description

This section provides information about the performance profiling system's design, to help you understand the sequence of events that occurs before the generation of a performance profiling report.

The performance profiling tool set consists of:

The Performance Profiling Library

When the performance profiling compiler option (generally -p) is used, the compiler inserts a call to a routine, normally called mcount, at each function entry point. For each function, the compiler also sets up a static counter, and passes the address of this counter to mcount. The counter is initialized to zero.

What mcount does is defined by the application. Low-end performance profilers simply count the number of times the routine is called. The ChorusOS Profiler provides a sophisticated mcount routine within the profiled library that constructs the runtime call graph. Note that you can supply your own mcount routine, for example to assert predicates when debugging a component.
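The difference between a counting routine and one that records a call graph can be sketched in C. This is an illustration only: the real mcount entry point, its calling convention and the data structures of the ChorusOS profiled library are compiler-specific and system-specific, and are not described in this chapter. The names prof_count_only, prof_record_arc, prof_calls_into and arc_table are hypothetical, and functions are identified by small integers rather than by code addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; this is not the ChorusOS mcount routine. */

#define MAX_FUNCS 8

/* Low-end behaviour: increment the per-function static counter whose
 * address is passed in by the compiler-inserted call. */
void prof_count_only(unsigned long *counter)
{
    assert(counter != NULL);
    (*counter)++;
}

/* Call-graph behaviour: keep one counter per (caller, callee) arc. */
unsigned long arc_table[MAX_FUNCS][MAX_FUNCS];

void prof_record_arc(int caller, int callee)
{
    assert(caller >= 0 && caller < MAX_FUNCS);
    assert(callee >= 0 && callee < MAX_FUNCS);
    arc_table[caller][callee]++;
}

/* Total number of calls into a function, summed over all callers. */
unsigned long prof_calls_into(int callee)
{
    unsigned long total = 0;

    assert(callee >= 0 && callee < MAX_FUNCS);
    for (int caller = 0; caller < MAX_FUNCS; caller++)
        total += arc_table[caller][callee];
    return total;
}
```

A plain counter can only say how often a function was entered. The arc table also records which caller was responsible, which is the information a call graph report needs.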

The Performance Profiler Server

The profiler server, PROF, is a supervisor actor that can locate and modify static data within the memory context of the profiled actors, using the embedded symbol tables. The profiler server also dynamically creates and deletes the memory regions that are used to construct the call graph and count the profiling ticks (see below).

The Performance Profiling Clock

While the performance profiler is active, the system is regularly interrupted by the profiling clock, which by default is the system clock. At each clock tick, the instruction pointer is sampled, the active procedure is located and a counter associated with the interrupted procedure is incremented. A high-rate profiling clock can consume a significant amount of system time, making the system appear to run more slowly, and can jeopardize the system's real-time requirements.
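The per-tick work described above (sample the instruction pointer, locate the active procedure, increment its counter) can be sketched as follows. This is a minimal sketch under stated assumptions, not the ChorusOS implementation: the structure prof_func, the functions prof_lookup and prof_tick, and the idea of a table sorted by start address are all hypothetical, and each function is assumed to extend up to the start of the next table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of per-tick sampling. */

struct prof_func {
    uintptr_t start;      /* first address of the function */
    const char *name;     /* symbol name, for the report */
    unsigned long ticks;  /* samples that landed in this function */
};

/* Find the function containing pc. The table must be sorted by start
 * address; the last function is assumed to end at text_end.
 * Returns NULL if pc lies outside the profiled text. */
struct prof_func *prof_lookup(struct prof_func *tab, size_t n,
                              uintptr_t text_end, uintptr_t pc)
{
    assert(tab != NULL || n == 0);
    if (n == 0 || pc < tab[0].start || pc >= text_end)
        return NULL;

    size_t lo = 0, hi = n;  /* invariant: tab[lo].start <= pc */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].start <= pc)
            lo = mid;
        else
            hi = mid;
    }
    return &tab[lo];
}

/* What one profiling clock tick does with the sampled instruction pointer. */
void prof_tick(struct prof_func *tab, size_t n,
               uintptr_t text_end, uintptr_t pc)
{
    struct prof_func *f = prof_lookup(tab, n, text_end, pc);

    if (f != NULL)
        f->ticks++;
}
```

The binary search keeps the cost of each tick logarithmic in the number of functions, which matters because this work is repeated at every clock interrupt.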

Notes About Accuracy

Significant disruptions in the real-time capabilities of the profiled programs must be expected, because performance profiling is performed by software (rather than by hardware with an external bus analyzer or equivalent device). Performance profiling using software slows down the processor, and the profiled applications may behave differently when being profiled compared to when running at full processor speed.

When profiling, the processor can spend more than fifty percent of the processing time handling profiling clock interrupts. Similarly, the time spent recording the call graph is significant, and tends to bias the profiling results in a non-linear manner.

The accuracy of the reported percentage of time spent is about five percent when the number of profiling ticks is on the order of ten times the number of bytes in the profiled programs. In other words, in order to profile a program of 1 million bytes with any degree of accuracy, at least 10 million ticks should be used. This level of accuracy is usually sufficient to plan code optimizations, which is the primary goal of the profiler, but the operator should beware of using all the fractional digits of the reported figures.
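The rule of thumb above translates into a minimum profiling duration once a tick rate is chosen. The following sketch shows the arithmetic. The function names are hypothetical, and the tick rate is left as a parameter because it depends on the profiling clock in use.

```c
#include <assert.h>
#include <stdint.h>

/* Rule of thumb from the text: about ten ticks per byte of profiled code. */
uint64_t prof_ticks_needed(uint64_t program_bytes)
{
    return UINT64_C(10) * program_bytes;
}

/* Seconds of profiling needed at a given tick rate (ticks per second),
 * rounded up to a whole second. */
uint64_t prof_seconds_needed(uint64_t program_bytes,
                             uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    uint64_t ticks = prof_ticks_needed(program_bytes);
    return (ticks + ticks_per_second - 1) / ticks_per_second;
}
```

For the 1 million byte program mentioned above, prof_ticks_needed gives 10 million ticks. At an illustrative rate of 100 ticks per second that is 100,000 seconds of profiling (roughly 28 hours); at 1,000 ticks per second it is 10,000 seconds. These rates are examples only, not ChorusOS defaults.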

If more accuracy is needed, the operator can experiment with different combinations of the rate of the profiling clock, the type of profiling clock and the time spent profiling.