The User mode presentation of the profile data attempts to present the information as if the program really executed according to the model described in Overview of OpenMP Software Execution. The actual data captures the implementation details of the runtime library, libmtsk.so, which do not correspond to the model. In User mode, the presentation of profile data is altered to match the model better, and differs from the recorded data and the Machine mode presentation in three ways:
Artificial functions are constructed representing the state of each thread from the point of view of the OpenMP runtime library.
Call stacks are manipulated to report data corresponding to the model of how the code runs, as described above.
Two additional metrics of performance are constructed for clock-based profiling experiments, corresponding to time spent doing useful work and time spent waiting in the OpenMP runtime.
Artificial functions are constructed and put onto the User mode call stacks reflecting events in which a thread was in some state within the OpenMP runtime library.
The following artificial functions are defined:
<OMP-overhead> | Executing in the OpenMP library
<OMP-idle> | Slave thread, waiting for work
<OMP-reduction> | Thread performing a reduction operation
<OMP-implicit_barrier> | Thread waiting at an implicit barrier
<OMP-explicit_barrier> | Thread waiting at an explicit barrier
<OMP-lock_wait> | Thread waiting for a lock
<OMP-critical_section_wait> | Thread waiting to enter a critical section
<OMP-ordered_section_wait> | Thread waiting for its turn to enter an ordered section
When a thread is in an OpenMP runtime state corresponding to one of those functions, the corresponding function is added as the leaf function on the stack. When a thread’s leaf function is anywhere in the OpenMP runtime, it is replaced by <OMP-overhead> as the leaf function. Otherwise, all PCs from the OpenMP runtime are omitted from the User mode stack.
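The stack rewriting rules above can be sketched in a few lines of Python. This is only an illustration of one reading of those rules, not the analyzer's actual implementation: the `Frame` type, the `user_mode_stack` function, the state keys such as `"implicit_barrier"`, and the frame names in the example are all hypothetical. The sketch assumes that a recorded runtime state takes precedence over the generic <OMP-overhead> leaf, and that runtime frames are dropped from the stack in every case.

```python
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One recorded stack frame (hypothetical representation)."""
    name: str
    in_omp_runtime: bool  # True if the PC lies inside the OpenMP runtime library


# Hypothetical state keys mapped to the artificial functions from the table.
STATE_TO_FUNCTION = {
    "idle": "<OMP-idle>",
    "reduction": "<OMP-reduction>",
    "implicit_barrier": "<OMP-implicit_barrier>",
    "explicit_barrier": "<OMP-explicit_barrier>",
    "lock_wait": "<OMP-lock_wait>",
    "critical_section_wait": "<OMP-critical_section_wait>",
    "ordered_section_wait": "<OMP-ordered_section_wait>",
}


def user_mode_stack(frames: List[Frame], state: Optional[str] = None) -> List[str]:
    """Build a User mode stack (root first, leaf last) from recorded frames."""
    # Omit every PC that belongs to the OpenMP runtime.
    result = [f.name for f in frames if not f.in_omp_runtime]

    if state in STATE_TO_FUNCTION:
        # The thread is in a known runtime state: add its artificial leaf.
        result.append(STATE_TO_FUNCTION[state])
    elif frames and frames[-1].in_omp_runtime:
        # The leaf is somewhere in the runtime, with no more specific state.
        result.append("<OMP-overhead>")
    return result


# Example with made-up frame names; "runtime_internal_*" stands in for
# functions inside the runtime library.
recorded = [
    Frame("main", False),
    Frame("runtime_internal_dispatch", True),
    Frame("compute", False),
    Frame("runtime_internal_wait", True),
]
at_barrier = user_mode_stack(recorded, "implicit_barrier")
in_runtime = user_mode_stack(recorded)
in_user_code = user_mode_stack(recorded[:3])
```

Under these assumptions, `at_barrier` ends in <OMP-implicit_barrier>, `in_runtime` ends in <OMP-overhead>, and `in_user_code` contains only `main` and `compute`, since the runtime frame between them is omitted.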
For OpenMP experiments, User mode shows reconstructed call stacks similar to those obtained when the program is compiled without OpenMP.