Parallel Execution Model - Oracle® Developer Studio 12.6: C User's Guide

Parallel loops are executed by threads. The thread that starts the initial execution of the program is called the master thread. When the master thread encounters a parallel loop, it creates a team of threads composed of itself and multiple slave threads. The iterations of the loop are divided into chunks, and the chunks are distributed among the threads in the team. When a thread finishes executing its chunk or chunks, it waits for the remaining threads of the team to finish theirs. This synchronization point is called a barrier. The master thread cannot continue executing the remainder of the program until all the slave threads have finished their work on the parallel loop and reached the barrier. After the barrier, the master thread continues executing the program serially until it encounters another parallel loop.
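The division of iterations into chunks can be illustrated with a minimal sketch. It assumes one contiguous, nearly equal-sized chunk per thread, which is only one possible distribution; the text above does not specify how the runtime sizes or assigns chunks. The function names chunk_bounds and chunked_sum are hypothetical, and the OpenMP pragma is used here only as a convenient way to request a team of threads. If the code is compiled without OpenMP support, the pragma is ignored and the chunks are processed serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the half-open iteration range [*lo, *hi)
 * for thread `tid` when `n` iterations are split into one contiguous
 * chunk per thread. The first (n % nthreads) threads get one extra
 * iteration. This equal-split policy is an assumption for illustration. */
void chunk_bounds(size_t n, size_t nthreads, size_t tid,
                  size_t *lo, size_t *hi)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base  = n / nthreads;
    size_t extra = n % nthreads;
    *lo = tid * base + (tid < extra ? tid : extra);
    *hi = *lo + base + (tid < extra ? 1 : 0);
}

/* Hypothetical example: sum a[0..n-1] with one chunk per team member.
 * With OpenMP enabled, each value of tid may run on a different thread.
 * The implicit barrier at the end of the parallel loop is where the
 * team synchronizes before serial execution resumes. */
long chunked_sum(const long *a, size_t n, int nthreads)
{
    assert(nthreads > 0);
    long total = 0;

    #pragma omp parallel for reduction(+:total) num_threads(nthreads)
    for (int tid = 0; tid < nthreads; tid++) {
        size_t lo, hi;
        chunk_bounds(n, (size_t)nthreads, (size_t)tid, &lo, &hi);
        for (size_t i = lo; i < hi; i++)
            total += a[i];
    }
    /* Past this point only the calling (master) thread continues. */
    return total;
}
```

With 10 iterations and 4 threads, this sketch assigns the ranges [0,3), [3,6), [6,8), and [8,10), so no thread receives more than one iteration beyond any other.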

During this process, various overheads can occur, such as those related to:

Thread creation
Work distribution
Barrier synchronization

For some parallel loops, the amount of useful work performed is not enough to justify the overhead. Parallelizing such loops can slow the program down appreciably. However, if the amount of useful work in the loop is large enough, then the parallel execution of the loop will speed up the program.
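The trade-off between useful work and overhead can be sketched with a simple cost model. The model below is an assumption made for illustration, not a formula from this guide: it treats all overhead (thread creation, work distribution, barrier synchronization) as one fixed cost and assumes the work divides evenly among threads. The names estimated_speedup, scale, and min_parallel_n are hypothetical. The second function uses the OpenMP if clause to run the loop in parallel only when the iteration count reaches a caller-chosen threshold.

```c
#include <stddef.h>

/* Hypothetical cost model (all parameters are illustrative assumptions):
 *   serial time   = n * t_iter
 *   parallel time = overhead + (n * t_iter) / nthreads
 * Returns serial time divided by parallel time. A value below 1.0 means
 * the model predicts a slowdown from parallelization. */
double estimated_speedup(size_t n, double t_iter, int nthreads,
                         double overhead)
{
    double serial   = (double)n * t_iter;
    double parallel = overhead + serial / (double)nthreads;
    if (parallel <= 0.0)
        return 1.0;   /* no work and no overhead: nothing to gain or lose */
    return serial / parallel;
}

/* Hypothetical example: scale a vector in place, requesting parallel
 * execution only when n reaches min_parallel_n. Below the threshold the
 * loop runs on the calling thread alone, avoiding the team overhead. */
void scale(double *x, size_t n, double factor, size_t min_parallel_n)
{
    #pragma omp parallel for if(n >= min_parallel_n)
    for (long i = 0; i < (long)n; i++)
        x[i] *= factor;
}
```

Under this model, with an overhead equal to 50 iterations of work and 4 threads, a 1000-iteration loop has an estimated speedup of about 3.3, while a 10-iteration loop has an estimated speedup of about 0.19, that is, a slowdown. A suitable threshold for a real loop depends on the machine and the loop body and is best found by measurement.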