In many applications, especially numerical ones, part of the algorithm can be parallelized while other parts are inherently sequential, as shown in the following:
Thread 1                              Threads 2 through n
while(many_iterations) {              while(many_iterations) {
    sequential_computation
    --- Barrier ---                       --- Barrier ---
    parallel_computation                  parallel_computation
}                                     }
For example, you might produce a set of matrices with a strictly linear computation, then perform operations on the matrices using a parallel algorithm, then use the results of these operations to produce another set of matrices, then operate on them in parallel, and so on.
Parallel algorithms of this kind require little synchronization during the parallel computation itself, but all of the threads employed must synchronize to ensure that the sequential computation is finished before the parallel computation begins.
The barrier forces every thread participating in the parallel computation to wait until all of the participating threads have reached the barrier. Once all threads have arrived, they are released and begin computing together.