Multithreaded Programming Guide

Parallelizing a Loop on a Shared-Memory Parallel Computer

In many applications, particularly numerical applications, part of the algorithm can be parallelized while other parts are inherently sequential, as shown in the following table. The algorithm can use barrier synchronization to coordinate the parallel and sequential portions.

Table 9–1 Multithreaded Cooperation Through Barrier Synchronization

Sequential Execution              Parallel Execution
Thread 1                          Thread 2 through Thread n

while(many_iterations) {          while(many_iterations) {
    sequential_computation
    --- Barrier ---                   --- Barrier ---
    parallel_computation              parallel_computation
}                                 }

For example, you might produce a set of matrixes with a strictly linear computation, and perform operations on the matrixes that use a parallel algorithm. You then use the results of these operations to produce another set of matrixes, operate on these matrixes in parallel, and so on.

Parallel algorithms for such a computation typically require little synchronization during the computation itself. However, all of the threads must synchronize to ensure that the sequential computation is finished before the parallel computation begins.

The barrier forces all the threads that are doing the parallel computation to wait until every involved thread has reached the barrier. When all the threads have reached the barrier, they are released and begin computing together.
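
The pattern in Table 9–1 can be written with POSIX barrier synchronization. The following is a minimal sketch under stated assumptions, not an example from this guide: NTHREADS, ITERATIONS, sequential_computation(), and parallel_computation() are placeholder names, and a second barrier wait is added at the end of each iteration (not shown in the table) so that thread 1 does not start the next sequential computation, which uses the parallel results, before the other threads finish.

#define _POSIX_C_SOURCE 200112L   /* expose pthread_barrier_t */
#include <pthread.h>

#define NTHREADS    4             /* thread 1 plus threads 2 through NTHREADS */
#define ITERATIONS  100           /* stands in for "many_iterations" */

static pthread_barrier_t barrier;

/* Placeholder computations; real code would operate on shared data here. */
static void sequential_computation(int iter)       { (void)iter; }
static void parallel_computation(int id, int iter) { (void)id; (void)iter; }

static void *worker(void *arg)
{
    int id = (int)(long)arg;

    for (int iter = 0; iter < ITERATIONS; iter++) {
        if (id == 0)
            sequential_computation(iter);   /* only thread 1 runs the sequential part */

        /* --- Barrier ---: no thread starts the parallel computation
           until the sequential computation for this iteration is done. */
        pthread_barrier_wait(&barrier);

        parallel_computation(id, iter);     /* all threads compute in parallel */

        /* Extra barrier so thread 1 does not begin the next sequential
           computation while other threads are still computing. */
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}

Compile the sketch with the system's threads support, for example cc -std=c99 barrier.c -lpthread.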