Speedups - Oracle® Solaris Studio 12.4: C User's Guide

Language:

3.5 Speedups

If the compiler does not parallelize a portion of a program where a significant amount of time is spent, then no speedup occurs. For example, if a loop that accounts for five percent of the execution time of a program is parallelized, then the overall speedup is limited to five percent. However, any improvement depends on the size of the workload and parallel execution overheads.

As a general rule, the larger the fraction of program execution that is parallelized, the greater the likelihood of a speedup.

Each parallel loop incurs a small overhead during startup and shutdown. The start overhead includes the cost of work distribution, and the shutdown overhead includes the cost of the barrier synchronization. If the total amount of work performed by the loop is not big enough then no speedup will occur. In fact, the loop might even slow down. If a large amount of program execution is accounted by a large number of short parallel loops, then the whole program may slow down instead of speeding up.

The compiler performs several loop transformations that try to increase the granularity of the loops. Some of these transformations are loop interchange and loop fusion. If the amount of parallelism in a program is small or is fragmented among small parallel regions, then the speedup is usually less.

Scaling up a problem size often improves the fraction of parallelism in a program. For example, consider a problem that consists of two parts: a quadratic part that is sequential, and a cubic part that is parallelizable. For this problem, the parallel part of the workload grows faster than the sequential part. At some point the problem will speed up unless it runs into resource limitations.

Try some tuning and experimentation with directives, problem sizes, and program restructuring in order to achieve the most benefit from parallel C.