Oracle® Solaris Studio 12.4: OpenMP API User's Guide

Updated: December 2014
 
 

3.4 Some Tips for Using Nested Parallelism

  • Nested parallel regions provide an immediate way for more threads to participate in the computation.

    For example, suppose you have a program that contains two levels of parallelism and OMP_NUM_THREADS is set to 2. Also, suppose your system has four hardware threads and you want to use all four to speed up the execution of the program. Parallelizing only one of the two levels uses just two hardware threads. With nested parallelism enabled, each of the two outer threads creates an inner team of two threads, so all four hardware threads are used.

  • Nested parallel regions can easily create too many threads and oversubscribe the system. Set OMP_THREAD_LIMIT to cap the total number of threads the program can use, and set OMP_MAX_ACTIVE_LEVELS to cap the number of nested active parallel regions. Choosing both values appropriately prevents runaway oversubscription.

  • Nested parallel regions add overhead. If the outer level has enough parallelism and the load is balanced, using all the threads at the outer level of the computation will be more efficient than creating nested parallel regions at the inner levels.

    For example, suppose you have a program that contains two levels of parallelism, the load is balanced, and your system has four hardware threads that you want to use to speed up the execution of the program. In general, using all four threads for the outer parallel region yields better performance than using two threads for the outer parallel region and the other two threads as helper threads for the inner parallel regions. The reason is that nested parallel regions introduce additional barriers.
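
The two arrangements discussed in the tips above can be sketched in C as follows. This is a minimal illustration, not code from this guide: the helper names enable_two_levels, count_nested, and count_flat are hypothetical, and the team sizes are requests that the runtime may reduce (for example, when OMP_THREAD_LIMIT is lower than the number of threads requested). Each function returns how many innermost threads actually ran, so the two layouts can be compared on a given system.

```c
/* Build with OpenMP enabled, e.g. cc -xopenmp (Oracle Solaris Studio)
   or gcc -fopenmp. Without such a flag the pragmas are ignored and the
   code runs serially. */
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: allow two active levels of parallelism. */
static void enable_two_levels(void)
{
#ifdef _OPENMP
    omp_set_dynamic(0);            /* do not let the runtime shrink teams dynamically */
    omp_set_nested(1);             /* OpenMP 4.0-era switch; deprecated in OpenMP 5.0 */
    omp_set_max_active_levels(2);  /* programmatic form of OMP_MAX_ACTIVE_LEVELS=2 */
#endif
}

/* Nested layout: `outer` threads, each forking an inner team of `inner`
   threads. Returns the number of innermost threads that ran. */
int count_nested(int outer, int inner)
{
    int workers = 0;
    assert(outer > 0 && inner > 0);  /* num_threads needs positive values */
    enable_two_levels();
    #pragma omp parallel num_threads(outer) reduction(+:workers)
    {
        /* Each outer thread forks and joins its own inner team here,
           which is where the extra barriers come from. */
        #pragma omp parallel num_threads(inner) reduction(+:workers)
        {
            workers += 1;  /* one count per innermost thread */
        }
    }
    return workers;
}

/* Flat layout: all threads are used at the outer level only. */
int count_flat(int nthreads)
{
    int workers = 0;
    assert(nthreads > 0);
    #pragma omp parallel num_threads(nthreads) reduction(+:workers)
    {
        workers += 1;  /* one count per thread in the single team */
    }
    return workers;
}
```

A caller might compare count_nested(2, 2) with count_flat(4). On a system with four available hardware threads, both are expected to report up to four workers, but the nested form performs an additional fork and join for each inner region. Running the program with OMP_THREAD_LIMIT=4 in the environment should keep the nested form from exceeding four threads in total. Note that the omp_set_max_active_levels() call in this sketch takes precedence over any OMP_MAX_ACTIVE_LEVELS value set in the environment.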