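Setting environment variables before a program runs lets you control nested parallelism at run time.
Nested parallelism is enabled or disabled by setting the OMP_NESTED environment variable or by calling omp_set_nested().
The nested parallel construct in the following example has three levels.
Example 4-1 Nested Parallelism Example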
#include <omp.h>
#include <stdio.h>

/* Print the team size once per team; "single" lets only one thread print. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

int main()
{
    omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
    #pragma omp parallel num_threads(2)
    {
        report_num_threads(1);
        #pragma omp parallel num_threads(2)
        {
            report_num_threads(2);
            #pragma omp parallel num_threads(2)
            {
                report_num_threads(3);
            }
        }
    }
    return(0);
}
With nested parallelism enabled, compiling and running this program produces the following (sorted) output:
% setenv OMP_NESTED TRUE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Compare this with the output of the same program when nested parallelism is disabled:
% setenv OMP_NESTED FALSE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
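The OpenMP runtime library maintains a pool of threads that can serve as worker threads in parallel regions. Setting the OMP_THREAD_LIMIT environment variable controls the number of threads in the pool. By default, the pool holds at most 1023 threads.
The pool contains only non-user threads created by the runtime library. It does not include the initial thread or any thread created explicitly by the user program. As the settings below show, the pool therefore holds one thread fewer than the value of OMP_THREAD_LIMIT.
If OMP_THREAD_LIMIT is set to 1 (or SUNW_MP_MAX_POOL_THREADS is set to zero), the pool is empty and every parallel region is executed by a single thread.
The following example shows that a parallel region gets fewer threads when the pool does not hold enough of them. The code is the same as in the previous example. A total of 8 threads is needed for all parallel regions to be active at the same time, so the pool must contain at least 7 threads. If OMP_THREAD_LIMIT is set to 6 (or SUNW_MP_MAX_POOL_THREADS is set to 5), the pool contains at most 5 worker threads. As a result, two of the four innermost parallel regions might not get all the worker threads they request. One possible result follows.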
% setenv OMP_NESTED TRUE
% setenv OMP_THREAD_LIMIT 6
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
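The thread counts quoted above (8 threads in total, 7 from the pool) follow from simple arithmetic: each thread at one level becomes the master of a team at the next level, so the total multiplies by the team size at every level. The sketch below only illustrates that calculation; the function names threads_needed and pool_threads_needed are hypothetical and do not belong to any OpenMP API.

```c
/* Threads required for every region to be active at once when each thread
   at each of `levels` nesting levels opens a team of `team_size` threads. */
long threads_needed(int levels, int team_size)
{
    long total = 1;                 /* start with the initial thread */
    for (int i = 0; i < levels; i++)
        total *= team_size;         /* every thread forks its own team */
    return total;
}

/* Threads the pool must supply: all of them except the initial thread. */
long pool_threads_needed(int levels, int team_size)
{
    return threads_needed(levels, team_size) - 1;
}
```

For the three-level example with num_threads(2), threads_needed(3, 2) gives 8 and pool_threads_needed(3, 2) gives 7, which matches the figures in the text.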
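The OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum depth of nested active parallel regions, that is, regions that require more than one thread.
Any active parallel region whose active nesting depth exceeds the value of this environment variable is executed by a single thread. A parallel region is considered active if it has no if clause, or if its if clause evaluates to true. The default maximum number of active nesting levels is 4.
The following code creates four levels of nested parallel regions. If OMP_MAX_ACTIVE_LEVELS is set to 2, the nested parallel regions at depths 3 and 4 are executed by a single thread.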
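#include <omp.h>
#include <stdio.h>

#define DEPTH 5   /* recursion stops at this depth, giving levels 1 to 4 */

/* Print the team size once per team. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

/* Open a parallel region at this depth, then recurse one level deeper. */
void nested(int depth)
{
    if (depth == DEPTH)
        return;

    #pragma omp parallel num_threads(2)
    {
        report_num_threads(depth);
        nested(depth+1);
    }
}

int main()
{
    omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
    omp_set_nested(1);    /* enable nested parallelism */
    nested(1);
    return(0);
}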
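Compiling and running this program with a maximum nesting level of 4 produces the following possible output. (Actual results depend on how the operating system schedules threads.)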
% setenv OMP_MAX_ACTIVE_LEVELS 4
% a.out |sort
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Running with the nesting level set to 2 produces the following possible result:
% setenv OMP_MAX_ACTIVE_LEVELS 2
% a.out |sort
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Again, these examples show only some of the possible results. Actual results depend on how the operating system schedules threads.