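Nested parallelism can be controlled at run time by setting various environment variables before the program is executed.
Nested parallelism can be enabled or disabled by setting the OMP_NESTED environment variable or by calling omp_set_nested().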
The following example contains parallel constructs nested three levels deep.
```c
#include <omp.h>
#include <stdio.h>

/* Exactly one thread of the current team prints the team size. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

int main()
{
    omp_set_dynamic(0);   /* do not let the runtime shrink team sizes */
    #pragma omp parallel num_threads(2)
    {
        report_num_threads(1);
        #pragma omp parallel num_threads(2)
        {
            report_num_threads(2);
            #pragma omp parallel num_threads(2)
            {
                report_num_threads(3);
            }
        }
    }
    return(0);
}
```
Compiling and running this program with nested parallelism enabled produces the following (sorted) output:
```
% setenv OMP_NESTED TRUE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
```
Compare this with the output of the same program when nested parallelism is disabled:
```
% setenv OMP_NESTED FALSE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
```
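The OpenMP runtime library maintains a pool of threads that can serve as slave threads in parallel regions. The SUNW_MP_MAX_POOL_THREADS environment variable controls the number of threads in the pool. The default value is 1023.
The pool contains only non-user threads created by the runtime library. It does not contain the initial thread or any thread explicitly created by the user program. If this environment variable is set to zero, the pool is empty and every parallel region is executed by a single thread.
The following example shows that a parallel region can receive fewer threads than it requests when the pool does not hold enough threads. The code is the same as above. Keeping all parallel regions active at the same time requires 8 threads; because the initial thread is not part of the pool, the pool must contain at least 7 threads. If SUNW_MP_MAX_POOL_THREADS is set to 5, two of the four innermost parallel regions may not get all the slave threads they request. One possible result is shown below.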
```
% setenv OMP_NESTED TRUE
% setenv SUNW_MP_MAX_POOL_THREADS 5
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
```
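The SUNW_MP_MAX_NESTED_LEVELS environment variable controls the maximum depth of nested active parallel regions that require more than one thread.
Any active parallel region whose active nesting depth exceeds the value of this environment variable is executed by only one thread. A parallel region is considered active if it has no IF clause, or if its IF clause evaluates to true. The default maximum number of active nesting levels is 4.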
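The following code creates four levels of nested parallel regions. If SUNW_MP_MAX_NESTED_LEVELS is set to 2, the nested parallel regions at depths 3 and 4 are executed by a single thread.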
```c
#include <omp.h>
#include <stdio.h>

#define DEPTH 5   /* recursion stops at DEPTH, giving levels 1 to 4 */

/* Exactly one thread of the current team prints the team size. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

/* Each call opens one parallel region, and every thread in it recurses. */
void nested(int depth)
{
    if (depth == DEPTH)
        return;

    #pragma omp parallel num_threads(2)
    {
        report_num_threads(depth);
        nested(depth+1);
    }
}

int main()
{
    omp_set_dynamic(0);
    omp_set_nested(1);
    nested(1);
    return(0);
}
```
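Compiling and running this program with a maximum nesting level of 4 produces the following possible output. (The actual result depends on how the operating system schedules the threads.)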
```
% setenv SUNW_MP_MAX_NESTED_LEVELS 4
% a.out |sort +2n
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
Level 4: number of threads in the team - 2
```
Running with the maximum nesting level set to 2 produces the following possible result:
```
% setenv SUNW_MP_MAX_NESTED_LEVELS 2
% a.out |sort
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
Level 4: number of threads in the team - 1
```
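Again, these examples show only some of the possible results. The actual result depends on how the operating system schedules the threads.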