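Nested parallelism can be controlled by setting various environment variables before the program runs, or by calling the omp_set_nested() runtime routine. This section discusses the environment variables that control nested parallelism.
Nested parallelism is enabled or disabled by setting the OMP_NESTED environment variable. By default, nested parallelism is disabled.
The nested parallel construct in the following example has three levels.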
Example 1  Nested Parallelism Example

    #include <omp.h>
    #include <stdio.h>

    /* One thread of the current team prints the team size. */
    void report_num_threads(int level)
    {
        #pragma omp single
        {
            printf("Level %d: number of threads in the team = %d\n",
                   level, omp_get_num_threads());
        }
    }

    int main()
    {
        omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
        #pragma omp parallel num_threads(2)
        {
            report_num_threads(1);
            #pragma omp parallel num_threads(2)
            {
                report_num_threads(2);
                #pragma omp parallel num_threads(2)
                {
                    report_num_threads(3);
                }
            }
        }
        return(0);
    }
Compiling and running this program with nested parallelism enabled produces the following (sorted) output:
    % setenv OMP_NESTED TRUE
    % a.out | sort
    Level 1: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
Running the program with nested parallelism disabled produces the following output:
    % setenv OMP_NESTED FALSE
    % a.out | sort
    Level 1: number of threads in the team = 2
    Level 2: number of threads in the team = 1
    Level 2: number of threads in the team = 1
    Level 3: number of threads in the team = 1
    Level 3: number of threads in the team = 1
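The OMP_THREAD_LIMIT environment variable controls the maximum number of OpenMP threads available to the whole program. This number includes the initial (or main) thread as well as the OpenMP helper threads that the OpenMP runtime library creates. By default, the maximum number of OpenMP threads available to the whole program is 1024 (one initial or main thread plus 1023 OpenMP helper threads).
Note that the thread pool consists only of the OpenMP helper threads that the OpenMP runtime library creates. The pool does not include the initial (or main) thread or any threads that the user's program creates explicitly.
If OMP_THREAD_LIMIT is set to 1, the helper thread pool is empty and all parallel regions are executed by one thread (the initial or main thread).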
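The following example output shows that a parallel region might get fewer helper threads than it requested if the pool does not contain enough of them. The code is the same as in Example 1, except that the environment variable OMP_THREAD_LIMIT is set to 6. Eight threads are needed for all the parallel regions to be active at the same time, so the pool needs to contain at least 7 helper threads. With OMP_THREAD_LIMIT set to 6, the pool contains at most 5 helper threads. Therefore, two of the four innermost parallel regions might not be able to get all the helper threads they request. The following example shows one possible result.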
    % setenv OMP_NESTED TRUE
    % setenv OMP_THREAD_LIMIT 6
    % a.out | sort
    Level 1: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 1
    Level 3: number of threads in the team = 1
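The OMP_MAX_ACTIVE_LEVELS environment variable controls the maximum number of nested active parallel regions. A parallel region is active if it is executed by a team consisting of more than one thread. If the variable is not set, the default value of 4 is used.
Note that setting this environment variable only controls the maximum number of nested active parallel regions; it does not enable nested parallelism. To enable nested parallelism, OMP_NESTED must be set to TRUE, or omp_set_nested() must be called with an argument that evaluates to true.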
The following sample code creates four levels of nested parallel regions.
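    #include <omp.h>
    #include <stdio.h>

    #define DEPTH 4

    /* One thread of the current team prints the team size. */
    void report_num_threads(int level)
    {
        #pragma omp single
        {
            printf("Level %d: number of threads in the team = %d\n",
                   level, omp_get_num_threads());
        }
    }

    /* Each thread opens a new two-thread region until DEPTH is reached. */
    void nested(int depth)
    {
        if (depth > DEPTH)
            return;

        #pragma omp parallel num_threads(2)
        {
            report_num_threads(depth);
            nested(depth+1);
        }
    }

    int main()
    {
        omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
        omp_set_nested(1);    /* enable nested parallelism */
        nested(1);
        return(0);
    }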
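The following output shows a possible result of compiling and running the sample code with DEPTH set to 4. The actual result depends on how the operating system schedules threads.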
    % setenv OMP_NESTED TRUE
    % setenv OMP_MAX_ACTIVE_LEVELS 4
    % a.out | sort
    Level 1: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 3: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
    Level 4: number of threads in the team = 2
If OMP_MAX_ACTIVE_LEVELS is set to 2, the nested parallel regions at nesting depths 3 and 4 are executed by a single thread. The following example shows one possible result.
    % setenv OMP_NESTED TRUE
    % setenv OMP_MAX_ACTIVE_LEVELS 2
    % a.out | sort
    Level 1: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 2: number of threads in the team = 2
    Level 3: number of threads in the team = 1
    Level 3: number of threads in the team = 1
    Level 3: number of threads in the team = 1
    Level 3: number of threads in the team = 1
    Level 4: number of threads in the team = 1
    Level 4: number of threads in the team = 1
    Level 4: number of threads in the team = 1
    Level 4: number of threads in the team = 1