Oracle® Developer Studio 12.6: OpenMP API User's Guide

5.2 OMP_PLACES and OMP_PROC_BIND

OpenMP 4.0 provides the OMP_PLACES and OMP_PROC_BIND environment variables to specify how the OpenMP threads in a program are bound to processors. These two environment variables are often used together. OMP_PLACES specifies the places on the machine to which the threads are bound, and OMP_PROC_BIND specifies the binding policy (thread affinity policy), which prescribes how the threads are assigned to places. Setting OMP_PLACES alone does not enable binding; you must also set OMP_PROC_BIND.
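
For example, the following settings (shown here only as one illustrative combination) request that the threads of a team be spread out over the cores of the machine:

% OMP_PLACES=cores
% OMP_PROC_BIND=spread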

According to the OpenMP specification, the value of OMP_PLACES can be one of two types: either an abstract name describing a set of places (threads, cores, or sockets), or an explicit list of places described by non-negative numbers. Intervals can also be used to define places, using the <lower-bound>:<length>:<stride> notation to represent the following list of numbers: "<lower-bound>, <lower-bound> + <stride>, …, <lower-bound> + (<length>-1)*<stride>". When <stride> is omitted, a unit stride is assumed. If OMP_PLACES is not set, then the default value is cores.
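
For instance, both of the following settings are valid (the specific values are illustrative only); the first uses an abstract name, and the second uses an explicit list of two places:

% OMP_PLACES=sockets
% OMP_PLACES="{0,1,2,3},{4,5,6,7}"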

Example 14  One hardware thread in each place
% OMP_PLACES="{0:1}:8:32"

{0:1} defines a place which has one hardware thread only, namely place {0}. The interval {0:1}:8:32 is therefore equivalent to {0}:8:32, which defines 8 places starting with place {0}, and the stride is 32. So the list of places is as follows:

Place 0: {0}
Place 1: {32}
Place 2: {64}
Place 3: {96}
Place 4: {128}
Place 5: {160}
Place 6: {192}
Place 7: {224}
Example 15  Two hardware threads in each place
% OMP_PLACES="{0:2}:32:8"

{0:2} defines a place which has two hardware threads, namely place {0,1}. The interval {0:2}:24:8 is therefore equivalent to {0,1}:24:8, which defines 24 places starting with place {0,1}, and the stride is 8. So the list of places is as follows:

Place 0: {0,1}
Place 1: {8,9}
Place 2: {16,17}
Place 3: {24,25}
Place 4: {32,33}
Place 5: {40,41}
Place 6: {48,49}
Place 7: {56,57}
Place 8: {64,65}
Place 9: {72,73}
Place 10: {80,81}
Place 11: {88,89}
Place 12: {96,97}
Place 13: {104,105}
Place 14: {112,113}
Place 15: {120,121}
Place 16: {128,129}
Place 17: {136,137}
Place 18: {144,145}
Place 19: {152,153}
Place 20: {160,161}
Place 21: {168,169}
Place 22: {176,177}
Place 23: {184,185}

In addition to the OMP_PLACES and OMP_PROC_BIND environment variables, OpenMP 4.0 provides the proc_bind clause, which can appear on a parallel directive. The proc_bind clause specifies how the team of threads executing the parallel region is bound to processors.
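
For example, the following minimal C sketch (the thread count and output are illustrative assumptions, not part of the specification) uses the proc_bind clause to request that the team be spread out over the available places:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Request a team of four threads, spread out over the places
       defined by OMP_PLACES. */
    #pragma omp parallel num_threads(4) proc_bind(spread)
    {
        printf("Thread %d of %d is running\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}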

For details about the OMP_PLACES and OMP_PROC_BIND environment variables and the proc_bind clause, refer to the OpenMP 4.0 specification.

5.2.1 Controlling Thread Affinity in OpenMP 4.0

This section provides details about Section 2.5.2, "Controlling OpenMP Thread Affinity", in the OpenMP 4.0 specification.

When a thread encounters a parallel construct, the OMP_PROC_BIND environment variable is used to determine the policy for binding threads to places. If the parallel construct includes a proc_bind clause, then the binding policy specified by the proc_bind clause overrides the policy specified by OMP_PROC_BIND. Once a thread in the team is assigned to a place, the implementation does not move it to another place.
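
As an illustration (the particular policy values are arbitrary), suppose the program is run with OMP_PROC_BIND set to spread, but a parallel region specifies a different policy in its proc_bind clause:

% OMP_PROC_BIND=spread

    #pragma omp parallel proc_bind(close)
    {
        /* The threads of this team are assigned to places close to the
           place of the parent thread: the close policy from the
           proc_bind clause overrides the spread policy from
           OMP_PROC_BIND. */
    }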

The master thread affinity policy instructs the execution environment to assign every thread in the team to the same place as the master thread. The place partition is not changed by this policy, and each implicit task inherits the place-partition-var Internal Control Variable (ICV) of the parent implicit task.
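
For example (an illustrative scenario), if the place partition consists of the places {0,1}, {2,3}, {4,5}, and {6,7}, and the master thread is executing on place {2,3}, then under the master policy every thread in the team is assigned to place {2,3}.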

The close thread affinity policy instructs the execution environment to assign the threads in the team to places close to the place of the parent thread. The place partition is not changed by this policy, and each implicit task inherits the place-partition-var ICV of the parent implicit task. If T is the number of threads in the team, and P is the number of places in the parent's place partition, then the assignment of threads in the team to places is as follows:

  • T <= P. The master thread executes on the place of the parent thread, that is, the thread that encountered the parallel construct. The thread with the next smallest thread number executes on the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread.

  • T > P. Each place will contain Sp threads with consecutive thread numbers, where floor(T/P) <= Sp <= ceiling(T/P). The first S0 threads (including the master thread) are assigned to the place of the parent thread. The next S1 threads are assigned to the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread. When P does not divide T evenly, the exact number of threads in a particular place is implementation defined.
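
For example (an illustrative scenario, not taken from the specification), suppose the parent's place partition consists of the four places {0,1}, {2,3}, {4,5}, and {6,7}, the parent thread is executing on place {0,1}, and the team has eight threads. Because T > P and 8 divides evenly by 4, Sp = 2 for every place: threads 0 and 1 are assigned to place {0,1}, threads 2 and 3 to place {2,3}, threads 4 and 5 to place {4,5}, and threads 6 and 7 to place {6,7}.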

The purpose of the spread thread affinity policy is to create a sparse distribution for a team of T threads among the P places of the parent's place partition. A sparse distribution is achieved by first subdividing the parent partition into T subpartitions if T <= P, or P subpartitions if T > P. Then one thread (T <= P) or a set of threads (T > P) is assigned to each subpartition. The place-partition-var ICV of each implicit task is set to its subpartition. The subpartitioning is not only a mechanism for achieving a sparse distribution, it also defines a subset of places for a thread to use when creating a nested parallel region. The assignment of threads to places is as follows:

  • T <= P. The parent thread's place partition is split into T subpartitions, where each subpartition contains floor(P/T) or ceiling(P/T) consecutive places. A single thread is assigned to each subpartition. The master thread executes on the place of the parent thread and is assigned to the subpartition that includes that place. The thread with the next smallest thread number is assigned to the first place in the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread.

  • T > P. The parent thread's place partition is split into P subpartitions, each consisting of a single place. Each subpartition is assigned Sp threads with consecutive thread numbers, where floor(T/P) <= Sp <= ceiling(T/P). The first S0 threads (including the master thread) are assigned to the subpartition containing the place of the parent thread. The next S1 threads are assigned to the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread. When P does not divide T evenly, the exact number of threads in a particular subpartition is implementation defined.
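
For example (again an illustrative scenario), suppose the parent's place partition consists of the eight places {0,1}, {2,3}, {4,5}, {6,7}, {8,9}, {10,11}, {12,13}, and {14,15}, the parent thread is executing on place {0,1}, and the team has four threads. Because T <= P, the partition is split into four subpartitions of two consecutive places each. Thread 0 (the master thread) executes on place {0,1}, and its subpartition is {{0,1},{2,3}}; thread 1 is assigned to place {4,5} with subpartition {{4,5},{6,7}}; thread 2 to place {8,9} with subpartition {{8,9},{10,11}}; and thread 3 to place {12,13} with subpartition {{12,13},{14,15}}.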


Note -  Wrap around is needed if the end of a place partition is reached before all thread assignments are done. For example, wrap around may be needed in the case of close and T <= P, if the master thread is assigned to a place other than the first place in the place partition. In this case, thread 1 is assigned to the place after the place of the master thread, thread 2 is assigned to the place after that, and so on. The end of the place partition may be reached before all threads are assigned. In this case, assignment of threads resumes with the first place in the place partition.