Oracle® Solaris Studio 12.4: OpenMP API User's Guide

Updated: December 2014

5.2.1 Controlling Thread Affinity in OpenMP 4.0

This section provides details about Section 2.5.2, "Controlling OpenMP Thread Affinity", in the OpenMP 4.0 specification.

When a thread encounters a parallel construct, the OMP_PROC_BIND environment variable determines the default policy for binding threads to places. If the parallel construct includes a proc_bind clause, then the binding policy specified by the proc_bind clause overrides the policy specified by OMP_PROC_BIND. Once a thread in the team is assigned to a place, the implementation does not move it to another place.
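The following sketch illustrates how the clause overrides the environment variable. The environment settings and num_threads(4) are illustrative only; assume, for example, that OMP_PLACES=cores and OMP_PROC_BIND=spread are set in the environment before the program runs.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Even with OMP_PROC_BIND=spread set in the environment, this region
           uses the close policy, because the proc_bind clause overrides the
           policy specified by OMP_PROC_BIND. */
        #pragma omp parallel num_threads(4) proc_bind(close)
        {
            printf("thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }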

The master thread affinity policy instructs the execution environment to assign every thread in the team to the same place as the master thread. The place partition is not changed by this policy, and each implicit task inherits the place-partition-var Internal Control Variable (ICV) of the parent implicit task.
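For example, the following fragment (a minimal sketch; the surrounding program is omitted) requests the master policy, so every thread in the team runs on the master thread's place:

    /* All threads in the team execute on the place of the master thread. */
    #pragma omp parallel proc_bind(master)
    {
        /* work */
    }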

The close thread affinity policy instructs the execution environment to assign the threads in the team to places close to the place of the parent thread. The place partition is not changed by this policy, and each implicit task inherits the place-partition-var ICV of the parent implicit task. If T is the number of threads in the team, and P is the number of places in the parent's place partition, then the assignment of threads in the team to places is as follows:

  • T <= P. The master thread executes on the place of the parent thread, that is, the thread that encountered the parallel construct. The thread with the next smallest thread number executes on the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread.

  • T > P. Each place p will contain Sp threads with consecutive thread numbers, where floor(T/P) <= Sp <= ceiling(T/P). The first S0 threads (including the master thread) are assigned to the place of the parent thread. The next S1 threads are assigned to the next place in the place partition, and so on, with wrap around with respect to the place partition of the master thread. When P does not divide T evenly, the exact number of threads in a particular place is implementation defined. (A concrete sketch follows this list.)
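The following fragment sketches both cases of the close policy. The numbers are assumptions for illustration: OMP_PLACES is taken to define four places, numbered 0 through 3, and the parent thread is taken to run on place 0.

    /* Case T <= P (T = 4, P = 4):
       thread 0 -> place 0, thread 1 -> place 1,
       thread 2 -> place 2, thread 3 -> place 3 */
    #pragma omp parallel num_threads(4) proc_bind(close)
    {
        /* work */
    }

    /* Case T > P (T = 8, P = 4), Sp = 2 for every place:
       threads 0-1 -> place 0, threads 2-3 -> place 1,
       threads 4-5 -> place 2, threads 6-7 -> place 3 */
    #pragma omp parallel num_threads(8) proc_bind(close)
    {
        /* work */
    }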

The purpose of the spread thread affinity policy is to create a sparse distribution for a team of T threads among the P places of the parent's place partition. A sparse distribution is achieved by first subdividing the parent partition into T subpartitions if T <= P, or P subpartitions if T > P. Then one thread (T <= P) or a set of threads (T > P) is assigned to each subpartition. The place-partition-var ICV of each implicit task is set to its subpartition. The subpartitioning is not only a mechanism for achieving a sparse distribution, it also defines a subset of places for a thread to use when creating a nested parallel region. The assignment of threads to places is as follows:

  • T <= P. The parent thread's place partition is split into T subpartitions, where each subpartition contains floor(P/T) or ceiling(P/T) consecutive places. A single thread is assigned to each subpartition. The master thread executes on the place of the parent thread and is assigned to the subpartition that includes that place. The thread with the next smallest thread number is assigned to the first place in the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread.

  • T > P. The parent thread's place partition is split into P subpartitions, each consisting of a single place. Each subpartition is assigned Sp threads with consecutive thread numbers, where floor(T/P) <= Sp <= ceiling(T/P). The first S0 threads (including the master thread) are assigned to the subpartition containing the place of the parent thread. The next S1 threads are assigned to the next subpartition, and so on, with wrap around with respect to the original place partition of the master thread. When P does not divide T evenly, the exact number of threads in a particular subpartition is implementation defined. (A concrete sketch follows this list.)
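The following fragment sketches both cases of the spread policy, under the same illustrative assumptions as before: four places, numbered 0 through 3, with the parent thread on place 0.

    /* Case T <= P (T = 2, P = 4): the place partition {0,1,2,3} is split into
       2 subpartitions of 2 consecutive places each.
       thread 0 -> place 0, place-partition-var = {0,1}
       thread 1 -> place 2, place-partition-var = {2,3}
       A nested parallel region created by either thread is restricted to
       that thread's subpartition. */
    #pragma omp parallel num_threads(2) proc_bind(spread)
    {
        /* work, possibly including a nested parallel region */
    }

    /* Case T > P (T = 8, P = 4): each place becomes a one-place subpartition.
       threads 0-1 -> place 0, threads 2-3 -> place 1,
       threads 4-5 -> place 2, threads 6-7 -> place 3 */
    #pragma omp parallel num_threads(8) proc_bind(spread)
    {
        /* work */
    }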


Note -  Wrap around is needed if the end of a place partition is reached before all thread assignments are done. For example, wrap around may be needed in the case of close and T <= P, if the master thread is assigned to a place other than the first place in the place partition. In this case, thread 1 is assigned to the place after the place of the master thread, thread 2 is assigned to the place after that, and so on. The end of the place partition may be reached before all threads are assigned. In this case, assignment of threads resumes with the first place in the place partition.
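For instance, assuming a place partition of {0, 1, 2, 3}, the close policy, T = 4, and a master thread that runs on place 2, the assignments would wrap around as follows:

    thread 0 -> place 2
    thread 1 -> place 3
    thread 2 -> place 0   (wrap around to the first place in the partition)
    thread 3 -> place 1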