Oracle® Developer Studio 12.6: OpenMP API User's Guide

8.2 Avoid False Sharing

Careless use of shared memory structures with OpenMP applications can result in poor performance and limited scalability. Multiple processors updating adjacent shared data in memory can result in excessive traffic on the multiprocessor interconnect and, in effect, cause serialization of computations.

8.2.1 What Is False Sharing?

In most shared memory multiprocessor computers, each processor has its own local cache. The cache acts as a buffer between slow memory and the high-speed registers of the processor. Accessing a memory location causes a slice of actual memory (a cache line) that contains the requested location to be copied into the cache. Subsequent references to the same memory location, or to locations near it, are satisfied out of the cache until the system determines it is necessary to maintain coherency between cache and memory.

False sharing occurs when threads on different processors modify variables that reside on the same cache line. The sharing is called false (as opposed to true sharing) because the threads are not accessing the same variable, but rather different variables that happen to reside on the same cache line.

When a thread modifies a variable in its cache, the whole cache line on which the variable resides is marked as invalid. If another thread attempts to access a variable on the same cache line, then the modified cache line is written back to memory and the thread fetches the cache line from memory. This occurs because cache coherency is maintained on a cache-line basis and not for individual variables or elements. With false sharing, a thread is forced to fetch a more recent copy of a cache line from memory, even though the variable it is attempting to access has not been modified.
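The effect can be reproduced with a minimal sketch (not taken from this guide): each thread repeatedly updates its own element of a shared array, and because adjacent double elements fit on one cache line, the threads keep invalidating one another's cached copies even though no element is actually shared. The array name sum, the constant N, and the thread count are illustrative assumptions.

    #include <stdio.h>
    #include <omp.h>

    #define N           100000000
    #define MAX_THREADS 8

    int main(void)
    {
        /* Per-thread partial sums stored in adjacent array elements.
           Several elements share a cache line, so updates by different
           threads invalidate one another's cached copies (false sharing). */
        double sum[MAX_THREADS] = {0.0};
        double total = 0.0;

        #pragma omp parallel num_threads(MAX_THREADS)
        {
            int id = omp_get_thread_num();
            #pragma omp for
            for (long i = 0; i < N; i++)
                sum[id] += 1.0 / (double)(i + 1);  /* repeated writes to a shared cache line */
        }

        for (int i = 0; i < MAX_THREADS; i++)
            total += sum[i];

        printf("total = %f\n", total);
        return 0;
    }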

If false sharing occurs frequently, interconnect traffic increases, and the performance and scalability of an OpenMP application suffer significantly. False sharing degrades performance when all of the following conditions occur:

  • Shared data is modified by multiple threads

  • Multiple threads modify data within the same cache line

  • Data is modified very frequently (as in a tight loop)

Note that accessing shared data that is read-only does not lead to false sharing.

8.2.2 Reducing False Sharing

False sharing can typically be detected when accesses to certain variables seem particularly expensive. Careful analysis of parallel loops that play a major part in the execution of an application can reveal performance scalability problems caused by false sharing.

In general, false sharing can be reduced using the following techniques:

  • Make use of private or threadprivate data as much as possible.

  • Use the compiler’s optimization features to eliminate memory loads and stores.

  • Pad data structures so that each thread's data resides on a different cache line. The size of the padding is system-dependent, and is the size needed to push a thread's data onto a separate cache line (a sketch of this technique appears after this list).

  • Modify data structures so there is less sharing of data among the threads.
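The padding technique can be sketched as follows, assuming a 64-byte cache line (the actual line size is system-dependent). The struct padded_sum wrapper and the other names are illustrative assumptions, not part of this guide.

    #include <stdio.h>
    #include <omp.h>

    #define N           100000000
    #define MAX_THREADS 8
    #define CACHE_LINE  64   /* assumed cache-line size in bytes; system-dependent */

    /* Each thread's accumulator is padded out to a full cache line so that
       no two threads write to the same line. */
    struct padded_sum {
        double value;
        char   pad[CACHE_LINE - sizeof(double)];
    };

    int main(void)
    {
        struct padded_sum sum[MAX_THREADS];
        double total = 0.0;

        for (int i = 0; i < MAX_THREADS; i++)
            sum[i].value = 0.0;

        #pragma omp parallel num_threads(MAX_THREADS)
        {
            int id = omp_get_thread_num();
            #pragma omp for
            for (long i = 0; i < N; i++)
                sum[id].value += 1.0 / (double)(i + 1);  /* each thread writes its own cache line */
        }

        for (int i = 0; i < MAX_THREADS; i++)
            total += sum[i].value;

        printf("total = %f\n", total);
        return 0;
    }

A simpler alternative, in line with the first item in the list, is to accumulate into a private variable or use a reduction clause, so that threads do not write to a shared array at all.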

Techniques for tackling false sharing depend heavily on the particular application. In some cases, a change in the way the data is allocated can reduce false sharing. In other cases, changing the mapping of iterations to threads by giving each thread more work per chunk (by changing the chunk_size value) can also reduce false sharing, as illustrated below.
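As one illustration of the chunk_size approach (a sketch under assumed names and values, not an example from this guide), a loop that updates consecutive array elements can be scheduled with a larger static chunk so that each thread works on a contiguous block spanning many cache lines. The value 1024 is purely illustrative and should be tuned for the target system.

    #include <stdio.h>
    #include <omp.h>

    #define N     10000000
    #define CHUNK 1024        /* illustrative chunk size; tune for the target system */

    int main(void)
    {
        static double a[N];

        /* With schedule(static,1), consecutive elements of a[] would be
           updated by different threads, so neighbouring threads repeatedly
           write to the same cache line.  A larger chunk gives each thread a
           contiguous block of elements spanning many cache lines, which
           reduces false sharing. */
        #pragma omp parallel for schedule(static, CHUNK)
        for (long i = 0; i < N; i++)
            a[i] = (double)i * 2.0;

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }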