JavaScript is required to for searching.
Skip Navigation Links
Exit Print View
Oracle Solaris Studio 12.3: OpenMP API User's Guide     Oracle Solaris Studio 12.3 Information Library
search filter icon
search icon

Document Information

Preface

1.  Introducing the OpenMP API

2.  Compiling and Running OpenMP Programs

3.  Implementation-Defined Behaviors

4.  Nested Parallelism

5.  Tasking

6.  Automatic Scoping of Variables

7.  Scope Checking

8.  Performance Considerations

8.1 Some General Performance Recommendations

8.2 False Sharing And How To Avoid It

8.2.1 What Is False Sharing?

8.2.2 Reducing False Sharing

8.3 Oracle Solaris OS Tuning Features

A.  Placement of Clauses on Directives

Index

8.2 False Sharing And How To Avoid It

Careless use of shared memory structures with OpenMP applications can result in poor performance and limited scalability. Multiple processors updating adjacent shared data in memory can result in excessive traffic on the multiprocessor interconnect and, in effect, cause serialization of computations.

8.2.1 What Is False Sharing?

Most high performance processors, such as UltraSPARC processors, insert a cache buffer between slow memory and the high speed registers of the CPU. Accessing a memory location causes a slice of actual memory (a cache line) containing the memory location requested to be copied into the cache. Subsequent references to the same memory location or those around it can probably be satisfied out of the cache until the system determines it is necessary to maintain the coherency between cache and memory.

However, simultaneous updates of individual elements in the same cache line coming from different processors invalidate entire cache lines, even though these updates are logically independent of each other. Each update of an individual element of a cache line marks the line as invalid. Other processors accessing a different element in the same line see the line marked as invalid. They are forced to fetch a more recent copy of the line from memory or elsewhere, even though the element accessed has not been modified. This occurs because cache coherency is maintained on a cache-line basis and not for individual elements. As a result, interconnect traffic and overhead increases. Also, while the cache-line update is in progress, access to the elements in the line is inhibited.

This situation is called false sharing. If it occurs frequently, performance and scalability of an OpenMP application suffers significantly.

False sharing degrades performance when all of the following conditions occur:

Note that shared data that is read-only in a loop does not lead to false sharing.

8.2.2 Reducing False Sharing

Careful analysis of those parallel loops that play a major part in the execution of an application can reveal performance scalability problems caused by false sharing. In general, false sharing can be reduced by the following techniques:

In specific cases, the impact of false sharing might be less visible when dealing with larger problem sizes, as there might be less sharing.

Techniques for tackling false sharing are very much dependent on the particular application. In some cases, a change in the way the data is allocated can reduce false sharing. In other cases, changing the mapping of iterations to threads by giving each thread more work per chunk (by changing the chunksize value) can also lead to a reduction in false sharing.