Some multi-threaded applications intentionally allow data-races in order to get better performance. A benign data-race is an intentional data-race whose existence does not affect the correctness of the program. The following examples demonstrate benign data races.
In addition to applications with benign data-races, a large class of applications allows data-races because they rely on lock-free and wait-free algorithms, which are difficult to design correctly. The Thread Analyzer can help determine the locations of data-races in these applications.
The threads in the following file, omp_prime.c, check whether an integer is a prime number by executing the function is_prime().
 11  int is_prime(int v)
 12  {
 13      int i;
 14      int bound = floor(sqrt((double)v)) + 1;
 15
 16      for (i = 2; i < bound; i++) {
 17          /* No need to check against known composites */
 18          if (!pflag[i])
 19              continue;
 20          if (v % i == 0) {
 21              pflag[v] = 0;
 22              return 0;
 23          }
 24      }
 25      return (v > 1);
 26  }
The Thread Analyzer reports that there is a data-race between the write to pflag[] on line 21 and the read of pflag[] on line 18. However, this data-race is benign because it does not affect the correctness of the final result. At line 18, a thread checks whether pflag[i], for a given value of i, is equal to zero. If pflag[i] is equal to zero, then i is a known composite number (in other words, i is known to be non-prime). Consequently, there is no need to check whether v is divisible by i; it is only necessary to check whether v is divisible by some prime number. Therefore, if pflag[i] is equal to zero, the thread continues to the next value of i. If pflag[i] is not equal to zero and v is divisible by i, the thread assigns zero to pflag[v] to indicate that v is not a prime number.
It does not matter, from a correctness point of view, if multiple threads check the same pflag[] element and write to it concurrently. The initial value of a pflag[] element is one. When the threads update that element, they assign it the value zero. That is, the threads store zero in the same bit in the same byte of memory for that element. On current architectures, it is safe to assume that those stores are atomic. This means that, when that element is read by a thread, the value read is either one or zero. If a thread checks a given pflag[] element (line 18) before it has been assigned the value zero, it then executes lines 20-23. If, in the meantime, another thread assigns zero to that same pflag[] element (line 21), the final result is not changed. Essentially, this means that the first thread executed lines 20-23 unnecessarily.
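The race arises because is_prime() is called concurrently from a parallel loop over the candidate numbers. The following is a minimal sketch of such a driver, not taken from omp_prime.c; the bound N, the primes array, the total counter, and the scheduling clause are illustrative assumptions.

#include <math.h>
#include <stdio.h>

#define N 10000

int pflag[N];                 /* pflag[i] == 1 while i is still possibly prime */
int primes[N];                /* primes found so far (illustrative) */
int total = 0;                /* count of primes found (illustrative) */

int is_prime(int v);          /* the function shown in the listing above */

int main(void)
{
    int i;

    for (i = 0; i < N; i++)
        pflag[i] = 1;

    /* Each iteration may run on a different thread; every thread reads and
       writes the shared pflag[] array without locking, which produces the
       benign data-race described above. */
    #pragma omp parallel for schedule(dynamic)
    for (i = 2; i < N; i++) {
        if (is_prime(i)) {
            #pragma omp critical
            {
                primes[total++] = i;
            }
        }
    }

    printf("%d primes found below %d\n", total, N);
    return 0;
}

Compile with an OpenMP option (for example, -xopenmp with the Oracle Solaris Studio compilers or -fopenmp with gcc) so that the pragma creates a team of threads; without it the loop runs serially and the data-race does not occur.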
A group of threads calls check_bad_array() concurrently to check whether any element of the array data_array is corrupt. Each thread checks a different section of the array. If a thread finds that an element is corrupt, it sets the global shared variable is_bad to 1.
 20  volatile int is_bad = 0;
...
100  /*
101   * Each thread checks its assigned portion of data_array, and sets
102   * the global flag is_bad to 1 once it finds a bad data element.
103   */
104  void check_bad_array(volatile data_t *data_array, unsigned int thread_id)
105  {
106      int i;
107      for (i = my_start(thread_id); i < my_end(thread_id); i++) {
108          if (is_bad)
109              return;
110          else {
111              if (is_bad_element(data_array[i])) {
112                  is_bad = 1;
113                  return;
114              }
115          }
116      }
117  }
There is a data-race between the read of is_bad on line 108 and the write to is_bad on line 112. However, the data-race does not affect the correctness of the final result.
The initial value of is_bad is zero. When the threads update is_bad, they assign it the value one. That is, the threads store one in the same bit in the same byte of memory for is_bad. On current architectures, it is safe to assume that those stores are atomic. Therefore, when is_bad is read by a thread, the value read will either be zero or one. If a thread checks is_bad (line 108) before it has been assigned the value one, then it continues executing the for loop. If, in the meantime, another thread has assigned the value one to is_bad (line 112), that does not change the final result. It just means that the thread executed the for loop longer than necessary.
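For context, a driver that creates the group of threads might look like the following minimal sketch. It is not part of the original example: the use of POSIX threads, the data_t definition, the bodies of is_bad_element(), my_start(), and my_end(), and the sizes are all illustrative assumptions, and the shared declarations from the listing above are repeated so the sketch is self-contained.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 4
#define ARRAY_SIZE  1024

typedef int data_t;                        /* placeholder element type */

volatile int is_bad = 0;                   /* shared flag, as on line 20 above */
data_t data_array[ARRAY_SIZE];

/* Each thread scans its own contiguous slice of data_array. */
unsigned int my_start(unsigned int id) { return id * (ARRAY_SIZE / NUM_THREADS); }
unsigned int my_end(unsigned int id)   { return (id + 1) * (ARRAY_SIZE / NUM_THREADS); }
int is_bad_element(data_t d)           { return d < 0; }   /* example corruption check */

void check_bad_array(volatile data_t *data_array, unsigned int thread_id);  /* shown above */

void *worker(void *arg)
{
    check_bad_array(data_array, (unsigned int)(uintptr_t)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    unsigned int i;

    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(uintptr_t)i);
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    printf("data_array is %s\n", is_bad ? "corrupt" : "clean");
    return 0;
}

Link with the POSIX threads library (for example, -lpthread). Because every thread only ever stores the value 1 into is_bad, the race on it is benign for the reasons given above.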
A singleton ensures that only one object of a certain type exists throughout the program. Double-checked locking is a common, efficient way to initialize a singleton in multi-threaded applications. The following code illustrates such an implementation.
100  class Singleton {
101  public:
102      static Singleton* instance();
103      ...
104  private:
105      static Singleton* ptr_instance;
106  };
...
200  Singleton* Singleton::ptr_instance = 0;
...
300  Singleton* Singleton::instance() {
301      Singleton *tmp = ptr_instance;
302      memory_barrier();
303      if (tmp == NULL) {
304          Lock();
305          if (ptr_instance == NULL) {
306              tmp = new Singleton;
307              memory_barrier();
308              ptr_instance = tmp;
309          }
310          Unlock();
311      }
312      return tmp;
313  }
The read of ptr_instance (line 301) is intentionally not protected by a lock. This keeps the common case, checking whether the singleton has already been instantiated, efficient in a multi-threaded environment. Notice that there is a data-race on the variable ptr_instance between the read on line 301 and the write on line 308, yet the program works correctly. However, writing a correct program that allows data-races is a difficult task. For example, in the double-checked-locking code above, the calls to memory_barrier() on lines 302 and 307 ensure that the Singleton object and ptr_instance are set, and read, in the proper order, so that all threads read them consistently. This programming technique does not work if the memory barriers are omitted.
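For comparison, the following is a minimal sketch, not from the original example, of the same double-checked-locking pattern expressed with C++11 std::atomic and std::mutex. The acquire load and release store take the place of the memory_barrier() calls on lines 302 and 307, and Lock()/Unlock() become a std::lock_guard; the member name init_lock is an illustrative assumption.

#include <atomic>
#include <mutex>

class Singleton {
public:
    static Singleton* instance();
private:
    static std::atomic<Singleton*> ptr_instance;
    static std::mutex init_lock;
};

std::atomic<Singleton*> Singleton::ptr_instance(nullptr);
std::mutex Singleton::init_lock;

Singleton* Singleton::instance()
{
    /* Acquire load: provides the ordering of the barrier at line 302. */
    Singleton* tmp = ptr_instance.load(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> guard(init_lock);
        tmp = ptr_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            /* Release store: provides the ordering of the barrier at line 307. */
            ptr_instance.store(tmp, std::memory_order_release);
        }
    }
    return tmp;
}

In this form the ordering requirement is expressed directly on the accesses to ptr_instance rather than through separate barrier calls, which makes the intent of the unlocked fast path explicit.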