Sun Studio 12: Thread Analyzer User's Guide

2.6 Benign Data-Races

Some multi-threaded applications intentionally allow data-races in order to get better performance. A benign data-race is an intentional data-race whose existence does not affect the correctness of the program. The following examples demonstrate benign data races.


Note –

In addition to benign data-races, a large class of applications allow data-races because they rely on lock-free and wait-free algorithms which are difficult to design correctly. The Thread Analyzer can help determine the locations of data-races in these applications.


2.6.1 A Program for Finding Primes

The threads in the following file, omp_prime.c check whether an integer is a prime number by executing the function is_prime().

11 int is_prime(int v)
    12 {
    13     int i;
    14     int bound = floor(sqrt ((double)v)) + 1;
    15      
    16     for (i = 2; i < bound; i++) {
    17         /* No need to check against known composites */ 
    18         if (!pflag[i]) 
    19             continue;
    20         if (v % i == 0) { 
    21             pflag[v] = 0;
    22             return 0;
    23         }
    24     }
    25     return (v > 1); 
    26 }

The Thread Analyzer reports that there is a data-race between the write to pflag[] on line 21 and the read of pflag[] on line 18. However, this data-race is benign as it does not affect the correctness of the final result. At line 18, a thread checks whether or not pflag[i], for a given value of i is equal to zero. If pflag[i] is equal to zero, that means that i is a known composite number (in other words, i is known to be non-prime). Consequently, there is no need to check whether v is divisible by i; we only need to check whether or not v is divisible by some prime number. Therefore, if pflag[i] is equal to zero, the thread continues to the next value of i. If pflag[i] is not equal to zero and v is divisible by i, the thread assigns zero to pflag[v] to indicate that v is not a prime number.

It does not matter, from a correctness point of view, if multiple threads check the same pflag[] element and write to it concurrently. The initial value of a pflag[] element is one. When the threads update that element, they assign it the value zero. That is, the threads store zero in the same bit in the same byte of memory for that element. On current architectures, it is safe to assume that those stores are atomic. This means that, when that element is read by a thread, the value read is either one or zero. If a thread checks a given pflag[] element (line 18) before it has been assigned the value zero, it then executes lines 20-23. If, in the meantime, another thread assigns zero to that same pflag[] element (line 21), the final result is not changed. Essentially, this means that the first thread executed lines 20-23 unnecessarily.

2.6.2 A Program that Verifies Array-Value Types

A group of threads call check_bad_array() concurrently to check whether any element of array data_array is corrupt. Each thread checks a different section of the array. If a thread finds that an element is corrupt, it sets the value of a global shared variable is_bad to true.

20  volatile int is_bad = 0;
 ...

 100  /* 
 101   * Each thread checks its assigned portion of data_array, and sets 
 102   * the global flag is_bad to 1 once it finds a bad data element.
 103   */
 104  void check_bad_array(volatile data_t *data_array, unsigned int thread_id)    
 105  {
 106     int i;
 107     for (i=my_start(thread_id); i<my_end(thread_id); i++) {
 108          if (is_bad) 
 109              return;
 110          else {
 111              if (is_bad_element(data_array[i])) { 
 112                  is_bad = 1;
 113                  return;
 114              }
 115          }
 116     }
 117  }

There is a data-race between the read of is_bad on line 108 and the write to is_bad on line 112. However, the data-race does not affect the correctness of the final result.

The initial value of is_bad is zero. When the threads update is_bad, they assign it the value one. That is, the threads store one in the same bit in the same byte of memory for is_bad. On current architectures, it is safe to assume that those stores are atomic. Therefore, when is_bad is read by a thread, the value read will either be zero or one. If a thread checks is_bad (line 108) before it has been assigned the value one, then it continues executing the for loop. If, in the meantime, another thread has assigned the value one to is_bad (line 112), that does not change the final result. It just means that the thread executed the for loop longer than necessary.

2.6.3 A Program Using Double-Checked Locking

A singleton ensures that only one object of a certain type exists throughout the program. Double-checked locking is a common, efficient way to initialize a singleton in multi-threaded applications. The following code illustrates such an implementation.

100 class Singleton {
 101     public:
 102     static Singleton* instance();
 103     ...
 104     private:
 105     static Singleton* ptr_instance;
 106 };
 ...

 200 Singleton* Singleton::ptr_instance = 0;
 ...

 300 Singleton* Singleton::instance() {
 301     Singleton *tmp = ptr_instance;
 302     memory_barrier();
 303     if (tmp == NULL) {
 304         Lock();
 305         if (ptr_instance == NULL) {
 306             tmp = new Singleton;
 307             memory_barrier();
 308             ptr_instance = tmp;
 309         }
 310         Unlock();
 311     }
 312     return tmp;
 313 }

The read of ptr_instance (line 301) is intentionally not protected by a lock. This makes the check to determine whether or not the singleton has already been instantiated in a multi-threaded environment efficient. Notice that there is a data-race on variable ptr_instance between the read on line 301 and the write on line 308, but the program works correctly. However, writing a correct program that allows data-races is a difficult task. For example, in the above double-checked-locking code, the calls to memory_barrier() at lines 302 and 307 are used to ensure that the singleton and ptr_instance are set, and read, in the proper order. Consequently, all threads read them consistently. This programming technique will not work if the memory barriers are not used.