The following is a detailed tutorial on how to detect and fix data races with the Thread Analyzer.
This tutorial relies on two programs, both of which contain data races:
The first program finds prime numbers. It is written in C and is parallelized with OpenMP directives. The source file is called omp_prime.c.
The second program also finds prime numbers and is also written in C. However, it is parallelized with POSIX threads instead of OpenMP directives. The source file is called pthr_prime.c.
 1  #include <stdio.h>
 2  #include <math.h>
 3  #include <omp.h>
 4
 5  #define THREADS 4
 6  #define N 3000
 7
 8  int primes[N];
 9  int pflag[N];
10
11  int is_prime(int v)
12  {
13      int i;
14      int bound = floor(sqrt((double)v)) + 1;
15
16      for (i = 2; i < bound; i++) {
17          /* No need to check against known composites */
18          if (!pflag[i])
19              continue;
20          if (v % i == 0) {
21              pflag[v] = 0;
22              return 0;
23          }
24      }
25      return (v > 1);
26  }
27
28  int main(int argn, char **argv)
29  {
30      int i;
31      int total = 0;
32
33  #ifdef _OPENMP
34      omp_set_num_threads(THREADS);
35      omp_set_dynamic(0);
36  #endif
37
38      for (i = 0; i < N; i++) {
39          pflag[i] = 1;
40      }
41
42      #pragma omp parallel for
43      for (i = 2; i < N; i++) {
44          if ( is_prime(i) ) {
45              primes[total] = i;
46              total++;
47          }
48      }
49      printf("Number of prime numbers between 2 and %d: %d\n",
50             N, total);
51      for (i = 0; i < total; i++) {
52          printf("%d\n", primes[i]);
53      }
54
55      return 0;
56  }
 1  #include <stdio.h>
 2  #include <math.h>
 3  #include <pthread.h>
 4
 5  #define THREADS 4
 6  #define N 3000
 7
 8  int primes[N];
 9  int pflag[N];
10  int total = 0;
11
12  int is_prime(int v)
13  {
14      int i;
15      int bound = floor(sqrt((double)v)) + 1;
16
17      for (i = 2; i < bound; i++) {
18          /* No need to check against known composites */
19          if (!pflag[i])
20              continue;
21          if (v % i == 0) {
22              pflag[v] = 0;
23              return 0;
24          }
25      }
26      return (v > 1);
27  }
28
29  void *work(void *arg)
30  {
31      int start;
32      int end;
33      int i;
34
35      start = (N/THREADS) * (*(int *)arg);
36      end = start + N/THREADS;
37      for (i = start; i < end; i++) {
38          if ( is_prime(i) ) {
39              primes[total] = i;
40              total++;
41          }
42      }
43      return NULL;
44  }
45
46  int main(int argn, char **argv)
47  {
48      int i;
49      pthread_t tids[THREADS-1];
50
51      for (i = 0; i < N; i++) {
52          pflag[i] = 1;
53      }
54
55      for (i = 0; i < THREADS-1; i++) {
56          pthread_create(&tids[i], NULL, work, (void *)&i);
57      }
58
59      i = THREADS-1;
60      work((void *)&i);
61
62      printf("Number of prime numbers between 2 and %d: %d\n",
63             N, total);
64      for (i = 0; i < total; i++) {
65          printf("%d\n", primes[i]);
66      }
67
68      return 0;
69  }
As noted in 2.1.1 Complete Listing of omp_prime.c, when code contains a race condition, the order of memory accesses is non-deterministic and the computation can give different results from run to run. Each execution of omp_prime.c produces incorrect and inconsistent results because of the data races in the code. An example of the output is shown below:
% cc -xopenmp=noopt omp_prime.c -lm
% a.out | sort -n
0
0
0
0
0
0
0
Number of prime numbers between 2 and 3000: 336
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
...
2971
2999
% a.out | sort -n
0
0
0
0
0
0
0
0
0
Number of prime numbers between 2 and 3000: 325
3
5
7
13
17
19
23
29
31
41
43
47
61
67
71
73
79
83
89
101
...
2971
2999
Similarly, as a result of data-races in pthr_prime.c, different runs of the program may produce incorrect and inconsistent results as shown below.
% cc pthr_prime.c -lm -mt
% a.out | sort -n
Number of prime numbers between 2 and 3000: 304
751
757
761
769
773
787
797
809
811
821
823
827
829
839
853
857
859
863
877
881
...
2999
2999
% a.out | sort -n
Number of prime numbers between 2 and 3000: 314
751
757
761
769
773
787
797
809
811
821
823
827
839
853
859
877
881
883
907
911
...
2999
2999
The Thread Analyzer follows the same "collect-analyze" model that the Sun Studio Performance Analyzer uses. There are three steps involved in using the Thread Analyzer:
In order to enable data-race detection in a program, the source files must first be compiled with a special compiler option. This special option for the C, C++, and Fortran languages is: -xinstrument=datarace
Add the -xinstrument=datarace compiler option to the existing set of options you use to compile your program. You can apply the option to only the source files that you suspect to have data-races.
Be sure to specify -g when you compile your program. Do not specify a high level of optimization when compiling your program for race detection. Compile an OpenMP program with -xopenmp=noopt. The information reported, such as line numbers and callstacks, may be incorrect when a high optimization level is used.
The following are example commands for instrumenting the source code:
cc -xinstrument=datarace -g -mt pthr_prime.c
cc -xinstrument=datarace -g -xopenmp=noopt omp_prime.c
Use the collect command with the -r race flag to run the program and create a data-race-detection experiment during the execution of the process. For OpenMP programs, make sure that the number of threads used is larger than one. The following is an example command that creates a data-race experiment:
collect -r race ./a.out
To increase the likelihood of detecting data-races, it is recommended that you create several data-race-detection experiments using collect with the -r race flag. Use a different number of threads and different input data in the different experiments.
You can examine a data-race-detection experiment with the Thread Analyzer, the Performance Analyzer, or the er_print utility. Both the Thread Analyzer and the Performance Analyzer present a GUI interface; the former presents a simplified set of default tabs, but is otherwise identical to the Performance Analyzer.
The Thread Analyzer GUI has a menu bar, a tool bar, and a split pane that contains tabs for the various displays. On the left-hand pane, the following three tabs are shown by default:
The Races tab shows a list of data-races detected in the program. This tab is selected by default.
The Dual Source tab shows the two source locations corresponding to the two accesses of a selected data-race. The source line where a data-race access occurred is highlighted.
The Experiments tab shows the load objects in the experiment, and lists error and warning messages.
On the right-hand pane of the Thread Analyzer display, the following two tabs are shown:
The Summary tab shows summary information about a data-race access selected from the Races tab.
The Race Details tab shows detailed information about a data-race trace selected from the Races tab.
The er_print utility, on the other hand, presents a command-line interface. The following subcommands are useful for examining races with the er_print utility:
-races: This reports any data races revealed in the experiment.
-rdetail race_id: This displays detailed information about the data-race with the specified race_id. If the specified race_id is "all", then detailed information about all data-races will be displayed.
-header: This displays descriptive information about the experiment, and reports any errors or warnings.
Refer to the collect.1, tha.1, analyzer.1, and er_print.1 man pages for more information.
This section shows how to use both the er_print command line and the Thread Analyzer GUI to display the following information about each detected data-race:
The unique ID of the data-race.
The virtual address, Vaddr, associated with the data-race. If there is more than one virtual address, the label Multiple Addresses is displayed in parentheses.
The memory accesses to the virtual address, Vaddr, by two different threads. The type of each access (read or write) is shown, as well as the function, offset, and line number in the source code where the access occurred.
The total number of traces associated with the data-race. Each trace refers to the pair of thread callstacks at the time the two data-race accesses occurred. If you are using the GUI, the two callstacks will be displayed in the Race Details tab when an individual trace is selected. If you are using the er_print utility, the two callstacks will be displayed by the rdetail command.
% cc -xopenmp=noopt omp_prime.c -lm -xinstrument=datarace
% collect -r race a.out | sort -n
0
0
0
0
0
0
0
0
0
0
...
0
0
Creating experiment database test.1.er ...
Number of prime numbers between 2 and 3000: 429
2
3
5
7
11
13
17
19
23
29
31
37
41
47
53
59
61
67
71
73
...
2971
2999
% er_print test.1.er
(er_print) races

Total Races: 4   Experiment: test.1.er

Race #1, Vaddr: 0xffbfeec4
    Access 1: Read, main -- MP doall from line 42 [_$d1A42.main] + 0x00000060,
                    line 45 in "omp_prime.c"
    Access 2: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000008C,
                    line 46 in "omp_prime.c"
  Total Traces: 2

Race #2, Vaddr: 0xffbfeec4
    Access 1: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000008C,
                    line 46 in "omp_prime.c"
    Access 2: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000008C,
                    line 46 in "omp_prime.c"
  Total Traces: 1

Race #3, Vaddr: (Multiple Addresses)
    Access 1: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000007C,
                    line 45 in "omp_prime.c"
    Access 2: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000007C,
                    line 45 in "omp_prime.c"
  Total Traces: 1

Race #4, Vaddr: 0x21418
    Access 1: Read, is_prime + 0x00000074,
                    line 18 in "omp_prime.c"
    Access 2: Write, is_prime + 0x00000114,
                    line 21 in "omp_prime.c"
  Total Traces: 1
(er_print)
The following screen-shot shows the races that were detected in omp_prime.c as displayed by the Thread Analyzer GUI. The command to invoke the GUI and load the experiment data is tha test.1.er.
There are four data-races in omp_prime.c:
Race number one: A data-race between a read from total on line 45 and a write to total on line 46.
Race number two: A data-race between a write to total on line 46 and another write to total on the same line.
Race number three: A data-race between a write to primes[] on line 45 and another write to primes[] on the same line.
Race number four: A data-race between a read from pflag[] on line 18 and a write to pflag[] on line 21.
% cc pthr_prime.c -lm -mt -xinstrument=datarace
% collect -r race a.out | sort -n
Creating experiment database test.2.er ... of type "nfs", which may distort the measured performance.
0
0
0
0
0
0
0
0
0
0
...
0
0
Creating experiment database test.2.er ...
Number of prime numbers between 2 and 3000: 328
751
757
761
773
797
809
811
821
823
827
829
839
853
857
859
877
881
883
887
907
...
2999
2999
% er_print test.2.er
(er_print) races

Total Races: 6   Experiment: test.2.er

Race #1, Vaddr: 0x218d0
    Access 1: Write, work + 0x00000154, line 40 in "pthr_prime.c"
    Access 2: Write, work + 0x00000154, line 40 in "pthr_prime.c"
  Total Traces: 3

Race #2, Vaddr: 0x218d0
    Access 1: Read, work + 0x000000CC, line 39 in "pthr_prime.c"
    Access 2: Write, work + 0x00000154, line 40 in "pthr_prime.c"
  Total Traces: 3

Race #3, Vaddr: 0xffbfeec4
    Access 1: Write, main + 0x00000204, line 55 in "pthr_prime.c"
    Access 2: Read, work + 0x00000024, line 35 in "pthr_prime.c"
  Total Traces: 2

Race #4, Vaddr: (Multiple Addresses)
    Access 1: Write, work + 0x00000108, line 39 in "pthr_prime.c"
    Access 2: Write, work + 0x00000108, line 39 in "pthr_prime.c"
  Total Traces: 1

Race #5, Vaddr: 0x23bfc
    Access 1: Write, is_prime + 0x00000210, line 22 in "pthr_prime.c"
    Access 2: Write, is_prime + 0x00000210, line 22 in "pthr_prime.c"
  Total Traces: 1

Race #6, Vaddr: 0x247bc
    Access 1: Write, work + 0x00000108, line 39 in "pthr_prime.c"
    Access 2: Read, main + 0x00000394, line 65 in "pthr_prime.c"
  Total Traces: 1
(er_print)
The following screen-shot shows the races detected in pthr_prime.c as displayed by the Thread Analyzer GUI. The command to invoke the GUI and load the experiment data is tha test.2.er.
There are six data-races in pthr_prime.c:
Race number one: A data-race between a write to total on line 40 and another write to total on the same line.
Race number two: A data-race between a read from total on line 39 and a write to total on line 40.
Race number three: A data-race between a write to i on line 55 and a read from i on line 35.
Race number four: A data-race between a write to primes[] on line 39 and another write to primes[] on the same line.
Race number five: A data-race between a write to pflag[] on line 22 and another write to pflag[] on the same line.
Race number six: A data-race between a write to primes[] on line 39 and a read from primes[] on line 65.
One advantage of the GUI is that it allows you to see, side by side, the two source locations associated with a data-race. For example, select race number six for pthr_prime.c in the Races tab and then click on the Dual Source tab. You will see the following:
The first access for race number six (line 39) is shown in the top Race Source pane, while the second access for that data-race is shown in the bottom pane. Source lines 39 and 65, where the data-race accesses occurred, are highlighted. The default metric (Exclusive Race Accesses metric) is shown to the left of each source line. This metric gives a count of the number of times a data-race access was reported on that line.
This section provides a basic strategy for diagnosing the cause of data races.
A false positive data-race is a data-race that is reported by the Thread Analyzer, but has actually not occurred. The Thread Analyzer tries to reduce the number of false positives reported. However, there are cases where the tool is not able to do a precise job and may report false positive data-races.
You can ignore a false-positive data-race because it is not a genuine data-race and, therefore, does not affect the behavior of the program.
See 2.5 False Positives for some examples of false positive data-races. For information on how to remove false positive data-races from the report, see A.1 The Thread-Analyzer's User-APIs.
A benign data-race is an intentional data-race whose existence does not affect the correctness of the program.
Some multi-threaded applications intentionally use code that may cause data-races. Since the data-races are there by design, no fix is required. In some cases, however, it is quite tricky to get such codes to run correctly. These data-races should be reviewed carefully.
See 2.5 False Positives for more detailed information about benign races.
The Thread Analyzer can help find data-races in the program, but it cannot automatically find bugs in the program nor suggest ways to fix the data-races found. A data-race may have been introduced by a bug. It is important to find and fix the bug. Merely removing the data-race is not the right approach, and could make further debugging even more difficult. Fix the bug, not the data-race.
Here's how to fix the bug in omp_prime.c. See 2.1.1 Complete Listing of omp_prime.c for a complete file listing.
Move lines 45 and 46 into a critical section in order to remove the data-race between the read from total on line 45 and the write to total on line 46. The critical section protects the two lines and prevents the data-race. Here is the corrected code:
42      #pragma omp parallel for
43      for (i = 2; i < N; i++) {
44          if ( is_prime(i) ) {
            #pragma omp critical
            {
45              primes[total] = i;
46              total++;
            }
47          }
48      }
Note that the addition of a single critical section also fixes two other data-races in omp_prime.c: the data-race on primes[] at line 45, as well as the data-race on total at line 46. The fourth data-race, between a read from pflag[] on line 18 and a write to pflag[] on line 21, is actually a benign race because it does not lead to incorrect results. It is not essential to fix benign data-races.
You could also move lines 45 and 46 into two separate critical sections as follows, but this change fails to correct the program:
42      #pragma omp parallel for
43      for (i = 2; i < N; i++) {
44          if ( is_prime(i) ) {
            #pragma omp critical
            {
45              primes[total] = i;
            }
            #pragma omp critical
            {
46              total++;
            }
47          }
48      }
The critical sections around lines 45 and 46 get rid of the data-races because the threads now use exclusive locks to control their accesses to primes[] and total. In particular, the critical section around line 46 ensures that the computed value of total is correct. However, the program is still incorrect: two threads may update the same element of primes[] using the same value of total, and some elements in primes[] may not be assigned a value at all.
Here's how to fix the bug in pthr_prime.c. See 2.1.2 Complete Listing of pthr_prime.c for a complete file listing.
Use a single mutex to remove the data-race in pthr_prime.c between the read from total on line 39 and the write to total on line 40. This addition also fixes two other data-races in pthr_prime.c: the data-race on primes[] at line 39, as well as the data-race on total at line 40.
The data-race between the write to i on line 55 and the read from i on line 35, as well as the data-race on pflag[] on line 22, reveal a problem with the shared access to the variable i by different threads. The initial thread in pthr_prime.c creates the child threads in a loop (source lines 55-57) and dispatches them to work on the function work(). The loop index i is passed to work() by address. Since all threads access the same memory location for i, the value of i for each thread does not remain unique, but changes as the initial thread increments the loop index. Because different threads use the same value of i, the data-races occur.
One way to fix the problem is to pass i to work() by value. This ensures that each thread has its own private copy of i with a unique value. To remove the data-race on primes[] between the write access on line 39 and the read access on line 65, we could protect line 65 with the same mutex lock as the one used above for lines 39 and 40. However, this is not the correct fix. The real problem is that the main thread may report the results (lines 62 through 66) while the child threads are still updating total and primes[] in the function work(). Using mutex locks does not provide the proper ordering synchronization between the threads. One correct fix is to let the main thread wait for all child threads to join it before printing out the results.
Here is the corrected version of pthr_prime.c:
 1  #include <stdio.h>
 2  #include <math.h>
 3  #include <pthread.h>
 4
 5  #define THREADS 4
 6  #define N 3000
 7
 8  int primes[N];
 9  int pflag[N];
10  int total = 0;
11  pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
12
13  int is_prime(int v)
14  {
15      int i;
16      int bound = floor(sqrt(v)) + 1;
17
18      for (i = 2; i < bound; i++) {
19          /* no need to check against known composites */
20          if (!pflag[i])
21              continue;
22          if (v % i == 0) {
23              pflag[v] = 0;
24              return 0;
25          }
26      }
27      return (v > 1);
28  }
29
30  void *work(void *arg)
31  {
32      int start;
33      int end;
34      int i;
35
36      start = (N/THREADS) * ((int)arg);
37      end = start + N/THREADS;
38      for (i = start; i < end; i++) {
39          if ( is_prime(i) ) {
40              pthread_mutex_lock(&mutex);
41              primes[total] = i;
42              total++;
43              pthread_mutex_unlock(&mutex);
44          }
45      }
46      return NULL;
47  }
48
49  int main(int argn, char **argv)
50  {
51      int i;
52      pthread_t tids[THREADS-1];
53
54      for (i = 0; i < N; i++) {
55          pflag[i] = 1;
56      }
57
58      for (i = 0; i < THREADS-1; i++) {
59          pthread_create(&tids[i], NULL, work, (void *)i);
60      }
61
62      i = THREADS-1;
63      work((void *)i);
64
65      for (i = 0; i < THREADS-1; i++) {
66          pthread_join(tids[i], NULL);
67      }
68
69      printf("Number of prime numbers between 2 and %d: %d\n",
70             N, total);
71      for (i = 0; i < total; i++) {
72          printf("%d\n", primes[i]);
73      }
74  }
Occasionally, the Thread Analyzer may report data-races that have not actually occurred in the program. These are called false positives. In most cases, false positives are caused by 2.5.1 User-Defined Synchronizations or 2.5.2 Memory That is Recycled by Different Threads.
The Thread Analyzer can recognize most standard synchronization APIs and constructs provided by OpenMP, POSIX threads, and Solaris threads. However, the tool cannot recognize user-defined synchronizations, and may report false data-races if your code contains such synchronizations. For example, the tool cannot recognize implementations of locks built on CAS instructions, post and wait operations implemented with busy-waits, and so on. Here is a typical example of a class of false positives, where the program employs a common way of using POSIX thread condition variables:
/* Initially ready_flag is 0 */

/* Thread 1: Producer */
100   data = ...
101   pthread_mutex_lock (&mutex);
102   ready_flag = 1;
103   pthread_cond_signal (&cond);
104   pthread_mutex_unlock (&mutex);
      ...

/* Thread 2: Consumer */
200   pthread_mutex_lock (&mutex);
201   while (!ready_flag) {
202       pthread_cond_wait (&cond, &mutex);
203   }
204   pthread_mutex_unlock (&mutex);
205   ... = data;
The pthread_cond_wait() call is usually made within a loop that tests the predicate to protect against program errors and spurious wake-ups. The test and set of the predicate is often protected by a mutex lock. In the above code, Thread 1 produces the value for the variable data at line 100, sets the value of ready_flag to one at line 102 to indicate that the data has been produced, and then calls pthread_cond_signal() to wake up the consumer thread, Thread 2. Thread 2 tests the predicate (!ready_flag) in a loop. When it finds that the flag is set, it consumes the data at line 205.
The write of ready_flag at line 102 and read of ready_flag at line 201 are protected by the same mutex lock, so there is no data-race between the two accesses and the tool recognizes that correctly.
The write of data at line 100 and the read of data at line 205 are not protected by mutex locks. However, in the program logic, the read at line 205 always happens after the write at line 100 because of the flag variable ready_flag, so there is no data-race between these two accesses to data. Nevertheless, the tool reports a data-race between them if pthread_cond_wait() (line 202) is never actually called at run time. If line 102 is executed before line 201 is ever executed, then when line 201 is executed, the loop entry test fails and line 202 is skipped. The tool monitors pthread_cond_signal() and pthread_cond_wait() calls and can pair them to derive synchronization. When the pthread_cond_wait() at line 202 is not called, the tool does not know that the write at line 100 is always executed before the read at line 205. Therefore, it considers the two accesses to be concurrent and reports a data-race between them.
In order to avoid reporting this kind of false positive data-race, the Thread Analyzer provides a set of APIs that can be used to notify the tool when user-defined synchronizations are performed. See A.1 The Thread-Analyzer's User-APIs for more information.
Some memory management routines recycle memory that is freed by one thread for use by another thread. The Thread Analyzer is sometimes not able to recognize that the life spans of the same memory location, as used by different threads, do not overlap. When this happens, the tool may report a false positive data-race. The following example illustrates this kind of false positive.
/*----------*/                            /*----------*/
/* Thread 1 */                            /* Thread 2 */
/*----------*/                            /*----------*/
ptr1 = mymalloc(sizeof(data_t));
ptr1->data = ...
...
myfree(ptr1);
                                          ptr2 = mymalloc(sizeof(data_t));
                                          ptr2->data = ...
                                          ...
                                          myfree(ptr2);
Thread 1 and Thread 2 execute concurrently. Each thread allocates a chunk of memory that is used as its private memory. The routine mymalloc() may supply the memory freed by a previous call to myfree(). If Thread 2 calls mymalloc() before Thread 1 calls myfree(), then ptr1 and ptr2 get different values and there is no data-race between the two threads. However, if Thread 2 calls mymalloc() after Thread 1 calls myfree(), then ptr1 and ptr2 may have the same value. Even then there is no data-race, because Thread 1 no longer accesses that memory. But if the tool does not know that mymalloc() is recycling memory, it reports a data-race between the write of ptr1->data and the write of ptr2->data. This kind of false positive often happens in C++ applications when the C++ runtime library recycles memory for temporary variables, and in user applications that implement their own memory management routines. Currently, the Thread Analyzer is able to recognize memory allocation and free operations performed with the standard malloc(), calloc(), and realloc() interfaces.
Some multi-threaded applications intentionally allow data-races in order to get better performance. A benign data-race is an intentional data-race whose existence does not affect the correctness of the program. The following examples demonstrate benign data races.
In addition to benign data-races, a large class of applications allow data-races because they rely on lock-free and wait-free algorithms which are difficult to design correctly. The Thread Analyzer can help determine the locations of data-races in these applications.
The threads in omp_prime.c check whether an integer is a prime number by executing the function is_prime():
11  int is_prime(int v)
12  {
13      int i;
14      int bound = floor(sqrt((double)v)) + 1;
15
16      for (i = 2; i < bound; i++) {
17          /* No need to check against known composites */
18          if (!pflag[i])
19              continue;
20          if (v % i == 0) {
21              pflag[v] = 0;
22              return 0;
23          }
24      }
25      return (v > 1);
26  }
The Thread Analyzer reports that there is a data-race between the write to pflag[] on line 21 and the read of pflag[] on line 18. However, this data-race is benign: it does not affect the correctness of the final result. At line 18, a thread checks whether pflag[i], for a given value of i, is equal to zero. If pflag[i] is equal to zero, then i is a known composite number (in other words, i is known to be non-prime). Consequently, there is no need to check whether v is divisible by i; we only need to check whether v is divisible by some prime number. Therefore, if pflag[i] is equal to zero, the thread continues to the next value of i. If pflag[i] is not equal to zero and v is divisible by i, the thread assigns zero to pflag[v] to indicate that v is not a prime number.
It does not matter, from a correctness point of view, if multiple threads check the same pflag[] element and write to it concurrently. The initial value of a pflag[] element is one. When the threads update that element, they assign it the value zero; that is, the threads store zero in the same bit of the same byte of memory for that element. On current architectures, it is safe to assume that those stores are atomic. This means that, when that element is read by a thread, the value read is either one or zero. If a thread checks a given pflag[] element (line 18) before it has been assigned the value zero, it then executes lines 20-23. If, in the meantime, another thread assigns zero to that same pflag[] element (line 21), the final result is not changed; it merely means that the first thread executed lines 20-23 unnecessarily.
A group of threads call check_bad_array() concurrently to check whether any element of the array data_array is corrupt. Each thread checks a different section of the array. If a thread finds a corrupt element, it sets the value of the global shared variable is_bad to true.
 20  volatile int is_bad = 0;
     ...
100  /*
101   * Each thread checks its assigned portion of data_array, and sets
102   * the global flag is_bad to 1 once it finds a bad data element.
103   */
104  void check_bad_array(volatile data_t *data_array, unsigned int thread_id)
105  {
106      int i;
107      for (i = my_start(thread_id); i < my_end(thread_id); i++) {
108          if (is_bad)
109              return;
110          else {
111              if (is_bad_element(data_array[i])) {
112                  is_bad = 1;
113                  return;
114              }
115          }
116      }
117  }
There is a data-race between the read of is_bad on line 108 and the write to is_bad on line 112. However, the data-race does not affect the correctness of the final result.
The initial value of is_bad is zero. When the threads update is_bad, they assign it the value one. That is, the threads store one in the same bit in the same byte of memory for is_bad. On current architectures, it is safe to assume that those stores are atomic. Therefore, when is_bad is read by a thread, the value read will either be zero or one. If a thread checks is_bad (line 108) before it has been assigned the value one, then it continues executing the for loop. If, in the meantime, another thread has assigned the value one to is_bad (line 112), that does not change the final result. It just means that the thread executed the for loop longer than necessary.
A singleton ensures that only one object of a certain type exists throughout the program. Double-checked locking is a common, efficient way to initialize a singleton in multi-threaded applications. The following code illustrates such an implementation.
100  class Singleton {
101    public:
102      static Singleton* instance();
103      ...
104    private:
105      static Singleton* ptr_instance;
106  };
     ...
200  Singleton* Singleton::ptr_instance = 0;
     ...
300  Singleton* Singleton::instance() {
301      Singleton *tmp = ptr_instance;
302      memory_barrier();
303      if (tmp == NULL) {
304          Lock();
305          if (ptr_instance == NULL) {
306              tmp = new Singleton;
307              memory_barrier();
308              ptr_instance = tmp;
309          }
310          Unlock();
311      }
312      return tmp;
313  }
The read of ptr_instance (line 301) is intentionally not protected by a lock. This makes the check of whether the singleton has already been instantiated efficient in a multi-threaded environment. Notice that there is a data-race on the variable ptr_instance between the read on line 301 and the write on line 308, yet the program works correctly. However, writing a correct program that allows data-races is a difficult task. For example, in the above double-checked-locking code, the calls to memory_barrier() at lines 302 and 307 ensure that the singleton and ptr_instance are set, and read, in the proper order, so that all threads read them consistently. This programming technique does not work if the memory barriers are omitted.