Sun Studio 12: Thread Analyzer User's Guide

2.4 Diagnosing the Cause of a Data Race

This section provides a basic strategy to diagnosing the cause of data races.

2.4.1 Check Whether or Not the Data Race is a False Positive

A false positive data-race is a data-race that is reported by the Thread Analyzer, but has actually not occurred. The Thread Analyzer tries to reduce the number of false positives reported. However, there are cases where the tool is not able to do a precise job and may report false positive data-races.

You can ignore a false-positive data-race because it is not a genuine data-race and, therefore, does not affect the behavior of the program.

See 2.5 False Positives for some examples of false positive data-races. For information on how to remove false positive data-races from the report, see A.1 The Thread-Analyzer's User-APIs.

2.4.2 Check Wether or Not the Data Race is Benign

A benign data-race is an intentional data-race whose existence does not affect the correctness of the program.

Some multi-threaded applications intentionally use code that may cause data-races. Since the data-races are there by design, no fix is required. In some cases, however, it is quite tricky to get such codes to run correctly. These data-races should be reviewed carefully.

See 2.5 False Positives for more detailed information about benign races.

2.4.3 Fix the Bug, Not the Data Race

The Thread Analyzer can help find data-races in the program, but it cannot automatically find bugs in the program nor suggest ways to fix the data-races found. A data-race may have been introduced by a bug. It is important to find and fix the bug. Merely removing the data-race is not the right approach, and could make further debugging even more difficult. Fix the bug, not the data-race.

2.4.3.1 Fixing Bugs in omp_prime.c

Here's how to fix the bug in omp_prime.c. See 2.1.1 Complete Listing of omp_prime.c for a complete file listing.

Move lines 45 and 46 into a critical section in order to remove the data-race between the read from total on line 45 and the write to total on line 46. The critical section protects the two lines and prevents the data-race. Here is the corrected code:

42      #pragma omp parallel for         .
43      for (i = 2; i < N; i++) {
44          if ( is_prime(i) ) {
                 #pragma omp critical

                 {          
 45                 primes[total] = i;
 46                 total++;
                 }
 47          }
 48      }

Note that the addition of a single critical section also fixes two other data races in omp_prime.c. It fixes the data-race on prime[] at line 45, as well as the data-race on total at line 46. The fourth data-race, between a read from pflag[] from line 18 and a write to pflag[] from line 21, is actually a benign race because it does not lead to incorrect results. It is not essential to fix benign data-races.

You could also move lines 45 and 46 into a critical section as follows, but this change fails to correct the program:

42      #pragma omp parallel for         .
43      for (i = 2; i < N; i++) {
44          if ( is_prime(i) ) {
                 #pragma omp critical
                 {          
45                 primes[total] = i;
                 }
                 #pragma omp critical
                 {

46                 total++;
                 }
47          }
48      }

The critical sections around lines 45 and 46 get rid of the data-race because the threads are not using any exclusive locks to control their accesses to total. The critical section around line 46 ensures that the computed value of total is correct. However, the program is still incorrect. Two threads may update the same element of primes[] using the same value of total. Moreover, some elements in primes[] may not be assigned a value at all.

2.4.3.2 Fixing Bugs in pthr_prime.c

Here's how to fix the bug in pthr_prime.c. See 2.1.2 Complete Listing of pthr_prime.c for a complete file listing.

Use a single mutex to remove the data-race in pthr_prime.c between the read from total on line 39 and the write to total on line 40. This addition also fixes two other data races in pthr_prime.c: the data-race on prime[] at line 39, as well as the data-race on total at line 40.

The data-race between the write to i on line 55 and the read from i on line 35 and the data-race on pflag[] on line 22, reveal a problem in the shared-access to the variable i by different threads. The initial thread in pthr_prime.c creates the child threads in a loop (source lines 55-57), and dispatches them to work on the function work(). The loop index i is passed to work() by address. Since all threads access the same memory location for i, the value of i for each thread will not remain unique, but will change as the initial thread increments the loop index. As different threads use the same value of i, the data-races occur.

One way to fix the problem is to pass i to work() by value. This ensures that each thread has its own private copy of i with a unique value. To remove the data-race on primes[] between the write access on line 39 and the read access on line 65, we can protect line 65 with the same mutex lock as the one used above for lines 39 and 40. However, this is not the correct fix. The real problem is that the main thread may report the result (lines 50 through 53) while the child threads are still updating total and primes[] in function work(). Using mutex locks does not provide the proper ordering synchronization between the threads. One correct fix is to let the main thread wait for all child threads to join it before printing out the results.

Here is the corrected version of pthr_prime.c:

1	#include <stdio.h>
2	#include <math.h>
3	#include <pthread.h>
4	
5	#define THREADS 4
6	#define N 3000
7	
8	int primes[N];
9	int pflag[N];
10	int total = 0;
11	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
12	
13	int is_prime(int v)
14	{
15	    int i;
16	    int bound = floor(sqrt(v)) + 1;
17	
18	    for (i = 2; i < bound; i++) {
19	        /* no need to check against known composites */ 
20	        if (!pflag[i])
21	            continue;
22	        if (v % i == 0) {
23	            pflag[v] = 0;
24	            return 0;
25	        }
26	    }
27	    return (v > 1); 
28	}
29	
30	void *work(void *arg)
31	{
32	    int start;
33	    int end;
34	    int i;
35	    
36	    start = (N/THREADS) * ((int)arg) ;
37	    end = start + N/THREADS;
38	    for (i = start; i < end; i++) {
39	        if ( is_prime(i) ) {
40	            pthread_mutex_lock(&mutex);
41	            primes[total] = i;
42	            total++;
43	            pthread_mutex_unlock(&mutex);
44	        }
45	    }
46	    return NULL;
47	}
48	
49	int main(int argn, char **argv)
50	{
51	    int i;
52	    pthread_t tids[THREADS-1];
53	
54	    for (i = 0; i < N; i++) {
55	        pflag[i] = 1; 
56	    }
57	
58	    for (i = 0; i < THREADS-1; i++) {
59	        pthread_create(&tids[i], NULL, work, (void *)i);
60	    }
61	
62	    i = THREADS-1;
63	    work((void *)i);
64	    
65	    for (i = 0; i < THREADS-1; i++) {
66	        pthread_join(tids[i], NULL);
67	    }
68	
69	    printf("Number of prime numbers between 2 and %d: %d\n",
70	           N, total);
71	    for (i = 0; i < total; i++) {
72	        printf("%d\n", primes[i]);
73	    }
74	}