A computation that transforms an array into a scalar is called a reduction operation. Typical reduction operations are the sum or product of the elements of a vector. Reduction operations violate the parallelization criterion that calculations within a loop must not change a scalar variable cumulatively across iterations.
Example: Reduction summation of the elements of a vector:
```
      s = 0.0
      do i = 1, 1000
        s = s + v(i)
      end do
      t(k) = s
```
However, for some operations, if the reduction is the only factor that prevents parallelization, the loop can still be parallelized. Common reduction operations occur so frequently that the compilers can recognize and parallelize them as special cases.
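A reduction can still be parallelized because the accumulation can be regrouped. The following Fortran 90 sketch shows the idea serially. It is an illustration only, not the code the Sun compilers generate. The iteration range of the summation example is split into `nproc` contiguous chunks. Each chunk accumulates its own private partial sum, as one processor would, and the partial sums are added together at the end. The module and function names and the even-chunk split are assumptions made for this illustration.

```fortran
module reduction_sketch
  implicit none
contains
  ! Sum v the way a parallelized reduction might (illustrative assumption):
  ! each of nproc contiguous chunks gets its own partial sum, so no chunk
  ! depends on another, and the partial sums are combined at the end.
  function chunked_sum(v, nproc) result(s)
    real,    intent(in) :: v(:)
    integer, intent(in) :: nproc
    real    :: s
    real    :: partial(nproc)   ! one private accumulator per "processor"
    integer :: p, i, lo, hi, n

    n = size(v)
    do p = 1, nproc
      lo = (p - 1) * (n / nproc) + 1
      hi = p * (n / nproc)
      if (p == nproc) hi = n    ! last chunk takes any remainder
      partial(p) = 0.0
      do i = lo, hi
        partial(p) = partial(p) + v(i)
      end do
    end do

    s = 0.0                     ! combine step
    do p = 1, nproc
      s = s + partial(p)
    end do
  end function chunked_sum
end module reduction_sketch

program demo
  use reduction_sketch
  implicit none
  real    :: v(1000), s
  integer :: i

  v = [(real(i), i = 1, 1000)]
  s = 0.0                       ! serial loop from the example
  do i = 1, 1000
    s = s + v(i)
  end do
  print *, 'serial:', s, '  4 chunks:', chunked_sum(v, 4)
end program demo
```

The test data are small integers, so every partial sum is exactly representable and the two results agree. With general floating-point data, the regrouping can change the result.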
Recognition of reduction operations is not included in the automatic parallelization analysis unless the -reduction compiler option is specified along with -autopar or -parallel.
If a parallelizable loop contains one of the reduction operations listed in Table 10-3, the compiler will parallelize it if -reduction is specified.
The following table lists the reduction operations that are recognized by f77 and f90.
Table 10-3 Recognized Reduction Operations
Mathematical Operations | Fortran Statement Templates |
---|---|
Sum of the elements | `s = s + v(i)` |
Product of the elements | `s = s * v(i)` |
Dot product of two vectors | `s = s + v(i) * u(i)` |
Minimum of the elements | `s = amin( s, v(i))` |
Maximum of the elements | `s = amax( s, v(i))` |
OR of the elements | `do i = 1, n`<br>`b = b .or. v(i)`<br>`end do` |
AND of nonpositive elements | `b = .true.`<br>`do i = 1, n`<br>`if (v(i) .le. 0) b=b .and. v(i)`<br>`end do` |
Count nonzero elements | `k = 0`<br>`do i = 1, n`<br>`if ( v(i) .ne. 0 ) k = k + 1`<br>`end do` |
All forms of the MIN and MAX functions are recognized.
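Each operation in Table 10-3 merges values with an operator that has an identity element. The identity is 0 for a sum or a count and 1 for a product. For MIN it is huge(), and for MAX it is -huge(); these two act as identities for finite data. The identity for OR is .false.. Each processor can therefore start a private partial result at the identity, and the partial results can be merged afterwards with the same operator.

The following Fortran 90 sketch checks this for several of the templates. It compares one serial pass with two half-range passes that are merged at the end. The routine names, the two-way split, and the use of a logical test on `v(i)` for the OR case are assumptions made for the illustration. The data are small integers, so every floating-point result is exact.

```fortran
module reduction_templates
  implicit none
contains
  ! Accumulate several Table 10-3 style reductions over v(lo:hi).
  ! Every accumulator starts from the identity element of its operator,
  ! which is what lets independent chunks be merged afterwards.
  subroutine reduce_range(v, lo, hi, s, p, smin, smax, b, k)
    real,    intent(in)  :: v(:)
    integer, intent(in)  :: lo, hi
    real,    intent(out) :: s, p, smin, smax
    logical, intent(out) :: b
    integer, intent(out) :: k
    integer :: i

    s = 0.0;  p = 1.0;  smin = huge(smin);  smax = -huge(smax)
    b = .false.;  k = 0
    do i = lo, hi
      s    = s + v(i)              ! sum of the elements
      p    = p * v(i)              ! product of the elements
      smin = min(smin, v(i))       ! minimum
      smax = max(smax, v(i))       ! maximum
      b    = b .or. (v(i) < 0.0)   ! OR (here of a logical test on v)
      if (v(i) /= 0.0) k = k + 1   ! count nonzero elements
    end do
  end subroutine reduce_range

  ! True if a two-chunk evaluation, merged with each reduction's own
  ! operator, reproduces the serial single-loop result.
  logical function two_chunks_match(v) result(ok)
    real, intent(in) :: v(:)
    real    :: s, p, mn, mx, s1, p1, mn1, mx1, s2, p2, mn2, mx2
    logical :: b, b1, b2
    integer :: k, k1, k2, n, mid

    n = size(v);  mid = n / 2
    call reduce_range(v, 1, n, s, p, mn, mx, b, k)              ! serial
    call reduce_range(v, 1, mid, s1, p1, mn1, mx1, b1, k1)      ! "processor 1"
    call reduce_range(v, mid + 1, n, s2, p2, mn2, mx2, b2, k2)  ! "processor 2"

    ok = (s1 + s2 == s) .and. (p1 * p2 == p) .and. &
         (min(mn1, mn2) == mn) .and. (max(mx1, mx2) == mx) .and. &
         ((b1 .or. b2) .eqv. b) .and. (k1 + k2 == k)
  end function two_chunks_match
end module reduction_templates

program demo
  use reduction_templates
  implicit none
  ! Integer-valued data keeps every floating-point result exact here.
  print *, two_chunks_match([3.0, -1.0, 2.0, 0.0, 5.0, -4.0])
end program demo
```

For this input, the program prints T.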
Floating-point sum or product reduction operations may be inaccurate for the following reasons:

- The order in which the calculations are performed in parallel is not the same as the order used when they are performed serially on a single processor.
- The order of calculation affects the sum or product of floating-point numbers. Hardware floating-point addition and multiplication are not associative. Roundoff, overflow, or underflow errors may result depending on how the operands associate. For example, `(X*Y)*Z` and `X*(Y*Z)` may not yield the same value.
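Three operands are enough to show the effect of association. In the following Fortran 90 sketch, `left_sum` groups the operands as a serial loop would. `right_sum` groups the last two operands first, as combining partial results can. Both names are hypothetical. Fortran requires the compiler to honor the parentheses. The comments give the value each call produces under IEEE double-precision arithmetic with the default round-to-nearest mode.

```fortran
module assoc_sketch
  implicit none
contains
  ! (a + b) + c : the grouping a serial left-to-right loop uses
  pure double precision function left_sum(a, b, c)
    double precision, intent(in) :: a, b, c
    left_sum = (a + b) + c
  end function left_sum

  ! a + (b + c) : a regrouping, such as combining partial sums
  pure double precision function right_sum(a, b, c)
    double precision, intent(in) :: a, b, c
    right_sum = a + (b + c)
  end function right_sum
end module assoc_sketch

program assoc_demo
  use assoc_sketch
  implicit none
  double precision :: big

  ! Roundoff: near 1.0d16 adjacent doubles are 2 apart, so -1.0d16 + 1.0d0
  ! rounds back to -1.0d16 and the 1.0d0 is lost.
  print *, left_sum(1.0d16, -1.0d16, 1.0d0)    ! value: 1
  print *, right_sum(1.0d16, -1.0d16, 1.0d0)   ! value: 0

  ! Overflow: big + big exceeds the largest finite double and becomes
  ! +Infinity; subtracting big first keeps the result finite.
  big = huge(big)
  print *, left_sum(big, big, -big)    ! value: +Infinity
  print *, right_sum(big, big, -big)   ! value: huge(big)
end program assoc_demo
```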
In some situations, the error may not be acceptable.
Example: Overflow and underflow, with and without reduction:
```
demo% cat t3.f
      real A(10002), result, MAXFLOAT
      MAXFLOAT = r_max_normal()
      do 10 i = 1 , 10000, 2
         A(i) = MAXFLOAT
         A(i+1) = -MAXFLOAT
10    continue
      A(5001)=-MAXFLOAT
      A(5002)=MAXFLOAT
      A(10001) = 0.0
      A(10002) = 0.0
      RESULT = 0.0
      do 20 i = 1 ,10002   !Add up the array
         RESULT = RESULT + A(i)
20    continue
      write(6,*) RESULT
      end
demo% setenv PARALLEL 2                  {Number of processors is 2}
demo% f77 -silent -autopar t3.f
demo% a.out
   0.                                    {Without reduction, 0. is correct}
demo% f77 -silent -autopar -reduction t3.f
demo% a.out
  Inf                                    {With reduction, Inf is not correct}
demo%
```
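The Inf in this example comes from regrouping the additions, not from any single element. The following portable Fortran 90 sketch reproduces the effect without -autopar. It builds the same array, using the standard intrinsic huge() in place of Sun's r_max_normal() and setting the two trailing elements to zero. It then sums the array serially, and again as two halves whose results are added at the end. The split after element 5001 is an assumption about how two processors might divide the loop. The actual division used by the Sun compilers is not specified in this section.

```fortran
module t3_sketch
  implicit none
contains
  ! Build the array from t3.f: alternating +big/-big, with elements
  ! 5001 and 5002 swapped, and the last two elements set to zero.
  function t3_array() result(a)
    real    :: a(10002)
    integer :: i
    a = 0.0
    do i = 1, 10000, 2
      a(i)     =  huge(1.0)
      a(i + 1) = -huge(1.0)
    end do
    a(5001) = -huge(1.0)
    a(5002) =  huge(1.0)
  end function t3_array

  ! Left-to-right sum of a(lo:hi), like one processor's share of the loop
  real function range_sum(a, lo, hi)
    real,    intent(in) :: a(:)
    integer, intent(in) :: lo, hi
    integer :: i
    range_sum = 0.0
    do i = lo, hi
      range_sum = range_sum + a(i)
    end do
  end function range_sum
end module t3_sketch

program t3_split
  use t3_sketch
  implicit none
  real :: a(10002)

  a = t3_array()
  ! Serial: the running sum never exceeds huge(1.0); the result is 0.
  print *, 'serial:    ', range_sum(a, 1, 10002)
  ! Two halves (assumed split): the second half overflows to +Inf.
  print *, 'two halves:', range_sum(a, 1, 5001) + range_sum(a, 5002, 10002)
end program t3_split
```

Serially, the running sum moves among -huge(1.0), 0, and huge(1.0) and never overflows. The second half, however, begins with two consecutive positive values (elements 5002 and 5003), so its partial sum overflows to Inf and stays there.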
Example: Roundoff. Sum 100,000 random numbers between -1 and +1:
```
demo% cat t4.f
      parameter ( n = 100000 )
      double precision d_lcrans, lb / -1.0 /, s, ub / +1.0 /, v(n)
      s = d_lcrans ( v, n, lb, ub ) ! Get n random nos. between -1 and +1
      s = 0.0
      do i = 1, n
        s = s + v(i)
      end do
      write(*, '(" s = ", e21.15)') s
      end
demo% f77 -autopar -reduction t4.f
```
Results vary with the number of processors. The following table shows the sum of 100,000 random numbers between -1 and +1.
Number of Processors | Output |
---|---|
1 | s = 0.568582080884714E+02 |
2 | s = 0.568582080884722E+02 |
3 | s = 0.568582080884721E+02 |
4 | s = 0.568582080884724E+02 |
In this situation, a relative roundoff error on the order of 10^-14 is acceptable for data that is random to begin with. For more information, see the Sun Numerical Computation Guide.
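You can imitate the dependence on the number of processors without a parallel compiler by summing the same data in 1, 2, 3, and 4 contiguous chunks. In the following Fortran 90 sketch, a simple linear congruential generator stands in for d_lcrans so the data are reproducible. It uses the constants 1103515245 and 12345 with modulus 2**31. The generator, the chunking, and all names are assumptions. The printed digits therefore will not match the table above, but the sums may likewise differ in their trailing digits.

```fortran
module roundoff_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Fill v with pseudo-random numbers in [-1, 1) from a simple linear
  ! congruential generator (an assumed stand-in for d_lcrans).
  subroutine fill_uniform(v)
    real(dp), intent(out) :: v(:)
    integer(i8), parameter :: a = 1103515245_i8, c = 12345_i8
    integer(i8), parameter :: m = 2147483648_i8      ! 2**31
    integer(i8) :: state
    integer     :: i
    state = 12345_i8                                 ! fixed seed
    do i = 1, size(v)
      state = mod(a * state + c, m)
      v(i) = 2.0_dp * real(state, dp) / real(m, dp) - 1.0_dp
    end do
  end subroutine fill_uniform

  ! Sum v in nproc contiguous chunks, then add the partial sums:
  ! the same numbers, grouped the way nproc processors might group them.
  function chunked_sum(v, nproc) result(s)
    real(dp), intent(in) :: v(:)
    integer,  intent(in) :: nproc
    real(dp) :: s, partial
    integer  :: p, i, lo, hi, n
    n = size(v)
    s = 0.0_dp
    do p = 1, nproc
      lo = (p - 1) * (n / nproc) + 1
      hi = p * (n / nproc)
      if (p == nproc) hi = n
      partial = 0.0_dp
      do i = lo, hi
        partial = partial + v(i)
      end do
      s = s + partial
    end do
  end function chunked_sum
end module roundoff_sketch

program t4_sketch
  use roundoff_sketch
  implicit none
  integer, parameter :: n = 100000
  real(dp) :: v(n)
  integer  :: p
  call fill_uniform(v)
  do p = 1, 4
    write (*, '(i2, " processor(s): s = ", es23.15)') p, chunked_sum(v, p)
  end do
end program t4_sketch
```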