Sun Studio 12: Fortran Programming Guide

10.2.4 Automatic Parallelization With Reduction Operations

A computation that transforms an array into a scalar is called a reduction operation. Typical reduction operations are the sum or product of the elements of a vector. Reduction operations violate the criterion that calculations within a loop not change a scalar variable in a cumulative way across iterations.

Example: Reduction summation of the elements of a vector:


      s = 0.0
      do i = 1, 1000
        s = s + v(i)
      end do
      t(k) = s

However, for some operations, if reduction is the only factor that prevents parallelization, it is still possible to parallelize the loop. Common reduction operations occur so frequently that the compilers are capable of recognizing and parallelizing them as special cases.
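The idea behind parallelizing a sum reduction can be sketched by hand: split the index range into slices, accumulate a private partial sum for each slice, and combine the partial sums at the end. The function below is only an illustration of that idea, not the code the compiler generates. The name chunked_sum and its nchunk argument are hypothetical, and the sketch assumes nchunk is at least 1.

```fortran
      ! Sketch: sum v(1:n) as nchunk independent partial sums.
      ! Each pass of the outer loop touches a disjoint slice, so the
      ! passes could run on different processors.
      double precision function chunked_sum(v, n, nchunk)
      implicit none
      integer n, nchunk, c, i, lo, hi
      double precision v(n), part
      chunked_sum = 0.0d0
      do c = 1, nchunk
        ! Bounds of slice c; the last slice absorbs any remainder.
        lo = (c - 1) * (n / nchunk) + 1
        hi = c * (n / nchunk)
        if (c .eq. nchunk) hi = n
        part = 0.0d0
        do i = lo, hi
          part = part + v(i)
        end do
        ! Combining step: the only place the shared scalar changes.
        chunked_sum = chunked_sum + part
      end do
      end
```

With v(i) = i for i = 1 to 1000, every partial sum is an exactly representable integer, so chunked_sum(v, 1000, nchunk) returns 500500 for any slice count. For general floating-point data the result can depend on nchunk.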

Recognition of reduction operations is not included in the automatic parallelization analysis unless the -reduction compiler option is specified along with -autopar or -parallel.

If a parallelizable loop contains one of the reduction operations listed in Table 10–2, the compiler will parallelize it if -reduction is specified.

10.2.4.1 Recognized Reduction Operations

The following table lists the reduction operations that are recognized by the compiler.

Table 10–2 Recognized Reduction Operations

Mathematical Operation        Fortran Statement Template

Sum                           s = s + v(i)

Product                       s = s * v(i)

Dot product                   s = s + v(i) * u(i)

Minimum                       s = amin( s, v(i))

Maximum                       s = amax( s, v(i))

OR                            do i = 1, n
                                b = b .or. v(i)
                              end do

AND                           b = .true.
                              do i = 1, n
                                b = b .and. v(i)
                              end do

Count of non-zero elements    k = 0
                              do i = 1, n
                                if (v(i) .ne. 0) k = k + 1
                              end do

All forms of the MIN and MAX function are recognized.
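As a concrete illustration, the subroutine below places three of the recognized patterns (dot product, maximum, and count of non-zero elements) in separate loops. It is a sketch only: the name reduce_demo and its arguments are hypothetical, it assumes n is at least 1, and whether a given loop is actually parallelized still depends on the compiler options and on the rest of the loop body.

```fortran
      ! Sketch: three loops, each matching one recognized pattern.
      subroutine reduce_demo(u, v, n, dot, vmax, nnz)
      implicit none
      integer n, nnz, i
      double precision u(n), v(n), dot, vmax
      ! Dot product: s = s + v(i) * u(i)
      dot = 0.0d0
      do i = 1, n
        dot = dot + v(i) * u(i)
      end do
      ! Maximum, written with the generic MAX intrinsic.
      vmax = v(1)
      do i = 2, n
        vmax = max(vmax, v(i))
      end do
      ! Count of non-zero elements.
      nnz = 0
      do i = 1, n
        if (v(i) .ne. 0.0d0) nnz = nnz + 1
      end do
      end
```

For u = (1, 2, 3, 4) and v = (0, 5, -2, 0), the call returns dot = 4, vmax = 5, and nnz = 2.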

10.2.4.2 Numerical Accuracy and Reduction Operations

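Floating-point sum or product reduction operations may be inaccurate because the order in which the calculations are performed in parallel differs from the order used serially on a single processor, and hardware floating-point addition and multiplication are not associative. Depending on how the operands associate, roundoff, overflow, or underflow errors may result. For example, (X*Y)*Z and X*(Y*Z) may not have the same numerical significance.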

In some situations, the error may not be acceptable.
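Floating-point addition is not associative, which is why regrouping the terms of a reduction can change its result. The following sketch adds the same three numbers with two groupings. The subroutine name assoc_demo is hypothetical, and the values quoted afterward assume IEEE double-precision arithmetic.

```fortran
      ! Sketch: add the same three numbers with two groupings.
      subroutine assoc_demo(a, b, c, left, right)
      implicit none
      double precision a, b, c, left, right, t
      ! Left-to-right, as a serial loop would accumulate.
      t = a + b
      left = t + c
      ! Right-grouped, as if b and c were summed on another processor.
      t = b + c
      right = a + t
      end
```

Calling assoc_demo with a = 1.0d20, b = -1.0d20, and c = 1.0d0 gives left = 1.0 but right = 0.0, because c is lost to roundoff when it is first added to b.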

Example: Roundoff when summing 100,000 random numbers between -1 and +1:


demo% cat t4.f
      parameter ( n = 100000 )
      double precision d_lcrans, lb / -1.0 /, s, ub / +1.0 /, v(n)
      s = d_lcrans ( v, n, lb, ub ) ! Get n random nos. between -1 and +1
      s = 0.0
      do i = 1, n
        s = s + v(i)
      end do
      write(*, '(" s = ", e21.15)') s
      end
demo% f95 -O4 -autopar -reduction t4.f

Results vary with the number of processors. The following table shows the sum of 100,000 random numbers between -1 and +1.

Number of Processors    Output

                        s = 0.568582080884714E+02
                        s = 0.568582080884722E+02
                        s = 0.568582080884721E+02
                        s = 0.568582080884724E+02

In this situation, roundoff error on the order of 10^-14 is acceptable for data that is random to begin with. For more information, see the Sun Numerical Computation Guide.
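When results from runs with different processor counts must be compared, one possible approach is to test them against a relative tolerance instead of testing for exact equality. The function below is a minimal sketch of such a check. The name sums_agree and the choice of tolerance are assumptions, not part of the compiler or its libraries.

```fortran
      ! Sketch: relative-tolerance comparison of two computed sums.
      logical function sums_agree(s1, s2, rtol)
      implicit none
      double precision s1, s2, rtol
      sums_agree = abs(s1 - s2) .le. rtol * max(abs(s1), abs(s2))
      end
```

Applied to the first and last outputs listed above (0.568582080884714E+02 and 0.568582080884724E+02), the check succeeds with rtol = 1.0d-13 and fails with rtol = 1.0d-15, since the relative difference between the two values is roughly 2 x 10^-14.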