Automatic Parallelization With Reduction Operations (Fortran Programming Guide)

A computation that transforms an array into a scalar is called a reduction operation. Typical reduction operations are the sum or product of the elements of a vector. Reduction operations violate the criterion that calculations within a loop not change a scalar variable in a cumulative way across iterations.

Example: Reduction summation of the elements of a vector:

      s = 0.0
      do i = 1, 1000
        s = s + v(i)
      end do
      t(k) = s

However, for some operations, if the reduction is the only factor that prevents parallelization, it is still possible to parallelize the loop. Common reduction operations occur so frequently that the compilers are capable of recognizing and parallelizing them as special cases.
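As a rough illustration of why a sum reduction can still be parallelized, the following sketch imitates the idea in serial code: the loop is split into independent partial sums that are combined at the end. The chunking scheme and the names nchunk and part are assumptions made for this sketch, not a description of the code the compilers actually generate.

```fortran
      program partial_sums
      ! Sketch only: serial imitation of a sum reduction split into
      ! independent partial sums. nchunk stands in for the number of
      ! processors (an assumption made for this illustration).
      integer n, nchunk
      parameter (n = 1000, nchunk = 4)
      real v(n), s, part(nchunk)
      integer i, c, lo, hi

      do i = 1, n
        v(i) = real(i)
      end do

      ! Each chunk could run on its own processor: no iteration in
      ! one chunk uses a value computed in another chunk.
      do c = 1, nchunk
        lo = (c - 1) * n / nchunk + 1
        hi = c * n / nchunk
        part(c) = 0.0
        do i = lo, hi
          part(c) = part(c) + v(i)
        end do
      end do

      ! Combining the partial results is the only serial step.
      s = 0.0
      do c = 1, nchunk
        s = s + part(c)
      end do

      write(*,*) s
      end
```

Because every value and every partial sum here is a small whole number, the result (500500) is the same as the serial sum. With general floating-point data the two can differ.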

Recognition of reduction operations is not included in the automatic parallelization analysis unless the -reduction compiler option is specified along with -autopar or -parallel.

If a parallelizable loop contains one of the reduction operations listed in Table 10-3, the compiler will parallelize it if -reduction is specified.

Recognized Reduction Operations

The following table lists the reduction operations that are recognized by f77 and f90.

Table 10-3 Recognized Reduction Operations


Mathematical Operations	Fortran Statement Templates
Sum of the elements	s = s + v(i)
Product of the elements	s = s * v(i)
Dot product of two vectors	s = s + v(i) * u(i)
Minimum of the elements	s = min( s, v(i))
Maximum of the elements	s = max( s, v(i))
OR of the elements	do i = 1, n; b = b .or. v(i); end do
AND of nonpositive elements	b = .true.; do i = 1, n; if (v(i) .le. 0) b = b .and. v(i); end do
Count nonzero elements	k = 0; do i = 1, n; if ( v(i) .ne. 0 ) k = k + 1; end do

All forms of the MIN and MAX function are recognized.
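The minimum, maximum, and count templates might appear in complete loops like the ones in the following sketch. The array contents and the variable names smin and smax are made up for illustration. The sketch uses the generic MIN and MAX intrinsics and keeps each reduction in its own loop so that every loop matches a single template.

```fortran
      program minmax_count
      ! Sketch only: three of the reduction templates written as
      ! complete loops. The data values are invented for the example.
      integer n
      parameter (n = 8)
      real v(n), smin, smax
      integer i, k
      data v / 3.0, 0.0, -2.5, 7.0, 0.0, 1.5, -4.0, 6.0 /

      ! Minimum of the elements
      smin = v(1)
      do i = 1, n
        smin = min(smin, v(i))
      end do

      ! Maximum of the elements
      smax = v(1)
      do i = 1, n
        smax = max(smax, v(i))
      end do

      ! Count nonzero elements
      k = 0
      do i = 1, n
        if (v(i) .ne. 0.0) k = k + 1
      end do

      write(*,*) smin, smax, k
      end
```

For this data the minimum is -4.0, the maximum is 7.0, and the count is 6.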

Numerical Accuracy and Reduction Operations

Floating-point sum or product reduction operations may be inaccurate for the following reasons:

The order in which the calculations are performed in parallel is not the same as the order in which they are performed serially on a single processor.

The order of calculation affects the sum or product of floating-point numbers. Hardware floating-point addition and multiplication are not associative. Roundoff, overflow, or underflow errors may result depending on how the operands associate. For example, (X*Y)*Z and X*(Y*Z) may not yield the same numerical result.

In some situations, the error may not be acceptable.
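The effect of grouping can be seen with three numbers. The following sketch adds the same values with two different groupings. The values are chosen only for illustration, and IEEE single-precision arithmetic is assumed.

```fortran
      program assoc
      ! Sketch only: the same three values added with two groupings.
      ! Values are made up; IEEE single precision is assumed.
      real x, y, z, left, right
      x = 1.0e8
      y = -1.0e8
      z = 1.0

      ! (x + y) + z : x and y cancel first, so z survives.
      left = x + y
      left = left + z

      ! x + (y + z) : z is rounded away when added to y.
      right = y + z
      right = x + right

      write(*,*) '(x + y) + z =', left
      write(*,*) 'x + (y + z) =', right
      end
```

Under that assumption the first grouping gives 1.0 and the second gives 0.0, because 1.0 is smaller than the spacing between representable single-precision values near 1.0e8.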

Example: Overflow and underflow, with and without reduction:

demo% cat t3.f
      real A(10002), result, MAXFLOAT
      MAXFLOAT = r_max_normal()
      do 10 i = 1, 10002, 2      !Fill with +MAXFLOAT, -MAXFLOAT pairs
        A(i) = MAXFLOAT
        A(i+1) = -MAXFLOAT
10    continue

      A(5001) = -MAXFLOAT        !Swap the pair at the midpoint
      A(5002) = MAXFLOAT

      RESULT = 0.0
      do 20 i = 1, 10002         !Add up the array
        RESULT = RESULT + A(i)
20    continue
      write(6,*) RESULT
      end
demo% setenv PARALLEL 2          {Number of processors is 2}
demo% f77 -silent -autopar t3.f 
demo% a.out
   0.                            {Without reduction, 0. is correct}
demo% f77 -silent -autopar -reduction t3.f
demo% a.out
  Inf                            {With reduction, Inf. is not correct}
demo%
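The overflow can be reproduced without a parallelizing compiler by imitating the two-processor split in serial code. The following is a sketch under stated assumptions: each processor is assumed to sum one contiguous half of the array, the standard HUGE intrinsic stands in for r_max_normal(), and overflow is assumed to produce Inf rather than a trap. The names p1, p2, and big are introduced for this sketch only.

```fortran
      program split_overflow
      ! Sketch only: serial imitation of the t3.f reduction on two
      ! processors. huge() stands in for r_max_normal() (assumption).
      integer n
      parameter (n = 10002)
      real a(n), big, serial, p1, p2
      integer i

      big = huge(1.0)
      do i = 1, n, 2
        a(i) = big
        a(i+1) = -big
      end do
      a(5001) = -big
      a(5002) = big

      ! One processor: each +big is cancelled before the next one.
      serial = 0.0
      do i = 1, n
        serial = serial + a(i)
      end do

      ! Two processors (assumed split): each sums its own half.
      p1 = 0.0
      do i = 1, n / 2
        p1 = p1 + a(i)
      end do
      p2 = 0.0
      do i = n / 2 + 1, n
        p2 = p2 + a(i)
      end do

      write(*,*) 'serial   =', serial
      write(*,*) 'halves   =', p1, p2
      write(*,*) 'combined =', p1 + p2
      end
```

The second half begins with two consecutive +big values, so its partial sum overflows before any cancellation can occur, and the combined result stays at Inf even though the serial sum is 0.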

Example: Roundoff, get the sum of 100,000 random numbers between -1 and +1:

demo% cat t4.f
      parameter ( n = 100000 )
      double precision d_lcrans, lb / -1.0 /, s, ub / +1.0 /, v(n)
      s = d_lcrans ( v, n, lb, ub ) ! Get n random nos. between -1 and +1
      s = 0.0
      do i = 1, n
        s = s + v(i)
      end do
      write(*, '(" s = ", e21.15)') s
      end
demo% f77 -autopar -reduction t4.f

Results vary with the number of processors. The following table shows the sum of 100,000 random numbers between -1 and +1.

Number of Processors	Output
1	s = 0.568582080884714E+02
2	s = 0.568582080884722E+02
3	s = 0.568582080884721E+02
4	s = 0.568582080884724E+02

In this situation, a relative roundoff error on the order of 10^-14 is acceptable for data that is random to begin with. For more information, see the Sun Numerical Computation Guide.
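The dependence on the number of processors can be imitated in serial code by summing the same data with different numbers of partial sums. The following is a sketch only: the standard RANDOM_NUMBER intrinsic stands in for d_lcrans(), and the contiguous chunking is an assumption about how the work might be divided.

```fortran
      program chunk_roundoff
      ! Sketch only: sums the same data with 1 to 4 partial sums to
      ! imitate 1 to 4 processors. random_number() stands in for
      ! d_lcrans(); the chunking scheme is an assumption.
      integer n
      parameter (n = 100000)
      double precision v(n), s, part
      integer i, c, nchunk

      ! Fill v with random values between -1 and +1.
      call random_number(v)
      do i = 1, n
        v(i) = 2.0d0 * v(i) - 1.0d0
      end do

      do nchunk = 1, 4
        s = 0.0d0
        do c = 1, nchunk
          part = 0.0d0
          do i = (c - 1) * n / nchunk + 1, c * n / nchunk
            part = part + v(i)
          end do
          s = s + part
        end do
        write(*, '(i2, " s = ", e23.15)') nchunk, s
      end do
      end
```

The data differs from run to run and from system to system, so the printed sums will not match the table above. Any differences between the four lines would be expected only in the last few digits.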