10.2.4 Automatic Parallelization With Reduction Operations (Sun Studio 12: Fortran Programming Guide)

A computation that transforms an array into a scalar is called a reduction operation. Typical reduction operations are the sum or product of the elements of a vector. Reduction operations violate the criterion that calculations within a loop not change a scalar variable in a cumulative way across iterations.

Example: Reduction summation of the elements of a vector:

      s = 0.0
      do i = 1, 1000
        s = s + v(i)
      end do
      t(k) = s

However, for some operations, if reduction is the only factor that prevents parallelization, it is still possible to parallelize the loop. Common reduction operations occur so frequently that the compilers are capable of recognizing and parallelizing them as special cases.
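Addition allows the work in a reduction loop to be regrouped. Each processor can accumulate a partial sum over its own share of the iterations, and the partial sums are then combined. The sketch below models that regrouping serially in Fortran 90. The module name `chunked_reduction`, the function `chunked_sum`, and its contiguous, roughly equal split are illustrative assumptions. They are not a description of how the Sun compilers actually schedule the loop.

```fortran
      module chunked_reduction
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      contains
      ! Hypothetical sketch: sum v by splitting it into nchunks
      ! (>= 1) contiguous pieces, adding each piece into its own
      ! partial sum, then combining the partial sums. It runs
      ! serially and models only the grouping of the additions.
      function chunked_sum(v, nchunks) result(s)
      real(dp), intent(in) :: v(:)
      integer, intent(in) :: nchunks
      real(dp) :: s
      real(dp) :: partial(nchunks)
      integer :: n, c, lo, hi, i
      n = size(v)
      do c = 1, nchunks
        ! Piece c is v(lo:hi); the last piece takes any remainder
        lo = (c - 1) * (n / nchunks) + 1
        hi = c * (n / nchunks)
        if (c == nchunks) hi = n
        partial(c) = 0.0_dp
        do i = lo, hi
          partial(c) = partial(c) + v(i)
        end do
      end do
      ! Combine the per-piece (per-processor) partial sums
      s = 0.0_dp
      do c = 1, nchunks
        s = s + partial(c)
      end do
      end function chunked_sum
      end module chunked_reduction
```

For example, `s = chunked_sum(v, 4)` adds the same elements as the serial loop, but in four groups. With integer-valued data the grouping cannot change the result. With general floating-point data it can change the last few digits.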

Recognition of reduction operations is not included in the automatic parallelization analysis unless the -reduction compiler option is specified along with -autopar or -parallel.

If a parallelizable loop contains one of the reduction operations listed in Table 10–2, the compiler will parallelize it if -reduction is specified.

10.2.4.1 Recognized Reduction Operations

The following table lists the reduction operations that are recognized by the compiler.

Table 10–2 Recognized Reduction Operations


Mathematical Operations	Fortran Statement Templates
Sum	`s = s + v(i)`
Product	`s = s * v(i)`
Dot product	`s = s + v(i) * u(i)`
Minimum	`s = amin( s, v(i))`
Maximum	`s = amax( s, v(i))`
`OR`	`do i = 1, n` `b = b .or. v(i)` `end do`
`AND`	`b = .true.` `do i = 1, n` `b = b .and. v(i)` `end do`
Count of non-zero elements	`k = 0` `do i = 1, n` `if(v(i).ne.0) k = k + 1` `end do`

All forms of the MIN and MAX function are recognized.
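The templates in Table 10–2 are ordinary serial loops. The sketch below collects them in one Fortran 90 subroutine so that they can be compiled and checked. The names `table_reductions` and `table_demo`, and the packing of the results into the arrays `r` and `b`, are hypothetical. In place of the `amin`/`amax` spelling in the table, it uses the generic MIN and MAX intrinsics. Recognizing these loops as parallelizable reductions still depends on compiling with -reduction together with -autopar or -parallel.

```fortran
      module table_reductions
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      contains
      ! Evaluates each Table 10-2 template as a plain serial loop.
      ! r = (sum, product, dot product, minimum, maximum) over v, u
      ! b = (OR, AND) over lv;  k = count of non-zero elements of v
      ! Assumes v is non-empty and size(u) >= size(v).
      subroutine table_demo(v, u, lv, r, b, k)
      real(dp), intent(in) :: v(:), u(:)
      logical, intent(in) :: lv(:)
      real(dp), intent(out) :: r(5)
      logical, intent(out) :: b(2)
      integer, intent(out) :: k
      real(dp) :: s
      logical :: bb
      integer :: i, n
      n = size(v)
      ! Sum
      s = 0.0_dp
      do i = 1, n
        s = s + v(i)
      end do
      r(1) = s
      ! Product
      s = 1.0_dp
      do i = 1, n
        s = s * v(i)
      end do
      r(2) = s
      ! Dot product
      s = 0.0_dp
      do i = 1, n
        s = s + v(i) * u(i)
      end do
      r(3) = s
      ! Minimum and maximum with the generic MIN and MAX
      s = v(1)
      do i = 1, n
        s = min(s, v(i))
      end do
      r(4) = s
      s = v(1)
      do i = 1, n
        s = max(s, v(i))
      end do
      r(5) = s
      ! OR, starting from .false.
      bb = .false.
      do i = 1, size(lv)
        bb = bb .or. lv(i)
      end do
      b(1) = bb
      ! AND, starting from .true.
      bb = .true.
      do i = 1, size(lv)
        bb = bb .and. lv(i)
      end do
      b(2) = bb
      ! Count of non-zero elements
      k = 0
      do i = 1, n
        if (v(i) .ne. 0.0_dp) k = k + 1
      end do
      end subroutine table_demo
      end module table_reductions
```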

10.2.4.2 Numerical Accuracy and Reduction Operations

Floating-point sum or product reduction operations may be inaccurate due to the following conditions:

The order in which the calculations are performed in parallel is not the same as when they are performed serially on a single processor.
The order of calculation affects the sum or product of floating-point numbers. Hardware floating-point addition and multiplication are not associative. Roundoff, overflow, or underflow errors may result depending on how the operands associate. For example, (X*Y)*Z and X*(Y*Z) may not have the same numerical significance.
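The non-associativity described above is easy to observe with addition. In the sketch below, the hypothetical functions `left_first` and `right_first` evaluate the same three-term sum with different grouping. Standard Fortran requires the compiler to honor the parentheses. Take x = 1.0, y = 1.0e20 and z = -1.0e20 in double precision. The 1.0 is lost when it is first added to 1.0e20, so (x + y) + z gives 0.0, while x + (y + z) gives 1.0.

```fortran
      module assoc_demo
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      contains
      ! (x + y) + z: x + y is rounded before z is added
      function left_first(x, y, z) result(s)
      real(dp), intent(in) :: x, y, z
      real(dp) :: s
      s = (x + y) + z
      end function left_first
      ! x + (y + z): y + z is rounded before x is added
      function right_first(x, y, z) result(s)
      real(dp), intent(in) :: x, y, z
      real(dp) :: s
      s = x + (y + z)
      end function right_first
      end module assoc_demo
```

Multiplication behaves the same way. With X = Y = 1.0e200 and Z = 1.0e-200 in double precision, (X*Y)*Z overflows to infinity, whereas X*(Y*Z) is approximately 1.0e200.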

In some situations, the error may not be acceptable.

Example: Roundoff in the sum of 100,000 random numbers between -1 and +1:

demo% cat t4.f
      parameter ( n = 100000 )
      double precision d_lcrans, lb / -1.0 /, s, ub / +1.0 /, v(n)
      s = d_lcrans ( v, n, lb, ub ) ! Get n random nos. between -1 and +1
      s = 0.0
      do i = 1, n
        s = s + v(i)
      end do
      write(*, '(" s = ", e21.15)') s
      end
demo% f95 -O4 -autopar -reduction t4.f

Results vary with the number of processors. The following table shows the sum of 100,000 random numbers between -1 and +1 computed by t4.f on one to four processors.

Number of Processors	Output
1	`s = 0.568582080884714E+02`
2	`s = 0.568582080884722E+02`
3	`s = 0.568582080884721E+02`
4	`s = 0.568582080884724E+02`

In this situation, a relative roundoff error on the order of 10^-14 is acceptable for data that is random to begin with. For more information, see the Sun Numerical Computation Guide.
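The dependence on the number of processors can be reproduced without the `d_lcrans` routine or parallel hardware. The sketch below is a hypothetical model. The function `grouped_sum` deals the elements out cyclically to p partial sums and then adds those, imitating one way a p-way parallel reduction might group the additions. The cyclic split is an assumption, and the compiler's actual schedule may differ.

```fortran
      module grouping_demo
      implicit none
      integer, parameter :: dp = kind(1.0d0)
      contains
      ! Hypothetical model of a p-way reduction (p >= 1): element i is
      ! added to partial sum mod(i-1, p) + 1, then the p partial sums
      ! are added. Runs serially; only the grouping of additions is
      ! modelled, not the compiler's real schedule.
      function grouped_sum(v, p) result(s)
      real(dp), intent(in) :: v(:)
      integer, intent(in) :: p
      real(dp) :: s
      real(dp) :: partial(p)
      integer :: i, j
      partial = 0.0_dp
      do i = 1, size(v)
        j = mod(i - 1, p) + 1
        partial(j) = partial(j) + v(i)
      end do
      s = 0.0_dp
      do j = 1, p
        s = s + partial(j)
      end do
      end function grouped_sum
      end module grouping_demo
```

A driver could fill `v` with the standard `call random_number(v)` followed by `v = 2.0d0*v - 1.0d0`, then print `grouped_sum(v, p)` for p = 1 to 4. With p = 1 the additions occur in the same order as the serial loop. For larger p, the last few digits typically change, much as in the table.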