Sun Studio 12: Fortran Programming Guide

6.5 Further Numerical Adventures

This section addresses some real-world problems that involve arithmetic operations that may unwittingly generate invalid operation, division-by-zero, overflow, underflow, or inexact exceptions.

For instance, prior to the IEEE standard, if you multiplied two very small numbers on a computer, you could get zero. Most mainframes and minicomputers behaved that way. With IEEE arithmetic, gradual underflow expands the dynamic range of computations.

For example, consider a 32-bit processor on which the smallest normalized representable single-precision value is roughly 1.0E-38. Multiply two small numbers:


      a = 1.0E-30
      b = 1.0E-15
      x = a * b

In older arithmetic, you would get 0.0, but with IEEE arithmetic and the same word length, you get 1.40130E-45. Underflow tells you that you have an answer smaller than the machine naturally represents. This result is accomplished by “stealing” some bits from the mantissa and shifting them over to the exponent. The result, a denormalized number, is less precise in some sense, but more precise in another. The deep implications are beyond this discussion. If you are interested, consult Computer, January 1980, Volume 13, Number 1, particularly J. Coonen’s article, “Underflow and the Denormalized Numbers.”
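
To see the effect directly, the following small program is a sketch (not part of the original example) that assumes an IEEE-conformant single-precision REAL and the Fortran 90 intrinsic TINY; it prints the subnormal product next to the smallest normal number for comparison:


      PROGRAM gradual
      REAL a, b, x
      a = 1.0E-30
      b = 1.0E-15
*     Under IEEE arithmetic the product is a denormalized (subnormal)
*     number, smaller than the smallest normal value, TINY(x).
      x = a * b
      PRINT *, 'product         = ', x
      PRINT *, 'smallest normal = ', TINY(x)
      END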

Most scientific programs have sections of code that are sensitive to roundoff, often in an equation solution or matrix factorization. Without gradual underflow, programmers are left to implement their own methods of detecting the approach of an inaccuracy threshold. Otherwise they must abandon the quest for a robust, stable implementation of their algorithm.

For more details on these topics, see the Numerical Computation Guide.

6.5.1 Avoiding Simple Underflow

Some applications actually do a great deal of computation very near zero. This is common in algorithms computing residuals or differential corrections. For maximum numerically safe performance, perform the key computations in extended-precision arithmetic; if the application is written in single precision, you can perform the key computations in double precision.

Example: A simple dot product computation in single precision:


      sum = 0
      DO i = 1, n
         sum = sum + a(i) * b(i)
      END DO

If a(i) and b(i) are very small, many underflows occur. By forcing the computation to double precision, you compute the dot product with greater accuracy and do not suffer underflows:


      DOUBLE PRECISION sum
      sum = 0.0D0
      DO i = 1, n
         sum = sum + dble(a(i)) * dble(b(i))
      END DO
      result = sum

You can force a SPARC processor to behave like an older system with respect to underflow (Store Zero) by adding a call to the library routine nonstandard_arithmetic() or by compiling the application’s main program with the -fns option.
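
Assuming the libsunmath routines nonstandard_arithmetic() and standard_arithmetic() are available to the program (standard_arithmetic() restores the IEEE default), an underflow-heavy loop might be bracketed as in this sketch:


      PROGRAM storez
      INTEGER i
      REAL a(1000), b(1000), sum
      DO i = 1, 1000
         a(i) = 1.0E-30
         b(i) = 1.0E-15
      END DO
*     While Store Zero mode is in effect, underflowed products are
*     flushed to zero instead of being handled as gradual underflow.
      CALL nonstandard_arithmetic()
      sum = 0.0
      DO i = 1, 1000
         sum = sum + a(i) * b(i)
      END DO
      CALL standard_arithmetic()
      PRINT *, 'sum = ', sum
      END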

6.5.2 Continuing With the Wrong Answer

You might wonder why you would continue a computation if the answer is clearly wrong. IEEE arithmetic lets you distinguish among kinds of wrong answers, such as NaN or Inf, and decide what to do based on those distinctions.

For example, consider a circuit simulation in which the only variable of interest (for the sake of argument) from a particular 50-line computation is a voltage. Further, assume that the only possible values are +5V, 0, and -5V.

It is possible to carefully arrange each part of the calculation so that every sub-result is coerced to the correct range, as in the sketch below.
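
The following fragment is a hypothetical illustration of that coercion style; the names vin, gain1, and gain2 and the clamp limits are invented here, not taken from the original example:


      REAL vin, gain1, gain2, v1, v2
      vin   = 5.0
      gain1 = 2.0
      gain2 = 0.5
*     Clamp every intermediate so it can never grow without bound.
      v1 = gain1 * vin
      v1 = MIN(MAX(v1, -5.0), 5.0)
      v2 = gain2 * v1
      v2 = MIN(MAX(v2, -5.0), 5.0)
*     ... and so on for each of the 50 lines of the computation
      PRINT *, v2
      END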

Furthermore, since Inf is not an allowed value, that style also needs special logic to ensure that large intermediate values are never multiplied together.

IEEE arithmetic allows the logic to be much simpler. The computation can be written in the obvious fashion, and only the final result need be coerced to the correct value—since Inf can occur and can be easily tested.

Furthermore, the special case of 0/0 can be detected and dealt with as you wish. The resulting code is easier to read and runs faster, since it performs no unneeded comparisons.
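
As a hedged sketch of this IEEE-style approach (the helper name vclamp and the 2.5 thresholds are invented for illustration), the final value can be examined with standard idioms: (v .NE. v) is true only for a NaN such as the result of 0/0, and a magnitude greater than HUGE(v) indicates an infinity. This relies on the compiler not optimizing away the NaN comparison.


      PROGRAM coerce
      REAL vclamp, zero, raw
      zero = 0.0
*     0.0/0.0 raises the invalid exception and produces a NaN
      raw = zero / zero
      PRINT *, 'coerced voltage = ', vclamp(raw)
      END

*     Hypothetical helper: coerce one final result to the only legal
*     voltages (+5, 0, -5) after testing for NaN and Inf.
      REAL FUNCTION vclamp(v)
      REAL v
      IF (v .NE. v) THEN
*        v is a NaN, for example from 0/0
         vclamp = 0.0
      ELSE IF (ABS(v) .GT. HUGE(v)) THEN
*        v overflowed to +Inf or -Inf
         vclamp = SIGN(5.0, v)
      ELSE IF (v .GE. 2.5) THEN
         vclamp = 5.0
      ELSE IF (v .LE. -2.5) THEN
         vclamp = -5.0
      ELSE
         vclamp = 0.0
      END IF
      END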

6.5.3 Excessive Underflow

If two very small numbers are multiplied, the result underflows.

If you know in advance that the operands in a multiplication (or subtraction) may be small and underflow is likely, run the calculation in double precision and convert the result to single precision later.

For example, a dot product loop like this:


  real sum, a(maxn), b(maxn)
  ...
  do i = 1, n
      sum = sum + a(i)*b(i)
  enddo

where a(*) and b(*) are known to have small elements, should accumulate the sum in double precision to preserve numerical accuracy:


  real a(maxn), b(maxn)
  double precision sum
  ...
  do i = 1, n
      sum = sum + a(i)*dble(b(i))
  enddo

Doing so may also improve performance, because the excessive underflows generated by the original loop must be handled in software, which is slow. However, there is no hard and fast rule here; experiment with your computationally intensive code to determine the most profitable solutions.