Floating-Point Arithmetic

Fortran Programming Guide

Chapter 6

Floating-Point Arithmetic

This chapter considers floating-point arithmetic and suggests strategies for avoiding and detecting numerical computation errors.

For a detailed examination of floating-point computation on SPARC and x86 processors, see the Sun Numerical Computation Guide.

Introduction

Sun's floating-point environment on SPARC and x86 implements the arithmetic model specified by the IEEE Standard 754 for Binary Floating Point Arithmetic. This environment enables you to develop robust, high-performance, portable numerical applications. It also provides tools to investigate any unusual behavior by a numerical program.

In numerical programs, there are many potential sources for computational error:

The computational model could be wrong.
The algorithm used could be numerically unstable.
The data could be ill-conditioned.
The hardware could be producing unexpected results.

Finding the source of the errors in a numerical computation that has gone wrong can be extremely difficult. The chance of coding errors can be reduced by using commercially available and tested library packages whenever possible. Choice of algorithms is another critical issue. Using the appropriate computer arithmetic is another.

This chapter makes no attempt to teach or explain numerical error analysis. The material presented here is intended to introduce the IEEE floating-point model as implemented by Sun WorkShop Fortran compilers.

IEEE Floating-Point Arithmetic

IEEE arithmetic is a relatively new way of dealing with arithmetic operations that result in such problems as invalid, division by zero, overflow, underflow, or inexact. The differences are in rounding, handling numbers near zero, and handling numbers near the machine maximum.

The IEEE standard supports user handling of exceptions, rounding, and precision. Consequently, the standard supports interval arithmetic and diagnosis of anomalies. IEEE Standard 754 makes it possible to standardize elementary functions like exp and cos, to create high precision arithmetic, and to couple numerical and symbolic algebraic computation.

IEEE arithmetic offers users greater control over computation than does any other kind of floating-point arithmetic. The standard simplifies the task of writing numerically sophisticated, portable programs. Many questions about floating-point arithmetic concern elementary operations on numbers. For example:

What is the result of an operation when the infinitely precise result is not representable in the computer hardware?
Are elementary operations like multiplication and addition commutative?

Another class of questions concerns floating-point exceptions and exception handling. What happens if you:

Multiply two very large numbers with the same sign?
Divide nonzero by zero?
Divide zero by zero?

In older arithmetic models, the first class of questions might not have the expected answers, while the exceptional cases in the second class might all have the same result: the program aborts on the spot or proceeds with garbage results.

The standard ensures that operations yield the mathematically expected results with the expected properties. It also ensures that exceptional cases yield specified results, unless the user specifically makes other choices.

For example, the exceptional values +Inf, -Inf, and NaN are introduced intuitively:

big*big = +Inf Positive infinitybig*(-big) = -Inf Negative infinitynum/0.0 = +InfWhere num > 0.0 num/0.0 = -InfWhere num < 0.0 0.0/0.0 = NaNNot a Number

Also, five types of floating-point exception are identified:

Invalid. Operations with mathematically invalid operands--for example, 0.0/0.0, sqrt(-1.0), and log(-37.8)
Division by zero. Divisor is zero and dividend is a finite nonzero number--for example, 9.9/0.0
Overflow. Operation produces a result that exceeds the range of the exponent-- for example, MAXDOUBLE+0.0000000000001e308
Underflow. Operation produces a result that is too small to be represented as a normal number--for example, MINDOUBLE * MINDOUBLE
Inexact. Operation produces a result that cannot be represented with infinite precision--for example, 2.0 / 3.0, log(1.1) and 0.1 in input

The implementation of the IEEE standard is described in the Sun Numerical Computation Guide.

-ftrap=mode Compiler Options

The -ftrap=mode option enables trapping for floating-point exceptions. If no signal handler has been established by an ieee_handler() call, the exception terminates the program with a memory dump core file. See Fortran User's Guide for details on this compiler option. For example, to enable trapping for overflow, division by zero, and invalid operations, compile with -ftrap=common.

Note – You must compile the application's main program with -ftrap= for trapping to be enabled.

Floating-Point Exceptions and Fortran

Programs compiled by f77 automatically display a list of accrued floating-point exceptions on program termination. In general, a message results if any one of the invalid, division-by-zero, or overflow exceptions have occurred. Inexact exceptions do not generate messages because they occur so frequently in real programs.

f95 programs do not automatically report on exceptions at program termination. An explicit call to ieee_retrospective(3M) is required.

You can turn off any or all of these messages with ieee_flags() by clearing exception status flags. Do this at the end of your program.

Handling Exceptions

Exception handling according to the IEEE standard is the default on SPARC and x86 processors. However, there is a difference between detecting a floating-point exception and generating a signal for a floating-point exception (SIGFPE).

Following the IEEE standard, two things happen when an untrapped exception occurs during a floating-point operation:

The system returns a default result. For example, on 0/0 (invalid), the system returns NaN as the result.
A flag is set to indicate that an exception is raised. For example, 0/0 (invalid), the system sets the "invalid operation" flag.

Trapping a Floating-Point Exception

f77 and f95 differ significantly in the way they handle floating-point exceptions.

With f77, the default on SPARC and x86 systems is not to automatically generate a signal to interrupt the running program for a floating-point exception. The assumptions are that signals could degrade performance and that most exceptions are not significant as long as expected values are returned.

The default with f95 is to automatically trap on division by zero, overflow, and invalid operation.

The f77 and f95 command-line option -ftrap can be used to change the default. In terms of -ftrap, the default for f77 is -ftrap=%none. The default for f95 is -ftrap=common.

To enable exception trapping, compile the main program with one of the -ftrap options--for example: -ftrap=common.

SPARC: Nonstandard Arithmetic

One aspect of standard IEEE arithmetic, called gradual underflow, can be manually disabled. When disabled, the program is considered to be running with nonstandard arithmetic.

The IEEE standard for arithmetic specifies a way of handling underflowed results gradually by dynamically adjusting the radix point of the significand. In IEEE floating-point format, the radix point occurs before the significand, and there is an implicit leading bit of 1. Gradual underflow allows the implicit leading bit to be cleared to 0 and shifts the radix point into the significant when the result of a floating-point computation would otherwise underflow. With a SPARC processor this result is not accomplished in hardware but in software. If your program generates many underflows (perhaps a sign of a problem with your algorithm) and you run on a SPARC processor, you may experience a performance loss.

Gradual underflow can be disabled either by compiling with the -fns option or by calling the library routine nonstandard_arithmetic() from within the program to turn it off. Call standard_arithmetic() to turn gradual underflow back on.

Note – To be effective, the application's main program must be compiled with
-fns. See the Fortran User's Guide.

For legacy applications, take note that:

The standard_arithmetic() subroutine replaces an earlier routine named gradual_underflow().
The nonstandard_arithmetic() subroutine replaces an earlier routine named abrupt_underflow().
Note – The -fns option and the nonstandard_arithmetic() library routine are effective only on some SPARC systems. On x86 platforms, gradual underflow is performed by the hardware.

IEEE Routines

The following interfaces help people use IEEE arithmetic and are described in man pages. These are mostly in the math library libsunmath and in several .h files.

ieee_flags(3m)--Controls rounding direction and rounding precision; query exception status; clear exception status
ieee_handler(3m)--Establishes an exception handler routine
ieee_functions(3m)--Lists name and purpose of each IEEE function
ieee_values(3m)--Lists functions that return special values
Other libm functions described in this section:

ieee_retrospective
nonstandard_arithmetic
standard_arithmetic

The SPARC processors conform to the IEEE standard in a combination of hardware and software support for different aspects. x86 processors conform to the IEEE standard entirely through hardware support.

The newest SPARC processors contain floating-point units with integer multiply and divide instructions and hardware square root.

Best performance is obtained when the compiled code properly matches the runtime floating-point hardware. The compiler's -xtarget= option permits specification of the runtime hardware. For example, -xtarget=ultra would inform the compiler to generate object code that will perform best on an UltraSPARC processor.

On SPARC platforms: The utility fpversion displays which floating-point hardware is installed and indicates the appropriate -xtarget value to specify. This utility runs on all Sun SPARC architectures. See fpversion(1), the Sun WorkShop Fortran User's Guide (regarding -xtarget) and the Numerical Computation Guide for details.

Flags and ieee_flags()

The ieee_flags function is used to query and clear exception status flags. It is part of the libsunmath library shipped with Sun compilers and performs the following tasks:

Controls rounding direction and rounding precision
Checks the status of the exception flags
Clears exception status flags

The general form of a call to ieee_flags is:
flags = ieee_flags( action, mode, in, out )
Each of the four arguments is a string. The input is action, mode, and in. The output is out and flags. ieee_flags is an integer-valued function. Useful information is returned in flags as a set of 1-bit flags. Refer to the man page for ieee_flags(3m) for complete details.

Possible parameter values are shown in the following table:

TABLE 6-1 ieee_flags( action, mode, in, out ) Argument Values
action mode in, out

get set clear clearall direction precision exception nearest tozero negative positive extended double single inexact division underflow overflow invalid all common

The precision mode is available only on x86 platforms.

Note that these are literal character strings, and the output parameter out must be at least CHARACTER*9. The meanings of the possible values for in and out depend on the action and mode they are used with. These are summarized in the following table:

TABLE 6-2 ieee_flags Argument Meanings
Value of in and out Refers to

nearest, tozero, negative, positive Rounding direction

extended, double, single Rounding precision

inexact, division, underflow, overflow, invalid Exceptions

all All five exceptions

common Common exceptions:
invalid, division, overflow

For example, to determine what is the highest priority exception that has a flag raised, pass the input argument in as the null string:
CHARACTER *9, out

ieeer = ieee_flags( 'get', 'exception', '', out )

PRINT *, out, ' flag raised'
Also, to determine if the overflow exception flag is raised, set the input argument in to overflow. On return, if out equals overflow, then the overflow exception flag is raised; otherwise it is not raised.
ieeer = ieee_flags( 'get', 'exception', 'overflow', out )

IF ( out.eq. 'overflow') PRINT *,'overflow flag raised'
Example: Clear the invalid exception:
ieeer = ieee_flags( 'clear', 'exception', 'invalid', out )
Example: Clear all exceptions:
ieeer = ieee_flags( 'clear', 'exception', 'all', out )
Example: Set rounding direction to zero:
ieeer = ieee_flags( 'set', 'direction', 'tozero', out )
Example: Set rounding precision to double:
ieeer = ieee_flags( 'set', 'precision', 'double', out )
Turning Off All Warning Messages With ieee_flags

Calling ieee_flags with an action of clear, as shown in the following example, resets any uncleared exceptions. Put this call before the program exits to suppress system warning messages about floating-point exceptions at program termination.

Example: Clear all accrued exceptions with ieee_flags():
i = ieee_flags('clear', 'exception', 'all', out )
Detecting an Exception With ieee_flags

The following example demonstrates how to determine which floating-point exceptions have been raised by earlier computations. Bit masks defined in the system include file floatingpoint.h are applied to the value returned by ieee_flags.

Note – Fortran 95 (f95) programs should include the file floatingpoint.h ; Fortran 77 (f77) programs should include f77_floatingpoint.h.

In this example, DetExcFlg.F, the include file is introduced using the #include preprocessor directive, which requires us to name the source file with a .F suffix. Underflow is caused by dividing the smallest double-precision number by 2.

Example: Detect an exception using ieee_flags and decode it:
#include "floatingpoint.h"

CHARACTER*16 out

DOUBLE PRECISION d_max_subnormal, x

INTEGER div, flgs, inv, inx, over, under

x = d_max_subnormal() / 2.0 ! Cause underflow

flgs=ieee_flags('get','exception','',out) ! Which are raised?

inx = and(rshift(flgs, fp_inexact) , 1) ! Decode

div = and(rshift(flgs, fp_division) , 1) ! the value

under = and(rshift(flgs, fp_underflow), 1) ! returned

over = and(rshift(flgs, fp_overflow) , 1) ! by

inv = and(rshift(flgs, fp_invalid) , 1) ! ieee_flags

PRINT *, "Highest priority exception is: ", out

PRINT *, ' invalid divide overflo underflo inexact'

PRINT '(5i8)', inv, div, over, under, inx

PRINT *, '(1 = exception is raised; 0 = it is not)'

i = ieee_flags('clear', 'exception', 'all', out) ! Clear all

END
Example: Compile and run the preceding example (DetExcFlg.F):
demo% f95 DetExcFlg.F

demo% a.out

Highest priority exception is: underflow

invalid divide overflo underflo inexact

0 0 0 1 1

(1 = exception is raised; 0 = it is not)

demo%
IEEE Extreme Value Functions

The compilers provide a set of functions that can be called to return a special IEEE extreme value. These values, such as infinity or minimum normal, can be used directly in an application program.

Example: A convergence test based on the smallest number supported by the hardware would look like:
IF ( delta .LE. r_min_normal() ) RETURN
The values available are listed in the following table:

TABLE 6-3 Functions Returning IEEE Values
IEEE Value Double Precision Single Precision

infinity d_infinity() r_infinity()

quiet NaN d_quiet_nan() r_quiet_nan()

signaling NaN d_signaling_nan() r_signaling_nan()

min normal d_min_normal() r_min_normal()

min subnormal d_min_subnormal() r_min_subnormal()

max subnormal d_max_subnormal() r_max_subnormal()

max normal d_max_normal() r_max_normal()

The two NaN values (quiet and signaling) are unordered and should not be used in comparisons such as IF(X.ne.r_quiet_nan())THEN... To determine whether some value is a NaN, use the function ir_isnan(r) or id_isnan(d).

The Fortran names for these functions are listed in these man pages:

libm_double(3f)
libm_single(3f)
ieee_functions(3m)

Also see:

ieee_values(3m)
The floatingpoint.h and f77_floatingpoint.h header files

Exception Handlers and ieee_handler()

Typical concerns about IEEE exceptions are:

What happens when an exception occurs?
How do I use ieee_handler() to establish a user function as an exception handler?
How do I write a function that can be used as an exception handler?
How do I locate the exception--where did it occur?

Exception trapping to a user routine begins with the system generating a signal on a floating-point exception. The standard UNIX name for signal: floating-point exception is SIGFPE. The default situation on SPARC and x86 platforms is not to generate a SIGFPE when an exception occurs. For the system to generate a SIGFPE, exception trapping must first be enabled, usually by a call to ieee_handler().

Establishing an Exception Handler Function

To establish a function as an exception handler, pass the name of the function to ieee_handler(), together with the name of the exception to watch for and the action to take. Once you establish a handler, a SIGFPE signal is generated whenever the particular floating-point exception occurs, and the specified function is called.

The form for invoking ieee_handler() is shown in the following table:

TABLE 6-4 Arguments for ieee_handler(action, exception, handler)
Argument Type Possible Values

action character get, set, or clear

exception character invalid, division, overflow, underflow, or inexact

handler Function name The name of the user handler function or SIGFPE_DEFAULT, SIGFPE_IGNORE, or SIGFPE_ABORT

Return value integer 0 =OK

A Fortran 77 routine compiled with f77 that calls ieee_handler() should also declare:

#include 'f77_floatingpoint.h'

For f95 programs, declare:

#include 'floatingpoint.h'

The special arguments SIGFPE_DEFAULT, SIGFPE_IGNORE,and SIGFPE_ABORT are defined in these include files and can be used to change the behavior of the program for a specific exception:
SIGFPE_DEFAULT or

SIGFPE_IGNORE
No action taken when the specified exception occurs.

SIGFPE_ABORT
Program aborts, possibly with dump file, on exception.
Writing User Exception Handler Functions

The actions your exception handler takes are up to you. However, the routine must be an integer function with three arguments specified as shown:

handler_name( sig, sip, uap )

handler_name is the name of the integer function.
sig is an integer.
sip is a record that has the structure siginfo.
uap is not used.
Example: An exception handler function:

INTEGER FUNCTION hand( sig, sip, uap )

INTEGER sig, location

STRUCTURE /fault/

INTEGER address

INTEGER trapno

END STRUCTURE

STRUCTURE /siginfo/

INTEGER si_signo

INTEGER si_code

INTEGER si_errno

RECORD /fault/ fault

END STRUCTURE

RECORD /siginfo/ sip

location = sip.fault.address

... actions you take ...

END
This f77 example would have to be modified to run on SPARC V9 architectures (-xarch=v9 or v9a) by replacing all INTEGER declarations within each STRUCTURE with INTEGER*8.

If the handler routine enabled by ieee_handler() is in Fortran as shown in the example, the routine should not make any reference to its first argument (sig). This first argument is passed by value to the routine and can only be referenced as loc(sig). The value is the signal number.

Detecting an Exception by Handler

The following examples show how to create handler routines to detect floating-point exceptions.

Example: Detect exception and abort:
demo% cat DetExcHan.f

EXTERNAL myhandler

REAL r / 14.2 /, s / 0.0 /

i = ieee_handler ('set', 'division', myhandler )

t = r/s

END

INTEGER FUNCTION myhandler(sig,code,context)

INTEGER sig, code, context(5)

CALL abort()

END

demo% f77 -silent DetExcHan.f

demo% a.out

abort: called

Abort (core dumped)

demo%
SIGFPE is generated whenever that floating-point exception occurs. When the SIGFPE is detected, control passes to the myhandler function, which immediately aborts. Compile with -g and use dbx to find the location of the exception.

Locating an Exception by Handler

Example: Locate an exception (print address) and abort:
demo% cat LocExcHan.F

#include "f77_floatingpoint.h"

EXTERNAL Exhandler

INTEGER Exhandler, i, ieee_handler

REAL r / 14.2 /, s / 0.0 /, t

C Detect division by zero

i = ieee_handler( 'set', 'division', Exhandler )

t = r/s

END

INTEGER FUNCTION Exhandler( sig, sip, uap)

INTEGER sig

STRUCTURE /fault/

INTEGER address

END STRUCTURE

STRUCTURE /siginfo/

INTEGER si_signo

INTEGER si_code

INTEGER si_errno

RECORD /fault/ fault

END STRUCTURE

RECORD /siginfo/ sip

WRITE (*,10) sip.si_signo, sip.si_code, sip.fault.address

10 FORMAT('Signal ',i4,' code ',i4,' at hex address ', Z8 )

Exhandler=1

CALL abort()

END

demo% f77 -silent -g LocExcHan.F

demo% a.out

Signal 8 code 3 at hex address 11230

abort: called

Abort (core dumped)

demo%
In SPARC V9 environments, replace the INTEGER declarations within each STRUCTURE with INTEGER*8, and the i4 formats with i8.

In most cases, knowing the actual address of the exception is of little use, except with dbx:
demo% dbx a.out

(dbx) stopi at 0x11230 Set breakpoint at address

(2) stopi at &MAIN+0x68

(dbx) run Run program

Running: a.out

(process id 18803)

stopped in MAIN at 0x11230

MAIN+0x68: fdivs %f3, %f2, %f2

(dbx) where Shows the line number of the exception

=>[1] MAIN(), line 7 in "LocExcHan.F"

(dbx) list 7 Displays the source code line

7 t = r/s

(dbx) cont Continue after breakpoint, enter handler routine

Signal 8 code 3 at hex address 11230

abort: called

signal ABRT (Abort) in _kill at 0xef6e18a4

_kill+0x8: bgeu _kill+0x30

Current function is exhandler

24 CALL abort()

(dbx) quit

demo%
Of course, there are easier ways to determine the source line that caused the error. However, this example does serve to show the basics of exception handling.

Disabling All Signal Handlers

With f77, some system signal handlers for trapping interrupts, bus errors, segmentation violations, or illegal instructions are automatically enabled by default.

Although generally you would not want to turn off this default behavior, you can do so by compiling a C program that sets the global C variable f77_no_handlers to 1 and linking into your executable program:
demo% cat NoHandlers.c

int f77_no_handlers=1 ;

demo% cc -c NoHandlers.c

demo% f77 NoHandlers.o MyProgram.f
Otherwise, by default, f77_no_handlers is 0. The setting takes effect just before execution is transferred to the user program.

This variable is in the global name space of the program; do not use f77_no_handlers as the name of a variable anywhere else in the program.

With f95, no signal handlers are on by default.

Retrospective Summary

The ieee_retrospective function queries the floating-point status registers to find out which exceptions have accrued and a message is printed to standard error to inform you which exceptions were raised but not cleared. This function is automatically called by Fortran 77 programs at normal program termination (CALL EXIT). The message typically looks like this; the format may vary with each compiler release:
Note: IEEE floating-point exception flags raised:

Division by Zero;

IEEE floating-point exception traps enabled:

inexact; underflow; overflow; invalid operation;

See the Numerical Computation Guide, ieee_flags(3M),

ieee_handler(3M)
Fortran 95 programs do not call ieee_retrospective automatically. A Fortran 95 program would need to call ieee_retrospective explicitly (and link with -lf77compat).

Debugging IEEE Exceptions

In most cases, the only indication that any floating-point exceptions (such as overflow, underflow, or invalid operation) have occurred is the retrospective summary message at program termination. Locating where the exception occurred requires exception trapping be enabled. This can be done by either compiling with the -ftrap=common option or by establishing an exception handler routine with ieee_handler(). With exception trapping enabled, run the program from dbx or the Sun WorkShop, using the dbx catch FPE command to see where the error occurs.

The advantage of recompiling with -ftrap=common is that the source code need not be modified to trap the exceptions. However, by calling ieee_handler() you can be more selective as to which exceptions to look at.

Example: Recompiling with -ftrap=common and using dbx:
demo% f77 -g -ftrap=common -silent myprogram.f

demo% dbx a.out

Reading symbolic information for a.out

Reading symbolic information for rtld /usr/lib/ld.so.1

Reading symbolic information for libF77.so.3

Reading symbolic information for libc.so.1

Reading symbolic information for libdl.so.1

(dbx) catch FPE

(dbx) run

Running: a.out

(process id 19739)

signal FPE (floating point divide by zero) in MAIN at line 212 in file "myprogram.f"

212 Z = X/Y

(dbx) print Y

y = 0.0

(dbx)
If you find that the program terminates with overflow and other exceptions, you can locate the first overflow specifically by calling ieee_handler() to trap just overflows. This requires modifying the source code of at least the main program, as shown in the following example.

Example: Locate an overflow when other exceptions occur:
demo% cat myprog.F

#include "f77_floatingpoint.h"

program myprogram

...

ier = ieee_handler(`set','overflow',SIGFPE_ABORT)

...

demo% f77 -g -silent myprog.F

demo% dbx a.out

Reading symbolic information for a.out

Reading symbolic information for rtld /usr/lib/ld.so.1

Reading symbolic information for libF77.so.3

Reading symbolic information for libc.so.1

Reading symbolic information for libdl.so.1

(dbx) catch FPE

(dbx) run

Running: a.out

(process id 19793)

signal FPE (floating point overflow) in MAIN at line 55 in file "myprog.F"

55 w = rmax * 200. ! Cause of the overflow

(dbx) cont ! Continue execution to completion

Note: IEEE floating-point exception flags raised:

Inexact; Division by Zero; Underflow; ! There were other exceptions

IEEE floating-point exception traps enabled:

overflow;

See the Numerical Computation Guide...

execution completed, exit code is 0

(dbx)
To be selective, the example introduces the #include, which required renaming the source file with a .F suffix and calling ieee_handler(). You could go further and create your own handler function to be invoked on the overflow exception to do some application-specific analysis and print intermediary or debug results before aborting.

Further Numerical Adventures

This section addresses some real world problems that involve arithmetic operations that may unwittingly generate invalid, division by zero, overflow, underflow, or inexact exceptions.

For instance, prior to the IEEE standard, if you multiplied two very small numbers on a computer, you could get zero. Most mainframes and minicomputers behaved that way. With IEEE arithmetic, gradual underflow expands the dynamic range of computations.

For example, consider a 32-bit processor with 1.0E-38 as the machine's epsilon, the smallest representable value on the machine. Multiply two small numbers:
a = 1.0E-30

b = 1.0E-15

x = a * b
In older arithmetic, you would get 0.0, but with IEEE arithmetic and the same word length, you get 1.40130E-45. Underflow tells you that you have an answer smaller than the machine naturally represents. This result is accomplished by "stealing" some bits from the mantissa and shifting them over to the exponent. The result, a denormalized number, is less precise in some sense, but more precise in another. The deep implications are beyond this discussion. If you are interested, consult Computer, January 1980, Volume 13, Number 1, particularly J. Coonen's article, "Underflow and the Denormalized Numbers."

Most scientific programs have sections of code that are sensitive to roundoff, often in an equation solution or matrix factorization. Without gradual underflow, programmers are left to implement their own methods of detecting the approach of an inaccuracy threshold. Otherwise they must abandon the quest for a robust, stable implementation of their algorithm.

For more details on these topics, see the Sun WorkShop Numerical Computation Guide.

Avoiding Simple Underflow

Some applications actually do a lot of computation very near zero. This is common in algorithms computing residuals or differential corrections. For maximum numerically safe performance, perform the key computations in extended precision arithmetic. If the application is a single-precision application, you can perform key computations in double precision.

Example: A simple dot product computation in single precision:
sum = 0

DO i = 1, n

sum = sum + a(i) * b(i)

END DO
If a(i) and b(i) are very small, many underflows occur. By forcing the computation to double precision, you compute the dot product with greater accuracy and do not suffer underflows:
DOUBLE PRECISION sum

DO i = 1, n

sum = sum + dble(a(i)) * dble(b(i))

END DO

result = sum
On SPARC platforms: You can force a SPARC processor to behave like an older system with respect to underflow (Store Zero) by adding a call to the library routine nonstandard_arithmetic() or by compiling the application's main program with the -fns option.

Continuing With the Wrong Answer

You might wonder why you would continue a computation if the answer is clearly wrong. IEEE arithmetic allows you to make distinctions about what kind of wrong answers can be ignored, such as NaN or Inf. Then decisions can be made based on such distinctions.

For an example, consider a circuit simulation. The only variable of interest (for the sake of argument) from a particular 50-line computation is the voltage. Further, assume that the only values that are possible are +5v, 0, -5v.

It is possible to carefully arrange each part of the calculation to coerce each sub-result to the correct range:
if computed value is greater than 4.0, return 5.0

if computed value is between -4.0 and +4.0, return 0

if computed value is less than -4.0, return -5.0
Furthermore, since Inf is not an allowed value, you need special logic to ensure that big numbers are not multiplied.

IEEE arithmetic allows the logic to be much simpler. The computation can be written in the obvious fashion, and only the final result need be coerced to the correct value--since Inf can occur and can be easily tested.

Furthermore, the special case of 0/0 can be detected and dealt with as you wish. The result is easier to read and faster in executing, since you don't do unneeded comparisons.

SPARC: Excessive Underflow

If two very small numbers are multiplied, the result underflows.

If you know in advance that the operands in a multiplication (or subtraction) may be small and underflow is likely, run the calculation in double precision and convert the result to single precision later.

For example, a dot product loop like this:
real sum, a(maxn), b(maxn)

...

do i =1, n

sum = sum + a(i)*b(i)

enddo
where the a(*) and b(*) are known to have small elements, should be run in double precision to preserve numeric accuracy:
real a(maxn), b(maxn)

double sum

...

do i =1, n

sum = sum + a(i)*dble(b(i))

enddo
Doing so may also improve performance due to the software resolution of excessive underflows caused by the original loop. However, there is no hard and fast rule here; experiment with your intensely computational code to determine the most profitable solutions.

Interval Arithmetic

The Sun WorkShop 6 Fortran 95 compiler f95 supports intervals as an intrinsic data type. An interval is the closed compact set: [a, b] ={z | a z b} defined by a pair of numbers, a b. Intervals can be used to:

Solve nonlinear problems
Perform rigorous error analysis
Detect sources of numerical instability

By introducing intervals as an intrinsic data type to Fortran 95, all of the applicable syntax and semantics of Fortran 95 become immediately available to the developer. Besides the INTERVAL data types, f95 includes the following interval extensions to Fortran 95:

Three classes of INTERVAL relational operators:

Certainly
Possibly
Set

Intrinsic INTERVAL-specific operators, such as INF, SUP, WID, and HULL
INTERVAL input/output edit descriptors, including single-number input/output
Interval extensions to arithmetic, trigonometric, and other mathematical functions
Expression context-dependent INTERVAL constants
Mixed-mode interval expression processing

To use the Fortran 95 interval-specific features, specify -xia or -xinterval in the f95 command line.

For detailed information on interval arithmetic in Fortran 95, see the Interval Arithmetic Programming Reference.

`bigbig = +Inf` Positive infinity`big(-big) = -Inf` Negative infinity`num/0.0 = +Inf`Where num > 0.0 `num/0.0 = -Inf`Where num < 0.0 `0.0/0.0 = NaN`Not a Number

flags = ieee_flags( action, mode, in, out )

**TABLE 6-1** `ieee_flags`*( action, mode, in, out* ) Argument Values
action	mode	in, out
`get set clear clearall`	`direction precision exception`	`nearest tozero negative positive extended double single inexact division underflow overflow invalid all common`

**TABLE 6-2** `ieee_flags` Argument Meanings
Value of in and out	Refers to
`nearest`, `tozero`, `negative`, `positive`	Rounding direction
`extended`, `double`, `single`	Rounding precision
`inexact, division, underflow, overflow, invalid`	Exceptions
`all`	All five exceptions
`common`	Common exceptions: invalid, division, overflow

CHARACTER 9, out ieeer = ieee_flags( 'get', 'exception', '', out ) PRINT , out, ' flag raised'

ieeer = ieee_flags( 'get', 'exception', 'overflow', out ) IF ( out.eq. 'overflow') PRINT *,'overflow flag raised'

ieeer = ieee_flags( 'clear', 'exception', 'invalid', out )

ieeer = ieee_flags( 'clear', 'exception', 'all', out )

ieeer = ieee_flags( 'set', 'direction', 'tozero', out )

ieeer = ieee_flags( 'set', 'precision', 'double', out )

i = ieee_flags('clear', 'exception', 'all', out )

#include "floatingpoint.h" CHARACTER16 out DOUBLE PRECISION d_max_subnormal, x INTEGER div, flgs, inv, inx, over, under x = d_max_subnormal() / 2.0 ! Cause underflow* flgs=ieee_flags('get','exception','',out) ! Which are raised? inx = and(rshift(flgs, fp_inexact) , 1) ! Decode div = and(rshift(flgs, fp_division) , 1) ! the value under = and(rshift(flgs, fp_underflow), 1) ! returned over = and(rshift(flgs, fp_overflow) , 1) ! by inv = and(rshift(flgs, fp_invalid) , 1) ! ieee_flags PRINT , "Highest priority exception is: ", out PRINT , ' invalid divide overflo underflo inexact' PRINT '(5i8)', inv, div, over, under, inx PRINT , '(1 = exception is raised; 0 = it is not)' i = ieee_flags('clear', 'exception', 'all', out) ! Clear all* END

demo% `f95 DetExcFlg.F` demo% `a.out` Highest priority exception is: underflow invalid divide overflo underflo inexact 0 0 0 1 1 (1 = exception is raised; 0 = it is not) demo%

IF ( delta .LE. r_min_normal() ) RETURN

**TABLE 6-3** Functions Returning IEEE Values
IEEE Value	Double Precision	Single Precision
`infinity`	`d_infinity()`	`r_infinity()`
`quiet NaN`	`d_quiet_nan()`	`r_quiet_nan()`
`signaling NaN`	`d_signaling_nan()`	`r_signaling_nan()`
`min normal`	`d_min_normal()`	`r_min_normal()`
`min subnormal`	`d_min_subnormal()`	`r_min_subnormal()`
`max subnormal`	`d_max_subnormal()`	`r_max_subnormal()`
`max normal`	`d_max_normal()`	`r_max_normal()`

**TABLE 6-4** Arguments for ieee_handler(*action*`,` *exception*`,` *handler*)
Argument	Type	Possible Values
action	`character`	`get`, `set`, or `clear`
exception	`character`	`invalid`, `division`, `overflow,` `underflow`, or `inexact`
handler	Function name	The name of the user handler function or `SIGFPE_DEFAULT, SIGFPE_IGNORE`, or `SIGFPE_ABORT`
Return value	`integer`	0 =OK

SIGFPE_DEFAULT or SIGFPE_IGNORE	No action taken when the specified exception occurs.
SIGFPE_ABORT	Program aborts, possibly with dump file, on exception.

INTEGER FUNCTION hand( sig, sip, uap ) INTEGER sig, location STRUCTURE /fault/ INTEGER address INTEGER trapno END STRUCTURE STRUCTURE /siginfo/ INTEGER si_signo INTEGER si_code INTEGER si_errno RECORD /fault/ fault END STRUCTURE RECORD /siginfo/ sip location = sip.fault.address ... actions you take ... END

demo% `cat DetExcHan.f` EXTERNAL myhandler REAL r / 14.2 /, s / 0.0 / i = ieee_handler ('set', 'division', myhandler ) t = r/s END INTEGER FUNCTION myhandler(sig,code,context) INTEGER sig, code, context(5) CALL abort() END demo% `f77 -silent DetExcHan.f` demo% `a.out` abort: called Abort (core dumped) demo%

demo% `cat LocExcHan.F` #include "f77_floatingpoint.h" EXTERNAL Exhandler INTEGER Exhandler, i, ieee_handler REAL r / 14.2 /, s / 0.0 /, t C Detect division by zero i = ieee_handler( 'set', 'division', Exhandler ) t = r/s END INTEGER FUNCTION Exhandler( sig, sip, uap) INTEGER sig STRUCTURE /fault/ INTEGER address END STRUCTURE STRUCTURE /siginfo/ INTEGER si_signo INTEGER si_code INTEGER si_errno RECORD /fault/ fault END STRUCTURE RECORD /siginfo/ sip WRITE (*,10) sip.si_signo, sip.si_code, sip.fault.address 10 FORMAT('Signal ',i4,' code ',i4,' at hex address ', Z8 ) Exhandler=1 CALL abort() END demo% `f77` -silent -g LocExcHan.F demo% `a.out` Signal 8 code 3 at hex address 11230 abort: called Abort (core dumped) demo%

demo% `dbx a.out` (dbx) `stopi at 0x11230` Set breakpoint at address (2) stopi at &MAIN+0x68 (dbx) `run` Run program Running: a.out (process id 18803) stopped in MAIN at 0x11230 MAIN+0x68: fdivs %f3, %f2, %f2 (dbx) `where` Shows the line number of the exception =>[1] MAIN(), line 7 in "LocExcHan.F" (dbx) `list 7` Displays the source code line 7 t = r/s (dbx) `cont` Continue after breakpoint, enter handler routine Signal 8 code 3 at hex address 11230 abort: called signal ABRT (Abort) in _kill at 0xef6e18a4 _kill+0x8: bgeu _kill+0x30 Current function is exhandler 24 CALL abort() (dbx) `quit` demo%

demo% `cat NoHandlers.c` int f77_no_handlers=1 ; demo% `cc -c NoHandlers.c` demo% `f77 NoHandlers.o MyProgram.f`

Note: IEEE floating-point exception flags raised: Division by Zero; IEEE floating-point exception traps enabled: inexact; underflow; overflow; invalid operation; See the Numerical Computation Guide, ieee_flags(3M), ieee_handler(3M)

demo% `f77 -g -ftrap=common -silent myprogram.f` demo% `dbx a.out` Reading symbolic information for a.out Reading symbolic information for rtld /usr/lib/ld.so.1 Reading symbolic information for libF77.so.3 Reading symbolic information for libc.so.1 Reading symbolic information for libdl.so.1 (dbx) `catch FPE` (dbx) `run` Running: a.out (process id 19739) signal FPE (floating point divide by zero) in MAIN at line 212 in file "myprogram.f" 212 Z = X/Y (dbx) `print Y` y = 0.0 (dbx)

demo% `cat myprog.F` #include "f77_floatingpoint.h" program myprogram ... ier = ieee_handler(`set','overflow',SIGFPE_ABORT) ... demo% `f77 -g -silent myprog.F` demo% `dbx a.out` Reading symbolic information for a.out Reading symbolic information for rtld /usr/lib/ld.so.1 Reading symbolic information for libF77.so.3 Reading symbolic information for libc.so.1 Reading symbolic information for libdl.so.1 (dbx) `catch FPE` (dbx) `run` Running: a.out (process id 19793) signal FPE (floating point overflow) in MAIN at line 55 in file "myprog.F" 55 w = rmax * 200. ! Cause of the overflow (dbx) `cont` ! Continue execution to completion Note: IEEE floating-point exception flags raised: Inexact; Division by Zero; Underflow; ! There were other exceptions IEEE floating-point exception traps enabled: overflow; See the Numerical Computation Guide... execution completed, exit code is 0 (dbx)

a = 1.0E-30 b = 1.0E-15 x = a * b

sum = 0 DO i = 1, n sum = sum + a(i) * b(i) END DO

DOUBLE PRECISION sum DO i = 1, n sum = sum + dble(a(i)) * dble(b(i)) END DO result = sum

if computed value is greater than 4.0, return 5.0 if computed value is between -4.0 and +4.0, return 0 if computed value is less than -4.0, return -5.0

real sum, a(maxn), b(maxn) ... do i =1, n sum = sum + a(i)*b(i) enddo

real a(maxn), b(maxn) double sum ... do i =1, n sum = sum + a(i)*dble(b(i)) enddo

Library | Contents | Previous | Next | Index

Chapter 6

Floating-Point Arithmetic

Introduction

IEEE Floating-Point Arithmetic

`-ftrap=`mode Compiler Options

Floating-Point Exceptions and Fortran

Handling Exceptions

Trapping a Floating-Point Exception

SPARC: Nonstandard Arithmetic

IEEE Routines

Flags and `ieee_flags()`

Turning Off All Warning Messages With `ieee_flags`

Detecting an Exception With `ieee_flags`

IEEE Extreme Value Functions

Exception Handlers and `ieee_handler()`

Establishing an Exception Handler Function

Writing User Exception Handler Functions

Detecting an Exception by Handler

Locating an Exception by Handler

Disabling All Signal Handlers

Retrospective Summary

Debugging IEEE Exceptions

Further Numerical Adventures

Avoiding Simple Underflow

Continuing With the Wrong Answer

SPARC: Excessive Underflow

Interval Arithmetic

Chapter 6

Floating-Point Arithmetic

Introduction

IEEE Floating-Point Arithmetic

-ftrap=mode Compiler Options

Floating-Point Exceptions and Fortran

Handling Exceptions

Trapping a Floating-Point Exception

SPARC: Nonstandard Arithmetic

IEEE Routines

Flags and ieee_flags()

Turning Off All Warning Messages With ieee_flags

Detecting an Exception With ieee_flags

IEEE Extreme Value Functions

Exception Handlers and ieee_handler()

Establishing an Exception Handler Function

Writing User Exception Handler Functions

Detecting an Exception by Handler

Locating an Exception by Handler

Disabling All Signal Handlers

Retrospective Summary

Debugging IEEE Exceptions

Further Numerical Adventures

Avoiding Simple Underflow

Continuing With the Wrong Answer

SPARC: Excessive Underflow

Interval Arithmetic

`-ftrap=`mode Compiler Options

Flags and `ieee_flags()`

Turning Off All Warning Messages With `ieee_flags`

Detecting an Exception With `ieee_flags`

Exception Handlers and `ieee_handler()`