Reproducible Results - Oracle® Developer Studio 12.5: Numerical Computation Guide

5.5 Reproducible Results

As is discussed in section D.11 of the Numerical Computation Guide, even standard-conforming implementations of IEEE arithmetic might produce different results. Often these results are equally good, but equally often it is tedious or difficult to prove that. For many purposes, it's better to sacrifice some performance to reduce the amount of error analysis necessary to validate results. It's not obvious when minor or major differences in output are equally good, and whether they are due to user program errors, compiler optimization errors, or hardware errors.

There are several principal root causes of varying results of IEEE floating-point arithmetic. The following lists these causes and describes some approaches to reducing gratuitous variation across the releases and supported platforms of Oracle Developer Studio. Note that each approach increases reproducibility while potentially reducing performance. Sometimes the performance loss can be noticeable.

5.5.1 Transcendental Functions

Most of the common math library functions standardized by programming languages, such as exponential, logarithmic, and trigonometric functions, are expensive to round correctly compared to rational arithmetic or algebraic functions like square root (sqrt()). Nearly correctly rounded functions are suitable for most purposes and are much faster. But the fastest nearly correctly rounded functions differ from platform to platform.

Use portable code for the functions used by the application. One source of such code is the Freely-Distributable Math Library, fdlibm. It can be obtained from the Netlib software repository.
Avoid the –xvector option. The vectorized versions of transcendental functions are optimized for a particular platform and produce slightly different results on different platforms.
Avoid the x86 hardware transcendental instructions. Even though these instructions have error bounds almost as small as possible, they are not quite correctly rounded. Also, the Intel and AMD versions differ occasionally, even though both are quite good. With Oracle Developer Studio C/C++ compilers, –xbuiltin=%default can be used, especially after –fast, to make sure that none of the transcendental instructions are substituted inline by the compiler for built-in transcendental functions. Likewise the –xnolibmil option after –fast disables inline templates; libm.il from Oracle Developer Studio might have some templates that invoke the transcendental instructions.

5.5.2 Associative Operations

Addition and multiplication are associative in real arithmetic - sums and products may be computed in any order. But in the presence of roundoff, the order of evaluation affects the computed answer.

Avoid the –xreduction parallelization option. Oracle Developer Studio optimizes reductions in a way that is not deterministic.
Avoid Fortran's DOT_PRODUCT and MATMUL operations. These intrinsics in Fortran 90 and later are implemented by different methods on different platforms and will round differently. If parallelization is also enabled, the results might not be deterministic due to reduction optimizations. Dot product and matrix multiplication operations can be coded in portable Fortran, such as the reference BLAS routines distributed with the LAPACK library from the Netlib software repository.
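
The effect of evaluation order can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names sum_left, sum_right, and dot_in_order are hypothetical and not part of any Oracle Developer Studio library, and the volatile temporaries are there only to force each intermediate sum to be rounded to double. An aggressive optimizer or a parallelized reduction may still reorder the loop in dot_in_order, which is why the options above matter.

```c
#include <stddef.h>

/* Computes (a + b) + c, rounding the intermediate sum to double. */
double sum_left(double a, double b, double c) {
    volatile double t = a + b;
    return t + c;
}

/* Computes a + (b + c), rounding the intermediate sum to double. */
double sum_right(double a, double b, double c) {
    volatile double t = b + c;
    return a + t;
}

/* Dot product accumulated strictly in index order, one term at a time.
   Written as a plain serial loop so the source fixes the summation order. */
double dot_in_order(const double *x, const double *y, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}
```

For the inputs a = 1e20, b = -1e20, c = 1.0, sum_left returns 1.0 while sum_right returns 0.0, because 1.0 is lost when it is added to -1e20 first.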

5.5.3 Indeterminate Evaluation

In many languages, the order in which the operands of an expression are evaluated is not specified by the language. Thus if ranf(x) is a random number generator, the expression ranf(x) * a + ranf(x) * b might give different results for different compilers, or for different optimization levels of the same compiler, if the order of evaluation of the two ranf(x) invocations changes.

Avoid expressions with two external references - split such expressions into several statements with at most one external reference in each. Thus

z = ranf(x) * a + ranf(x) * b

can be replaced by

t = ranf(x) * a

z = t + ranf(x) * b
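
A minimal C sketch of the same rewrite follows. The generator here is a hypothetical stand-in for ranf, a small linear congruential generator with hidden state and, for brevity, no argument; ranf_seed and combine_split are likewise illustrative names. Because each statement contains at most one call, the first value drawn is always multiplied by a and the second by b, whatever the compiler or optimization level.

```c
/* Hypothetical stand-in for ranf: a linear congruential generator
   whose hidden state changes on every call. */
static unsigned long ranf_state = 1UL;

/* Resets the generator so that a sequence can be replayed. */
void ranf_seed(unsigned long seed) {
    ranf_state = seed % 2147483648UL;
}

/* Returns the next value in [0, 1). */
double ranf(void) {
    ranf_state = (ranf_state * 1103515245UL + 12345UL) % 2147483648UL;
    return (double)ranf_state / 2147483648.0;
}

/* One external reference per statement: the call order is fixed
   by the statement order rather than left to the compiler. */
double combine_split(double a, double b) {
    double t = ranf() * a;      /* first draw pairs with a */
    double z = t + ranf() * b;  /* second draw pairs with b */
    return z;
}
```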

5.5.4 Non-Portable Types

long double in C/C++ is implemented differently on SPARC and x86 in Oracle Developer Studio, with 113 significant bits and 64 significant bits respectively. Programs with explicit long double variables are thus bound to behave differently on SPARC and x86.
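
A program can check which format it was compiled for by using the standard <float.h> macros. The sketch below is illustrative; the function names are hypothetical. LDBL_MANT_DIG counts significand digits in base FLT_RADIX, which is 2 on the platforms discussed here, so it would report 113 or 64 for the two formats described above.

```c
#include <float.h>

/* Number of significand digits of long double, in base FLT_RADIX. */
int long_double_significand_digits(void) {
    return LDBL_MANT_DIG;
}

/* Nonzero when long double carries more precision than double,
   that is, when code using long double may not be portable to a
   platform with a different long double format. */
int long_double_is_wider_than_double(void) {
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```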

5.5.5 Implicit Higher Precision

In some situations, expressions might be evaluated in higher precision than is explicit in the source code. This can happen when x87 extended precision registers are used to evaluate expressions involving single or double precision variables. It can also happen when fused multiply-add operations are substituted for pairs of multiplications and additions.

Avoid optimizing multiply-add pairs as fused multiply-add operations. Use –fma=none after –fast.
If –xarch=386 must be used and there is no explicit use of long double types, then it might be possible to mitigate the effects of extended-precision expression evaluation by compiling with –fprecision=single if all variables are float, or –fprecision=double if all variables are double. However, if Fortran complex*8 variables are in use under –xarch=386, then there is no way to ensure that all expression evaluations occur in single precision. Using –m64 is preferable to –m32 because function values are returned in registers whose precision matches the declared type of the function.
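
The difference between fused and separately rounded evaluation can also be seen at the source level. The following C sketch is an illustration, not a substitute for the compiler options above; eval_method and unfused_mul_add are hypothetical names. Storing the product in a volatile double forces it to be rounded to double before the addition, so the multiply and add cannot be combined into one fused operation or kept in an extended-precision register.

```c
#include <float.h>

/* Reports the C99 FLT_EVAL_METHOD value: 0 means expressions are
   evaluated in their own type, 2 means in long double (as with x87
   registers). Returns -1 if the macro is not available. */
int eval_method(void) {
#ifdef FLT_EVAL_METHOD
    return (int)FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* Computes a * b + c with two roundings: the product is rounded to
   double when it is stored, then the sum is rounded again. */
double unfused_mul_add(double a, double b, double c) {
    volatile double p = a * b;
    return p + c;
}
```

With a = 1 + 2^-27, b = 1 - 2^-27, and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 in double, so unfused_mul_add returns 0.0. A fused multiply-add rounds only once and would return -2^-54 for the same inputs.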