Numerical Computation Guide
SPARC Behavior and Implementation
This chapter discusses issues related to the floating-point units used in SPARC workstations and describes a way to determine which code generation flags are best suited for a particular workstation.
This section lists a number of SPARC floating-point units and describes the instruction sets and exception handling features they support. See the SPARC Architecture Manual Version 8 Appendix N, "SPARC IEEE 754 Implementation Recommendations", and Version 9 Appendix B, "IEEE Std 754-1985 Requirements for SPARC-V9", for brief descriptions of what happens when a floating-point trap is taken, the distinction between trapped and untrapped underflow, and the courses of action recommended for SPARC implementations that provide a non-IEEE (nonstandard) arithmetic mode.
TABLE B-1 lists the hardware floating-point implementations used by SPARC workstations. Many early SPARC systems have floating-point units derived from cores developed by TI or Weitek:
- TI family - includes the TI8847 and the TMS390C602A
- Weitek family - includes the 1164/1165, the 3170, and the 3171
These two families of FPUs have been licensed to other workstation vendors, so chips from other semiconductor manufacturers may be found in some SPARC workstations. Some of these other chips are also shown in the table.
TABLE B-1 SPARC Floating-Point Options
FPU | Description or Processor Name | Machines | Notes
Weitek 1164/1165-based FPU or no FPU | Kernel emulates floating-point instructions | Obsolete | Slow; not recommended
TI 8847-based FPU | TI 8847; controller from Fujitsu or LSI | Sun-4/1xx; SPARCstation 1 (4/60) | 1989; most SPARCstation 1 workstations have Weitek 3170
Weitek 3170-based FPU | | SPARCstation 1 (4/60); SPARCstation 1+ (4/65) | 1989, 1990
TI 602a | | SPARCstation 2 (4/75) | 1990
Weitek 3172-based FPU | | SPARCstation SLC (4/20); SPARCstation IPC (4/40) |
Weitek 8601 or Fujitsu 86903 | Integrated CPU and FPU | SPARCstation IPX (4/50); SPARCstation ELC (4/25) | 1991; IPX uses 40 MHz CPU/FPU; ELC uses 33 MHz
Cypress 602 | Resides on Mbus module | SPARCserver 6xx | 1991
TI TMS390S10 (STP1010) | microSPARC-I | SPARCstation LX | 1992; no FsMULd in hardware
Fujitsu 86904 (STP1012) | microSPARC-II | SPARCstation 4 and 5 | No FsMULd in hardware
TI TMS390Z50 (STP1020A) | SuperSPARC-I | SPARCserver 6xx |
STP1021A | SuperSPARC-II | SPARCserver 6xx |
Ross RT620 | hyperSPARC | SPARCstation 10/HSxx |
Fujitsu 86907 | TurboSPARC | SPARCstation 4 and 5 |
STP1030A | UltraSPARC-I | Ultra-1, Ultra-2, Ex000 | V9+VIS
STP1031 | UltraSPARC-II | Ultra-2, E450, Ultra-30, Ultra-60, Ultra-80, Ex500, Ex000, E10000 | V9+VIS
SME1040 | UltraSPARC-IIi | Ultra-5, Ultra-10 | V9+VIS
The last column in the preceding table shows the compiler flags to use to obtain the fastest code for each FPU. These flags control two independent attributes of code generation: the -xarch flag determines the instruction set the compiler may use, and the -xchip flag determines the assumptions the compiler will make about a processor's performance characteristics in scheduling the code. Because all SPARC floating-point units implement at least the floating-point instruction set defined in the SPARC Architecture Manual Version 7, a program compiled with -xarch=v7 will run on any SPARC system, although it may not take full advantage of the features of later processors. Likewise, a program compiled with a particular -xchip value will run on any SPARC system that supports the instruction set specified with -xarch, but it may run more slowly on systems with processors other than the one specified.
The floating-point units listed in TABLE B-1 before the microSPARC-I implement the floating-point instruction set defined in the SPARC Architecture Manual Version 7. Programs that must run on systems with these FPUs should be compiled with -xarch=v7. The compilers make no special assumptions regarding the performance characteristics of these processors, so they all share a single -xchip value, -xchip=old. (Not all of the systems listed in TABLE B-1 are still supported by Sun WorkShop Compilers; they are listed solely for historical purposes. Refer to the appropriate version of the Numerical Computation Guide for the code generation flags to use with compilers supporting these systems.)
The microSPARC-I and microSPARC-II floating-point units implement the floating-point instruction set defined in the SPARC Architecture Manual Version 8 except for the FsMULd and quad precision instructions. Programs compiled with -xarch=v8 will run on systems with these processors, but because unimplemented floating-point instructions must be emulated by the system kernel, programs that use FsMULd extensively (such as Fortran programs that perform a lot of single precision complex arithmetic) may encounter severe performance degradation. To avoid this, compile programs for systems with these processors with an -xarch value that excludes the FsMULd instruction.
The SuperSPARC-I, SuperSPARC-II, hyperSPARC, and TurboSPARC floating-point units implement the floating-point instruction set defined in the SPARC Architecture Manual Version 8 except for the quad precision instructions. To get the best performance on systems with these processors, compile with -xarch=v8.
The UltraSPARC-I, UltraSPARC-II, and UltraSPARC-IIi floating-point units implement the floating-point instruction set defined in the SPARC Architecture Manual Version 9 except for the quad precision instructions; in particular, they provide 32 double precision floating-point registers. To allow the compiler to use these registers, compile with -xarch=v8plus (for programs that run under a 32-bit OS) or -xarch=v9 (for programs that run under a 64-bit OS). These processors also provide extensions to the standard instruction set. The additional instructions, known as the Visual Instruction Set or VIS, are rarely generated automatically by the compilers, but they may be used in assembly code. Therefore, to take full advantage of the instruction set these processors support, use the VIS-enabled variants of these -xarch values.
The -xarch and -xchip options can be specified simultaneously using the -xtarget macro option. (That is, the -xtarget flag simply expands to a suitable combination of -xarch, -xchip, and -xcache flags.) The default code generation option is -xtarget=generic. See the f95(1) man page and the compiler manuals for more information, including a complete list of -xtarget values. Additional -xarch information is provided in the Fortran User's Guide, C User's Guide, and C++ User's Guide.
Floating-Point Status Register and Queue
All SPARC floating-point units, regardless of which version of the SPARC architecture they implement, provide a floating-point status register (FSR) that contains status and control bits associated with the FPU. All SPARC FPUs that implement deferred floating-point traps provide a floating-point queue (FQ) that contains information about currently executing floating-point instructions. The FSR can be accessed by user software to detect floating-point exceptions that have occurred and to control rounding direction, trapping, and nonstandard arithmetic modes. The FQ is used by the operating system kernel to process floating-point traps and is normally invisible to user software.
Software accesses the floating-point status register via STFSR and LDFSR instructions that store the FSR in memory and load it from memory, respectively. In SPARC assembly language, these instructions are written as follows:
    st %fsr, [addr]    ! store FSR at specified address
    ld [addr], %fsr    ! load FSR from specified address
The inline template file libm.il, located in the directory containing the libraries supplied with Sun WorkShop Compilers, contains examples showing the use of the STFSR and LDFSR instructions.
FIGURE B-1 shows the layout of bit fields in the floating-point status register.
FIGURE B-1 SPARC Floating-Point Status Register
(In versions 7 and 8 of the SPARC architecture, the FSR occupies 32 bits as shown. In version 9, the FSR is extended to 64 bits, of which the lower 32 match the figure; the upper 32 are largely unused, containing only three additional floating point condition code fields.)
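To make the Version 9 extension concrete, the sketch below extracts one of the four condition code fields from a 64-bit FSR image. The field positions (fcc0 in bits 11:10; fcc1, fcc2, and fcc3 in bits 33:32, 35:34, and 37:36) and the value encoding (0 = equal, 1 = less, 2 = greater, 3 = unordered) are taken from the SPARC Version 9 architecture manual rather than from this chapter, and the function name fsr_fcc is hypothetical.

```c
#include <stdint.h>

/* Condition-code values used by SPARC fcc fields. */
enum fcc_value { FCC_EQUAL = 0, FCC_LESS = 1, FCC_GREATER = 2, FCC_UNORDERED = 3 };

/* Hypothetical helper: return fccN (N = 0..3) from a 64-bit SPARC-V9 FSR image.
   fcc0 keeps its Version 7/8 position (bits 11:10); fcc1..fcc3 live in the
   upper half at bits 33:32, 35:34, and 37:36. Returns -1 for an invalid N. */
int fsr_fcc(uint64_t fsr, int n)
{
    unsigned shift;

    if (n == 0)
        shift = 10u;
    else if (n >= 1 && n <= 3)
        shift = 32u + 2u * (unsigned)(n - 1);
    else
        return -1;

    return (int)((fsr >> shift) & 0x3u);
}
```

For example, fsr_fcc(fsr, 1) == FCC_UNORDERED would indicate that the most recent compare targeting %fcc1 involved a NaN.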
In FIGURE B-1, res refers to bits that are reserved, ver is a read-only field that identifies the version of the FPU, and the ftt and qne fields are used by the system when it processes floating-point traps. The remaining fields are described in the following table.
TABLE B-2 Floating-Point Status Register Fields
Field | Contains
RM | rounding direction mode
TEM | trap enable modes
NS | nonstandard mode
fcc | floating-point condition code
aexc | accrued exception flags
cexc | current exception flags
The RM field holds two bits that specify the rounding direction for floating-point operations. The NS bit enables nonstandard arithmetic mode on SPARC FPUs that implement it; on others, this bit is ignored. The fcc field holds floating-point condition codes generated by floating-point compare instructions and used by branch and conditional move operations. Finally, the TEM, aexc, and cexc fields each contain five bits that control trapping and record accrued and current exception flags, respectively, for each of the five IEEE 754 floating-point exceptions. These fields are subdivided as shown in TABLE B-3.
TABLE B-3 Exception Handling Fields
Field (bit name and bit position) | NV | OF | UF | DZ | NX
TEM, trap enable modes | NVM 27 | OFM 26 | UFM 25 | DZM 24 | NXM 23
aexc, accrued exception flags | nva 9 | ofa 8 | ufa 7 | dza 6 | nxa 5
cexc, current exception flags | nvc 4 | ofc 3 | ufc 2 | dzc 1 | nxc 0
(The symbols NV, OF, UF, DZ, and NX above stand for the invalid operation, overflow, underflow, division-by-zero, and inexact exceptions respectively.)
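Read together, TABLE B-2 and TABLE B-3 are enough to decode an FSR image, for example a word stored to memory with STFSR. The sketch below does this for the 32-bit layout. The exception-field positions come from TABLE B-3; the positions of RM (bits 31:30), NS (bit 22), and fcc (bits 11:10) and the RM encoding (0 = nearest, 1 = toward zero, 2 = toward +infinity, 3 = toward -infinity) are assumed from the SPARC Version 8 architecture manual. All macro, type, and function names are hypothetical.

```c
#include <stdint.h>

/* Low bit of each five-bit exception group (TABLE B-3). */
#define FSR_TEM_SHIFT  23u   /* NVM 27 ... NXM 23 */
#define FSR_AEXC_SHIFT 5u    /* nva 9 ... nxa 5   */
#define FSR_CEXC_SHIFT 0u    /* nvc 4 ... nxc 0   */

/* Within every group the order is NV, OF, UF, DZ, NX from high bit to low bit. */
#define EXC_NV 0x10u
#define EXC_OF 0x08u
#define EXC_UF 0x04u
#define EXC_DZ 0x02u
#define EXC_NX 0x01u

/* Remaining fields, assuming the SPARC Version 8 layout. */
#define FSR_RM_SHIFT  30u          /* bits 31:30 */
#define FSR_NS_BIT    (1u << 22)   /* bit 22     */
#define FSR_FCC_SHIFT 10u          /* bits 11:10 */

enum fsr_rounding { RM_NEAREST = 0, RM_TOWARD_ZERO = 1, RM_TOWARD_POS_INF = 2, RM_TOWARD_NEG_INF = 3 };

struct fsr_fields {
    unsigned rm;    /* rounding direction (enum fsr_rounding) */
    unsigned tem;   /* trap enable modes, 5 bits */
    unsigned ns;    /* nonstandard mode, 0 or 1 */
    unsigned fcc;   /* floating-point condition code, 2 bits */
    unsigned aexc;  /* accrued exception flags, 5 bits */
    unsigned cexc;  /* current exception flags, 5 bits */
};

/* Split a 32-bit FSR image into its user-visible fields. */
struct fsr_fields fsr_decode(uint32_t fsr)
{
    struct fsr_fields f;
    f.rm   = (fsr >> FSR_RM_SHIFT) & 0x3u;
    f.tem  = (fsr >> FSR_TEM_SHIFT) & 0x1fu;
    f.ns   = (fsr & FSR_NS_BIT) != 0;
    f.fcc  = (fsr >> FSR_FCC_SHIFT) & 0x3u;
    f.aexc = (fsr >> FSR_AEXC_SHIFT) & 0x1fu;
    f.cexc = (fsr >> FSR_CEXC_SHIFT) & 0x1fu;
    return f;
}
```

For instance, (fsr_decode(fsr).aexc & EXC_UF) != 0 tests whether an underflow has occurred since the accrued flags were last cleared.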
Special Cases Requiring Software Support
In most cases, SPARC floating-point units execute instructions completely in hardware without requiring software support. There are four situations, however, when the hardware will not successfully complete a floating-point instruction:
- The floating-point unit is disabled.
- The instruction is not implemented by the hardware (such as fsqrt[sd] on Weitek 1164/1165-based FPUs, fsmuld on microSPARC-I and microSPARC-II FPUs, or quad precision instructions on any SPARC FPU).
- The hardware is unable to deliver the correct result for the instruction's operands.
- The instruction would cause an IEEE 754 floating-point exception and that exception's trap is enabled.
In each situation, the initial response is the same: the process "traps" to the system kernel, which determines the cause of the trap and takes the appropriate action. (The term "trap" refers to an interruption of the normal flow of control.) In the first three situations, the kernel emulates the trapping instruction in software. Note that the emulated instruction can also incur an exception whose trap is enabled.
In the first three situations above, if the emulated instruction does not incur an IEEE floating-point exception whose trap is enabled, the kernel completes the instruction. If the instruction is a floating-point compare, the kernel updates the condition codes to reflect the result; if the instruction is an arithmetic operation, it delivers the appropriate result to the destination register. It also updates the current exception flags to reflect any (untrapped) exceptions raised by the instruction, and it ORs those exceptions into the accrued exception flags. It then arranges to continue execution of the process at the point at which the trap was taken.
When an instruction executed by hardware or emulated by the kernel software incurs an IEEE floating-point exception whose trap is enabled, the instruction is not completed. The destination register, floating-point condition codes, and accrued exception flags are unchanged, the current exception flags are set to reflect the particular exception that caused the trap, and the kernel sends a SIGFPE signal to the process.
The following pseudo-code summarizes the handling of floating-point traps. Note that the aexc field can normally only be cleared by software.
FPop provokes a trap;
if trap type is fp_disabled, unimplemented_FPop, or unfinished_FPop then
    emulate FPop;
texc ← all IEEE exceptions generated by FPop;
if (texc and TEM) = 0 then
    f[rd] ← fp_result;    // if FPop is an arithmetic op
    fcc ← fcc_result;     // if FPop is a compare
    cexc ← texc;
    aexc ← (aexc or texc);
else
    cexc ← trapped IEEE exception generated by FPop;
    throw SIGFPE;
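The bookkeeping at the end of the pseudo-code can be modeled in C to show how cexc and aexc evolve over a sequence of operations. This is a simplified model, not kernel code: struct fp_state and complete_fpop are hypothetical names, the exception masks follow the NV, OF, UF, DZ, NX order of TABLE B-3, compares (which would update fcc rather than a register) are omitted, and when a trap is taken the model simply records texc & TEM in cexc, whereas real hardware applies more detailed rules when one operation raises several exceptions at once.

```c
#include <stdbool.h>

/* Five-bit exception masks in TABLE B-3 order: NV OF UF DZ NX. */
enum { EXC_NV = 0x10, EXC_OF = 0x08, EXC_UF = 0x04, EXC_DZ = 0x02, EXC_NX = 0x01 };

struct fp_state {
    unsigned tem;   /* trap enable modes */
    unsigned aexc;  /* accrued exception flags (sticky) */
    unsigned cexc;  /* current exception flags */
    double   rd;    /* destination register */
};

/* texc holds every IEEE exception raised by the (possibly emulated) FPop.
   Returns true if the FPop completes, false if SIGFPE would be sent instead. */
bool complete_fpop(struct fp_state *s, unsigned texc, double fp_result)
{
    unsigned trapped = texc & s->tem;

    if (trapped == 0) {
        s->rd = fp_result;   /* deliver the result */
        s->cexc = texc;      /* current flags describe this FPop only */
        s->aexc |= texc;     /* accrued flags are only ever ORed in */
        return true;
    }

    /* Trap taken: destination and aexc are left unchanged. */
    s->cexc = trapped;
    return false;
}
```

Calling complete_fpop repeatedly shows why aexc answers "has this exception happened since the flags were cleared?" while cexc describes only the most recent operation.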
A program will encounter severe performance degradation when many floating-point instructions must be emulated by the kernel. The relative frequency with which this happens can depend on several factors including, of course, the type of trap.
Under normal circumstances, the fp_disabled trap should occur only once per process. The system kernel disables the floating-point unit when a process is first started, so the first floating-point operation executed by the process will cause a trap. After processing the trap, the kernel enables the floating-point unit, and it remains enabled for the duration of the process. (It is possible to disable the floating-point unit for the entire system, but this is not recommended and is done only for kernel or hardware debugging purposes.)
An unimplemented_FPop trap will obviously occur any time the floating-point unit encounters an instruction it does not implement. Since most current SPARC floating-point units implement at least the instruction set defined by the SPARC Architecture Manual Version 8 except for the quad precision instructions, and the Sun WorkShop Compilers do not generate quad precision instructions, this type of trap should not occur on most systems. As mentioned above, two notable exceptions are the microSPARC-I and microSPARC-II processors, which do not implement the FsMULd instruction. To avoid unimplemented_FPop traps on these processors, compile programs with an -xarch value that excludes the FsMULd instruction.
The remaining two trap types, unfinished_FPop and trapped IEEE exceptions, are usually associated with special computational situations involving NaNs, infinities, and subnormal numbers.
IEEE Floating-Point Exceptions, NaNs, and Infinities
When a floating-point instruction encounters an IEEE floating-point exception whose trap is enabled, the instruction is not completed; instead the system delivers a SIGFPE signal to the process. If the process has established a SIGFPE signal handler, that handler is invoked; otherwise, the process aborts. Since trapping is most often enabled for the purpose of aborting the program when an exception occurs, either by invoking a signal handler that prints a message and terminates the program or by resorting to the system default behavior when no signal handler is installed, most programs do not incur many trapped IEEE floating-point exceptions. As described in Chapter 4, however, it is possible to arrange for a signal handler to supply a result for the trapping instruction and continue execution. Note that severe performance degradation can result if many floating-point exceptions are trapped and handled in this way.
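Establishing a SIGFPE handler needs only the ISO C signal() function. The sketch below is a minimal illustration, not the Chapter 4 mechanism for substituting results: on_sigfpe and install_sigfpe_handler are hypothetical names, the handler only records that the signal arrived, and enabling IEEE traps is a separate step that is not shown. POSIX allows returning from a SIGFPE handler when the signal was generated by raise(), as in the usage below; returning after a genuine hardware trap is undefined unless the system documents otherwise.

```c
#include <signal.h>

/* volatile sig_atomic_t is the object type ISO C guarantees a handler may write. */
static volatile sig_atomic_t sigfpe_seen = 0;

/* Minimal handler: record the signal. A handler meant to abort the program
   would instead report the error with an async-signal-safe call and exit. */
static void on_sigfpe(int sig)
{
    (void)sig;
    sigfpe_seen = 1;
}

/* Hypothetical helper: establish the handler. Returns 0 on success, -1 on failure. */
int install_sigfpe_handler(void)
{
    return signal(SIGFPE, on_sigfpe) == SIG_ERR ? -1 : 0;
}
```

After install_sigfpe_handler() returns 0, raise(SIGFPE) invokes on_sigfpe and sets sigfpe_seen to 1; without the handler, the same call would terminate the process.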
Most SPARC floating-point units will also trap on at least some cases involving infinite or NaN operands or IEEE floating-point exceptions even when trapping is disabled or an instruction would not cause an exception whose trap is enabled. This happens when the hardware does not support such special cases; instead it generates an unfinished_FPop trap and leaves the kernel emulation software to complete the instruction. Different SPARC FPUs vary as to the conditions that result in an unfinished_FPop trap: for example, most early SPARC FPUs as well as the hyperSPARC FPU trap on all IEEE floating-point exceptions regardless of whether trapping is enabled, while UltraSPARC FPUs can trap "pessimistically" when a floating-point exception's trap is enabled and the hardware is unable to determine whether or not an instruction would raise that exception. On the other hand, the SuperSPARC-I, SuperSPARC-II, TurboSPARC, microSPARC-I, and microSPARC-II FPUs handle all exceptional cases in hardware and never generate unfinished_FPop traps.
Because unfinished_FPop traps often occur in conjunction with floating-point exceptions, a program can avoid incurring an excessive number of these traps by employing exception handling (i.e., testing the exception flags, trapping and substituting results, or aborting on exceptions). Of course, care must be taken to balance the cost of handling exceptions with that of allowing exceptions to result in unfinished_FPop traps.
Subnormal Numbers and Nonstandard Arithmetic
The most common situations in which some SPARC floating-point units will trap with an unfinished_FPop involve subnormal numbers. Many SPARC FPUs will trap whenever a floating-point operation involves subnormal operands or must generate a nonzero subnormal result (i.e., a result that incurs gradual underflow). Because underflow is somewhat rare but difficult to program around, and because the accuracy of underflowed intermediate results often has little effect on the overall accuracy of the final result of a computation, the SPARC architecture includes a nonstandard arithmetic mode that provides a way for a user to avoid the performance degradation associated with unfinished_FPop traps involving subnormal numbers.
The SPARC architecture does not precisely define nonstandard arithmetic mode; it merely states that when this mode is enabled, processors that support it may produce results that do not conform to the IEEE 754 standard. However, all existing SPARC implementations that support this mode use it to disable gradual underflow, replacing all subnormal operands and results with zero. (There is one exception: Weitek 1164/1165 FPUs only flush subnormal results to zero in nonstandard mode; they do not treat subnormal operands as zero.)
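The difference between gradual and abrupt underflow can be shown without SPARC hardware by modeling nonstandard mode in software. In the sketch below, flush_subnormal and sub_nonstandard are hypothetical helpers that follow the common behavior described above (not the Weitek 1164/1165 variant): subnormal operands are read as zero and a subnormal result is replaced by zero. With x = 1.5 * DBL_MIN and y = DBL_MIN, IEEE arithmetic returns the subnormal value x - y = 0.5 * DBL_MIN, so x - y is nonzero whenever x != y, while the modeled nonstandard subtraction returns zero.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: replace a subnormal value with a zero of the same sign. */
static double flush_subnormal(double v)
{
    return fpclassify(v) == FP_SUBNORMAL ? copysign(0.0, v) : v;
}

/* Software model of a subtraction performed in nonstandard mode:
   subnormal operands are treated as zero and a subnormal result is flushed to zero. */
double sub_nonstandard(double a, double b)
{
    return flush_subnormal(flush_subnormal(a) - flush_subnormal(b));
}
```

The model assumes the host itself uses IEEE gradual underflow, which is the default for C compilers on common platforms unless options that flush subnormals are enabled.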
Not all SPARC implementations provide a nonstandard mode. Specifically, the SuperSPARC-I, SuperSPARC-II, TurboSPARC, microSPARC-I, and microSPARC-II floating-point units handle subnormal operands and generate subnormal results entirely in hardware, so they do not need to support nonstandard arithmetic. (Any attempt to enable nonstandard mode on these processors is ignored.) Therefore, gradual underflow incurs no performance loss on these processors.
To determine whether gradual underflows are affecting the performance of a program, you should first determine whether underflows are occurring at all and then check how much system time is used by the program. To determine whether underflows are occurring, you can use the math library function ieee_retrospective() to see if the underflow exception flag is raised when the program exits. Fortran programs call ieee_retrospective() by default. C and C++ programs need to call ieee_retrospective() explicitly prior to exit. If any underflows have occurred, ieee_retrospective() prints a message similar to the following:
Note: IEEE floating-point exception flags raised:
See the Numerical Computation Guide, ieee_flags(3M)
If the program encounters underflows, you might want to determine how much system time the program is using by timing the program execution with the /bin/time command:
    demo% /bin/time myprog > myprog.output
    305.3 real    32.4 user    271.9 sys
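Judging whether the system time is unusually high comes down to one ratio. As a sketch, the hypothetical helper below parses a summary line in the layout shown above and returns the share of CPU time spent in the kernel, sys / (user + sys); the line format is assumed from this single example, and /bin/time output varies between systems. For the sample figures, 271.9 of 304.3 CPU seconds, or about 89 percent, is system time.

```c
#include <stdio.h>

/* Hypothetical helper: parse "<real> real <user> user <sys> sys" and return
   sys / (user + sys). Returns -1.0 if the line cannot be parsed. */
double system_time_fraction(const char *line)
{
    double user, sys;

    /* %*f skips the elapsed (real) time, which is not needed for the ratio. */
    if (sscanf(line, "%*f real %lf user %lf sys", &user, &sys) != 2)
        return -1.0;
    if (user + sys <= 0.0)
        return -1.0;
    return sys / (user + sys);
}
```

A fraction close to 1, as in the sample, means the process spends most of its CPU time in the kernel, which is consistent with heavy kernel emulation of floating-point instructions.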
If the system time (the third figure shown above) is unusually high, multiple underflows might be the cause. If so, and if the program does not depend on the accuracy of gradual underflow, you can enable nonstandard mode for better performance. There are two ways to do this. First, you can compile with the -fns flag (which is implied as part of macros such as -fnonstd) to enable nonstandard mode at program startup. Second, the value-added math library libsunmath provides two functions to enable and disable nonstandard mode, respectively: calling nonstandard_arithmetic() enables nonstandard mode (if it is supported), while calling standard_arithmetic() restores IEEE behavior. The C and Fortran syntax for calling these functions is as follows:
    C:          nonstandard_arithmetic();
                standard_arithmetic();
    Fortran:    call nonstandard_arithmetic()
                call standard_arithmetic()
Caution: Since nonstandard arithmetic mode defeats the accuracy benefits of gradual underflow, you should use it with caution. For more information about gradual underflow, see Chapter 2.
Nonstandard Arithmetic and Kernel Emulation
On SPARC floating-point units that implement nonstandard mode, enabling this mode causes the hardware to treat subnormal operands as zero and flush subnormal results to zero. The kernel software that is used to emulate trapped floating-point instructions, however, does not implement nonstandard mode, in part because the effect of this mode is undefined and implementation-dependent and because the added cost of handling gradual underflow is negligible compared to the cost of emulating a floating-point operation in software.
If a floating-point operation that would be affected by nonstandard mode is interrupted (for example, it has been issued but not completed when a context switch occurs or another floating-point instruction causes a trap), it will be emulated by kernel software using standard IEEE arithmetic. Thus, under unusual circumstances, a program running in nonstandard mode might produce slightly varying results depending on system load. This behavior has not been observed in practice. It would affect only those programs that are very sensitive to whether one particular operation out of millions is executed with gradual underflow or with abrupt underflow.
fpversion(1) Function -- Finding Information About the FPU
The fpversion utility distributed with the compilers identifies the installed CPU and estimates the processor and system bus clock speeds.
fpversion determines the CPU and FPU types by interpreting the identification information stored by the CPU and FPU. It estimates their clock speeds by timing a loop that executes simple instructions that run in a predictable amount of time. The loop is executed many times to increase the accuracy of the timing measurements. For this reason, fpversion is not instantaneous; it can take several seconds to run.
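The timing approach described above can be sketched in portable C. This is not fpversion's implementation: loop_rate is a hypothetical helper that runs a simple counting loop several times, keeps the fastest trial (the one least disturbed by other activity), and reports iterations per second of CPU time. Turning that rate into a clock frequency would additionally require knowing how many cycles one iteration takes, which is what a loop of instructions with predictable timing provides.

```c
#include <time.h>

/* Hypothetical sketch: time `iterations` passes of a trivial loop, repeat the
   measurement `trials` times, and return the best rate in iterations per CPU
   second. Returns 0.0 if no trial was long enough for clock() to resolve. */
double loop_rate(unsigned long iterations, int trials)
{
    double best = 0.0;

    for (int t = 0; t < trials; t++) {
        volatile unsigned long counter = 0;   /* volatile keeps the loop from being optimized away */
        clock_t start = clock();
        for (unsigned long i = 0; i < iterations; i++)
            counter++;
        clock_t end = clock();

        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        if (seconds > 0.0) {
            double rate = (double)iterations / seconds;
            if (rate > best)
                best = rate;   /* fastest trial wins */
        }
    }
    return best;
}
```

Repeating the measurement and keeping the best trial is one reason a tool of this kind takes noticeable time to run.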
fpversion also reports the best -xtarget code generation option to use for the host system.
On an Ultra 4 workstation, fpversion displays information similar to the following. (There may be variations due to differences in timing or machine configuration.)
    fpversion
     A SPARC-based CPU is available.
     CPU's clock rate appears to be approximately 461.1 MHz.
     Kernel says CPU's clock rate is 480.0 MHz.
     Kernel says main memory's clock rate is 120.0 MHz.
     Sun-4 floating-point controller version 0 found.
     An UltraSPARC chip is available.
     FPU's frequency appears to be approximately 492.7 MHz.
    Use "-xtarget=ultra2 -xcache=16/32/1:2048/64/1" code-generation option.
    Hostid = hardware_host_id
See the fpversion(1) manual page for more information.
Sun Microsystems, Inc.
Copyright information. All rights reserved.