Oracle® Solaris Studio 12.4: Numerical Computation Guide

Updated: January 2015
 
 

2.2.4 Quadruple Format

The floating-point environment's quadruple-precision format also conforms to the IEEE definition of double-extended format. This format is not available in the Oracle Solaris Studio C/C++ compilers for x86. The quadruple-precision format occupies four 32-bit words and consists of three fields: a 112-bit fraction, f; a 15-bit biased exponent, e; and a 1-bit sign, s. These fields are stored contiguously as shown in the following figure.

The highest addressed 32-bit word contains the least significant 32 bits of the fraction, denoted f[31:0]. The next two 32-bit words contain f[63:32] and f[95:64], respectively. Bits 0:15 of the remaining (lowest addressed) word contain the 16 most significant bits of the fraction, f[111:96], with bit 0 being the least significant of these 16 bits and bit 15 being the most significant bit of the entire fraction. Bits 16:30 contain the 15-bit biased exponent, e, with bit 16 being the least significant bit of the biased exponent and bit 30 being the most significant. Bit 31 contains the sign bit, s.

The following figure numbers the bits as though the four contiguous 32-bit words were one 128-bit word in which bits 0:111 store the fraction, f; bits 112:126 store the 15-bit biased exponent, e; and bit 127 stores the sign bit, s.

Figure 2-3  Quadruple Format

[Figure: quadruple format. Bit 127: sign s; bits 126:112: biased exponent e; bits 111:0: fraction f.]
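The field layout described above can be sketched in a few lines of Python. This is only an illustration, not part of the Oracle Solaris Studio libraries: the function name split_quad_words is hypothetical, and the sketch assumes the four words are supplied from lowest to highest address, as on SPARC, so that the first word carries the sign, the exponent, and f[111:96].

```python
def split_quad_words(words):
    """Split four 32-bit words (lowest addressed first) into (s, e, f)."""
    if len(words) != 4 or any(not 0 <= w <= 0xFFFFFFFF for w in words):
        raise ValueError("expected four unsigned 32-bit words")
    w0, w1, w2, w3 = words
    # View the four words as one 128-bit integer: w0 supplies bits 127:96,
    # w3 (the highest addressed word) supplies bits 31:0.
    bits = (w0 << 96) | (w1 << 64) | (w2 << 32) | w3
    s = bits >> 127                  # bit 127: sign
    e = (bits >> 112) & 0x7FFF       # bits 126:112: 15-bit biased exponent
    f = bits & ((1 << 112) - 1)      # bits 111:0: 112-bit fraction
    return s, e, f


# Example: a pattern whose biased exponent field is 0x3fff and whose
# fraction is zero.
s, e, f = split_quad_words([0x3FFF0000, 0x00000000, 0x00000000, 0x00000000])
print(s, e, f)
```

Python integers have arbitrary width, so the 128-bit value can be held in a single int without any special type.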

The values of the bit patterns in the three fields f, e, and s determine the value represented by the overall bit pattern.

Table 2–6 shows the correspondence between the values of the three constituent fields and the value represented by the bit pattern in quadruple-precision format. u means don't care, because the value of the indicated field is irrelevant to the determination of values for the particular bit patterns.

Table 2-6  Values Represented by Bit Patterns
Quadruple Bit Pattern                                       Value
----------------------------------------------------------  ---------------------------------------------
0 < e < 32767                                               (–1)^s × 2^(e–16383) × 1.f (normal numbers)
e = 0, f ≠ 0 (at least one bit in f is nonzero)             (–1)^s × 2^(–16382) × 0.f (subnormal numbers)
e = 0, f = 0 (all bits in f are zero)                       (–1)^s × 0.0 (signed zero)
s = 0, e = 32767, f = 0 (all bits in f are zero)            +INF (positive infinity)
s = 1, e = 32767, f = 0 (all bits in f are zero)            -INF (negative infinity)
s = u, e = 32767, f ≠ 0 (at least one bit in f is nonzero)  NaN (Not-a-Number)
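The rules in Table 2-6 translate directly into a small decoder. The following is a minimal sketch, assuming the fields s, e, and f have already been extracted as integers; the function name quad_value and the returned labels are hypothetical choices made for illustration. It uses Python's fractions.Fraction so that finite values are exact rather than rounded to double precision.

```python
from fractions import Fraction

BIAS = 16383          # exponent bias for normal numbers
FRACTION_BITS = 112   # width of the fraction field f


def quad_value(s, e, f):
    """Classify the fields (s, e, f) and return (kind, value)."""
    sign = -1 if s else 1
    frac = Fraction(f, 1 << FRACTION_BITS)   # the "0.f" part as an exact ratio
    if e == 32767:
        if f != 0:
            return ("nan", None)             # the sign bit is "don't care"
        return ("infinity", sign * float("inf"))
    if e == 0:
        if f == 0:
            # Fraction cannot carry the sign of zero; s still records it.
            return ("zero", Fraction(0))
        return ("subnormal", sign * frac * Fraction(2) ** (-16382))
    return ("normal", sign * (1 + frac) * Fraction(2) ** (e - BIAS))


print(quad_value(0, 16383, 0))   # fields of the pattern for 1.0
print(quad_value(0, 0, 1))       # smallest positive subnormal, as an exact ratio
```

Returning exact rationals keeps the sketch independent of whether the host platform has a native quadruple-precision type.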

Examples of important bit patterns in the quadruple-precision double-extended storage format are shown in Table 2–7. The bit patterns in the second column appear as four 8-digit hexadecimal numbers. The left-most number is the value of the lowest addressed 32-bit word, and the right-most number is the value of the highest addressed 32-bit word. The maximum positive normal number is the largest finite number representable in the quadruple-precision format. The minimum positive subnormal number is the smallest positive number representable in the quadruple-precision format. The minimum positive normal number is often referred to as the underflow threshold. (The decimal values for the maximum and minimum normal and subnormal numbers are approximate; they are correct to the number of figures shown.)

Table 2-7  Bit Patterns in Quadruple Format
Common Name        Bit Pattern (SPARC)                  Decimal Value
-----------------  -----------------------------------  ------------------------------------------
+0                 00000000 00000000 00000000 00000000  0.0
–0                 80000000 00000000 00000000 00000000  –0.0
1                  3fff0000 00000000 00000000 00000000  1.0
2                  40000000 00000000 00000000 00000000  2.0
max normal         7ffeffff ffffffff ffffffff ffffffff  1.1897314953572317650857593266280070e+4932
min normal         00010000 00000000 00000000 00000000  3.3621031431120935062626778173217526e–4932
max subnormal      0000ffff ffffffff ffffffff ffffffff  3.3621031431120935062626778173217520e–4932
min pos subnormal  00000000 00000000 00000000 00000001  6.4751751194380251109244389582276466e–4966
+∞                 7fff0000 00000000 00000000 00000000  +∞
–∞                 ffff0000 00000000 00000000 00000000  –∞
Not-a-Number       7fff8000 00000000 00000000 00000000  NaN
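The decimal values in Table 2-7 can be cross-checked from the hexadecimal patterns. The sketch below is one way to do this with Python's decimal module; the function name quad_bits_to_decimal and its digits parameter are hypothetical, and the input is assumed to be the four hex words written left to right as in the table (lowest addressed word first). The value is built as an exact integer ratio and rounded once to the requested number of significant digits.

```python
from decimal import Decimal, localcontext


def quad_bits_to_decimal(hex_words, digits=35):
    """Format a finite quadruple bit pattern with `digits` significant digits."""
    bits = int("".join(hex_words.split()), 16)
    s = bits >> 127
    e = (bits >> 112) & 0x7FFF
    f = bits & ((1 << 112) - 1)
    if e == 32767:
        raise ValueError("infinity and NaN have no finite decimal value")
    if e == 0 and f == 0:
        return "-0.0" if s else "0.0"
    # Write the value as m * 2**p with integer m.
    if e == 0:
        m, p = f, -16382 - 112                  # subnormal: 0.f * 2**-16382
    else:
        m, p = (1 << 112) | f, e - 16383 - 112  # normal: 1.f * 2**(e - 16383)
    if s:
        m = -m
    with localcontext() as ctx:
        ctx.prec = digits                       # round once, to `digits` digits
        if p >= 0:
            value = +Decimal(m << p)            # unary plus applies the rounding
        else:
            value = Decimal(m) / Decimal(1 << -p)
    return format(value, f".{digits - 1}e")


for name, pattern in [
    ("max normal", "7ffeffff ffffffff ffffffff ffffffff"),
    ("min normal", "00010000 00000000 00000000 00000000"),
    ("max subnormal", "0000ffff ffffffff ffffffff ffffffff"),
    ("min pos subnormal", "00000000 00000000 00000000 00000001"),
]:
    print(name, quad_bits_to_decimal(pattern))
```

The printed strings can be compared digit by digit with the third column of the table; note that Python writes the exponent with an ASCII minus sign.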

The hex value of the NaN shown in Table 2–7 is just one of the many bit patterns that can be used to represent NaNs.
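To make "many bit patterns" concrete: by the last row of Table 2-6, any pattern with e = 32767 and a nonzero fraction is a NaN, regardless of the sign bit. The following sketch (the name is_quad_nan is hypothetical) tests a 128-bit pattern against that rule and counts the qualifying patterns.

```python
def is_quad_nan(bits):
    """Return True if the 128-bit integer `bits` encodes a quadruple NaN."""
    e = (bits >> 112) & 0x7FFF
    f = bits & ((1 << 112) - 1)
    return e == 0x7FFF and f != 0


# Two choices of sign bit times every nonzero 112-bit fraction.
nan_pattern_count = 2 * ((1 << 112) - 1)

print(is_quad_nan(0x7FFF8000 << 96))   # the NaN pattern listed in Table 2-7
print(is_quad_nan(0x7FFF0000 << 96))   # +infinity, which is not a NaN
print(nan_pattern_count)
```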