float and double numbers contain a “hidden,” or implied, leading significand bit that is not stored, which provides one more bit of precision than the stored fraction field alone. For long double, the leading bit is implicit on SPARC and explicit on x86. In every format this bit is 1 for normal numbers and 0 for subnormal numbers.
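The effect of the hidden bit can be checked with a small sketch. It assumes the IEEE 754 single format with 23 stored fraction bits (a width not stated above) and uses a hypothetical helper, `to_float32`, that rounds a value to single precision through Python's `struct` module.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest single-precision value and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Assumption: a float stores 23 fraction bits; the hidden bit adds one more.
STORED_FRACTION_BITS = 23
PRECISION_BITS = STORED_FRACTION_BITS + 1

largest_exact = 2.0 ** PRECISION_BITS - 1   # 16777215 needs exactly 24 bits
too_wide = 2.0 ** PRECISION_BITS + 1        # 16777217 needs 25 bits

print(to_float32(largest_exact) == largest_exact)  # True: survives the round trip
print(to_float32(too_wide) == too_wide)            # False: rounded to 16777216
```

An odd 24-bit integer such as 16777215 would not fit if only the 23 stored bits counted toward precision.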
Table F–10 float Representations
normal number (0<e<255) | (-1)^Sign × 2^(e-127) × 1.f
subnormal number (e=0, f!=0) | (-1)^Sign × 2^(-126) × 0.f
zero (e=0, f=0) | (-1)^Sign × 0.0
signaling NaN | s=u, e=255 (max); f=.0uuu-uu, where at least one of the u bits must be nonzero
quiet NaN | s=u, e=255 (max); f=.1uuu-uu
Infinity | s=u, e=255 (max); f=.0000-00 (all zeroes)
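As an illustration of how the float rules combine, the following sketch splits a 32-bit pattern into sign, e, and f, then applies the rows of the table. The function name `classify_float` is hypothetical, and the field widths (1, 8, and 23 bits) are assumed from the IEEE 754 single format rather than stated in the table.

```python
import math

def classify_float(bits: int):
    """Split a 32-bit pattern into sign, e, f and apply the table's rules."""
    sign = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF            # assumed: 23 stored fraction bits
    frac = f / 2.0 ** 23           # the ".f" part as a value in [0, 1)
    if e == 255:
        if f == 0:
            return "infinity", -math.inf if sign else math.inf
        # The leading fraction bit separates quiet (1) from signaling (0).
        kind = "quiet NaN" if f >> 22 else "signaling NaN"
        return kind, math.nan
    if e == 0:
        if f == 0:
            return "zero", -0.0 if sign else 0.0
        return "subnormal", (-1.0) ** sign * 2.0 ** -126 * frac
    return "normal", (-1.0) ** sign * 2.0 ** (e - 127) * (1.0 + frac)

print(classify_float(0x3F800000))  # ('normal', 1.0)
print(classify_float(0x7F800001))  # ('signaling NaN', nan)
```

The computed values are exact because every float value is also representable as a Python float (a double).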
Table F–11 double Representations
normal number (0<e<2047) | (-1)^Sign × 2^(e-1023) × 1.f
subnormal number (e=0, f!=0) | (-1)^Sign × 2^(-1022) × 0.f
zero (e=0, f=0) | (-1)^Sign × 0.0
signaling NaN | s=u, e=2047 (max); f=.0uuu-uu, where at least one of the u bits must be nonzero
quiet NaN | s=u, e=2047 (max); f=.1uuu-uu
Infinity | s=u, e=2047 (max); f=.0000-00 (all zeroes)
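The double formulas also fix the extremes of the format, which the sketch below works out by building bit patterns from their fields. It assumes an 11-bit exponent and a 52-bit fraction (the IEEE 754 double layout) and that Python's `float` is an IEEE double, as it is on common platforms. The helper `double_from_fields` is a hypothetical name.

```python
import struct
import sys

def double_from_fields(sign: int, e: int, f: int) -> float:
    """Pack sign (1 bit), e (11 bits), f (52 bits) and reinterpret as a double."""
    bits = (sign << 63) | (e << 52) | f
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

ALL_ONES = (1 << 52) - 1  # fraction field with every bit set

smallest_subnormal = double_from_fields(0, 0, 1)        # 2^-1022 * 2^-52
smallest_normal = double_from_fields(0, 1, 0)           # 2^(1-1023) * 1.0
largest_normal = double_from_fields(0, 2046, ALL_ONES)  # 2^1023 * (2 - 2^-52)

print(smallest_subnormal == 2.0 ** -1074)     # True
print(smallest_normal == sys.float_info.min)  # True
print(largest_normal == sys.float_info.max)   # True
```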
Table F–12 long double Representations
normal number (0<e<32767) | (-1)^Sign × 2^(e-16383) × 1.f
subnormal number (e=0, f!=0) | (-1)^Sign × 2^(-16382) × 0.f
zero (e=0, f=0) | (-1)^Sign × 0.0
signaling NaN | s=u, e=32767 (max); f=.0uuu-uu, where at least one of the u bits must be nonzero
quiet NaN | s=u, e=32767 (max); f=.1uuu-uu
Infinity | s=u, e=32767 (max); f=.0000-00 (all zeroes)
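To see the explicit leading bit of the x86 long double at work, the sketch below decodes a finite 80-bit pattern into an exact rational value. The bit layout (sign in bit 79, e in bits 78 to 64, the explicit leading bit in bit 63, and 63 fraction bits) is an assumption taken from the x86 extended format, not from the table above. The function name `decode_x86_extended` is hypothetical.

```python
from fractions import Fraction

def decode_x86_extended(bits: int):
    """Decode a finite 80-bit x86 extended value into (leading_bit, exact value)."""
    sign = (bits >> 79) & 1
    e = (bits >> 64) & 0x7FFF
    significand = bits & ((1 << 64) - 1)   # explicit leading bit + 63 fraction bits
    leading_bit = significand >> 63
    if e == 0x7FFF:
        raise ValueError("infinity or NaN: no finite value")
    # Subnormals (e=0) use the fixed exponent -16382, as in the table.
    exponent = e - 16383 if e > 0 else -16382
    magnitude = Fraction(significand, 1 << 63) * Fraction(2) ** exponent
    return leading_bit, (-magnitude if sign else magnitude)

one = (0x3FFF << 64) | (1 << 63)   # e=16383, leading bit 1, f=0
tiny = 1                           # e=0, leading bit 0, lowest fraction bit set

print(decode_x86_extended(one))                            # (1, Fraction(1, 1))
print(decode_x86_extended(tiny)[0])                        # 0
print(decode_x86_extended(tiny)[1] == Fraction(1, 2 ** 16445))  # True
```

Because the leading bit is stored, the significand is read directly as j.f and the same expression covers both the normal and subnormal rows. The sketch does not validate patterns that fall outside the table's cases.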