The float and double formats are said to contain a "hidden," or implied, bit: the leading significand bit is not stored, which provides one more bit of precision than the stored fraction alone would give. For long double, the leading bit is implicit on SPARC and explicit on Intel; in either case this bit is 1 for normal numbers and 0 for subnormal numbers.
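
As a small illustration of the hidden bit, the following C sketch extracts the stored fraction field of a float. It assumes that float is the 32-bit IEEE single format; the helper names `float_fraction_field` and `float_precision_bits` are made up for this example and are not part of any library.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the 23 stored fraction bits of a float.
   Assumes float is the 32-bit IEEE single format. */
uint32_t float_fraction_field(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copy the encoding without aliasing issues */
    return bits & 0x7FFFFFu;         /* low 23 bits hold the fraction f */
}

/* Hypothetical helper: precision in bits, counting the implied leading bit. */
int float_precision_bits(void)
{
    return FLT_MANT_DIG;
}
```

Under that assumption, the fraction field of 1.0f is zero, so the leading 1 of its significand comes entirely from the implied bit, and `FLT_MANT_DIG` reports 24 bits of precision although only 23 are stored.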
Table A-10 float Representations

| Value | Representation |
|---|---|
| normal number (0<e<255) | $(-1)^{\text{Sign}} \times 2^{\text{exponent} - 127} \times 1.f$ |
| subnormal number (e=0, f!=0) | $(-1)^{\text{Sign}} \times 2^{-126} \times 0.f$ |
| zero (e=0, f=0) | $(-1)^{\text{Sign}} \times 0.0$ |
| signaling NaN | s=u, e=255(max); f=.0uuu-uu; at least one bit must be nonzero |
| quiet NaN | s=u, e=255(max); f=.1uuu-uu |
| Infinity | s=u, e=255(max); f=.0000-00 (all zeroes) |
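
The cases in Table A-10 can be told apart by looking only at the exponent field e and the fraction field f. The following C sketch does that, assuming float is the 32-bit IEEE single format. The enum and the function names (`float_kind`, `float_to_bits`, `classify_float_bits`) are hypothetical names chosen for this example. The classifier takes the raw bit pattern rather than a float so that a signaling NaN is not altered on its way into the function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; not part of any library. */
typedef enum {
    FK_ZERO, FK_SUBNORMAL, FK_NORMAL,
    FK_INFINITY, FK_QUIET_NAN, FK_SIGNALING_NAN
} float_kind;

/* Copy a float's encoding into an integer (assumes 32-bit IEEE single). */
uint32_t float_to_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Classify a single-format bit pattern following the rows of Table A-10. */
float_kind classify_float_bits(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent */
    uint32_t f = bits & 0x7FFFFFu;      /* 23-bit fraction */

    if (e == 0)                         /* e=0: zero or subnormal */
        return f == 0 ? FK_ZERO : FK_SUBNORMAL;
    if (e < 255)                        /* 0<e<255: normal */
        return FK_NORMAL;
    if (f == 0)                         /* e=255, f=0: infinity */
        return FK_INFINITY;
    /* e=255, f!=0: the leading fraction bit separates quiet from signaling */
    return (f & 0x400000u) ? FK_QUIET_NAN : FK_SIGNALING_NAN;
}
```

For example, `classify_float_bits(float_to_bits(1.0f))` yields `FK_NORMAL`, and the pattern `0x7F800001` (e=255, leading fraction bit 0, one other bit set) yields `FK_SIGNALING_NAN`. The sign bit is ignored throughout, matching the s=u entries in the table.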

Table A-11 double Representations

| Value | Representation |
|---|---|
| normal number (0<e<2047) | $(-1)^{\text{Sign}} \times 2^{\text{exponent} - 1023} \times 1.f$ |
| subnormal number (e=0, f!=0) | $(-1)^{\text{Sign}} \times 2^{-1022} \times 0.f$ |
| zero (e=0, f=0) | $(-1)^{\text{Sign}} \times 0.0$ |
| signaling NaN | s=u, e=2047(max); f=.0uuu-uu; at least one bit must be nonzero |
| quiet NaN | s=u, e=2047(max); f=.1uuu-uu |
| Infinity | s=u, e=2047(max); f=.0000-00 (all zeroes) |
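
The normal and subnormal formulas in Table A-11 can be evaluated directly from the three fields. The sketch below is one way to do so in C, assuming double is the 64-bit IEEE double format and that double arithmetic is carried out in double precision. `scale_by_pow2` and `decode_double_bits` are hypothetical helpers written for this example; scaling is done by repeated doubling or halving only so that the example needs no math library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply x by 2^n by repeated doubling or halving. */
static double scale_by_pow2(double x, int n)
{
    for (; n > 0; n--) x *= 2.0;
    for (; n < 0; n++) x *= 0.5;
    return x;
}

/* Hypothetical helper: rebuild a double from its bit pattern using the
   formulas of Table A-11 (assumes the 64-bit IEEE double format). */
double decode_double_bits(uint64_t bits)
{
    int      sign = (int)(bits >> 63);                 /* 1 sign bit */
    int      e    = (int)((bits >> 52) & 0x7FFu);      /* 11-bit biased exponent */
    uint64_t f    = bits & ((1ull << 52) - 1);         /* 52-bit fraction */
    double   magnitude;

    if (e == 2047) {
        /* Infinity or NaN: no formula applies, so return the encoding as is. */
        double special;
        memcpy(&special, &bits, sizeof special);
        return special;
    }
    if (e == 0) {
        /* zero or subnormal: 0.f x 2^-1022 == f x 2^-1074 */
        magnitude = scale_by_pow2((double)f, -1074);
    } else {
        /* normal: 1.f x 2^(e-1023) == (2^52 + f) x 2^(e-1075) */
        magnitude = scale_by_pow2((double)((1ull << 52) | f), e - 1075);
    }
    return sign ? -magnitude : magnitude;
}
```

For instance, the pattern `0xC008000000000000` has sign 1, e = 1024, and a fraction of .1 in binary, so the normal-number formula gives -(1.5 x 2^1) = -3.0.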

Table A-12 long double Representations

| Value | Representation |
|---|---|
| normal number (0<e<32767) | $(-1)^{\text{Sign}} \times 2^{\text{exponent} - 16383} \times 1.f$ |
| subnormal number (e=0, f!=0) | $(-1)^{\text{Sign}} \times 2^{-16382} \times 0.f$ |
| zero (e=0, f=0) | $(-1)^{\text{Sign}} \times 0.0$ |
| signaling NaN | s=u, e=32767(max); f=.0uuu-uu; at least one bit must be nonzero |
| quiet NaN | s=u, e=32767(max); f=.1uuu-uu |
| Infinity | s=u, e=32767(max); f=.0000-00 (all zeroes) |
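
Because the long double layout differs between platforms, a program that inspects its bits first has to find out which layout is in use. The following C sketch is one possible check based on `LDBL_MANT_DIG` from `<float.h>`. The mapping from precision to layout is an assumption of this example: 64 is taken to mean the Intel double-extended format with an explicit leading bit, 113 the quadruple format with an implicit leading bit (as on SPARC), and 53 a long double that is simply a double. The function name is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: number of significand bits physically stored in a
   long double, or -1 if the layout is not one this sketch recognizes. */
int long_double_stored_significand_bits(void)
{
#if LDBL_MANT_DIG == 64
    return 64;   /* assumed double-extended: leading bit is stored explicitly */
#elif LDBL_MANT_DIG == 113
    return 112;  /* assumed quadruple: leading bit is implied, not stored */
#elif LDBL_MANT_DIG == 53
    return 52;   /* long double has the same layout as double */
#else
    return -1;   /* some other layout; make no claim */
#endif
}
```

When the result equals `LDBL_MANT_DIG`, the leading bit is explicit; when it is one less, the leading bit is hidden in the same way as for float and double.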