man pages section 3: Library Interfaces and Headers

Updated: July 2017

float(3HEAD)

Name

float.h, float - floating types

Synopsis

#include <float.h> 

Description

The characteristics of floating types are defined in terms of a model that describes a representation of floating-point numbers and values that provide information about an implementation's floating-point arithmetic.

The following parameters are used to define the model for each floating-point type:

s

sign (±1)

b

base or radix of exponent representation (an integer >1)

e

exponent (an integer between a minimum emin and a maximum emax)

p

precision (the number of base-b digits in the significand)

fk

non-negative integers less than b (the significand digits)

A floating-point number x is defined by the following model:

x = s × b^e × (f1 × b^(-1) + f2 × b^(-2) + ... + fp × b^(-p)), where emin ≤ e ≤ emax

In addition to normalized floating-point numbers (f1>0 if x≠0), floating types might be able to contain other kinds of floating-point numbers, such as subnormal floating-point numbers (x≠0, e=emin, f1=0) and unnormalized floating-point numbers (x≠0, e>emin, f1=0), and values that are not floating-point numbers, such as infinities and NaNs. A NaN is an encoding signifying Not-a-Number. A quiet NaN propagates through almost every arithmetic operation without raising a floating-point exception; a signaling NaN generally raises a floating-point exception when it occurs as an arithmetic operand.

The accuracy of the library functions in math.h(3HEAD) and complex.h(3HEAD) that return floating-point results is defined on the libm(3LIB) manual page.

All integer values in the <float.h> header, except FLT_ROUNDS, are constant expressions suitable for use in #if preprocessing directives; all floating values are constant expressions. All except DECIMAL_DIG, FLT_EVAL_METHOD, FLT_RADIX, and FLT_ROUNDS have separate names for all three floating-point types. The floating-point model representation is provided for all values except FLT_EVAL_METHOD and FLT_ROUNDS.

The rounding mode for floating-point addition is characterized by the value of FLT_ROUNDS:

-1

Indeterminable.

0

Toward zero.

1

To nearest.

2

Toward positive infinity.

3

Toward negative infinity.

Floating constants, the values of operations with floating operands, and values subject to the usual arithmetic conversions are evaluated to a format whose range and precision might be greater than required by the type. The use of evaluation formats is characterized by the architecture-dependent value of FLT_EVAL_METHOD:

-1

Indeterminable.

0

Evaluate all operations and constants just to the range and precision of the type.

1

Evaluate operations and constants of type float and double to the range and precision of the double type; evaluate long double operations and constants to the range and precision of the long double type.

2

Evaluate all operations and constants to the range and precision of the long double type.

The values given in the following list are defined as constants.

  • Radix of exponent representation, b.

    FLT_RADIX
  • Number of base-FLT_RADIX digits in the floating-point significand, p.

    FLT_MANT_DIG
    DBL_MANT_DIG
    LDBL_MANT_DIG
  • Number of decimal digits, n, such that any floating-point number in the widest supported floating type with pmax radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value.

    DECIMAL_DIG
  • Number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits.

    FLT_DIG
    DBL_DIG
    LDBL_DIG
  • Minimum negative integer such that FLT_RADIX raised to one less than that power is a normalized floating-point number, emin.

    FLT_MIN_EXP
    DBL_MIN_EXP
    LDBL_MIN_EXP
  • Minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers.

    FLT_MIN_10_EXP
    DBL_MIN_10_EXP
    LDBL_MIN_10_EXP
  • Maximum integer such that FLT_RADIX raised to one less than that power is a representable finite floating-point number, emax.

    FLT_MAX_EXP
    DBL_MAX_EXP
    LDBL_MAX_EXP
  • Maximum integer such that 10 raised to that power is in the range of representable finite floating-point numbers.

    FLT_MAX_10_EXP
    DBL_MAX_10_EXP
    LDBL_MAX_10_EXP

The values given in the following list are defined as constant expressions with values that are greater than or equal to those shown:

  • Maximum representable finite floating-point number.

    FLT_MAX
    DBL_MAX
    LDBL_MAX

The values given in the following list are defined as constant expressions with implementation-defined (positive) values that are less than or equal to those shown:

  • The difference between 1 and the least value greater than 1 that is representable in the given floating-point type, b^(1 - p).

    FLT_EPSILON
    DBL_EPSILON
    LDBL_EPSILON
  • Minimum normalized positive floating-point number, b^(emin - 1).

    FLT_MIN
    DBL_MIN
    LDBL_MIN

Attributes

See attributes(5) for descriptions of the following attributes:

ATTRIBUTE TYPE          ATTRIBUTE VALUE
Interface Stability     Committed
Standard                See standards(5).

See Also

complex.h(3HEAD), math.h(3HEAD), attributes(5), standards(5)