|s| e | f |
C float type
s : 1 bit
e : 8 bits
f : 23 bits
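The layout above can be checked by pulling the three fields out of a float's bit pattern. This is a minimal sketch, assuming float is a 32-bit IEEE 754 single (true on most current platforms, but not guaranteed by C); the names float_fields and get_fields are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three raw fields of a 32-bit float. */
struct float_fields {
    uint32_t s;  /* sign bit, 1 bit */
    uint32_t e;  /* biased exponent, 8 bits */
    uint32_t f;  /* fraction, 23 bits */
};

/* Assumption: float is a 32-bit IEEE 754 single-precision value. */
struct float_fields get_fields(float x)
{
    uint32_t bits;
    struct float_fields r;

    memcpy(&bits, &x, sizeof bits);  /* copy the bit pattern; avoids pointer-cast aliasing */
    r.s = bits >> 31;                /* top bit */
    r.e = (bits >> 23) & 0xffu;      /* next 8 bits */
    r.f = bits & 0x7fffffu;          /* low 23 bits */
    return r;
}
```

For example, get_fields(1.0f) gives s = 0, e = 127, f = 0, and get_fields(-2.0f) gives s = 1, e = 128, f = 0.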
If e is not 0xff, the value is:
Value = (-1)^s * M * 2^E
a. If e, read as an unsigned int, is neither all 0's nor all 1's
(e != 0 and e != 255), this is a NORMALIZED floating point value.
Sign = (-1)^s
Exponent E = e - 127
M = 1 + f/2^23
b. If e == 0, (DENORMALIZED; can represent small values including
0)
Sign = (-1)^s
Exponent E = 1 - 127 = -126 (not 0 - 127)
M = f/2^23
Note: +0 and -0 have different bit reps.
c. If e == 0xff, (SPECIAL VALUES; +inf, -inf, and NaN)
if s = 0, f = 0, value is +inf
if s = 1, f = 0, value is -inf
if f != 0, value is NaN (Not a Number)
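The three cases (a, b, c) can be sketched as one decoding routine. This is an illustration only, not a standard library function: decode_float_bits is a hypothetical name, it takes the raw 32-bit pattern as input, and it returns the result as a double so that every single-precision value is represented exactly.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: compute the value encoded by a 32-bit float pattern. */
double decode_float_bits(uint32_t bits)
{
    uint32_t s = bits >> 31;              /* 1 sign bit */
    uint32_t e = (bits >> 23) & 0xffu;    /* 8 exponent bits */
    uint32_t f = bits & 0x7fffffu;        /* 23 fraction bits */
    double sign = s ? -1.0 : 1.0;         /* (-1)^s */
    double frac = (double)f / 8388608.0;  /* f / 2^23 */

    if (e == 0xffu)                       /* case c: special values */
        return f == 0 ? sign * INFINITY : NAN;
    if (e == 0)                           /* case b: denormalized, E = -126, M = f/2^23 */
        return sign * frac * ldexp(1.0, -126);
    /* case a: normalized, E = e - 127, M = 1 + f/2^23 */
    return sign * (1.0 + frac) * ldexp(1.0, (int)e - 127);
}
```

For example, the pattern 0x3f800000 (s = 0, e = 127, f = 0) decodes to 1.0, and 0x00000001 (e = 0, f = 1) decodes to 2^-149, the smallest positive denormalized value. The patterns 0x00000000 and 0x80000000 decode to +0 and -0 respectively.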
Floating Point Arithmetic
+inf + +inf = +inf
-inf + -inf = -inf
+inf - +inf = NaN
x + +inf = +inf if x is not a special value
x + -inf = -inf if x is not a special value
x + NaN = NaN for all x
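The rules above can be checked directly in C. The sketch below assumes an IEEE 754 platform where the INFINITY and NAN macros from math.h are available and the compiler is not run with flags that relax floating point semantics (such as -ffast-math). The function name check_inf_rules is made up for illustration.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if every rule listed above holds for x, else 0.
   x is expected to be an ordinary (non-special) value. */
int check_inf_rules(float x)
{
    float pinf = INFINITY;    /* +inf */
    float ninf = -INFINITY;   /* -inf */
    float nan_val = NAN;      /* a quiet NaN */

    return isinf(pinf + pinf) && (pinf + pinf) > 0   /* +inf + +inf = +inf */
        && isinf(ninf + ninf) && (ninf + ninf) < 0   /* -inf + -inf = -inf */
        && isnan(pinf - pinf)                        /* +inf - +inf = NaN  */
        && isinf(x + pinf) && (x + pinf) > 0         /* x + +inf = +inf    */
        && isinf(x + ninf) && (x + ninf) < 0         /* x + -inf = -inf    */
        && isnan(x + nan_val);                       /* x + NaN = NaN      */
}
```

check_inf_rules(1.0f) returns 1. Passing a special value shows why the restriction on x matters: check_inf_rules(INFINITY) returns 0, because +inf + -inf is NaN rather than -inf.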