
Arithmetic Operations on IEEE Floating Point Values

      |s| e |   f   |

      C float type
      s : 1 bit
      e : 8 bits
      f : 23 bits

      If e is not 0xff, the value is:

      Value = (-1)^s * M * 2^E

      a. If e as an unsigned int is not all 0's and not all 1's (!= 0
         and != 255) this is a NORMALIZED floating point value.

        Sign = (-1)^s
        Exponent E = e - 127
        M = 1 + f/2^23

     b. If e == 0, this is a DENORMALIZED value (can represent small
        values, including 0).

        Sign = (-1)^s
        Exponent E = 1 - 127 = -126 (not 0 - 127)
        M = f/2^23
          
          Note: +0 and -0 have different bit representations
          (they differ only in s).

     c. If e == 0xff, this is a SPECIAL VALUE (+inf, -inf, or NaN).
        
        if s = 0, f = 0, value is +inf
        if s = 1, f = 0, value is -inf
        if f != 0, value is NaN  (Not a Number)
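
Cases a, b, and c above can be sketched in C. This is a minimal sketch,
not a definitive implementation: it assumes the C float type is a 32-bit
IEEE value with the layout shown at the top, and the names float_fields,
get_fields, and fields_to_value are hypothetical helpers introduced only
for this illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 32-bit float. */
typedef struct {
    uint32_t s;  /* sign, 1 bit */
    uint32_t e;  /* biased exponent, 8 bits */
    uint32_t f;  /* fraction, 23 bits */
} float_fields;

/* Split a float into s, e, f. Assumes float is a 32-bit IEEE value. */
static float_fields get_fields(float x)
{
    uint32_t bits;
    float_fields r;

    assert(sizeof bits == sizeof x);   /* the layout assumption */
    memcpy(&bits, &x, sizeof bits);    /* copy out the raw bit pattern */

    r.s = bits >> 31;
    r.e = (bits >> 23) & 0xffu;
    r.f = bits & 0x7fffffu;
    return r;
}

/* Rebuild the value from the fields, following cases a, b, c. */
static double fields_to_value(float_fields r)
{
    double sign = r.s ? -1.0 : 1.0;
    double two_to_23 = 8388608.0;      /* 2^23 */

    if (r.e == 0xff)                   /* c. special values */
        return r.f ? NAN : sign * INFINITY;
    if (r.e == 0)                      /* b. denormalized: E = -126, M = f/2^23 */
        return sign * ldexp((double)r.f / two_to_23, -126);
    /* a. normalized: E = e - 127, M = 1 + f/2^23 */
    return sign * ldexp(1.0 + (double)r.f / two_to_23, (int)r.e - 127);
}
```

For example, get_fields(-0.0f) yields s = 1, e = 0, f = 0, while
get_fields(0.15625f) yields s = 0, e = 124, f = 0x200000, since
0.15625 = 1.25 * 2^-3. The result is computed in double because every
float value, including the denormalized ones, is exactly representable
as a double.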
      
   

Floating Point Arithmetic

    +inf + +inf = +inf
    -inf + -inf = -inf
    +inf - +inf = NaN
    
    x + +inf = +inf    if x is not a special value
    x + -inf = -inf    if x is not a special value
    x + NaN = NaN      for all x
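
The rules above can be checked directly in C. The following is a small
sketch under stated assumptions: the compiler follows IEEE arithmetic
(no options such as -ffast-math), and check_addition_rules is a
hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: check the addition rules for one ordinary value x.
   Assumes IEEE arithmetic (for example, not compiled with -ffast-math). */
static void check_addition_rules(float x)
{
    float pinf = INFINITY;
    float ninf = -INFINITY;
    float nan_val = NAN;

    assert(!isnan(x) && !isinf(x));   /* x must not be a special value */

    assert(pinf + pinf == pinf);      /* +inf + +inf = +inf */
    assert(ninf + ninf == ninf);      /* -inf + -inf = -inf */
    assert(isnan(pinf - pinf));       /* +inf - +inf = NaN  */

    assert(x + pinf == pinf);         /* x + +inf = +inf */
    assert(x + ninf == ninf);         /* x + -inf = -inf */
    assert(isnan(x + nan_val));       /* x + NaN  = NaN  */
}
```

Calling check_addition_rules(1.5f), for instance, should pass every
assertion. The NaN cases are tested with isnan rather than ==, because
a NaN never compares equal to anything, including itself.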
   

