CSC373 Sep20

slide version

single file version

Contents

  1. Overflow and Unsigned (8 bit) integers
  2. Overflow and (8 bit) signed integers
  3. Summary for signed and unsigned 8 bit integers
  4. Computing -x without unary minus
  5. Binary to Decimal Conversion signed integers
  6. tmax
  7. Challenge Function
  8. Finding the Answer
  9. Solution
  10. Logical Right Shift
  11. Signed Integer Conversions I
  12. Signed Integer Conversions II
  13. Unsigned Integer Conversion
  14. Mixing Signed and Unsigned Integers I
  15. Mixing Signed and Unsigned Integers II
  16. IEEE Floating Point Representation
  17. Fractional Decimal Values
  18. Fractional Binary Values
  19. Floating Point Example
  20. Floating point representation of 100.5
  21. IEEE Floating Point

Overflow and Unsigned (8 bit) integers[1] [top]

When adding 8 bit unsigned integers (type unsigned char), overflow occurs if there is a carry out of bit position 7. But the carry is simply discarded.

What is the value of the carry into the 8th position? Answer: 28 = 256.

For mathematical binary addition, overflow doesn't occur. Mathematically, there is no largest integer.

If overflow occurs addition using 8 bit unsigned integers, discarding the carry is the same as subtracting 256 from the mathematical addition result.

Examples

unsigned char x unsigned char y unsigned char z = x + y
(mathematically)
(z <= 255)? z : z - 256
254 1 255 255
254 2 256 0 (= 256 - 256)
254 3 257 1 (= 257 - 256)

Overflow and (8 bit) signed integers[2] [top]

Since the 256 possible bit patterns for 8 bits (28 = 256) are to represent both positive and negative values, half of them must be negative.

So the range is

      -128 to +127
    

So overflow occurs if mathematically a sum is > 127 or < -128.

For the 8 bit signed type char, binary addition works the same way as for unsigned char.

That is, if there is a carry out of the 7th bit, it is discarded.

This carry in to the 8th bit position has value 256 (= 28). So discarding it is the same mathematically as subtracting 256.

Examples

char x char y char z = x + y
(mathematically)
(z <= 127)? z : z - 256
126 1 127 127
126 2 128 -128 (= 128 - 256)
126 3 129 -127 (= 129 - 256)

(If z is negative mathematically, then as signed char, the value is (z >= -128) ? z : z + 256.)

Summary for signed and unsigned 8 bit integers[3] [top]

For signed integers, the left most bit position determines whether the integer is negative or not. This is called the sign bit.

For the char type the sign bit position is 7.

For the int type the sign bit position is 31.

8 Bits As unsigned char As char
0000 0000 0 0
0000 0001 1 1
. . .
. . .
. . .
0111 1111 127 127
1000 0000 128 -128
1000 0001 129 -127
. . .
. . .
. . .
1111 1111 255 -1

Computing -x without unary minus[4] [top]

Because C integer types use a fixed number of bits, they are not the same as mathematical integers which are not bounded.

Since the carry bit out of the sign bit is discarded we can use bit operations to calculate the negative of an integer x.

The negative of a C x (e.g. of type int) is a C integer y (also type int in this case) such that

      x + y = 0
    

That is, if x + y = 0, then y is -x.

How do we find such a y?

Easy!

  1. First find an integer (z) such that

          x + z = 0xFFFFFFFF 
        

    Answer: z = ~x;

  2. But now, just add 1 since 0xFFFFFFFF + 1 is 0 (in C).

    	  x + (~x + 1) = (x + ~x) + 1 = 0xFFFFFFFF + 1 = 0
    	

    So this shows that -x can be calculated using only ~ and + operators:

    	  -x = ~x + 1
    	

Binary to Decimal Conversion signed integers[5] [top]

For char type (8 bits), the value of bit position k is 2k for all but the sign bit.

For the sign bit position k = 7, the value is negative:

      -27 = -128

For int type (32 bits), the sign bit is at position k = 31 and the value of that position is

      -231 = -2147483648.

Example

      char x = 0x83;
    

In binary x is

      x = 0x83 = 1 0 0 0  0 0 1 1
                 |            | |
               -128           2 1

In decimal x = -128 + 2 + 1 = -125
    

tmax[6] [top]

What is the largest 32 bit signed int? (I.e, what is tmax?)

In hex,

 Hex:     tmax = 0x7FFFFFF;
 Binary:  tmax = 0111 1111 1111 1111 1111 1111 1111 1111 
 Decimal: tmax = 2147483647 = 231 - 1
    

What is the hex representation of -2147483647?

The most negative signed integer has value -2147483648 (= -tmax - 1). It is called tmin

Note that because of overflow tmax + 1 is tmin.

What is ther hex respresentation of tmin?

Challenge Function[7] [top]

Write the function

 /**
  * Computes x / 2 (integer division)
  * Examples: div2( 6,2) =  3
  *           div2( 5,2) =  2
  *           div2(-6,2) = -3
  *           div2(-5,2) = -2
  * Restrictions: Use only int constants that fit in 1 byte
  *               Use only << >> & +
  * Max operators: 12
  */
 int div2(int x)
 {
   return 2;
 }
    

Finding the Answer[8] [top]

The answer is almost x >> 1.

If x is odd, then x / 2 mathematicall falls between two integers (7 / 2 is 3.5 which is between 3 and 4). Shifting to the right 1 bit will result in the smaller value 3. This is exactly what we want for positive integers.

But for x = -7, the value lies betwee -4 and -3. Shifting to the right 1 bit again gives the smaller value -4. But this is not what we want.

So the correct value for x / 2 could be computed by x >> 1 provided we add 1 when x is odd and negative.

If we could use 'if', and &&

Note x is odd if (x & 1) == 1

int div2(int x)
{    
 int y = x >> 1;
 if (x < 0 && ((x & 1) == 1) ) {
    y++;
 }
 return y;
}
      

Solution[9] [top]

The trick involves using x >> 31 as a mask with the bitwise and: &.

(x >> 31) is 0 if x >= 0

(x >> 31) is 0xFFFFFFFF if x >= 0

We can use this to add 1 to x >> 1 only when x is odd and negative.

Here is the solution using only the allowed bit operators.

 /**
  * Computes x / 2 (integer division)
  * Examples: div2( 6,2) =  3
  *           div2( 5,2) =  2
  *           div2(-6,2) = -3
  *           div2(-5,2) = -2
  * Restrictions: Use only int constants that fit in 1 byte
  *               Use only << >> & +
  * Max operators: 12
  */
 int div2(int x)
 {
   int xs = x >> 31; /* 0 if x >= 0 and 0xFFFFFFFF if x < 0 */
   int y = x >> 1;   /* x / 2 if x is >= 0 or if x is even */
   int z = x & 1;    /* 1 if x is odd; otherwise 0 */

   return (xs & (y + z)) + (~xs & y);
   /* return 2; */
 }
    

Logical Right Shift[10] [top]

Data lab function:

/*
 * logicalShift - shift x to the right by n, using a logical shift
 *   Can assume that 1 <= n <= 31
 *   Examples: logicalShift(0x87654321,4) = 0x08765432
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 16
 *   Rating: 3
 */
int logicalShift(int x, int n) {
  return 2;
}

Signed Integer Conversions I[11] [top]

Signed Integer types are: char, short, int, long.

On many machines including our server, int and long are the same size: 4 bytes.

However, sizeof(char) = 1, sizeof(short) = 2, and sizeof(int) = 4 on our machine.

Smaller sizes can be assigned (converted) to larger sized integers by sign extension.

For example,

int x1;
int x2;
char c1 = 0x41; // decimal:   65; binary 0100 0001; so sign bit is 0
char c2 = 0x81; // decimal: -127; binary 1000 0001; so sign bit is 1
 
x1 = c1;        // binary: 0000 0000 0000 0000 0000 0000 0100 0001
                //    hex: 0x00000041; decimal: (still) 65

x2 = c2;        // binary: 1111 1111 1111 1111 1111 1111 1000 0001
                //    hex: 0xFFFFFF81; decimal (still) -127
    

Signed Integer Conversions II[12] [top]

C allows larger sized integers to be assigned to smaller sized integer types without a cast!

This can lead to loss of precision (i.e., a different value) if the larger sized integer doesn't 'fit' in the smaller size.

How does the conversion work? It simply drops the leading signficant bytes. For example:

char c1;  int x1 = 0x00000041; //   65
char c2;  int x2 = 0xFFFFFF81; // -127
char c3;  int x3 = 0x00000081; //  129
char c4;  int x4 = 0xFFFFFF41; // -191

c1 = x1; // c1 is now 0x41 or   65; int 65 'fits' in 8 bits (char)
c2 = x2; // c2 is now 0x81 or -127; int -127 'fits' in 8 bits (char)
c3 = x3; // c3 is now 0x81 or -127; int 129 does not 'fit' in 8 bits
c4 = x4; // c4 is now 0x41 or   65; int -191 does not 'fit' in 8 bits
    

An 32 bit int value will fit in 8 bits if (and only if)

 bits 31:8 are the same as bit 7      

0xFFFFFF81 = 1111 1111 1111 1111 1111 1111 1000 0001 (yes)

0xFFFFFF41 = 1111 1111 1111 1111 1111 1111 0000 0001 (no)

0xFFFF4F81 = 1111 1111 1111 1111 0100 1111 1000 0001 (no)

    

An int value in x will 'fit' in a char ch if x still has its original value after these assignments:

    ch = x;
    x = ch;
    

Unsigned Integer Conversion[13] [top]

Assigning a smaller unsigned integer to a larger unsigned integer type just pads with 0 bits.

Assigning a larger unsigned integer type to a smaller unsigned integer type just deletes the leading (more signficant) bytes.

So an unsigned int will 'fit' in an unsigned char only if the most significant three bytes are all 0's.

Mixing Signed and Unsigned Integers I[14] [top]

Assigning an unsigned integer to a signed integer or the reverse, simply copies the bytes without change.

The difference is how the same bits are now interpreted.

Example:

 unsigned char uc = 130; // 0x82
 char c;

 c = uc; // c is 0x82 (surprise? no), but as a signed char this is -126      
    

Mixing Signed and Unsigned Integers II[15] [top]

If a signed integer is compared with an unsigned integer, the signed integer is interpreted as unsigned and the comparison is done as betwween unsigned values.

Example:

 int x = -1;                  // 0xFFFFFFFF
 int y = 34;                  // 0x00000022
 unsigned int u = 35;         // 0x00000023      
 unsigned int v = 4026531840; // 0xF0000000
    
Comparison Computed as
Signed or Unsigned
1 (true) or 0 (false)
x < y signed 1
u < v unsigned 1
x < u unsigned 0
x < v unsigned 0
y < u unsigned 1
y < v unsigned 1

IEEE Floating Point Representation[16] [top]

The C int type represents signed integers using 2's complement binary representation.

The C unsigned int type represents unsigned integers, using binary representation.

The C float and double types represent real numbers using IEEE floating point bit reprensentation.

Typical sizes of these C types:

      type            size
      int             4 bytes
      unsigned int    4 bytes
      float           4 bytes
      double          8 bytes
    

Fractional Decimal Values[17] [top]

Each position in a decimal integer has a value that is a power of 10.

   n = 3   5   8  = 3*100 + 5*10 + 8*1
       |   |   |
      102 101 100
        
    

If a number has a decimal part, the positions to the right of the decimal are also powers of 10, but they are negative powers:

   n = 3   5   8  .  1   4  = = 3*100 + 5*10 + 8*1 + 1/10 + 4/100
       |   |   |     |   |
      102 101 100    10-1 10-2
        
    

Fractional Binary Values[18] [top]

Similarly a "binary point" can be used to represent fractional values in binary:

   n = 1   1   0  = 1*4 + 1*2 + 0*1 = 6
       |   |   |
      22   21  20
        
    

If a number has a fractional part, the positions to the right of the binary point are also powers of 2, but they are negative powers:

   n = 1   1   0  .  1    1  =  1*4 + 1*2 + 0*1 + 1/2 + 1/4 = 6.75
       |   |   |     |    |
      22   21  20    2-1  2-2
        
    

Floating Point Example[19] [top]

If IEEE floating point representation is used with

   1 bit sign field         s
   8 bit exponent field     e
  23 bit fraction field     f
  --
  32 bits

For the value x = 100.5 (decimal)

the floating point representation requires

 the value 100.5 = (-1)s * 2E * M

 where M (the significand) is either in the range 

    1 <= M < 2  (normalized) 

or 

    0 <= M < 1 (denormalized)

(a) What are the values of E and M?

(b) What is the bit representation of x; that is, what are the s, e,
and f fields?

100.5 = 64 + 32 + 4 + 1/2= 26 + 25 + 22 + 2-1

So, 100.5 (decimal) can be in expressed in binary: 

100.510 = 1100100.12

Floating point representation of 100.5[20] [top]

To represent the binary number 100.5, its binary form 1100100.1 can be written in IEEE
standard form (-1)s * M * 2E as

                (-1)0 * 1.1001001 * 26

So s = 0, E = 6, and M = 1.1001001


IEEE Floating Point[21] [top]

IEEE represents 3 special values +inf, -inf, and NaN which are determined "specially" by the bit fields s, e, and f.

0 and extremely small IEEE floating point numbers are represented as denormalized values.

All other floating point values are represented as normalized values.

Denormalized
s 00000000 f
Normalized
s 0<e<255 f
Infinity
s 11111111 00000000000000........00
NaN
s 11111111 f not all 0's