When adding 8 bit unsigned integers (type unsigned char),
overflow occurs if there is a carry out of bit position 7.
But the carry is simply discarded.
What is the value of a carry into bit
position 8? Answer: 2^8 = 256.
For mathematical binary addition, overflow doesn't occur. Mathematically, there
is no largest integer.
If overflow occurs in addition of 8 bit unsigned integers,
discarding the carry is the same as subtracting 256 from the
mathematical addition result.
Examples

  unsigned char x | unsigned char y | z = x + y (mathematically) | (z <= 255) ? z : z - 256
  254             | 1               | 255                        | 255
  254             | 2               | 256                        | 0 (= 256 - 256)
  254             | 3               | 257                        | 1 (= 257 - 256)
Overflow and (8 bit) signed integers
Since the 256 possible bit patterns for 8 bits (2^8 =
256) are to represent both positive and negative values, half of
them must be negative.
So the range is
-128 to +127
So overflow occurs if mathematically a sum is > 127 or <
-128.
For the 8 bit signed type char,
binary addition works the same way as for unsigned char.
That is, if there is a carry out of bit position 7, it
is discarded.
This carry into bit position 8 has value 256
(= 2^8). So discarding it is the same mathematically
as subtracting 256.
Examples

  char x | char y | z = x + y (mathematically) | (z <= 127) ? z : z - 256
  126    | 1      | 127                        | 127
  126    | 2      | 128                        | -128 (= 128 - 256)
  126    | 3      | 129                        | -127 (= 129 - 256)
(If z is negative mathematically, then as signed char, the
value is (z >= -128) ? z : z + 256.)
For signed integers, the leftmost bit position determines
whether the integer is negative or not. This is called the sign
bit.
For the char type the sign bit position is 7.
For the int type the sign bit position is 31.
  8 Bits    | As unsigned char | As char
  0000 0000 | 0                | 0
  0000 0001 | 1                | 1
  ...       | ...              | ...
  0111 1111 | 127              | 127
  1000 0000 | 128              | -128
  1000 0001 | 129              | -127
  ...       | ...              | ...
  1111 1111 | 255              | -1
Because C integer types use a fixed number of bits, they are
not the same as mathematical integers which are not bounded.
Since the carry bit out of the sign bit is discarded we can use
bit operations to calculate the negative of an integer x.
The negative of a C integer x (e.g. of type int) is a C integer y
(also of type int in this case) such that
x + y = 0
That is, if x + y = 0, then y is -x.
How do we find such a y?
Easy!
-
First find an integer (z) such that
x + z = 0xFFFFFFFF
Answer: z = ~x;
-
But now, just add 1 since 0xFFFFFFFF + 1 is 0 (in C).
x + (~x + 1) = (x + ~x) + 1 = 0xFFFFFFFF + 1 = 0
So this shows that -x can be calculated using only ~ and + operators:
-x = ~x + 1
Binary to Decimal Conversion: signed integers
For the char type (8 bits), the value of bit position k is
2^k for all but the sign bit.
For the sign bit position k = 7, the value is negative:
-2^7 = -128
For the int type (32 bits), the sign bit is at position k = 31 and the
value of that position is
-2^31 = -2147483648.
Example

  char x = 0x83;

In binary x is

  x = 0x83 = 1 0 0 0 0 0 1 1
             |           | |
          -128           2 1

In decimal x = -128 + 2 + 1 = -125
What is the largest 32 bit signed int? (I.e., what is tmax?)

  Hex:     tmax = 0x7FFFFFFF
  Binary:  tmax = 0111 1111 1111 1111 1111 1111 1111 1111
  Decimal: tmax = 2147483647 = 2^31 - 1

What is the hex representation of -2147483647?
The most negative signed integer has value -2147483648 (= -tmax - 1).
It is called tmin.
Note that because of overflow, tmax + 1 is tmin.
What is the hex representation of tmin?
Write the function

/**
 * Computes x / 2 (integer division)
 * Examples: div2( 6) = 3
 *           div2( 5) = 2
 *           div2(-6) = -3
 *           div2(-5) = -2
 * Restrictions: Use only int constants that fit in 1 byte
 *               Use only << >> & +
 * Max operators: 12
 */
int div2(int x)
{
    return 2;
}
The answer is almost x >> 1.
If x is odd, then x / 2 mathematically falls between two
integers (7 / 2 is 3.5, which is between 3 and 4). Shifting right by
1 bit gives the smaller value 3. This is exactly what we want for
positive integers.
But for x = -7, the value -3.5 lies between -4 and -3. Shifting right
by 1 bit again gives the smaller value, -4. But this is not what we
want: C integer division truncates toward zero, so -7 / 2 is -3.
So the correct value of x / 2 can be computed by x >> 1,
provided we add 1 when x is odd and negative.
Note x is odd exactly when (x & 1) == 1.
If we could use 'if' and &&:
int div2(int x)
{
    int y = x >> 1;
    if (x < 0 && ((x & 1) == 1)) {
        y++;
    }
    return y;
}
The trick involves using x >> 31 as a mask
with the bitwise and: &.
  (x >> 31) is 0          if x >= 0
  (x >> 31) is 0xFFFFFFFF if x < 0
(This assumes >> is an arithmetic shift for signed ints, as it is on
our machine.)
We can use this to add 1 to x >> 1 only when x is odd and
negative.
Here is the solution using only the allowed bit operators.

/**
 * Computes x / 2 (integer division)
 * Examples: div2( 6) = 3
 *           div2( 5) = 2
 *           div2(-6) = -3
 *           div2(-5) = -2
 * Restrictions: Use only int constants that fit in 1 byte
 *               Use only << >> & +
 * Max operators: 12
 */
int div2(int x)
{
    int xs = x >> 31; /* 0 if x >= 0 and 0xFFFFFFFF if x < 0 */
    int y  = x >> 1;  /* x / 2 if x >= 0 or if x is even */
    int z  = x & 1;   /* 1 if x is odd; otherwise 0 */
    return (xs & (y + z)) + (~xs & y);
}
Data lab function:
/*
* logicalShift - shift x to the right by n, using a logical shift
* Can assume that 1 <= n <= 31
* Examples: logicalShift(0x87654321,4) = 0x08765432
* Legal ops: ~ & ^ | + << >>
* Max ops: 16
* Rating: 3
*/
int logicalShift(int x, int n) {
    return 2;
}
The signed integer types are: char, short, int, long.
On many machines including our server, int and long are the
same size: 4 bytes.
However, sizeof(char) = 1, sizeof(short) = 2, and sizeof(int) =
4 on our machine.
Smaller sizes can be assigned (converted) to larger sized
integers by sign extension.
For example,
int x1;
int x2;
char c1 = 0x41; // decimal: 65; binary 0100 0001; so sign bit is 0
char c2 = 0x81; // decimal: -127; binary 1000 0001; so sign bit is 1
x1 = c1; // binary: 0000 0000 0000 0000 0000 0000 0100 0001
// hex: 0x00000041; decimal: (still) 65
x2 = c2; // binary: 1111 1111 1111 1111 1111 1111 1000 0001
// hex: 0xFFFFFF81; decimal (still) -127
C allows larger sized integers to be assigned to smaller sized
integer types without a cast!
This can lead to loss of precision (i.e., a different value) if
the larger sized integer doesn't 'fit' in the smaller size.
How does the conversion work? It simply drops the leading
(most significant) bytes. For example:
char c1; int x1 = 0x00000041; // 65
char c2; int x2 = 0xFFFFFF81; // -127
char c3; int x3 = 0x00000081; // 129
char c4; int x4 = 0xFFFFFF41; // -191
c1 = x1; // c1 is now 0x41 or 65; int 65 'fits' in 8 bits (char)
c2 = x2; // c2 is now 0x81 or -127; int -127 'fits' in 8 bits (char)
c3 = x3; // c3 is now 0x81 or -127; int 129 does not 'fit' in 8 bits
c4 = x4; // c4 is now 0x41 or 65; int -191 does not 'fit' in 8 bits
A 32 bit int value will fit in 8 bits if (and only if)
bits 31:8 are the same as bit 7.
  0xFFFFFF81 = 1111 1111 1111 1111 1111 1111 1000 0001 (yes)
  0xFFFFFF41 = 1111 1111 1111 1111 1111 1111 0100 0001 (no)
  0xFFFF4F81 = 1111 1111 1111 1111 0100 1111 1000 0001 (no)
An int value in x will 'fit' in a char ch if x
still has its original value after these assignments:
ch = x;
x = ch;
Assigning a smaller unsigned integer to a larger unsigned
integer type just pads with 0 bits.
Assigning a larger unsigned integer type to a smaller unsigned
integer type just deletes the leading (more significant) bytes.
So an unsigned int will 'fit' in an unsigned char only if the
most significant three bytes are all 0's.
Assigning an unsigned integer to a signed integer or the
reverse, simply copies the bytes without change.
The difference is how the same bits are now interpreted.
Example:
unsigned char uc = 130; // 0x82
char c;
c = uc; // c is 0x82 (surprise? no), but as a signed char this is -126
If a signed integer is compared with an unsigned integer, the
signed integer is interpreted as unsigned and the comparison is
done between unsigned values.
Example:
int x = -1; // 0xFFFFFFFF
int y = 34; // 0x00000022
unsigned int u = 35; // 0x00000023
unsigned int v = 4026531840; // 0xF0000000
  Comparison | Computed as Signed or Unsigned | 1 (true) or 0 (false)
  x < y      | signed                         | 1
  u < v      | unsigned                       | 1
  x < u      | unsigned                       | 0
  x < v      | unsigned                       | 0
  y < u      | unsigned                       | 1
  y < v      | unsigned                       | 1
The C int type represents signed integers using 2's
complement binary representation.
The C unsigned int type represents unsigned integers, using
binary representation.
The C float and double types represent real
numbers using the IEEE floating point bit representation.
Typical sizes of these C types:
type size
int 4 bytes
unsigned int 4 bytes
float 4 bytes
double 8 bytes
Each position in a decimal integer has a value that is a power
of 10.

  n = 3    5    8    = 3*100 + 5*10 + 8*1
      |    |    |
      10^2 10^1 10^0

If a number has a decimal part, the positions to the right of
the decimal point are also powers of 10, but they are negative
powers:

  n = 3    5    8    .  1     4     = 3*100 + 5*10 + 8*1 + 1/10 + 4/100
      |    |    |        |     |
      10^2 10^1 10^0     10^-1 10^-2
Similarly a "binary point" can be used to represent fractional
values in binary:

  n = 1   1   0   = 1*4 + 1*2 + 0*1 = 6
      |   |   |
      2^2 2^1 2^0

If a number has a fractional part, the positions to the right of
the binary point are also powers of 2, but they are negative
powers:

  n = 1   1   0   .  1    1    = 1*4 + 1*2 + 0*1 + 1/2 + 1/4 = 6.75
      |   |   |      |    |
      2^2 2^1 2^0    2^-1 2^-2
If IEEE floating point representation is used with
1 bit sign field s
8 bit exponent field e
23 bit fraction field f
--
32 bits
For the value x = 100.5 (decimal),
the floating point representation requires
  100.5 = (-1)^s * 2^E * M
where M (the significand) is either in the range
  1 <= M < 2 (normalized)
or
  0 <= M < 1 (denormalized)
(a) What are the values of E and M?
(b) What is the bit representation of x; that is, what are the s, e,
and f fields?

  100.5 = 64 + 32 + 4 + 1/2 = 2^6 + 2^5 + 2^2 + 2^-1

So 100.5 (decimal) can be expressed in binary:
  100.5 (base 10) = 1100100.1 (base 2)
To represent 100.5, its binary form 1100100.1 can be written in the
IEEE standard form (-1)^s * M * 2^E as
  (-1)^0 * 1.1001001 * 2^6
So s = 0, E = 6, and M = 1.1001001.
(The stored exponent field is e = E + 127 = 133 = 1000 0101, and f is
the fraction bits 1001001 followed by 16 zeros.)
IEEE represents 3 special values +inf, -inf, and NaN which are
determined "specially" by the bit fields s, e, and f.
0 and extremely small IEEE floating
point numbers are represented as denormalized values.
All other floating point values are represented as normalized values.
  Kind         | s | e        | f
  Denormalized | s | 00000000 | f (anything)
  Infinity     | s | 11111111 | all 0's
  NaN          | s | 11111111 | f not all 0's