- Computer Systems I (CSC373)
- Instructor: Glenn Lancaster
- Labs
- Linux environment
- Introduction to C programming
- Introduction to bit operations in C
- All programming will be in C
- Linux accounts on cdmlinux.cdm.depaul.edu
- Online Linux tutorials and auxilliary text
- Data lab
- Bomb lab
- Buffer Overflow lab
Each lab is challenging and will increase your deeper
understanding of hardware and system software.
Start early! Each lab has multiple parts with separate due dates for
each part.
-
Consists of 15 puzzles to be solved by using mostly C bit
operations.
-
You must write 15 functions.
-
Each function is required to compute a specified expression
using only a restricted set of operations (mostly bit
operators).
What you will get out of doing this lab:
- Deeper knowledge of hex notation, C bit operations
- Introduction to C programming
- Using Linux build utilities
- How integers are represented at the bit level
You will be given an executable program but not the C source
code.
The program prompts for input (6 times).
If you type the wrong input, the program explodes!
You will use various Linux tools: a debugger, object dump,
etc. to determine what input is expected.
What you will get out of this lab:
- An understanding of Intel assembler instructions
- What assembler code is generated by C program elements.
- How parameters are passed to functions on a stack
- How and where information is stored on the stack at each
function call
- How integer data is manipulated by assembler code.
You will be given code for which you will mount various
attacks to cause the program to execute differently and/or to
execute unrelated code.
What you will get out of this lab
- Understanding of buffer overflow attacks
- An understanding of call stack organization and ways it is
susceptible to attacks
- An understanding of how some C library functions are susceptible
to buffer overflow attacks.
- Basic defenses against buffer overflow attacks.
- statement syntax same: if, for, while, return, break,
continue, etc.
- some operators are the same: +, -, * , /, %, ++, --, ?:
- some basic data types are the same: int, float, double,
char
- comment syntax same
- C has no class types
- C variables can be used even though not initialized
- C has pointer types and operators on pointers.
- C can allocate new memory with library function 'malloc',
specifying a number of bytes, not a data type; Java allocates new memory with
the new operator yielding an instance of some data type
- C has no garbage collector; memory allocated with malloc,
must be explicitly released with the library function
'free'.
- C function declarations can be separate from the implementations
There have been several C "standards". You will likely find
a few surprises:
1
2 int main()
3 {
4 for(int i = 1; i <= 5; i++) { // ERROR unless compiled with -std=c99
5 printf("Hello\n");
6 }
7 return 0;
8 }
Line 4 error: 'for' loop initial declaration used outside C99
mode
The loop variable must be declared before the for loop. E.g at
line 4:
1
2 int main()
3 {
4 int i; // ERROR FIXED
5 for(i = 1; i <= 5; i++) {
6 printf("Hello\n");
7 }
8 return 0;
9 }
File name: ex1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main()
5 {
6 int x, y;
7 int max, min;
8 int status;
9
10 printf("\nEnter two integers separated by whitespace: ");
11 status = scanf("%d%d", &x, &y);
12
13 max = (x < y) ? y : x;
14 min = (x < y) ? x : y;
15
16 printf("You input values %d and %d\n", x, y);
17 printf("Max = %d, Min = %d\n\n", max, min);
18
19 return 0;
20 }
#include <stdio.h>
#include <stdlib.h>
- Header files typically contain declarations (not the implementation)
of functions
- stdio.h has declarations of standard i/o functions such as:
printf and scanf
- stdlib.h has declarations of other C library
functions such as: malloc, free, exit, rand, etc.
int main()
{
...
return 0;
}
Every C program has a function named 'main' with int return type
By convention a return value of 0 means 'success'
printf("You input values %d and %d\n", x, y);
If the value of x is 5 and y is 10, this printf statement
will output:
You input values 5 and 10
- "You input values %d and %d\n" is the format
string
- %d is a format specification
- The two format specifications %d are replaced in left to
right order by the remaining values in the
argument list: x and y
- The value to replace a %d conversion specification should be an integer.
Other conversion specifications:
Conversion spec |
Expected value type |
%d |
int |
%f |
float or double |
%c |
char |
%s |
char array1 |
1For the %s format, the char array is expected to have an entry
containing the null character, '\0', that indicates the end of
the character string. The printf function keeps printing
successive characters until it finds this null character.
A pointer value is a memory address that holds some a value
of some type
An address of an int location is a "pointer to an int".
An address of a float location is a "pointer to a float".
The & operator can be applied to
variables (and expressions that are bound to accessible memory
locations). The result is the memory address.
If x is of type int, then
&x is of type pointer to
int.
Variables can be declared of type pointer to int:
int * p; // p is of type pointer to int
and be assigned an address:
int x = 5;
int * p;
p = &x;
If y is of type int and p is of type
pointer to int, p should not be assigned to y. The types
don't match:
Example:
int x = 5;
int y;
int * p = &x;
y = p; // NO! y is an int; p's value is the address of x, not
// the int value of x.
p = 6; // NO! 6 is an int, not a pointer to int;
Dereferencing operator *[18] [top]
When used as a prefix operator, the *
symbol is the dereferencing operator.
It should be applied to a pointer variable (or a pointer
expression).
The result is an alias for what p is pointing to.
So *p in the example above is an
alias for x.
Corrected Example:
int x = 5;
int y;
int * p = &x;
y = *p; // This is the same as y = x;
// It assigns x's value 5 to y.
*p = 6; // This is the same as x = 6;
// p's value is unchanged; it is still the address of x
// This changes x's value to be 6.
status = scanf("%d%d", &x, &y);
scanf reads from standard input
(1) reads data from the standard input, skipping leading
whitespace;
(2) converts the input
to the appropriate type according to given format specification,
and
(3) stores the result into corresponding
address location.
%d in the format string means read characters from input
and convert to an integer and store at the location specified
by the next argument.
&x evaluates to the
address of the int variable x; so the converted input value will
be stored in x.
this scanf will read two character strings of digits,
convert them to two binary integers and store these integers
in variables x and y
the return value of scanf is the number of successful
input conversions;
in this example the variable status will be assigned 2 if both
inputs are valid integers.
1.
Enter two integers separated by whitespace: 5 10
You input values 5 and 10
Max = 10, Min = 5
2.
Enter two integers separated by whitespace: 3.4 19
You input values 3 and 2862629
Max = 2862629, Min = 3
In example 2, only the first conversion succeeds (status is 1).
3 is stored in x. The second conversion fails since an
integer can't start with "."
So y is not initialized.
No compile error or run time exception is thrown in C.
So x and y are printed, but the value of y is whatever
happend to be in the memory location for y.
Run preprocessor to proces preprocessor statements; e.g.,
#include statements to get a file ex1.i
Run the compiler on ex1.i to get an "assembler" file ex1.s containing
symbolic machine instructions, but not the code for library functions -
printf and scanf.
Run the assembler on ex1.s to get an object file ex1.o of
machine instruction where instruction names are replace by
numeric codes and labels are replaced by addresses or offsets.
Run the linker on ex.o to combine it with the object files
in the c library (such as printf.o) to get an executable file ex1.
All steps are handled by the gcc script. In spite of being a script
gcc is usually referred to somewhat inaccurately as the compiler.
gcc -Wall ex1.c -o ex1
Option -Wall means print all warnings
Option -o ex1 means to name the output executable file to be 'ex1'
There are several editors installed on the Linux machine with
your accounts.
emacs provides facilities for editing, compiling, correcting errors
nano is the simpler to learn and use than emacs, but also has
the capability of compiling and correcting errors although not quite as conveniently as emacs
vim is improved vi; vim is
only suggested in case you are already familiar with it or with
vi.
The Documents page on the class web site has links to guides
to using emacs or nano.
Intel processor instructions for arithmetic operations expect
integers to be represented in a particular way in memory.
The int type typically is expected to be represented by a
sequence of 32 1's and 0's; that is, 32 bits.
Representation of integers using only 1's and 0's means that
the (decimal) value of each position is a power of 2: 1, 2, 4, 8, 16,
etc. instead of 1, 10, 100, 1000, etc.
Binary: 1 0 1 1 = 1 * 8 + 0 * 4 + 1 * 2 + 1 * 1 = 11 (in decimal)
Position value: 8 4 2 1
Decimal: 1 0 1 1 = 1 * 1000 + 0 * 100 + 1 * 10 + 1 * 1 = 1011 (in decimal)
Position value: 1000 100 10 1
With 4 bits, there are 16 (= 2 x 2 x 2 x 2) possible combinations of 0's and
1's in each of the 4 positions.
-
There are 16 hexadecimal 'digits'
-
The first 10 are the usual digits: 0 - 9
-
The last 6 are the first 6 letters of the alphabet: A -
F
-
The decimal value of A is 10; B is 11 decimal, etc.
The 16 values 0 - 15 are each represented by 1 hex
"digit".
4 binary digits are needed to represent the same 16
values.
Hex values are preceded by 0x as in C.
Hex |
Decimal |
Binary |
0x0 |
0 |
0000 |
0x1 |
1 |
0001 |
0x2 |
2 |
0010 |
0x3 |
3 |
0011 |
0x4 |
4 |
0100 |
0x5 |
5 |
0101 |
0x6 |
6 |
0110 |
0x7 |
7 |
0111 |
0x8 |
8 |
1000 |
0x9 |
9 |
1001 |
0xA |
10 |
1010 |
0xB |
11 |
1011 |
0xC |
12 |
1100 |
0xD |
13 |
1101 |
0xE |
14 |
1110 |
0xF |
15 |
1111 |
C int type is (typically) represented using 32 bits (32 1's and
0's).
Since each hex digit is equivalent to 4 bits, an int can also
be represented with only 8 hex 'digits' ( 4 * 8 = 32).
Converting hex to binary is easy:
In C programs, hex integers are indicated by prefix 0x or 0X
Example:
Write 0x5CA in binary (should be 3 hex digits * 4 bits/hex
digit = 12 bits)
Convert hex 0x5CA to binary
(1) Convert each hex digit to the corresponding 4 binary bits:
Hex: 0x 5 C A
Binary: 0101 1101 1010
Answer: 010111011010
Converting binary to hex is just the reverse:
Example:
Write the binary string 10011110000111 in hex
(1) Separate 10011110001111 into groups of 4 (starting from the right):
10 0111 1000 1111
Add leadding 0's if necessary so the leftmost group also has 4 bits
0010 0111 1000 1111
(2) Replace each group of 4 bits by the corresponding hex 'digit'
and use 0x indicate this is a hex value:
Binary: 0010 0111 1000 1111
Hex 2 7 8 F
Answer: 0x278F
Fill in the missing entries in the following table:
Binary |
Hex |
10001100 |
0x8C |
|
0x18 |
01000110 |
|
|
0xAB |
01111111 |
|
Converting binary to decimal requires knowing the position
value of each bit.
The right most bit is position 0:
For an 8 bit binary value
Binary: 0 1 0 0 1 0 1 1
Position: 7 6 5 4 3 2 1 0
Position Value: 128 64 32 16 8 4 2 1
The value of bit position k is 2k
The decimal value of binary 01001011 is the sum of the
positions values of the non-zero bits: 64 + 8 + 2 + 1 = 75
Converting hex to decimal requires knowing the position
value of each hex digit.
The right most hex digit is again position 0:
For the 3 hex digit value 0x1AC
Hex: 1 A C
Position: 2 1 0
Position Value: 256 16 1
The value of hex position k is 16k: (162 = 256)
The decimal value of hex 0x1AC is the sum of the
positions values of the non-zero hex multipllied by the value of the hex digit at that
position: 256 * 1 + 16 * 10 + 1 * 12 + 1 = 428
(0x1 = 1, 0xA = 10 , 0xC = 12)
Operations
AND, OR, and XOR (exclusive OR), and NOT are defined, thinking of
1 as true and 0 as false.
So, for example, 1 AND 1 is 1, while 1 AND 0 is 0.
p |
q |
p AND q |
0 |
0 |
0 |
0 |
1 |
0 |
1 |
0 |
0 |
1 |
1 |
1 |
p |
q |
p OR q |
0 |
0 |
0 |
0 |
1 |
1 |
1 |
0 |
1 |
1 |
1 |
1 |
p |
q |
p XOR q |
0 |
0 |
0 |
0 |
1 |
1 |
1 |
0 |
1 |
1 |
1 |
0 |
Curious facts:
-
NOT p is a unary operator.
There are exactly 2 unary operators.
-
p AND q, p OR q, and p XOR q are binary operators.
There are exactly 16 different possible binary
operators: p OP q.
However, each such binary operator can easily be expressed using a
combination of only the AND, OR, and NOT operators.
Important: The C bit operators act on
integers!
They are called bit operators because they act on each pair of
bits at the same position of the two integers operands.
Bit operation (at each bit position) |
C Notation |
AND |
a & b |
OR |
a | b |
XOR |
a ^ b |
Not |
~a |
Examples (for 4 bit integers)
Fill in the missing entries for this example with 8 bit values.
y |
11011001 |
0x0 |
00000000 |
y ^ 0x0 |
|
y |
11011001 |
~y |
|
0xFF |
11111111 |
y ^ 0xFF |
|
x |
y |
~x |
~y |
x & y |
~(x & y) |
~x | ~y |
~~x |
0 |
0 |
1 |
1 |
0 |
1 |
1 |
0 |
0 |
1 |
1 |
0 |
0 |
1 |
1 |
0 |
1 |
0 |
0 |
1 |
0 |
1 |
1 |
1 |
1 |
1 |
0 |
0 |
1 |
0 |
0 |
1 |
This verifies the first version of DeMorgan's Law below and
the idempotent law (~~x = x) for single bit values x and y.
~(x & y) = (~x) | (~y)
~(x | y) = (~x) & (~y)
~~x = x
These DeMorgan's Laws for single bit values are analogous the following version applied
to logical statements p, q with values true
corresponding to 1 and false corresponding to 0:
The following are logical statements equivalent
- It is not the case that both p and q are
true
- Either p is not true or q is not true
Symbolically
NOT (p AND q) = (NOT p) OR (NOT q)
and similarly
NOT (p OR q) = (NOT p) AND (NOT q)
The bit operations {~, &, |, ^} in C
operate on integers at each bit of the integer.
So the DeMorgan's Identities just verified for single bit
operands still hold for integers:
If x and y are C integers then
~(x & y) = (~x) | (~y)
~(x | y) = (~x) & (~y)
~~x = x
Example with 8 bit integers x and y
Verifying that ~(x & y) = (~x) | (~y)
x |
0110 0011 |
y |
1101 0111 |
x & y |
0100 0011 |
~x |
1001 1100 |
~y |
0010 1000 |
(~x) | (~y) |
1011 1100 |
~(x & y) |
1011 1100 |
If x is an integer, the unary C operator !x is defined as
!x = 1 if x is 0
!x = 0 if x is not 0
Like the bit operators, bang acts on integers with an integer
result, but unlike the bit operators, it uses the whole integer
value, not each separate bit.
What are the following expressions where x is an integer
value ( 4 bytes or equivalently, 8 hex digits)
!!x |
|
(if x is 0) |
!!x |
|
(if x is not 0) |
x ^ x |
|
|
!(x ^ x) |
|
|
The binary representation already discussed is not quite
sufficient for C, since we also need to represent negative
integers as sequences of bits. (Can't put a minus sign in
between bits in memory, just 0's and 1's.)
C also needs to represent values between 0 and 1, etc.
C numeric types:
Integers (signed: positive, negative and 0)
Unsigned integers (only 0 and positive)
Floating point numbers (single and double sizes)
type typical size value range
char 1 byte (8 bits) -128 to 127
unsigned char 1 byte 0 to 255
short 2 bytes -32768 to 32767
unsigned short 2 bytes 0 to 65535
int 4 bytes -2147483648 to 2147483647
unsigned int 4 bytes 0 to 4294967295
long 4 bytes -2147483648 to 2147483647
unsigned long 4 bytes 0 to 4294967295
char x;
x = 12;
Storage allocated for x: 1 byte (8 bits)
Binary Decimal Hexadecimal
00001100 12 0C
Consider the integer type: unsigned char with values: 0 to
255.
What is 255 + 1 as an unsigned char? Not 256.
255 + 1 is out of range - overflow.
But no error occurs, it "wraps around" to get the next value,
which is 0:
0 1 2 ... 254 255
The mod operator, % provides a way of calculating how C
computes values of unsigned integers that overflow.
There are 256 possible unsigned char values. The mod operator
x % 256 will give a value in the range 0 to 255. Just what is
required of the unsigned char type.
(255 + 1) % 256 = 0
1 #include <stdio.h>
2 #include <stdlib.h>
3
4
5
6 int main()
7 {
8 unsigned char x, y;
9 unsigned char z;
10 x = 250;
11 y = 16;
12 z = x + y;
13
14 printf("%u + %u = %u\n", x, y, z);
15
16 return 0;
17
18 }
Output:
250 + 16 = 10
Notes:
The %u is a conversion specification to output z
as an unsigned integer.
This program compiles and runs with no errors reported even though
the 10 result is clearly wrong mathematically.
This just emphasizes the fact that arithmetic for these integer
data types does not obey the same rules as ordinary integer
arithmetic.
How does a programmer write code to check whether overflow may
occur for these data types?
More on this later!