373 Lec1

Data lab
Bomb lab
Buffer Overflow lab

Each lab is challenging and will increase your deeper understanding of hardware and system software.

Start early! Each lab has multiple parts with separate due dates for each part.

Data Lab [5] [top]

Consists of 15 puzzles to be solved by using mostly C bit operations.
You must write 15 functions.
Each function is required to compute a specified expression using only a restricted set of operations (mostly bit operators).

What you will get out of doing this lab:

Deeper knowledge of hex notation, C bit operations
Introduction to C programming
Using Linux build utilities
How integers are represented at the bit level

Bomb Lab [6] [top]

You will be given an executable program but not the C source code.

The program prompts for input (6 times).

If you type the wrong input, the program explodes!

You will use various Linux tools: a debugger, object dump, etc. to determine what input is expected.

What you will get out of this lab:

An understanding of Intel assembler instructions
What assembler code is generated by C program elements.
How parameters are passed to functions on a stack
How and where information is stored on the stack at each function call
How integer data is manipulated by assembler code.

Buffer Overflow Lab [7] [top]

You will be given code for which you will mount various attacks to cause the program to execute differently and/or to execute unrelated code.

What you will get out of this lab

Understanding of buffer overflow attacks
An understanding of call stack organization and ways it is susceptible to attacks
An understanding of how some C library functions are susceptible to buffer overflow attacks.
Basic defenses against buffer overflow attacks.

C and Java similarities [8] [top]

statement syntax same: if, for, while, return, break, continue, etc.
some operators are the same: +, -, * , /, %, ++, --, ?:
some basic data types are the same: int, float, double, char
comment syntax same

Some C and Java differences [9] [top]

C has no class types
C variables can be used even though not initialized
C has pointer types and operators on pointers.
C can allocate new memory with library function 'malloc', specifying a number of bytes, not a data type; Java allocates new memory with the new operator yielding an instance of some data type
C has no garbage collector; memory allocated with malloc, must be explicitly released with the library function 'free'.
C function declarations can be separate from the implementations

C Pitfalls (if you know Java or C++)[10] [top]

There have been several C "standards". You will likely find a few surprises:

    1	
    2	int main()
    3	{
    4	   for(int i = 1; i <= 5; i++) { // ERROR unless compiled with -std=c99
    5	      printf("Hello\n");
    6	   }
    7	   return 0;
    8	}

Line 4 error: 'for' loop initial declaration used outside C99 mode

The loop variable must be declared before the for loop. E.g at line 4:

    1	
    2	int main()
    3	{
    4	  int i;  // ERROR FIXED
    5	  for(i = 1; i <= 5; i++) {
    6	     printf("Hello\n");
    7	  }
    8	  return 0;
    9	}

Code Example [11] [top]

File name: ex1.c

    1	#include <stdio.h>
    2	#include <stdlib.h>
    3	
    4	int main()
    5	{
    6	  int x, y;
    7	  int max, min;
    8	  int status;
    9	
   10	  printf("\nEnter two integers separated by whitespace: ");
   11	  status = scanf("%d%d", &x, &y);
   12	  
   13	  max = (x < y) ? y : x;
   14	  min = (x < y) ? x : y;
   15	
   16	  printf("You input values %d and %d\n", x, y);
   17	  printf("Max = %d, Min = %d\n\n", max, min);
   18	
   19	  return 0;
   20	}

Header Files [12] [top]

#include <stdio.h>
#include <stdlib.h>

Header files typically contain declarations (not the implementation) of functions
stdio.h has declarations of standard i/o functions such as: printf and scanf
stdlib.h has declarations of other C library functions such as: malloc, free, exit, rand, etc.

main [13] [top]

int main()
{
  ...
  return 0;
}

Every C program has a function named 'main' with int return type

By convention a return value of 0 means 'success'

Formatted printing [14] [top]


printf("You input values %d and %d\n", x, y);

If the value of x is 5 and y is 10, this printf statement will output:

You input values 5 and 10

"You input values %d and %d\n" is the format string
%d is a format specification
The two format specifications %d are replaced in left to right order by the remaining values in the argument list: x and y
The value to replace a %d conversion specification should be an integer.

Other conversion specifications:

Conversion spec	Expected value type
%d	int
%f	float or double
%c	char
%s	char array¹

¹For the %s format, the char array is expected to have an entry containing the null character, '\0', that indicates the end of the character string. The printf function keeps printing successive characters until it finds this null character.

Pointers [15] [top]

A pointer value is a memory address that holds some a value of some type

An address of an int location is a "pointer to an int".

An address of a float location is a "pointer to a float".

Address Operator and pointer Variables [16] [top]

The & operator can be applied to variables (and expressions that are bound to accessible memory locations). The result is the memory address.

If x is of type int, then &x is of type pointer to int.

Variables can be declared of type pointer to int:

      int * p; // p is of type pointer to int

and be assigned an address:

      int x = 5;
      int * p;

      p = &x;

Don't Mix Types [17] [top]

If y is of type int and p is of type pointer to int, p should not be assigned to y. The types don't match:

Example:

      int x = 5;
      int y;
      int * p = &x;

      y = p;  // NO! y is an int; p's value is the address of x, not
              // the int value of x.

      p = 6;  // NO! 6 is an int, not a pointer to int;

Dereferencing operator *[18] [top]

When used as a prefix operator, the * symbol is the dereferencing operator.

It should be applied to a pointer variable (or a pointer expression).

The result is an alias for what p is pointing to.

So *p in the example above is an alias for x.

Corrected Example:

      int x = 5;
      int y;
      int * p = &x;

      y = *p; // This is the same as y = x;
              // It assigns x's value 5 to y.

      *p = 6; // This is the same as x = 6;
              // p's value is unchanged; it is still the address of x
              // This changes x's value to be 6.

Input [19] [top]

 status = scanf("%d%d", &x, &y);

scanf reads from standard input
(1) reads data from the standard input, skipping leading whitespace;
(2) converts the input to the appropriate type according to given format specification, and
(3) stores the result into corresponding address location.
%d in the format string means read characters from input and convert to an integer and store at the location specified by the next argument.
&x evaluates to the address of the int variable x; so the converted input value will be stored in x.
this scanf will read two character strings of digits, convert them to two binary integers and store these integers in variables x and y
the return value of scanf is the number of successful input conversions;
in this example the variable status will be assigned 2 if both inputs are valid integers.

Sample executions [20] [top]

Enter two integers separated by whitespace: 5 10
You input values 5 and 10
Max = 10, Min = 5

Enter two integers separated by whitespace: 3.4 19
You input values 3 and 2862629
Max = 2862629, Min = 3

In example 2, only the first conversion succeeds (status is 1).

3 is stored in x. The second conversion fails since an integer can't start with "."

So y is not initialized.

No compile error or run time exception is thrown in C.

So x and y are printed, but the value of y is whatever happend to be in the memory location for y.

Building the executable ex1 from ex1.c [21] [top]

Run preprocessor to proces preprocessor statements; e.g., #include statements to get a file ex1.i
Run the compiler on ex1.i to get an "assembler" file ex1.s containing symbolic machine instructions, but not the code for library functions - printf and scanf.
Run the assembler on ex1.s to get an object file ex1.o of machine instruction where instruction names are replace by numeric codes and labels are replaced by addresses or offsets.
Run the linker on ex.o to combine it with the object files in the c library (such as printf.o) to get an executable file ex1.

All steps are handled by the gcc script. In spite of being a script gcc is usually referred to somewhat inaccurately as the compiler.

Command to build ex1 [22] [top]

gcc -Wall ex1.c -o ex1

Option -Wall means print all warnings

Option -o ex1 means to name the output executable file to be 'ex1'

Creating/Editing Files [23] [top]

There are several editors installed on the Linux machine with your accounts.

emacs
nano
vim

emacs provides facilities for editing, compiling, correcting errors

nano is the simpler to learn and use than emacs, but also has the capability of compiling and correcting errors although not quite as conveniently as emacs

vim is improved vi; vim is only suggested in case you are already familiar with it or with vi.

The Documents page on the class web site has links to guides to using emacs or nano.

Representation of Integers: binary [24] [top]

Intel processor instructions for arithmetic operations expect integers to be represented in a particular way in memory.

The int type typically is expected to be represented by a sequence of 32 1's and 0's; that is, 32 bits.

Representation of integers using only 1's and 0's means that the (decimal) value of each position is a power of 2: 1, 2, 4, 8, 16, etc. instead of 1, 10, 100, 1000, etc.

Binary:            1   0  1  1 = 1 * 8    + 0 * 4   + 1 * 2  + 1 * 1 = 11 (in decimal)
Position value:    8   4  2  1

Decimal:           1   0  1  1 = 1 * 1000 + 0 * 100 + 1 * 10 + 1 * 1 = 1011 (in decimal)
Position value: 1000 100 10  1

With 4 bits, there are 16 (= 2 x 2 x 2 x 2) possible combinations of 0's and 1's in each of the 4 positions.

Representation of Integers: hex [25] [top]

There are 16 hexadecimal 'digits'
The first 10 are the usual digits: 0 - 9
The last 6 are the first 6 letters of the alphabet: A - F
The decimal value of A is 10; B is 11 decimal, etc.

Binary and Hex Notation [26] [top]

The 16 values 0 - 15 are each represented by 1 hex "digit".

4 binary digits are needed to represent the same 16 values.

Hex values are preceded by 0x as in C.

Hex	Decimal	Binary
0x0	0	0000
0x1	1	0001
0x2	2	0010
0x3	3	0011
0x4	4	0100
0x5	5	0101
0x6	6	0110
0x7	7	0111
0x8	8	1000
0x9	9	1001
0xA	10	1010
0xB	11	1011
0xC	12	1100
0xD	13	1101
0xE	14	1110
0xF	15	1111

Converting Hex to Binary [27] [top]

C int type is (typically) represented using 32 bits (32 1's and 0's).

Since each hex digit is equivalent to 4 bits, an int can also be represented with only 8 hex 'digits' ( 4 * 8 = 32).

Converting hex to binary is easy:

Replace each hex digit by the corresponding 4 bits

In C programs, hex integers are indicated by prefix 0x or 0X

Example:

Write 0x5CA in binary (should be 3 hex digits * 4 bits/hex digit = 12 bits)

    Convert hex 0x5CA to binary

(1) Convert each hex digit to the corresponding 4 binary bits:
   Hex:    0x     5     C     A
Binary:         0101  1101  1010


Answer: 010111011010

Converting Binary to Hex [28] [top]

Converting binary to hex is just the reverse:

Process the bits in groups of 4, starting from the right:
Replace each group of 4 bits by the corresponding single hex digit

Example:

Write the binary string 10011110000111 in hex

 (1) Separate 10011110001111 into groups of 4 (starting from the right):
      
       10 0111 1000 1111

     Add leadding 0's if necessary so the leftmost group also has 4 bits

       0010 0111 1000 1111

 (2) Replace each group of 4 bits by the corresponding hex 'digit'
     and use 0x indicate this is a hex value:

     Binary: 0010 0111 1000 1111
     Hex      2     7   8    F

 Answer: 0x278F

Exercise [29] [top]

Fill in the missing entries in the following table:

Binary	Hex
10001100	0x8C
	0x18
01000110
	0xAB
01111111

Converting Binary to Decimal [30] [top]

Converting binary to decimal requires knowing the position value of each bit.

The right most bit is position 0:

For an 8 bit binary value

        Binary:      0  1  0  0  1  0  1  1
      Position:      7  6  5  4  3  2  1  0
Position Value:    128 64 32 16  8  4  2  1

The value of bit position k is 2^k

The decimal value of binary 01001011 is the sum of the positions values of the non-zero bits: 64 + 8 + 2 + 1 = 75

Converting Hex to Decimal [31] [top]

Converting hex to decimal requires knowing the position value of each hex digit.

The right most hex digit is again position 0:

For the 3 hex digit value 0x1AC

           Hex:      1  A  C
      Position:      2  1  0
Position Value:    256 16  1

The value of hex position k is 16^k: (16² = 256)

The decimal value of hex 0x1AC is the sum of the positions values of the non-zero hex multipllied by the value of the hex digit at that position: 256 * 1 + 16 * 10 + 1 * 12 + 1 = 428

(0x1 = 1, 0xA = 10 , 0xC = 12)

Bit operations [32] [top]

Operations AND, OR, and XOR (exclusive OR), and NOT are defined, thinking of 1 as true and 0 as false.

So, for example, 1 AND 1 is 1, while 1 AND 0 is 0.

p	q	p AND q
0	0	0
0	1	0
1	0	0
1	1	1

p	q	p OR q
0	0	0
0	1	1
1	0	1
1	1	1

p	q	p XOR q
0	0	0
0	1	1
1	0	1
1	1	0

p	NOT p
0	1
1	0

Curious facts:

NOT p is a unary operator.

There are exactly 2 unary operators.
p AND q, p OR q, and p XOR q are binary operators.
There are exactly 16 different possible binary operators: p OP q.

However, each such binary operator can easily be expressed using a combination of only the AND, OR, and NOT operators.

C Bit Operators [33] [top]

Important: The C bit operators act on integers!

They are called bit operators because they act on each pair of bits at the same position of the two integers operands.

Bit operation (at each bit position)	C Notation
AND	a & b
OR	a \| b
XOR	a ^ b
Not	~a

Examples (for 4 bit integers)

x	1011
y	0101
x \| y	1111

x	1011
y	0101
x & y	0001

x	1011
y	0101
x ^ y	1110

x	1011
~x	0100

Special Identities [34] [top]

Fill in the missing entries for this example with 8 bit values.

y	11011001
0x0	00000000
y ^ 0x0

y	11011001
~y
0xFF	11111111
y ^ 0xFF

DeMorgan's Identities (single bit)[35] [top]

x	y	~x	~y	x & y	~(x & y)	~x \| ~y	~~x
0	0	1	1	0	1	1	0
0	1	1	0	0	1	1	0
1	0	0	1	0	1	1	1
1	1	0	0	1	0	0	1

This verifies the first version of DeMorgan's Law below and the idempotent law (~~x = x) for single bit values x and y.

	  ~(x & y) = (~x) | (~y)

	  ~(x | y) = (~x) & (~y)

	  ~~x = x

These DeMorgan's Laws for single bit values are analogous the following version applied to logical statements p, q with values true corresponding to 1 and false corresponding to 0:

The following are logical statements equivalent

It is not the case that both p and q are true
Either p is not true or q is not true

Symbolically

      NOT (p AND q) =  (NOT p) OR (NOT q)

and similarly

      NOT (p OR q) = (NOT p) AND (NOT q)

DeMorgan's Identities In C (integers)[36] [top]

The bit operations {~, &, |, ^} in C operate on integers at each bit of the integer.

So the DeMorgan's Identities just verified for single bit operands still hold for integers:

If x and y are C integers then

	  ~(x & y) = (~x) | (~y)

	  ~(x | y) = (~x) & (~y)

	  ~~x = x

Example with 8 bit integers x and y

Verifying that ~(x & y) = (~x) | (~y)

x	0110 0011
y	1101 0111
x & y	0100 0011
~x	1001 1100
~y	0010 1000
(~x) \| (~y)	1011 1100
~(x & y)	1011 1100

Bang [37] [top]

If x is an integer, the unary C operator !x is defined as

	!x = 1 if x is 0
	!x = 0 if x is not 0

Like the bit operators, bang acts on integers with an integer result, but unlike the bit operators, it uses the whole integer value, not each separate bit.

What are the following expressions where x is an integer value ( 4 bytes or equivalently, 8 hex digits)

!!x		(if x is 0)
!!x		(if x is not 0)
x ^ x
!(x ^ x)

Basics - Data Representation [38] [top]

The binary representation already discussed is not quite sufficient for C, since we also need to represent negative integers as sequences of bits. (Can't put a minus sign in between bits in memory, just 0's and 1's.)

C also needs to represent values between 0 and 1, etc.

C numeric types:

Integers (signed: positive, negative and 0)

Unsigned integers (only 0 and positive)

Floating point numbers (single and double sizes)

C Integer Data Types [39] [top]

type            typical size                value range
char             1 byte (8 bits)            -128 to 127
unsigned char    1 byte                        0 to 255
short            2 bytes                    -32768 to 32767
unsigned short   2 bytes                       0 to 65535
int              4 bytes                    -2147483648 to 2147483647
unsigned int     4 bytes                    0 to 4294967295
long             4 bytes                    -2147483648 to 2147483647
unsigned long    4 bytes                    0 to 4294967295

The char C data type [40] [top]

   char x;

   x = 12;

   Storage allocated for x: 1 byte (8 bits)

   Binary          Decimal            Hexadecimal
   00001100        12                 0C

C Integer Arithmetic![41] [top]

Consider the integer type: unsigned char with values: 0 to 255.

What is 255 + 1 as an unsigned char? Not 256.

255 + 1 is out of range - overflow.

But no error occurs, it "wraps around" to get the next value, which is 0:

      0 1 2 ...   254 255

The mod operator, % provides a way of calculating how C computes values of unsigned integers that overflow.

There are 256 possible unsigned char values. The mod operator x % 256 will give a value in the range 0 to 255. Just what is required of the unsigned char type.

(255 + 1) % 256 = 0

Overflow [42] [top]


    1	#include <stdio.h>
    2	#include <stdlib.h>
    3	
    4	
    5	
    6	int main()
    7	{
    8	  unsigned char x, y;
    9	  unsigned char z;
   10	  x = 250;
   11	  y = 16;
   12	  z = x + y;
   13	
   14	  printf("%u + %u  = %u\n", x, y, z);
   15	
   16	  return 0;
   17	  
   18	}

Output:
 250 + 16 = 10

Notes:
The %u is a conversion specification to output z 
as an unsigned integer.

Overflow?[43] [top]

This program compiles and runs with no errors reported even though the 10 result is clearly wrong mathematically.

This just emphasizes the fact that arithmetic for these integer data types does not obey the same rules as ordinary integer arithmetic.

How does a programmer write code to check whether overflow may occur for these data types?

CSC373 Sep06

slide version

single file version

Contents

Which course is this?[1] [top]

Today[2] [top]

Linux[3] [top]

Labs[4] [top]

Data Lab[5] [top]

Bomb Lab[6] [top]

Buffer Overflow Lab[7] [top]

C and Java similarities[8] [top]

Some C and Java differences[9] [top]

C Pitfalls (if you know Java or C++)[10] [top]

Code Example[11] [top]

Header Files[12] [top]

main[13] [top]

Formatted printing[14] [top]

Pointers[15] [top]

Address Operator and pointer Variables[16] [top]

Don't Mix Types[17] [top]

Dereferencing operator *[18] [top]

Input[19] [top]

Sample executions[20] [top]

Building the executable ex1 from ex1.c[21] [top]

Command to build ex1[22] [top]

Creating/Editing Files[23] [top]

Representation of Integers: binary[24] [top]

Representation of Integers: hex[25] [top]

Binary and Hex Notation[26] [top]

Converting Hex to Binary[27] [top]

Converting Binary to Hex[28] [top]

Exercise[29] [top]

Converting Binary to Decimal[30] [top]

Converting Hex to Decimal[31] [top]

Bit operations[32] [top]

C Bit Operators[33] [top]

Examples (for 4 bit integers)

Special Identities[34] [top]

DeMorgan's Identities (single bit)[35] [top]

DeMorgan's Identities In C (integers)[36] [top]

Example with 8 bit integers x and y

Bang[37] [top]

Basics - Data Representation[38] [top]

C Integer Data Types[39] [top]

The char C data type[40] [top]

C Integer Arithmetic![41] [top]

Overflow[42] [top]

Overflow?[43] [top]

Today [2] [top]

Linux [3] [top]

Labs [4] [top]

Data Lab [5] [top]

Bomb Lab [6] [top]

Buffer Overflow Lab [7] [top]

C and Java similarities [8] [top]

Some C and Java differences [9] [top]

Code Example [11] [top]

Header Files [12] [top]

main [13] [top]

Formatted printing [14] [top]

Pointers [15] [top]

Address Operator and pointer Variables [16] [top]

Don't Mix Types [17] [top]

Input [19] [top]

Sample executions [20] [top]

Building the executable ex1 from ex1.c [21] [top]

Command to build ex1 [22] [top]

Creating/Editing Files [23] [top]

Representation of Integers: binary [24] [top]

Representation of Integers: hex [25] [top]

Binary and Hex Notation [26] [top]

Converting Hex to Binary [27] [top]

Converting Binary to Hex [28] [top]

Exercise [29] [top]

Converting Binary to Decimal [30] [top]

Converting Hex to Decimal [31] [top]

Bit operations [32] [top]

C Bit Operators [33] [top]

Special Identities [34] [top]

Bang [37] [top]

Basics - Data Representation [38] [top]

C Integer Data Types [39] [top]

The char C data type [40] [top]

Overflow [42] [top]