CSC373 Sep06

slide version

single file version

Contents

  1. Which course is this?
  2. Today
  3. Linux
  4. Labs
  5. Data Lab
  6. Bomb Lab
  7. Buffer Overflow Lab
  8. C and Java similarities
  9. Some C and Java differences
  10. C Pitfalls (if you know Java or C++)
  11. Code Example
  12. Header Files
  13. main
  14. Formatted printing
  15. Pointers
  16. Address Operator and pointer Variables
  17. Don't Mix Types
  18. Dereferencing operator *
  19. Input
  20. Sample executions
  21. Building the executable ex1 from ex1.c
  22. Command to build ex1
  23. Creating/Editing Files
  24. Representation of Integers: binary
  25. Representation of Integers: hex
  26. Binary and Hex Notation
  27. Converting Hex to Binary
  28. Converting Binary to Hex
  29. Exercise
  30. Converting Binary to Decimal
  31. Converting Hex to Decimal
  32. Bit operations
  33. C Bit Operators
  34. Special Identities
  35. DeMorgan's Identities (single bit)
  36. DeMorgan's Identities In C (integers)
  37. Bang
  38. Basics - Data Representation
  39. C Integer Data Types
  40. The char C data type
  41. C Integer Arithmetic!
  42. Overflow
  43. Overflow?

Which course is this?[1] [top]

Today[2] [top]

Linux[3] [top]

  1. All programming will be in C
  2. Linux accounts on cdmlinux.cdm.depaul.edu
  3. Online Linux tutorials and auxilliary text

Labs[4] [top]

  1. Data lab
  2. Bomb lab
  3. Buffer Overflow lab

Each lab is challenging and will increase your deeper understanding of hardware and system software.

Start early! Each lab has multiple parts with separate due dates for each part.

Data Lab[5] [top]

What you will get out of doing this lab:

Bomb Lab[6] [top]

You will be given an executable program but not the C source code.

The program prompts for input (6 times).

If you type the wrong input, the program explodes!

You will use various Linux tools: a debugger, object dump, etc. to determine what input is expected.

What you will get out of this lab:

Buffer Overflow Lab[7] [top]

You will be given code for which you will mount various attacks to cause the program to execute differently and/or to execute unrelated code.

What you will get out of this lab

C and Java similarities[8] [top]

Some C and Java differences[9] [top]

C Pitfalls (if you know Java or C++)[10] [top]

There have been several C "standards". You will likely find a few surprises:

    1	
    2	int main()
    3	{
    4	   for(int i = 1; i <= 5; i++) { // ERROR unless compiled with -std=c99
    5	      printf("Hello\n");
    6	   }
    7	   return 0;
    8	}

Line 4 error: 'for' loop initial declaration used outside C99 mode

The loop variable must be declared before the for loop. E.g at line 4:

    1	
    2	int main()
    3	{
    4	  int i;  // ERROR FIXED
    5	  for(i = 1; i <= 5; i++) {
    6	     printf("Hello\n");
    7	  }
    8	  return 0;
    9	}

Code Example[11] [top]

File name: ex1.c

    1	#include <stdio.h>
    2	#include <stdlib.h>
    3	
    4	int main()
    5	{
    6	  int x, y;
    7	  int max, min;
    8	  int status;
    9	
   10	  printf("\nEnter two integers separated by whitespace: ");
   11	  status = scanf("%d%d", &x, &y);
   12	  
   13	  max = (x < y) ? y : x;
   14	  min = (x < y) ? x : y;
   15	
   16	  printf("You input values %d and %d\n", x, y);
   17	  printf("Max = %d, Min = %d\n\n", max, min);
   18	
   19	  return 0;
   20	}

Header Files[12] [top]

#include <stdio.h>
#include <stdlib.h>

main[13] [top]

int main()
{
  ...
  return 0;
}

Every C program has a function named 'main' with int return type

By convention a return value of 0 means 'success'

Formatted printing[14] [top]


printf("You input values %d and %d\n", x, y);

If the value of x is 5 and y is 10, this printf statement will output:

You input values 5 and 10

Other conversion specifications:

Conversion spec Expected value type
%d int
%f float or double
%c char
%s char array1

1For the %s format, the char array is expected to have an entry containing the null character, '\0', that indicates the end of the character string. The printf function keeps printing successive characters until it finds this null character.

Pointers[15] [top]

A pointer value is a memory address that holds some a value of some type

An address of an int location is a "pointer to an int".

An address of a float location is a "pointer to a float".

Address Operator and pointer Variables[16] [top]

The & operator can be applied to variables (and expressions that are bound to accessible memory locations). The result is the memory address.

If x is of type int, then &x is of type pointer to int.

Variables can be declared of type pointer to int:

      int * p; // p is of type pointer to int
    

and be assigned an address:

      int x = 5;
      int * p;

      p = &x;
    

Don't Mix Types[17] [top]

If y is of type int and p is of type pointer to int, p should not be assigned to y. The types don't match:

Example:

      int x = 5;
      int y;
      int * p = &x;

      y = p;  // NO! y is an int; p's value is the address of x, not
              // the int value of x.

      p = 6;  // NO! 6 is an int, not a pointer to int;
    

Dereferencing operator *[18] [top]

When used as a prefix operator, the * symbol is the dereferencing operator.

It should be applied to a pointer variable (or a pointer expression).

The result is an alias for what p is pointing to.

So *p in the example above is an alias for x.

Corrected Example:

      int x = 5;
      int y;
      int * p = &x;

      y = *p; // This is the same as y = x;
              // It assigns x's value 5 to y.

      *p = 6; // This is the same as x = 6;
              // p's value is unchanged; it is still the address of x
              // This changes x's value to be 6.
    

Input[19] [top]

 status = scanf("%d%d", &x, &y);	
      

Sample executions[20] [top]

1.

Enter two integers separated by whitespace: 5 10
You input values 5 and 10
Max = 10, Min = 5
      

2.

Enter two integers separated by whitespace: 3.4 19
You input values 3 and 2862629
Max = 2862629, Min = 3
      

In example 2, only the first conversion succeeds (status is 1).

3 is stored in x. The second conversion fails since an integer can't start with "."

So y is not initialized.

No compile error or run time exception is thrown in C.

So x and y are printed, but the value of y is whatever happend to be in the memory location for y.

Building the executable ex1 from ex1.c[21] [top]

All steps are handled by the gcc script. In spite of being a script gcc is usually referred to somewhat inaccurately as the compiler.

Command to build ex1[22] [top]

gcc -Wall ex1.c -o ex1	
      

Option -Wall means print all warnings

Option -o ex1 means to name the output executable file to be 'ex1'

Creating/Editing Files[23] [top]

There are several editors installed on the Linux machine with your accounts.

emacs provides facilities for editing, compiling, correcting errors

nano is the simpler to learn and use than emacs, but also has the capability of compiling and correcting errors although not quite as conveniently as emacs

vim is improved vi; vim is only suggested in case you are already familiar with it or with vi.

The Documents page on the class web site has links to guides to using emacs or nano.

Representation of Integers: binary[24] [top]

Intel processor instructions for arithmetic operations expect integers to be represented in a particular way in memory.

The int type typically is expected to be represented by a sequence of 32 1's and 0's; that is, 32 bits.

Representation of integers using only 1's and 0's means that the (decimal) value of each position is a power of 2: 1, 2, 4, 8, 16, etc. instead of 1, 10, 100, 1000, etc.

Binary:            1   0  1  1 = 1 * 8    + 0 * 4   + 1 * 2  + 1 * 1 = 11 (in decimal)
Position value:    8   4  2  1

Decimal:           1   0  1  1 = 1 * 1000 + 0 * 100 + 1 * 10 + 1 * 1 = 1011 (in decimal)
Position value: 1000 100 10  1
    

With 4 bits, there are 16 (= 2 x 2 x 2 x 2) possible combinations of 0's and 1's in each of the 4 positions.

Representation of Integers: hex[25] [top]

Binary and Hex Notation[26] [top]

The 16 values 0 - 15 are each represented by 1 hex "digit".

4 binary digits are needed to represent the same 16 values.

Hex values are preceded by 0x as in C.

Hex Decimal Binary
0x0 0 0000
0x1 1 0001
0x2 2 0010
0x3 3 0011
0x4 4 0100
0x5 5 0101
0x6 6 0110
0x7 7 0111
0x8 8 1000
0x9 9 1001
0xA 10 1010
0xB 11 1011
0xC 12 1100
0xD 13 1101
0xE 14 1110
0xF 15 1111

Converting Hex to Binary[27] [top]

C int type is (typically) represented using 32 bits (32 1's and 0's).

Since each hex digit is equivalent to 4 bits, an int can also be represented with only 8 hex 'digits' ( 4 * 8 = 32).

Converting hex to binary is easy:

In C programs, hex integers are indicated by prefix 0x or 0X

Example:

Write 0x5CA in binary (should be 3 hex digits * 4 bits/hex digit = 12 bits)

    Convert hex 0x5CA to binary

(1) Convert each hex digit to the corresponding 4 binary bits:
   Hex:    0x     5     C     A
Binary:         0101  1101  1010


Answer: 010111011010

    

Converting Binary to Hex[28] [top]

Converting binary to hex is just the reverse:

Example:

Write the binary string 10011110000111 in hex

 (1) Separate 10011110001111 into groups of 4 (starting from the right):
      
       10 0111 1000 1111

     Add leadding 0's if necessary so the leftmost group also has 4 bits

       0010 0111 1000 1111

 (2) Replace each group of 4 bits by the corresponding hex 'digit'
     and use 0x indicate this is a hex value:

     Binary: 0010 0111 1000 1111
     Hex      2     7   8    F

 Answer: 0x278F
    

Exercise[29] [top]

Fill in the missing entries in the following table:

Binary Hex
10001100 0x8C
0x18
01000110
0xAB
01111111

Converting Binary to Decimal[30] [top]

Converting binary to decimal requires knowing the position value of each bit.

The right most bit is position 0:

For an 8 bit binary value

        Binary:      0  1  0  0  1  0  1  1
      Position:      7  6  5  4  3  2  1  0
Position Value:    128 64 32 16  8  4  2  1
    

The value of bit position k is 2k

The decimal value of binary 01001011 is the sum of the positions values of the non-zero bits: 64 + 8 + 2 + 1 = 75

Converting Hex to Decimal[31] [top]

Converting hex to decimal requires knowing the position value of each hex digit.

The right most hex digit is again position 0:

For the 3 hex digit value 0x1AC

           Hex:      1  A  C
      Position:      2  1  0
Position Value:    256 16  1
    

The value of hex position k is 16k: (162 = 256)

The decimal value of hex 0x1AC is the sum of the positions values of the non-zero hex multipllied by the value of the hex digit at that position: 256 * 1 + 16 * 10 + 1 * 12 + 1 = 428

(0x1 = 1, 0xA = 10 , 0xC = 12)

Bit operations[32] [top]

Operations AND, OR, and XOR (exclusive OR), and NOT are defined, thinking of 1 as true and 0 as false.

So, for example, 1 AND 1 is 1, while 1 AND 0 is 0.

p q p AND q
0 0 0
0 1 0
1 0 0
1 1 1
p q p OR q
0 0 0
0 1 1
1 0 1
1 1 1
p q p XOR q
0 0 0
0 1 1
1 0 1
1 1 0
p NOT p
0 1
1 0

Curious facts:

  1. NOT p is a unary operator.

    There are exactly 2 unary operators.

  2. p AND q, p OR q, and p XOR q are binary operators.

    There are exactly 16 different possible binary operators: p OP q.

    However, each such binary operator can easily be expressed using a combination of only the AND, OR, and NOT operators.

C Bit Operators[33] [top]

Important: The C bit operators act on integers!

They are called bit operators because they act on each pair of bits at the same position of the two integers operands.

Bit operation
(at each bit position)
C Notation
AND a & b
OR a | b
XOR a ^ b
Not ~a
Examples (for 4 bit integers)
x 1011
y 0101
x | y 1111
x 1011
y 0101
x & y 0001
x 1011
y 0101
x ^ y 1110
x 1011
~x 0100

Special Identities[34] [top]

Fill in the missing entries for this example with 8 bit values.

y 11011001
0x0 00000000
y ^ 0x0
y 11011001
~y
0xFF 11111111
y ^ 0xFF

 

DeMorgan's Identities (single bit)[35] [top]

x y ~x ~y x & y ~(x & y) ~x | ~y ~~x
0 0 1 1 0 1 1 0
0 1 1 0 0 1 1 0
1 0 0 1 0 1 1 1
1 1 0 0 1 0 0 1

This verifies the first version of DeMorgan's Law below and the idempotent law (~~x = x) for single bit values x and y.

	  ~(x & y) = (~x) | (~y)

	  ~(x | y) = (~x) & (~y)

	  ~~x = x
      

These DeMorgan's Laws for single bit values are analogous the following version applied to logical statements p, q with values true corresponding to 1 and false corresponding to 0:

The following are logical statements equivalent

Symbolically

      NOT (p AND q) =  (NOT p) OR (NOT q)
    

and similarly

      NOT (p OR q) = (NOT p) AND (NOT q)
    

DeMorgan's Identities In C (integers)[36] [top]

The bit operations {~, &, |, ^} in C operate on integers at each bit of the integer.

So the DeMorgan's Identities just verified for single bit operands still hold for integers:

If x and y are C integers then

	  ~(x & y) = (~x) | (~y)

	  ~(x | y) = (~x) & (~y)

	  ~~x = x
      

Example with 8 bit integers x and y

Verifying that ~(x & y) = (~x) | (~y)

x 0110 0011
y 1101 0111
x & y 0100 0011
~x 1001 1100
~y 0010 1000
(~x) | (~y) 1011 1100
~(x & y) 1011 1100

Bang[37] [top]

If x is an integer, the unary C operator !x is defined as

	!x = 1 if x is 0
	!x = 0 if x is not 0
      

Like the bit operators, bang acts on integers with an integer result, but unlike the bit operators, it uses the whole integer value, not each separate bit.

What are the following expressions where x is an integer value ( 4 bytes or equivalently, 8 hex digits)

!!x (if x is 0)
!!x (if x is not 0)
x ^ x
!(x ^ x)

Basics - Data Representation[38] [top]

The binary representation already discussed is not quite sufficient for C, since we also need to represent negative integers as sequences of bits. (Can't put a minus sign in between bits in memory, just 0's and 1's.)

C also needs to represent values between 0 and 1, etc.

C numeric types:

Integers (signed: positive, negative and 0)

Unsigned integers (only 0 and positive)

Floating point numbers (single and double sizes)

C Integer Data Types[39] [top]

type            typical size                value range
char             1 byte (8 bits)            -128 to 127
unsigned char    1 byte                        0 to 255
short            2 bytes                    -32768 to 32767
unsigned short   2 bytes                       0 to 65535
int              4 bytes                    -2147483648 to 2147483647
unsigned int     4 bytes                    0 to 4294967295
long             4 bytes                    -2147483648 to 2147483647
unsigned long    4 bytes                    0 to 4294967295

The char C data type[40] [top]

   char x;

   x = 12;

   Storage allocated for x: 1 byte (8 bits)

   Binary          Decimal            Hexadecimal
   00001100        12                 0C

C Integer Arithmetic![41] [top]

Consider the integer type: unsigned char with values: 0 to 255.

What is 255 + 1 as an unsigned char? Not 256.

255 + 1 is out of range - overflow.

But no error occurs, it "wraps around" to get the next value, which is 0:

      0 1 2 ...   254 255
    

The mod operator, % provides a way of calculating how C computes values of unsigned integers that overflow.

There are 256 possible unsigned char values. The mod operator x % 256 will give a value in the range 0 to 255. Just what is required of the unsigned char type.

(255 + 1) % 256 = 0

Overflow[42] [top]


    1	#include <stdio.h>
    2	#include <stdlib.h>
    3	
    4	
    5	
    6	int main()
    7	{
    8	  unsigned char x, y;
    9	  unsigned char z;
   10	  x = 250;
   11	  y = 16;
   12	  z = x + y;
   13	
   14	  printf("%u + %u  = %u\n", x, y, z);
   15	
   16	  return 0;
   17	  
   18	}

Output:
 250 + 16 = 10

Notes:
The %u is a conversion specification to output z 
as an unsigned integer. 


Overflow?[43] [top]

This program compiles and runs with no errors reported even though the 10 result is clearly wrong mathematically.

This just emphasizes the fact that arithmetic for these integer data types does not obey the same rules as ordinary integer arithmetic.

How does a programmer write code to check whether overflow may occur for these data types?

More on this later!