The C-library function to read an input line (discarding the
newline) is gets(char *str):

    char line[50];
    gets(line);  // Reads the input line up to the newline and stores it
                 // in line, with a null byte at the end
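gets is given only a pointer, never the size of line, so nothing stops a long input line from running past the end of the array. For contrast, here is a minimal sketch of the same read-a-line, drop-the-newline, null-terminate behaviour with an explicit bound, built on the standard fgets. The helper name read_line is hypothetical, not part of the C library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fp into dst (capacity cap bytes),
   drop the trailing newline, and always null-terminate.
   Returns dst, or NULL on EOF/error before anything is read.
   Unlike gets, at most cap - 1 characters are stored; the rest of an
   over-long line is left unread in the stream.
   Assumes cap fits in an int (fgets takes an int size). */
char *read_line(char *dst, size_t cap, FILE *fp)
{
    if (cap == 0 || fgets(dst, (int)cap, fp) == NULL)
        return NULL;
    dst[strcspn(dst, "\n")] = '\0';   /* discard the newline, if present */
    return dst;
}
```

For example, read_line(line, sizeof line, stdin) stores at most 49 characters in line[50]; the rest of an over-long line stays in the input stream instead of spilling into neighbouring stack memory.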
The function getsx is like gets, except that:
1. Characters are typed as pairs of hex digits.
2. Non-hex-digit characters are ignored by getsx.
3. getsx stops when a newline is encountered.
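The decoding rule getsx applies can be sketched as a small C function. This is an illustration, not getsx's actual source: the name hex_pairs_to_bytes is hypothetical, it works on a string rather than standard input, it does not append a null byte, and, unlike getsx, it takes a capacity so it cannot itself overflow.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; caller guarantees isxdigit(c). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

/* Hypothetical sketch of getsx's decoding rule, applied to a string:
   - every two hex digits become one byte,
   - any non-hex character is skipped (it does not break a pair),
   - decoding stops at '\n' or at the end of the string.
   Writes at most cap bytes to out; returns the number of bytes produced. */
size_t hex_pairs_to_bytes(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;
    int have_high = 0, high = 0;

    for (; *in != '\0' && *in != '\n'; in++) {
        unsigned char c = (unsigned char)*in;
        if (!isxdigit(c))
            continue;                 /* ignore spaces and other noise */
        if (!have_high) {
            high = hex_value(c);      /* first digit of the pair */
            have_high = 1;
        } else {
            if (n == cap)
                break;                /* bound reached: stop, don't overflow */
            out[n++] = (unsigned char)(high * 16 + hex_value(c));
            have_high = 0;
        }
    }
    return n;
}
```

For instance, "3f 1a 05" decodes to the three bytes 0x3f, 0x1a, 0x05, and "3f1a05" gives the same result because spaces are simply skipped.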
1. The getbuf function calls getsx to read input into the char
array buf, of size 12.
2. But getbuf ignores what is copied into buf and
simply returns the integer 1.
3. By typing in more than 12 bytes' worth of input (more than 24 hex
digits), the buf array will overflow, with the extra bytes being copied
into memory after the array's storage.
Goal: By cleverly choosing the input bytes, get new code to execute
so that the call to getbuf returns the integer value 0xdeadbeef to main
instead of 1.
1. The getsx function lets you type characters such as 3f 1a 05 and
converts each pair of characters to a single 8-bit byte with
the corresponding hex value, e.g., 0x3f, 0x1a, 0x05.
2. Devise an input string that is the byte encoding of a function that
simply returns the value 0xdeadbeef.
3. Key: Pad this input string so that it fills the char array in
getbuf and goes on to overwrite the return address on the stack:
    +-------------+
    |             |  <- input string will be stored in here
    |             |
    |             |     getbuf's stack frame
    | main's ebp  |
    +-------------+
    | ret in main |     Overwrite this address with a new one
    +-------------+
    |             |
    |             |
    |             |     main's stack frame
    |             |
    |             |
    +-------------+
4. The new return address should simply be the address, back in
getbuf's stack frame, where the input string is stored: the start of
the buf array.
1. In overwriting the return address into main with a new return
address, main's saved ebp value will also be overwritten, since it sits
just above the return address on the stack (see the diagram above).
2. So the last 8 bytes of the input string must be main's ebp
followed by the new return address.
3. When getbuf returns, it pops its stack frame as usual. Since the
saved-ebp slot still holds main's ebp, nothing that affects this has
been altered: ebp is restored and the top of the stack is main's
stack frame again.
4. However, the ret instruction now jumps to our code in buf instead
of back into main.
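x86 is little-endian, so each 4-byte value in the last 8 bytes of the input (main's ebp and the new return address) has to be typed least significant byte first. A minimal sketch of that conversion, assuming 32-bit values; le32_bytes and print_le32 are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Split v into 4 bytes, least significant first (x86 memory order). */
void le32_bytes(uint32_t v, unsigned char b[4])
{
    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char)(v >> (8 * i));
}

/* Print v as the 4 hex pairs you would type into getsx. */
void print_le32(uint32_t v)
{
    unsigned char b[4];
    le32_bytes(v, b);
    printf("%02x %02x %02x %02x", b[0], b[1], b[2], b[3]);
}
```

For example, the value 0xfd400500 becomes the typed pairs 00 05 40 fd, the same byte order objdump shows for the push $0xfd400500 instruction at the end of these notes.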
1. Our code must therefore build a stack frame for itself, including
the part that makes it look as if main had called it.
2. So our code first needs to push the return address in main (as if
the call instruction had executed).
3. Then it can proceed as normal, as if it were written like this:
    int attack()
    {
        return 0xdeadbeef;
    }
1. Determine the ebp value for main, the caller of getbuf.
2. Determine the beginning address of the character array buf
in getbuf, which getsx will fill and overflow, overwriting part of
getbuf's stack frame.
3. Determine the byte code for our attack function.
4. Build the input string as this byte code, plus padding bytes if
necessary, plus main's ebp, plus the beginning address of our code
(i.e., the beginning address of the character array).
1. Use gdb to determine main's ebp value and the beginning address of
the character array.
2. Use gcc -S to generate the assembly code for the attack
function.
3. Copy the code into a file named attack2.s and add the pushl
instruction that pushes main's return address on the stack. Then use
gcc -c attack2.s to assemble this code into the object file attack2.o.
(On a 64-bit system, add -m32 to these gcc commands, since this is
32-bit code.)
4. Use objdump on the object file attack2.o to get the byte encoding
of the function we want.
5. Append padding bytes (00 or 90; 0x90 is the x86 nop instruction) if
necessary to the bytes from step 4, and then add main's ebp value and
the beginning address of the char array buf, where this code will be
copied by getsx.
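Step 5 amounts to simple byte layout. The sketch below assumes a 32-bit little-endian target and that three numbers have already been read off in gdb: the distance in bytes from the start of buf to the saved-ebp slot (offset), main's ebp, and the address of buf. The names build_input and print_hex are hypothetical; this illustrates the layout only and is not a tool from these notes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: lay out the bytes getsx should copy into buf as
     code | 0x90 padding up to offset | main's ebp | new return address
   Assumes offset = distance from buf to the saved-ebp slot (from gdb).
   Returns the total length, or 0 if the code does not fit before the
   saved-ebp slot or out is too small. */
size_t build_input(const unsigned char *code, size_t code_len, size_t offset,
                   uint32_t ebp, uint32_t buf_addr,
                   unsigned char *out, size_t cap)
{
    size_t total = offset + 8;
    if (code_len > offset || total > cap)
        return 0;

    memcpy(out, code, code_len);
    memset(out + code_len, 0x90, offset - code_len);   /* 0x90 = nop */
    for (int i = 0; i < 4; i++) {
        out[offset + i]     = (unsigned char)(ebp >> (8 * i));      /* main's ebp */
        out[offset + 4 + i] = (unsigned char)(buf_addr >> (8 * i)); /* new ret */
    }
    return total;
}

/* Print the bytes as space-separated hex pairs for typing into getsx. */
void print_hex(const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%02x%c", b[i], i + 1 < n ? ' ' : '\n');
}
```

The padding goes after the code because execution starts at the first byte of buf and the code ends in ret, so the padding bytes are never executed; that is why either 00 or 90 works here.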
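Here are three attempts to prevent this buffer overflow problem:
1. The gets function in the gcc C library does some kind of check to
see that the input string doesn't overwrite the stack state (old ebp
and return address).
2. The gcc compiler can use a similar technique for arbitrary functions
meeting some criteria, e.g., functions with local character arrays
(gcc's -fstack-protector option places a "canary" value between the
local arrays and the saved state and checks it before returning).
3. The operating system can place a different amount of padding at the
bottom of the stack each time the program is run (stack randomization),
so the address of buf is no longer predictable.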
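The second defence can be illustrated with a toy model. The struct below is not real stack memory; it just lays out a 12-byte buf followed by a guard word and the saved state, which is the idea behind a stack canary: an overflow that reaches the saved ebp and return address must first pass through the guard, and a check before returning notices the change. The names and the guard placement are illustrative assumptions, and a real canary is typically a random value chosen at program start.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model (not real stack memory): getbuf's frame with a canary word
   inserted between buf and the saved state. */
struct toy_frame {
    unsigned char buf[12];
    uint32_t canary;     /* hypothetical guard value */
    uint32_t saved_ebp;
    uint32_t ret_addr;
};

/* Copy n input bytes starting at buf (the first member), the way an
   unchecked getsx would, but clamped to the toy frame so the model
   itself stays in bounds. */
void toy_unchecked_copy(struct toy_frame *f, const unsigned char *in, size_t n)
{
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy(f, in, n);
}

/* The check a compiler would insert just before the function returns. */
int toy_canary_intact(const struct toy_frame *f, uint32_t expected)
{
    return f->canary == expected;
}
```

Copying 12 bytes leaves the guard alone; copying 20 overwrites it, so a compiler-inserted check would abort instead of executing ret.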
Use objdump to produce an assembler listing that also includes the
machine byte representation of each instruction (redirect the output
to a different .s file):

    $ objdump -d mycode.o > mycode_obj.s
Here is mycode_obj.s:
       0:  68 ac 3d 04 08    push   $0x8043dac
       5:  68 00 05 40 fd    push   $0xfd400500
       a:  89 e5             mov    %esp,%ebp
       c:  b8 05 00 00 00    mov    $0x5,%eax
      11:  5d                pop    %ebp
      12:  c3                ret
The machine program code is 19 (0x13) bytes long (the last
instruction, ret, starts at offset 0x12) and, ignoring blanks, is:
68 ac 3d 04 08 68 00 05 40 fd 89 e5 b8 05 00 00 00 5d c3
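One way to double-check the byte count, and to see that each immediate operand is stored least significant byte first, is to paste the listing into a C array. In this sketch, mycode and imm32_at are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* The bytes from the mycode_obj.s listing, in order. */
static const unsigned char mycode[] = {
    0x68, 0xac, 0x3d, 0x04, 0x08,   /* push $0x8043dac  */
    0x68, 0x00, 0x05, 0x40, 0xfd,   /* push $0xfd400500 */
    0x89, 0xe5,                     /* mov  %esp,%ebp   */
    0xb8, 0x05, 0x00, 0x00, 0x00,   /* mov  $0x5,%eax   */
    0x5d,                           /* pop  %ebp        */
    0xc3                            /* ret              */
};

/* Read a 32-bit little-endian immediate starting at byte offset off. */
uint32_t imm32_at(const unsigned char *code, size_t off)
{
    return (uint32_t)code[off]
         | (uint32_t)code[off + 1] << 8
         | (uint32_t)code[off + 2] << 16
         | (uint32_t)code[off + 3] << 24;
}
```

Here sizeof mycode is 19, and reading the immediates at offsets 1, 6 and 13 gives 0x08043dac, 0xfd400500 and 5, matching the disassembly.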