Many bugs and vulnerabilities in a program come from how user input is handled. Programmers make assumptions about the kinds of input users will provide: where that input comes from and what type it will be. If you’re familiar with buffer overflow vulnerabilities, then you already know one way user input can be used maliciously.
In this module we are going to take a first look at how user input is handled (not handled well) in a very simple C program. We will build on the previous program and see what differs in the assembly dump, then trace how the input moves through the program. With that in mind, let’s have a look.
The Main Function:
You should be able to determine where the function prologue is. Let’s skim the assembly and see what stands out. Each function call should give us an idea of what this program does. We have a printf() call, a scanf() call, and a call to a function named greeting. We have seen printf() in previous modules, so we should expect that the address pushed onto the stack right before the call points to a format string to be printed.
Let’s take a look at the documentation for scanf(). As it is called here, the function takes two arguments: the first is a format string and the second is the location where the converted input is stored. Do you remember how arguments are placed onto the stack for a function call on the x86 architecture? If not, don’t worry. Let’s see what happens by setting a breakpoint right after scanf() executes using the command
break *0x080484bd and then running the program.
Now we are stopped directly after returning from the scanf() call. Two pieces of information were pushed onto the stack directly before the call. The first was the address ebp-0x6c. This wasn’t pushed directly; the address was first loaded into eax with lea, and then eax was pushed onto the stack. Take a look at the definition of the lea instruction. We then have another memory address pushed onto the stack. Let’s find out what each of these addresses holds.
First we can figure out the address ebp-0x6c by subtracting 0x6c from the register value. From our register printout we see that ebp = 0xFFFFDA58, and if we use
print/x 0xFFFFDA58 - 0x6c we find that the address pushed onto the stack is 0xFFFFD9EC. We can look at that memory address:
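If we take a look at an ASCII chart, we see that the memory address holds kcaJ, or Jack backwards. The characters appear reversed because x86 is little-endian: when memory is displayed as 4-byte words, the first byte of the string is the lowest-order byte of the word. So this is the memory address where we are storing the input, and that address was pushed onto the stack first. The next value pushed is the address of the format specifier. This gives us a nice reminder that, under the calling convention used here, arguments are pushed onto the stack from right to left.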
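Both observations above, the address arithmetic and the reversed characters, can be reproduced in a few lines of C. This is only a sketch: the helper names buffer_address and load_le32 are made up for illustration and are not part of the program being analyzed.

```c
#include <stdint.h>

/* Offset of the input buffer below ebp, taken from the lea instruction. */
#define BUFFER_OFFSET 0x6cu

/* Address of the input buffer for a given ebp value (32-bit arithmetic). */
uint32_t buffer_address(uint32_t ebp)
{
    return ebp - BUFFER_OFFSET;
}

/* Assemble four bytes into a word the way a little-endian CPU loads them:
 * the byte at the lowest address becomes the least significant byte. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Calling buffer_address(0xFFFFDA58) gives 0xFFFFD9EC, and load_le32 on the bytes of "Jack" gives 0x6b63614a. Read from the high byte down, 6b 63 61 4a is k, c, a, J.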
The first place user input lands in our program is as data on the stack at location ebp-0x6c.
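In C terms, the call we just traced might look something like the sketch below. The original source is not shown here, so the buffer size NAME_SIZE and the function name read_name are assumptions, and sscanf() stands in for scanf() so the input can come from a string rather than the keyboard.

```c
#include <stdio.h>

/* Hypothetical size; the real buffer size is not shown in the dump. */
#define NAME_SIZE 100

/* Mirrors the traced call: the format string is the first argument and the
 * destination buffer is the second. With right-to-left pushing, the
 * destination address goes onto the stack first and the format string last.
 * Like the traced call, "%s" has no width limit, so input longer than the
 * buffer is written past its end. */
int read_name(const char *input, char *name)
{
    return sscanf(input, "%s", name);
}
```

With a NAME_SIZE-byte array as the destination, read_name("Jack", name) returns 1 and leaves the string Jack in the array.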
Following the Data:
Let’s look further down the instruction list. The memory location holding our user input is loaded into eax again at 0x80484c6, and the address is pushed onto the stack right before the call to greeting. Let’s find out what the greeting function holds:
The greeting function calls printf(), so we are passing our user input to greeting in order to print something out.
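A rough C equivalent of greeting is sketched below. The actual format string is not visible here, so GREETING_FORMAT is a made-up placeholder, and format_greeting is a hypothetical helper added only so the text can be inspected without capturing stdout.

```c
#include <stdio.h>

/* Placeholder text; the real format string is not shown in the dump. */
#define GREETING_FORMAT "Hello, %s!\n"

/* Build the greeting into a caller-supplied buffer (helper for inspection). */
int format_greeting(char *out, size_t out_size, const char *name)
{
    return snprintf(out, out_size, GREETING_FORMAT, name);
}

/* The single argument is a pointer to the buffer that holds the user input.
 * Inside greeting it is read from ebp+0x8, which sits above the saved ebp
 * (at ebp) and the return address (at ebp+0x4). */
void greeting(const char *name)
{
    printf(GREETING_FORMAT, name);
}
```

Note that only the pointer is passed; the characters themselves stay where scanf() put them.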
Exercise: Calculate the value of ebp+0x8 in this function when you run the binary through gdb and verify the data being pushed as an argument. If you aren’t sure how to do so, review module 5 for a refresher.
Exercise: Draw a diagram of the stack frames for main and greeting and draw the flow of the user input through each stack frame. Use arrows to show where it is placed and referenced. As a hint: in greeting, is ebp+0x8 part of greeting’s stack frame or main’s stack frame?
This program is very simple, but let’s do a deeper analysis of the binary. We have user input and we know where it is stored. But what do we know about the memory where it’s stored?
We know that the string “Jack” was stored at memory address ebp-0x6c = ebp-108, so there are 108 bytes between the start of our data and ebp. Let’s see if we can figure out what’s in that area or if it’s just garbage waiting to be filled. If we look at the memory at ebp-0x6c, we see that the memory above our data isn’t full of zeros, so there is something there. Let’s do an experiment and see what happens when we fill it with 108 bytes of data.
We have a really long string as a name in a greeting but nothing bad happens. But what happens if we add some more data at the end of our string so it’s longer than the 108 bytes?
Now we have a segfault. Take a look at those register values as well. Those 41’s are our A’s: 0x41 is the ASCII code for A. We can take a look at the memory at our user input address to see what it looks like now:
We’ve filled the block up with 41’s. Another interesting thing to take note of is the ecx register. Let’s take a look at what’s happening with that in our program.
The first thing we see in main is a lea that loads the address esp+0x4 into ecx. This is a 4-byte value, since a 32-bit address is 4 bytes (8 bits = 1 byte). Then we have the function prologue, after which ecx is pushed onto the stack, which means that the address esp+0x4 computed at entry is now saved at ebp-0x4. Jumping to the end of our program, we see the value at ebp-0x4 being loaded back into ecx. After the leave instruction, that value minus 4 is placed into esp. So in the run where we supplied too much input, the saved value was overwritten with our A’s and we load 0x41414141 - 0x4 = 0x4141413d into esp. It turns out that is not a memory location that can be accessed, so the ret instruction faults when it tries to read the return address from it.
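The arithmetic in this paragraph can be checked with a small sketch. The function names overwritten_saved_ecx and restored_esp are invented for illustration; they model the values involved and do not touch real registers.

```c
#include <stdint.h>
#include <string.h>

/* Value held in the 4-byte saved-ecx slot once 'A' bytes have been
 * written over it. */
uint32_t overwritten_saved_ecx(void)
{
    unsigned char slot[4];
    memset(slot, 'A', sizeof slot);   /* 'A' is 0x41 in ASCII */
    return (uint32_t)slot[0]
         | ((uint32_t)slot[1] << 8)
         | ((uint32_t)slot[2] << 16)
         | ((uint32_t)slot[3] << 24);
}

/* Models the stack pointer restored at the end of main: esp = ecx - 4. */
uint32_t restored_esp(uint32_t ecx)
{
    return ecx - 4u;
}
```

overwritten_saved_ecx() yields 0x41414141, and passing that to restored_esp() yields 0x4141413d, the value seen in esp at the crash.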
As long as we halt the program before we hit the ret instruction, we don’t encounter a segmentation fault, which is an illegal memory access. But nothing during program execution stopped us from writing data beyond the bounds intended by the programmer. This has been our first view of a buffer overflow vulnerability in assembly. In addition, we got a first look at buffer overflow mitigation as well. We will examine this situation more as we add complexity to the programs.
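For contrast, here is one way the read could be bounded in C. This is a sketch under assumptions rather than the program’s actual code: NAME_SIZE is a hypothetical buffer size, read_name_bounded is an invented name, and sscanf() again stands in for scanf() so the example can be fed a string.

```c
#include <stdio.h>

#define NAME_SIZE 100  /* hypothetical buffer size */

/* Bounded read: "%99s" stores at most 99 characters plus the terminating
 * NUL, so the write cannot pass the end of a NAME_SIZE-byte buffer. */
int read_name_bounded(const char *input, char name[NAME_SIZE])
{
    return sscanf(input, "%99s", name);
}
```

Feeding this function 200 A’s leaves a 99-character string in name and nothing beyond the buffer is touched. The width in the format string has to be kept in step with the buffer size by hand.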
In this module we saw user input being supplied to a program for the first time. We then followed that user input through our binary to see what happened with it. There are many different methods of dealing with user input, and some are better than others. C is a programming language that allows the programmer to do all kinds of unsafe things. We want to know what that looks like when reverse engineering so that we have experience finding unsafe situations in a binary (for whatever reason, I just find it fun and interesting).
Next time we will take a program that is similar to one found in Hacking: The Art of Exploitation by Jon Erickson. We will perform a check on our user input and then look at what happens when an unsafe condition allows us to bypass that check.