In this module we are going to start getting hands-on experience. You will find all of the binaries for these modules on my GitHub page under the Reversing-Program repo. You should also have the Reverse Engineering for Beginners book handy because we will be using it for these modules.
We are going to diverge from Reverse Engineering for Beginners a little. The book already does an amazing job of explaining concepts and showing what each piece looks like, so there is no need to repeat all of that here. Instead, we will look at the pieces of a program and see what we can match up. The goal is to build our ability to recognize C/C++ patterns in assembly language.
Let’s Get Started:
In any C/C++ program you are required to have a main function. That is where we are going to start. Let’s look at the following snippet of assembly.
    push   ebp
    mov    ebp,esp
    mov    eax,0x0
    pop    ebp
    ret
The first thing we need to do is understand what we are looking at. The first column holds the assembly mnemonics. The second column holds the arguments (operands) for each mnemonic, if it takes any. Notice that ret takes no arguments in this snippet.
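To make the two-column layout concrete, here is a minimal Python sketch that splits each line of the listing into a mnemonic and its arguments. The function name parse_line and the LISTING string are made up for this illustration; a real disassembler emits more columns (addresses, raw bytes) that this sketch does not handle.

```python
# Split each line of an Intel-syntax listing into (mnemonic, arguments).
# LISTING is the snippet from this module, typed in by hand.
LISTING = """\
push ebp
mov ebp,esp
mov eax,0x0
pop ebp
ret
"""


def parse_line(line):
    """Return (mnemonic, [arguments]) for one instruction line."""
    parts = line.split(None, 1)  # first word is the mnemonic, the rest are arguments
    mnemonic = parts[0]
    if len(parts) > 1:
        arguments = [arg.strip() for arg in parts[1].split(",")]
    else:
        arguments = []  # instructions such as ret take no arguments here
    return mnemonic, arguments


parsed = [parse_line(line) for line in LISTING.splitlines() if line.strip()]
for mnemonic, arguments in parsed:
    print(f"{mnemonic:<6} {', '.join(arguments)}")
```

Running it prints the same two columns back out, with ret showing an empty second column.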
Do you know what kind of file this came from? In a program as small as this there isn’t a lot to go on. Based on the registers we can guess that it’s a 32-bit x86 program. There’s a better way to find out without guessing though. If you happen to be on a Linux distro there is a nice utility called file that determines a file type. So we can use that utility.
    $ file mod2gcc32
    mod2gcc32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=bedac3485f5e622719f7f12692654eefcca49c27, not stripped
There’s a lot in there, but we are mostly concerned with the first part of the output. ELF, which stands for Executable and Linkable Format, is the standard binary format on Linux, which tells us this binary was compiled for Linux. 32-bit and Intel 80386 together tell us the architecture. That’s a good starting place for now. The file utility isn’t our only option though.
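The "ELF 32-bit LSB" part of that output comes from the first few bytes of the file. As a rough sketch of the idea (not how the file utility is actually implemented), the Python below decodes those bytes by hand. It assumes the standard ELF identification layout: a four-byte magic number, then one byte for the class and one for the byte order. The function name describe_elf_ident and the sample bytes are made up for illustration.

```python
def describe_elf_ident(header):
    """Decode the first six bytes of a file as an ELF identification block.

    Returns (class, byte_order), or None if the magic number does not match.
    """
    if len(header) < 6 or header[:4] != b"\x7fELF":
        return None  # not an ELF file
    elf_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    byte_order = {1: "LSB", 2: "MSB"}.get(header[5], "unknown")
    return elf_class, byte_order


# Hand-written sample: the first six bytes of a 32-bit little-endian ELF file.
sample = b"\x7fELF\x01\x01"
print(describe_elf_ident(sample))
```

To try it on a real binary, read the first six bytes with open("mod2gcc32", "rb").read(6) and pass them to the function.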
There is a great reverse engineering tool set called radare2, or r2 for short. We will use many tools in these modules and see what works best for us. What you enjoy and find simple may be different from what someone else does. It is also convenient that radare2 is available for Windows.
We are specifically interested in the rabin2 functionality of radare2. Running rabin2 on our binary results in the following:
    $ rabin2 -I mod2gcc32
    havecode true
    pic      false
    canary   false
    nx       true
    crypto   false
    va       true
    intrp    /lib/ld-linux.so.2
    bintype  elf
    class    ELF32
    lang     c
    arch     x86
    bits     32
    machine  Intel 80386
    os       linux
    minopsz  1
    maxopsz  16
    pcalign  0
    subsys   linux
    endian   little
    stripped false
    static   false
    linenum  true
    lsyms    true
    relocs   true
    rpath    NONE
    binsz    6101
This gives us even more information than the file utility does. For example, we see again that we have a 32-bit ELF binary. In addition, we see that the original program was written in C, along with a lot more. I recommend getting radare2 and playing around with the tools to see what they can do.
Now that we know what kind of binary we are looking at, let’s look at the instructions. We are going to go to the documentation for this part. The first instruction is the push instruction, and the Intel Manual tells us in section 5-3 that push pushes a value onto the stack. With that in mind you should now turn to Reverse Engineering for Beginners and look up what the stack is.
Therefore the first instruction pushes the value in the ebp register onto the stack. Returning to the manual, let’s find out what the value in ebp represents and why we would want to push it onto the stack.
The ebp register is a 32-bit general purpose register that is used as a pointer to data on the stack. That is, ebp contains a memory address for data located somewhere else. To understand this we need to understand the stack a little more.
The stack is a data structure that grows and shrinks while a program runs. It is a First In Last Out (FILO) data structure. Each time a function is called, a new section of the stack is allocated by adjusting the top of the stack, which moves the window of the stack we are looking at. This window is referred to as a stack frame. The register ebp holds the memory address of the bottom, or base, of the stack frame.
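The First In Last Out behaviour is easy to see with a plain Python list standing in for the stack. This is only a model of the ordering; the real stack lives in memory and is tracked by registers rather than by a list object.

```python
# A Python list used as a stack: append() plays the role of push,
# pop() removes the most recently added item.
stack = []
stack.append("first")
stack.append("second")
stack.append("third")

# Items come back in the reverse of the order they went in.
popped = [stack.pop(), stack.pop(), stack.pop()]
print(popped)  # ['third', 'second', 'first']
```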
With that knowledge we know that the first instruction is placing the memory address of the base of the previous stack frame on the stack. Saving it there allows the program to restore the previous stack frame when we are done with the new one.
Next we see the mov instruction.
The mov instruction transfers data by copying it from source to destination. It has the form:
    mov destination, source
Where the source can be a register, a memory address, or a literal (also called an immediate value). The destination can be a register or a memory address, but the source and destination cannot both be memory addresses. In Intel assembly syntax a memory location is denoted by square brackets; for example, [eax + 4] references the memory at the address formed by adding 4 to the value in eax.
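As a small illustration of the square-bracket notation, the Python sketch below models a load such as mov ebx, [eax + 4]. The register values, the memory contents, and the helper name effective_address are all made up for this example; memory is modelled as a dictionary from address to a 32-bit value.

```python
# Hypothetical machine state: eax holds an address, and a 4-byte value
# is stored 4 bytes past it.
registers = {"eax": 0x1000, "ebx": 0x0}
memory = {0x1004: 0xDEADBEEF}


def effective_address(base, displacement):
    """Compute the address named by [base + displacement], wrapped to 32 bits."""
    return (registers[base] + displacement) & 0xFFFFFFFF


# mov ebx, [eax + 4]: copy the value stored at address eax + 4 into ebx.
address = effective_address("eax", 4)
registers["ebx"] = memory[address]

print(hex(address), hex(registers["ebx"]))
```

Note that eax itself is not changed; the brackets only use its value to locate the data.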
In our case here we are copying new values into two different registers. We have already discussed the ebp register, so let’s look at the esp register.
The esp register is another 32-bit general purpose register. It holds the memory address at the top of the stack. Now that we know about ebp and esp we should make note of an interesting feature of the stack. The stack grows towards smaller addresses. So by subtracting from the address in esp we are actually increasing the size of the stack.
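A quick bit of arithmetic shows why subtracting from esp makes the stack bigger. The addresses below are invented for illustration, and STACK_BASE is a hypothetical name for the highest address of the stack region.

```python
STACK_BASE = 0xFFFFE000  # hypothetical highest address of the stack


def bytes_in_use(esp):
    """Stack size is the distance from the base down to the current top."""
    return STACK_BASE - esp


esp = 0xFFFFD000
before = bytes_in_use(esp)

esp -= 16  # equivalent in spirit to: sub esp, 16
after = bytes_in_use(esp)

print(before, after)
```

The address in esp got smaller, yet the number of bytes in use went up by 16.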
By moving the address in esp to ebp we are pointing the base of the new stack frame we are constructing for our function at the top of the stack. This allows a new stack frame to be created by extending esp to a lower memory address. The new stack frame will be contained between the esp and ebp registers.
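The first two instructions of the snippet can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the Cpu class is a made-up model, the starting register values are invented, and memory is a dictionary keyed by address that holds one 32-bit value per slot.

```python
class Cpu:
    """A toy model of the two stack registers and stack memory."""

    def __init__(self, esp, ebp):
        self.esp = esp
        self.ebp = ebp
        self.memory = {}

    def push(self, value):
        # The stack grows toward smaller addresses, so make room first.
        self.esp -= 4
        self.memory[self.esp] = value


cpu = Cpu(esp=0xFFFFD000, ebp=0xFFFFD018)  # invented starting values
caller_ebp = cpu.ebp

cpu.push(cpu.ebp)  # push ebp     (save the caller's frame base)
cpu.ebp = cpu.esp  # mov ebp,esp  (new frame base is the current top)

print(hex(cpu.esp), hex(cpu.ebp), hex(cpu.memory[cpu.esp]))
```

After these two steps ebp and esp hold the same address, and the value stored there is the caller's old ebp.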
The next instruction we see is the pop instruction.
The pop instruction removes the top piece of data from the stack and places it in the register passed as an argument, which in this binary is ebp. Remember that ebp is where the base of the stack frame is and try to think of what is being done with this instruction.
The last instruction we see is the ret instruction.
From the Intel manual we find out that ret stands for return. This instruction pops the return address off the top of the stack and transfers control to it, which sends execution back to the calling function. We will discuss calling conventions in more detail later.
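If you would like to check your understanding of all five instructions together, the Python sketch below steps through them on a toy machine. Everything here is an assumption made for illustration: the function name run_main, the starting register values, and the return address are invented, and the model assumes the caller's call instruction has already left a return address on top of the stack.

```python
def run_main(esp, ebp, return_address):
    """Step through the five-instruction snippet on a toy register model."""
    memory = {esp: return_address}  # left on the stack by the caller's call
    regs = {"esp": esp, "ebp": ebp, "eax": 0xFFFFFFFF, "eip": None}

    def push(value):
        regs["esp"] -= 4  # stack grows toward smaller addresses
        memory[regs["esp"]] = value

    def pop():
        value = memory[regs["esp"]]
        regs["esp"] += 4
        return value

    push(regs["ebp"])          # push ebp
    regs["ebp"] = regs["esp"]  # mov ebp,esp
    regs["eax"] = 0x0          # mov eax,0x0
    regs["ebp"] = pop()        # pop ebp
    regs["eip"] = pop()        # ret (pop the return address into eip)
    return regs


result = run_main(esp=0xFFFFD000, ebp=0xFFFFD018, return_address=0x08048400)
for name, value in result.items():
    print(name, hex(value))
```

Compare the final value of ebp with the value it started with, and note where eip ends up.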
In this module we have taken a look at our first binary and seen two ways to determine some of its properties. We have also gone over each of the instructions in the assembly dump of the main function. Next time we will go over what this binary is doing line by line. So for next time you should know what each of the discussed assembly mnemonics means and have an idea of what the stack is. I urge you to grab the binary and explore what you can do with it in radare2 on your choice of operating system.