If you’re like me, you want to get some hands-on reverse engineering experience. Reading about things is fine, but learning by doing is a great complement to reading. With that in mind, I am going to take a different approach to covering assembly language for reverse engineering. We aren’t going to start out by learning how to write programs in assembly. We are going to start out by writing programs in C and C++ and finding out how they relate to assembly. Then we are going to look at the assembly and see how we can work with it.
Credit for this idea goes to the book Reverse Engineering for Beginners by Dennis Yurichev. Go get it and get ready to make it your friend. It’s a long book and we are going to get everything we possibly can out of it. The difference between what I’m planning here and the book is that I’m going to get into the assembly more. I’m assuming you don’t have any experience with assembly language, and possibly lack experience with C/C++.
The background you need for this series is a basic understanding of some programming language and of how programming works in general. If you have ever taken an introductory programming course you should be fine. You will also need both a Windows and a Linux system if you are going to compile the binaries and follow along. The next thing you need is patience. This is going to take a while.
Let’s Get Started:
I have started a repo on GitHub that will hold all the binaries we are going to use in this series. There you will find the source code and the compiled binaries. When starting a module we will look at the binaries first. Then we will see if we can identify the purpose of the binary. Let’s get some introductory material out of the way before we jump straight into binaries. The first thing we need to do is go over some fundamentals of assembly language.
Assembly language is the lowest-level programming language that is supposed to be readable by humans. We learned in the architecture modules that the CPU reads machine code. Assembly language is designed to represent that machine code as something we can interpret.
There are two types of syntax for x86 assembly language, Intel and AT&T. I prefer the Intel syntax because I find it more readable. However, knowing how they relate to each other and being able to read both is a worthwhile endeavor. There are also different syntaxes for ARM and other processors. Assembly language is processor specific and not portable.
In x86 assembly (from now on, when I say x86 assembly I’m referring to Intel syntax as well), an instruction has the general form of an operation followed by its targets.
Targets could be a memory address, a pair of source and destination registers, a register and memory combination, a value and destination pair, and so on. You get the idea.
Let’s look at an example that will show up frequently:
mov destination register, source register
The mov instruction copies the value in the source register into the destination register; the source register keeps its value. We will see mov instructions a lot as we look at programs. Here is a cheat sheet for some assembly instructions. Your homework for this module is to take a look at the cheat sheet and make sure you understand it. You don’t have to memorize it. You’ll be seeing a lot of instructions as we move forward and they will start sticking in your mind. Take a look at the rest of that cheat sheet as well. There’s a lot of good information on it.
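Since we will be relating assembly back to C throughout this series, here is a rough sketch of what a register-to-register mov does, written in C. The register file is just an array, and the names REG_EAX, REG_EBX and mov_reg are made up for this illustration. This is a mental model only, not how a real CPU is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register names, used only for this illustration. */
enum { REG_EAX, REG_EBX, REG_COUNT };

/* Models `mov dst, src` between two registers: dst receives a copy of src. */
static void mov_reg(uint32_t regs[REG_COUNT], int dst, int src) {
    assert(dst >= 0 && dst < REG_COUNT);
    assert(src >= 0 && src < REG_COUNT);
    regs[dst] = regs[src];  /* src is only read; it keeps its value */
}
```

Calling mov_reg(regs, REG_EAX, REG_EBX) plays the role of mov eax, ebx: afterwards both slots hold the value that was in the ebx slot.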
The last thing we are going to do in this module is take a look at an example of an operation in assembly language.
push ebp
mov ebp, esp
sub esp, 0x10
ret
The first instruction, push, places the value in the ebp register on the stack. The stack is a portion of memory that the CPU controls and uses for program execution. Stack frames are created for each function in a C program, including the main function. The ebp register has a special purpose. Go to the Intel manual that you should have downloaded in the architecture modules and look up what it does.
Next, the mov instruction stores the value in the esp register in the ebp register. This makes ebp the base of a new stack frame, which will be for a function in our program. The third instruction, sub, reduces the address held in esp by 0x10, or 16 bytes, which extends the stack to make room for the new frame. The values we see in the assembly output will usually be in hexadecimal notation, or base 16. The last instruction, ret, pops the return address off the stack and jumps back to the calling code. In a complete function the stack frame is torn down before ret runs (for example with the leave instruction); this trimmed-down example leaves that step out.
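To see how esp and ebp change during the three set-up instructions, here is a minimal sketch of a toy machine in C. Everything in it (ToyCpu, run_prologue, the 256-byte stack, using offsets instead of real addresses) is an assumption made for illustration. A real CPU has far more state, and the ret instruction is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256u  /* arbitrary size for the toy stack */

/* Hypothetical toy machine: two registers plus a small byte-addressed stack. */
typedef struct {
    uint32_t esp;                 /* offset of the current top of the stack */
    uint32_t ebp;                 /* frame (base) pointer */
    uint8_t  stack[STACK_BYTES];
} ToyCpu;

/* push ebp: make room for 4 bytes, then store ebp at the new top. */
static void push_ebp(ToyCpu *cpu) {
    assert(cpu->esp >= 4u && cpu->esp <= STACK_BYTES);
    cpu->esp -= 4u;  /* the stack grows toward lower addresses */
    memcpy(&cpu->stack[cpu->esp], &cpu->ebp, 4u);
}

/* mov ebp, esp: the new frame's base is the current top of the stack. */
static void mov_ebp_esp(ToyCpu *cpu) {
    cpu->ebp = cpu->esp;
}

/* sub esp, imm: reserve imm bytes below the frame base. */
static void sub_esp(ToyCpu *cpu, uint32_t imm) {
    assert(cpu->esp >= imm);
    cpu->esp -= imm;
}

/* Runs the three set-up instructions from the example and returns the state. */
static ToyCpu run_prologue(uint32_t start_esp, uint32_t caller_ebp) {
    ToyCpu cpu;
    memset(&cpu, 0, sizeof cpu);
    cpu.esp = start_esp;
    cpu.ebp = caller_ebp;
    push_ebp(&cpu);        /* push ebp      */
    mov_ebp_esp(&cpu);     /* mov ebp, esp  */
    sub_esp(&cpu, 0x10u);  /* sub esp, 0x10 */
    return cpu;
}
```

Starting with esp at offset 128 and ebp holding 200, run_prologue(128, 200) leaves ebp at 124 (128 minus the 4 bytes pushed) and esp at 108 (124 minus 0x10, which is 16), with the old ebp value 200 saved at offset 124.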
The purpose of this construct in assembly is to set up a new stack frame and then return to the calling function.
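For a feel of where a frame like this comes from on the C side, here is a hypothetical function (sum_four_locals is my own name, not something from the repo) with four 4-byte local variables, which together take 0x10 bytes. This is only a sketch. The assembly a compiler actually emits depends on the compiler, its version and its flags, so the stack adjustment you see may be larger or arranged differently. The static_assert line needs a C11 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit locals occupy 4 * 4 = 16 bytes, which is 0x10 in hexadecimal. */
static_assert(sizeof(int32_t) * 4 == 0x10, "four 32-bit locals take 16 bytes");

/* Hypothetical example function; volatile discourages the compiler from
   optimizing the locals away so that they need space in the stack frame. */
static int32_t sum_four_locals(void) {
    volatile int32_t a = 1, b = 2, c = 3, d = 4;
    return a + b + c + d;
}
```

If you want to compare against real output, GCC can write 32-bit Intel-syntax assembly for a file with the -m32 -O0 -S -masm=intel options, assuming 32-bit support is installed on your system.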
There’s a whole lot of information to go over with assembly. We haven’t even scratched the surface, but you now have an example of some assembly code. Google and the Intel manuals are going to be our friends, along with the book from the introduction.
In the beginning we are going to take it slow and go over everything in great detail, discussing instructions and what effect they are having. As we get more comfortable and have some experience, we will start focusing on control flow and deconstructing larger, more complex programs.