Assembly Language: What It Is

I recently had an interaction with someone very new to the world of security and exploit development. They had experience programming in languages like Java, Python, and JavaScript, but they had no idea what went on underneath the interpreter (or, I gather, inside it). I’m not saying that as a criticism; you don’t know anything until you learn it. To that end, I decided to write this post as a very basic, high-level view of assembly language and why it matters to computers.

Computers: Let’s Execute Stuff.

Let’s start with two hardware components that are absolutely necessary for executing a program (process). The CPU is what actually executes the instructions. The memory is where the CPU gets those instructions. A whole lot more goes on that I’m glossing over completely, but for now that’s all we really need.

The CPU executes instructions that are sent to it. Those instructions are not what you read when you type code into your IDE or text editor. A line of Java like
System.out.println("String");
means absolutely nothing to a CPU. As I said in the Computing Quick and Dirty post, a program is compiled into a form the computer can execute. (With Java or Python there is an extra layer: the JVM or the interpreter is itself a compiled program, and it runs your code on your behalf.) That executable form is a sequence of opcodes that the CPU reads and executes. Each opcode is an instruction for the CPU to do something. What that something is isn’t important for this discussion. Just know that the CPU understands strings of binary, and based on those values it does things.
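To make the fetch-and-execute idea concrete, here is a minimal sketch of a toy CPU in Python. The opcode values and the four-instruction set are invented for illustration and do not belong to any real architecture. The "CPU" reads a number from memory, decides what it means, does it, and moves on to the next one.

```python
# A toy CPU: fetch an opcode from memory, decode it, execute it, repeat.
# These opcode numbers are made up for illustration; no real CPU uses them.
LOAD, ADD, PRINT, HALT = 0x01, 0x02, 0x03, 0xFF

def run(memory):
    """Execute the program stored in `memory` (a bytes object)."""
    acc = 0        # a single register (an accumulator)
    ip = 0         # instruction pointer: address of the next opcode
    output = []
    while True:
        opcode = memory[ip]              # fetch
        if opcode == LOAD:               # decode + execute
            acc = memory[ip + 1]         # operand byte follows the opcode
            ip += 2
        elif opcode == ADD:
            acc += memory[ip + 1]
            ip += 2
        elif opcode == PRINT:
            output.append(acc)
            ip += 1
        elif opcode == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {ip}")

# The "program" is nothing but numbers sitting in memory.
program = bytes([0x01, 0x05, 0x02, 0x03, 0x03, 0xFF])
print(run(program))  # [8]
```

Notice that the byte 0x03 appears twice: once as the operand of ADD and once as the PRINT opcode. The bytes carry no meaning on their own; only their position in the instruction stream tells the CPU how to treat them.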

Assembly Language: It’s Almost Readable

Assembly language is the lowest-level human-readable programming language. Its instructions map (almost) one-to-one onto machine code instructions. But there’s a catch: each CPU architecture has its own set of machine code instructions. Therefore each CPU architecture has its own assembly language.
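Here is a sketch of what that mapping looks like: a tiny assembler and disassembler in Python. The byte values are genuine 32-bit x86 encodings, but the table covers only these six instructions, and `assemble` and `disassemble` are illustrative helpers, not a real tool.

```python
# A tiny assembler/disassembler for a handful of 32-bit x86 instructions.
# The byte values are real x86 encodings; the table is deliberately tiny.
ENCODINGS = {
    "push ebp":     bytes([0x55]),
    "mov ebp, esp": bytes([0x89, 0xE5]),
    "xor eax, eax": bytes([0x31, 0xC0]),
    "pop ebp":      bytes([0x5D]),
    "ret":          bytes([0xC3]),
    "nop":          bytes([0x90]),
}
# Reverse table: machine code bytes -> mnemonic.
DECODINGS = {code: text for text, code in ENCODINGS.items()}

def assemble(lines):
    """Translate each line of assembly into its machine code bytes."""
    return b"".join(ENCODINGS[line] for line in lines)

def disassemble(code):
    """Turn machine code back into mnemonics (only knows the table above)."""
    out, i = [], 0
    while i < len(code):
        for length in (1, 2):            # instructions here are 1 or 2 bytes
            chunk = code[i:i + length]
            if chunk in DECODINGS:
                out.append(DECODINGS[chunk])
                i += length
                break
        else:
            raise ValueError(f"unknown byte {code[i]:#04x} at offset {i}")
    return out

# A minimal function body: set up a stack frame, return 0 in eax.
source = ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"]
machine_code = assemble(source)
print(machine_code.hex(" "))                # 55 89 e5 31 c0 5d c3
print(disassemble(machine_code) == source)  # True
```

The mapping is only "almost" one-to-one because some instructions have more than one valid encoding. For example, `mov ebp, esp` can also be encoded as 8B EC, and an assembler simply picks one.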

That’s right: assembly language is completely non-portable across architectures. There are some exceptions for extended architectures. For example, it is possible to run a 32-bit x86 executable on a 64-bit x86_64 machine, because x86_64 processors still support the 32-bit x86 instruction set. (On Linux you may also have to install the 32-bit library packages; your distro may or may not come with them.) Portability does not work in the reverse direction, though: a 64-bit x86_64 program will not run on a 32-bit x86 machine. It’s also not possible to run an x86 assembly program natively on an ARM chip. You would have to completely rewrite it in ARM assembly.
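The non-portability is easy to see at the byte level. This sketch lists the machine code for "return from a function" on three architectures. The encodings are real (bytes shown in memory order, little-endian), but the table is only an illustration.

```python
# Machine code for "return from a function" on three architectures.
# Bytes are in memory order (little-endian for the 4-byte ARM encodings).
RETURN_INSTRUCTION = {
    "x86 / x86_64": ("ret",   bytes([0xC3])),
    "ARM (32-bit)": ("bx lr", bytes([0x1E, 0xFF, 0x2F, 0xE1])),  # word 0xE12FFF1E
    "AArch64":      ("ret",   bytes([0xC0, 0x03, 0x5F, 0xD6])),  # word 0xD65F03C0
}

for arch, (mnemonic, code) in RETURN_INSTRUCTION.items():
    print(f"{arch:<14} {mnemonic:<6} {code.hex(' ')}")
```

x86 and x86_64 share the single C3 byte, which is the backward compatibility described above. To a 32-bit ARM CPU in ARM state, a lone C3 byte is not even a complete instruction, since those instructions are always 4 bytes wide.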

As we saw in the Quick and Dirty post, the disassembled binary varies greatly depending on how your code was compiled. In Reverse Engineering for Beginners by Dennis Yurichev, you can compare and contrast several versions of the same program in different assembly languages, notably x86, x86_64, MIPS, and ARM. Comparisons like these are instructive because they show how the structures are similar and how they differ.

Assembly Language: What It Does

Now that we know what assembly language is, the simple answer to what it does is: it executes instructions. The key is where and how those instructions operate. Mostly, assembly deals with memory addresses and registers. It also calls functions in libraries, but a function call boils down to saving a return address and moving the instruction pointer to the function’s address.
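Here is a sketch of what a function call does to the stack and the instruction pointer, modeled on x86's CALL and RET. The `ToyCPU` class and the addresses are hypothetical, and a Python list stands in for the stack (the real x86 stack grows downward in memory). On ARM, the BL instruction stores the return address in the link register instead of pushing it.

```python
# Simulate what x86's CALL and RET do to the stack and instruction pointer.
# ToyCPU and the addresses are hypothetical; a list stands in for the stack.
class ToyCPU:
    def __init__(self):
        self.ip = 0        # instruction pointer
        self.stack = []    # append = push, pop = pop

    def call(self, target, instruction_length):
        """CALL: push the address of the next instruction, then jump."""
        self.stack.append(self.ip + instruction_length)
        self.ip = target

    def ret(self):
        """RET: pop the saved return address back into the instruction pointer."""
        self.ip = self.stack.pop()

cpu = ToyCPU()
cpu.ip = 0x1000                          # about to run a 5-byte CALL at 0x1000
cpu.call(target=0x2000, instruction_length=5)
print(hex(cpu.ip), [hex(a) for a in cpu.stack])   # 0x2000 ['0x1005']
cpu.ret()
print(hex(cpu.ip), cpu.stack)                     # 0x1005 []
```

Because the return address lives on the stack, nested calls simply stack up return addresses, and each RET unwinds one level.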

When working in assembly, we push and pop values on and off the stack, save values in memory and in registers, do arithmetic on those values, print them using library function calls or system calls, and jump the instruction pointer around based on values in registers and memory. All of this happens within the constraints of the CPU’s instruction set and the memory we have available.
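Those operations can be sketched with a toy register machine in Python. The instruction names mimic x86 style, but the machine itself is invented for illustration. One simplification to flag: real x86 `jnz` tests the zero flag set by a previous instruction such as `dec`, while here the register is checked directly to keep the sketch short.

```python
# A toy register machine running assembly-like instructions.
# Names mimic x86, but the machine and its rules are invented for this sketch.
def execute(program):
    regs = {"eax": 0, "ecx": 0}
    stack = []
    output = []
    ip = 0                                  # index of the next instruction
    while ip < len(program):
        op, *args = program[ip]
        ip += 1                             # default: fall through to the next one
        if op == "mov":                     # mov reg, constant
            regs[args[0]] = args[1]
        elif op == "add":                   # add reg, reg
            regs[args[0]] += regs[args[1]]
        elif op == "dec":                   # dec reg
            regs[args[0]] -= 1
        elif op == "jnz":                   # jump to args[1] if reg != 0 (simplified)
            if regs[args[0]] != 0:
                ip = args[1]
        elif op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "print":                 # stands in for a library or system call
            output.append(regs[args[0]])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return output

# Sum 5 + 4 + 3 + 2 + 1 with a loop, then save and restore eax via the stack.
program = [
    ("mov", "eax", 0),      # 0: eax = 0 (running total)
    ("mov", "ecx", 5),      # 1: ecx = 5 (loop counter)
    ("add", "eax", "ecx"),  # 2: eax += ecx
    ("dec", "ecx"),         # 3: ecx -= 1
    ("jnz", "ecx", 2),      # 4: if ecx != 0, jump back to instruction 2
    ("push", "eax"),        # 5: save eax on the stack
    ("mov", "eax", 0),      # 6: clobber eax
    ("pop", "eax"),         # 7: restore it from the stack
    ("print", "eax"),       # 8: output the total
]
print(execute(program))     # [15]
```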

Assembly Language: What Can I Do With It?

One reason to learn assembly language is to figure out what a program does without having its source code, i.e. reverse engineering. Another reason is to inject a series of opcodes directly into a running program to take control of what the CPU executes, i.e. shellcode. And if we understand assembly, we can go into a binary file and change its opcodes to change how the program executes.

It’s pretty cool when you can go into a binary and change the execution flow to bypass code you don’t want to run, or to run code that normally wouldn’t be executed.
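Here is a sketch of that kind of patch, applied to a hand-assembled, hypothetical 32-bit x86 function held in a Python bytearray rather than a real program. 0x74 is the short form of JE/JZ and 0xEB is the short form of JMP. Both take the same one-byte displacement, so swapping just the opcode byte keeps the instruction the same length and sends execution down the "passed" path every time. In a real binary you would find the instruction's file offset with a disassembler and patch a copy of the file. `force_jump` is an illustrative helper, not a real tool.

```python
# Hand-assembled, hypothetical 32-bit x86 function: returns 1 if eax == 1, else 0.
code = bytearray.fromhex(
    "83 f8 01 "         # offset 0:  cmp eax, 1
    "74 06 "            # offset 3:  je  +6      -> jumps to offset 5 + 6 = 11
    "b8 00 00 00 00 "   # offset 5:  mov eax, 0  ("check failed" path)
    "c3 "               # offset 10: ret
    "b8 01 00 00 00 "   # offset 11: mov eax, 1  ("check passed" path)
    "c3"                # offset 16: ret
)

JMP_SHORT = 0xEB  # unconditional jump with a 1-byte displacement

def force_jump(code, pattern=bytes.fromhex("83 f8 01 74")):
    """Find `cmp eax, 1; je ...` and make the jump unconditional, in place.

    Returns the offset of the patched opcode byte.
    """
    start = code.find(pattern)
    if start == -1:
        raise ValueError("instruction pattern not found")
    jump_offset = start + len(pattern) - 1  # the 0x74 (JE) opcode byte
    code[jump_offset] = JMP_SHORT           # displacement byte untouched: target stays 11
    return jump_offset

offset = force_jump(code)
print(offset, code.hex(" "))
```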

Conclusion:

Assembly language is vitally important to reverse engineering, and possibly to exploit development. (I’ll get back to assembly in exploit development later, when I have something intelligent to say about it. Not that the requirement of having something intelligent to say has stopped me from writing about reverse engineering.) We have gone over a quick, high-level overview of what assembly language is. There are people much smarter and better than I am who can teach you how to program in assembly; see the book list for some of them.
