Vulnerability Development mailing list archives

Re: shellcode -> asm?


From: Stephen <sa7ori () broken blackroses com>
Date: Tue, 8 Oct 2002 16:53:09 -0400 (EDT)

Many people have purported to be able to go from the hex of the shellcode
back to the actual human-readable asm. Many of them don't seem to do it
properly, so I started writing something of my own to do it. One of the
biggest difficulties I had (specifically on x86) is demonstrated
below. Many assume that all you need to do is construct a big
struct or array of the hex values of all the x86 instructions and simply
step through the shellcode, translating each byte back to the
corresponding asm instruction. This method is REALLY unreliable, and is
basically impossible to get right, because x86 encodes the same
instruction differently depending on its operands.

for example:

0x80483b0 <main+20>:    mov    $0xb,%eax
0x80483b5 <main+25>:    mov    %esi,%ebx

Two mov instructions, which presumably have the same opcode, right?
So if we x/bx main+20 and main+25, the same hex opcode should presumably
be there. This isn't the case.

(gdb) x/bx main+20
0x80483b0 <main+20>:    0xb8
(gdb) x/bx main+25
0x80483b5 <main+25>:    0x89

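If you get the Intel x86 developer's manual (24319101.pdf), you can
generally get a list of the hex opcodes for the instructions.
There we can see that MOV has many faces, one of which is 0x89. In the
example above, 0xb8 is the form that loads a 32-bit immediate into a
register (the register number is folded into the opcode byte), while 0x89
is the register/memory form, which is followed by a ModRM byte. So we
can't rely on one opcode per mnemonic as a general rule, and it is not
as easy as it looks.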
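
To make the two faces of MOV concrete, here is a minimal Python sketch that
builds the bytes for the two instructions from the gdb listing above. The
function names and the REGS table are my own, made up for illustration, and
it only covers 32-bit register operands. Treat it as a sketch of the
encoding rules, not a real assembler.

```python
import struct

# Register numbers as used in x86 encodings (32-bit operand size assumed).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def encode_mov_imm32(dst, imm):
    """MOV r32, imm32: opcode 0xB8 with the register number ORed into
    the low 3 bits, followed by the immediate in little-endian order."""
    return bytes([0xB8 | REGS.index(dst)]) + struct.pack("<I", imm)

def encode_mov_reg_reg(src, dst):
    """MOV r/m32, r32: opcode 0x89 followed by a ModRM byte.
    mod=11 (register direct), reg=source, rm=destination."""
    modrm = 0xC0 | (REGS.index(src) << 3) | REGS.index(dst)
    return bytes([0x89, modrm])

if __name__ == "__main__":
    print(encode_mov_imm32("eax", 0xB).hex())       # mov $0xb,%eax
    print(encode_mov_reg_reg("esi", "ebx").hex())   # mov %esi,%ebx
```

The first call gives b8 0b 00 00 00 (five bytes, which is why the next
instruction sits at main+25) and the second gives 89 f3. Same mnemonic, two
different first bytes.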

Many "disassemblers" just construct a large matrix of opcodes, their
sizes, and such, but this really isn't accurate. What I see most people
do is take the hex opcodes, convert them to binary, take the bits that
correspond to the actual x86 instruction, OR them with the values of the
operands (registers, etc.), and then convert the result back to hex and
test whether it matches the values in the shellcode string. This is VERY
painstaking and, again, considerably unreliable. I suggest perusing the
source code of gdb to see how it does the OR and all the rest of it (x86).
opcodes/i386-dis.c is a good place to get started (in the gdb src tree).
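
Here is a rough Python sketch of the OR-and-compare idea described above,
next to the mask-and-compare shortcut that gives the same answer in one
step. All names here (match_by_or, match_by_mask, split_modrm) are
hypothetical helpers of mine, not anything taken from gdb, and the base
opcode 0xB8 (MOV r32, imm32) is only used as an example.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def match_by_or(byte, base=0xB8):
    """OR each register number into the base opcode and compare the
    result against the byte taken from the shellcode."""
    for num, name in enumerate(REGS):
        if base | num == byte:
            return name
    return None

def match_by_mask(byte, base=0xB8):
    """Same test done once: mask off the low 3 bits, compare to the
    base opcode, then read the register number out of those bits."""
    if byte & 0xF8 == base:
        return REGS[byte & 0x07]
    return None

def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7

if __name__ == "__main__":
    print(match_by_or(0xB8), match_by_mask(0xBB), match_by_mask(0x89))
    print(split_modrm(0xF3))
```

For the 0x89 form the operands are not in the opcode byte at all, so the
same masking has to be applied to the ModRM byte that follows it. For
89 f3, split_modrm(0xF3) yields mod=3, reg=6 (esi), rm=3 (ebx).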

When it comes down to it, x86 is VERY nasty. Good luck. I would try to
start small and just keep building upon the routines that do the
conversion. Using the bitwise OR is just as good a method to start with
as any. For most x86 shellcode, building a really rough matrix of
conversion values and doing ORs has worked in most GENERAL cases.
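
As a starting point for the "start small" approach, here is a minimal
Python sketch of a rough matrix plus a loop that walks the shellcode. The
TABLE layout, the handler functions and disassemble() are all my own
invention for illustration. It assumes 32-bit code, knows only six
instruction forms, handles register operands only, and prints anything
else as a raw .byte, so it is nowhere near a real disassembler.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def reg_in_opcode(mnemonic):
    # Forms like PUSH r32 (0x50+r): register is the low 3 opcode bits.
    def handler(code, off):
        return f"{mnemonic:<7}%{REGS32[code[off] & 7]}", 1
    return handler

def mov_imm32(code, off):
    # MOV r32, imm32 (0xB8+r): 4 immediate bytes follow the opcode.
    imm = int.from_bytes(code[off + 1:off + 5], "little")
    return f"{'mov':<7}$0x{imm:x},%{REGS32[code[off] & 7]}", 5

def modrm_reg_reg(mnemonic):
    # "op r/m32, r32" forms; only mod=11 (register direct) is handled.
    def handler(code, off):
        modrm = code[off + 1]
        if modrm >> 6 != 3:
            return None
        src, dst = REGS32[(modrm >> 3) & 7], REGS32[modrm & 7]
        return f"{mnemonic:<7}%{src},%{dst}", 2
    return handler

def int_imm8(code, off):
    return f"{'int':<7}$0x{code[off + 1]:x}", 2

# (mask, value, handler): an entry matches when opcode & mask == value.
TABLE = [
    (0xF8, 0x50, reg_in_opcode("push")),
    (0xF8, 0x58, reg_in_opcode("pop")),
    (0xF8, 0xB8, mov_imm32),
    (0xFF, 0x89, modrm_reg_reg("mov")),
    (0xFF, 0x31, modrm_reg_reg("xor")),
    (0xFF, 0xCD, int_imm8),
]

def disassemble(code):
    lines, off = [], 0
    while off < len(code):
        result = None
        for mask, value, handler in TABLE:
            if code[off] & mask == value:
                try:
                    result = handler(code, off)
                except IndexError:      # operand bytes ran off the end
                    result = None
                break
        # Unknown or truncated instruction: emit one raw byte and move on.
        if result is None or off + result[1] > len(code):
            result = (f"{'.byte':<7}0x{code[off]:02x}", 1)
        lines.append(result[0])
        off += result[1]
    return lines

if __name__ == "__main__":
    for line in disassemble(bytes.fromhex("31c050b80b00000089f3cd80")):
        print(line)
```

The important part is that each handler returns the instruction length, so
the loop skips over operand bytes instead of trying to decode them as
opcodes. New instruction forms can be added one TABLE row at a time.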


On Tue, 8 Oct 2002, Sean Zadig wrote:

Hi,
I'm doing some research into creating variants of common attacks, but I ran
into a problem of sorts. For most of the attacks I have, the shellcode
consists of the overflow and the actual malicious code that is run. I want
to be able to isolate the overflow from the rest of the shellcode and use
that to create attack variants. Problem is, I don't know where one ends and
the other begins! I figure if I turn the hex-encoded shellcode back into
assembly code, I could probably figure it out. I'm familiar with how to do
the reverse in gdb, but is it possible to do what I want? To restate:
shellcode -> asm is what I need. If this is a simple thing, my apologies -
but the security-basics list rejected my post =)
   -Sean Zadig

-----
Sean Zadig
Student, UC Davis
PGP Key ID: 0xDE44A79F
7EE1 C80A A0C1 B224 45CE  F74B 5835 0115 DE44 A79F

