Dailydave mailing list archives

x86_RE_lib


From: Joel Eriksson <je () bitnux com>
Date: Fri, 3 Feb 2006 11:35:49 +0100

On Mon, Jan 30, 2006 at 02:11:26PM -0800, halvar () gmx de wrote:
> For those on this list that enjoy reading very ugly python code, check
> www.sabre-security.com/x86_RE_lib.zip -- depending on your viewpoint, it's one
> of the following things:
>
> 1) The proof that python code is not necessarily more readable than perl
> 2) An example of how not to document or write python code
> 3) An 'idea-dump' into which I dump a lot of prototype code
> 4) A collection of relatively useful IDAPython utility functions for generating
>    flowgraphs, inlining functions into these flowgraphs, doing (very limited, local,
>    register-only) dataflow analysis, detecting vtables, building specialized graphs
>    for uninitialized variable attacks.
>
> Cheers,
> Halvar

Hi Halvar,

Thanks for making this available. I'm just getting started with IDAPython,
so being able to see how some IDA API functions are used in practice saved
me time. :) I've run into some problems though, and I'm not sure whether
they are caused by a bug in IDA, IDAPython, the plugin API or your code.

The actual problem I'm trying to solve at the moment is eliminating a large
number of anti-debugging / anti-modification code sequences from a
certain program. Each code sequence looks like this:

- Variable code that calculates an addr, a size and a value.
- A loop that looks like this:

loop:
    mov <tmp_reg>,[addr_reg+optional_and_variable_offset]
    xor|add|sub <value_reg>,<tmp_reg>
    sub <addr_reg>, constant_that_varies_from_1_to_4
    dec <size_reg>
    or|test <size_reg>, <size_reg> (this line is not always included)
    jnz loop

- Code that indirectly causes a crash if the value calculated in value_reg
  is incorrect. Sometimes the value is used to calculate an address that is
  dereferenced / called, sometimes to calculate an array index, and
  sometimes it is simply compared against a specified value. If it doesn't
  match, the code messes up the stack and rets into an invalid addr, to make
  it harder to determine where the check that caused the crash was located.
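
To make the effect of such a loop concrete, here is a minimal Python model of
it. It is only a sketch under assumptions: the 32-bit little-endian load and
the names checksum_loop, mem, step and offset are made up for illustration
and are not taken from the actual program.

```python
import struct

def checksum_loop(mem, addr, size, value, op="xor", step=4, offset=0):
    """Fold `size` 32-bit words of `mem` into `value`, walking downwards.

    mem    -- bytes-like object standing in for the code being checked
    addr   -- start index into mem (addr_reg)
    size   -- iteration count (size_reg), must be >= 1
    value  -- initial contents of value_reg
    op     -- "xor", "add" or "sub"
    step   -- constant subtracted from addr each round (1 to 4)
    offset -- the optional displacement used in the mov
    """
    mask = 0xFFFFFFFF
    while True:
        # mov tmp_reg, [addr_reg + offset]  (assumed: 32-bit, little-endian)
        (tmp,) = struct.unpack_from("<I", mem, addr + offset)
        if op == "xor":
            value ^= tmp
        elif op == "add":
            value = (value + tmp) & mask
        elif op == "sub":
            value = (value - tmp) & mask
        else:
            raise ValueError("unknown op: %r" % op)
        addr -= step        # sub addr_reg, constant
        size -= 1           # dec size_reg
        if size == 0:       # jnz loop
            break
    return value

code = bytes(range(16))
good = checksum_loop(code, addr=8, size=3, value=0)

# A software breakpoint (int3, 0xCC) inside the covered range changes the result.
patched = bytearray(code)
patched[5] = 0xCC
bad = checksum_loop(patched, addr=8, size=3, value=0)
print(hex(good), hex(bad), good != bad)
```

Any patch or software breakpoint inside the range covered by the loop changes
the folded value, which is what the code after the loop then reacts to.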

Throughout all of the above, jmps over junk are sometimes inserted at random,
probably to confuse linear-sweep disassemblers. That is no problem for IDA,
which uses the recursive traversal approach.
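
The difference between the two disassembly strategies can be shown with a toy
example. The decoder below is a sketch that knows only four real x86 opcodes
(nop, ret, jmp short, mov eax,imm32) and is in no way how IDA works
internally; the function names are made up for illustration.

```python
# opcode -> (mnemonic, instruction length in bytes)
SIZES = {0x90: ("nop", 1), 0xC3: ("ret", 1), 0xEB: ("jmp", 2), 0xB8: ("mov", 5)}

def decode(code, ea):
    """Return (mnemonic, size) for the byte at ea; unknown bytes become 'db'."""
    return SIZES.get(code[ea], ("db", 1))

def linear_sweep(code):
    """Decode one instruction after another, ignoring control flow."""
    out, ea = [], 0
    while ea < len(code):
        name, size = decode(code, ea)
        out.append((ea, name))
        ea += size
    return out

def recursive_traversal(code, entry=0):
    """Decode only what is reachable by following control flow from entry."""
    out, work, seen = [], [entry], set()
    while work:
        ea = work.pop()
        while ea < len(code) and ea not in seen:
            seen.add(ea)
            name, size = decode(code, ea)
            out.append((ea, name))
            if name == "ret":
                break
            if name == "jmp":
                rel = code[ea + 1]
                rel -= 256 if rel > 127 else 0   # rel8 is signed
                work.append(ea + 2 + rel)        # continue at the jmp target
                break
            ea += size
    return sorted(out)

# jmp +1 ; junk byte 0xB8 ; nop ; nop ; nop ; nop ; ret
code = bytes([0xEB, 0x01, 0xB8, 0x90, 0x90, 0x90, 0x90, 0xC3])
print(linear_sweep(code))         # swallows the four nops as a mov immediate
print(recursive_traversal(code))  # skips the junk byte and sees the nops
```

The junk byte 0xB8 makes the linear sweep decode the following four bytes as
an immediate operand, while the traversal that follows the jmp never looks at
the junk byte at all.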

Eliminating one check at a time doesn't work so well. If I modify code at, or
insert a breakpoint at, addr X and succeed in locating and eliminating the
check that indirectly causes the crash when addr X has been modified, I
usually end up crashing somewhere else because another check detects that
the code that checks addr X has been modified, and so on.

Single-stepping would be a possibility, but takes ages, and for some reason
hardware breakpoints won't work for me in either GDB or IDA Pro when debugging
this particular program. The program is multi-threaded btw, and both GDB and
the Linux version of IDA Pro seem to handle this quite poorly.

My idea was to write an IDAPython script that finds code sequences matching
the loop, since that is the only part of it that is fairly static, and then
manually eliminate each check by using the x86emu plugin to calculate
the correct values and patching the code.

Pseudo-code of how I intended to do this:

- For each function:
    - Split up the function into basic blocks
    - Merge blocks that are only separated by jmp fwd; junk; fwd: ...
    - Print the addr of blocks that match the loop described above
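
The matching step of this plan could look roughly like the sketch below. It
works on plain tuples rather than on IDA data: the (ea, mnemonic, op1, op2)
layout with operands as strings and the name matches_check_loop are
assumptions made for this example. Merging is approximated by simply
ignoring jmp instructions inside the already merged block, and the jnz
target is not verified.

```python
def matches_check_loop(block):
    """Return True if the merged block ends in the check loop pattern.

    block -- list of (ea, mnemonic, op1, op2) tuples (hypothetical layout)
    """
    # Ignore the jmps over junk that survive block merging.
    insns = [i for i in block if i[1] != "jmp"]

    # Optional "or/test size_reg, size_reg" right before the jnz.
    counter = None
    if len(insns) >= 2 and insns[-2][1] in ("or", "test"):
        if insns[-2][2] != insns[-2][3]:
            return False
        counter = insns[-2][2]
        insns = insns[:-2] + insns[-1:]

    if len(insns) < 5:
        return False
    load, fold, step, count, branch = insns[-5:]
    return (
        load[1] == "mov" and load[3].startswith("[")            # mov tmp, [addr+off]
        and fold[1] in ("xor", "add", "sub") and fold[3] == load[2]
        and step[1] == "sub" and step[3] in ("1", "2", "3", "4")
        and step[2] in load[3]                                  # same addr_reg
        and count[1] == "dec"
        and (counter is None or count[2] == counter)            # same size_reg
        and branch[1] == "jnz"
    )

block = [
    (0x1000, "mov", "eax", "[esi+4]"),
    (0x1003, "xor", "ebx", "eax"),
    (0x1005, "jmp", "0x1008", ""),      # jmp over junk
    (0x1008, "sub", "esi", "4"),
    (0x100B, "dec", "ecx", ""),
    (0x100C, "test", "ecx", "ecx"),
    (0x100E, "jnz", "0x1000", ""),
]
print(matches_check_loop(block))
```

Comparing operand strings is crude, but since only the loop itself is fairly
static it may be enough to produce a list of candidate addresses to inspect
by hand.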

Since I was not that familiar with the IDA API I looked at your code to see how
you split up functions into basic blocks and how to get the instruction mnemonic
and the value of operands.

At first I wrote my own Python classes for instructions, basic blocks and
functions, but ended up with IDA hanging, and sometimes the Python interpreter
crashing, when splitting up certain functions into basic blocks.

I assumed I might have made a mistake somewhere, so I tried making a function
that uses the get_basic_block() and get_short_crefs_from() functions that you
defined instead. The code looks like this (will probably look weird without a
monospaced font :):

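# Assumes this runs inside IDAPython, where get_func() etc. are already in scope.
# The helpers are called unqualified below, so pull them into this namespace.
from x86_RE_lib import *

def get_func_flow(ea):
    """Return a dict of basic blocks in function at ea."""
    func = get_func(ea)
    ea = func.startEA
    flow = {}
    flow[ea] = get_basic_block(ea)
    worklist = [ ea ]                # block start addresses still to process
    while len(worklist) > 0:
        ea = worklist.pop(0)
        curr = flow[ea]
        # curr[-1] is the last instruction of the block: [addr, mnemonic, ...]
        if curr[-1][1] != "call":
            succs = get_crefs_from(curr[-1][0])
        else:
            succs = get_short_crefs_from(curr[-1][0])
        for succ in succs:
            if succ not in flow:     # not visited yet
                flow[succ] = get_basic_block(succ)
                worklist.append(succ)
    return flow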

def strip_tags_from_flow(flow):
    """Remove IDA's color coding from operand strings."""
    for block in flow.values():
        for line in block:
            for i in range(2, 5):
                line[i] = tag_remove(line[i])
                if line[i] is None:
                    line[i] = ""

def print_flow(flow):
    """Print the basic blocks in flow, sorted by address."""
    for ea in sorted(flow.keys()):
        print("[ %x ]" % ea)
        for line in flow[ea]:
            print(disasm_line_to_string(line))
        print("")

Then I tested it on different functions with the following:

flow = get_func_flow(get_screen_ea())
strip_tags_from_flow(flow)
print_flow(flow)

It works fine for most functions, but for certain rather complicated functions
it hangs IDA completely or crashes IDAPython (it gets an exception and reloads).
Finally I tested using your create_flowgraph_from() function directly, and I
still get the same problem.
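
One way to check whether the traversal logic itself can run away might be to
exercise it outside IDA. The sketch below is the same worklist idea on plain
Python callables, with a cap on the number of blocks so that a runaway
traversal raises an error instead of hanging. The names walk_flow,
successors, get_block and max_blocks are made up for this example and are
not part of x86_RE_lib or IDAPython.

```python
from collections import deque

def walk_flow(entry, successors, get_block, max_blocks=10000):
    """Breadth-first walk over basic blocks starting at entry.

    successors -- callable: block address -> iterable of successor addresses
    get_block  -- callable: block address -> block contents
    max_blocks -- give up with RuntimeError beyond this many blocks
    """
    flow = {entry: get_block(entry)}
    work = deque([entry])
    while work:
        ea = work.popleft()
        for succ in successors(ea):
            if succ not in flow:                 # visit each block only once
                if len(flow) >= max_blocks:
                    raise RuntimeError("more than %d blocks, giving up" % max_blocks)
                flow[succ] = get_block(succ)
                work.append(succ)
    return flow

# Tiny graph with a back edge (2 -> 1), standing in for a real function.
graph = {1: [2, 3], 2: [1], 3: []}
flow = walk_flow(1, graph.__getitem__, lambda ea: ["block at %x" % ea])
print(sorted(flow))
```

If this terminates on a graph with back edges but the IDA version does not,
the problem is more likely in how blocks or cross-references are obtained
than in the worklist loop.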

Unfortunately I cannot send you the actual code that I'm analyzing, but perhaps
a flowgraph generated by IDA of one of the functions that cause the problem can
give you an idea of what the problem might be:

   https://sec.bitnux.com/badfunc.jpg

"165 nodes, 1686 edge segments, 207 crossings" ... Yikes. :) 

If you like, I could also send you a complete listing of this function, after
removing strings and function names.

I hope to get the opportunity to buy BinNavi soon btw. It would be very useful
for some of the projects I'm working on. :)

-- 
Best Regards,
   Joel Eriksson
-------------------------------------------------
Cellphone: +46-70 228 64 16 Home: +46-18-30 35 55
Security Research & Systems Development at Bitnux
PGP Key Server pgp.mit.edu, PGP Key ID 0x08811B44
DF38 5806 0EFB 196E E4B6 34B5 4C01 73BB 0881 1B44
-------------------------------------------------

