Nmap Development mailing list archives

[GSoC] Performance / Optimization Project

From: Alexandra Binca <alexandra.binca () gmail com>
Date: Tue, 24 Mar 2015 23:16:31 +0200
Hello,

My name is Alexandra Binca, I study for a Master's degree at
Politehnica University of Bucharest. I am very good with algorithms
and data structures (I like a lot to train on sites with algorithmic
challenges).

I want to apply this year to GSoC for the nmap organization, more
specifically to the optimization task. I did some profile on nmap
using valgrind (with callgrind) and kcachegrind and studied the
hotspots a bit. I am confident that over the summer I can reduce the
execution time of some parts of nmap by implementing better data
structures and algorithms. For the application period I focused on two
main things in order to demonstrate my ability to work with nmap and
C++:

1. Optimization of --exclude lists. The current algorithm implemented
in nmap to deal with --exclude lists has a O(N) complexity for
decision (is a certain address excluded?) and does not cover ipv6
addresses (nbase/nbase_addrset.c).  I implemented a new data structure
(a binary decision tree) that reduces the lookup complexity to
O(logN), where N is the exclude list size.  The algorithm is generic
(can work on spaces of any dimension) and uses a fair amount of
memory, thus by using it instead of addrset nmap gains speed and the
ability to extend it to ipv6 without wasting huge amounts of memory
(as the current idea would do). If you want to take a look, the code
is here [1] (along with benchmark and test functions).

2. General hotspots of nmap. In one of my tests, most of the time was
being spent in reading various configuration files. More specifically,
it seems that parsing /etc/services is really slow. At the moment I am
trying to reduce the execution time of the function that loads
/etc/services by at least 20%. I noticed a lot of calls to sscanf and
strtod, maybe it is possible to replace them or rewrite parts of the
code to make it faster.

Is it possible to setup a wiki account or something like that were I
could list the things I tried to do / did over the application period?

Also, feedback would be really appreciated :-).

TL;DR

Benchmark for binary decision tree for IPv4-like ranges (it is not
possible to run this test with the current implementation because the
complexity is too high):

RandomTest (D = 4) begins.
add areas speed for 1000 instances: 0ms
build tree speed for 1000 instances: 2ms
contains speed for 1000000 instances: 215ms
RandomTest (D = 4) passed.

[1] https://github.com/alexandrabinca/binary_decision_tree

Best regards,
Alexandra BINCA
_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at http://seclists.org/nmap-dev/
Current thread:

[GSoC] Performance / Optimization Project Alexandra Binca (Mar 24)