Nmap Development mailing list archives

Re: Review: Angry IP Scanner


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Wed, 20 Aug 2008 17:20:24 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi David.  I think you forgot to CC nmap-dev on this so I'll respond to
the list and keep the whole message intact.  Reply below.


On Wed, 20 Aug 2008 10:41:14 -0600
David Fifield <david () bamsoftware com> wrote:

On Fri, Jun 06, 2008 at 02:05:52AM +0000, Brandon Enright wrote:
One thing I do like about it is the ability to narrow down the
random IP generation to a given base IP and netmask.  This may
not be easy to implement in Nmap (since a user may or may not
want a reserved IP, and the random IP generation code would have
to be changed to allow for this), and probably not worth it.  For
example, if you wanted random IPs in the range of 192.*.*.*,
should 192.168 be chosen or not?  It's still pretty cool, though.

This *is* nice and I've thought about implementing it for Nmap but
every time I give it serious thought I decide that outputting all
the IPs to a list with -sL and then selecting a random section from
that list is a better way to do it.

I'm not aware of any generic algorithm, method, or technique that
could generate numbers in some arbitrary set of ranges without
duplicates that is both fast and memory efficient.  Creative use of
a Bloom Filter (http://en.wikipedia.org/wiki/Bloom_filter) would
work but that starts to get pretty damn time-consuming to do
right...

While researching something else I stumbled upon an algorithm for
this. It's Algorithm S in section 3.4.2 of the Art of Computer
Programming (page 142 in the third edition). To select n addresses
out of a pool of N, you simply iterate through the addresses and
select each one with probability

      (n - m) / (N - t),

where m is the number of addresses selected so far and t is the number
of records iterated over (both initialized to 0).

This requires knowing the size of the address pool in advance, which
is no problem for us. Clearly no collisions are possible (unless the
user lists an address more than once). It requires only a constant
amount of memory. A limitation is that while it gives you a random
selection, the addresses will still be in order relative to each
other. This can be mitigated with --randomize-hosts.
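
In C, the selection loop might look something like this (a rough
sketch over a flat pool of N index values rather than a real target
list, which a real implementation would walk instead):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Select n items out of a pool of N (assumes n <= N).  Here the pool
 * is just the indexes 0..N-1; each item is kept with probability
 * (n - m) / (N - t), where m = selected so far, t = visited so far. */
static void select_sample(unsigned n, unsigned N)
{
    unsigned m = 0, t;

    for (t = 0; t < N && m < n; t++) {
        /* Uniform draw in [0, 1). */
        double u = rand() / (RAND_MAX + 1.0);

        if (u * (N - t) < (double) (n - m)) {
            printf("%u\n", t);  /* t is the selected item's index */
            m++;
        }
    }
}

int main(void)
{
    srand((unsigned) time(NULL));
    select_sample(10, 1000);    /* pick 10 of 1000, in increasing order */
    return 0;
}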

If we want to restrict random addresses to be in a list of addresses
this is the way to go. However just having a good algorithm doesn't
mean the feature should be implemented. Does someone have a
compelling use case? Otherwise I'll just let this post stand in case
the topic comes up again.

David Fifield

Thanks for the followup on this.  I love Knuth but the above algorithm
is somewhat simple-minded.  It requires linear memory which is what we
had with some of our other ideas.  I think that we decided that we'd
need to do better than linear memory if we were to actually implement
this feature.

I did come up with a way to do this but the "algorithm" was somewhat
complicated.  I didn't send it to the list because I'm notoriously
poor at explaining the random crap that pops into my head.

I'll give it a shot anyways:

Our goal here is to be able to use -iR but select IPs out of our own
range.  For example, an organization might want to pick 1000 IPs out of
"10.0.0.0/16".  Someone else might want to try to sample 10,000 router
addresses on the Internet in the range "*.*.*.1,65,129,193".

Knuth suggests that we list out all the possible IPs and then just
select them with the probability described above.  There is a much more
memory-efficient way to do it, though, by exploiting a linear
congruential generator (LCG).

Before I get to that though, one must be able to map an index number
into the desired range.  That is, pretend that you made an array of
this range: "*.*.*.1,65,129,193".

Then index 0 in that array would be "0.0.0.1".  Index 7 would be
"0.0.1.193".  In general, we can easily compute the size of the
imaginary array and we can easily compute the value of a single element
in that array given an index.  We can do both of these things without
pre-computing the entire array.

So what this did was map the elements of our range onto integers
starting at 0 and going up to the size of the range (exclusive).
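
A rough C sketch of that mapping for the "*.*.*.1,65,129,193" example
(the per-octet value sets are hard-coded here just to keep the
illustration short):

#include <stdio.h>

/* Allowed values per octet for "*.*.*.1,65,129,193": a "*" means all
 * 256 values, and the last octet is restricted to 4 values. */
static const unsigned counts[4] = { 256, 256, 256, 4 };
static const unsigned char last_octet[4] = { 1, 65, 129, 193 };

/* Size of the imaginary array: the product of the per-octet counts. */
static unsigned long long range_size(void)
{
    return (unsigned long long) counts[0] * counts[1] * counts[2] *
           counts[3];
}

/* Compute the address at a given index without building the array.
 * The last octet varies fastest, so index 0 is 0.0.0.1 and index 7
 * is 0.0.1.193. */
static void index_to_ip(unsigned long long idx, unsigned char ip[4])
{
    ip[3] = last_octet[idx % counts[3]];
    idx /= counts[3];
    ip[2] = (unsigned char) (idx % counts[2]);
    idx /= counts[2];
    ip[1] = (unsigned char) (idx % counts[1]);
    idx /= counts[1];
    ip[0] = (unsigned char) (idx % counts[0]);
}

int main(void)
{
    unsigned char ip[4];

    printf("size = %llu\n", range_size());  /* 256 * 256 * 256 * 4 */
    index_to_ip(7, ip);
    printf("index 7 = %u.%u.%u.%u\n", ip[0], ip[1], ip[2], ip[3]);
    return 0;
}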

What we need to do now is select integers out of that range without
duplicates.  That turns out to be easy using an LCG
(http://en.wikipedia.org/wiki/Linear_congruential_generator).  The
trouble is that it could be hard to pick constants that will generate a
maximal-period LCG for an arbitrarily sized range.

We can get around that too though.  To do so, pre-compute 32 sets of
constants, each with a power-of-two period.  That is, the first pair
would have period 2, the 3rd pair would have period 8, the 10th 1024
and so on.

So if the range of indexes you want to select from is 3000, you'd pick
the LCG constants with period 4096.  Then, seed the LCG randomly and
pull out one index at a time.  If you get an index outside your range
(say, 3200), just discard that index and move on.

Because we pre-computed power-of-two-sized LCG periods this is *at
worst* 50% efficient (and on average 75%), which is still linear time.
It also uses constant memory, which is a huge win over linear memory.
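
A rough C sketch of that scheme (the LCG constants below are just
illustrative values that satisfy the full-period conditions for a
power-of-two modulus, not a carefully chosen set):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* For a power-of-two modulus m = 2^k, the LCG x -> (a*x + c) mod m has
 * full period m whenever c is odd and a % 4 == 1 (Hull-Dobell).  The
 * constants below are only an illustration, not a tuned set. */
struct lcg {
    unsigned long long a, c, mask, x;   /* mask = 2^k - 1 */
};

/* Pick the smallest power-of-two period >= the range size and seed the
 * generator randomly. */
static void lcg_init(struct lcg *g, unsigned long long range)
{
    unsigned long long period = 1;

    while (period < range)
        period <<= 1;
    g->mask = period - 1;
    g->a = 1103515245;          /* 1103515245 % 4 == 1 */
    g->c = 12345;               /* odd */
    g->x = (((unsigned long long) rand() << 16) ^ rand()) & g->mask;
}

/* Return the next index in [0, range), discarding anything that falls
 * outside the range.  Because the period is the smallest power of two
 * >= range, at most half of the values get discarded. */
static unsigned long long lcg_next(struct lcg *g, unsigned long long range)
{
    do {
        g->x = (g->a * g->x + g->c) & g->mask;
    } while (g->x >= range);
    return g->x;
}

int main(void)
{
    struct lcg g;
    int i;

    srand((unsigned) time(NULL));
    lcg_init(&g, 3000);         /* period 4096 for a range of 3000 */
    for (i = 0; i < 10; i++)
        printf("%llu\n", lcg_next(&g, 3000));   /* distinct indexes */
    return 0;
}

Feeding the accepted indexes through the index-to-IP mapping above
would then give duplicate-free random addresses out of the user's
range.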

Like you said though, just because we have an algorithm or two that
work doesn't mean we should actually implement this.

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAkisUl4ACgkQqaGPzAsl94JhDwCeJuiWUBLXyEm6+vCVc3EZXpR4
tnwAn34NqalK9eWb9suZnYEyK9vcQ2bu
=wxCo
-----END PGP SIGNATURE-----

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

