Nmap Development mailing list archives

Re: massping-migration and other dev testing results


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Wed, 12 Sep 2007 06:21:19 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 11 Sep 2007 23:39:21 -0600 plus or minus some time David Fifield
<david () bamsoftware com> wrote:

On Wed, Sep 12, 2007 at 02:07:19AM +0000, Brandon Enright wrote:
On Tue, 11 Sep 2007 12:36:05 -0600 David Fifield
<david () bamsoftware com> wrote:
Actually another thought just occurred to me. The
nmap-massping-migration branch also has debugging and profiling flags
turned on. I wonder if that could have such a large effect? I've
noticed that these flags make Nmap use a lot more CPU on my system,
but I haven't measured them to actually slow down any scans.

I had noticed the additional CPU usage but I didn't think much of it.
I've compiled debug and profiling code before and not seen this kind of
performance hit.  Perhaps profiling has a lot more work to do with C++
code?

Second, please run side-by-side the latest /nmap (the same as your
test above) and the latest nmap-massping-migration (r5824, the same
as above with the exception of the removal of debugging flags). Run your
'a' tests concurrently, then your 'b' tests concurrently.

This made the difference we were looking for.

Groovy!

I was under the impression that --randomize-hosts only randomizes hosts
within a ping group.  The fact that it also increases the size of the
ping group is *huge*.  The man page doesn't make this all that clear.  I went
ahead and looked at the code for this and I have a couple of thoughts.

You're right, it only randomizes hosts within a ping group. So it
wouldn't have any effect on those empty blocks, except, as you've said,
that it also increases the ping group size.
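To make "randomizes only within a ping group" concrete, here is a minimal Python sketch. It is an illustration only, not Nmap's C++ code; the function name and the 10.0.x.x addresses are hypothetical. Hosts are shuffled inside consecutive fixed-size groups and never move between groups, so a stretch of empty address space stays concentrated in whichever group it falls into.

```python
import random

def randomize_within_groups(targets, group_size, seed=None):
    """Hypothetical sketch: shuffle hosts only inside consecutive groups."""
    rng = random.Random(seed)
    result = []
    for start in range(0, len(targets), group_size):
        group = list(targets[start:start + group_size])
        rng.shuffle(group)  # order changes, membership of the group does not
        result.extend(group)
    return result

# 1024 hypothetical targets split into groups of 256.
hosts = [f"10.0.{i // 256}.{i % 256}" for i in range(1024)]
order = randomize_within_groups(hosts, group_size=256, seed=1)
```

Each 256-host slice of `order` contains exactly the same addresses as the corresponding slice of `hosts`; only a larger group size lets hosts mix across a wider range.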

--randomize-hosts isn't in the nmap -h output either.

First, PING_GROUP_SZ is set to 4096.  When you use --randomize-hosts
'o.ping_group_sz = PING_GROUP_SZ * 4;' is run.  The man page says the group
can grow up to 8096.  There doesn't appear to be any special cap, so
the group size would actually be 16384.
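The arithmetic can be spelled out with a small Python sketch that mirrors the quoted line `o.ping_group_sz = PING_GROUP_SZ * 4;` with no extra cap. The function name is hypothetical; only the constant and the multiplier come from the thread.

```python
PING_GROUP_SZ = 4096  # current default per this thread (previously 2048)

def effective_ping_group_size(randomize_hosts: bool, base: int = PING_GROUP_SZ) -> int:
    # --randomize-hosts multiplies the group size by 4; no cap is applied.
    return base * 4 if randomize_hosts else base

print(effective_ping_group_size(False))       # 4096
print(effective_ping_group_size(True))        # 16384
print(effective_ping_group_size(True, 2048))  # 8192 with the older default
```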

Oops. We changed the default PING_GROUP_SZ from 2048 to 4096 a little
while ago and I didn't know that was in the reference guide. That's
fixed.

Second, it really surprises me that an important value like this isn't
adjustable.  I thought --min-hostgroup set the ping group size but after
looking at the code, this doesn't appear to be the case.  I suppose most
people aren't scanning 10k+ hosts so it doesn't matter much.  For those
that do though, it really matters.

--min-hostgroup hasn't ever had an effect on ping scans, as far as I can
tell (not even with massping). Host discovery kind of does its own
thing, which is a legacy from massping.

--min-hostgroup is one of those options whose high-level function the man
page explains without properly capturing what it really does.  A line in
the man page such as "This option only affects port scanning.  Ping sweeps
(-sP) and host discovery are not affected." might help.


Since this value is already so large, using the value from
--min-hostgroup is probably not a good idea.  Perhaps a separate option,
such as --min-ping-group, would be better.

Hmm.

Although the results below are not totally convincing, I still like this
idea.  Hey, Nmap should let you shoot yourself in the foot if you really
want to, right? ;-p


I have preliminary results that show larger ping_groups (using either
--randomize-hosts, or recompiling) to really help.  Some of the scans
are still going though so I'll have to send a follow up email
illustrating this when I get home.

Okay, here are scan results.  To recap, my 'a' scan uses -T5 and has a
min/max parallelism of 1024 against ports 135,139,445,3389 on 180k
IPs.  My 'b' scan just uses -T5.
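For reference, the two scan configurations in the recap can be written out as Nmap argument lists. This is a sketch built only from the options named above; it assumes the 'b' scan targets the same ports and simply drops the parallelism options, and `TARGETS` is a hypothetical placeholder since the ~186k target addresses are not given.

```python
import shlex

TARGETS = ["TARGETS"]  # hypothetical placeholder for the real address ranges

# 'a': -T5 with min/max parallelism pinned at 1024, four Windows-related ports.
scan_a = ["nmap", "-T5",
          "--min-parallelism", "1024", "--max-parallelism", "1024",
          "-p", "135,139,445,3389", *TARGETS]
# 'b': assumed to be the same scan with only -T5 for timing.
scan_b = ["nmap", "-T5", "-p", "135,139,445,3389", *TARGETS]

print(shlex.join(scan_a))
print(shlex.join(scan_b))
```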

The 'a' scans were started simultaneously for the Nmap SVN trunk as well
as the massping-migration branch (MPM).  The same goes for the 'b'
scans.


david_mpm_r5824a.nmap:
# Nmap done at Tue Sep 11 20:54:22 2007 -- 186368 IP addresses (11790
hosts up) scanned in 288.591 seconds

david_nmap_r5824a.nmap:
# Nmap done at Tue Sep 11 20:54:22 2007 -- 186368 IP addresses (11936
hosts up) scanned in 287.305 seconds

All I can say here is whoa, that's close. And fast.


david_mpm_r5824b.nmap:
# Nmap done at Wed Sep 12 00:17:31 2007 -- 186368 IP addresses (15628
hosts up) scanned in 2640.259 seconds

david_nmap_r5824b.nmap:
# Nmap done at Wed Sep 12 00:49:08 2007 -- 186368 IP addresses (15901
hosts up) scanned in 4536.876 seconds

Okay, now your new code is starting to shine.  Much faster, almost as
accurate.
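A quick Python calculation puts numbers on "much faster, almost as accurate" for the 'b' runs. The figures are copied from the two summary lines above; note that the up-host ratio compares counts only, not the same set of hosts.

```python
# Figures from the david_mpm_r5824b / david_nmap_r5824b summaries.
mpm_secs, trunk_secs = 2640.259, 4536.876
mpm_up, trunk_up = 15628, 15901

speedup = trunk_secs / mpm_secs   # how many times faster MPM finished
saved = trunk_secs - mpm_secs     # wall-clock seconds saved
up_ratio = mpm_up / trunk_up      # MPM's up-host count relative to trunk's

print(f"speedup {speedup:.2f}x, saved {saved:.0f} s, up ratio {up_ratio:.1%}")
```

For these runs that works out to roughly 1.72x faster (about 1,897 seconds saved) with about 98.3% as many hosts reported up.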



There's a script called host-list-compare.py in the
nmap-massping-migration branch. It takes two .nmap log files and prints
out which hosts are up in the first that aren't in the second, and vice
versa. I'm curious to know whether the mpm runs just missed a bunch of
hosts, or they missed a bunch of hosts but found a bunch of others.
Please run that script on your logs for these most recent scans and
report back the two lines that say "N extra hosts in X.nmap".
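The thread does not include host-list-compare.py itself, so the following is only a hypothetical Python re-creation of the idea, not that script. It assumes each up host appears in the .nmap log on a line of the form `Host <address> appears to be up` (optionally `Host <name> (<address>) ...`), as in ping-scan normal output; port-scan logs word this differently, so the pattern is an assumption to adjust. Only IPv4 addresses are handled.

```python
import ipaddress
import re
import sys

# Assumed log line format, e.g. "Host foo.example.edu (10.0.0.2) appears to be up."
UP_LINE = re.compile(r"^Host (?:\S+ \()?([\d.]+)\)? appears to be up")

def up_hosts(lines):
    """Return the set of IPv4 addresses a normal-output log reports as up."""
    hosts = set()
    for line in lines:
        m = UP_LINE.match(line)
        if m:
            hosts.add(m.group(1))
    return hosts

def compare(name_a, lines_a, name_b, lines_b):
    a, b = up_hosts(lines_a), up_hosts(lines_b)
    # Hosts up in one log but not the other, in both directions.
    for name, extra in ((name_a, a - b), (name_b, b - a)):
        print(f"{len(extra)} extra hosts in {name}:")
        for host in sorted(extra, key=ipaddress.IPv4Address):
            print(host)

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        compare(sys.argv[1], fa, sys.argv[2], fb)
```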

Okay, just the comparisons between MPM and NMAP:

====
176 extra hosts in david_mpm_r5824b.nmap:
449 extra hosts in david_nmap_r5824b.nmap:
====

Very interesting results.  Of the 176, about 1/3 are wireless hosts and the
rest are a random sampling of hosts.  The 449 are almost exclusively a
particular build/deployment/image of Windows that we have distributed all
over campus. When I get a chance I'll try to figure out why that build of
Windows needs a slower scan (or what it is about the networking gear that we
typically use with this build).


====
1729 extra hosts in david_mpm_r5824a.nmap:
1875 extra hosts in david_nmap_r5824a.nmap:
====
These are a random sampling of hosts, all drawn from the same large set of
subnets.  If I had to describe the difference, I'd say that hosts in those
networks have a 50/50 chance of showing up.  Sometimes they show up in one
scan and not the other.
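As a sanity check on the comparison output, the difference between the two "extra hosts" counts must equal the difference between the two "hosts up" totals, because hosts found by both runs cancel out: |T - M| - |M - T| = |T| - |M|. A short Python check using only numbers from this message confirms that both pairs of runs are consistent, including the 176 figure.

```python
# (extra in trunk log, extra in MPM log, trunk hosts up, MPM hosts up)
runs = {
    "a": (1875, 1729, 11936, 11790),
    "b": (449, 176, 15901, 15628),
}

for name, (extra_trunk, extra_mpm, up_trunk, up_mpm) in runs.items():
    # Hosts common to both runs cancel, so these two differences must match.
    lhs = extra_trunk - extra_mpm
    rhs = up_trunk - up_mpm
    print(f"{name}: {lhs} vs {rhs}", "consistent" if lhs == rhs else "MISMATCH")
```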


If you are interested in the difference between the a/b scans for each Nmap
build, I'll generate those too.


And then, if you wouldn't mind, grep through the logs for a few of the
addresses that were missed and see what the differences are. It's
significant if, say, a response is received late (after the end of a
ping group, which --packet-trace would show) versus not received at all.
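A rough Python sketch of that check follows. It assumes --packet-trace lines shaped like `RCVD (35.0000s) ICMP 10.0.0.1 > 10.0.0.5 ...` (the exact format varies between Nmap versions), and `group_end`, the time the host's ping group finished, is a hypothetical input you would read off the log yourself. The function name is also hypothetical.

```python
import re

# Assumed trace shape: "SENT|RCVD (<seconds>s) <proto> <src[:port]> > <dst[:port]> ..."
TRACE = re.compile(r"^(SENT|RCVD) \((\d+(?:\.\d+)?)s\) \S+ (\S+) > ")

def classify(trace_lines, addr, group_end):
    """Report whether replies from addr never arrived, arrived late, or in time."""
    replies = []
    for line in trace_lines:
        m = TRACE.match(line)
        # For RCVD lines the first address is the responding host.
        if m and m.group(1) == "RCVD" and m.group(3).split(":")[0] == addr:
            replies.append(float(m.group(2)))
    if not replies:
        return "not received"
    return "in time" if min(replies) <= group_end else "late"
```

A "late" result would point at ping group timing, while "not received" would point at loss somewhere between the wire and Nmap.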

I'm happy to do this but it will have to wait until tomorrow.


I don't know what the hosts on your network are like. During my testing
for the migration, I found that there are some weird hosts out there
that consistently take 30 seconds or more to respond to a probe. If
those 30 seconds don't elapse before the ping group is finished, the
host is marked down. There's not much you can do about those except use
a larger ping group or less aggressive timing parameters.
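Why a larger group or gentler timing helps with such hosts can be shown with a toy Python model. This is not Nmap's real timing logic: it assumes probes go out at a constant rate and that a group finishes a fixed grace period after its last probe. The 200 probes/s rate and 5 s grace period are hypothetical values chosen for illustration.

```python
def fraction_caught(group_size, probe_rate, grace, latency):
    """Toy model: share of hosts whose reply lands before their group ends.

    A host probed at t = i / probe_rate answers at t + latency; the group is
    assumed to end `grace` seconds after its last probe.
    """
    group_end = (group_size - 1) / probe_rate + grace
    caught = sum(1 for i in range(group_size)
                 if i / probe_rate + latency <= group_end)
    return caught / group_size

# Hosts that always take 30 s to answer.
for size in (4096, 16384):
    print(size, f"{fraction_caught(size, 200, 5.0, 30.0):.1%}")
```

In this model a 4096-host group ends before any 30-second reply arrives, while a 16384-host group catches the early majority; a longer grace period (less aggressive timing) catches them all. Hosts probed near the end of any group remain at risk.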

I'm not suggesting that's what's going on with your scans, it's just one
explanation I thought of. Another is that maybe the more robust
congestion window actually is oversaturating the line enough that
responses are being dropped without being detected.

The following statements ignore wireless:

In the general case, I don't think this is possible.  There are a small
handful of routers on the network whose CPUs can't handle the huge number of
flows and drop packets.  For the most part though, the network is a
state-of-the-art multi-gigabit beast with 0 packet loss.

I agree that loss is occurring somewhere; I just don't think it is the
fault of the network.  I've seen other tools that use libpcap report
dropped packets once in a while.  Is it possible that Nmap isn't getting
the packets out and they are being dropped by libpcap, or that the
responses are getting dropped on the way in?


I'm going to follow up with tweaked PING_GROUP_SZ results, but here is a
preview.  I reran the david_mpm_r5824b.nmap scan (which took 2640 seconds)
with --randomize-hosts and shaved off 600 seconds:

david_mpm_r5824c.nmap:
# Nmap done at Wed Sep 12 01:27:32 2007 -- 186368 IP addresses (13327
hosts up) scanned in 2019.837 seconds

This scan did find 2k fewer hosts, but since it was run around 5pm
local time, some of this drop-off is hosts being turned off.

That's a nifty result. When we made the change to 4096 in ping groups, I
did some testing and didn't see much of a difference. I bet larger ping
groups have more of an effect on your network.

The preliminary results were better than the final tests.  I think this
deserves some more poking at on my part.


It looks like we're almost there with the migration. I've talked to
Fyodor, and I plan to merge this in the near future so that it can go
into a prerelease.

Sounds great.  Thanks for all your hard work tweaking this.  I'm really
excited to put this new code into production.


David Fifield


If anything else comes to mind that you want me to test, let me know.

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG54VfqaGPzAsl94IRAhetAJ4o0DA0ImLs+yXq0bLGJkAQCzughwCfZx9x
MVBBjzbyqDkFUIoO3J4Tlgs=
=Zd5r
-----END PGP SIGNATURE-----

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

