Nmap Development mailing list archives

[NSE][patch] More httpspider blacklist extensions, revamp function


From: Daniel Miller <bonsaiviking () gmail com>
Date: Wed, 13 Jun 2012 15:57:03 -0500

Hi list,

I was running into a problem with my XenServer instances, which host an MSI installer for XenCenter on a simple web server. Running any of the scripts that involve spidering resulted in this 43MB file being downloaded multiple times. I added "msi" to the list of default blacklisted extensions in httpspider.lua, and this solved the problem.

Of course, I couldn't stop there. I added more executable extensions ("msi", "bin"), archive extensions ("tgz", "tar.bz", "tar", "iso"), and a new category, document extensions (pdf, {doc,xls,ppt}{,x,m}, od[fsp], ps, xps).

I also noticed that the blacklist function created in Crawler:addDefaultBlacklist() was bloated: it contained four local table declarations, nested for loops, and string concatenation in the innermost loop. I converted it into a closure over a new table that needs only a single for loop and already contains the properly formatted match patterns. I also moved the url:getPath() call out of the loop, added a string.lower(), and cached the result in a local variable for the string.match() calls. Previously, uppercase extensions in a URL would not have been matched.
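
For illustration, here is a minimal sketch of the closure approach described above. It is not the attached patch: the names make_blacklist and is_blacklisted are hypothetical, the extension list is shortened, and the function takes a plain path string rather than a URL object with getPath().

```lua
-- Shortened, illustrative extension list (assumption, not the patch's table).
local extensions = { "msi", "bin", "tgz", "tar.bz", "tar", "iso", "pdf" }

-- Build the match patterns once, so no string concatenation happens per URL.
local patterns = {}
for _, ext in ipairs(extensions) do
  -- Escape Lua pattern magic characters such as the "." in "tar.bz".
  local escaped = ext:gsub("%p", "%%%0")
  -- Anchor at the end of the path: a literal dot, then the extension.
  patterns[#patterns + 1] = "%." .. escaped .. "$"
end

-- Hypothetical helper: returns a closure over the prebuilt pattern table.
local function make_blacklist(pats)
  return function(path)
    -- Lowercase once, outside the loop, so "FILE.MSI" matches too.
    local p = string.lower(path)
    for _, pat in ipairs(pats) do
      if string.match(p, pat) then
        return true
      end
    end
    return false
  end
end

local is_blacklisted = make_blacklist(patterns)
```

Calling is_blacklisted("/downloads/XenCenter.MSI") returns true here, while a path such as "/index.html" is not blacklisted.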

Patch attached.

Dan

Attachment: httpspider-blacklist.patch

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
