Nmap Development mailing list archives

Re: [RFC] Improve NSE HTTP architecture.


From: Djalal Harouni <tixxdz () opendz org>
Date: Sun, 19 Jun 2011 21:09:02 +0100

On Thu, Jun 16, 2011 at 05:17:50PM -0700, Fyodor wrote:
> On Tue, Jun 14, 2011 at 02:46:55PM +0100, Djalal Harouni wrote:
> > Currently there are more than 20 HTTP scripts; most of them are discovery
> > scripts that perform checks/tests in order to identify the HTTP
> > applications. These tests can be incorporated into the http-enum script to
> > reduce the size of the loaded and running code, and to achieve better
> > performance. Of course this will reduce the number of HTTP scripts,
> > but writing an entire NSE script for a simple check that can be done in
> > 5-10 Lua instructions is not the best solution either.

> Reducing the total code size and optimizing performance is indeed very
> important.  But of course we also have to keep user interface factors
> in mind.  Right now, many http discovery scripts such as html-title
> and http-robots.txt run by default with -A or -sC.  If we moved them
> into http-enum and users had to know about them and specify special
> arguments, I think that would dramatically reduce usage of the
> functionality.
I agree.

> > This proposal relies on some of the Nmap information that should be
> > exported to NSE scripts:
> >
> > * User specified script categories selection "--script='categories'".

> That would be easy to add, but I worry about what scripts would do
> with the information.  For example, suppose we have http-enum do vuln
> checks if the 'vuln' category was selected.  Well, then what if the
> user just specified script names specifically (which may or may not be
> in vuln category)?  What if user specified --script=all?  Maybe rather
> than try to reimplement the category selection functionality, the
> script(s) could be made to work with it.  For example, if the shared
> work is done in a library anyway, maybe you could have a small
> http-enum-vuln script which users could enable by name or category or
> whatever.
Yes, another small script like http-enum-vuln that loads the 'vuln' or
'exploit' fingerprints or matches is a good solution; this way we avoid
the one-script-per-vuln pattern, especially when a check is only 5 Lua
instructions. Loading fingerprints based on their categories should be
done by library code.
So I'll say: a script that loads the 'intrusive', 'exploit', 'dos'
and 'vuln' fingerprints and matches could be a popular script.

My main point on this is to use the same NSE categories, and not extra
categories like 'attack', etc.
The 'app' field in the fingerprint table can be used to identify the
application type.
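To make it concrete, such a script could be as small as this (an untested
sketch; httpenum.load_fingerprints and httpenum.run are invented names for
the assumed shared library, not existing code):

-- http-enum-vuln.nse (hypothetical script)
description = [[
Runs only the HTTP fingerprints tagged 'vuln', 'exploit', 'intrusive'
or 'dos' from http-matchers.lua.
]]

categories = {"vuln", "intrusive"}

require "shortport"
require "httpenum"  -- the assumed shared matching library

portrule = shortport.http

action = function(host, port)
  -- the library filters fingerprints by their 'categories' field
  local fps = httpenum.load_fingerprints({"vuln", "exploit",
                                          "intrusive", "dos"})
  return httpenum.run(host, port, fps)
end

All the real work (fingerprint storage and the matching engine) stays in
the library, so this script adds almost no code of its own.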

> > 5) Crawler and http-enum:
> >
> > then http-enum with its matching code and other HTTP scripts can be
> > in a situation where they will not yield, since there are no network
> > operations.  A solution in the http-enum matching code (this is the
> > bulk of the code) would be to use coroutines and make them yield explicitly.

> Have you experienced this problem or is it just speculation?  It is
> probably worth trying to reproduce it (if you haven't already) before
> spending much time trying to fix it.
It's rather based on speculation.
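If it does show up in practice, the fix I have in mind looks roughly like
this (a plain Lua illustration, untested against the NSE scheduler;
run_matches is a made-up placeholder for the real matching code):

-- Run the heavy matching loop inside its own coroutine and yield
-- control every N fingerprints instead of only on network I/O.
local function make_matcher(fingerprints, response)
  return coroutine.create(function()
    for i, fp in ipairs(fingerprints) do
      run_matches(fp, response)  -- placeholder for the matching code
      if i % 50 == 0 then
        coroutine.yield(i)       -- give other threads a chance to run
      end
    end
  end)
end

-- Given some fingerprints table and an HTTP response, the caller
-- resumes the matcher until it finishes.
local co = make_matcher(fingerprints, response)
while coroutine.status(co) ~= "dead" do
  assert(coroutine.resume(co))
end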


> > So currently we consider that the crawler, which is a discovery
> > script, and other discovery scripts like http-enum must run at the
> > same dependency level.

> For what it is worth, I had been assuming that the crawler would be a
> library.  A script which needs spidering services would activate the
> library and tell it what information is needed.  The spider library
> would store (probably up to some limit) results so that it may not
> have to make as many (or even any) requests when the next script asks
> for similar information.
I agree, and perhaps we'll also have a special, fully capable crawler script.
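The caching part of the library could look something like this (a rough
sketch; every name here is invented):

-- spider.lua (hypothetical library)
local spider = {}
local cache = {}  -- "ip:port" -> table of crawled pages

-- Scripts activate the spider and describe what they need; results
-- are cached so the next caller may not trigger any new requests.
function spider.crawl(host, port, opts)
  local key = host.ip .. ":" .. port.number
  if cache[key] then
    return cache[key]  -- serve the result of an earlier crawl
  end
  local pages = {}
  -- ... fetch pages breadth-first here, up to opts.max_pages ...
  cache[key] = pages
  return pages
end

return spider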


> > 6) Improve HTTP fingerprints and http-enum:
> > -------------------------------------------

> This one seems pretty independent from some of your other suggestions.
> So, if this is desired, at least it could be implemented at any time.
> I do agree with you that it is often best to combine many similar http
> tasks in one script and that there is room to enhance http-enum to do
> a lot of that.
>
> I do think we should try to avoid bloating things such that users need
> to specify extra arguments to effectively use scripts.  At least for
> important/common scripts like the http-enum stuff.  Required options are
> more reasonable for obscure/special-purpose scripts.

> > * http-brute: the design of this script can be improved a lot.
> >   If the crawler and http-enum script are running, then a match table
> >   dynamically registered by the http-brute script, which checks the
> >   returned status code and the 'www-authenticate' header field, will be
> >   used by the http-enum script to discover multiple protected paths.
> >   These paths can be saved in the registry by the match's misc handler,
> >   and later the http-brute script will try to brute force them.
> >   So in this situation http-brute will depend on the http-enum script.

> I agree that it would be great for http-brute to be able to use
> information from enumeration/spidering scripts/libraries.  Though of
> course the user should be able to use it to brute force a specific
> page instead if desired.
We can make http-brute insert fingerprints or matches dynamically,
which will be processed by http-enum; a match handler will save the paths
in the registry for later use, without changing the current behaviour when
a user specifies the path.
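Something like this (sketch only; httpenum.add_matches is an invented
entry point, though nmap.registry is real):

-- Inside http-brute (hypothetical), registered before http-enum runs:
httpenum.add_matches {
  status_code = 401,
  header = { ['www-authenticate'] = "(.+)" },
  -- misc handler: remember every protected path for later brute forcing
  handler = function(host, port, path)
    nmap.registry["http-brute"] = nmap.registry["http-brute"] or {}
    table.insert(nmap.registry["http-brute"], path)
  end,
}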

> > * http-auth: we have already said that this can be converted into a general
> >   match in the http-matchers.lua file. The downside of this is that we will
> >   remove this script. If we don't want to remove the script we can modify it
> >   to make it register that match dynamically.

> Well, a key feature of that script is that it runs by default and
> includes a piece of information which is quickly and easily determined
> (whether authentication is required at the root of the given web
> server).  So we wouldn't want to remove this script until we have a
> way to replicate that behavior, I think.  So the combined script would
> have to run by default, I guess.
Perhaps we can reproduce the adding targets feature for this specific
purpose. As I've said before, scripts should be able to register
fingerprints and matches dynamically, so perhaps we can add:

httpenum.lua library:

-- A global flag, set to true to activate the behavior,
-- e.g. when httpenum.lua is loaded by one of the http-enum scripts;
-- this should be automatic for other scripts, without requiring
-- script arguments.
httpenum.NEW_FINGERPRINTS = false

-- This function checks the NEW_FINGERPRINTS flag before
-- inserting new fingerprints.
function httpenum.add_fingerprints(...)
  if not httpenum.NEW_FINGERPRINTS then return end
  -- append each given fingerprint to the table used by http-enum
end


Script rules can call this function to insert new fingerprints. With
this solution we do not remove the current http-auth behaviour, and we
make it smarter.
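For example, http-auth's rule could then register its check like this
(a sketch; the exact fingerprint fields follow my earlier proposal and
are not final):

-- http-auth.nse (sketch): register the root-auth check dynamically
require "shortport"
require "httpenum"

portrule = function(host, port)
  httpenum.add_fingerprints({
    categories = {'default', 'auth', 'safe'},
    probes = { {path = '/', method = 'GET'} },
    matches = {
      { status_code = 401,
        header = { ['www-authenticate'] = "(.+)" } },
    },
  })
  return shortport.http(host, port)
end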


> > * http-date: we can also convert this script to a simple general fingerprint
> >   or make the script register the fingerprint dynamically.
> >   fingerprint {
> >       categories = {'discovery', 'safe'},
> >       probes = { {path = '/', method = 'HEAD'} },
> >       matches = {
> >           { status_code = 200,
> >             header = { ['date'] = "(.+)" },
> >             output_handler = function(#header.date_1#)
> >                 -- parse #header.date_1#
> >             end },
> >       },
> >   }

> Well, besides being default, http-date offers some nice features such
> as telling the user how much the remote time differs from local time.
> And we don't win much from eliminating this script since it is only 44
> lines long (including documentation and empty lines).
Ok.
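Right, the skew output is the valuable part. For reference, that
computation is essentially the following (a standalone Lua sketch, not
the script's actual code):

-- Rough clock-skew computation from an HTTP Date header.
local months = {Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6,
                Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12}

local function clock_skew(date_header)
  -- RFC 1123 format, e.g. "Sun, 19 Jun 2011 20:09:02 GMT"
  local d, mon, y, h, mi, s = date_header:match(
    "%a+, (%d+) (%a+) (%d+) (%d+):(%d+):(%d+) GMT")
  if not d then return nil end
  local server = os.time{year = tonumber(y), month = months[mon],
                         day = tonumber(d), hour = tonumber(h),
                         min = tonumber(mi), sec = tonumber(s)}
  -- both timestamps are UTC fields interpreted as local time,
  -- so the timezone offset cancels out in the difference
  local now_utc = os.time(os.date("!*t"))
  return os.difftime(server, now_utc)  -- seconds the server is ahead
end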

> I guess deciding when it is better to split or combine scripts is a
> very tough decision.  We faced that last week with Gorjan's
> ip-geolocation script.  At first he combined several geolocation
> providers into one script, but later split it into five scripts.
> Which is better?  I don't know.  Each approach has advantages and
> drawbacks.  I guess a key is to identify the general factors we should
> use when deciding whether to split or combine scripts.  Because if we
> have some folks busily combining scripts while others are busy
> splitting them up, we don't make much progress.
A standard that can help us make the right decision should be added
to the Nmap NSE documentation.

Thanks.

-- 
tixxdz
http://opendz.org
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/

