Nmap Development mailing list archives

[NSE] Sketch for XML/HTML parsing API


From: Lauri Kokkonen <lauri.u.kokkonen () gmail com>
Date: Thu, 19 Jan 2012 12:08:53 +0200

Hi,

First off, I am a student inspired by the possible GSoC money opportunity :P

I have come up with a sketch for an XML/HTML parsing API. The idea is to have a
method next() that returns the next bit of XML (start tag, attribute name,
etc.) from the input string. Along with next() there is state information for
keeping track of whether we are inside a tag or between tags (basically).
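
To make the idea of next() plus a bit of state more concrete, here is a minimal sketch in plain Lua. The Tokenizer name, the token types (start_tag, attr_name, attr_value, tag_close, text, end_tag, other) and the three state names are assumptions made for illustration; none of them come from an existing NSE library. Comments, CDATA sections and entities are not handled properly.

```lua
-- Hypothetical pull tokenizer; all names here are illustrative assumptions.
local Tokenizer = {}
Tokenizer.__index = Tokenizer

function Tokenizer.new(input)
  -- state: "text" between tags, "tag" inside a start tag,
  -- "value" right after the "=" of an attribute
  return setmetatable({ input = input, pos = 1, state = "text" }, Tokenizer)
end

-- Returns the next token as { type = ..., data = ... }, or nil at end of input.
function Tokenizer:next()
  local s = self.input
  if self.pos > #s then return nil end

  if self.state == "text" then
    local name, stop = s:match("^</%s*([%w:_%-]+)%s*>()", self.pos)
    if name then
      self.pos = stop
      return { type = "end_tag", data = name:lower() }
    end
    name, stop = s:match("^<(%a[%w:_%-]*)()", self.pos)
    if name then
      self.pos, self.state = stop, "tag"
      return { type = "start_tag", data = name:lower() }
    end
    if s:sub(self.pos, self.pos) == "<" then
      -- Comment, doctype and the like: crudely skip to the next ">".
      local close = s:find(">", self.pos, true) or #s
      local data = s:sub(self.pos, close)
      self.pos = close + 1
      return { type = "other", data = data }
    end
    local lt = s:find("<", self.pos, true) or (#s + 1)
    local data = s:sub(self.pos, lt - 1)
    self.pos = lt
    return { type = "text", data = data }
  end

  -- Inside a tag: skip whitespace first.
  self.pos = s:match("^%s*()", self.pos)
  if self.state == "value" then
    local value, stop = s:match('^"([^"]*)"()', self.pos)
    if not value then value, stop = s:match("^'([^']*)'()", self.pos) end
    if not value then value, stop = s:match("^([^%s>]*)()", self.pos) end
    self.pos, self.state = stop, "tag"
    return { type = "attr_value", data = value }
  end
  local stop = s:match("^/?>()", self.pos)
  if stop then
    self.pos, self.state = stop, "text"
    return { type = "tag_close", data = "" }
  end
  local name
  name, stop = s:match("^([^%s=/>]+)%s*()", self.pos)
  if not name then
    -- Stray "/" or "=": skip one character so we always make progress.
    self.pos = self.pos + 1
    return self:next()
  end
  local after_eq = s:match("^=()", stop)
  if after_eq then
    self.pos, self.state = after_eq, "value"
  else
    self.pos = stop
  end
  return { type = "attr_name", data = name:lower() }
end

-- Usage: print every token of a small input.
local t = Tokenizer.new('<a href="x.html" id=main>Link</a>')
for tok in function() return t:next() end do
  print(tok.type, tok.data)
end
```

Returning attribute values as their own tokens is one possible design choice; it is what lets a caller fetch a value with a plain next() call right after seeing the attribute name.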

Then we could build a set of useful methods around the core. For example,
find_start_tag() could find the next occurrence of the given start tag and
parse_attributes() could return a set of attributes given that we are
currently inside a tag. If needed it should be possible to extend the
interface with a SAX-style facility or even add DOM-like features such as
parsing a subtree into a data structure (as was sketched in another
related thread on this list [1]).
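
As a rough illustration of how such helpers could be layered on top of next(), the sketch below implements find_start_tag(), find_attribute() and parse_attributes() against anything that exposes a next() method. The Stream class is a hypothetical stand-in that simply replays a hand-written token list, and the token shape ({ type = ..., data = ... } with types such as start_tag, attr_name, attr_value and tag_close) is an assumption, not a settled design.

```lua
-- Hypothetical stand-in for the tokenizer: replays a fixed token list
-- through next() so the helpers can be shown in isolation.
local Stream = {}
Stream.__index = Stream

function Stream.new(tokens)
  return setmetatable({ tokens = tokens, i = 0 }, Stream)
end

function Stream:next()
  self.i = self.i + 1
  return self.tokens[self.i]
end

-- Advance until a token of the given type with one of the given names
-- is found; returns its name, or nil when the input is exhausted.
local function find(stream, token_type, names)
  local wanted = {}
  for _, n in ipairs(names) do wanted[n] = true end
  local tok = stream:next()
  while tok do
    if tok.type == token_type and wanted[tok.data] then return tok.data end
    tok = stream:next()
  end
  return nil
end

function Stream:find_start_tag(names) return find(self, "start_tag", names) end
function Stream:find_attribute(names) return find(self, "attr_name", names) end

-- Assumes the previous token was a start tag; collects name/value pairs
-- until the tag is closed. Valueless attributes map to "".
function Stream:parse_attributes()
  local attrs, name = {}, nil
  local tok = self:next()
  while tok and tok.type ~= "tag_close" do
    if tok.type == "attr_name" then
      name = tok.data
      attrs[name] = ""
    elseif tok.type == "attr_value" and name then
      attrs[name] = tok.data
      name = nil
    end
    tok = self:next()
  end
  return attrs
end

local function T(kind, data) return { type = kind, data = data } end

-- Tokens for: <p>hi</p><a href="/x"><img src="/y.png" alt>
local sample = {
  T("start_tag", "p"), T("tag_close", ""), T("text", "hi"), T("end_tag", "p"),
  T("start_tag", "a"), T("attr_name", "href"), T("attr_value", "/x"),
  T("tag_close", ""),
  T("start_tag", "img"), T("attr_name", "src"), T("attr_value", "/y.png"),
  T("attr_name", "alt"), T("tag_close", ""),
}

-- Tag-oriented style: find a tag, then look at its attribute table.
local x = Stream.new(sample)
local urls = {}
while x:find_start_tag({"a", "img", "script"}) do
  local a = x:parse_attributes()
  if a["href"] then urls[#urls + 1] = a["href"] end
  if a["src"] then urls[#urls + 1] = a["src"] end
end

-- Attribute-oriented style: find the attribute, then read its value token.
local y = Stream.new(sample)
local urls2 = {}
while y:find_attribute({"href", "src"}) do
  urls2[#urls2 + 1] = y:next().data
end

print(table.concat(urls, " "), table.concat(urls2, " "))
```

Note that the attribute-oriented style relies on the token after the attribute name being its value, which does not hold for valueless attributes; the tag-oriented style avoids that by going through parse_attributes().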

Something like the following would be useful for httpspider.lua:

  while x:find_start_tag({"a","img","script"}) do
    local a = x:parse_attributes()
    if a["href"] then ... end
    if a["src"] then ... end
  end

or maybe:

  while x:find_attribute({"href","src"}) do
    local url = x:next().data
    ...
  end

The following would be useful for http-generator.nse because it works
regardless of the order the attributes are in:

  while x:find_start_tag({"meta"}) do
    local a = x:parse_attributes()
    if a["name"] == "generator" then ... end
  end
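
To illustrate why attribute order matters, the sketch below compares a single fixed pattern with a lookup in an attribute table, assuming the generator is carried in a tag of the form <meta name="generator" content="...">. The fixed pattern is a hypothetical example of an order-dependent match, not a quote from http-generator.nse, and attributes_of() is an illustrative helper that only understands quoted values.

```lua
-- Collect the quoted attributes of one start tag into a table.
-- Simplified on purpose: unquoted and valueless attributes are ignored.
local function attributes_of(tag)
  local attrs = {}
  for name, _, value in tag:gmatch("([%w:_%-]+)%s*=%s*([\"'])(.-)%2") do
    attrs[name:lower()] = value
  end
  return attrs
end

-- Order-independent: look the attributes up by name.
local function generator_of(tag)
  local a = attributes_of(tag)
  if a["name"] and a["name"]:lower() == "generator" then
    return a["content"]
  end
  return nil
end

-- Order-dependent (hypothetical): expects name before content.
local function generator_by_pattern(tag)
  return tag:match('<meta name="generator" content="([^"]*)"')
end

local usual    = '<meta name="generator" content="WordPress 3.3">'
local reversed = '<meta content="WordPress 3.3" name="generator">'

print(generator_by_pattern(usual), generator_by_pattern(reversed))
print(generator_of(usual), generator_of(reversed))
```

The table lookup returns the same content for both tags, while the fixed pattern only matches the first one.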

One option is to implement this completely in Lua, maybe with the help of
LPeg. Another option is to use a combination of C/C++ and Lua. Is XML
parsing needed elsewhere in Nmap? Looking at a few scripts that parse
XML/HTML files, I think that libraries like expat and libxml2, at least, are
overkill for the purpose. For reference, that approach was suggested in
threads [2] and [3].

Lauri


[1] [NSE] XML Parser RFC
    http://seclists.org/nmap-dev/2011/q2/1281
    http://seclists.org/nmap-dev/2011/q3/25
[2] Add XML support to NSE
    http://seclists.org/nmap-dev/2009/q3/1093
[3] [NSE script] web application fingerprinting
    http://seclists.org/nmap-dev/2008/q3/462

