Nmap Development mailing list archives
[NSE] Sketch for XML/HTML parsing API
From: Lauri Kokkonen <lauri.u.kokkonen () gmail com>
Date: Thu, 19 Jan 2012 12:08:53 +0200
Hi,

First off, I am a student inspired by the possible GSoC money opportunity :P

I have come up with a sketch for an XML/HTML parsing API. The idea is to have a method next() that returns the next bit of XML (start tag, attribute name, etc.) from the input string. Along with next() there is state information for keeping track of whether we are inside a tag or between tags (basically). Then we could build a set of useful methods around the core. For example, find_start_tag() could find the next occurrence of the given start tag, and parse_attributes() could return a set of attributes, given that we are currently inside a tag. If needed, it should be possible to extend the interface with a SAX-style facility or even add DOM-like features such as parsing a subtree into a data structure (as was sketched in another related thread on this list [1]).

Something like the following would be useful for httpspider.lua:

    while x:find_start_tag({"a", "img", "script"}) do
      local a = x:parse_attributes()
      if a["href"] then
        -- handle a["href"]
      end
      if a["src"] then
        -- handle a["src"]
      end
    end

or maybe:

    while x:find_attribute({"href", "src"}) do
      local url = x:next().data
      -- ...
    end

The following would be useful for http-generator.nse because it works whatever order the attributes are in (the page declares <meta name="generator" content="...">):

    while x:find_start_tag({"meta"}) do
      local a = x:parse_attributes()
      if a["name"] == "generator" then
        -- a["content"] holds the generator string
      end
    end

One option is to implement this completely in Lua, maybe with the help of LPeg. Another option is to use a combination of C/C++ and Lua.

Is XML parsing needed elsewhere in Nmap? Looking at a few scripts that parse XML/HTML files, I think that libraries like expat and libxml2 are overkill, at least for this purpose. For reference, that approach was suggested in threads [2] and [3].
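As an illustration of the core described above, here is a minimal pure-Lua sketch (no LPeg) of next() with the inside-tag/between-tags state, plus find_start_tag() and parse_attributes() built on top of it. It is a sketch under assumptions, not an existing Nmap library: the constructor new_parser(), the token types (start_tag, end_tag, text, attr_name, attr_value, tag_end) and the { type, data } token shape are hypothetical, with the data field chosen to match the x:next().data usage. It also assumes roughly well-formed input: comments, <!DOCTYPE> and <?xml ?> are skipped, entities are not decoded, and <script> contents get no special treatment.

```lua
-- Hypothetical sketch of the proposed pull-parser API; not an existing Nmap library.
local Parser = {}
Parser.__index = Parser

-- pos is the read position; in_tag tracks "inside a tag" vs "between tags".
local function new_parser(s)
  return setmetatable({ s = s, pos = 1, in_tag = false }, Parser)
end

-- Return the next token { type = ..., data = ... }, or nil at end of input.
-- Token types: start_tag, end_tag, text, attr_name, attr_value, tag_end.
function Parser:next()
  local s = self.s
  if self.pos > #s then return nil end
  if self.in_tag then
    local _, e = s:find("^%s*", self.pos)          -- skip whitespace inside the tag
    self.pos = e + 1
    if self.pos > #s then return nil end
    local c = s:sub(self.pos, self.pos)
    if c == ">" or s:sub(self.pos, self.pos + 1) == "/>" then
      local close = (c == ">") and ">" or "/>"
      self.pos, self.in_tag = self.pos + #close, false
      return { type = "tag_end", data = close }
    elseif c == "=" then                              -- value of the preceding attribute
      local _, eq = s:find("^=%s*", self.pos)
      local p = eq + 1
      local q = s:sub(p, p)
      if q == '"' or q == "'" then                    -- quoted value
        local close = s:find(q, p + 1, true) or (#s + 1)
        self.pos = close + 1
        return { type = "attr_value", data = s:sub(p + 1, close - 1) }
      end
      local a, b = s:find("^[^%s>]*", p)             -- unquoted value
      self.pos = b + 1
      return { type = "attr_value", data = s:sub(a, b) }
    end
    local a, b = s:find("^[^%s=>/]+", self.pos)
    if not a then                                     -- stray character such as "/": skip it
      self.pos = self.pos + 1
      return self:next()
    end
    self.pos = b + 1
    return { type = "attr_name", data = s:sub(a, b):lower() }
  end
  -- Between tags: end tag, start tag, comment/declaration, or text.
  local a, b, name = s:find("^</%s*([%w:_%-]+)[^>]*>", self.pos)
  if a then
    self.pos = b + 1
    return { type = "end_tag", data = name:lower() }
  end
  a, b, name = s:find("^<([%w:_%-]+)", self.pos)
  if a then
    self.pos, self.in_tag = b + 1, true
    return { type = "start_tag", data = name:lower() }
  end
  local c2 = s:sub(self.pos + 1, self.pos + 1)
  if s:sub(self.pos, self.pos) == "<" and (c2 == "!" or c2 == "?") then
    -- A comment, <!DOCTYPE ...> or <?xml ...?>: skip past its closer.
    local closer = (s:sub(self.pos, self.pos + 3) == "<!--") and "-->" or ">"
    local _, e = s:find(closer, self.pos + 2, true)
    self.pos = (e or #s) + 1
    return self:next()
  end
  -- Text up to the next "<" (a "<" that starts none of the above is kept as text).
  local close = s:find("<", self.pos + 1, true) or (#s + 1)
  local text = s:sub(self.pos, close - 1)
  self.pos = close
  return { type = "text", data = text }
end

-- Skip ahead to the next start tag whose name is in `names`; return the name, or nil.
function Parser:find_start_tag(names)
  local wanted = {}
  for _, n in ipairs(names) do wanted[n:lower()] = true end
  for tok in function() return self:next() end do
    if tok.type == "start_tag" and wanted[tok.data] then return tok.data end
  end
  return nil
end

-- Consume the rest of the current tag; return its attributes keyed by lower-cased name.
-- Attributes without a value (e.g. <input disabled>) map to true.
function Parser:parse_attributes()
  local attrs, last = {}, nil
  while self.in_tag do
    local tok = self:next()
    if not tok then break end
    if tok.type == "attr_name" then
      last = tok.data
      attrs[last] = true
    elseif tok.type == "attr_value" and last then
      attrs[last], last = tok.data, nil
    end
  end
  return attrs
end
```

Because parse_attributes() returns a table keyed by lower-cased attribute name, a lookup such as a["name"] does not depend on attribute order or case, which is the property the http-generator case relies on. find_attribute() is not sketched here; it could be built the same way by scanning for attr_name tokens.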
Lauri

[1] [NSE] XML Parser RFC
    http://seclists.org/nmap-dev/2011/q2/1281
    http://seclists.org/nmap-dev/2011/q3/25
[2] Add XML support to NSE
    http://seclists.org/nmap-dev/2009/q3/1093
[3] [NSE script] web application fingerprinting
    http://seclists.org/nmap-dev/2008/q3/462

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/