Nmap Development mailing list archives
Automatic sorting of nmap-service-probes
From: David Fifield <david () bamsoftware com>
Date: Thu, 2 Feb 2012 21:33:05 -0800
This is a description of a project, the work to achieve which exceeds its benefits, but which could be fun for a hacker with some CS chops. The main idea is to identify when two different regular expressions can be matched by the same string, and use this to sort nmap-service-probes in a way that preserves the semantics of its ordering. I often want nmap-service-probes to be sorted broadly by server. For example, under the GetRequest probe, there is a big block of Apache patterns, a block of Xerox printer patterns, and so on. But many similar patterns are spread out in the file, because they just get added in the order they are submitted. Sometimes when I'm doing submissions I'll separate out a block once it's big enough, but I would really like that to happen mostly automatically (just by sorting on the p// field, for example). What prevents this from being easy is things like this: # Needs to go before the Apache match lines -Doug match http-proxy m|^HTTP/1\.[01] \d\d\d .*\r\nServer: Apache\r\n.*X-orenosp-filt:|s p/Orenosp reverse http proxy/ ... match http m|^HTTP/1\.[01] \d\d\d.*\r\nDate: .*\r\nServer: Apache\r\n| p/Apache httpd/ cpe:/a:apache:http_server/ When the same string can match two different patterns, the order matters. (Nmap takes the first match.) Here, this HTTP proxy claims to be Apache, but has a distinctive header field. If the order were reversed, the later, very generic Apache match would prevent the HTTP proxy from ever matching. When there is no string that can match two different patterns, they can me moved around without restrictions. So the main goal is to write an algorithm that decides whether there is a string that can match both of two given regular expressions. Then we'll run it against every pair of matchlines within a service probe; the output of the algorithm will be a list of pairs of lines that interfere with each other. Any regular expression is equivalent to an NFA. A pair of regular expressions is also equivalent to an NFA, one with two disjoint sets of states: the accept states of the joint NFA are those that represent a set of states that contain an accept state from both of the sub-NFAs. Any NFA is in turn equivalent to a DFA, which you can look at as a directed graph; then finding a string that satisfies both regular expressions is the same as finding a path through the graph from the start state to any accept state. If no such path exists, then the regular expresions cannot be matched by the same string. Finding a path could take exponential time in the worst case because an equivalent DFA may be exponentially large in the size of an NFA. But I'll bet that doesn't happen with the typical regular expressions in our database. You don't have to explicitly build the full DFA, only do it incrementally as you search for a path through the graph. You should read this good article: http://swtch.com/~rsc/regexp/regexp1.html Maybe there will be too many conflicts to be useful. It might not be obvious that the following two patterns can be matched by a single string: match http m|^HTTP/1\.0 200 OK\r\n.*Server: Apache\r\n|s match http m|^HTTP/1\.0 200 OK\r\n.*Server: IIS\r\n|s I would not have qualms about reordering those, because it's unlikely that a server responds with "Server: Apache\r\nServer: IIS\r\n". But if such cases are few enough, perhaps we can deal with them manually. David Fifield _______________________________________________ Sent through the nmap-dev mailing list http://cgi.insecure.org/mailman/listinfo/nmap-dev Archived at http://seclists.org/nmap-dev/
Current thread:
- Automatic sorting of nmap-service-probes David Fifield (Feb 02)