Nmap Development mailing list archives
[RFC] Mass rDNS performance tweak
From: jah <jah () zadkiel plus com>
Date: Wed, 14 Jan 2009 09:40:13 +0000
Hi folks,

I've found that when performing reverse DNS resolution using nmap's async resolver and my ISP's DNS servers, the accuracy of the results is about 65-75%, depending on the number of DNS servers being used, and that somewhere between 2 and 3 requests are sent per target. Here's a typical stat:

250 IPs took 13.70s. Mode: Async [#: 1, OK: 171, NX: 0, DR: 79, TO: 363, SF: 0, TR: 534, CN: 0]

Here you can see I'm querying a single (#: 1) server. We got OK: 171 PTR record responses and NX: 0 "No such name" responses. We dropped DR: 79 targets because we tried and failed 3 times to get a PTR record. TO: 363 requests timed out (a stat not currently implemented in nmap), there were SF: 0 server fail responses, we transmitted a total of TR: 534 requests and got CN: 0 CNAME responses.

The same 250 target IPs, which had been generated with:

nmap -sL -iR 1100 --system-dns | grep "[0-9])" | awk '{ print $3 }' \
  | sed 's/(\([^)]*\))/\1/g' | sort -n | uniq | head -n 250 > ptr_list

(all having PTR records) were resolved using nmap --system-dns in about 50 seconds. So whilst the async resolving is quick, you can see that we failed to get the PTR records for 79 targets and that 534 requests were sent.

The mass rdns code is quick because it maintains a certain number of outstanding requests "on the wire" for each DNS server being queried. This number is dns_server_s::capacity. The capacity for each server begins at CAPACITY_MIN (currently with a value of 10) and is adjusted up and down in an attempt to keep as many requests on the wire as is reasonable without exceeding CAPACITY_MAX (200). When a response is received from a server, its capacity is increased by CAPACITY_UP_STEP (currently 2). When a request times out, the capacity is reduced by a factor of CAPACITY_MINOR_DOWN_SCALE (currently 0.9). If a request times out after all retries are exhausted at that server, the capacity is further reduced by a factor of CAPACITY_MAJOR_DOWN_SCALE (currently 0.7).
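The adjustment rules just described can be sketched roughly as follows. This is a minimal model under stated assumptions, not the nmap code: the struct ServerModel and the function names are hypothetical, and only the constants (10, 200, 2, 0.9, 0.7) and the integer truncation come from the message and the attached patch.

```cpp
#include <algorithm>
#include <cassert>

// Pre-patch constants as quoted in the message.
constexpr int kCapacityMin = 10;
constexpr int kCapacityMax = 200;
constexpr int kCapacityUpStep = 2;
constexpr double kMinorDownScale = 0.9;
constexpr double kMajorDownScale = 0.7;

// Hypothetical stand-in for the per-server state; only capacity is modelled.
struct ServerModel {
  int capacity = kCapacityMin;
};

// Keep capacity within [kCapacityMin, kCapacityMax].
inline void clamp_capacity(ServerModel &s) {
  s.capacity = std::max(kCapacityMin, std::min(s.capacity, kCapacityMax));
}

// A response arrived: additive increase.
inline void on_response(ServerModel &s) {
  s.capacity += kCapacityUpStep;
  clamp_capacity(s);
}

// A request timed out: multiplicative decrease, truncated to an integer.
inline void on_timeout(ServerModel &s) {
  assert(s.capacity >= kCapacityMin);
  s.capacity = static_cast<int>(s.capacity * kMinorDownScale);
  clamp_capacity(s);
}

// The timed-out request has no retries left at this server: a further,
// larger decrease. A caller would apply on_timeout() first, then this.
inline void on_retries_exhausted(ServerModel &s) {
  s.capacity = static_cast<int>(s.capacity * kMajorDownScale);
  clamp_capacity(s);
}
```

In this model twenty responses raise the capacity from 10 to 50, whereas a single timeout only takes it back to 45, which is one way to see how the capacity can run ahead of what the path to the server tolerates.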

For the DNS servers I have use of, this algorithm is too aggressive and something between me and the server drops requests - I imagine when they are sent too quickly. I've found that, firstly, CAPACITY_MIN is too high: if I set this to a value of 2 I get more accurate results. Secondly, by the time enough of a period has elapsed to detect that requests have timed out, enough responses may have been received to raise the capacity well beyond a reasonable level, and this usually leads to further timed-out requests later on.

For instance, imagine that some of the first 10 requests sent will time out in four seconds and that every time we get a response we put 3 more on the wire (one to replace the completed one and two to step up to the increased capacity). We might have 30, 40, even 50 requests on the wire by the time those four seconds are up. As requests time out, the capacity falls and may fall all the way back to 10 quite quickly - so rather than maintaining an optimum capacity, what I see are wild fluctuations. If I set CAPACITY_MIN to 2 and CAPACITY_MAX to about 6 or 7 I get very nearly 100% accuracy and the total time for resolution isn't hugely more than it is currently.

Another issue is that timed-out requests aren't necessarily an indicator of the need to reduce capacity, because they may have timed out for other reasons such as a non-responsive nameserver. It's basically very difficult to determine the optimum capacity!

Obviously, the values for CAPACITY_MIN and CAPACITY_MAX which work nicely for me may be well below the optimum for other users, so I've tried adjusting the degrees by which capacity is increased and decreased. I've also tried various methods to dampen the fluctuations and to settle at a reasonable capacity where we get a good trade-off between accuracy and speed.
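The "30, 40, even 50" figure above follows from simple arithmetic, sketched here as a small function. The names are hypothetical; the values 10, 2 and 200 are the current constants quoted earlier, and the model assumes no timeout has been detected yet, so no decrease has been applied.

```cpp
#include <algorithm>
#include <cassert>

// Current (pre-patch) values quoted in the message.
constexpr int kStartCapacity = 10;  // CAPACITY_MIN
constexpr int kUpStep = 2;          // CAPACITY_UP_STEP
constexpr int kCeiling = 200;       // CAPACITY_MAX

// Capacity reached after `responses` replies have arrived, before any
// request has had time to time out.
inline int capacity_before_first_timeout(int responses) {
  assert(responses >= 0);
  return std::min(kStartCapacity + kUpStep * responses, kCeiling);
}
```

With 10 responses inside the first four seconds the capacity is already 30, and with 20 responses it is 50, before a single timeout has had the chance to pull it back.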

Some of the things I've tried (in addition to experimenting with the variables for the current algorithm) are:

  - Introduce delays between increases in capacity to allow timeouts to balance the increases.
  - Maintain a ratio of timed-out requests to responses and increase capacity only when the ratio is below some threshold.
  - Decouple the starting capacity from the minimum capacity so that we can start higher than the minimum, but drop if necessary.
  - Various combinations of all of these.

Right now, I've had the best results by doing the following:

  - We start with a capacity of 2 and don't increase this value until the read timeout for the first request has elapsed (4s if using one DNS server and 2.5 seconds if using more than one).
  - Reset the timer after the first capacity increase and allow a maximum of 50 capacity increases during that period (again the read timeout). This repeats until resolution is complete.
  - Each capacity increase (a maximum of 0.1) and each decrease is linked to the ratio of timed-out requests to responses (drop_ratio):

      capacity -= (float) drop_ratio
      capacity += 1 / (100 * MAX(drop_ratio, 0.1))

  - CAPACITY_MAJOR_DOWN_SCALE is no longer applied because I feel that a request that does not complete is much less likely to be capacity related now that the algorithm is less aggressive.

This means that, depending on the drop ratio, capacity increases by at most 5 (50 increases of at most 0.1) during any timeslot, which allows timed-out requests to balance the increases with decreases. This happens in small steps and the theory is that we should gradually approach the optimum capacity and then wobble fairly close to it thereafter. It may well not be perfect and I offer the attached patch so you can try it out to see how it affects resolution speed and accuracy for you.

The above scan with this patch gave me:

250 IPs took 15.30s. Mode: Async [#: 1, OK: 250, NX: 0, DR: 0, TO: 10, SF: 0, TR: 260, CN: 1]

I'm particularly interested to know whether the current CAPACITY_MAX of 200 is sane for anyone because, with this patch, it would take quite a long time to reach that amount.

Hope it works for you!

jah

--- nmap_dns.cc.orig	2009-01-13 22:54:40.953125000 +0000
+++ nmap_dns.cc	2009-01-13 22:49:53.656250000 +0000
@@ -214,11 +214,10 @@
   { 2500, 3000, -1, -1 }, // 3+ servers
 };
 
-#define CAPACITY_MIN 10
+#define CAPACITY_MIN 2
+#define CAPACITY_START 2
 #define CAPACITY_MAX 200
-#define CAPACITY_UP_STEP 2
-#define CAPACITY_MINOR_DOWN_SCALE 0.9
-#define CAPACITY_MAJOR_DOWN_SCALE 0.7
+#define CAPACITY_UP_PER_PERIOD 50
 
 // Each request will try to resolve on at most this many servers:
 #define SERVERS_TO_TRY 3
@@ -255,9 +254,15 @@
   sockaddr_in addr;
   nsock_iod nsd;
   int connected;
-  int reqs_on_wire;
-  int capacity;
   int write_busy;
+  int reqs_on_wire;
+  int reqs_completed;
+  int reqs_timedout;
+  float drop_ratio;
+  float capacity;
+  int cpcty_up_allowed;
+  int cpcty_up_count;
+  struct timeval next_cpcty_up_time;
   std::list<request *> to_process;
   std::list<request *> in_process;
 };
@@ -290,7 +295,7 @@
 /* The DNS cache, not just for entries from /etc/hosts. */
 static std::list<host_elem *> etchosts[HASH_TABLE_SIZE];
 
-static int stat_actual, stat_ok, stat_nx, stat_sf, stat_trans, stat_dropped, stat_cname;
+static int stat_actual, stat_ok, stat_nx, stat_sf, stat_trans, stat_dropped, stat_to, stat_cname;
 static struct timeval starttv;
 static int read_timeout_index;
 static u16 id_counter;
@@ -318,16 +323,16 @@
   memcpy(&now, nsock_gettimeofday(), sizeof(struct timeval));
 
   if (o.debugging && (tp%SUMMARY_DELAY == 0))
-    log_write(LOG_STDOUT, "mass_rdns: %.2fs %d/%d [#: %lu, OK: %d, NX: %d, DR: %d, SF: %d, TR: %d]\n",
+    log_write(LOG_STDOUT, "mass_rdns: %.2fs %d/%d [#: %lu, OK: %d, NX: %d, DR: %d, TO: %d, SF: %d, TR: %d]\n",
               TIMEVAL_MSEC_SUBTRACT(now, starttv) / 1000.0,
               tp, stat_actual,
-              (unsigned long) servs.size(), stat_ok, stat_nx, stat_dropped, stat_sf, stat_trans);
+              (unsigned long) servs.size(), stat_ok, stat_nx, stat_dropped, stat_to, stat_sf, stat_trans);
 }
 
 static void check_capacities(dns_server *tpserv) {
   if (tpserv->capacity < CAPACITY_MIN) tpserv->capacity = CAPACITY_MIN;
   if (tpserv->capacity > CAPACITY_MAX) tpserv->capacity = CAPACITY_MAX;
-  if (o.debugging >= TRACE_DEBUG_LEVEL) log_write(LOG_STDOUT, "CAPACITY <%s> = %d\n", tpserv->hostname, tpserv->capacity);
+  if (o.debugging >= TRACE_DEBUG_LEVEL) log_write(LOG_STDOUT, "CAPACITY <%s> = %.2f\n", tpserv->hostname, tpserv->capacity);
 }
 
 // Closes all nsis created in connect_dns_servers()
@@ -467,15 +472,16 @@
       if (tp > 0 && tp < min_timeout) min_timeout = tp;
 
       if (tp <= 0) {
-        tpserv->capacity = (int) (tpserv->capacity * CAPACITY_MINOR_DOWN_SCALE);
+        stat_to++;
+        tpserv->reqs_timedout++;
+        tpserv->drop_ratio = (float) tpserv->reqs_timedout / (tpserv->reqs_completed > 0 ? tpserv->reqs_completed : 1);
+        tpserv->capacity -= tpserv->drop_ratio;
         check_capacities(tpserv);
         tpserv->in_process.erase(reqI);
         tpserv->reqs_on_wire--;
 
         // If we've tried this server enough times, move to the next one
         if (read_timeouts[read_timeout_index][tpreq->tries] == -1) {
-          tpserv->capacity = (int) (tpserv->capacity * CAPACITY_MAJOR_DOWN_SCALE);
-          check_capacities(tpserv);
 
           servItemp = servI;
           servItemp++;
@@ -495,8 +501,8 @@
             // **** We've already tried all servers... give up
             if (o.debugging >= TRACE_DEBUG_LEVEL) log_write(LOG_STDOUT, "mass_rdns: *DR*OPPING <%s>\n", tpreq->targ->targetipstr());
 
-            output_summary();
             stat_dropped++;
+            output_summary();
 
             total_reqs--;
             delete tpreq;
@@ -528,6 +534,7 @@
   std::list<request *>::iterator reqI;
   dns_server *tpserv;
   request *tpreq;
+  struct timeval now;
 
   for(servI = servs.begin(); servI != servs.end(); servI++) {
     tpserv = *servI;
@@ -540,9 +547,23 @@
       if (ia != 0 && tpreq->targ->v4host().s_addr != ia)
         continue;
 
+      tpserv->reqs_completed++;
+      tpserv->drop_ratio = (float) tpserv->reqs_timedout / tpserv->reqs_completed;
+
       if (action == ACTION_CNAME_LIST || action == ACTION_FINISHED) {
-        tpserv->capacity += CAPACITY_UP_STEP;
-        check_capacities(tpserv);
+        memcpy(&now, nsock_gettimeofday(), sizeof(struct timeval));
+
+        if (tpserv->cpcty_up_allowed == 0 && TIMEVAL_MSEC_SUBTRACT(tpserv->next_cpcty_up_time, now) < 0) {
+          tpserv->cpcty_up_allowed = 1;
+          tpserv->cpcty_up_count = 1;
+          TIMEVAL_MSEC_ADD(tpserv->next_cpcty_up_time, now, read_timeouts[read_timeout_index][0]);
+        }
+
+        if (tpserv->cpcty_up_allowed == 1) {
+          tpserv->capacity += ( 1 / (100 * MAX(tpserv->drop_ratio, 0.1)) );
+          check_capacities(tpserv);
+          if (tpserv->cpcty_up_count++ == CAPACITY_UP_PER_PERIOD) tpserv->cpcty_up_allowed = 0;
+        }
 
         if (result) {
           tpreq->targ->setHostName(result);
@@ -714,10 +735,11 @@
     if (errcode == 2 && found) {
       if (o.debugging >= TRACE_DEBUG_LEVEL) log_write(LOG_STDOUT, "mass_rdns: SERVFAIL <id = %d>\n", packet_id);
       stat_sf++;
+      output_summary();
     } else if (errcode == 3 && found) {
       if (o.debugging >= TRACE_DEBUG_LEVEL) log_write(LOG_STDOUT, "mass_rdns: NXDOMAIN <id = %d>\n", packet_id);
-      output_summary();
       stat_nx++;
+      output_summary();
     }
 
     return;
@@ -768,8 +790,8 @@
 
         if (process_result(ia.s_addr, outbuf, ACTION_FINISHED, packet_id)) {
           if (o.debugging >= TRACE_DEBUG_LEVEL) log_write(LOG_STDOUT, "mass_rdns: OK MATCHED <%s> to <%s>\n", inet_ntoa(ia), outbuf);
-          output_summary();
           stat_ok++;
+          output_summary();
         }
       } else if (atype == 5 && aclass == 1) {
         // TYPE 5 is CNAME
@@ -866,9 +888,17 @@
     if (o.ipoptionslen)
       nsi_set_ipoptions(s->nsd, o.ipoptions, o.ipoptionslen);
     s->reqs_on_wire = 0;
-    s->capacity = CAPACITY_MIN;
+    s->capacity = CAPACITY_START;
     s->write_busy = 0;
-
+    s->reqs_timedout = 0;
+    s->reqs_completed = 0;
+    s->drop_ratio = 0;
+    s->cpcty_up_allowed = 0;
+    s->cpcty_up_count = CAPACITY_UP_PER_PERIOD;
+
+    memcpy(&(s->next_cpcty_up_time), nsock_gettimeofday(), sizeof(struct timeval));
+    TIMEVAL_MSEC_ADD(s->next_cpcty_up_time, s->next_cpcty_up_time, read_timeouts[read_timeout_index][0]);
+
     nsock_connect_udp(dnspool, s->nsd, connect_evt_handler, NULL, (struct sockaddr *) &s->addr, sizeof(struct sockaddr), 53);
     nsock_read(dnspool, s->nsd, read_evt_handler, -1, NULL);
     s->connected = 1;
@@ -1194,13 +1224,13 @@
 
   if ((lasttrace = o.packetTrace()))
     nsp_settrace(dnspool, 5, o.getStartTime());
+
+  read_timeout_index = MIN(sizeof(read_timeouts)/sizeof(read_timeouts[0]), servs.size()) - 1;
 
   connect_dns_servers();
 
   cname_reqs.clear();
 
-  read_timeout_index = MIN(sizeof(read_timeouts)/sizeof(read_timeouts[0]), servs.size()) - 1;
-
   Snprintf(spmobuf, sizeof(spmobuf), "Parallel DNS resolution of %d host%s.", num_targets, num_targets-1 ? "s" : "");
   SPM = new ScanProgressMeter(spmobuf);
 
@@ -1315,7 +1345,7 @@
 
   gettimeofday(&starttv, NULL);
 
-  stat_actual = stat_ok = stat_nx = stat_sf = stat_trans = stat_dropped = stat_cname = 0;
+  stat_actual = stat_ok = stat_nx = stat_sf = stat_trans = stat_dropped = stat_to = stat_cname = 0;
 
   // mass_dns only supports IPv4.
   if (o.mass_dns && o.af() == AF_INET)
@@ -1332,11 +1362,12 @@
       // OK: Number of fully reverse resolved queries
       // NX: Number of confirmations of 'No such reverse domain eXists'
       // DR: Dropped IPs (no valid responses were received)
+      // TO: Number of Timed Out requests
       // SF: Number of IPs that got 'Server Failure's
       // TR: Total number of transmissions necessary. The number of domains is ideal, higher is worse
-      log_write(LOG_STDOUT, "DNS resolution of %d IPs took %.2fs. Mode: Async [#: %lu, OK: %d, NX: %d, DR: %d, SF: %d, TR: %d, CN: %d]\n",
+      log_write(LOG_STDOUT, "DNS resolution of %d IPs took %.2fs. Mode: Async [#: %lu, OK: %d, NX: %d, DR: %d, TO: %d, SF: %d, TR: %d, CN: %d]\n",
                 stat_actual, TIMEVAL_MSEC_SUBTRACT(now, starttv) / 1000.0,
-                (unsigned long) servs.size(), stat_ok, stat_nx, stat_dropped, stat_sf, stat_trans, stat_cname);
+                (unsigned long) servs.size(), stat_ok, stat_nx, stat_dropped, stat_to, stat_sf, stat_trans, stat_cname);
     } else {
       log_write(LOG_STDOUT, "DNS resolution of %d IPs took %.2fs. Mode: System [OK: %d, ??: %d]\n",
                 stat_actual, TIMEVAL_MSEC_SUBTRACT(now, starttv) / 1000.0,
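
For readers who want to experiment with the update rule outside nmap, the capacity logic in the patch above can be modelled in isolation along these lines. This is a sketch under assumptions, not the patched code: PatchedServerModel and its members are hypothetical names, time is passed in as plain milliseconds instead of struct timeval, and every response is treated as a finished lookup. The constants (2, 200, 50) and the two update formulas are taken from the patch.

```cpp
#include <algorithm>
#include <cassert>

// Values taken from the patch.
constexpr double kPatchedCapacityMin = 2.0;
constexpr double kPatchedCapacityStart = 2.0;
constexpr double kPatchedCapacityMax = 200.0;
constexpr int kUpPerPeriod = 50;  // CAPACITY_UP_PER_PERIOD

// Hypothetical model of the per-server counters the patch adds.
struct PatchedServerModel {
  double capacity;
  int completed;
  int timed_out;
  double drop_ratio;
  bool up_allowed;
  int up_count;
  long next_up_time_ms;
  long read_timeout_ms;

  // No increase is allowed until one read timeout after start-up.
  PatchedServerModel(long now_ms, long timeout_ms)
      : capacity(kPatchedCapacityStart), completed(0), timed_out(0),
        drop_ratio(0.0), up_allowed(false), up_count(kUpPerPeriod),
        next_up_time_ms(now_ms + timeout_ms), read_timeout_ms(timeout_ms) {
    assert(timeout_ms > 0);
  }

  void clamp() {
    capacity = std::max(kPatchedCapacityMin,
                        std::min(capacity, kPatchedCapacityMax));
  }

  // A request timed out: reduce capacity by the current drop ratio.
  void on_timeout() {
    ++timed_out;
    drop_ratio = static_cast<double>(timed_out) / (completed > 0 ? completed : 1);
    capacity -= drop_ratio;
    clamp();
  }

  // A response arrived at time now_ms.
  void on_response(long now_ms) {
    ++completed;
    drop_ratio = static_cast<double>(timed_out) / completed;
    // Open a new window of increases once the previous period has passed.
    if (!up_allowed && now_ms > next_up_time_ms) {
      up_allowed = true;
      up_count = 1;
      next_up_time_ms = now_ms + read_timeout_ms;
    }
    if (up_allowed) {
      // At most 0.1 per step; smaller when the drop ratio exceeds 0.1.
      capacity += 1.0 / (100.0 * std::max(drop_ratio, 0.1));
      clamp();
      if (up_count++ == kUpPerPeriod) up_allowed = false;
    }
  }
};
```

With a 4000 ms read timeout and no timeouts, responses arriving before 4000 ms leave the capacity at 2, and the first window after that raises it by 50 steps of 0.1 to about 7, matching the "maximum of 5 during any timeslot" described in the message.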

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org