nanog mailing list archives

Re: DNS attacks evolve


From: Florian Weimer <fw () deneb enyo de>
Date: Sun, 10 Aug 2008 10:58:09 +0200

* Joe Greco:

I am very, very, very disheartened to be shown to be wrong.  As if 8 days
wasn't bad enough, a concentrated attack has been shown to be effective in
10 hours.  See http://www.nytimes.com/2008/08/09/technology/09flaw.html

Note that the actual bandwidth utilization on that GE link should be
somewhere between 10% and 20% if you send minimally sized replies during
spoofing.  In fact, the theoretically predicted time for 50% success
probability for 100 Mbps attacks is below one day.
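
As a rough sanity check of the "below one day" figure, here is a minimal
sketch.  The numbers in it are assumptions, not measurements: a full
16-bit source port range, about 100 bytes on the wire per spoofed reply,
and (optimistically for the attacker) every spoofed reply arriving while
a matching query is outstanding.

```python
import math

QID_BITS = 16      # width of the DNS transaction ID
PORT_BITS = 16     # assumption: source port drawn from the full 16-bit range
LINK_BPS = 100e6   # 100 Mbps attack stream
WIRE_BYTES = 100   # assumption: bytes per spoofed reply, incl. Ethernet overhead


def hours_to_success(prob=0.5):
    """Hours of spoofing until the chance of at least one hit reaches prob."""
    p_hit = 1.0 / 2 ** (QID_BITS + PORT_BITS)   # one guess matches ID and port
    # Solve 1 - (1 - p_hit)^n = prob for the number of packets n.
    packets = math.log1p(-prob) / math.log1p(-p_hit)
    pps = LINK_BPS / (WIRE_BYTES * 8)           # spoofed replies per second
    return packets / pps / 3600.0


print(f"{hours_to_success():.1f} hours for 50% success")
```

Under these assumptions the result is a matter of hours, not days; a
smaller port pool or smaller packets shorten it further.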

This also matches the numbers posted here:

<http://tservice.net.ru/~s0mbre/blog/devel/networking/dns/2008_08_08.html>

1) Use of multiple IP addresses for queries (reduce success rate somewhat)

You must implement this carefully.  Just using a load-balanced DNS setup
doesn't work, for instance.  The attacker could trigger the cache misses
through a CNAME he controls, so he'd know which instance to attack in
each round.

2) Rate-limiting of query traffic, since I really doubt many sites actually
   have recursers that need to be able to spike to many times their normal
   traffic,

The problem with that is that 130,000 queries over a 10-hour period (as
in Evgeniy's experiment) are often lost in the noise.  The attacker
benefits from high query rates only if the authoritative servers are
RTT-wise close to your recursor.
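
To put that figure in perspective, the average rate can be worked out
directly.  The baseline load below is a made-up number for illustration,
not something from the experiment.

```python
def average_qps(queries, hours):
    """Average queries per second over the given period."""
    return queries / (hours * 3600.0)


attack_qps = average_qps(130_000, 10)   # figures from Evgeniy's experiment
baseline_qps = 2000.0                   # hypothetical load on a busy recursor

print(f"attack adds {attack_qps:.2f} queries/s, "
      f"{100 * attack_qps / baseline_qps:.2f}% of the assumed baseline")
```

A few queries per second is well below what a per-server rate limit
would normally be set to catch.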

3) Forwarding of failed queries (which I believe BIND doesn't currently
   allow) to a "backup" server (which would seem to be interesting in
   combination with 2)

I don't think any queries fail in this scenario.

4) I wonder if it wouldn't make sense to change the advice for large-scale
   recursers to run multiple instances of BIND, internally distribute the
   requests (random pf/ipfw load balancing) to present a version of 1) that 
   would render smaller segments of the user base vulnerable in the event of
   success.  It would represent more memory, more CPU, and more requests,
   but a smaller victory for attackers.

User-specific DNS caches are interesting from a privacy perspective,
too.  But I don't think they'll work, except when the cache is in the
CPE.

5) Modify BIND to report mismatched QIDs.  Not a log report per hit, but some
   reasonable strategy.  Make the default installation instructions include
   a script to scan for these - often - and mail hostmaster.

Yes, better monitoring is crucial.  Recent BIND 9.5 has a counter for
mismatched replies, which should provide at least one indicator.  Due to
the diversity of potential attacks, it's very difficult to set up
generic monitoring.
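
One simple indicator along these lines could be sketched as follows.  It
assumes you already have some way to read a cumulative mismatched-reply
counter at fixed intervals; how the readings are obtained is left out,
and the threshold and sample values are made up.

```python
def spikes(samples, interval_s, max_per_s):
    """Return (index, rate) for intervals whose mismatch rate exceeds max_per_s.

    samples: successive readings of a cumulative mismatched-reply counter.
    """
    flagged = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta < 0:
            # Counter went backwards: assume the server restarted from zero.
            delta = samples[i]
        rate = delta / interval_s
        if rate > max_per_s:
            flagged.append((i, rate))
    return flagged


readings = [10, 12, 15, 9015, 9020]   # made-up readings, one per minute
print(spikes(readings, interval_s=60, max_per_s=5))
```

This only catches brute-force spoofing that produces many mismatches; it
says nothing about other kinds of attacks.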

6) Have someone explain to me the reasoning behind allowing the corruption
   of in-cache data, even if the data would otherwise be in-bailiwick.  I'm
   not sure I quite get why this has to be.  It would seem to me to be safer
   to discard the data.  (Does not eliminate the problem, but would seem to
   me to reduce it)

The idea is that the delegated zone can introduce additional servers not
listed in the delegating zone.  (It's one thing that gets you a bit of
IPv6 traffic.)  Unfortunately, it's likely that performance would suffer
for some sites if resolver 

7) Have someone explain to me the repeated claims I've seen that djbdns and
   Nominum's server are not vulnerable to this, and why that is.

For DJBDNS, see: <http://article.gmane.org/gmane.network.djbdns/13371>

Nominum has published a few bits about their secret sauce:

  <http://nominum.com/news_events/security_vulnerability_update.php>

TCP fallback on detected attack attempts is expected to be sufficiently
effective so that you can get away with a smaller source port pool.
Even if it's not, on some platforms, a smallish pool is the only way to
cope with the existing load until you can bring in more servers, so it's
better than nothing.
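
To make the general idea concrete, here is a purely illustrative sketch
of a fallback decision.  It is not Nominum's heuristic (which is not
public); the class name, the threshold, and the per-question counting
are all assumptions.

```python
from collections import defaultdict


class FallbackPolicy:
    """Hypothetical policy: retry over TCP after too many mismatched UDP replies."""

    def __init__(self, threshold=1):
        self.threshold = threshold          # assumed tunable, not a known default
        self.mismatches = defaultdict(int)  # outstanding question -> mismatch count

    def on_mismatch(self, question):
        """Record a reply for `question` with a wrong ID; True means use TCP."""
        self.mismatches[question] += 1
        return self.mismatches[question] >= self.threshold

    def on_answered(self, question):
        """Forget the counter once the question has been resolved."""
        self.mismatches.pop(question, None)


policy = FallbackPolicy(threshold=3)
question = ("www.example.com.", "A")
print([policy.on_mismatch(question) for _ in range(3)])
```

Once the resolver switches to TCP for that question, blind spoofing of
UDP replies no longer helps the attacker for that lookup.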

The TCP fallback idea was posted to namedroppers in 2006, in response to
one of Bert's early drafts which evolved into the forgery resilience
document, so it should not be encumbered.  The heuristics for when to
trigger the fallback could be, though.

