nanog mailing list archives

Re: TCP congestion control and large router buffers


From: Fred Baker <fred () cisco com>
Date: Wed, 22 Dec 2010 09:14:33 -0800


On Dec 22, 2010, at 8:48 AM, Jim Gettys wrote:
I don't know if you are referring to the "RED in a different light" paper: that was never published, though an early 
draft escaped and can be found on the net.

Precisely. 

"RED in a different light" identifies two bugs in the RED algorithm, and proposes a better algorithm that only 
depends on the link output bandwidth.  That draft still has a bug.

There is an (almost completed) version of the paper that never got published; Van has retrieved it from backup, and 
I'm trying to pry it out of his hands to get it converted into something we can read today (it's in FrameMaker).

In the meantime, turn on (W)RED! For routers run by most people on this list, it's always far better than nothing, 
even if Van doesn't think classic RED will solve the home router bufferbloat problem (where we have two orders of 
magnitude of variation in wireless bandwidth along with a highly variable workload). That's not true in the internet core.
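
As illustration only (the policy name here is invented, and exact syntax, defaults, and availability vary by 
platform and software release), enabling WRED with ECN marking on an egress interface looks roughly like:

      policy-map WRED-ECN
       class class-default
        random-detect
        random-detect ecn
      !
      interface GigabitEthernet0/1
       service-policy output WRED-ECN

Here random-detect enables WRED on the class's queue, and random-detect ecn marks ECN-capable packets instead of 
dropping them.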

But yes, I agree that we'd all be much helped if manufacturers of both ends of all links had the common decency of 
introducing a WRED (with ECN marking) AQM that had 0% drop probability at 40 ms and 100% drop probability at 200 ms 
(with a linear increase in between).

So: min-threshold = 40 ms and max-threshold = 200 ms. That's good on low-speed links; it will actually control the 
average queue depth to O(min-threshold), whatever value you set that to. The problem with 40 ms is that that much 
standing queue interacts poorly with some applications, notably voice and video.
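
As a minimal sketch of that linear ramp (the names and delay-based units are mine, not from any real implementation; 
real RED operates on an EWMA of the queue rather than the instantaneous value):

      def wred_mark_probability(avg_queue_delay_ms,
                                min_th_ms=40.0, max_th_ms=200.0):
          # Below min-threshold: never drop/mark.
          if avg_queue_delay_ms <= min_th_ms:
              return 0.0
          # Above max-threshold: always drop/mark.
          if avg_queue_delay_ms >= max_th_ms:
              return 1.0
          # Linear increase in probability between the two thresholds.
          return (avg_queue_delay_ms - min_th_ms) / (max_th_ms - min_th_ms)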

It also doesn't match well with published studies like 
http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf. In that study, a min-threshold of 
40 ms would have cut in only during six events of a few seconds each over the course of a five-hour sample. And if 
40 ms is on the order of magnitude of a typical RTT, you could still have multiple retransmissions from the same 
session in the same queue.

A good picture of bufferbloat is at
      ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
      ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html

The first is a trace I took overnight in a hotel I stayed in; never mind the name of the hotel, it's not important. 
The second is the delay distribution, which is highly unusual: you expect to see delay distributions more like

      ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html

(which actually shows two distributions: the blue one is fairly normal, and the green one is a link that spends 
much of the day chock-a-block).

Thanks, Fred!  Can I use these in the general bufferbloat talk I'm working on, with attribution?  It's a far better 
example/presentation in graphic form than I currently have for the internet core case (where I don't even have 
anything other than the memory of probing the hotel's ISP's network).

Yes. Do me a favor and remove the name of the hotel. They don't need the bad press.

My conjecture re 5.html is that the link *never* drops, and at times has as many as nine retransmissions of the same 
packet in it. The spikes in the graph are about one TCP RTO apart. That's truly a worst case: for N-1 of the N 
retransmissions, it's a waste of storage space and a waste of bandwidth.
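
Back of the envelope (a sketch with made-up names, assuming roughly constant RTO spacing; real TCP backs the RTO 
off exponentially, so treat this as order-of-magnitude only):

      def copies_in_queue(queue_delay_s, rto_s=1.0):
          # One original copy, plus one retransmission for each RTO that
          # elapses while the original is still sitting in the queue.
          copies = 1 + int(queue_delay_s / rto_s)
          wasted = (copies - 1) / copies  # N-1 of the N copies are redundant
          return copies, wasted

      # A queue holding ~8 s of data with a ~1 s RTO carries ~9 copies of
      # the same segment, so ~8/9 of that session's queued bytes are waste.
      print(copies_in_queue(8.0))  # -> (9, 0.888...)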

AQM is your friend. Your buffer should be able to temporarily hold as much as an RTT of traffic, which is to say it 
should be large enough that, if you get a big burst followed by a silent period, you can use the entire capacity of 
the link to ride it out. Your min-threshold should be at a value that keeps your median queue depth relatively 
shallow. The numbers above are a reasonable guide, but as in all things, YMMV.
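
To make that concrete (a sketch, names mine: the thresholds are specified as time and converted to bytes using the 
link rate, which is what lets the same delay targets work at any speed):

      def buffer_and_thresholds(link_bps, rtt_s=0.100,
                                min_th_s=0.040, max_th_s=0.200):
          bytes_per_sec = link_bps / 8
          return {
              "buffer_bytes": int(bytes_per_sec * rtt_s),  # one RTT of traffic
              "min_threshold_bytes": int(bytes_per_sec * min_th_s),
              "max_threshold_bytes": int(bytes_per_sec * max_th_s),
          }

      # e.g. a 10 Mbit/s link with a 100 ms RTT: ~125 kB of buffer,
      # min-threshold ~50 kB, max-threshold ~250 kB.
      print(buffer_and_thresholds(10_000_000))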

Yup. AQM is our friend.

And we need it in many places we hadn't realised we did (like our OSes).
                         - Jim



