nanog mailing list archives

Re: Random Early Detect and streaming video


From: Graham Johnston <johnston.grahamj () gmail com>
Date: Tue, 8 Nov 2022 16:14:38 -0600

Sorry, everyone, my initial reply was only to Saku so I'm replying again
for visibility to the list.

On Tue, 8 Nov 2022 at 02:57, Saku Ytti <saku () ytti fi> wrote:

Hey,


On Mon, 7 Nov 2022 at 21:58, Graham Johnston <johnston.grahamj () gmail com>
wrote:


I've been involved in service provider networks, small retail ISPs, for
20+ years now. Largely, though, we've never needed complex QoS; at
$OLD_DAY_JOB we were consistently positioned to avoid regular link
congestion by having sufficient capacity. In the few instances when we did
have link congestion, egress priority queuing met our needs.

What does 'egress priority queueing' mean? Do you mean 'send all X
before any Y, send all Y before any Z'? If so, that must have been
quite some time ago, because since traffic managers moved into
hardware ages ago, that behaviour hasn't been available. The only thing
that has been available is 'X has guaranteed rate X1, Y has Y1
and Z has Z1', and love it or hate it, that's the QoS tool the industry
has decided you need.


Yeah, I'm sure I didn't use all of the features; we did have to set a
bandwidth-share value and possibly a bit more. Looking back, it was more a
case of not having to perform any rate limiting on the parts of the network
I'm thinking about, combined with long-term familiarity with that platform.
The new environment is one I'm less familiar with, and it's a different
platform, Juniper to be specific.
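
(For anyone following along, the 'guaranteed rate per class' model Saku
describes maps onto Junos schedulers roughly like the sketch below. The
scheduler names, classes and percentages are made up for illustration, not
our actual configuration.)

    class-of-service {
        schedulers {
            SCHED-BE {
                transmit-rate percent 80;    /* guaranteed share for best-effort */
                buffer-size percent 80;
                priority low;
            }
            SCHED-VOICE {
                transmit-rate percent 20;    /* guaranteed share for voice */
                priority high;
            }
        }
        scheduler-maps {
            SMAP-EDGE {
                forwarding-class best-effort scheduler SCHED-BE;
                forwarding-class expedited-forwarding scheduler SCHED-VOICE;
            }
        }
    }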



combine that with the buffering and we should adjust the drop profile to
kick in at a higher percentage. Today we use 70% to start triggering the
drop behavior, but my head tells me it should be higher. The reason I say
this is that we are dropping packets ahead of full link congestion; yes,
that is what RED was designed to do, but I surmise that we are making the
impact on this application worse than intended.

I wager almost no one knows what their RED curve is; different
vendors have different default curves, and the default is then the curve
almost everyone uses. Some use a RED curve such that everything is basically
tail drop (Juniper: 0% drop at 96% fill and 100% drop at 98% fill).
Some are linear. Some allow defining just two points, some allow
defining 64 points. And almost no one has any idea what their curve
is, i.e. mostly it doesn't matter. If it usually mattered, we'd all
know what the curve is and why.


Overall, given that my current concern is dropping before it seems
necessary, combined with your comments about Juniper, which I take to
describe the behavior of the default drop profile, I feel more confident
that our current drop profile is just more aggressive than it needs to be.
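
To make that concrete, the change I'm considering is along these lines
(Junos-style sketch; the profile names and the 85% starting point are
hypothetical, purely to illustrate 'kick in later'):

    class-of-service {
        drop-profiles {
            RED-TODAY {
                interpolate {
                    fill-level [ 70 100 ];
                    drop-probability [ 0 100 ];
                }
            }
            RED-LATER {
                interpolate {
                    fill-level [ 85 100 ];
                    drop-probability [ 0 100 ];
                }
            }
        }
        schedulers {
            SCHED-BE {
                drop-profile-map loss-priority any protocol any drop-profile RED-LATER;
            }
        }
    }

The drop-profile-map stanza is also where different curves could be attached
to different loss priorities within the one queue, as Saku mentions below.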



In your case, I assume you have at least two points: 0% drop at
69% fill, then a linear curve from 70% to 100% fill with 1% to 100%
drop. It doesn't seem outright wrong to me. You have 2-3 goals here:
to avoid synchronising TCP flows, so that you have steady fill instead
of wave-like behaviour, and to reduce queueing delay for packets not
dropped, which would otherwise experience as long a delay as there is
queue if tail dropped. A 3rd possible goal: if you map more than one
class of packets into the same queue, you can still give them
different curves, so during congestion a single queue can show two
different behaviours depending on the packet.
So what is the problem you're trying to fix? Can you measure it?


As mentioned above, my problem/supposition is that we drop too much before
it's necessary and impact the customer experience in a way that isn't
needed. While I can't directly measure the customer experience, I can
measure drop rate versus bandwidth. If my supposition is correct, then with
a drop profile that drops later (at a higher utilization rate) we'd see
fewer dropped packets, and possibly higher utilization. While this whole
configuration policy is in place to reduce utilization, we operate these
links with a hard cap, so I'd like to use as much of it as possible. What
may have changed is that in the past these links were functionally operated
at their capacity, whereas right now we are slightly below capacity.
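
Concretely, what I plan to trend is the per-queue counters against
interface utilization; the per-queue tail-dropped and RED-dropped packet
counts show up in something like (interface name is just an example)

    show interfaces queue xe-0/0/1

compared against the usual interface traffic rates, before and after any
drop-profile change.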



I suspect that in a modern high-speed network with massive numbers of
flows the wave-like synchronisation is not a problem. If you can't measure
it, or if your only goal is to reduce queueing delay because you have
'strategic' congestion, then perhaps instead of worrying about RED, use
tail drop only and reduce the queue size to something tolerable, 1ms-5ms
max?


On many levels, it does seem like what I want is tail drop rather than RED.
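
If we do go that way, my understanding (please correct me if wrong) is that
the knob for a small, bounded queue on Junos is the scheduler's temporal
buffer size, specified in microseconds, with no explicit drop profile
attached, so the queue just runs to the default, effectively tail-drop,
behavior you described. A made-up sketch of what I mean:

    class-of-service {
        schedulers {
            SCHED-BE {
                buffer-size temporal 5000;   /* microseconds, roughly 5ms of queue */
            }
        }
    }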


--
  ++ytti


Thanks for your response, Saku. I also am a user of Oxidized, thanks for
that as well.
