nanog mailing list archives

Re: MTU to CDN's


From: Mikael Abrahamsson <swmike () swm pp se>
Date: Fri, 19 Jan 2018 08:22:02 +0100 (CET)

On Thu, 18 Jan 2018, Michael Crapse wrote:

I don't mind letting the client premises routers break down 9000 byte
packets. My ISP controls end to end connectivity. 80% of people even let
our techs change settings on their computer, this would allow me to give
~5% increase in speeds, and less network congestion for end users for a one
time $60 service many people would want. It's also where the internet
should be heading... Not to beat a dead horse(re:ipv6 ) but why hasn't the
entire internet just moved to 9000(or 9600 L2) byte MTU? It was created for
the jump to gigabit... That's 4 orders of magnitude ago. The internet
backbone shouldn't be shuffling around 1500byte packets at 1tbps. That
means if you want to layer 3 that data, you need a router capable of more
than half a billion packets/s forwarding capacity. On the other hand, with
even just a 9000 byte MTU, TCP/IP overhead is reduced 6 fold, and
forwarding capacity needs just 100 or so mpps capacity. Routers that
forward at that rate are found for less than $2k.

As usual, there are 5-10 (or more) factors playing into this. Some, in random order:

1. IEEE hasn't standardised > 1500 byte ethernet packets
2. DSL/WIFI chips typically don't support > ~2300 because reasons.
3. Because 2, most SoC ethernet chips don't either
4. There is no standardised way to understand/probe the L2 MTU to your next hop (ARP/ND and probing if the value actually works)
5. PMTUD doesn't always work.
6. PLPMTUD has generally not been implemented in either protocols or hosts.
7. Some implementations have been optimized for packets < 2000 bytes (they allocate a 2k buffer per packet) and actually perform worse if they have to support larger packets; 9k is also ill-fitting across 2^X values.
8. Because of all the above reasons, mixed-MTU LANs don't work, and a LAN is going to be mixed-MTU unless you control all devices (which is typically not the case outside of the datacenter).
9. The PPS problem in hosts and routers was solved by hardware offloading to NICs and by forwarding NPUs/ASICs with very high lookup speeds, so PPS is no longer a big problem.
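
To put rough numbers on points 7 and 9, here is a back-of-envelope sketch. It assumes a 1 Tbit/s link filled with full-sized packets, 40 bytes of IPv4+TCP headers without options, and an allocator that rounds each packet buffer up to the next power of two. The function names are made up for illustration, and L2 framing overhead is ignored.

```python
# Back-of-envelope figures; every constant here is an assumption, not a measurement.
LINK_BPS = 1e12        # 1 Tbit/s link
IP_TCP_HEADERS = 40    # IPv4 + TCP headers without options, in bytes


def packets_per_second(link_bps, packet_bytes):
    """Packet rate needed to fill the link with packets of this size."""
    return link_bps / (packet_bytes * 8)


def header_overhead(mtu_bytes, header_bytes=IP_TCP_HEADERS):
    """Fraction of each full-sized packet spent on IP and TCP headers."""
    return header_bytes / mtu_bytes


def pow2_buffer(packet_bytes):
    """Smallest power-of-two buffer size that holds the packet."""
    size = 1
    while size < packet_bytes:
        size *= 2
    return size


for mtu in (1500, 9000):
    buf = pow2_buffer(mtu)
    print(
        f"MTU {mtu}: {packets_per_second(LINK_BPS, mtu) / 1e6:.1f} Mpps, "
        f"header overhead {header_overhead(mtu):.2%}, "
        f"buffer {buf} bytes ({(buf - mtu) / buf:.0%} unused)"
    )
```

Under these assumptions a 1500 byte packet sits in a 2048 byte buffer, while a 9000 byte packet needs a 16384 byte buffer with roughly 45% of it unused. Real traffic has a smaller average packet size than the MTU, which pushes the packet rate up accordingly.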

On the value to choose for "large MTU", 9000 for edge and 9180 for core is what I advocate, after a non-trivial amount of looking into this. All major core routing platforms work with 9180 (with JunOS only supporting this after 2015 or something). So if we'd want to standardise on an MTU that all devices should support, then it's 9180, but we'd typically use 9000 in RA to send to devices.

If we want a higher MTU to be deployable across the Internet, we need to make it incrementally deployable. Some key things to achieve that:

1. Get something like https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented.
2. Go to the IETF and get a document published that advises all protocols to support PLPMTUD (RFC4821)

1 to enable mixed-MTU LANs.
2 to enable large MTU hosts to actually be able to communicate when PMTUD doesn't work.

With this in place (wait ~10 years), larger MTU is now incrementally deployable which means it'll be deployable on the Internet, and IEEE might actually agree to standardise > 1500 byte packets for ethernet.
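
For anyone who hasn't looked at what PLPMTUD asks of a host, here is a very simplified sketch of the core idea: the sender probes with different packet sizes and keeps the largest one that gets through, without relying on ICMP. This is not the RFC4821 algorithm as specified; it ignores timers, retries and the fact that a probe can also be lost to congestion. The names search_path_mtu and make_simulated_path are hypothetical, and the simulated path simply drops anything larger than its MTU.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool], low: int = 1280, high: int = 9000) -> int:
    """Binary search for the largest packet size that the probe reports as delivered.

    Assumes packets of size `low` always get through and that delivery is
    monotonic in size (if a size works, every smaller size works too).
    """
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # probe was delivered, try larger sizes
        else:
            high = mid - 1            # probe was lost, try smaller sizes
    return low


def make_simulated_path(path_mtu: int) -> Callable[[int], bool]:
    """Stand-in for a real network path: silently drops packets above path_mtu."""
    def probe(size: int) -> bool:
        return size <= path_mtu
    return probe


if __name__ == "__main__":
    for mtu in (1500, 4470, 9000):
        print(mtu, search_path_mtu(make_simulated_path(mtu)))
```

In a real stack the probe would be an ordinary data segment padded to the probe size, with delivery inferred from the transport's own acknowledgements, which is why this has to be done per protocol.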

--
Mikael Abrahamsson    email: swmike () swm pp se

