nanog mailing list archives
Re: Do you care about "gray" failures? Can we (network academics) help? A 10-min survey
From: William Herrin <bill () herrin us>
Date: Thu, 8 Jul 2021 14:01:12 -0700
On Thu, Jul 8, 2021 at 5:31 AM Saku Ytti <saku () ytti fi> wrote:
Network experiences gray failures all the time, and I almost never care, unless a customer does.
Greetings,

I would suggest that your customer does care, but as there is no simple test to demonstrate gray failures, your customer rarely makes it past first-tier support to bring the issue to your attention, and gives up trying. Indeed, name the networks with the worst reputations around here and many of them have those reputations because of a routine, uncorrected state of gray failure.

To answer Laurent's question: Yes, gray failures are a regular problem. Yes, most of us care. And for the most part we don't have particularly good ways to detect and isolate the problems, let alone fix them. When it's not a clean failure we really are driven by: customer says blank is broken, often followed by grueling manual effort just to duplicate the problem within our view.

Can network researchers do anything about it? Maybe. Because of the end-to-end principle, only the endpoints understand the state of the connection, and they don't know the difference between capacity and error. They mostly process that information locally, sharing only limited information with the other endpoint. Which means there's not much passing over the wire for the middle to examine and learn that there's a problem... and when there is, it often takes correlating multiple packets to understand that a problem exists which, in the stateless middle with asymmetric routing, is not practical. The middle can only look at its immediate link stats which, when there's a bug, are misleading.

What would you change to dig us out of this hole?

Regards,
Bill Herrin

--
William Herrin
bill () herrin us
https://bill.herrin.us/