nanog mailing list archives

Re: United Airlines is Down (!) due to network connectivity problems


From: "Patrick W. Gilmore" <patrick () ianai net>
Date: Wed, 8 Jul 2015 15:31:06 -0400

I’m with Ferg-dog.

I can’t tell you the number of times someone (yes, including me) has designed, purchased, and installed a system with 
multiple backups, failovers, redundancies, etc., and some vital piece fails in a weird way which sends the whole thing 
into a tailspin.

Take UA as an example, since we have the most information (FSVO “most”) about it, namely that it was a “bad router”. Let’s assume 
they had multiple routers configured with VRRP, BGP, OSPF, and an alphabet soup of other ways to detect and 
route around failures. Now further assume one of those routers has a software or hardware bug which doesn’t take the 
router out of service, but leaves it up, replying to pings, answering SNMP polls, speaking BGP or OSPF, sending VRRP 
hellos, etc., etc. - but also eats half of all packets going _through_ the router. That can happen; I’ve seen it first 
hand.

All those redundant systems do nothing, since the “bad router” is doing everything a good router would do. The systems 
designed to catch such problems all think things are fine, but they are not. Is it an attack? No, it’s bad luck.

Now some will claim - and perhaps rightfully - that UA should have systems which monitor for exactly this type of 
failure as well. Perhaps they should have, or perhaps the problem was nothing like what I explained. Either way, the 
point still stands that a company can have multiple redundancies in place and still experience a failure mode 
which causes exactly the problem described.
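
To make the distinction concrete, here is a minimal sketch in Python of the hypothetical failure above: a router that passes every control-plane check while dropping half of its transit traffic, plus a monitor that sends probes _through_ the router rather than _to_ it. Everything here is an assumption for illustration (the Router class, the drop_every knob, the 1% loss threshold); it is a toy model, not how UA or any real monitoring system works.

```python
from itertools import count
from typing import Optional


class Router:
    """Toy router model. `drop_every` is a hypothetical knob for the bug:
    0 means healthy, 2 means every 2nd transit packet is silently dropped."""

    def __init__(self, name: str, drop_every: int = 0) -> None:
        self.name = name
        self.drop_every = drop_every
        self._seq = count(1)  # running count of transit packets seen

    def answers_ping(self) -> bool:
        # Control plane: the buggy router still replies, so this is always True.
        return True

    def forward(self, packet: bytes) -> Optional[bytes]:
        # Data plane: returns the packet, or None if it was silently dropped.
        n = next(self._seq)
        if self.drop_every and n % self.drop_every == 0:
            return None
        return packet


def liveness_ok(router: Router) -> bool:
    """What ping/SNMP/hello-style checks see: is the box answering?"""
    return router.answers_ping()


def transit_loss(router: Router, probes: int = 100) -> float:
    """Send probes through the router and return the fraction lost."""
    lost = sum(router.forward(b"probe") is None for _ in range(probes))
    return lost / probes


def healthy(router: Router, max_loss: float = 0.01) -> bool:
    """Healthy only if it answers AND actually forwards traffic."""
    return liveness_ok(router) and transit_loss(router) <= max_loss


if __name__ == "__main__":
    for r in (Router("good"), Router("bad", drop_every=2)):
        print(r.name, "liveness:", liveness_ok(r), "healthy:", healthy(r))
```

The liveness check reports both routers as fine; only the through-path probe separates them. In a real network the probe would have to cross the actual forwarding path, which is harder than it looks in a sketch like this.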


At this point, we move on to: “All three simultaneously?!? NO WAY!!” To which I would point out they were not 
simultaneous. UA was back up before NYSE went down. But even if they were simultaneous, sometimes stuff happens. The 
human mind is very good at seeing connections, even when there are none. Absent other evidence, I’m going to believe 
the companies’ public statements that this was not a hack. Perhaps I am being naive, but as I said, absent other 
evidence, it is a perfectly plausible explanation.

-- 
TTFN,
patrick


On Jul 08, 2015, at 14:56 , Jay Ashworth <jra () baylink com> wrote:

UA, WSJ /and/ NYSE all in the same day?

Once is an accident;  twice is a coincidence...

Three times is enemy action.

On July 8, 2015 1:18:47 PM EDT, Paul Ferguson <fergdawgster () mykolab com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Given that the Internet is held together with paper clips, baling
twine, and bubblegum, I'd prefer to take these organizations' initial
word for the fact that there is nothing obviously malicious in these
outages.

The mainstream press, on the other hand, seems to want it to be a hack
or data breach or... something other than a "glitch". :-)

- - ferg


On 7/8/2015 10:15 AM, Mel Beckman wrote:

It's important not to form an opinion too early, especially for anyone 
involved with forensic analysis of these systems. This is a
classic fault in amateur investigation: an early opinion will lead
you into confirmation bias, irrationally accepting data that agrees
with your opinion and rejecting data that disproves it.

-mel beckman

On Jul 8, 2015, at 10:07 AM, Paul Ferguson 
<fergdawgster () mykolab com> wrote:

NYSE: "The issue we are experiencing is an internal technical issue
and is not the result of a cyber breach."

https://twitter.com/NYSE/status/618818929906085888

United Air statement CNBC: "An issue with a router degraded network
connectivity for various applications. We fixed the router."

https://twitter.com/barronstechblog/status/618816643821633536

- ferg



- -- 
Paul Ferguson
PGP Public Key ID: 0x54DC85B2
Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREIAAYFAlWdW3cACgkQKJasdVTchbLr/wD/aBNnLFv+MU+QI1ja7dd9LiSN
Zkum4lSIutxFn1NmaYoBAIgO/Ig7FxD4vRzQK8bUturn4YGw9FXMT+EzVTKhIbVG
=/yYp
-----END PGP SIGNATURE-----

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

