nanog mailing list archives

Re: United Airlines is Down (!) due to network connectivity problems


From: Sean <sean76 () gmail com>
Date: Wed, 8 Jul 2015 16:41:42 -0500

I've been in UA's datacenter, and while I'm no expert on their setup, I can
say with some confidence that it's most likely NOT related to anything else
going on.  I don't want to violate any NDA I may or may not have signed, but
I think I can safely say it's all one big private network.  Whatever's
happening on the internet or with NYSE has nothing to do with what is,
more than likely in this case, a big fat BGP clusterfudge, like most of these
things are.  I don't have any inside info or anything, just a slightly more
educated guess.

On Wed, Jul 8, 2015 at 2:31 PM, Patrick W. Gilmore <patrick () ianai net>
wrote:

I’m with Ferg-dog.

I can’t tell you the number of times someone (yes, including me) has
designed, purchased, and installed a system with multiple backups,
failovers, redundancies, etc., and some vital piece fails in a weird way
which sends the whole thing into a tailspin.

Take UA as an example, since we have the most information about it (FSVO
“most”): namely, it was a “bad router”. Let’s assume they had multiple routers
configured with VRRP, BGP, OSPF, and an alphabet soup of other ways to
detect and route around failures. Now further assume one of those routers
has a software or hardware bug which doesn’t take the router out of
service, but leaves it up, replying to pings, answering SNMP polls, speaking
BGP or OSPF, sending VRRP hellos, etc., etc. - but also eats half of all
packets going _through_ the router. That can happen; I’ve seen it first
hand.

All those redundant systems do nothing, since the “bad router” is doing
everything a good router would do. The systems designed to catch such
problems all think things are fine, but they are not. Is it an attack? No,
it’s bad luck.

Now some will claim - and perhaps rightfully - that UA should have systems
which monitor for exactly this type of failure as well. Perhaps they should
have, or perhaps the problem was nothing like what I explained. Either way,
the point still stands that a company can have multiple redundancies in
place and still experience a failure mode which causes exactly the
problem described.
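A toy sketch of the failure mode described above, assuming nothing about UA's actual network: every class, function, and threshold here is hypothetical. It models a router whose own control-plane answers (ping, SNMP, VRRP hellos, BGP session state) all look fine while it silently drops every other transit packet, and shows that only a probe sending traffic _through_ the box notices.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical toy router, for illustration only.

    In this model the control-plane replies are produced by the router
    itself, so they keep working even when the forwarding path is broken.
    """
    name: str
    drop_every_other: bool = False  # the "bad router": eats half of transit packets
    _forwarded: int = field(default=0, init=False)

    def answers_ping(self) -> bool:
        return True

    def answers_snmp(self) -> bool:
        return True

    def sends_vrrp_hello(self) -> bool:
        return True

    def bgp_session_up(self) -> bool:
        return True

    def forward(self, packet: int) -> bool:
        """Forward one transit packet; return True if it was delivered."""
        self._forwarded += 1
        # Deterministic stand-in for "half of all packets" so results are repeatable.
        if self.drop_every_other and self._forwarded % 2 == 0:
            return False
        return True


def control_plane_healthy(r: Router) -> bool:
    # What typical failover triggers look at: all answered by the router itself.
    return all([r.answers_ping(), r.answers_snmp(),
                r.sends_vrrp_hello(), r.bgp_session_up()])


def data_plane_loss(r: Router, probes: int = 100) -> float:
    # Send synthetic traffic *through* the router and measure end-to-end loss.
    delivered = sum(r.forward(i) for i in range(probes))
    return 1 - delivered / probes


def should_fail_over(r: Router, max_loss: float = 0.01) -> bool:
    # Hypothetical policy: fail over on control-plane OR data-plane trouble.
    return not control_plane_healthy(r) or data_plane_loss(r) > max_loss


if __name__ == "__main__":
    for r in (Router("rtr-a"), Router("rtr-b", drop_every_other=True)):
        print(r.name, control_plane_healthy(r), data_plane_loss(r))
```

Both routers report a healthy control plane; only the through-traffic probe shows rtr-b losing half its packets, which is the gap the "should have systems which monitor for exactly this" argument is about.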


At this point, we move on to: “All three simultaneously?!? NO WAY!!” To
which I would point out they were not simultaneous. UA was back up before
NYSE went down. But even if they were simultaneous, sometimes stuff
happens. The human mind is very good at seeing connections, even when there
are none. Absent other evidence, I’m going to believe the companies’ public
statements that this was not a hack. Perhaps I am being naive, but as I
said, absent other evidence, it is a perfectly plausible explanation.

--
TTFN,
patrick


On Jul 08, 2015, at 14:56, Jay Ashworth <jra () baylink com> wrote:

UA, WSJ /and/ NYSE all in the same day?

Once is an accident;  twice is a coincidence...

Three times is enemy action.

On July 8, 2015 1:18:47 PM EDT, Paul Ferguson <fergdawgster () mykolab com>
wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Given that the Internet is held together with paper clips, bailing
twine, and bubblegum, I'd prefer to take these organizations' initial
word for the fact that there is nothing obviously malicious in these
outages.

The mainstream press, on the other hand, seems to want it to be a hack
or data breach or... something other than a "glitch". :-)

- - ferg


On 7/8/2015 10:15 AM, Mel Beckman wrote:

It's important to not form an opinion too early, especially anyone
involved with forensic analysis of these systems. This is a
classic fault in amateur investigation: an early opinion will lead
you into confirmation bias, irrationally accepting data that agrees
with your opinions and rejecting data that disproves them.

-mel beckman

On Jul 8, 2015, at 10:07 AM, Paul Ferguson
<fergdawgster () mykolab com> wrote:

NYSE: "The issue we are experiencing is an internal technical issue
and is not the result of a cyber breach."

https://twitter.com/NYSE/status/618818929906085888

United Air statement CNBC: “An issue with a router degraded network
connectivity for various applications. We fixed the router."

https://twitter.com/barronstechblog/status/618816643821633536

- ferg



- --
Paul Ferguson
PGP Public Key ID: 0x54DC85B2
Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREIAAYFAlWdW3cACgkQKJasdVTchbLr/wD/aBNnLFv+MU+QI1ja7dd9LiSN
Zkum4lSIutxFn1NmaYoBAIgO/Ig7FxD4vRzQK8bUturn4YGw9FXMT+EzVTKhIbVG
=/yYp
-----END PGP SIGNATURE-----

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.



