nanog mailing list archives
Re: FYI Netflix is down
From: Brett Frankenberger <rbf+nanog () panix com>
Date: Mon, 2 Jul 2012 15:32:17 -0500
On Mon, Jul 02, 2012 at 09:09:09AM -0700, Leo Bicknell wrote:
In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood wrote:
from the perspective of people watching B-rate movies: this was a failure to implement and test a reliable system for streaming those movies in the face of a power outage at one facility.

I want to emphasize _and test_. Work on an infrastructure which is redundant and designed to provide "100% uptime" (which is impossible, but that's another story) means that there should be confidence in a failure being automatically worked around, detected, and reported.

I used to work with a guy who had a simple test for these things, and if I was a VP at Amazon, Netflix, or any other large company I would do the same. About once a month he would walk out on the floor of the data center and break something. Pull out an ethernet. Unplug a server. Flip a breaker.
Sounds like something a VP would do. And, actually, it's an important step: make sure the easy failures are covered. But it's really a very small part of resilience.

What happens when one instance of a shared service starts performing slowly? What happens when one instance of a redundant database starts timing out queries or returning empty result sets? What happens when the Ethernet interface starts dropping 10% of the packets across it? What happens when the Ethernet switch linecard locks up and stops passing dataplane traffic, but link (physical layer) and/or control plane traffic flows just fine? What happens when the server kernel panics due to bad memory, reboots, gets all the way up, runs for 30 seconds, kernel panics, lather, rinse, repeat?

Reliability is hard. And if you stop looking once you get to the point where you can safely toggle the power switch without causing an impact, you're only a very small part of the way there.

-- Brett
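
As an illustration of the partial failures described above, here is a minimal sketch of a health check that distinguishes "degraded" from "down" rather than only asking whether an instance answers at all. Nothing here comes from the thread: the `Probe` record, the `classify` and `is_flapping` functions, and every threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Probe:
    """One synthetic request against an instance (hypothetical record)."""
    ok: bool           # did the request complete at all?
    latency_ms: float  # time to complete; ignored when ok is False
    rows: int          # size of the result set that came back


def classify(probes, max_loss=0.02, max_median_ms=200.0, max_empty=0.05):
    """Return 'down', 'degraded', or 'healthy' for a window of probes.

    All thresholds are illustrative assumptions, not recommendations.
    """
    completed = [p for p in probes if p.ok]
    if not completed:
        return "down"  # the easy case: nothing answers

    loss = 1 - len(completed) / len(probes)
    empty = sum(1 for p in completed if p.rows == 0) / len(completed)
    slow = median(p.latency_ms for p in completed) > max_median_ms

    # The instance answers, but drops requests, returns empty
    # result sets, or answers slowly: up, yet not healthy.
    if loss > max_loss or empty > max_empty or slow:
        return "degraded"
    return "healthy"


def is_flapping(uptimes_s, min_stable_s=300.0, max_short_runs=2):
    """True if more than max_short_runs recent boots ended before min_stable_s."""
    short_runs = sum(1 for u in uptimes_s if u < min_stable_s)
    return short_runs > max_short_runs
```

A window with 10% failed probes is reported as "degraded" even though nine out of ten requests succeed, and a host that keeps coming back up for 30 seconds at a time is flagged by `is_flapping` instead of being counted as recovered after each reboot. A check that only tests reachability would report both as fine.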