nanog mailing list archives

Re: FYI Netflix is down

From: AP NANOG <nanog () armoredpackets com>
Date: Mon, 02 Jul 2012 12:31:26 -0400

This is an excellent example of how tests "should" be run; unfortunately, far too many places don't do this...


--

Thank you,

Robert Miller
http://www.armoredpackets.com

Twitter: @arch3angel

On 7/2/12 12:09 PM, Leo Bicknell wrote:

In a message written on Mon, Jul 02, 2012 at 11:30:06AM -0400, Todd Underwood wrote:

from the perspective of people watching B-rate movies:  this was a
failure to implement and test a reliable system for streaming those
movies in the face of a power outage at one facility.

I want to emphasize _and test_.

Work on an infrastructure which is redundant and designed to provide
"100% uptime" (which is impossible, but that's another story) means
that there should be confidence in a failure being automatically
worked around, detected, and reported.

I used to work with a guy who had a simple test for these things,
and if I was a VP at Amazon, Netflix, or any other large company I
would do the same.  About once a month he would walk out on the
floor of the data center and break something.  Pull out an Ethernet cable.
Unplug a server.  Flip a breaker.

Then he would wait, to see how long before a technician came to fix
it.

If these activities were service impacting to customers, the engineering
or implementation was faulty, and remediation was performed.  Assuming
they acted as designed and the customers saw no faults, the team was
graded on how quickly they detected and corrected the outage.
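
A minimal sketch of that drill and its grading rule, in Python. The fault
list comes from the examples above; the names (DrillResult, pick_fault,
grade) and the 5-minute detection and 1-hour repair targets are
hypothetical, chosen only for illustration. Nothing here touches real
equipment; it only records and scores what a person observed.

```python
import random
from dataclasses import dataclass

# Faults taken from the examples in the message above.
FAULTS = ["pull an Ethernet cable", "unplug a server", "flip a breaker"]


@dataclass
class DrillResult:
    fault: str
    customer_impact: bool      # did customers see the failure?
    seconds_to_detect: float   # injection -> first alert or ticket
    seconds_to_repair: float   # injection -> fault corrected


def pick_fault(rng: random.Random) -> str:
    """Choose this month's fault at random so nobody can plan around it."""
    return rng.choice(FAULTS)


def grade(result: DrillResult,
          detect_target: float = 300.0,    # assumed target, not from the source
          repair_target: float = 3600.0) -> str:  # assumed target
    """Score one drill using the two-step rule described above."""
    # Step 1: any customer impact means the design itself failed.
    if result.customer_impact:
        return "design fault: remediate engineering or implementation"
    # Step 2: otherwise grade the team on detection and repair speed.
    if (result.seconds_to_detect <= detect_target
            and result.seconds_to_repair <= repair_target):
        return "pass"
    return "slow response: review monitoring and on-call process"


if __name__ == "__main__":
    fault = pick_fault(random.Random())
    print("This month's fault:", fault)
    print(grade(DrillResult(fault, False, 120.0, 900.0)))
```

The point of keeping customer impact as the first check is that a fast
repair does not excuse a design that let the fault reach customers.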

I've seen too many companies whose "test" is planned months in advance,
and who exclude the parts they think aren't up to scratch from the test.
Then an event occurs, and they fail, and take down customers.

TL;DR If you're not confident your operation could withstand someone
walking into your data center and randomly doing something, you are
NOT redundant.
