nanog mailing list archives

Re: FYI Netflix is down


From: Rodrick Brown <rodrick.brown () gmail com>
Date: Mon, 2 Jul 2012 22:19:18 -0400


On Jul 2, 2012, at 7:03 PM, James Downs <egon () egon cc> wrote:


On Jul 2, 2012, at 1:20 PM, david raistrick wrote:

Amazon resources are controlled (from a consumer viewpoint) by an API - that API is also used by Amazon's internal 
toolkits that support ELB (and RDS...). Those (HTTP-accessed) API interfaces were unavailable for a good portion of 
the outages.

Right, and other toolkits like boto. Each Region has a different endpoint (URL), and as I have no resources running in 
East, I saw no problems with the API endpoints I use. So, as you note, the US-EAST Region was "not controllable".

I know nothing of the netflix side of it - but that's what -we- saw. (and that caused all us-east RDS instances in 
every AZ to appear 


And, if you lose US-EAST, you need to run *somewhere*. Netflix did not cut over www.netflix.com to another Region. Why 
not is another question.

At what point are you guys going to realize that no matter how much resiliency, redundancy and fault tolerance you 
plan into an infrastructure, there are always unforeseen failures that just don't make any sense to plan for?

Four major decision factors are cost, complexity, time and failure rate. At some point a business needs to focus on its 
core business. IT, like any other business resource, has to be managed efficiently, and its sole purpose is the 
enablement of said business, nothing more.

Some of the posts here are highly laughable and so unrealistic.

People are acting as if Netflix is part of some critical service. They stream movies, for Christ's sake. Some acceptable 
level of loss is fine for 99.99% of Netflix's user base. Just as with cable, electricity and running water, I suffer a few 
hours of losses each year from those services. Does it suck? Yes. Is it the end of the world? No.

This horse is dead! 



