nanog mailing list archives
Re: Quick question.
From: "Robert E. Seastrom" <rs () seastrom com>
Date: 01 Aug 2004 18:37:36 -0400
"Michel Py" <michel () arneill-py sacramento ca us> writes:
> The dead processor still has to be replaced, but this is scheduled maintenance, not outage. A little extra ammo when you have to hunt five or six nines.
MTTR on a single box is irrelevant when you are off playing Ponce de Leon, hunting the Fountain of Five or Six Nines. Even when your architecture doesn't depend on any one particular machine (or even whole big sets of machines) being available, you don't get to "five or six nines"... just ask Google, Akamai, or Microsoft - there are other things beyond your control that spoil the picnic first.

As has been observed time and time again, the tried and true way to make five or six nines of reliability in a system of more than trivial complexity is to take a lesson from the telcos (the progenitors of the "five nines" lie) and build a framework and evaluation methodology that excludes broad classes of unavailability-causing events or prorates them in such a way as to make them non-reportable. Add to that list incrementally, until the remaining time listed shows your target number of nines of reliability. Presto, five nines.

---Rob
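To put rough numbers on the point above, here is a minimal Python sketch of the arithmetic. It computes the yearly downtime budget for a given number of nines, then shows how dropping outage classes from the count, largest first, eventually makes the reported figure fit. The outage log, its category names, and every function name are hypothetical, invented purely for illustration; nothing here comes from a real telco methodology.

```python
import math

# Assumption: a 365.25-day year, i.e. 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(nines: int) -> float:
    """Downtime allowed per year at `nines` nines of availability."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


def availability(downtime_minutes: float) -> float:
    """Fraction of the year the service was up."""
    return 1.0 - downtime_minutes / MINUTES_PER_YEAR


def count_nines(avail: float) -> float:
    """Availability expressed as a (fractional) number of nines."""
    return -math.log10(1.0 - avail)


def exclusions_needed(outages: dict, target_nines: int) -> list:
    """Exclude the largest outage class, repeatedly, until what is left
    fits inside the downtime budget for `target_nines`."""
    budget = downtime_budget_minutes(target_nines)
    remaining = dict(outages)
    excluded = []
    while remaining and sum(remaining.values()) > budget:
        worst = max(remaining, key=remaining.get)
        excluded.append(worst)
        del remaining[worst]
    return excluded


# Hypothetical outage log for one year: minutes of downtime per cause.
OUTAGES = {
    "hardware failure": 4,
    "software upgrade": 120,
    "upstream provider": 90,
    "operator error": 60,
    "power event": 30,
}

if __name__ == "__main__":
    total = sum(OUTAGES.values())
    print(f"five-nines budget: {downtime_budget_minutes(5):.2f} min/year")
    print(f"honest figure: {count_nines(availability(total)):.2f} nines")
    print(f"excluded to reach five nines: {exclusions_needed(OUTAGES, 5)}")
```

With this made-up log, the honest total of 304 minutes works out to a little over three nines, and four of the five outage classes have to be declared non-reportable before the remainder fits within the roughly 5.26 minutes per year that five nines allows.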