nanog mailing list archives
Re: Operate until failure
From: Shawn McMahon <smcmahon () eiv com>
Date: Mon, 8 Jan 2001 10:11:39 -0500
On Mon, Jan 08, 2001 at 08:49:17AM -0600, Eric Whitehill wrote:
> We've had issues here with power outages, and usually the UPSes will hold. The one time they didn't, we went and brought all the machines down gracefully, as we didn't have auto-shutdown installed on the systems.
We don't shut anything down with a management call, unless it's going to fail and break something in the next 15 minutes.

We have a generator, but two amazing coincidences have caused it to fail. The first time, the generator was fine, but the switch didn't switch over. The person who had been (erroneously) signing off that he checked that switch monthly lost his job shortly before we stopped using his company entirely. We discovered the problem when the batteries ran down to the point where the switch was supposed to cut over, and the entire data center went dark. That was a very, very bad day.

The second time, an O-ring blew out, and we dumped so much oil on the ground that we were told a tiny bit more would have meant calling the EPA. This one gave us enough warning to shut things down, but we had to hustle, and a few things were triaged as "let it die, we don't have time."

In general, however, we start planning for a controlled shutdown the minute we know there's a problem, and we try to schedule that shutdown for our weekly outage window if possible. If not, we try to schedule it after peak processing time for the affected components.