nanog mailing list archives

Re: recommendations for external monitoring services?


From: Robert Brockway <robert () timetraveller org>
Date: Wed, 14 Dec 2011 11:10:05 +1000 (EST)

On Mon, 12 Dec 2011, Eric J Esslinger wrote:

I'm not looking to monitor a massive infrastructure: 3 web sites, 2 mail servers (POP, IMAP, submission port, HTTPS webmail), 4 DNS servers (including query checks to catch a server that is listening but not answering), one inbound MX, and a few network points to ping to confirm connectivity throughout my system. I want scheduled notification windows (for example, during work hours I don't want my phone paged unless everything is going offline; off hours I do), with secondary notifications to other users if a problem persists, or in the event of many triggers. That sort of thing. I also want sensitivity settings (if web server 1 shows down for 5 minutes, that's not a big deal; another server not responding to repeated queries within 1 minute is a big deal). A weekly summary of issues would be nice, especially for the "well, it was down for a short bit but we didn't notify, as per settings" cases. I don't have a lot of money to throw at this. I [...]

Hi Eric. The feature set you're describing should be in any monitoring system worthy of the name. I've used Nagios to good effect for the best part of 12 years. Before that I used Big Brother, which sucked in various ways.
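To make the windows/escalation point concrete, here is a rough sketch (in Python, purely for illustration) of the paging policy Eric described. Nagios expresses the same rules declaratively with timeperiods, contacts and escalations rather than code, and the business hours and thresholds below are assumptions:

from datetime import datetime, time

WORK_START, WORK_END = time(8, 0), time(17, 0)   # assumed business hours

def should_page(now, hosts_down, hosts_total, minutes_down, persist_limit):
    """Decide whether a failure warrants paging the phone."""
    work_hours = now.weekday() < 5 and WORK_START <= now.time() <= WORK_END
    everything_down = hosts_down == hosts_total
    if work_hours:
        # During work hours, only a total outage pages the phone.
        return everything_down
    # Off hours, page for a total outage or any persistent failure.
    return everything_down or minutes_down >= persist_limit

# One host of five down for 6 minutes on a Saturday night -> page.
print(should_page(datetime(2011, 12, 10, 23, 30), 1, 5, 6, 5))   # True

In Nagios terms, the work-hours rule maps onto a notification_period timeperiod, and the persistence rule onto max_check_attempts plus escalations.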

I did an evaluation of a wide variety of FOSS monitoring systems 2-3 years ago and Nagios won again. Generally I found the alternatives had problems I considered quite serious, such as being overly complicated or running checks so frequently that they loaded the very systems they were supposed to be monitoring[1].

I'm currently trialing Icinga, a fork of Nagios.

Puppet can be set up to manage the Nagios/Icinga config, which cuts down on the admin overhead.
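(Puppet's built-in nagios_host and nagios_service resource types can write the object files for you.) For a feel of what gets generated, here is a minimal Python sketch that emits the same sort of boilerplate from a host list; the hostnames and checks are made up:

# Made-up inventory; in practice Puppet assembles this from node data.
HOSTS = {
    "web1.example.net":  ["check_http"],
    "mail1.example.net": ["check_smtp", "check_imap"],
}

HOST_TMPL = (
    "define host {{\n"
    "    use        generic-host\n"
    "    host_name  {name}\n"
    "    address    {name}\n"
    "}}\n"
)

SERVICE_TMPL = (
    "define service {{\n"
    "    use                  generic-service\n"
    "    host_name            {name}\n"
    "    service_description  {check}\n"
    "    check_command        {check}\n"
    "}}\n"
)

for name, checks in sorted(HOSTS.items()):
    print(HOST_TMPL.format(name=name))
    for check in checks:
        print(SERVICE_TMPL.format(name=name, check=check))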

Nagios/Icinga can also be hooked up to collectd to provide performance data alongside alert monitoring.
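For example, a plugin can read values back out of collectd over its unixsock plugin and turn them into alert states. The sketch below is illustrative only: the socket path, value identifier and thresholds are assumptions, and collectd ships a collectd-nagios utility that does this properly.

#!/usr/bin/env python
import socket
import sys

SOCK  = "/var/run/collectd-unixsock"      # assumed socket path
IDENT = "web1.example.net/load/load"      # assumed host/plugin/type
WARN, CRIT = 4.0, 8.0                     # assumed thresholds

def getval(path, ident):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    f = s.makefile("rw")
    f.write('GETVAL "%s"\n' % ident)
    f.flush()
    head = f.readline().split()
    if int(head[0]) < 0:                  # negative status = error
        raise RuntimeError(" ".join(head[1:]))
    # Following lines are "name=value"; take the first one.
    return float(f.readline().strip().split("=", 1)[1])

try:
    val = getval(SOCK, IDENT)
except Exception as exc:
    print("UNKNOWN: %s" % exc)
    sys.exit(3)                           # Nagios exit codes: 0/1/2/3

if val >= CRIT:
    print("CRITICAL: %s = %.2f" % (IDENT, val)); sys.exit(2)
if val >= WARN:
    print("WARNING: %s = %.2f" % (IDENT, val)); sys.exit(1)
print("OK: %s = %.2f" % (IDENT, val)); sys.exit(0)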

One concern about external monitoring services is the level of visibility they need into your network in order to monitor it adequately.

My recommendation is to do a proper risk assessment on the available options.

I DO have detailed internal monitoring of our systems, but sometimes that is not entirely useful: there are a few single points of failure within our network/notification system, and if the monitor itself goes offline it's not exactly going to be able to tell me about it. (And that happened once, right before the mail server decided to stop receiving mail.)

There are a couple of ways to deal with this. Some monitoring applications can fail over to a standby server if the primary fails, but this isn't really necessary. You will arguably gain higher reliability by running multiple _independent_ monitors and having them monitor each other[2]. I have often used this approach.

The principal aim here is to guarantee that you are alerted to any single failure (of a production service, a system, or a monitor). Multiple simultaneous failures could still produce a blackspot. It is possible to design a system that will detect multiple simultaneous failures, but it takes more effort and resources.
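A concrete way to do the mutual monitoring: have each monitor publish a heartbeat (say, a file containing a Unix timestamp, refreshed every minute by cron or by the monitor itself) and run a plugin like the sketch below against its peer. The URL and staleness limit are made up; any freshness check you trust will do.

#!/usr/bin/env python
import sys
import time
import urllib.request

PEER_URL = "http://monitor2.example.net/heartbeat.txt"   # hypothetical peer
MAX_AGE  = 300                                           # seconds tolerated

try:
    with urllib.request.urlopen(PEER_URL, timeout=10) as resp:
        stamp = float(resp.read().strip())
except Exception as exc:
    print("CRITICAL: peer unreachable: %s" % exc)
    sys.exit(2)

age = int(time.time() - stamp)
if age > MAX_AGE:
    print("CRITICAL: peer heartbeat is %d seconds old" % age)
    sys.exit(2)
print("OK: peer heartbeat is %d seconds old" % age)
sys.exit(0)

Because each side schedules this check itself, either monitor dying (or wedging so it stops refreshing its heartbeat) shows up as a CRITICAL on the other.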


[1] Sometimes I wonder if the people developing certain systems have any operational experience at all.

[2] A system designed to fail over on certain conditions may fail to fail over, ah, so to speak.

Cheers,

Rob

--
Email: robert () timetraveller org              Linux counter ID #16440
IRC: Solver (OFTC & Freenode)
Web: http://www.practicalsysadmin.com
Director, Software in the Public Interest (http://spi-inc.org/)
Free & Open Source: The revolution that quietly changed the world
"One ought not to believe anything, save that which can be proven by nature and the force of reason" -- Frederick II 
(26 December 1194 – 13 December 1250)
