nanog mailing list archives

Re: OS, Hardware, Network - Logging, Monitoring, and Alerting


From: Phil Regnauld <regnauld () catpipe net>
Date: Thu, 26 Jun 2008 11:31:54 +0200

Rev. Jeffrey Paul (sneak) writes:

1) Is SNMP the best way to do this?  Obviously some of the data (service
checks) will need to be collected other ways.

        SNMP, plus the vendor MIBs and SNMP extensions for monitoring hardware
        specifics (PSU, etc.), and something like Nagios to do the TCP/network checks.

2) Is there any good solution that does both logging/trending of this
data and also notification/monitoring/alerting?  I've used both Nagios
and Cacti in the past, and, due to the number of individual things being
monitored (3-5 items per OS instance, 5-10 items per physical server,
10-50 things per network device), setting them both up independently
seems like a huge pain.  Also, I've never really liked Nagios that much.

        Well, you could look at Zabbix, Hyperic, Zenoss, OpenNMS and see if
        they cut it better for you, but the trick with Nagios is to use
        a DB and generate the include files automatically, then have some
        other, more user-friendly tools to populate the DB.  Or use templates
        extensively.
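
A minimal sketch of the "DB plus generated include files" approach, assuming
a hypothetical `hosts` table (name, address, template) held in SQLite. The
table layout, the sample rows and the template names `generic-host` and
`generic-switch` are illustrative assumptions, not anything from an actual
setup; only the `define host { ... }` object syntax is standard Nagios.

```python
import sqlite3

# One Nagios host object per database row. "use" pulls in a host template,
# so per-host definitions stay short.
HOST_TEMPLATE = """define host {{
    use         {template}
    host_name   {name}
    address     {address}
}}
"""


def render_hosts(conn):
    """Return the text of a Nagios include file for all rows in `hosts`."""
    rows = conn.execute(
        "SELECT name, address, template FROM hosts ORDER BY name"
    )
    return "\n".join(
        HOST_TEMPLATE.format(name=name, address=address, template=template)
        for name, address, template in rows
    )


def demo_db():
    """Build an in-memory database with made-up example hosts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (name TEXT, address TEXT, template TEXT)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, ?, ?)",
        [
            ("web01", "192.0.2.10", "generic-host"),
            ("sw01", "192.0.2.20", "generic-switch"),
        ],
    )
    return conn


if __name__ == "__main__":
    # In practice this would be written to a file that nagios.cfg includes
    # via cfg_file= or cfg_dir=, followed by a config check and reload.
    print(render_hosts(demo_db()))
```

Whatever front end populates the table never has to touch Nagios syntax; the
generator is the only piece that knows the config format.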
        
        Then make sure your plugins output performance data so the results
        can be graphed, and use something like NagiosGraph
        (http://nagiosgraph.wiki.sourceforge.net/) or PNP4Nagios:

        http://www.pnp4nagios.org/pnp/about#system_requirements
        http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN203
        http://www.pnp4nagios.org/pnp/screenshots
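
As a rough illustration of a plugin that emits performance data, here is a
sketch of a TCP connect-time check. It follows the plugin conventions of one
status line, an optional `| label=value[UOM];warn;crit;min` section, and
exit codes 0/1/2/3. The script itself, its function names and the default
thresholds are hypothetical, not an existing plugin.

```python
import socket
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(seconds, warn, crit):
    """Map a response time onto an exit code using the two thresholds."""
    if seconds >= crit:
        return CRITICAL
    if seconds >= warn:
        return WARNING
    return OK


def format_output(seconds, warn, crit):
    """Return (exit_code, status_line); perfdata follows the '|' separator."""
    code = evaluate(seconds, warn, crit)
    text = "TCP %s - %.3fs response time" % (STATUS_NAMES[code], seconds)
    # label=value[UOM];warn;crit;min
    perf = "time=%.3fs;%.3f;%.3f;0" % (seconds, warn, crit)
    return code, "%s | %s" % (text, perf)


def check_tcp(host, port, warn=1.0, crit=3.0):
    """Time a TCP connect; thresholds are illustrative defaults in seconds."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit):
            pass
    except OSError as exc:
        return CRITICAL, "TCP CRITICAL - %s" % exc
    return format_output(time.monotonic() - start, warn, crit)


# Run as a plugin only when HOST and PORT are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    exit_code, line = check_tcp(sys.argv[1], int(sys.argv[2]))
    print(line)
    sys.exit(exit_code)
```

NagiosGraph or PNP4Nagios can then pick up the `time=` value from the
perfdata section and store it in an RRD, so the same check feeds both
alerting and trending.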


I recently entertained the idea of writing a CGI that output all of this
information in a standard format (csv?), distributing and installing it, then
collecting it periodically at a central location and doing all the
rrd/notification myself, but then realized that this problem must've
been solved a million times already.

        Yes :)  But check out the above links, and with a bit of planning
        and a small amount of coding/adapting existing components, it will
        work out.

There's got to be a better way.  What do you guys use?

        We rewrote our own NMS from scratch :)

(I'm not opposed to non-free solutions, provided they work better.)

        We sell our solution, so I'm biased, but do check out the Nagios
        route.  It works well enough for small to medium installations, and
        for larger ones with careful planning (the problem with Nagios is
        making it perform with thousands of hosts).

        Hth,
        Phil


