Over the years at work I’ve toyed with a few monitoring solutions. It all started with a few issues on the network and some Bash scripts to check on them and notify me via XMPP when something went wrong. This grew into something somewhat unmanageable, and eventually I turned to Nagios. I configured a few hosts and services, and wrote an XMPP plug-in (which I have yet to publish, or for that matter finish!) and an SMS plug-in that works with my VoIP provider at the office. So now I get full notifications on XMPP and SMS for the more critical systems.
Next, management wanted to see some nice graphs, so I threw something together with RRD and it was OK. I looked at a few graphing solutions for Nagios, including Cacti, nagiosgrapher, and nagiosgraph. I’ve also looked at swapping out the Nagios install for something else; so far I have tried Groundwork Open Source, Centreon, and more recently, thanks to a recommendation, Opsview.
All of these tools appear very good and are all built around Nagios, but each has its own issues as well as its own advantages. I’m going to go through a few of them here.
OK, first up, pure Nagios. This is configured from text files, and the way you arrange them is really up to you. I had directories such as “Servers”, “Development”, and “Switches”, and within these I had one file per host containing the host information and the services associated with it. For me this made it very easy to add a new host: I just created a new file for the host and added the information. There’s no web configuration in Nagios (though there are third-party ones that I didn’t really look into), and there’s no graphing without third-party apps either, though there’s nothing wrong with doing it yourself with RRD and linking to the graphs from within Nagios.
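To give a feel for the one-file-per-host layout, here is a rough Python sketch that renders a host and its services into a single file under a group directory. The host name, address, and the `generic-host` / `generic-service` templates are placeholders (the templates come from the Nagios sample configuration and may not exist in your install), and the helper names are my own invention, not part of Nagios. It also assumes the config root is pulled in with a `cfg_dir=` line in `nagios.cfg`.

```python
from pathlib import Path


def render_host(host_name, address, services, template="generic-host"):
    """Return Nagios object definitions for one host and its services.

    services is a list of (service_description, check_command) pairs.
    The template names are assumptions taken from the sample config.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {host_name}",
        f"    address    {address}",
        "}",
    ]
    for description, check_command in services:
        lines += [
            "",
            "define service {",
            "    use                  generic-service",
            f"    host_name            {host_name}",
            f"    service_description  {description}",
            f"    check_command        {check_command}",
            "}",
        ]
    return "\n".join(lines) + "\n"


def write_host_file(config_root, group, host_name, address, services):
    """Write <config_root>/<group>/<host_name>.cfg and return its path."""
    group_dir = Path(config_root) / group
    group_dir.mkdir(parents=True, exist_ok=True)
    path = group_dir / f"{host_name}.cfg"
    path.write_text(render_host(host_name, address, services))
    return path


# Hypothetical host, printed rather than written to disk.
print(render_host("web01", "192.0.2.10",
                  [("PING", "check_ping!100.0,20%!500.0,60%")]))
```

Adding a host is then one call to `write_host_file` with the right group, followed by a Nagios reload.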
Groundwork Open Source
Next I tried Groundwork. This is a very good application with a very nice web interface for configuring hosts, including an “Auto Discovery” tool that will go and find all hosts within a range of IP addresses. Its limitation is that it only handles IPv4 addresses, though at least it gives you a starting point on your network. Adding hosts and services manually is also very easy; it sets a few graphs up for you, and creating more or customizing the graphs is just as easy. Unfortunately, it seems the Groundwork team are not doing any work on their open-source version anymore, which is a little disheartening as they’re using open-source software under the hood to do most of it. It’s also very heavily oriented towards Java, and requires a bit of CPU grunt to do a lot of the processing.
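I don’t know how Groundwork’s discovery works internally, so the following is only a guess at the general idea, not their implementation. It lists the candidate addresses a range sweep would have to probe, using Python’s standard `ipaddress` module, and shows one plausible reason such tools stop at IPv4: the number of addresses in a typical IPv6 subnet. The function names and the documentation-range networks are placeholders of mine.

```python
import ipaddress


def candidate_addresses(cidr):
    """Return an iterator over the usable host addresses in a network."""
    return ipaddress.ip_network(cidr).hosts()


def sweep_size(cidr):
    """Return how many addresses the network contains in total."""
    return ipaddress.ip_network(cidr).num_addresses


# A small IPv4 range is cheap to walk address by address.
for address in candidate_addresses("192.0.2.0/29"):
    print(address)

# Compare the size of an IPv4 /24 with a single IPv6 /64.
print(sweep_size("192.0.2.0/24"))
print(sweep_size("2001:db8::/64"))
```

Walking a /24 is trivial, while walking a /64 address by address is not practical, so an IPv6-aware discovery would need a different approach altogether.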
Next I tried Centreon. Again this is a very good tool with a very nice web GUI for configuring hosts. It lacks auto-discovery, but it does allow you to import your configuration from Nagios, and for me that worked perfectly; I didn’t have to configure much beyond that other than the graphing data. Adding new hosts and services is very easy with the web GUI, and Nagios is still accessible alongside Centreon, so you can still use the Nagios views should you wish to. The downside I have found with Centreon is the graphing. For the most part (ping times, etc.) it’s absolutely fine; however, when it comes to network traffic, that’s a whole different ball game. Here’s what it does with network statistics.
As you can see, that’s not the most useful data one could get. This is due to settings within RRDtool, and I have not yet managed to find a way to change them within Centreon. Keep in mind, I don’t want to change scripts; this should be doable from within the web interface. MRTG is very good at these graphs, and I don’t mind plugging the MRTG graphs into this, but it would be nice if there was a single point to get all of this data. The graphs that are in place also appear to show far more information than they need to, and again I’ve not yet worked out how to solve this. Everything else works well, but the graphing feels somewhat unfinished.
For the last few days I’ve been playing with Opsview, thanks to a recommendation from a fellow geek. The first stumble I hit was that in Nagios I’m monitoring upwards of 100 hosts and around 1500 services, including CPU load, memory usage, HTTP response times, etc. Each network service is checked on both IPv4 and IPv6, and the entire config can be a bit of a nightmare. Opsview does have a tool to import Nagios configurations, but this was not easy. It complained about custom plug-ins I have written that it didn’t know about (an easy fix: just copy them into place), it complained about a few other things too, and eventually I bailed out and just started adding hosts manually.
Opsview have taken a very different approach to plug-ins and commands. Instead of having a service that points to a command that points to the actual script, you have a service that points to the script with the arguments assigned. Well, I use things like $USER7$ for my SNMP community; could I work out where you can set these? No. It also means you can’t have, say, a single command with two services assigned for things like “local-ping” and “long-distance-ping”. You can still achieve the same result, but it’s a different way of doing it, and this is what broke a lot of my Nagios importing and why I took the manual route. The way they’ve chosen to do it does make sense; it’s just a knot in the head when you’re used to the other way.
As for the graphing, it plugs into MRTG for the network interface stats, which is good, though currently mine is saying “No Data”. I’ll look into this at some point; I know how to configure MRTG, so it might involve a small amount of tinkering under the hood. The rest of the graphs are great: clean and tidy, they display only the information you want and do what it says on the tin. Other than the learning curve of the differences and the things that are not quite working out of the box (MRTG), it’s looking good.
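For anyone who hasn’t used the Nagios way of doing it, here is a small Python sketch of the service, command, script chain described above. It imitates how a service’s `check_command` (name plus `!`-separated arguments) is combined with a command line containing `$ARGn$`, `$USERn$`, and `$HOSTADDRESS$` macros. This is my own simplified imitation, not Nagios code, and the plug-in path, community string, and address are placeholders.

```python
import re


def expand_command(command_line, check_command, macros):
    """Expand a Nagios-style command line for one service check.

    check_command looks like "name!arg1!arg2"; everything after the
    command name becomes $ARG1$, $ARG2$, and so on.
    """
    _name, *args = check_command.split("!")
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"ARG{index}"] = arg

    def substitute(match):
        # Unknown macros are left untouched so gaps are easy to spot.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


# Placeholder macro values; in Nagios the $USERn$ ones live in resource.cfg.
macros = {
    "USER1": "/usr/lib/nagios/plugins",
    "USER7": "example-community",
    "HOSTADDRESS": "192.0.2.1",
}

# One command definition shared by two services with different thresholds.
ping = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$"
print(expand_command(ping, "check_ping!100.0,20%!500.0,60%", macros))
print(expand_command(ping, "check_ping!300.0,20%!1000.0,60%", macros))
```

The two calls stand in for “local-ping” and “long-distance-ping”: same command, different arguments. In the Opsview model each service would instead carry the script and its full argument list itself.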
I’ll stick with it for a while and see if I can make sense of the broken bits. I also need to import my XMPP and SMS notification scripts. This should be fun, as the notification system is rather different too, so I’ll have to do some working out on this.
The conclusion I’ve come to is that Nagios is very, very good at monitoring your network: it does exactly what you tell it to do. But if you want easier configuration, graphing, etc., there are a lot of options, and I have yet to find one that ticks all the boxes. Hopefully once I’ve configured Opsview a little more it will tick those missing boxes. Groundwork did in fact tick all the boxes, but the fact that they’re not publishing the open-source version anymore bothers me. I understand companies have to make money, and I’m happy to support them, but don’t call yourself “Open-Source” when you’re not.
Hopefully with Opsview, other than the initial configuration of devices, the search will be over. Maybe I’m just attacking it wrong and should have Nagios for the monitoring and something else for the graphing (Cacti?) and leave it be, letting each do its own job. We’ll see when I’ve played with Opsview a little more.