Wednesday, June 10, 2009

Opsview->Nagios - Is simpler better?

I'm on the cusp of implementing my new Nagios install at the backup site. It's going to be very similar in terms of configuration to the primary site. At the same time, I'm looking at alternate configuration methods, mostly to see what's out there and available.

Since the actual configuration of Nagios is...labyrinthine, I was looking to see if an effective GUI had been created since the last time I looked. I searched on ServerFault and found that someone had already asked the question for me. The majority of the votes had been thrown toward Opsview, a pretty decent looking interface with lots of the configuration directives available via interface elements. Someone obviously put a lot of work into this, from the screen shots.

It turns out that opsview has a VM image available for testing, so I downloaded it and tried it out. I have to say, the interface is as slick as the screenshots make it out to be. Very smooth experience, with none of the "check the config, find the offending line, fix the typo, check the config..." that editing Nagios configurations by hand tend to produce.

As I was clicking and configuring, I thought back to Michael Janke's excellent post, Ad Hoc -vs- Structured Systems Management (really, go read it. I honestly believe it's his magnum opus). One of the most important lessons is that to maintain the integrity and homogeneity of configuration, you don't click and configure by hand, you use scripts to perform repeatable actions, becayse they're infinitely more accurate than a human clicking and typing.

The ease of access provided by Opsview is tempting, and I can't say that I don't trust it, but I can say that I don't trust myself to click the right boxes all the time. My scripts won't do that. Therefore, I'm going to continue to use my scripts.

Remember, if you can script it, script it. If you can't script it, make a checklist.