We’re currently using SolarWinds to monitor our network infrastructure. While it does a decent job of monitoring the basics, there are certain things I feel it could do better right out of the box. I have a preconceived idea of what the ideal NMS should do: look at your config and then strongly recommend what you should monitor based on what it finds. Unfortunately, I haven’t found that yet. SolarWinds is great if you want the pretty GUI, and Nagios is very powerful if you know what you want to monitor and don’t mind a little scripting, but there’s nothing I’ve found (yet) that does it all the way I want.
One of the things I think is important is tracking your HSRP failover events. In a typical old-school redundant environment (no VSS or other newer technologies), you would have two switches running HSRP between them, where one takes over if the other fails. It’s possible you could go months or years without monitoring HSRP and never have an issue. It’s also possible that HSRP events are happening constantly and you never see them unless you are specifically looking for them. These could cause brief interruptions to user traffic, depending on how your timers are tuned, and probably just get reported as a general ‘the network is slow/horrible/never works/etc.’ To monitor these events without getting too deep into SolarWinds or other monitoring solutions, I decided to turn to EEM.
EEM (Embedded Event Manager) is a pretty powerful and flexible component of IOS/NX-OS that allows you to track or alert on certain events on Cisco devices. I won’t go into all of the features that EEM has, as it is well documented. If you are interested in finding out more about EEM, I’d recommend looking through Cisco’s site, starting here: https://supportforums.cisco.com/docs/DOC-27996.
The applet we are going to write is referred to as a syslog collector script. It simply monitors the syslog messages that are generated by the router/switch and performs some action when it detects a certain string. Basic, but powerful. Here’s the script I created:
! DNS server, needed only if the SMTP server is given by name
ip name-server 188.8.131.52
! Variables used by the mail action below
event manager environment _mail_smtp smtp.yourcompany.com
event manager environment _mail_rcpt email@example.com
event manager session cli username "yourusername"
! Applet: send an email whenever an HSRP state change is logged
event manager applet HSRPEvent
 event syslog pattern "HSRP-5-STATECHANGE"
 action 1.0 info type routername
 action 2.0 mail server "$_mail_smtp" to "$_mail_rcpt" from "firstname.lastname@example.org" subject "HSRP State Change on $_info_routername" body "$_syslog_msg"
Some notes on what the script is doing:
- ip name-server – required if you are using a DNS name for your SMTP server.
- event manager environment – both of these lines set variables, _mail_smtp and _mail_rcpt, which are used later in the script to send mail.
- event manager session cli username – sets the username you want to run the script as. You do not need a password: IOS uses the username only for authorization, not for actual authentication. It checks the authorization locally or against a AAA server like ACS.
- event syslog pattern – tells the applet to search the syslog messages for the specified pattern.
- action 1.0 info type routername – stores the router’s name in a variable ($_info_routername). This is useful so that when I get the email I’ll know which switch it’s coming from.
- action 2.0 – sends an email using some of the variables from above. The subject has the text “HSRP State Change on <ROUTERNAME>”. The body contains the actual syslog text, which includes the state change. As an example, the body would look something like this: “3348414: Jan 27 13:51:19.687 EST: %HSRP-5-STATECHANGE: Vlan181 Grp 181 state Standby -> Active”
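If you’d rather not wait for a real failover to find out whether the mail settings work, one option is a second applet that you trigger by hand. The following is only a minimal sketch, assuming the _mail_smtp and _mail_rcpt variables from the script above are already set. The applet name HSRPMailTest and the subject and body text are made up for illustration. It uses the "event none" trigger, which means the applet only runs when you start it manually.

```
! Hypothetical test applet: the name, subject and body are illustrative
event manager applet HSRPMailTest
 ! "event none" = no automatic trigger, run on demand only
 event none
 action 1.0 info type routername
 action 2.0 mail server "$_mail_smtp" to "$_mail_rcpt" from "firstname.lastname@example.org" subject "EEM mail test from $_info_routername" body "Test message from the HSRPMailTest applet"
```

From privileged EXEC mode you could then run it and check what EEM has registered:

```
! Trigger the test applet manually
event manager run HSRPMailTest
! Confirm both applets are registered and the variables are set
show event manager policy registered
show event manager environment
```

If the test email arrives, the SMTP server, DNS lookup and recipient are probably fine, and the HSRPEvent applet should be able to send mail the same way. Exact output and behavior may vary by IOS version, so treat this as a starting point rather than a guaranteed recipe.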
The Big Picture
This applet is just one small example of what you can do with EEM. The possibilities are nearly endless. While this focuses on one specific aspect of the network, I think it’s important to always be looking at your own network and seeing how you can improve, both in the network itself and in how you monitor it. ‘Ignorance is bliss’ doesn’t really apply to networking. Problems will eventually catch up to you, usually snowballing into some type of bigger issue, and it’s always better to get ahead of them early on. Monitoring is a big part of that, and EEM is a quick way to achieve it. What types of EEM applets have you found to be useful?