Automation: Making Better networks

As part of the same project I wrote this python script for, I created an Excel/VBA script to allow our team to quickly and consistently input all of the data required for the VPN hardware we were shipping out to over 450 locations.  The output of this Excel spreadsheet would later serve as the input to the Python script I wrote, and combined they are working out very well.

Why Bother?

Before I dive into the Excel/VBA code I’d like to give a little bit of background on my thoughts on why this was worth getting into.  From my experience, it’s fairly easy to come up with the configuration for a single site, or even a couple of sites.  You have the time to verify everything is correct, put everything in by hand, and really dedicate the time to check everything is how you want it.  This gets more difficult as you scale in size.  Even at 10 or 20 sites you start to increase your margin of error for a typo here or there, or you might run out of time and not be able to check everything as well as you would like.  Once you start to get into hundreds of devices to configure it makes things that much more complex.  It’s now very difficult, if not impossible, for a single person to manually configure each location and requires a very large amount of time dedicated to a single project.  If you take the time to automate a process, whether it is with a script, Excel, or some other combination of tools you can reduce the number of human errors as well as reduce the time and resources that would otherwise need to be dedicated to the project.

The problem

The project that led to this particular Excel VBA script required shipping out VPN hardware to over 450 locations.  Each VPN appliance was shipped to us by the carrier and was already assigned to a specific site.  There were a few different unique pieces of information, all in different spreadsheets that all needed to be tied together:

  1. Spreadsheet including Site ID and MAC address of VPN appliance
  2. Spreadsheet including Public IP address information from various broadband providers
  3. Spreadsheet including Internal IP addresses and identifying the type of configuration each site would get

The different types of configurations for each site were important as they dictated what equipment and information would need to be set up for each site.  The three options were:

  1. Broadband as a primary connection with a cellular USB as the backup
  2. Combination of broadband and T1
  3. T1 with cellular backup

If you had a site that fell under ‘option 1’ then it required entering in the public IP address information for that site, as well as keeping track of the cellular SIM ICCID and IMEI numbers.  If you had a site that was under ‘option 2’ you would only enter in the public broadband IP address, but would not need to package any cellular USB sticks.  If you had a site with ‘option 3’ then you would not have any broadband IP information to enter, but would need to package a cellular USB stick and record that information down.

Doing any of the above manually would be very labor intensive, flipping between multiple spreadsheets to check the type of setup the site would have, figure out which information to record and ship out. So, we automate.

VBA

Prior to this project I hadn’t written a VBA script since middle school.  I ended up having to re-learn a number of things to write this script but in the end it was worth it.  It has a lot of similarity to a pivot table, with some added extras.  The idea behind the script is this:

  • Every VPN hardware appliance has a barcode on the box that includes the MAC address of the device.  We can use a barcode scanner to scan that box and do a lookup of the MAC address in Spreadsheet #1 which will give us the site ID we are working with.
  • Then do another lookup of that site ID in another spreadsheet and pull the type of configuration(Broadband,Cellular,T1 combinations)
  • Prompt the user for the appropriate required information, depending on which type of configuration the site will get.  For example, if the site will need Broadband and cellular, then display the fields for broadband IP address and cellular information.  If the site will be getting a T1 with cell backup, don’t prompt the user for any broadband information.
  • At the end of the script, tell the user which instructions to package with the device before it gets shipped out.

Some screenshots

Here’s some screenshots of what the tool looks like when run with a Broadband/Cell Site:

1) Start off by scanning the barcode of the VPN Appliance

Image

2)Prompt the user for the appropriate information so it can be saved to a database.

Detected this was a broadband/cellular site. Prompt the user for the appropriate information.

Detected this was a broadband/cellular site. Prompt the user for the appropriate information.

3)Present the user with all of the necessary information for that store so they can enter it into the VPN gateway.

Present the Broadband IP address information to the user for this specific site so they can enter it into the VPN hardware appliance.

Present the Broadband IP address information to the user for this specific site so they can enter it into the VPN hardware appliance.

Advertisement

Using Python to generate Cisco configs

As part of a project to migrate from one MPLS carrier to a new one we were faced with the challenge of deploying a consistent, correct configuration to each of 450 remote sites.  Each of these remote sites is similar in terms of the number of subnets, IP addressing schemes, and router models.  Unfortunately there are are also a number of differences. While the IP schemes follow the same format for each location, each one has it’s own subnet assigned, the interfaces can change between FastEthernet or Gigabit depending on the router type, and the number of hosts in each location that are restricted via ACL varies widely.  As part of this new project there were a total of four different configurations that a remote location could receive.  While it may have been possible to configure each remote site manually, choosing the correct template to follow as we went along, it opened the doors for a huge amount of error.  A typo in an IP address on an interface that went unnoticed may not be caught until later on, a typo in an ACL could open us up to security issues, and applying the wrong template to a location could cause wasted time on troubleshooting.  Enter scripting.

Some background on scripting

I went to an engineering school and majored in Computer Engineering. As part of the standard curriculum I got to take a few programming classes.  I hated them. With a passion. “I’ll never use this”… “Im getting into networking, why waste my time on this”.  I kept that mentality through school, graduated with my BE without any issues. Got my first job as a network technician, got my CCNA shortly after, never having touched programming.  My boss at the time was extremely into Perl.  At the time this is what it came across as in my head:

You need to learn Perl!

Perl! I’ll write a Perl Script!

He had written Perl scripts to accomplish pretty much anything you could think of.  While I could appreciate what they did, I just didn’t have the interest to jump into it, my main focus was networking and there was plenty of networking to learn.Eventually some project came up where I dove into some Perl and came up with a script that did something (I don’t actually remember what it did, it didn’t cure cancer, but it did something to solve a problem), and while it was a good feeling, still didn’t stick with me.

Eventually I changed jobs and saw more and more problems come up that needed solutions.  Call Detail Records that needed reporting without the money for a professional product. I wrote (struggled through) a Perl script to search and spit out call records.  Would not win any programming competitions but it got the job and I felt good about it.  At the end of the day though it was still a struggle to write and I never fully felt like I *got* Perl.  Fast forward another year or so without any real scripting and more problems/challenges were starting to present themselves.  Requirements to push out changes to 450 remote sites, each of which has different IP addressing or requirements. Doing it manually would take an enormous amount of man hours (even at 5 minutes per location, that’s almost an entire full 40 hour work week, and no one actually had a full work week to dedicate to this. Not to mention it’s mind numbingly boring and open to error).  There was no software we owned(or that I found) that would accomplish what I was looking to do so I searched to see what other programming options were out there, and I discovered Python.  It was actually a two part Google Python class on Youtube.  It immediately clicked with me. It made sense, seemed easy and I was able to jump right in.  I was by no means an expert from the start, and still am not, but the amount of resources available on the Internet for Python, and the way the language worked just made sense to me.  Since those two videos I’ve gotten deep into Python and it’s allowed me to build the script this blog post is about.

 

The Problem

This project had four possible templates that could be applied to a site depending on which types of connectivity were available at each site(DMVPN with broadband, DMVPN with cellular, T1, and a combination of those).  Each site was identified by a unique number, and had a corresponding IP scheme.  For example site 123 might have an IP address scheme 10.1.23.0/24, which was then broken down into smaller subnets.  Another difference was the number of hosts each site had that were controlled by access lists.  One site may have 2 servers that needed to be controlled by an ACL, while another may have 3 or 4, each with different IP addresses. Since each site was unique, it wasn’t as easy as using our normal change deployment tool to push out a single change everywhere

 

The Script

Without diving into every line of the script I’ll give you some of the pseudocode and key functions/modules. If you want to know more about how any one piece works feel free to contact me.

Input

Prior to this project starting we already had an Excel file that contained every site, with subnets and IP addressing broken out across a number of different columns, almost 30 in total. In the past this was used mostly for reference and documentation, but would serve as the key input for my script to work.The excel file had a header row of variable names in brackets like [Loopback0] or [DataVlanGW]. Every other row below that is associated with a single site, identified by a unique site ID in one of the columns.  I added a new column to this spreadsheet called “TemplateID”. This column had a possible value of 1 – 4, each number corresponding to a certain configuration template.Other columns are things like site id, loopback address ,gateways, subnet masks, etc.  Filling this spreadsheet out initially was a manual process but it’s important to remember that it’s a one time effort to fill it out.  Once you have this it becomes very powerful for future applications.  I exported this from Excel to a CSV and then copied it to our linux server where I’d be building the Python script. Here’s a snippet of what the Excel headers look like:

excel

Templates

I had four possible Templates I’d be using.  I built the entire router configuration in Notepad++ and replaced any piece of the config that would differ between stores with a variable name in brackets. In the example below I’m using the variable [Loopback0], which is also a column in the spreadsheet:

interface Loopback0

description Network Management

Interface ip address [Loopback0] 255.255.255.255

no shut

exit

So, four of these templates, each slightly different and then saved as Template#.conf in a directory.

templates

You’ll notice there are more than four templates here, one ‘a’ template, and one ‘b’ template.  This was because we had two different router models(2800 and 2900), each of which had slightly different interfaces and IPS configuration.  In my script I check the model of the router and choose the appropriate template.

Processing the input

I use the built in Python function csv.reader to read through every row of the CSV file, storing each column(separated by commas in a CSV) as a variable in an array. The technical term for this in Python is a dictionary. So for example, the first column gets stored as “TemplateId”, the second column stored as “Model”, and so on until I store every column for that row into the dictionary.  I then use ‘if – then’ logic to check the value of the first column.  If templateid = 1 then I load my Template1.conf file, if templateid = 2, then load Template2.conf, etc.  Once the Template file is loaded into a variable I run this one command which basically runs a ‘find/replace’ on any variable surrounded by brackets in the template file and replace it with the values currently in the array for that row.

output = replace_words(tempstr, values)

I found the above function ‘replace_words’ after a little Googling.  ‘tempstr’ in the above function is the base template file I loaded in, and ‘values’ is the name of the dictionary I stored every column for this site ID in the CSV.  I then write this to a new file with the unique file name of the site id.  I know the site ID because it was a column in the CSV that I processed, so it’s trivial to save a file called site-###.txt, pulling ### from the array. Because this entire thing is in a ‘for’ loop, it will repeat this logic for every row in the CSV file until it reaches the end.The best part of this is it processes all 450 rows of this file in about 20 seconds.  450 consistent, full, unique Cisco router configs in 20 seconds.  I’m not automatically copying them or applying them to every device, but I have the config files on hand at this point.  I’ll talk about how I automated pushing them out to each site in the next blog post.

Here’s a snippet of the code accomplishing this first part. This only shows one set of the ‘if-then’ logic for Template #1, but there are 3 other if-thens in the real script to accommodate the other templates.:

script, inputcsvfile = argv

with open(inputcsvfile, “rb”) as infile:

reader = csv.reader(infile)

next(reader, None) #Skip the header line of the CSV

for row in reader:

values = { ‘[Hostname]’:row[2], ‘[Loopback0]’:row[5], ‘[DataVlanNet]’:row[6], ‘[DataVlanGW]’:row[7], ‘[Data2VlanNet]’:row[8], ‘[Data2VlanGW]’:row[9], ‘[WifiVlanNet]’:row[10], ‘[WifiVlanGW]’:row[11], ‘[IPSVlanNet]’:row[16], ‘[IPSVlanGW]’:row[17], ‘[IPSVlanIP]’:row[18], ‘[SwitchIP]’:row[19], ‘[SerialIP]’:row[20], ‘[BGPNeigh]’:row[21], ‘[SerialDesc]’:row[22] } #truncated

outputfile = “outputs/site-“+row[2]+”-NEWCONFIG.txt”

storecopycommandsfile = “StoreCopyCommands/site-“+row[2]+”-r1.txt”

#Next line will check to see what the template ID is set to, and load the appropriate template into variable ‘t’

if row[0] == “1”: #Open the correct template

if row[1] == “2811”:

t = open(“Templates/Template1a.conf”, “r”)

elif row[1] == “2911”:

t = open(“Templates/Template1b.conf”, “r”)

#Store the template into a temp string

tempstr = t.read()

t.close()

#Rip through the template and do a find/replace for the current store number

output = replace_words(tempstr, values)

#Write out the new config file

fout = open(outputfile,”w”)

fout.write(output)

fout.close()

But wait! There’s more!

file7

I mentioned one of the challenges of generating these configs was that each one had an access list with varying numbers of hosts that needed to be included.  Since my CSV file didn’t have columns for every host in an ACL I had to turn to an alternative method.  For this I relied on three things:

  • The current running configurations for each site
  • Regular Expressions
  • More Python

Current configs

Each site already contained existing ACL entries for each host. Since these were already in production, they were known to be working.Here’s an example of something I might be working with:

permit tcp host 10.10.10.10 10.1.1.0 0.0.0.255 eq 9100
permit tcp host 10.10.10.11 10.1.1.0 0.0.0.255 eq 9100

In this case the host that is unique per site is the source host(10.10.10.10 or 10.10.10.11 in this example), and the destination network and port stay consistent from site to site. This example shows two source hosts, other sites may have less or more. It doesn’t matter as you’ll see below.

Regular Expressions (Regex)

If you aren’t familiar with regular expressions I’d strongly recommend you research them and start to practice.  Regular expressions allow you to search for any number of patterns in text and not only match on them, but also store them for further processing.  For learning regular expressions, there is an excellent site I found called www.pythex.org.  This lets you enter in the text you want to  search in one box, and the regular expression you are testing in another, and will highlight what matches and what doesn’t.  It’s an excellent tool for when you are starting out, or when you just need to troubleshoot why something isn’t matching correctly.

In the example ACL above I need to identify the unique source hosts from each line and then do something with them.  The regular expression I used to match this is:

permit tcp host (\d+.\d+.\d+.\d+) 10.1.1.0 0.0.0.255 eq 9100

If you’ve never looked at a regular expression before it can be intimidating, but if you break it down piece by piece it isn’t too bad. Here is what it is searching for:

  • Search for the text ‘permit tcp host ‘
  • Now look for a digit(\d) that repeats one or more times (+) followed by a ‘.’, repeated 3 more times. This should look like an IP address to you.
  • I want to put parentheses around this entire IP address to save it to a variable for further use
  • The IP address should be followed by the text ‘10.1.1.0 0.0.0.255 eq 9100’

The regex I used isn’t perfect, but it isn’t necessarily wrong either.  There are a number of different ways you could write it, some much more accurate than others. For example, I am just checking that there is one or more repeating digits before the ‘.’ but I don’t check that they are valid for use in an IP address, so it would match things like 999.999.999.999.  For my specific use case it doesn’t matter because I’m relying on the current running config(which Cisco already validated when it was entered into the router) as a valid IP address. Here is what this looks like on the Pythex site:

Screen Shot 2014-03-31 at 9.15.00 PM

Python

Now that I searched for the occurrence of the source host, I need to tie it all together with Python. Here’s the pseudo code for this piece:

  • Read every line of the current running config, searching for the regular expression from above
  • If you find a match, store the matching piece(The IP address in () ) into a list in Python, until you get to the end of the file
  • Open the new router config we generated in the previous “Processing the input” section
  • Loop through this file and search for the string “!Inserted ACL”. This was text I included in each template to server as a marker for a place I want to insert these site specific access list entries.Since it has an ‘!’ at the beginning, it doesn’t interfere with the Cisco config, but still allows me to search for it.
  • If you find the string “!Inserted ACL” in the file, then replace it with the site specific access list entries we just found using the regular expressions
  • Repeat this for each match we had for the regular expression
  • Save the file

Here’s a snippet of the actual code:

existingconfig = open(myfilename)

serverips = []

for line in existingconfig:

servermatch = re.search(r’permit tcp host (\d+.\d+.\d+.\d+) 10.1.1.0 0.0.0.255 eq 9100′,line) #Match the regex

if servermatch:

serverips.append(servermatch.group(1)) #If we get a match add it to the list called serverips

serverips = list(set(serverips)) #This just gets the unique values from the list and then saves back to the list

checkforinsert = False #initialize this variable

for line in fileinput.input(outputfile, inplace=1): #loop through our initial saved config file

if line.startswith(‘!InsertedObjectGroups’): #Search each line for the string “!InsertedObjectGroups”

checkforinsert = True #If we find the string, then change the variable checkforinsert to true

else:

if checkforinsert: #if checkforinsert is true, then let’s print out the new object group, using the matches from our regex

print “\nobject-group network Server_Ips”

print “description Server IPs”

for num in serverips: #loop through the Python list for each server IP found

print ” host “,num

print line, #continue to print the lines of text

Disclaimer: Python is very strict about indentation. I’m still working on finding a good way to include pieces of code in the blog, and all of the indents in the above snippet probably don’t line up correctly and would error out, but it should give you an idea of what I’m doing. This is what we end up with after it writes the config:

!InsertedObjectGroups

object-group network Server_Ips

description Server IPs

host 10.10.10.10

host 10.10.10.11

Since the entire thing is in a for loop it doesn’t matter if the site had 1 host or 1,000 hosts. It will just read through the entire file, saving the matches it finds, and then spit them back out into the new config.  Again, this entire piece combined with the previous section only took about 20 seconds to run for 450 sites.  If you had to read through every ACL at the time of a cutover to search for unique hosts per site you would almost be guaranteed to miss one here and there, or type it incorrectly.  This greatly reduces that margin of error, and saves you huge amounts of headache.

Testing

Make sure you test everything you are writing.  Run your script, check the output, apply it to a lab router and see where it errors out. Run it against multiple test cases to make sure you’ve accounted for any anomalies that may come up.  Don’t try to write the entire thing in one shot. The worst thing you could do is try to conquer a large problem with your first script and have it blow up, creating more damage than if you had just configured your devise manually.  Small steps are good and as you become more confident it will become much easier. The idea is to make your life easier, not to get yourself fired because you took on a problem too large for your current skill set.

Wrapping it up

While this post didn’t cover every detail of coding in Python, I hope it gave you enough of a taste to see what is possible once you start to get into it.  I think it’s important to note that I didn’t jump into this as my first script. A number of the pieces in this script came from other previous scripts that were much simpler.  Once you learn how to do each small piece, it becomes easier to combine them all together and really build something that can give you results.  One of the biggest hang ups I had when starting out was I wanted everything to be ‘perfect code’, the most efficient, streamlined code ever written.  It doesn’t work like that though, it’s a process.  What I wrote can definitely be cleaned up, optimized, and just generally improved, but at the end of the day it solved the problem that I needed to solve and I consider it a success. I took a task that previously would have taken 40 hours + and reduced it to 20 seconds.  Going forward I can review the code I wrote, research some more Python and optimize it, but for right now it got the job done and I’m happy.

 

Excel IP Functions

I was working on a project the other day where I had a list of ~ 500 /30 subnets in Excel that I needed to break out so each host in the /30 was in it’s own column.  Seemed pretty straight forward, but there was no built in function to handle this.  With all of the other math functions Excel has built in, you would think that in 2014 some functions that deal with IP addresses would be standard.  Guess not.  Rather than spend the time to write one from scratch, a quick Google search came up with an entire set of functions written by Rajeev Bhardwaj. He put together an excellent presentation that shows the usage of each of the functions located here:   http://www.slideshare.net/rajivss/ip-functions-presentationgen-v11 .  This not only helped me solve my current problem but all of the other tools included in the Add-In will definitely get used in the future.

Here’s a quick summary of the included tools:

  • IP_Bits2Mask – Converts the number of bits from a / notation to an expanded subnet mask
  • IP_ErrChk – Checks to see if a value is a valid IP address or not
  • IP_Hosts – Calculates the number of hosts from a specific subnet mask
  • IP_IP2Mbits – Calculates the subnet mask required to obtain a certain number of host addresses
  • IP_Mod – Takes any octet of the IP address and increments/decrements it by the set value
  • IP_NextSub – Calculates the next subnet
  • IP_Bcast – Calculates the broadcast address for a given IP
  • IP_Count – Counts the occurrence of an IP in a range of subnets
  • IP_IsExist – Check if an IP address exists in a subnet
  • IP_Mask2Bits – Calculates the mask bits from a subnet mask in dotted notation
  • IP_Subnet – Calculates the subnet address for a given host IP

I would recommend you visit Rajeev’s site and download the Excel add-in to be able to take advantage of all these functions.  You can visit his site to download the tools here: http://rajivbhardwaj.com/download/

 

Monitoring HSRP Failover with EEM

Background

We’re currently using Solarwinds to monitor our network infrastructure. While it does a decent job at monitoring the basics, there are certain things I feel it could do better right out of the box.  I have this preconceived idea of what the ideal NMS should have, looking at your config and then strongly recommending what you should monitor based on what it finds.  Unfortunately I haven’t found that yet.  Solarwinds is great if you want the pretty GUI, NAGIOS is very powerful if you know what you want to monitor and don’t mind a little scripting, but theres nothing I’ve found (yet) that does it all the way I want.  One of the things I think are important is tracking your HSRP failover events.  In a typical old school (no VSS or other newer technologies) redundant environment you would have two switches running HSRP between them, where one takes over if the other fails.  It’s possible you could go months or years without monitoring HSRP without any issues. It’s also possible you could be having constant HSRP events happening that you never even see unless you are specifically looking for them.  These could cause brief interruptions to your user traffic depending on how your timers are tuned, and probably just get reported as general ‘the network is slow/horrible/never works/etc’. To monitor these events without getting too deep into Solarwinds or other monitoring solutions I decided to turn to EEM.

EEM

EEM is a pretty powerful and flexible component of IOS/NX-OS that allows you to track or alert on certain events on Cisco devices.  I won’t go into all of the features that EEM has as it is well documented.  If you are interested in finding out more about EEM I’d recommend looking through Cisco’s site, starting here: https://supportforums.cisco.com/docs/DOC-27996.

HSRP Applet

The applet we are going to write is referred to as a syslog collector script.  It simply monitors the syslog messages that are generated by the router/switch and performs some action based upon the detection of a certain string.Basic, but powerful. Here’s the script I created:

ip name-server 1.2.3.4

event manager environment _mail_smtp smtp.yourcompany.com

event manager environment _mail_rcpt networkteamdistro@yourcompany.com

event manager session cli username “yourusername”

event manager applet HSRPEvent

event syslog pattern “HSRP-5-STATECHANGE”

action 1.0 info type routername

action 2.0 mail server “$_mail_smtp” to “$_mail_rcpt” from “fromemail@yourcompany.com” subject “HSRP State Change on $_info_routername” body “$_syslog_msg”

Some notes on what the script is doing:

  • ip name-server– Required if you are using a DNS name for your SMTP server.
  • event manager environment – both of these lines are setting variables _mail_smtp and _mail_rcpt, which are used later in the script to send mail
  • event manager session cli username – used to set the username you want to run the script as. You do not need a password. IOS uses the username only for authorization purposes, not actual authentication. It will check the authorization locally or against a AAA server like ACS.
  • event syslog pattern – is telling the applet to search the syslogs for the specified pattern
  • action 1.0 info type routername – just stores the router’s current router’s name into a variable.  This is useful so when I get the email I’ll know which switch it’s coming from
  • action 2.0 – Is sending an email using some of the variables from above. The subject has the text “HSRP State Change on <ROUTERNAME>”. The body will contain the actual Syslog text, which will contain the state change.  As an example, the body of the text would look something like this: “3348414: Jan 27 13:51:19.687 EST: %HSRP-5-STATECHANGE: Vlan181 Grp 181 state Standby -> Active”

The Big Picture

This applet is just one small example of what you can do with EEM.  The possibilities are nearly endless.  While this focuses on one specific aspect of the network, I think it’s important to always be looking at your own network and seeing how you can improve, both in the network itself and in monitoring the network.  ‘Ignorance is bliss’ doesn’t really apply to networking.  Problems will eventually catch up to you, usually snowballing into some type of bigger issue and it’s always better to get ahead of them early on.  Monitoring is a big part of that and EEM is a quick way to achieve it.  What type of EEM applets have you found to be useful?