Cellular Routing Backup with the Cisco 819

I’ve been involved in a few international projects where we used a Cisco 819 router to handle a small network, with a broadband connection as the primary path to the Internet and the built-in cellular connection as the backup.  I thought I’d talk about the basic config for this, some options you have with the cellular backup configuration, and a few other things I noticed while working on these projects.

The hardware

First things first: if you are planning on deploying one of these, make sure you order the right hardware.  There is a different model depending on which type of cellular technology your carrier supports.  Check out this Cisco link and look at Table 2.  There are three models to choose from:

  • C819(H)G-4G-V-K9 – Works with Verizon networks, hence the “V” in the product SKU
  • C819(H)G-4G-A-K9 – Works with AT&T networks, hence the “A” in the product SKU
  • C819(H)G-4G-G-K9 – Works everywhere else, “G” for global in the SKU

The (H) in parentheses is optional, depending on whether you require a hardened version of the hardware for harsh environments.

Pick your carrier

If you’re trying to install this in a location you aren’t familiar with, it can be hard to determine who the best cell carriers in that area are.  I found a great website, www.opensignal.com, that collects and pools data from Android and iPhone apps to determine signal strength and performance across the globe.  You can type in a location to search and it will show you both the top carriers in that area and coverage on a map.  If you check this page out you can get a list of every country they have data for and the top available carriers in each country. Clicking any carrier will give you a coverage map and performance statistics like download/upload speeds and latency.

 

Get your SIM Card

After you have the hardware you will need to place an order with your cell carrier for the SIM card.  SIM cards come in different sizes, as seen in this Wikipedia article.  The Cisco 819 supports the Mini SIM (size 2FF).  I think I read in one of the support docs that you could use a Micro SIM (size 3FF) if you had an adapter, but I didn’t test it out and just decided to use the default supported size.

When you order your SIM card, ask your carrier which APN (Access Point Name) you should be using.  When your cellular router makes a connection to the mobile network it will present an APN, which helps determine the type of network connection that will be created.  This varies per carrier, so you’ll need to ask your sales or technical team for that info.

Also keep in mind how much data you might be sending over this connection when sizing your plan.  Overages in cellular data use can get expensive very quickly, so order appropriately.  Some carriers offer M2M plans which allow you to share data allowances between SIM cards, which becomes very useful if you have a large number of sites using this technology as a backup.  With this type of deployment you could get away with an inexpensive monthly plan with a small data cap of some number of megabytes per site, which will mostly go unused (hopefully, if your primary connection is reliable).  When it does need to be used at any one site, you’ll have the whole pool of data allowances to tap into.  For example, if each site gets 25MB per month and you have 100 sites, that gives you 2.5GB total to use.
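As a quick sanity check on the pooled-plan arithmetic, here is a minimal Python sketch. The per-site allowance and site count are the example figures from the text; the function name and the decimal conversion (1 GB = 1000 MB) are my own assumptions, and your carrier may count differently.

```python
def pooled_allowance_gb(sites: int, mb_per_site: float) -> float:
    """Total shared data pool in GB, assuming 1 GB = 1000 MB (carrier-dependent)."""
    return sites * mb_per_site / 1000


# 100 sites at 25 MB each, the example figures from the text
pool_gb = pooled_allowance_gb(sites=100, mb_per_site=25)
print(f"Shared pool: {pool_gb} GB")
```

Swapping in your own site count and plan size makes it easy to compare a few plan tiers before talking to the carrier.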

Router Config

As mentioned in the intro, my config will use broadband as the primary connection and cellular as the backup.  I’ll show two different options for the cellular backup as well: Triggered/dialing on demand and always on.

Triggered Cellular Backup

In this design traffic is sent to the broadband connection under normal circumstances and is sent towards the cellular interface if there is a problem with the broadband.  To add a little intelligence to detecting when there is an issue with the broadband connection, we’ll use a combination of IP SLA and tracking to ping a host.  This example uses Google’s Public DNS at 8.8.8.8, although I would not recommend that in production since you have no control over Google’s uptime; it may be better to ping the default gateway of the broadband provider instead.

To start off we need to create a cellular data profile on the router. This contains information such as the APN, authentication type, and username/password if applicable.  All of this info will come from your provider.  To create the profile you’ll enter something like this:

router#cellular 0 lte profile create 1 <APN_GOES_HERE> pap <USERNAME_HERE> <PASSWORD_HERE>

After you create the profile we can continue with the config. This is how the config will work:

  • Traffic is normally NATed and sent out the Gi0 broadband interface to the Internet. It is sent to the Gi0 interface using a default route with a track object on it.  An IP SLA check is running over the Gi0 interface as well, testing ICMP to 8.8.8.8.
  • If the IP SLA object fails, the track on the default route towards the Gi0 interface will cause that default route to be removed, leaving a second default route that points towards the cellular interface. That second route was always configured but has a higher admin distance, so it wasn’t used until now.
  • Once any IP traffic is matched by the dialer list, it will cause the chat script referenced under the cell0 interface to run its AT command (AT!CALL1 in this config) and dial the modem.
  • The modem will start and establish a cellular connection. This connection will stay up as long as there is traffic detected.  If there is no traffic detected within the idle timeout (currently left at the default; it can be modified under the cell0 interface using dialer idle-timeout), the connection will go down.  This is good in that it helps conserve data usage costs on your cellular plan.
  • In the background, the IP SLA traffic is still trying to reach 8.8.8.8 via the Gi0 interface. Once the IP SLA is successful, the original default route will go back into effect, and traffic will start going out the Gi0 interface again.
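The route selection described in the list above can be modeled in a few lines of Python. This is only a sketch of the floating static route idea, not of how IOS is implemented: the class and function names are hypothetical, and the admin distances (1 for the tracked broadband route, 10 for cellular) mirror the static routes used in this config.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StaticRoute:
    next_hop: str          # label for the exit path
    admin_distance: int    # lower value wins
    tracked: bool = False  # True if the route depends on the track object


def active_default_route(routes: List[StaticRoute], track_up: bool) -> Optional[StaticRoute]:
    """Return the default route that would be installed.

    A tracked route is only a candidate while the track object is up;
    among the candidates, the lowest admin distance wins.
    """
    candidates = [r for r in routes if track_up or not r.tracked]
    return min(candidates, key=lambda r: r.admin_distance, default=None)


routes = [
    StaticRoute("GigabitEthernet0 (broadband)", admin_distance=1, tracked=True),
    StaticRoute("Cellular0", admin_distance=10),
]

print(active_default_route(routes, track_up=True).next_hop)   # broadband while the SLA probe succeeds
print(active_default_route(routes, track_up=False).next_hop)  # cellular once the track goes down
```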

 

chat-script lte "" "AT!CALL1" TIMEOUT 60 "OK" !Chat script called 'lte', used to dial the modem with the AT!CALL1 command

 

ip sla auto discovery

ip local policy route-map sla-route !Local policy for traffic originating from the router going to 8.8.8.8 over the Gi0 interface (which is set using the route-map sla-route)

ip sla 100

 icmp-echo 8.8.8.8 source-interface GigabitEthernet0 !IP SLA ICMP test to 8.8.8.8 sourced from Gig0

 frequency 10

ip sla schedule 100 life forever start-time now

track 100 ip sla 100 reachability !Tracking object that is tracking the state of IP SLA object 100

 

delay down 10 up 20

 

dialer-list 1 protocol ip permit !Dialer list that defines what traffic is allowed to start the cellular connection.In this case any IP traffic will trigger the connection.

 

ip dhcp pool testpool !DHCP Pool for clients

network 192.168.0.0 255.255.255.0
default-router 192.168.0.1
dns-server <DNS_1> <DNS_2>
!
interface GigabitEthernet0
description Broadband Connection
ip address <Broadband_IP> <Broadband_Mask>
ip nat outside !Outside interface
duplex auto
speed auto
!
!

vlan 10

interface Vlan10
ip address 192.168.0.1 255.255.255.0
ip nat inside !Inside NAT Interface
ip virtual-reassembly in

 

interface Cellular0

ip address negotiated

ip nat outside !Outside interface of NAT

 ip virtual-reassembly in

 encapsulation slip

 ip tcp adjust-mss 1000

 dialer in-band

 dialer string lte !The chat script to use when dialing the cellular interface. This needs to reference the same name you used in the chat script of the first line of this config.

 dialer-group 1 !This is tied to the dialer-list 1, which defined what type of traffic should trigger the cellular connection

 async mode interactive

 

ip nat inside source route-map wan-backup interface Cellular0 overload !NAT statement used for cellular

ip nat inside source route-map wan-primary interface GigabitEthernet0 overload !NAT statement used for DSL

ip route 0.0.0.0 0.0.0.0 <Broadband_Default_Gateway> track 100 !Main default static route pointing towards the DSL, with tracking

ip route 0.0.0.0 0.0.0.0 Cellular0 10 !Second default route for the secondary connection (Cellular), with a higher AD. This will only get installed in the routing table if the primary route gets removed.

 

 

access-list 100 permit ip 192.168.0.0 0.0.0.255 any !NAT ACL, NAT anything with a 192.168.0.x IP address, which is all PDQs

 

!

route-map sla-route permit 10 !This is a special route-map to make sure the IP SLA traffic goes out the Gi0 interface and not the cellular.

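 match ip address sla-packets !Match the ACL for SLA packets (the named ACL 'sla-packets' is not shown in this config and must match the ICMP probes to 8.8.8.8)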

 set ip next-hop <Broadband_Default_Gateway> !This is important. We want the SLA traffic to always be routed out the broadband interface so we have a true indication of when there is an issue with the broadband.  If we don’t specify the next hop, the SLA traffic will bounce between broadband and cellular as they go up/down, and the track will constantly change states.

 set interface Null0

!

route-map wan-primary permit 10 !Route map used to define the source IP addresses when on DSL

 match ip address 100

 match interface GigabitEthernet0

!

route-map wan-backup permit 10 !Route map used to define the source IP addresses when on cellular

 match ip address 100

 match interface Cellular0

!This EEM applet was added after testing showed the config didn’t work perfectly.  When the connection failed over, the old NAT entries still pointed towards the previous interface (broadband, when failing over to cell), so traffic did not flow correctly.  This EEM applet watches for the tracking state to change and then clears the existing NAT translations, which allows them to be rebuilt using the new outside IP.  I’d be interested in hearing how other people have handled this without using the applet.

event manager session cli username "admin"
event manager applet CLEAR_NAT
event syslog pattern "TRACKING-5-STATE"
action 1 cli command "enable"
action 2 cli command "clear ip nat translation forced"
action 3 cli command "exit"

 

Always-on Cellular Backup

If you prefer the cellular connection to stay on all the time then you can make these changes to the above config:

 

Router#config t

Router(config)#int cell 0

Router(config-if)#no dialer-group 1 !Remove the dialer-group 1

Router(config-if)#dialer watch-group 1 !Add a dialer watch-group

Router(config-if)#exit

Router(config)#no dialer-list 1 protocol ip permit !Remove the dialer list

Router(config)# dialer watch-list 1 ip 5.6.7.8 0.0.0.0 !This is a fake IP the router uses to generate traffic and keep the interface up.

Router(config)# dialer watch-list 1 delay route-check initial 60

Router(config)# dialer watch-list 1 delay connect 1

!With the configuration above this point, the cellular would stay up all the time(even with broadband up). These EEM applets will follow the state of the tracking object and perform a ‘shutdown’ of the cell0 interface if the track is ‘UP’ (broadband working), and will perform a ‘no shut’ on the cell0 interface if the track is ‘DOWN’ (broadband down).

event manager applet Cellular_Activate

event track 100 state down

action 1.0 cli command "enable"

action 1.1 cli command "configure terminal"

action 1.2 cli command "interface Cellular0"

action 1.3 cli command "no shutdown"

action 1.4 cli command "end"

!

event manager applet Cellular_Deactivate

event track 100 state up

action 1.0 cli command "enable"

action 1.1 cli command "configure terminal"

action 1.2 cli command "interface Cellular0"

action 1.3 cli command "shutdown"

action 1.4 cli command "end"

Wrapup

Hopefully this was a good overview of how to configure one of the Cisco cellular routers.  There are other ways you could approach the primary/backup configuration; this is just one config I came up with that gets the job done.  What type of cellular configs are you using?


Troubleshooting with Wireshark IO Graphs : Part 2

In part 1 of this blog post I covered the basics of using IO Graphs.  In this post I’ll cover one additional feature: functions.

Functions

There are 6 functions available for use in the IO Graphs:

  • SUM(*) – Adds up and plots the value of a field for all instances in the tick interval
  • MIN(*) – Plots the minimum value seen for that field in the tick interval
  • AVG(*) – Plots the average value seen for that field in the tick interval
  • MAX(*) – Plots the maximum value seen for that field in the tick interval
  • COUNT(*) – Counts the number of occurrences of the field seen during a tick interval
  • LOAD(*) – Used for response time graphs
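To make the per-tick calculations concrete, here is a rough Python sketch of what SUM, MIN, AVG, MAX and COUNT produce for each tick interval. It is not Wireshark’s code; the function names and sample values are made up for illustration, and LOAD is left out because it works on response times rather than a single field value.

```python
from collections import defaultdict
from statistics import mean


def bucket_by_tick(samples, tick=1.0):
    """Group (timestamp, value) pairs into tick-sized intervals."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[int(timestamp // tick)].append(value)
    return buckets


def io_graph_functions(samples, tick=1.0):
    """Compute SUM, MIN, AVG, MAX and COUNT of the values in each tick."""
    return {
        index: {
            "SUM": sum(values),
            "MIN": min(values),
            "AVG": mean(values),
            "MAX": max(values),
            "COUNT": len(values),
        }
        for index, values in sorted(bucket_by_tick(samples, tick).items())
    }


# Made-up (seconds since capture start, field value) pairs, e.g. tcp.len
samples = [(0.1, 100), (0.5, 300), (0.9, 200), (1.2, 50), (1.8, 150)]
results = io_graph_functions(samples)
for index, stats in results.items():
    print(index, stats)
```

Each dictionary entry corresponds to one bar or point on the graph for that tick.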

Let’s look at a few of these functions in action.

Min(), Avg(), and Max() Function Example

For the first example we’ll look at the minimum, average, and maximum times between frames that are sent.  This is useful to see latency between individual frames/packets.  We can combine these functions with the field ‘frame.time_delta’ to get a visual representation of time between frames and make increases in round trip latency more visible.  If you were looking at a capture that contained multiple conversations between different hosts and wanted to focus on only one pair of hosts, you could combine ‘frame.time_delta’ with a filter on the source and destination hosts like ‘ip.addr==x.x.x.x && ip.addr==y.y.y.y’.  I’ll use this in the example below:

[Screenshot: IO Graph with Min(), Avg(), and Max() applied to frame.time_delta]

Here’s a breakdown of what we did:

  • Set the Y-Axis Unit to “Advanced” to make the Calculation fields visible.  If you don’t set this you’ll never see the option to perform calculations.
  • The time interval for the x-axis is 1 second, so each bar you see on the graph represents the calculations for that 1 second interval
  • Filtered on only the HTTP communication between two specific IP addresses using the filter ‘(ip.addr==192.168.1.4 && ip.addr==128.173.87.169) && http’
  • Used three different graphs each with a different calculation – Min(),Avg(), and Max()
  • Applied each calculation on the filter criteria ‘frame.time_delta’
  • Set the style to ‘FBar’ because it displays the data nicely

Looking at the graph we can see that at 90 seconds the MAX frame.time_delta for traffic in the capture was almost 0.7 seconds, which is pretty awful and a result of the latency and packet loss I introduced to this example.  If we wanted to zoom into that specific frame and see what was going on, we can just click on the point in the graph and it will jump to that frame in the packet list window in the background (frame #1003 if you are following along in the capture).  This capture had latency and packet loss purposely introduced to exaggerate the types of data you might be able to gather from these graphs, but the same approach applies to any type of capture you are troubleshooting.  If you see relatively low average times between frames and then a sudden jump at one point in time, you can click that frame and narrow in to see what happened at that point in time.
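If it helps to see the idea outside of the GUI, this small Python sketch computes the time between consecutive frames and finds the largest gap, which is what the MAX graph surfaces. The timestamps are hypothetical and the function names are my own; it only illustrates the calculation.

```python
def frame_time_deltas(timestamps):
    """Time between each frame and the one before it (the first frame gets 0.0)."""
    deltas = [0.0]
    for previous, current in zip(timestamps, timestamps[1:]):
        deltas.append(current - previous)
    return deltas


def largest_gap(timestamps):
    """Return (frame_number, delta) for the biggest gap; frame numbers start at 1."""
    deltas = frame_time_deltas(timestamps)
    index = max(range(len(deltas)), key=deltas.__getitem__)
    return index + 1, deltas[index]


# Hypothetical capture timestamps in seconds
timestamps = [0.00, 0.02, 0.05, 0.75, 0.78]
frame_number, gap = largest_gap(timestamps)
print(f"Largest gap of {gap:.2f}s is just before frame {frame_number}")
```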

 Count() Function Example

The count() function is useful for graphing some of the TCP analysis flags that we looked at in the first blog post such as retransmissions.  Here’s a sample graph:

[Screenshot: IO Graph using Count() on TCP analysis flags such as retransmissions]

Sum() Function Examples

The Sum() function adds up the value of a field.  Two common use cases for this are to look at the amount of TCP data in a capture and to examine TCP sequence numbers.  Let’s look at the TCP length example first.  We’ll set up two graphs, one using the client IP 192.168.1.4 as the source, and the other using the client IP as the destination.  For each graph we will apply the Sum() function to the tcp.len field. By breaking these out into two different graphs we can see the amount of data traveling in each direction.

[Screenshot: IO Graph of Sum(tcp.len) for each direction of the transfer]

Looking at the graph we can see that the amount of data going towards the client (ip.dst==192.168.1.4 filter) is much higher than the amount of data coming from the client. This is shown by the red bars in the graph.  The black bars show the amount of data traveling from client to server, which is very small in comparison.  This makes sense since the client is simply requesting the file and acknowledging data as it receives it, while the server is sending the large file.  It’s important to note that if you swapped the order of these graphs, putting the client IP as the destination for graph 1 and the client IP as the source in graph 2, you might not see all of the data when using the ‘FBar’ style for both, because a lower graph number ends up in the foreground and covers up any higher graph number.

Now let’s look at the TCP sequence number graph for the sample capture that had packet loss and latency.

[Screenshot: TCP sequence number graph for the capture with packet loss and latency]

 

We can see a number of spikes and drops in the graph indicating problems with the TCP transmission.  Let’s compare that to a ‘good’ TCP transfer:

[Screenshot: TCP sequence number graph for a clean transfer]

In this capture graph we can see a fairly steady increase in the TCP sequence numbers indicating this transfer was fairly smooth without many retransmissions or lost packets.
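The difference between the two graphs can be approximated in code by looking for places where the sequence number goes backwards. This is a simplified sketch with made-up sequence numbers for one direction of a single connection; it is not how Wireshark’s own TCP analysis works, and the function name is hypothetical.

```python
def sequence_regressions(seq_numbers):
    """Indexes where the sequence number is lower than the highest seen so far.

    In a one-directional transfer this is a rough hint of a retransmission
    or an out-of-order segment.
    """
    regressions = []
    highest = None
    for index, seq in enumerate(seq_numbers):
        if highest is not None and seq < highest:
            regressions.append(index)
        else:
            highest = seq
    return regressions


smooth = [1, 1461, 2921, 4381, 5841]        # steadily increasing, like the 'good' graph
lossy = [1, 1461, 4381, 2921, 5841, 1461]   # drops back twice, like the 'bad' graph

print(sequence_regressions(smooth))
print(sequence_regressions(lossy))
```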

Wrap-up

I hope this gave a good overview of the type of advanced graphs you can generate using the built-in Wireshark functions.  The filters shown in this post are some of the more common ones, and ones that are highlighted in the excellent Wireshark Network Analysis book by Laura Chappell.  There are a number of other graphs you could build with the functions. It really comes down to understanding how your data transfer should look in an ideal situation and what types of things you know will be missing or different in a ‘bad’ capture.  If you don’t understand the underlying technology like TCP or UDP, it will be difficult to know what to graph and look for when an issue does come up.  Let me know if there are any common filters you use with the IO graph feature, and how they have been useful for you.

Troubleshooting with Wireshark IO Graphs : Part 1

One of the lesser-used functions of Wireshark is its ability to graph different data.  When troubleshooting a problem using a packet capture, the amount of data can be overwhelming.  Scrolling through hundreds or thousands of packets trying to follow a conversation or find a problem you don’t know exists can be frustrating.  Wireshark comes with a number of built-in graphs that make these issues much more obvious.  In this post I’ll cover IO graphs.

Basic IO Graphs

The basic Wireshark IO graph will show you the overall traffic seen in a capture file, usually as a per-second rate (either packets or bytes).  By default the X axis will set the tick interval to one second, and the Y axis will be packets per tick.  If you prefer to see the bytes or bits per second, just click the “Unit:” dropdown under “Y Axis” and select which one you want to look at.  Using our example, we can see the overall rate of traffic for all captured traffic.  At the most basic level, this can be useful for seeing spikes (or dips) in your traffic and taking a closer look into that traffic.  To look into the traffic more closely, just click any point on the graph and it will focus on that packet in the background packet list window.  If you want to get a more granular view of the traffic, just click the “Tick interval” dropdown under “X Axis” and select a smaller time interval.  Let’s take a look at the basic components of the IO graph window.

  • Graphs – There are 5 different graph buttons, allowing you to graph up to 5 different things at one time.  Each Graph button is linked to a different color graph (not changeable).  We will go into some further examples using multiple graphs in a little bit.
  • Filters – Each graph can have a filter associated with it. This filter box uses any of the same display filters you would use in the main Wireshark window.
  • Styles – There are four different styles you can use: Line, Impulse, Fbar, and Dots.  If you are graphing multiple items, you might want to choose different styles for each graph to make sure everything is visible and one graph doesn’t cover up another. Graph 1 will always be the foreground layer.
  • X and Y Axis – Wireshark will automatically define both axes based on the traffic being plotted.  The default tick interval for the X axis is 1 second, which is usually OK for looking at most traffic, but if you are trying to look at bursty traffic you may need to use a smaller tick interval. Pixels per tick allows you to alter the spacing of the ticks on the graph.  The default for the Y axis is packets per tick. Other options include bytes/tick, bits/tick, or Advanced.  We’ll touch on the Advanced features later on.  The scale is set to auto by default.

Basic Traffic Rate Graph

To start, open up this sample packet capture (or your own) in Wireshark and click on Statistics – IO Graphs.  This capture is an HTTP download that encountered packet loss. I also have a constant ping going to the host. Let’s stop for a second and just point out the obvious.

[Screenshot: default IO Graph of the sample capture]

  • The graph color is black because the default graph is Graph 1, and Graph 1 is always tied to the black color
  • The graph is showing all traffic because the filter box is blank.
  • The default view will show us packets per second.

While the default view of packets/second is OK, it’s not super useful for most troubleshooting I’ve run into.  Let’s change the Y Axis to bits/tick so we can see the traffic rate in bits per second. We can see that the peak of traffic is somewhere around 300kbps.  If your capture had places where the traffic rate dropped to zero, that might be a reason to dive further into those time periods and see what is going on.  A drop like that is very easy to spot on the graph, but might not be as obvious just scrolling through the packet list.

[Screenshot: IO Graph with the Y Axis set to bits/tick]

Filtering

Each graph allows you to apply a filter to it.  There aren’t really any limitations on what you can filter here. Anything that is a display filter is fair game and can help you with your analysis.  Let’s start off with something basic. I’ll create two different graphs, one graphing HTTP traffic and one graphing ICMP.  We can see Graph 1 (Black Line style) is filtered using ‘http’ and Graph 2 (Red Fbar style) is filtered using ‘icmp’. You might notice there are some gaps in the red Fbar lines which are filtered on ICMP traffic. Let’s have a closer look at those.

[Screenshot: IO Graph with Graph 1 filtered on ‘http’ and Graph 2 filtered on ‘icmp’]

 

I’ll set up two graphs, one showing ICMP Echo (Type=8) and one showing ICMP Reply (Type=0).  If everything were working correctly I would expect to see a constant stream of replies, one for every echo request.  Let’s see what we have:

[Screenshot: IO Graph of ICMP echo requests and echo replies]

We can see that the red impulse lines for Graph 2 (icmp.type==0, ICMP Reply) have gaps and aren’t consistently spread across the graph, while the ICMP requests are pretty consistent across the whole graph.  This indicates that some replies were not received.  In this example I had introduced packet loss to cause these replies to drop.  This is what the ping looked like on the CLI:

[Screenshot: ping output on the CLI showing dropped replies]
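The comparison the two graphs make visually can also be written as a small Python sketch: pair each echo request with a reply carrying the same sequence number and list the ones that never got an answer. The packet tuples below are invented for illustration, and the function name is my own.

```python
def unanswered_echoes(packets):
    """Return sequence numbers of echo requests (type 8) with no echo reply (type 0).

    `packets` is an iterable of (icmp_type, sequence_number) tuples.
    """
    requests = set()
    replies = set()
    for icmp_type, seq in packets:
        if icmp_type == 8:
            requests.add(seq)
        elif icmp_type == 0:
            replies.add(seq)
    return sorted(requests - replies)


# Requests 2 and 4 never receive a reply in this made-up sample
packets = [(8, 1), (0, 1), (8, 2), (8, 3), (0, 3), (8, 4)]
missing = unanswered_echoes(packets)
print(missing)
```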

Common Troubleshooting Filters

For troubleshooting slow downloads/application issues there are a handful of filters that are especially helpful:

  • tcp.analysis.lost_segment – Indicates we’ve seen a gap in sequence numbers in the capture.  Packet loss can lead to duplicate ACKs, which lead to retransmissions
  • tcp.analysis.duplicate_ack – Displays ACKs that repeat an acknowledgment number that was already sent.  A high number of duplicate ACKs usually points to packet loss or out-of-order segments between the TCP endpoints, and often shows up on high-latency paths
  • tcp.analysis.retransmission – Displays all retransmissions in the capture.  A few retransmissions are OK, excessive retransmissions are bad. To the user this usually shows up as slow application performance
  • tcp.analysis.window_update – Flags segments in which the receiver advertises a new TCP window size, so graphing it shows when the window is changing during your transfer.  If the advertised window drops to zero (or near zero) during your transfer, it means the sender has to back off and wait for the receiver to catch up on the data already sent.  This would indicate the receiving end is overwhelmed.
  • tcp.analysis.bytes_in_flight – The number of unacknowledged bytes on the wire at a point in time.  The number of unacknowledged bytes should never exceed the receiver’s advertised TCP window (the window scale factor is negotiated in the initial 3-way TCP handshake), and to maximize your throughput you want to get as close as possible to that window size.  If you see a number consistently lower than your TCP window size, it could indicate packet loss or some other issue along the path preventing you from maximizing throughput.
  • tcp.analysis.ack_rtt – Measures the time delta between capturing a TCP packet and the corresponding ACK for that packet. If this time is long it could indicate some type of delay in the network (packet loss, congestion, etc.)

Let’s apply a few of these filters to our capture file:

[Screenshot: IO Graph with HTTP traffic, lost segments, duplicate ACKs, and retransmissions]

In this graph we have 4 things going on:

  • Graph 1 (Black Line) is the overall traffic filtered on HTTP, being displayed in packets/tick, the tick interval is 1 second so we are looking at packets/second
  • Graph 2 (Red FBar Style) is the TCP Lost segments
  • Graph 3 (Green Fbar Style) is the TCP Duplicate Acks
  • Graph 4 (Blue Fbar Style) is the TCP Retransmissions

From this capture we can see that there are a fairly large number of retransmissions and duplicate ACKs compared to the amount of overall HTTP traffic(black line). Looking at the packet list alone, you may be able to get some idea that there are a number of duplicate acks and retransmissions going on but it’s hard to get a grasp of when they are occurring throughout the capture and in what proportion they occur compared to overall traffic.  This graph makes it a little clearer.

 

In the next post I’m going to go into using some of the more advanced features of IO graphs such as functions and comparing multiple captures in one graph. Hope this was helpful to get you started with IO graphs.

Using Python to generate Cisco configs

As part of a project to migrate from one MPLS carrier to a new one, we were faced with the challenge of deploying a consistent, correct configuration to each of 450 remote sites.  Each of these remote sites is similar in terms of the number of subnets, IP addressing schemes, and router models.  Unfortunately there are also a number of differences. While the IP schemes follow the same format for each location, each one has its own subnet assigned, the interfaces can change between FastEthernet or Gigabit depending on the router type, and the number of hosts in each location that are restricted via ACL varies widely.  As part of this new project there were a total of four different configurations that a remote location could receive.  While it may have been possible to configure each remote site manually, choosing the correct template to follow as we went along, that opened the door to a huge amount of error.  A typo in an IP address on an interface might go unnoticed until later on, a typo in an ACL could open us up to security issues, and applying the wrong template to a location could waste time on troubleshooting.  Enter scripting.

Some background on scripting

I went to an engineering school and majored in Computer Engineering. As part of the standard curriculum I got to take a few programming classes.  I hated them. With a passion. “I’ll never use this”… “I’m getting into networking, why waste my time on this”.  I kept that mentality through school and graduated with my BE without any issues. I got my first job as a network technician and got my CCNA shortly after, never having touched programming.  My boss at the time was extremely into Perl.  At the time, this is what it came across as in my head:

You need to learn Perl!

Perl! I’ll write a Perl Script!

He had written Perl scripts to accomplish pretty much anything you could think of.  While I could appreciate what they did, I just didn’t have the interest to jump into it. My main focus was networking and there was plenty of networking to learn.  Eventually some project came up where I dove into some Perl and came up with a script that did something (I don’t actually remember what it did; it didn’t cure cancer, but it did something to solve a problem), and while it was a good feeling, it still didn’t stick with me.

Eventually I changed jobs and saw more and more problems come up that needed solutions.  One was Call Detail Records that needed reporting without the money for a professional product. I wrote (struggled through) a Perl script to search and spit out call records.  It would not win any programming competitions, but it got the job done and I felt good about it.  At the end of the day, though, it was still a struggle to write and I never fully felt like I *got* Perl.  Fast forward another year or so without any real scripting, and more problems/challenges were starting to present themselves, such as requirements to push out changes to 450 remote sites, each of which has different IP addressing or requirements. Doing it manually would take an enormous amount of man hours. Even at 5 minutes per location, that’s 37.5 hours, almost a full 40 hour work week, and no one actually had a full work week to dedicate to this. Not to mention it’s mind-numbingly boring and open to error.  There was no software we owned (or that I found) that would accomplish what I was looking to do, so I searched to see what other programming options were out there, and I discovered Python.  It was actually a two part Google Python class on YouTube.  It immediately clicked with me. It made sense, seemed easy, and I was able to jump right in.  I was by no means an expert from the start, and still am not, but the amount of resources available on the Internet for Python and the way the language worked just made sense to me.  Since those two videos I’ve gotten deep into Python and it’s allowed me to build the script this blog post is about.

 

The Problem

This project had four possible templates that could be applied to a site depending on which types of connectivity were available at each site (DMVPN with broadband, DMVPN with cellular, T1, and a combination of those).  Each site was identified by a unique number, and had a corresponding IP scheme.  For example site 123 might have an IP address scheme 10.1.23.0/24, which was then broken down into smaller subnets.  Another difference was the number of hosts each site had that were controlled by access lists.  One site may have 2 servers that needed to be controlled by an ACL, while another may have 3 or 4, each with different IP addresses. Since each site was unique, it wasn’t as easy as using our normal change deployment tool to push out a single change everywhere.

 

The Script

Without diving into every line of the script I’ll give you some of the pseudocode and key functions/modules. If you want to know more about how any one piece works feel free to contact me.

Input

Prior to this project starting we already had an Excel file that contained every site, with subnets and IP addressing broken out across a number of different columns, almost 30 in total. In the past this was used mostly for reference and documentation, but it would serve as the key input for my script.  The Excel file had a header row of variable names in brackets like [Loopback0] or [DataVlanGW]. Every row below that is associated with a single site, identified by a unique site ID in one of the columns.  I added a new column to this spreadsheet called “TemplateID”. This column had a possible value of 1 – 4, each number corresponding to a certain configuration template.  Other columns are things like site ID, loopback address, gateways, subnet masks, etc.  Filling this spreadsheet out initially was a manual process, but it’s important to remember that it’s a one time effort.  Once you have this it becomes very powerful for future applications.  I exported this from Excel to a CSV and then copied it to our Linux server where I’d be building the Python script. Here’s a snippet of what the Excel headers look like:

excel
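Since the header row already uses the bracketed variable names, one possible shortcut is Python’s csv.DictReader, which hands back each row as a dictionary keyed by the header text. This is not what the script in this post does (it indexes columns by position); it’s just a minimal sketch, and the column names and values below are made up for illustration.

```python
import csv
import io

# Hypothetical two-line CSV standing in for the exported spreadsheet.
sample_csv = io.StringIO(
    "TemplateID,Model,[Hostname],[Loopback0]\n"
    "1,2811,123,10.1.23.1\n"
)

# DictReader uses the first row as the keys for every following row.
rows = list(csv.DictReader(sample_csv))
first_site = rows[0]

print(first_site["TemplateID"])   # 1
print(first_site["[Loopback0]"])  # 10.1.23.1
```

The nice side effect is that each row is already a ready-made “variable name to value” lookup for the template step.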

Templates

I had four possible Templates I’d be using.  I built the entire router configuration in Notepad++ and replaced any piece of the config that would differ between stores with a variable name in brackets. In the example below I’m using the variable [Loopback0], which is also a column in the spreadsheet:

interface Loopback0
 description Network Management
 ip address [Loopback0] 255.255.255.255
 no shut
 exit

So, four of these templates, each slightly different and then saved as Template#.conf in a directory.

templates

You’ll notice there are more than four templates here, one ‘a’ template, and one ‘b’ template.  This was because we had two different router models(2800 and 2900), each of which had slightly different interfaces and IPS configuration.  In my script I check the model of the router and choose the appropriate template.
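As a rough sketch of that model check, the template choice can be written as a small lookup instead of nested if-thens. It assumes every template ID has both an ‘a’ (2811) and a ‘b’ (2911) variant, which the post only shows for Template 1; the function name and the lookup table are hypothetical.

```python
# Hypothetical lookup: router model -> template suffix.
# The post only shows 2811 ('a') and 2911 ('b') for Template 1.
MODEL_SUFFIX = {"2811": "a", "2911": "b"}
VALID_TEMPLATE_IDS = {"1", "2", "3", "4"}


def template_path(template_id, model, template_dir="Templates"):
    """Return the template file name for a TemplateID and router model."""
    if template_id not in VALID_TEMPLATE_IDS:
        raise ValueError("unknown TemplateID: %r" % template_id)
    if model not in MODEL_SUFFIX:
        raise ValueError("no template for router model: %r" % model)
    return "%s/Template%s%s.conf" % (template_dir, template_id, MODEL_SUFFIX[model])


print(template_path("1", "2811"))  # Templates/Template1a.conf
print(template_path("1", "2911"))  # Templates/Template1b.conf
```

Raising an error for an unknown model means a typo in the spreadsheet stops the run instead of silently reusing the previous site’s template.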

Processing the input

I use Python’s built-in csv.reader to read through every row of the CSV file. Each row comes back as a list of column values (the columns are separated by commas in a CSV), and I store the ones I need in a dictionary, which is Python’s term for a set of key/value pairs. The keys are the bracketed variable names from the templates, so for example the [Hostname] key holds that row’s hostname, the [Loopback0] key holds its loopback address, and so on for every column the templates use.  I then use ‘if – then’ logic to check the value of the first column, the TemplateID.  If it is 1 then I load my Template1 file, if it is 2 then I load Template2, etc.  Once the template file is loaded into a variable I run this one command, which basically runs a ‘find/replace’ on every variable surrounded by brackets in the template and replaces it with the value currently in the dictionary for that row.

output = replace_words(tempstr, values)

I found the above function ‘replace_words’ after a little Googling.  ‘tempstr’ in the above function is the base template file I loaded in, and ‘values’ is the name of the dictionary I stored every column for this site ID in the CSV.  I then write this to a new file with the unique file name of the site id.  I know the site ID because it was a column in the CSV that I processed, so it’s trivial to save a file called site-###.txt, pulling ### from the array. Because this entire thing is in a ‘for’ loop, it will repeat this logic for every row in the CSV file until it reaches the end.The best part of this is it processes all 450 rows of this file in about 20 seconds.  450 consistent, full, unique Cisco router configs in 20 seconds.  I’m not automatically copying them or applying them to every device, but I have the config files on hand at this point.  I’ll talk about how I automated pushing them out to each site in the next blog post.
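The post doesn’t include the body of ‘replace_words’, so here is one way such a function could look. Treat it as a stand-in under that assumption, not the function that was actually used; the only things taken from the post are the name, the two arguments, and the idea of swapping each bracketed key for its value.

```python
import re


def replace_words(text, values):
    """Replace every key of `values` that appears in `text` with its value."""
    if not values:
        return text
    # re.escape makes the brackets in '[Loopback0]' match literally.
    # Longer keys go first so a short key can never pre-empt a longer one.
    keys = sorted(values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: values[match.group(0)], text)


template = "interface Loopback0\n ip address [Loopback0] 255.255.255.255\n"
print(replace_words(template, {"[Loopback0]": "10.1.23.1"}))
```

Doing all the substitutions in a single pass also means a replaced value is never scanned again, so a value that happens to contain brackets can’t trigger a second replacement.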

Here’s a snippet of the code accomplishing this first part. This only shows one set of the ‘if-then’ logic for Template #1, but there are 3 other if-thens in the real script to accommodate the other templates:

import csv
from sys import argv

# replace_words() is the find/replace helper described above
script, inputcsvfile = argv

with open(inputcsvfile, newline="") as infile:
    reader = csv.reader(infile)
    next(reader, None)  # Skip the header line of the CSV
    for row in reader:
        # Map each bracketed template variable to this row's column value
        values = {
            '[Hostname]': row[2], '[Loopback0]': row[5],
            '[DataVlanNet]': row[6], '[DataVlanGW]': row[7],
            '[Data2VlanNet]': row[8], '[Data2VlanGW]': row[9],
            '[WifiVlanNet]': row[10], '[WifiVlanGW]': row[11],
            '[IPSVlanNet]': row[16], '[IPSVlanGW]': row[17],
            '[IPSVlanIP]': row[18], '[SwitchIP]': row[19],
            '[SerialIP]': row[20], '[BGPNeigh]': row[21],
            '[SerialDesc]': row[22],
        }  # truncated
        outputfile = "outputs/site-" + row[2] + "-NEWCONFIG.txt"
        storecopycommandsfile = "StoreCopyCommands/site-" + row[2] + "-r1.txt"

        # Check what the template ID is set to, and open the appropriate template as 't'
        if row[0] == "1":
            if row[1] == "2811":
                t = open("Templates/Template1a.conf", "r")
            elif row[1] == "2911":
                t = open("Templates/Template1b.conf", "r")

        # Store the template into a temp string
        tempstr = t.read()
        t.close()

        # Rip through the template and do a find/replace for the current store number
        output = replace_words(tempstr, values)

        # Write out the new config file
        fout = open(outputfile, "w")
        fout.write(output)
        fout.close()

But wait! There’s more!

file7

I mentioned one of the challenges of generating these configs was that each one had an access list with varying numbers of hosts that needed to be included.  Since my CSV file didn’t have columns for every host in an ACL I had to turn to an alternative method.  For this I relied on three things:

  • The current running configurations for each site
  • Regular Expressions
  • More Python

Current configs

Each site already contained existing ACL entries for each host. Since these were already in production, they were known to be working. Here’s an example of something I might be working with:

permit tcp host 10.10.10.10 10.1.1.0 0.0.0.255 eq 9100
permit tcp host 10.10.10.11 10.1.1.0 0.0.0.255 eq 9100

In this case the host that is unique per site is the source host (10.10.10.10 or 10.10.10.11 in this example), and the destination network and port stay consistent from site to site. This example shows two source hosts; other sites may have fewer or more. It doesn’t matter, as you’ll see below.

Regular Expressions (Regex)

If you aren’t familiar with regular expressions I’d strongly recommend you research them and start to practice.  Regular expressions allow you to search for any number of patterns in text and not only match on them, but also store them for further processing.  For learning regular expressions, there is an excellent site I found called www.pythex.org.  This lets you enter in the text you want to  search in one box, and the regular expression you are testing in another, and will highlight what matches and what doesn’t.  It’s an excellent tool for when you are starting out, or when you just need to troubleshoot why something isn’t matching correctly.

In the example ACL above I need to identify the unique source hosts from each line and then do something with them.  The regular expression I used to match this is:

permit tcp host (\d+\.\d+\.\d+\.\d+) 10\.1\.1\.0 0\.0\.0\.255 eq 9100

If you’ve never looked at a regular expression before it can be intimidating, but if you break it down piece by piece it isn’t too bad. Here is what it is searching for:

  • Search for the text ‘permit tcp host ’
  • Now look for a digit (\d) that repeats one or more times (+) followed by a literal ‘.’ (written as \. because an unescaped . matches any character). Do that three times, then match one final run of digits. This should look like an IP address to you.
  • I want to put parentheses around this entire IP address to save it to a variable for further use
  • The IP address should be followed by the text ‘10.1.1.0 0.0.0.255 eq 9100’, again with each ‘.’ escaped

The regex I used isn’t perfect, but it isn’t necessarily wrong either.  There are a number of different ways you could write it, some much more accurate than others. For example, I am just checking that there is one or more repeating digits before the ‘.’ but I don’t check that they are valid for use in an IP address, so it would match things like 999.999.999.999.  For my specific use case it doesn’t matter because I’m relying on the current running config(which Cisco already validated when it was entered into the router) as a valid IP address. Here is what this looks like on the Pythex site:

Screen Shot 2014-03-31 at 9.15.00 PM
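If you’d rather poke at this from a Python prompt than a website, here’s a small sketch. It uses a version of the pattern with the dots escaped so they only match a literal ‘.’, and it shows the optional extra step of validating the captured text with the standard ipaddress module. The function name and the ‘strict’ flag are made up for this example.

```python
import ipaddress
import re

ACL_HOST = re.compile(
    r"permit tcp host (\d+\.\d+\.\d+\.\d+) 10\.1\.1\.0 0\.0\.0\.255 eq 9100"
)


def source_host(line, strict=False):
    """Return the captured source host from an ACL line, or None."""
    match = ACL_HOST.search(line)
    if not match:
        return None
    host = match.group(1)
    if strict:
        # Optional: reject text that looks like an IP but isn't a valid one.
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            return None
    return host


good = "permit tcp host 10.10.10.10 10.1.1.0 0.0.0.255 eq 9100"
odd = "permit tcp host 999.999.999.999 10.1.1.0 0.0.0.255 eq 9100"
print(source_host(good))              # 10.10.10.10
print(source_host(odd))               # 999.999.999.999
print(source_host(odd, strict=True))  # None
```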

Python

Now that I searched for the occurrence of the source host, I need to tie it all together with Python. Here’s the pseudo code for this piece:

  • Read every line of the current running config, searching for the regular expression from above
  • If you find a match, store the matching piece(The IP address in () ) into a list in Python, until you get to the end of the file
  • Open the new router config we generated in the previous “Processing the input” section
  • Loop through this file and search for the string “!Inserted ACL”. This was text I included in each template to serve as a marker for the place I want to insert these site specific entries (the code snippet below uses the marker “!InsertedObjectGroups”). Since it has an ‘!’ at the beginning, it doesn’t interfere with the Cisco config, but still allows me to search for it.
  • If you find the string “!Inserted ACL” in the file, then replace it with the site specific access list entries we just found using the regular expressions
  • Repeat this for each match we had for the regular expression
  • Save the file

Here’s a snippet of the actual code:

import fileinput
import re

# myfilename and outputfile are set earlier in the script
existingconfig = open(myfilename)
serverips = []
for line in existingconfig:
    # Match the regex against each line of the running config
    servermatch = re.search(r'permit tcp host (\d+\.\d+\.\d+\.\d+) 10\.1\.1\.0 0\.0\.0\.255 eq 9100', line)
    if servermatch:
        serverips.append(servermatch.group(1))  # If we get a match add it to the list called serverips
existingconfig.close()

serverips = list(dict.fromkeys(serverips))  # Keep only the unique values, in the order they were found

checkforinsert = False  # initialize this variable
# With inplace=True, anything we print is written back into the file
for line in fileinput.input(outputfile, inplace=True):  # loop through our initial saved config file
    if line.startswith('!InsertedObjectGroups'):  # Search each line for the string "!InsertedObjectGroups"
        checkforinsert = True  # If we find the string, remember it for the next line
    elif checkforinsert:
        # The previous line was the marker, so print out the new object group using the matches from our regex
        print("\nobject-group network Server_Ips")
        print("description Server IPs")
        for num in serverips:  # loop through the Python list for each server IP found
            print(" host " + num)
        checkforinsert = False  # reset so the object group is only inserted once
    print(line, end="")  # continue to print the lines of text

Disclaimer: Python is very strict about indentation, so if you copy this snippet make sure the indents come across intact. This is what we end up with after it writes the config:

!InsertedObjectGroups

object-group network Server_Ips
description Server IPs
 host 10.10.10.10
 host 10.10.10.11

Since the entire thing is in a for loop it doesn’t matter if the site had 1 host or 1,000 hosts. It will just read through the entire file, saving the matches it finds, and then spit them back out into the new config.  Again, this entire piece combined with the previous section only took about 20 seconds to run for 450 sites.  If you had to read through every ACL at the time of a cutover to search for unique hosts per site you would almost be guaranteed to miss one here and there, or type it incorrectly.  This greatly reduces that margin of error, and saves you huge amounts of headache.

Testing

Make sure you test everything you are writing.  Run your script, check the output, apply it to a lab router and see where it errors out. Run it against multiple test cases to make sure you’ve accounted for any anomalies that may come up.  Don’t try to write the entire thing in one shot. The worst thing you could do is try to conquer a large problem with your first script and have it blow up, creating more damage than if you had just configured your devices manually.  Small steps are good and as you become more confident it will become much easier. The idea is to make your life easier, not to get yourself fired because you took on a problem too large for your current skill set.

Wrapping it up

While this post didn’t cover every detail of coding in Python, I hope it gave you enough of a taste to see what is possible once you start to get into it.  I think it’s important to note that I didn’t jump into this as my first script. A number of the pieces in this script came from other previous scripts that were much simpler.  Once you learn how to do each small piece, it becomes easier to combine them all together and really build something that can give you results.  One of the biggest hang ups I had when starting out was I wanted everything to be ‘perfect code’, the most efficient, streamlined code ever written.  It doesn’t work like that though, it’s a process.  What I wrote can definitely be cleaned up, optimized, and just generally improved, but at the end of the day it solved the problem that I needed to solve and I consider it a success. I took a task that previously would have taken 40 hours + and reduced it to 20 seconds.  Going forward I can review the code I wrote, research some more Python and optimize it, but for right now it got the job done and I’m happy.

 

NAT logging

I recently had a cutover where we copied a basic NAT configuration from one router to a new one.  The configuration was very straightforward, similar to the below:

access-list 3 permit any log

ip nat inside source list 3 interface Loopback2 overload

During the cutover I wasn’t seeing any NAT translations building. I reviewed the config and it seemed straightforward: inside interface defined, outside interface defined, an access list, and an interface to overload.  After staring at it for a while I noticed that the ACL had the ‘log’ keyword on the permit statement.  Something told me this might be causing the issue. I removed the ‘log’ keyword from the ACL and the translations built immediately.

So what happened – what’s the big deal about logging the matching hits for the NAT ACL? When you put the ‘log’ keyword on an ACL it makes the router process switch that traffic. When you process switch the traffic, NAT does not handle the traffic.  This is well documented in this Cisco FAQ: http://www.cisco.com/c/en/us/support/docs/ip/network-address-translation-nat/26704-nat-faq-00.html

I’m still not sure how this config worked on the old router. I still need to look at the entire config further, as there were some other differences (IOS version, IPS enabled, and some others), but it’s clear that you definitely shouldn’t use the log statement on your NAT ACLs.
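A quick way to catch this before a cutover is to scan the config text for it. The sketch below is only an illustration: it assumes numbered ACLs written as single ‘access-list N …’ lines like the example above, it does not handle named ACLs, and the function name is hypothetical.

```python
import re

# Captures the ACL number/name referenced by an inside source NAT rule.
NAT_RULE = re.compile(r"^\s*ip nat inside source list (\S+) ", re.MULTILINE)


def nat_acls_with_log(config_text):
    """Return the NAT-referenced ACL numbers that have an entry ending in 'log'."""
    nat_acls = set(NAT_RULE.findall(config_text))
    flagged = set()
    for line in config_text.splitlines():
        words = line.split()
        # Numbered ACL entry, e.g. 'access-list 3 permit any log'
        if len(words) >= 3 and words[0] == "access-list" and words[1] in nat_acls:
            if words[-1] in ("log", "log-input"):
                flagged.add(words[1])
    return sorted(flagged)


sample = """\
access-list 3 permit any log
access-list 4 permit 10.0.0.0 0.255.255.255
ip nat inside source list 3 interface Loopback2 overload
"""
print(nat_acls_with_log(sample))  # ['3']
```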

Excel IP Functions

I was working on a project the other day where I had a list of ~500 /30 subnets in Excel that I needed to break out so each host in the /30 was in its own column.  Seemed pretty straightforward, but there was no built-in function to handle this.  With all of the other math functions Excel has built in, you would think that in 2014 some functions that deal with IP addresses would be standard.  Guess not.  Rather than spend the time to write one from scratch, a quick Google search came up with an entire set of functions written by Rajeev Bhardwaj. He put together an excellent presentation that shows the usage of each of the functions, located here: http://www.slideshare.net/rajivss/ip-functions-presentationgen-v11 .  This not only helped me solve my current problem, but all of the other tools included in the Add-In will definitely get used in the future.

Here’s a quick summary of the included tools:

  • IP_Bits2Mask – Converts the number of bits from a / notation to an expanded subnet mask
  • IP_ErrChk – Checks to see if a value is a valid IP address or not
  • IP_Hosts – Calculates the number of hosts from a specific subnet mask
  • IP_IP2Mbits – Calculates the subnet mask required to obtain a certain number of host addresses
  • IP_Mod – Takes any octet of the IP address and increments/decrements it by the set value
  • IP_NextSub – Calculates the next subnet
  • IP_Bcast – Calculates the broadcast address for a given IP
  • IP_Count – Counts the occurrence of an IP in a range of subnets
  • IP_IsExist – Check if an IP address exists in a subnet
  • IP_Mask2Bits – Calculates the mask bits from a subnet mask in dotted notation
  • IP_Subnet – Calculates the subnet address for a given host IP
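To make the calculations behind a few of these concrete, here’s a rough sketch of similar math using Python’s standard ipaddress module. This is not the add-in’s code, and it hasn’t been compared against the add-in’s behavior; the function names are made up for the example.

```python
import ipaddress


def bits_to_mask(bits):
    """Prefix length to dotted mask, e.g. 30 -> 255.255.255.252."""
    return str(ipaddress.ip_network("0.0.0.0/%d" % bits).netmask)


def mask_to_bits(mask):
    """Dotted mask to prefix length, e.g. 255.255.255.252 -> 30."""
    return ipaddress.ip_network("0.0.0.0/%s" % mask).prefixlen


def subnet_of(host, bits):
    """Subnet address that a host IP belongs to."""
    return str(ipaddress.ip_interface("%s/%d" % (host, bits)).network.network_address)


def broadcast_of(host, bits):
    """Broadcast address of the subnet a host IP belongs to."""
    return str(ipaddress.ip_interface("%s/%d" % (host, bits)).network.broadcast_address)


def usable_hosts(subnet):
    """Usable host addresses in a subnet, e.g. the two hosts of a /30."""
    return [str(host) for host in ipaddress.ip_network(subnet).hosts()]


print(bits_to_mask(30))                # 255.255.255.252
print(subnet_of("10.1.23.6", 30))      # 10.1.23.4
print(broadcast_of("10.1.23.6", 30))   # 10.1.23.7
print(usable_hosts("10.1.23.4/30"))    # ['10.1.23.5', '10.1.23.6']
```

The last call is the /30 breakout from the start of this post: one subnet in, two host addresses out.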

I would recommend you visit Rajeev’s site and download the Excel add-in to be able to take advantage of all these functions.  You can visit his site to download the tools here: http://rajivbhardwaj.com/download/

 

Determining Power and Cooling requirements for the Cisco Catalyst (Part 2)

More Power

In the last post I went over determining how much power your Cisco Switch is going to need to run, options for power supply redundancy, and what information your electrician will need to make all of this happen.  This post will expand on that a little more and go over the types of outlets available, building in some resiliency to your electrical design and determining your cooling needs.

Outlets

The type of outlet you choose will be decided by the number of volts required, country you are in, and available cords/plugs from the manufacturer.  I can’t cover every possible receptacle across the world so I’m going to focus on common receptacles in the US when installing network equipment.

NEMA

NEMA (National Electrical Manufacturers Association) is a group that develops standards for electrical manufacturers.  A small piece of what they cover includes the plugs and receptacles used throughout North America.  Plugs and receptacles can be sorted into two broad categories : straight-blade and locking.  Straight blade plugs/receptacles are seen on the majority of general use electrical devices while locking plugs/receptacles are seen more in commercial settings where the locking plug helps prevent accidental disconnection.  For single RU or stackable switches it is common to see a straight-blade plug.  In the chassis based switches and in the datacenter I prefer the locking style plugs and receptacles.  You could use a straight-blade plug or twist-lock plug on either type of switch, a lot of it comes down to preference and what is more common/available. The naming convention for NEMA plugs and receptacles is pretty straight forward. Here’s a diagram I made to help.

NEMA Nomenclature
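The same naming convention can be spelled out in a few lines of Python. This sketch assumes the usual layout of a designation such as L6-30R: an optional ‘L’ for locking, a configuration number, a dash, the amp rating, and ‘P’ for plug or ‘R’ for receptacle. The voltage table only covers the two configurations that come up in this post (5 and 6), and the function name is made up.

```python
import re

NEMA = re.compile(r"^(L?)(\d{1,2})-(\d{1,2})([PR])$")

# Only the configurations mentioned in this post; many others exist.
CONFIG_VOLTS = {"5": 125, "6": 250}


def parse_nema(designation):
    """Split a NEMA designation such as 'L6-30R' into its parts."""
    match = NEMA.match(designation.strip().upper())
    if not match:
        raise ValueError("not a recognised NEMA designation: %r" % designation)
    locking, config, amps, kind = match.groups()
    return {
        "locking": locking == "L",
        "configuration": config,
        "volts": CONFIG_VOLTS.get(config),  # None if not in the small table above
        "amps": int(amps),
        "kind": "receptacle" if kind == "R" else "plug",
    }


print(parse_nema("L6-30R"))
print(parse_nema("5-15P"))
```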

The common NEMA outlets seen for networking equipment are in the table below:

CommonOutlets

Note: L5-20R and L6-30R plugs/outlets look similar, but they are not interchangeable.  The designation is normally imprinted on the plug and receptacle.

Levels of Redundancy

Designing your switch with redundant power supplies is only one piece of the bigger redundancy picture.  If you don’t fully think through how power is being supplied to your switch you might run into some unexpected outages down the line.  How redundant you go depends a lot on your requirements.  If this is a core switch in a mission critical environment you would probably want to look into a generator, UPS and multiple electrical feeds.  If this is an access-layer switch servicing end users, you may want a UPS, redundant power feeds, or both.  If you don’t think you need any redundancy, you might not even include a UPS (this should be the rare case, always get a UPS when possible).  The more redundancy you add in, the more money you can expect to spend.  Let’s look at a few examples, going from least redundant to most redundant.

Utility Power (Single circuit, Single Power Supply)

power_red1

In this example we have a single power supply in the switch, being serviced by a single outlet.  If we lose power from the outlet or the power supply dies, there will be an outage.

Utility Power (Single Circuit, Dual Power Supplies)

power_red2

In this example we have dual power supplies, each being fed from a separate outlet.  In this case we can tolerate a single power supply failure and still be up, however we have a single point of failure at the power panel since both of these outlets are fed from the same panel and same circuit.  If the power panel has an issue(the single breaker trips for example), the entire switch would be down, regardless of the redundant power supplies.

Utility Power (Dual Circuits, Dual Power Supplies)

power_red3

This shows dual power supplies, each being fed from a separate outlet, and each outlet is fed with a dedicated circuit.  Here we can tolerate a single power supply failure, as well as one of the two dedicated circuit breakers tripping.  If you need redundancy but aren’t getting a UPS for some reason(again, you should be getting a UPS), this would be the design I would want to go with.

UPS (Dual Power Supplies)

power_red4

This starts to provide us some more resiliency. We still have redundant power supplies in the event that one dies.  We’re no longer relying on just the utility power to keep the switch up.  We add in a UPS which will give us additional runtime should the utility power go down for a short time.  It’s important to note that the UPS generally isn’t meant to provide long term backup power.  Usually it is designed to handle short term failures, and while you can add additional batteries it starts to become costly.  The UPS does end up being a single point of failure, bringing us to the next scenario.

Dual UPS (Dual Power Supplies)

power_red5

This provides the same redundancy as the previous example, but adds in an additional UPS, eliminating the UPS as the single point of failure.

Generator, UPS, Dual Power Supplies

power_red6

This is the last example I’ll go into, providing the most redundancy so far.  In this case you have Utility power and a generator feeding an Auto Transfer Switch (ATS).  The ATS will switch to generator power if it detects the loss of utility power.  This feeds a UPS which provides cleaner power to the end networking or server equipment.  The UPS also allows you to keep the network switch up in the time it takes the generator to start up and provide electricity.  You would see this setup in the datacenter or environments that can’t tolerate any downtime.

And so on….

There are probably an endless number of redundant power configurations I could go into.  This shows the more common ones that I’ve run across, but is definitely not every one out there (you could have two separate utilities feed power, multiple generators, etc).  The main point I hope this gets across is that just having redundant power supplies in your piece of equipment doesn’t mean much unless you consider all of the components that come before electricity makes it to your switch.

 

 

Determining Power and Cooling requirements for the Cisco Catalyst (Part 1)

Speaking with a coworker the other day it dawned on me that there are a number of areas of networking that just aren’t covered as part of the ‘normal’ course of learning.  Things like IGPs,EGPs, spanning tree, FHRPs, and access lists are covered in depth and can be found everywhere from books to blogs to week long training classes.  Then there’s ‘the other stuff’ – Best practices for cabling, UPS design, monitoring, scripting, and power and cooling, just to name a few. The stuff that isn’t documented as well but is still an absolute necessity for most people at some point in their network engineering career.  In the next few blog posts I’m going to run through the basic design process for the power and cooling of your switches.

Power Basics

Before we can jump into determining the power requirements for your switch we need to gain a basic understanding of how power is calculated.  The formula to calculate power is P = VI, or spelled out: Power = Voltage x Current. Power is measured in watts, usually abbreviated by W, Voltage measured in volts abbreviated by V, and Current is measured in Amps, abbreviated by A.
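As a quick worked example of P = VI, here it is as two tiny Python helpers. The numbers are only illustrative, and this is the simple form of the formula: it ignores power factor and power supply efficiency, which matter for real AC loads.

```python
def power_watts(volts, amps):
    """P = V x I"""
    return volts * amps


def current_amps(watts, volts):
    """I = P / V, the same formula rearranged."""
    return watts / volts


# A 120 V circuit carrying 15 A delivers 1800 W.
print(power_watts(120, 15))     # 1800

# The same 1800 W at 240 V only needs half the current.
print(current_amps(1800, 240))  # 7.5
```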

A common analogy that is often used to explain each of the terms is that of a garden hose or plumbing system.  In this analogy the Voltage is equivalent to the pressure in the system, the Current is equivalent to the rate of flow, and the Wattage is how much work the water coming out of the hose or pipe can do, which depends on both the pressure and the flow.  Let’s say you need to fill up a swimming pool using the hose.  We can turn the water on using the normal faucet and it will fill slowly (low pressure), or if we used some type of external pump we could fill the pool quickly (high pressure).  Similarly, if the network switch we are working with needs more watts (maybe you have PoE line cards with attached phones you need to power), we can increase the voltage, which will deliver more watts at the same current.  For more details on the basic power formula or the water analogy check out this site.

Calculating Power Requirements

At some point in buying the new switch you’ll need to provide the power requirements to someone like an Electrician to make sure you have enough power to run the switch.  Every linecard, supervisor, and PoE device you put into your switch will have some power draw and you need to make sure you have enough power available.  If you’re working with a switch in the Cisco Catalyst(and Nexus 7K) line, they have a very useful tool called the Cisco Power Calculator (Free,CCO Login Required) that lets you build your switch with specific linecards and will output the power and cooling requirements.  Let’s take a look.

After you launch the tool you select the product family from the dropdown.

CPC - Switch Selection

After selecting the product family, you start to populate the switch with the specific chassis model, supervisor, redundant supervisor if applicable, and input voltage.  For input voltage, we have four choices. The first two choices, -48 Volts DC and -60 Volts DC  are used if you’ll be connecting this directly to a battery as the source.  DC voltage  is popular for phone system/telecom equipment as it doesn’t introduce noise (audible hum) into the line like AC would, and provides assurance against power outages.  Bonus fact: The DC voltage is negative to reduce corrosion. The second two options in the dropdown are 100 – 120 Volts and 200 – 240 Volts.  Every deployment of a Cisco switch I’ve done so far has used AC power to supply the switch, either straight from a wall outlet or outputted from a UPS, and the example I’ll run through will use AC voltage.  How do you know which AC voltage you should pick?  There are a couple factors:

  • Country you are deploying the switch in. Some countries have a standard voltage of 100-120 Volts and some countries have a standard of 200-240 Volts.  Find out what voltage is standard in the country the switch will be installed and choose accordingly.  Note that these are the standard voltages, but there may be other options (for example, while 110 Volts is common in houses in the US, it is not uncommon to have 220V circuits installed in offices, datacenters, or other locations that may need additional power). Work with your electrician.
  • Existing electrical wiring or new electrical wiring.  If you are installing this switch and need to utilize the existing outlets or UPS, you can work with the electrician or building management to determine what voltage you have available to you.  If it is a brand new install and you have the ability to tell the electrician what to install, then you can run through the Cisco Power calculator using both voltages (if available in your country) and see which works better for you.

For this example I’m going to select 200-240 Volts, as that’s what I’ve typically worked with for the bigger chassis based switches with multiple linecards.  I also selected redundant Sup720s.

On the next page you can select which linecard will go in each slot.  If you select a PoE linecard it will ask you to specify the number and type of PoE devices attached to that linecard.  It’s important to keep in mind what your future growth may look like.  For example, if you only will have two PoE linecards today, each partially filled with some PoE devices, you may be able to get away with a small power supply and a lower voltage circuit from your electrician today.  If you know that you will be expanding sometime in the near future, it’s much easier to get the bigger power supply and have your electrician wire the electrical circuit appropriately now than to revisit this in 6 months or a year, when the electrician needs to install all new circuits and you need to buy bigger power supplies.  Each situation will be different so there is no one size fits all; use your judgement.

I’m going to populate this chassis with 4 PoE linecards and tell the wizard that I’ll be connecting 96 Cisco 7941-G-GE phones. The calculator knows the power draw for every PoE device(phones, access points, etc) in the list so you just need to know how many of each you have now (or will have in the future) and give it some good estimate.

Phones

Now we’re ready to look at the results. There’s a lot going on in this page but it’s not that bad if you break it down. The top sections show you how you’ve configured your switch and the MINIMUM power supply and number of power inputs you could get away with to adequately power this switch and connected devices.  It’s important to note the number of inputs. Most of the bigger power supplies may offer the option for multiple inputs.  This is good in that it allows you to supply more power to the switch, but you’ll need to keep in mind that for every additional input into the power supply it means an additional circuit/outlet that your electrician will need to install if this is a new install, or that you would need to already have available if it will be utilizing existing outlets.  The “Percentage Used” section with colored meter will show you what percent of the listed power supply you will be using from the start.  You typically would want to include room for growth.  In my example, if we chose the minimum power 8700 Watt power supply with a single 220 Volt input, we would already be at 96% utilization.  If I’m asked to install an additional line card a few months from now, we will need additional power, most likely in the form of an additional electrical circuit/input.

CPC - Results

Scrolling further down the page you’ll see the calculator gives you your other power supply options.  In general I recommend going for something with the green color bar(<80% utilization), as this will allow you room for future growth.  The power supply you pick will depend on what you can afford, if your company has a standard power supply they use, and how many inputs (circuits/outlets) you have available in the location you will be installing the switch. One other important thing to note is the power supply mode, which I’ll cover in the next section.

CPC - Results 2
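The green-bar rule of thumb is simple enough to sanity check by hand or in a couple of lines of Python. In this sketch the 8352 W load is back-calculated from the 96% figure in my example (0.96 x 8700 W), not read from the calculator output, and the 10000 W supply is just a made-up larger option.

```python
def percent_used(load_watts, supply_watts):
    """Share of the power supply's capacity that the load consumes."""
    return 100.0 * load_watts / supply_watts


def has_headroom(load_watts, supply_watts, limit_percent=80.0):
    """True if utilization is under the limit (80% is the green-bar rule above)."""
    return percent_used(load_watts, supply_watts) < limit_percent


load = 8352  # watts, illustrative only
print(percent_used(load, 8700))   # 96.0
print(has_headroom(load, 8700))   # False
print(has_headroom(load, 10000))  # False (83.52% is still over 80%)
```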

Power Redundancy Modes

There are three power modes you can configure on most Catalyst chassis based switches:

  • Single
  • Redundant
  • Combined

Single

In this configuration you only have one power supply.  There is no redundancy and you only have the power available from this one supply. I haven’t seen many of these deployed as people usually want redundancy, but if cost is an issue this would be the bare minimum supply you could use.

Redundant

In this configuration you will have two of the same wattage power supply installed.  The total power drawn will never exceed the capability of one of the power supplies.  This ensures that if a power supply were to ever fail you still have enough power remaining for the entire switch.  The failover is designed to be immediate, with no downtime. The way the switch draws power from the supplies will depend on the model of the switch. As an example, the 4500 series will draw from the primary power supply only, and only use the secondary in the event of a failure.  The 6500 series on the other hand will try to balance the power draw across both supplies, but still without ever exceeding the capacity of a single supply.

Combined

With this configuration you are getting the most available power for the switch, at the cost of no redundancy.  The total available power will be the sum of each of the power supplies.  In the event of a power supply failure, there are two scenarios:

  1. The total power consumed never exceeded the power supplied by a single power supply, so you will still have enough to run the switch.  For example, if you have 2 x 6000 W power supplies (12000 W total in combined mode) but your power usage was only 4000W, one power supply could fail and you would still have enough (6000W).
  2. The total power consumed exceeded the power supplied by a single power supply. In this case the switch will need to power down linecards (usually starting from the bottom to the top, with the supervisors last) until it has enough power.  For example, if you have 2 x 6000 W power supplies (12000 W total in combined mode) but your usage was 10,000 W, one power supply fails and you now only have 6000 W available.  Since 10,000 W > 6000 W available, the switch can’t run as it was previously.  For the 6500, the switch will start shutting off power to PoE devices in descending order, starting with the highest port on the highest numbered module.  If it still doesn’t have enough power it will start powering off the line cards themselves, starting with the highest numbered.  If you run a ‘show power’ command when the switch is in this state you would see power-deny listed in the Operational column for the line cards that were shut down.
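The two scenarios above boil down to a comparison you can sketch in Python. This follows the simplified model used in this post, where combined mode is the plain sum of the supplies; the usable combined capacity on a given platform may be lower than that, so check the documentation for your switch. The function names are made up for the example.

```python
def available_watts(supply_watts, mode):
    """Power available to the switch in each mode (simplified model)."""
    if mode == "single":
        return supply_watts[0]
    if mode == "redundant":
        return min(supply_watts)  # never draw more than one supply can deliver
    if mode == "combined":
        return sum(supply_watts)
    raise ValueError("unknown mode: %r" % mode)


def survives_one_failure(load_watts, supply_watts):
    """True if the load still fits after the largest supply fails."""
    if len(supply_watts) < 2:
        return False
    remaining = sum(supply_watts) - max(supply_watts)
    return load_watts <= remaining


supplies = [6000, 6000]
print(available_watts(supplies, "combined"))  # 12000
print(survives_one_failure(4000, supplies))   # True  (scenario 1)
print(survives_one_failure(10000, supplies))  # False (scenario 2)
```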

Working with your Electrician

Now that you have run the power calculator and determined which power supplies you need and how many inputs each one requires, you can work with your Electrician to have the correct outlets installed or verify that existing outlets are adequate.  There are four things your Electrician will need to know:

  1. Number of outlets
  2. Voltage for each outlet
  3. Max Amps for each outlet
  4. Outlet type

Number of outlets

Typically there will be one outlet per input of the power supply.  If you have two power supplies, each with 2 inputs, you’ll need a total of four outlets installed.

Voltage for each outlet

The voltage you need will come from the output of the Power Calculator.  If you ran the calculator using 100-120 V then you would tell the Electrician you need 120V outlets; if you ran the calculator using 220 V, then you need 220V outlets.  Likewise, if you are connecting this to a UPS, make sure the UPS provides the correct output voltage.

Amps per outlet

The Amps drawn through each outlet will depend on the power supply you select.  This can usually be found in the spec sheet for the power supply on Cisco’s website under the section “Input Current (per input)”.  See below for an example of a 6000W power supply that has 16A (max) per input.  This is important for the Electrician so they can size the circuit connected to each outlet appropriately.

Input Current

Outlet Type

There are a number of different outlet types available.  The country you are installing in can affect the outlet type as can the requirement for a locking outlet vs non-locking outlet.  I will write another post on the different options for outlets but the most important thing is to make sure the power cords you order with your switch will plug into the outlets the Electrician has installed.  If the room has a certain type of outlet and you order a different power cord, you’ll be out of luck.  Work with your distributor and the Electrician to make sure they match.

 

To be continued…

At this point we ran through the calculator, determined what the power requirements for our switch are, and the basics that need to be communicated to the Electrician in order to get you up and running.  In the next post I’m going to go into further detail on some of the types of outlets that are common, best practices for power redundancy, determining how much cooling you will need for your switch, and some basic commands you can run to check the status of power on your switch.

Monitoring HSRP Failover with EEM

Background

We’re currently using Solarwinds to monitor our network infrastructure. While it does a decent job at monitoring the basics, there are certain things I feel it could do better right out of the box.  I have this preconceived idea of what the ideal NMS should do: look at your config and then strongly recommend what you should monitor based on what it finds.  Unfortunately I haven’t found that yet.  Solarwinds is great if you want the pretty GUI, NAGIOS is very powerful if you know what you want to monitor and don’t mind a little scripting, but there’s nothing I’ve found (yet) that does it all the way I want.  One of the things I think is important is tracking your HSRP failover events.  In a typical old school (no VSS or other newer technologies) redundant environment you would have two switches running HSRP between them, where one takes over if the other fails.  It’s possible you could go months or years without monitoring HSRP and never have any issues. It’s also possible you could be having constant HSRP events happening that you never even see unless you are specifically looking for them.  These could cause brief interruptions to your user traffic depending on how your timers are tuned, and probably just get reported as general ‘the network is slow/horrible/never works/etc’. To monitor these events without getting too deep into Solarwinds or other monitoring solutions I decided to turn to EEM.

EEM

EEM is a pretty powerful and flexible component of IOS/NX-OS that allows you to track or alert on certain events on Cisco devices.  I won’t go into all of the features that EEM has as it is well documented.  If you are interested in finding out more about EEM I’d recommend looking through Cisco’s site, starting here: https://supportforums.cisco.com/docs/DOC-27996.

HSRP Applet

The applet we are going to write is referred to as a syslog collector script.  It simply monitors the syslog messages that are generated by the router/switch and performs some action when it detects a certain string.  Basic, but powerful. Here’s the script I created:

ip name-server 1.2.3.4

event manager environment _mail_smtp smtp.yourcompany.com

event manager environment _mail_rcpt networkteamdistro@yourcompany.com

event manager session cli username "yourusername"

event manager applet HSRPEvent

event syslog pattern "HSRP-5-STATECHANGE"

action 1.0 info type routername

action 2.0 mail server "$_mail_smtp" to "$_mail_rcpt" from "fromemail@yourcompany.com" subject "HSRP State Change on $_info_routername" body "$_syslog_msg"

Some notes on what the script is doing:

  • ip name-server– Required if you are using a DNS name for your SMTP server.
  • event manager environment – both of these lines are setting variables _mail_smtp and _mail_rcpt, which are used later in the script to send mail
  • event manager session cli username – used to set the username you want to run the script as. You do not need a password. IOS uses the username only for authorization purposes, not actual authentication. It will check the authorization locally or against a AAA server like ACS.
  • event syslog pattern – is telling the applet to search the syslogs for the specified pattern
  • action 1.0 info type routername – just stores the router’s name in a variable.  This is useful so when I get the email I’ll know which switch it’s coming from
  • action 2.0 – Is sending an email using some of the variables from above. The subject has the text “HSRP State Change on <ROUTERNAME>”. The body will contain the actual Syslog text, which will contain the state change.  As an example, the body of the text would look something like this: “3348414: Jan 27 13:51:19.687 EST: %HSRP-5-STATECHANGE: Vlan181 Grp 181 state Standby -> Active”
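
If you end up collecting these emails or syslog lines somewhere, it can help to pull the individual fields out of the message. Here is a minimal Python sketch that does this. The regular expression is based only on the sample message above, so other IOS versions or message formats may need adjustments, and the function name is my own.

```python
import re

# Pattern built from the single sample message shown above (an assumption,
# not an official format definition).
HSRP_RE = re.compile(
    r"%HSRP-5-STATECHANGE: (?P<interface>\S+) Grp (?P<group>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)


def parse_hsrp_event(message):
    """Return the interface, group, old and new state, or None if no match."""
    match = HSRP_RE.search(message)
    if match is None:
        return None
    event = match.groupdict()
    event["group"] = int(event["group"])
    return event


sample = (
    "3348414: Jan 27 13:51:19.687 EST: %HSRP-5-STATECHANGE: "
    "Vlan181 Grp 181 state Standby -> Active"
)
print(parse_hsrp_event(sample))
```

With the fields separated you could, for example, count how many state changes each VLAN has had over a week.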

The Big Picture

This applet is just one small example of what you can do with EEM.  The possibilities are nearly endless.  While this focuses on one specific aspect of the network, I think it’s important to always be looking at your own network and seeing how you can improve, both in the network itself and in monitoring the network.  ‘Ignorance is bliss’ doesn’t really apply to networking.  Problems will eventually catch up to you, usually snowballing into some type of bigger issue and it’s always better to get ahead of them early on.  Monitoring is a big part of that and EEM is a quick way to achieve it.  What type of EEM applets have you found to be useful?

Work smarter, not harder : Excel Concatenate

As a network engineer there are often times when you need to deal with repetitive tasks such as creating configurations for network devices at multiple sites.  Usually these configs are based on some template, with only minor differences such as IP addresses.  If there are only one or two sites it isn’t bad to create these configurations manually, but if it’s 10 sites, 20 sites, or hundreds of sites it becomes much worse.  While there are a number of different tools or scripts you can use to create these configs, one of the simplest that almost everyone has is Microsoft Excel.

The Functions

Concatenate()

The concatenate function in Excel allows you to take a number of different cells and/or strings of text and concatenate them together.  For example:

=Concatenate("This is ","an example")

Will generate the text “This is an example”. Not the best way to use the function.  A better way would be to refer to cells in the function.

For example, this function:

produces this output, pulling the values from cells ‘A1’ and ‘B1’ and combining them with text

Char()

The Char() function in Excel returns a character based on its ASCII value.  You can pass any ASCII value to the function and it will output the corresponding character.  The main one we are interested in is the “New Line” character, which we can generate with ‘CHAR(10)’.  When we use this in combination with Concatenate() we can put text on different lines.

Here’s an example:

will produce this output:

char-ex2

Note: In order for Excel to display the text on different lines you need to select the cell and click the Wrap Text button.

Building Network Configlets

Now that you understand the basic function we can apply it to networking.  Let’s say you have a spreadsheet listing the various office subnets in 10 different states.  The network was designed so each subnet is a /27 in size with a mask of 255.255.255.224.

spreadsheet

Each site will need its own BGP configuration with a unique AS number, advertising all of the subnets at that site.  One way to do this would be to do it manually. Another would be to use the concatenate function in Excel.  Let’s say the config you want to create looks like this:

router bgp <as>

network <wired_network> mask 255.255.255.224

network <wireless_network> mask 255.255.255.224

network <voip_network> mask 255.255.255.224

Using the concatenate() and char() functions we can generate a config per site in only a couple of minutes. This function:

concatnetwork-ex1

generates this configlet:

concatnetwork-ex2

The function can now be copied down to the other rows to quickly generate the BGP config for the rest of the sites:

concatnetwork-ex3
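
If you would rather script it, the same idea translates directly. Here is a minimal Python sketch that builds the BGP configlet from the template above. The site rows, AS numbers and subnets are made-up sample values, not the data from my spreadsheet, and the function name is just for illustration.

```python
# Hypothetical sample rows standing in for the spreadsheet columns.
sites = [
    {"as": 65001, "wired": "10.1.0.0", "wireless": "10.1.0.32", "voip": "10.1.0.64"},
    {"as": 65002, "wired": "10.2.0.0", "wireless": "10.2.0.32", "voip": "10.2.0.64"},
]

MASK = "255.255.255.224"  # every subnet is a /27, as described above


def bgp_configlet(site):
    """Join the template lines with newlines, like CONCATENATE with CHAR(10)."""
    lines = [f"router bgp {site['as']}"]
    for key in ("wired", "wireless", "voip"):
        lines.append(f"network {site[key]} mask {MASK}")
    return "\n".join(lines)


for site in sites:
    print(bgp_configlet(site))
    print()
```

Each dictionary plays the role of one spreadsheet row, and the loop at the bottom is the equivalent of copying the function down the column.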

Exporting the Configlets

The last step in the process is to get the configlets from Excel into a text editor or straight into the Cisco CLI.  If you try and copy and paste the cell you used the concatenate function in you will wind up with all of the text between two double quotes.  Since the cell contains the function itself, you can’t double click the cell and copy the text itself.  There are two ways I’ve found to deal with this.

Option 1

One option is to click on the cells you want to copy and paste them into a text editor like Notepad++.  This will give you each configlet surrounded by double quotes.  You can then do something simple like find/replace the double quote with nothing.  See below for a Notepad++ example of this:

notepadplusplus

Option 2

For option 2 you can select the entire column containing the concatenate function outputs and paste it into another column, using the “Paste Special – values only” option.  Once pasted, this will allow you to select and copy / paste the text within the cell (not the cell itself) directly into the CLI without quotes.

paste special
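
As a scripted alternative to the two options above, the quotes can also be removed with a few lines of Python. This is a minimal sketch that assumes the pasted text follows the usual tab-separated layout, where multi-line cells are wrapped in double quotes. I have not verified this against every Excel version, and the function name and sample text are my own.

```python
import csv
import io


def unquote_excel_paste(pasted):
    """Return the cell contents from pasted text with the wrapping quotes removed."""
    # newline="" keeps the line breaks inside quoted cells intact for csv.
    reader = csv.reader(io.StringIO(pasted, newline=""), dialect="excel-tab")
    return [cell for row in reader for cell in row]


# Hypothetical example of what one multi-line cell looks like when pasted.
pasted = '"router bgp 65001\nnetwork 10.1.0.0 mask 255.255.255.224"\r\n'

for configlet in unquote_excel_paste(pasted):
    print(configlet)
```

Using the csv module instead of a blind find/replace means that any double quotes that are actually part of a config line are left alone.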

Conclusion

This was one example of how to use Excel to make building multiple configs easier.  It helps to reduce human error as well as decrease the amount of time spent manually building configs.  Other use cases I’ve found for this are building Access List Entries.  It may take some time to get the fields you want to work with in Excel, but once they are in there it becomes a very powerful tool.