Managing Packet Captures

Packet captures are an important part of the network engineers toolkit.  They provide a look into what is really going on in your network and help get to the bottom of troubleshooting an issue very quickly.  In addition to getting to the bottom of a problem, they also serve as a great learning tool to get a better understanding of how different protocols work, and more importantly how they work in your network.  A company called QA cafe has a really great product called Cloudshark, that allows you to manage and analyze your packet captures without installing any software like Wireshark locally. Everything is handled in the web browser.  I wanted to write a quick post to take a look at the available options from Cloudshark and how they might work best for you.

Overview

Cloudshark was intended to be used as a hardware or VM appliance within a company.  Employees could then upload packet captures to the appliance for storage and analysis.  They currently offer a Solo, Professional, and Enterprise version, with the biggest difference being the number of accounts you can create on each and an ability to integrate with Active Directory for the enterprise version.  I recently setup the enterprise VM appliance and it was extremely quick to get going, requiring barely any input from me.  If you aren’t sure if you want to commit to spending money on the product and want to try it out, or need to send someone a packet capture (that doesn’t contain sensitive information) for further review, they do have a page that allows you to upload up to 10MB of a capture, and then will generate a URL you can send off to someone else.  I encourage you to check it out here:https://appliance.cloudshark.org/upload/

Features

Cloudshark really worked to get as many features from Wireshark into the web based product, to the point that sometimes you forget that you are working in a web browser.  When you first login to the product you are presented with a page that has a list of your currently uploaded files, as well as a place to upload new files, or search for a saved capture. The interface is clean, and easy to find what you’re looking for.

 

Advertisements

Increase Cisco TFTP speed

I was recently copying a fairly large 400 MB IOS image to one of our ASR routers and it was taking forever via TFTP.  I had seen this before but never really took any time to look into it further. I always switched to FTP, the transfer went faster, and I never looked back.  This time I decided to go to Wireshark and take a deeper look. In this post I’ll show you why it’s slow and how to improve the speed, but perhaps more importantly, how to get to the bottom of something like this using Wireshark. 

Default TFTP Setting

I performed a packet capture on a TFTP session using the default Cisco router and TFTP server settings.  It immediately became clear what the issue was.  Here is a screenshot as well as a link to part of the capture file on Cloudshark.

tftpdefaultscreencap

The length of each of the frame is ~500 bytes.  This was being transferred over Ethernet, which has a max frame size of 1518 bytes.  This means we weren’t fully taking advantage of our available frame size.  It’d be the equivalent if I told you to empty a swimming pool and you had the option to use a small plastic cup or a 5 gallon bucket for each trip you took to the pool.  The 5 gallon bucket would require far less trips back and forth and decrease the total time needed to empty the pool.

According to the RFC for TFTP, TFTP will transfer data in blocks of 512 bytes at a time, which is what we were seeing with our default settings.

Make it faster

So how do we make this go faster? Well, besides using one of the other TCP based alternatives like SCP or FTP, there is an option in IOS available to increase the TFTP blocksize.  In my case I am using an ASR router and the option was there. I didn’t look into seeing which other platforms/ IOS versions this is supported in. 

The command you are interested in is: ip tftp blocksize <blocksize> In my case I chose to set the blocksize to 1200 bytes because I have the Cisco VPN client installed which changes your MTU size to 1300 bytes and I didn’t want to deal with fragmentation.  Here’s a screenshot of the transfer with the updated block size and link to capture on Cloudshark.org.

tftpincreasedblocksize

Confirming the increase

Besides seeing the bigger blocksize in the capture and noticing the speed was faster, let’s back it up with some real data.  If you click the Statistics – Summary menu you can see an average rate for each capture.

Here’s the ‘before’ rate with the default block size:

defaultblocksizesummary

And here is the summary using the increased block size of 1200 bytes:

increasedblocksizesummaryThat’s almost a 2.5 time increase in performance just by changing the block size for TFTP! Depending on your MTU you may be able to increase this even further, above the 1200 bytes I chose for this example.

Wrapup

Hope this was helpful in not only seeing how you can increase the speed of your transfers with TFTP, but also to see how to troubleshoot what causes issues like this and use tools like Wireshark to get to the bottom of it.  One thing to note, TFTP is often the go to default for transferring files to routers and switches but depending on your use case there may be other options that are better.  If you are using an unreliable link you may be better off going with the TCP based FTP option, or if you need to securely transfer something SCP is a solid bet.  It all depends on what your requirements are.

Microburst Detection with Wireshark

I recently ran into an issue that was new to me, but after some further research proved to be fairly well known phenomenon that can be difficult to track down. We had a Cisco linecard with some servers connected to it that were generating a fairly large number of output drops on an interface, while at the same time having a low average traffic utilization.  Enter the microburst.

Microbursts are patterns or spikes of traffic that take place in a relatively short time interval(generally sub-second) causing network interfaces to temporarily become oversubscribed and drop traffic. While bursty traffic is fairly common in networks and in most cases is handled by buffers, in some cases the spike in traffic is more than the buffer and interface can handle.  In general, traffic will be more bursty on edge and aggregation links than on core links.

Admitting you have a problem

One of the biggest challenges of microbursts is that you may not even know they are occurring.  Typical monitoring systems(Solarwinds,Cacti,etc) pull interface traffic statistics every one or five minutes by default.  In most cases this gives you a good view into what is going on in your network on the interface level for any traffic patterns that are relatively consistent.  Unfortunately this doesn’t allow you to see bursty traffic that occurs in any short time interval less than the one you are graphing.

Since it isn’t practical to change your monitoring systems to poll interfaces every second, the first place you might notice you are experiencing bursty traffic is in the interface statistics on your switch under “Total Output Drops”. In the output below, we are seeing output drops even though the 5 minute output rate is ~2.9Mbps (much less than the 10Mbps the interface supports).

Switch#sh int fa0/1 | include duplex|output drops|rate
  Full-duplex, 10Mb/s, media type is 10/100BaseTX
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 7228
  5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 2921000 bits/sec, 477 packets/sec

    

Output drops are caused by congested interfaces.  If you are consistently seeing output drops increment in combination with reaching the line speed of the interface your best option is to look into increasing the speed of that link.  In this case you would most likely see this high traffic utilization on a graph.  If you are seeing output drops increment, but the overall traffic utilization of that interface is otherwise low, you are most likely experiencing some type of bursty traffic. One quick change you can make to the interface is to shorten the load interval from the default 5 minutes to 30 seconds using the interface-level command ‘load-interval 30’.  This will change the statistics shown in the output above to report over a 30 second interval instead of 5 minutes and may make the bursty traffic easier to see.  There’s a chance that even 30 seconds may be too long, and if that is the case we can use the Wireshark to look for these bursts.

Using Wireshark to identify the bursty traffic

I setup a simple test in the lab to show what this looks like in practice.  I have a client and server connected to the same Cisco 2960 switch, with the client connected to a 100Mbps port and a server connected to a 10Mbps port. Horribly designed network, and most likely would not be seen in production, but will work to prove the point. The client is sending a consistent 3Mbps stream of traffic to the server using iperf for 5 minutes.  Approximately 60 seconds into the test I start an additional iperf instance and send an additional 10 Mbps to the server for 1 second.  For that 1 second interval the total traffic going to the server is ~13Mbps, greater than it’s max speed of 10Mbps.
While running the tests I used SNMP to poll the interface every 1 minute to see what type of traffic speeds were being reported
throughputgraph
From the graph you can see that it consistently shows ~ 3Mbps of traffic for the entire 5 minute test window. If you were recording data in 1 minute intervals with Solarwinds or Cacti, everything would appear fine with this interface. We don’t see the spike of traffic to 14Mbps that occurs at 1 minute.
I setup a span port on the server port and sent all the traffic to another port with a Wireshark laptop setup.  Start your capture and let it run long enough to capture the suspected burst event (if the output drops seem to increase at the same time each day this may help you in narrowing in on the issue).  Once your capture is done open it up in Wireshark.  The feature we want to use is the “IO Graph”. You can get there via “Statistics” -> “IO Graph”
Wireshark IOGraph

Wireshark IOGraph

Next we need to look at our X and Y Axis values.  By default the X Axis is set to a “Tick Interval” of 1 second.  In my case 1 second is a short enough interval to see the burst of traffic. If your spike in traffic is less then 1 second (millisecond for example), you may need to change the Tick Interval to something less than 1 second like .01 or .001 seconds. The shorter the time interval you choose, the more granular of a view you will have. Change the “Y-Axis” Unit from the default “Packets/Tick” to “Bits/Tick” because that is the unit the interface is reporting.  It immediately becomes obvious on the graph that we do have a spike in traffic, right around the 60 second mark.
IOGraph Burst
In my case the only traffic being sent was test iPerf traffic, but in a real network you would likely have a number of different hosts communicating.  Wireshark will let you click on any point in the graph to view the corresponding packet in the capture  If you click on the top of the spike in the graph,  the main Wireshark window will jump to that packet. Once you identify the hosts causing the burst you can do some further research into what application(s) are causing the spike and continue your troubleshooting.

Conclusion

Identifying microbursts or any bursty traffic is a good example of why it’s important to ‘know what you don’t know’.  If someone complains about seeing issues on a link it’s important not to immediately dismiss the complaint and do some due diligence.  While monitoring interface statistics via SNMP in 1 or 5 minute intervals is an excellent start, it’s important to know that there may be things going on in the network that aren’t showing up in those graphs.  By utilizing a number of different tools you can trace down problems. Reducing the interface load-interval to 30 seconds and tracking your output drops is a good start.Using Wireshark allows you to dive further into the problem and figure out what traffic is causing or contributing to the drops.