To SPAN or to TAP – That is the question!

 

Ixia Network Visibility Solutions welcomes a guest blogger today, Tim O’Neill from LoveMyTool.

Network engineers and managers need to think about today’s compliance requirements and the limitations of conventional data access methods. This article is focused on TAPs versus port mirroring/SPAN technology.

SPAN is not all bad, but one must be aware of its limitations. Because managed switches are an integral part of the infrastructure, one must be careful not to establish a failure point. Understanding what can be monitored is important for success. SPAN ports are often overused, which leads to dropped frames: LAN switches are designed to groom data (change timing, add delay), extract bad frames and ignore all layer 1 and 2 information. Furthermore, typical SPAN implementations cannot handle full duplex (FDX) monitoring, and analysis of VLANs can also be problematic.

Moreover, when dealing with data security compliance, SPAN ports limit what can be seen and are not secure, because they can transport monitored traffic through the production network. Both facts could prove unacceptable in a court of law.

When used within its limits and properly focused, SPAN is a valuable resource to managers and monitoring systems. However, for 100% guaranteed views of network traffic, passive network TAPs are a necessity for meeting many of today’s access requirements as we approach larger deployments of 10 Gigabit and up. It’s in this realm that SPAN access limitations become more of an issue.

SPANs vs. TAPs

Until the early 1990s, using a TAP or test access port from a switch patch panel was the only way to monitor a communications link. Most links were WAN, so an adaptor like the V.35 adaptor from Network General or an access balun for a LAN was the only way to access a network. In fact, most LAN analyzers had to join the network to really monitor it.

As switches and routers developed, along came a technology we call a SPAN port or mirroring port, and with this, monitoring was off and running. SPAN stands for Switched Port Analyzer, and it was a great way to effortlessly and non-intrusively acquire data for analysis. By definition, a SPAN port usually indicates the ability to copy traffic from any or all data ports to a single unused port. It also usually disallows bidirectional traffic on that port to protect against backflow of traffic into the network.

Analyzers and monitors no longer had to be connected to the network. Engineers could use the SPAN (mirror) port and direct packets from their switch or router to the test device for analysis.

Is a SPAN port a passive technology?  No!

Some call a SPAN port a passive data access solution – but passive means “having no effect” and spanning (mirroring) does have measurable effect on the data. Let’s look at the facts.

  • Spanning or mirroring changes the timing of the frame interaction (what you see is not what you get).
  • Spanning is not the primary function of the device, as switching or routing is, so it does not get first priority. If replicating a frame becomes an issue, the hardware will temporarily drop the SPAN process.
  • If the SPAN port becomes overloaded, frames are dropped.
  • Proper spanning requires that a network engineer configure the switches properly and this takes away from the more important tasks required by network engineers. Many times configurations can become a political issue (constantly creating contention between the IT team, the security team and the compliance team).
  • The SPAN port drops all packets that are corrupt or below the minimum size, so not all frames are passed on. All of these events can occur without any notification to the user, so there is no guarantee that one will get all the data required for proper analysis.

 

In summary, the fact that SPAN ports are not a truly passive data access technology, or even entirely non-intrusive, can be a problem for data security compliance monitoring or lawful intercept. Since there is no guarantee of absolute fidelity, it is possible or even likely that evidence gathered by this monitoring process will be challenged in a court of law.

Are SPAN ports a scalable technology?  No!

When we had only 10Mbps links and a robust switch (like ones from Cisco), one could almost guarantee they could see every packet going through the switch. With 10Mbps fully loaded at around 50% to 60% of the maximum bandwidth, the switch backplane could easily replicate every frame. Even with 100Mbps one could be somewhat successful at acquiring all the frames for analysis and monitoring, and if a frame or two here and there were lost, it was no big problem.

This has all changed with Gigabit and 10 Gigabit technologies, starting with the fact that maximum bandwidth is now twice the base bandwidth – so a Full Duplex (FDX) Gigabit link is now 2 Gigabits of data and a 10 Gigabit FDX link is now 20 Gigabits of potential data.

No switch or router can handle replicating/mirroring all this data, plus handle its primary job of switching and routing. It is difficult if not impossible to pass all frames (good and bad ones), including FDX traffic, at a full-time rate, in real time and at non-blocking speeds.
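The arithmetic behind this is easy to sketch. The short Python example below is a rough illustration, not a model of any particular switch: it compares the combined traffic of a full duplex link with the capacity of the port receiving the mirrored copy. The function name and the load figures are assumptions chosen for illustration.

```python
def span_oversubscription(link_gbps, utilization, span_port_gbps):
    """Return the fraction of mirrored traffic that cannot fit on the SPAN port.

    link_gbps      -- nominal speed of the monitored link in one direction
    utilization    -- average load per direction, 0.0 to 1.0 (assumed equal both ways)
    span_port_gbps -- speed of the port receiving the mirrored copy
    """
    # A full duplex link carries traffic in both directions at once,
    # so the mirrored stream is the sum of both directions.
    mirrored_gbps = 2 * link_gbps * utilization
    excess = max(0.0, mirrored_gbps - span_port_gbps)
    return excess / mirrored_gbps if mirrored_gbps else 0.0


if __name__ == "__main__":
    for load in (0.25, 0.50, 0.75, 1.00):
        lost = span_oversubscription(1, load, 1)
        print(f"1G FDX link at {load:.0%} load: at least {lost:.0%} of frames cannot be mirrored")
```

The result is a lower bound based on average load. Short bursts can overflow the mirror port well below these averages, and the sketch ignores the switch's own priority rules.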

Adding to this FDX need, we must also consider the VLAN complexity and finding the origin of a problem once the frames have been analyzed and a problem detected.

From Cisco’s own white paper, “On SPAN Port Usability and Using the SPAN Port for LAN Analysis,” the company warns “the switch treats SPAN data with a lower priority than regular port-to-port data.” In other words, if any resource under load must choose between passing normal traffic and SPAN data, the SPAN loses and the mirrored frames are arbitrarily discarded. This rule applies to preserving network traffic in any situation. For instance, when transporting remote SPAN (RSPAN) traffic through an Inter Switch Link (ISL), which shares the ISL bandwidth with regular network traffic, the network traffic takes priority. If there is not enough capacity for the remote SPAN traffic, the switch drops it. Knowing that the SPAN port arbitrarily drops traffic under specific load conditions, what strategy should users adopt so as not to miss frames? According to Cisco, “the best strategy is to make decisions based on the traffic levels of the configuration and when in doubt to use the SPAN port only for relatively low-throughput situations.”

Hubs? How about them?

Hubs can be used for 10/100 access but they have several issues that one needs to consider. Hubs are really half duplex devices and only allow one side of the traffic to be seen at a time. This effectively reduces the access to 50% of the data.

The half duplex issue often leads to collisions when both sides of the network try to talk at the same time. Collision loss is not reported in any way and the analyzer or monitor does not see the data. The big problem is that if a hub goes down or fails, the link it is on is lost. As such, hubs are no longer an acceptable, reliable access technology. They also do not support Gigabit or faster links and should not be considered.

Today’s “REAL” Data Access Requirements

To add more complexity and challenges to SPAN port as a data access technology, consider the following:

  • We have entered a much higher utilization environment with many times more frames in the network.
  • We have moved from 10Mbps to 10Gbps Full Duplex – today many have even higher rates of 40 and 100Gbps.
  • We have entered into the era of data security, legal compliance and lawful intercept, which require that we monitor all of the data and not just “sample” the data – with the exception of certain very focused monitoring technologies (e.g., application performance monitoring).

These demands will continue to grow, as we have become a very digitally focused society. With the advent of VoIP and digital video we now have revenue-generating data that is connection-oriented and sensitive to bandwidth, loss and delay. The older methods need reviewing and the aforementioned added complexity requires that we change some of the old habits to allow for “real” 100% Full Duplex real-time access to the critical data.

In summary, being able to provide “real” access is not only important for data compliance audits and lawful intercept events; it is the law. Keeping our bosses out of jail has become a very high priority these days; but I guess it depends on how much you like your boss.

When is SPAN port methodology “OK”?

Many monitoring products can and do successfully use SPAN as an access technology. These are effective for low-bandwidth application layer events like conversation analysis, application flows and connection information, and for access to reports from call managers, etc., where time based or frame flow analysis is not needed.

These monitoring requirements utilize a small amount of bandwidth and grooming does not affect the quality of the reports and statistics. The reason for their success is that they keep within the parameters and capability of the SPAN port and do not need every frame for successful reporting and analysis. In other words, a SPAN port is a very usable technology if used correctly and, for the most part, the companies that use mirroring or SPAN are using it in well-managed and tested methodologies.

Conclusion

Spanning (mirroring) technology is still viable for some limited situations, but as one migrates to FDX Gigabit and 10 Gigabit networks, and with the demands of seeing all frames for data security, compliance and lawful intercept, one must use “real” access TAP technology to fulfill the demands of today’s complex analysis and monitoring technologies. With today’s large bandwidths the TAP should feed an advanced and proactive filtering technology for the clearest view!

If the technology demands are not enough, network engineers can focus their infrastructure equipment on switching and routing and not spend their valuable resources and time setting up SPAN ports or rerouting data access.

In summary, the advantages of TAPs compared to SPAN/mirror ports are:

  • TAPs do not alter the time relationships of frames. Spacing and response times are especially important with real-time protocols like VoIP and with Triple Play analysis, including FDX analysis.
  • TAPs do not introduce any additional jitter or distortion nor do they groom the flow, which is very important in all real-time flows like VoIP/video analysis.
  • VLAN tags are not normally passed through the SPAN port so this can lead to false issues detected and difficulty in finding VLAN issues.
  • TAPs do not groom data nor filter out physical layer errored packets.
  • Short or large frames are not filtered/dropped.
  • Bad CRC frames are not filtered.
  • TAPs do not drop packets regardless of the bandwidth.
  • TAPs are not addressable network devices and therefore cannot be hacked.
  • TAPs have no setups or command line issues so getting all the data is assured and saves users time.
  • TAPs are completely passive and do not cause any distortion even on FDX and full bandwidth networks.
  • TAPs do not care if the traffic is IPv4 or IPv6; they pass all traffic through.

 

So should you use a TAP to gain access to your network frames? Now you know the differences, and it is up to you to decide based on your goals!

The four main types of TAPs, as provided by Garland Technologies, are:

Breakout TAPs are the simplest type of TAP. In their most basic form they have four ports. The network traffic travelling in one direction comes in port A and is sent back out port B unimpeded. Traffic coming from the other direction arrives in port B and is sent back out port A, also unimpeded. The network segment does not “see” the TAP. At the same time the TAP sends a copy of all the traffic to monitoring ports C & D of the TAP. Traffic travelling from A to B in the network is sent to one monitoring port and the traffic from B to A is sent out the other, both going to the attached tool.
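The port behavior described above can be sketched as a small Python model. This is a toy illustration only: the class name is hypothetical, and which monitor port (C or D) receives which direction is an assumption, since that assignment varies by product.

```python
from dataclasses import dataclass, field


@dataclass
class BreakoutTap:
    """Toy model of a four-port breakout TAP (network ports A/B, monitor ports C/D)."""
    out: dict = field(default_factory=lambda: {"A": [], "B": [], "C": [], "D": []})

    def receive(self, in_port, frame):
        # Network ports pass traffic straight through to each other,
        # and each direction is copied to its own monitor port.
        if in_port == "A":
            self.out["B"].append(frame)   # A -> B, unimpeded
            self.out["C"].append(frame)   # copy of A -> B traffic (assumed port C)
        elif in_port == "B":
            self.out["A"].append(frame)   # B -> A, unimpeded
            self.out["D"].append(frame)   # copy of B -> A traffic (assumed port D)
        # Anything arriving on a monitor port is ignored:
        # monitor ports are output-only, so nothing flows back into the network.


if __name__ == "__main__":
    tap = BreakoutTap()
    tap.receive("A", "request")
    tap.receive("B", "response")
    print(tap.out)
```

Because each direction has its own monitor port, the attached tool needs two receive interfaces to see the whole conversation.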

IMPORTANT: Make sure the TAP incorporates a failsafe feature. This will ensure that if the TAP were to lose power or fail, the network will not be brought down as a result.

Aggregating TAPs provide the ability to take network traffic from multiple network segments and aggregate, or link bond, all of the data to one monitoring port. This is important because you can now use just one monitoring tool to see all of your network traffic. With the addition of filtering capability in the TAP you can further enhance your tool’s efficiency by only sending the data it needs to see.

Regeneration TAPs take traffic from a single network segment and send it to multiple ports. This allows you to take traffic from just one point in the network and send it to multiple tools. Different teams in your company, such as security, compliance or network troubleshooting, can therefore see all the data at the same time for their own requirements, and no team has to contend with another for an available monitoring point.

Bypass TAPs allow you to place devices that must be installed inline, such as IPS/IDS, data leakage prevention (DLP), firewalls, content filtering and other security devices, into the network while removing the risk of introducing a point of failure. With a bypass TAP, failure of the inline device, reboots, upgrades, or even removal and replacement of the device can be accomplished without taking down the network. In applications requiring inline tools, bypass TAPs save time, money and network downtime.

In the next part I will review and compare VACLs, RSPAN and Cloud TAPs.

Want more on TAPs, SPAN ports, even comparative tests and Sharkfest classes? Visit www.lovemytool.com.

To read an excellent paper on Full Duplex TAP basics go to: http://www.networkinstruments.com/assets/pdf/nTAP_FullDuplex_wp.pdf

Here’s a little bit about Tim:

Tim O’Neill - The “Oldcommguy™”
Technology Website - www.lovemytool.com
Committee Chairman for Cyber Law Enforcement training and Cyber Terrorism
For Georgia State Senator John Albers
Please honor and support our Troops, Law Enforcement and First Responders!
All Gave Some – Some Gave All – All deserve our Respect and Support!

High Speed Ethernet (40G Networks) and You

 

We hear from customers that the transition to 40G/100G networks is driven purely by cost savings and ease of management.  It seems no one really wants to run eight 10G connections (especially fiber) in an EtherChannel when they can simply run two 40G connections.  The idea is to get away from 8-port link aggregation groups and down to a more manageable number.  In addition, 40G networks have reduced physical space requirements, less cable to run, fewer transceivers to buy and maintain, and less power to operate the data center versus their predecessors.

But moving from 1G or 10G to 40G introduces all kinds of interesting nuances people haven’t really thought through yet.

40G is more than just a bigger pipe when it comes to monitoring network data.  The luxury of dumping a whole bunch of network data on security and performance monitoring tools and letting them sort it out goes away.  There’s just too much data.

In this brave new world, network data must be delivered to tools to suit their specific needs in order to get the best results.  Strike that, to get the tools to work at all, never mind good performance, they have specific data dietary requirements.

There really aren’t security and performance monitoring tools that exist for 40G networks.  They haven’t been built yet.  This makes life pretty tricky if you are implementing 40G, or plan to do so in the next couple of years.

Another issue is that security and monitoring tools are often implemented in appliances that simply don’t have the processing power to handle 40G bandwidth.  So even if the software is modified to handle 40G, the box isn’t ready.  In addition, monitoring tools are typically processor and disk bound, which raises an interesting point: Processing capacity improvement follows Moore’s law, doubling every 24 months.  Bandwidth consumption has been doubling every 12 to 18 months, according to the Ethernet Alliance.
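To see how quickly that gap opens up, the two doubling periods can be compounded over a few years. The Python sketch below only restates the figures quoted above (24 months for processing, 12 to 18 months for bandwidth); the function name and the chosen time spans are illustrative assumptions.

```python
def growth(years, doubling_months):
    """Growth factor after `years` for a quantity that doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    for years in (2, 4, 6):
        processing = growth(years, 24)   # processing capacity, doubling every 24 months
        bw_slow = growth(years, 18)      # bandwidth, doubling every 18 months
        bw_fast = growth(years, 12)      # bandwidth, doubling every 12 months
        print(f"after {years} years: processing x{processing:.0f}, "
              f"bandwidth x{bw_slow:.1f} to x{bw_fast:.0f}")
```

Under these assumptions, six years gives an 8x gain in processing capacity against a 16x to 64x gain in bandwidth, so a tool that keeps pace with Moore's law still falls behind the traffic it is meant to inspect.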

What you need to do is deliver just the data that your network tools need.  The first step is being able to filter out the data the tools do not need.

In Anue terms, this includes ingress filtering, which filters at the input of the network monitoring switch (Anue’s NTO) and discards packets that are not of interest to any monitoring tool.

Then you need center-stage filters, which Anue calls Dynamic Filtering.  Dynamic Filtering addresses problems that occur when some packets meet the filter criteria of multiple tools and must be sorted out properly for each tool to do its job.

Finally, you need egress filtering, which filters at the output of the network monitoring switch.  For example, the egress filter could drop HTTP traffic for a particular tool and have no impact on other tool ports.  Filtering is going to solve a lot of your problems, but not all.
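The three filtering stages can be pictured as a simple pipeline. The Python sketch below is a conceptual illustration only and does not represent how the NTO is implemented or configured; the function names, the tool names and the dictionary packet format are all assumptions.

```python
def ingress_filter(packets, wanted):
    """Stage 1: discard packets at the input that no tool is interested in."""
    return [p for p in packets if wanted(p)]


def dynamic_filter(packets, tool_criteria):
    """Stage 2: give each tool every packet that matches its criteria.

    A packet that matches several tools is delivered to all of them.
    """
    return {tool: [p for p in packets if match(p)]
            for tool, match in tool_criteria.items()}


def egress_filter(per_tool, egress_drop):
    """Stage 3: drop packets at one tool's output port without affecting other tools."""
    keep_all = lambda _packet: False
    return {tool: [p for p in pkts if not egress_drop.get(tool, keep_all)(p)]
            for tool, pkts in per_tool.items()}


if __name__ == "__main__":
    packets = [
        {"proto": "tcp", "dport": 80},     # HTTP
        {"proto": "tcp", "dport": 443},    # HTTPS
        {"proto": "udp", "dport": 5060},   # SIP
        {"proto": "icmp", "dport": None},  # not wanted by any tool in this example
    ]
    kept = ingress_filter(packets, lambda p: p["proto"] in ("tcp", "udp"))
    per_tool = dynamic_filter(kept, {
        "ids": lambda p: True,                       # hypothetical tool: sees everything kept
        "web_monitor": lambda p: p["proto"] == "tcp",
    })
    # Drop HTTP for the "ids" port only; "web_monitor" still receives it.
    result = egress_filter(per_tool, {"ids": lambda p: p["dport"] == 80})
    for tool, pkts in result.items():
        print(tool, [p["dport"] for p in pkts])
```

Keeping the egress rules per tool port is what lets one tool shed traffic it cannot use while the others keep their full feed.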

You’ll also need to do some load balancing to spread the analysis work across your tools, since they are not capable of “drinking from the 40G firehose.”  In Anue vernacular, load balancing is not the traditional splitting of network traffic into equal loads across multiple tool ports.  That approach can hinder session-level packet analysis: with VoIP monitoring, for example, you need to analyze all the data belonging to a session together.

Anue’s approach to load balancing is achieved by using layer 2, 3 and 4 packet header information to identify and deliver related traffic to the same physical tool port, maintaining the integrity of the sessions.
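One common way to keep sessions intact is to hash the packet header fields so that both directions of a conversation map to the same tool port. The Python sketch below illustrates that general technique; it is not Anue's algorithm. The function name is hypothetical, and for brevity it uses only layer 3 and 4 fields (addresses, ports, protocol).

```python
import zlib


def tool_port_for(src_ip, dst_ip, src_port, dst_port, proto, num_ports):
    """Pick a tool port so that both directions of a session land on the same port."""
    # Sort the two endpoints so that A->B and B->A produce the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    # crc32 is deterministic across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key) % num_ports


if __name__ == "__main__":
    forward = tool_port_for("10.0.0.1", "10.0.0.2", 40000, 5060, "udp", 4)
    reverse = tool_port_for("10.0.0.2", "10.0.0.1", 5060, 40000, "udp", 4)
    print(forward, reverse, forward == reverse)
```

A plain hash spreads sessions roughly evenly but does not account for how much traffic each session carries, so a few heavy sessions can still load one tool more than the others.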

Here’s how load balancing might work.  Say I have four 10G Computer Associates Web Monitors that I need to use to monitor a 40G network.  I set up a load balancing port group of the four 10G web monitors as below.

Intelligent Load Balancing with Anue Systems NTO - the network monitoring switch

Now I’m set up to monitor 40G, using my 10G tools.  And, instead of simplistically dividing the network traffic across the four web monitors, I can set up criteria, such as dividing up IP addresses across the monitors, or setting VLAN address ranges for each tool, as required to keep the session integrity as discussed above.
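Dividing IP addresses across the monitors might look like the following Python sketch. The address ranges, monitor names and function name are hypothetical, and the sketch assumes traffic is keyed on the server-side address of each session so that both directions reach the same monitor.

```python
import ipaddress

# Hypothetical assignment: each of the four 10G monitors owns one quarter of 10.0.0.0/8.
MONITOR_SUBNETS = {
    "monitor-1": ipaddress.ip_network("10.0.0.0/10"),
    "monitor-2": ipaddress.ip_network("10.64.0.0/10"),
    "monitor-3": ipaddress.ip_network("10.128.0.0/10"),
    "monitor-4": ipaddress.ip_network("10.192.0.0/10"),
}


def monitor_for(server_ip):
    """Return the monitor whose subnet contains server_ip, or None if unassigned."""
    addr = ipaddress.ip_address(server_ip)
    for name, net in MONITOR_SUBNETS.items():
        if addr in net:
            return name
    return None


if __name__ == "__main__":
    for ip in ("10.1.2.3", "10.70.0.1", "10.200.5.5", "192.168.1.1"):
        print(ip, "->", monitor_for(ip))
```

Static ranges like these are easy to reason about, but they only balance the load if traffic is spread evenly across the ranges, so the split usually needs tuning against real traffic levels.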
