Performance Assurance with Infinite Resolution Microburst Analysis – Part 1
Financial, government, mobile and other traffic flows are often bursty when transported across networks or in data centers. This leads to intermittent performance issues, particularly with applications that require reliable, high-speed, low-latency data transmissions. Accurately detecting and addressing these bursts, especially when they have a very short duration, is difficult and costly.
Traditional network monitoring tools offer service verification and multi-layer performance monitoring for SLA assurance purposes, but typically lack the fine-granularity measurement and reporting required to analyze microbursts, which can significantly degrade application performance.
Traffic microbursts. What are they?
Generally speaking, packet networks, with their efficient statistical multiplexing capabilities, are much more cost-effective than TDM-based alternatives. However, as is typical in life, these benefits come at a cost, which in this case is uncertainty.
In TDM networks, you could send your data at exactly 10:45:32.00 AM, knowing it would be received, in full, at the destination exactly X microseconds later. Repeating the same procedure a few hours/days/months later would still yield the same behavior. TDM delivered certainty.
Packet networks, on the other hand, are excellent when it comes to bandwidth performance per operational cost, but, in general, delivering certainty is not one of their strong points. Hence, sending your data at a given time does not mean the data will arrive in full at the destination with some fixed delay, and the mechanism to blame here is buffering.
Buffering is the very mechanism that makes packet networks so adaptable, agile and easy to manage in the first place. The fact that you can temporarily store excess data, rather than coordinating a dedicated path through the entire network, is what makes packet networks so appealing. However, very much like at home, as soon as you start storing stuff, you are inherently at risk of clutter or congestion.
Microbursts, as the name suggests, are rapid bursts of data transmitted in quick succession. They are usually characterized by burst size, packet size, and burst density. These rapid bursts can lead to periods of full line-rate transmission that exceed the allocated network bandwidth profile, i.e. the committed/excess burst sizes (CBS/EBS).
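To make these parameters concrete, the basic arithmetic is straightforward: a back-to-back burst of N packets of a given size occupies the wire for a duration set by the line rate. The figures below are hypothetical and purely illustrative:

```python
# Illustrative sketch (hypothetical figures): the basic arithmetic of a
# microburst characterized by burst size and packet size on a given link.

LINE_RATE_BPS = 10e9       # 10 Gb/s link
PACKET_SIZE_BYTES = 1500   # size of each packet in the burst
BURST_PACKETS = 200        # packets transmitted back to back

burst_bytes = BURST_PACKETS * PACKET_SIZE_BYTES
# At full line rate, the whole burst occupies the wire for:
burst_duration_s = burst_bytes * 8 / LINE_RATE_BPS

print(f"burst size: {burst_bytes} bytes")                      # 300000 bytes
print(f"duration: {burst_duration_s * 1e6:.0f} microseconds")  # 240 microseconds
```

Note how a burst large enough to overwhelm a switch buffer still lasts only a few hundred microseconds, far below the resolution of conventional monitoring.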
Traffic Microbursts - Definitions
Traffic flows often display bursty behavior when transported over Ethernet or IP networks. This burstiness is typically caused by a combination of:
- Application behavior (exchange trades during an IPO, TCP incast in data centers, IPTV, etc.)
- Packetization and packet queuing processes in switches and routers – especially when aggregating traffic from several bursty traffic sources.
This bursty behavior is characterized by network congestion events lasting over short time periods. Hence, they are usually referred to as microbursts.
The impact of microbursts
Discussing the effect of microbursts mandates some level of expectation setting for the average reader. If you believe, at this point, that microbursts are also accountable for global warming, you might want to think again. In essence, microbursts are usually a relatively mild network phenomenon, which most applications don’t care about. Nevertheless, if not discovered and tackled correctly, microbursts can cause severe performance degradation for applications that are extremely sensitive to them.
Put in simple terms, microbursts cause a temporary increase in network latency, jitter, and packet loss. For example, if the network doesn’t handle microbursts well, excessive packet loss will occur. As TCP traffic is very loss-sensitive, this will also degrade network throughput. As a result, the end application might experience severe performance degradation because it will either have to wait until all the data is received or simply complete the job with partial data fetched. Examples of the impact of these TCP/IP retransmits include:
- Delayed trades for financial trading transactions, resulting in significant investment losses.
- Dropped VoLTE calls on 5G mobile networks.
The bottom line is that microbursts need to be accurately and efficiently detected and measured to ensure they don’t degrade applications running over the network. Furthermore, if a problem is detected, it must be clear what corresponding actions are required to address it.
Why standard network PM techniques won’t do
There are quite a lot of packet network PM techniques out there. Whether these are Layer 2 Ethernet PM solutions such as the famous ITU-T Y.1731 or Layer 3 protocols such as RFC 5357 TWAMP, they all have the same common denominator:
- They are all measuring network KPIs, such as loss and delay, using synthetic dedicated PM packets; hence, they have a relatively poor statistical span that is not suitable for catching infrequent events.
- They are all measuring the KPIs over relatively long time periods (1 second and higher). Thus, they inherently fail to observe very rapid network events.
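The statistical-span problem lends itself to a back-of-envelope calculation. With hypothetical figures (not taken from the post), here is how rarely a once-per-second synthetic probe would even coincide with a short burst:

```python
# Back-of-envelope sketch (hypothetical figures): how likely is a
# once-per-second synthetic PM probe to coincide with a 2 ms microburst
# occurring once per second?

PROBE_INTERVAL_S = 1.0     # one synthetic probe per second
BURST_DURATION_S = 0.002   # a 2 ms congestion event each second

# A probe "sees" the event only if it happens to land inside the burst:
p_hit = BURST_DURATION_S / PROBE_INTERVAL_S
print(f"{p_hit:.1%} of probes observe the event")  # 0.2%
```

With roughly one probe in five hundred landing inside the burst, a synthetic-probe scheme would need hours of measurements to build any statistical confidence about such events.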
As a result, alternative methods have been devised in order to ‘zoom-in’ and allow network measurements that are both statistically sound and characterize network behavior over very short periods of time.
The existing “standard” for microburst measurement
If network congestion is characterized by a momentary significant increase in the amount of traffic that is pushed into the network over a given period of time, then continuously measuring the average network throughput over that period could yield a possible way of measuring microbursts. However, as already discussed in the previous section, “regular” bandwidth utilization measurements, even at a fine resolution such as 1 second, will not reveal microburst situations. The measurement time spans need to shrink.
There are a number of solutions that perform microburst sampling at shorter measurement intervals, such as 10 or 100 ms, which may provide some hints. However, even they cannot differentiate microbursts from “normal” momentary traffic increases. Even finer-resolution sampling does not guarantee correct results and can actually increase the number of false positives – especially when you take into consideration that packets are always carried at line rate, which will look like a microburst if your observation interval is short enough.
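The window-size effect can be sketched with a toy model (all figures hypothetical): the same traffic, carrying a single short burst, looks benign when averaged over one second, yet appears fully saturated over a 1 ms window aligned with the burst.

```python
# Toy model (hypothetical figures): how the sampling window changes what
# a bandwidth measurement "sees". A 10 Gb/s link carries a 1 Gb/s
# background load, plus one 2 ms burst at full line rate.

LINE_RATE = 10e9              # bits per second
AVG_RATE = 1e9                # background load, bits per second
BURST_START, BURST_LEN = 0.5, 0.002   # burst timing, seconds

def bits_sent(t0, t1):
    """Bits transmitted in [t0, t1): background plus burst overlap."""
    base = AVG_RATE * (t1 - t0)
    overlap = max(0.0, min(t1, BURST_START + BURST_LEN) - max(t0, BURST_START))
    return base + (LINE_RATE - AVG_RATE) * overlap

def measured_rate(window, t0=0.0):
    """Average throughput over one measurement window starting at t0."""
    return bits_sent(t0, t0 + window) / window

# Averaged over 1 s, the burst is invisible: ~10% utilization.
print(measured_rate(1.0) / LINE_RATE)                     # ~0.10
# Over a 1 ms window aligned with the burst: 100% utilization.
print(measured_rate(0.001, t0=BURST_START) / LINE_RATE)   # 1.0
```

The flip side, as noted above, is that at sufficiently short windows even a single packet occupies the wire at 100% line rate, so a short window alone cannot tell a genuine microburst from perfectly normal traffic.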
Another problem is that these methods do not take into account the amount of memory in the underlying packet network, usually characterized by the Committed Burst Size (CBS). Thus, even if the measured throughput exceeds the Committed Information Rate (CIR), it might not create a problem, due to the network’s absorption capacity (the CBS). Hence, the existing methods tend to yield an excess of false alarms (false-positive indications).
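The CIR/CBS interplay is typically modeled as a token bucket. The sketch below (hypothetical parameter values, not RAD's implementation) shows why a burst that momentarily exceeds CIR only becomes a problem once it exhausts the CBS:

```python
# Sketch of a single-rate token-bucket policer (CIR/CBS), illustrating
# why a burst above CIR only causes drops once it exhausts the CBS
# "absorption capacity". All parameter values are hypothetical.

def police(packets, cir_bytes_per_s, cbs_bytes):
    """packets: list of (arrival_time_s, size_bytes), in arrival order.
    Returns a 'green' (conforming) or 'red' (excess) verdict per packet."""
    tokens, last_t = cbs_bytes, 0.0
    verdicts = []
    for t, size in packets:
        # Tokens refill at CIR, capped at the bucket depth (CBS).
        tokens = min(cbs_bytes, tokens + cir_bytes_per_s * (t - last_t))
        last_t = t
        if size <= tokens:
            tokens -= size           # burst absorbed by the bucket
            verdicts.append("green")
        else:
            verdicts.append("red")   # absorption capacity exhausted
    return verdicts

# 50 x 1500-byte packets arriving back to back: instantaneous rate far
# above a 1 Mb/s CIR, yet the 64 kB bucket absorbs the first 42 packets.
burst = [(0.0, 1500)] * 50
v = police(burst, cir_bytes_per_s=125_000, cbs_bytes=64_000)
print(v.count("green"), v.count("red"))  # 42 8
```

A microburst detector that ignores this bucket state will flag every above-CIR spike, including the 42 packets the network absorbed without any loss, which is exactly the false-positive behavior described above.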
In an attempt to bridge this gap and bring a complete microbursts solution, RAD has developed the Infinite Resolution microBurst Analysis (IRuBA) technology. We will discuss it in part 2 of this blog post.
Notes:
1. The value of X itself was in many cases unknown (at least to a precise level), but it was always fixed.
2. An exception to this is Y.1731’s ETH-LM, which can be used to measure non-synthetic single/dual-ended frame loss.