Towards highly reliable enterprise network services via inference of multi-level dependencies
In SIGCOMM, 2007
Cited by 82 (7 self)
Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multi-level, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is well-adapted to user-perceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multi-level structure leads to a 30% improvement in fault localization, as compared to two-level approaches.
Loss and delay accountability for the internet
In Proc. IEEE International Conference on Network Protocols, IEEE, 2007
Cited by 21 (2 self)
The Internet provides no information on the fate of transmitted packets, and end systems cannot determine who is responsible for dropping or delaying their traffic. As a result, they cannot verify that their ISPs are honoring their service level agreements, nor can they react to adverse network conditions appropriately. While current probing tools provide some assistance in this regard, they only give feedback on probes, not actual traffic. Moreover, service providers could, at any time, render their network opaque to such tools. We propose AudIt, an explicit accountability interface, through which ISPs can pro-actively supply feedback to traffic sources on loss and delay, at administrative-domain granularity. Notably, our interface is resistant to ISP lies and can be implemented with a modest NetFlow modification. On our Click-based prototype, playback of real traces from a Tier-1 ISP reveals less than 2% bandwidth overhead. Finally, our proposal benefits not only end systems, but also ISPs, who can now control the amount and quality of information revealed about their internals.
Path-quality monitoring in the presence of adversaries
In ACM SIGMETRICS, 2008
Cited by 19 (7 self)
Edge networks connected to the Internet need effective monitoring techniques to drive routing decisions and detect violations of Service Level Agreements (SLAs). However, existing measurement tools, like ping, traceroute, and trajectory sampling, are vulnerable to attacks that make a path look better than it really is. In this paper, we design and analyze path-quality monitoring protocols that robustly raise an alarm when packet-loss rate and delay exceed a threshold, even when an adversary tries to bias monitoring results by selectively delaying, dropping, modifying, injecting, or preferentially treating packets. Despite the strong threat model we consider in this paper, our protocols are efficient enough to run at line rate on high-speed routers. We present a secure sketching protocol for identifying when packet loss and delay degrade beyond a threshold. This protocol is extremely lightweight, requiring only 250–600 bytes of storage and periodic transmission of a comparably sized IP packet. We also present secure sampling protocols that provide faster feedback and more accurate round-trip delay estimates, at the expense of somewhat higher storage and communication costs. We prove that all our protocols satisfy a precise definition of secure path-quality monitoring and derive analytic expressions for the trade-off between statistical accuracy and system overhead. We also compare how our protocols perform in the client-server setting, when paths are asymmetric, and when packet marking is not permitted.
Accurate and Efficient SLA Compliance Monitoring
To appear in Proceedings of ACM SIGCOMM, 2007
Cited by 17 (3 self)
Service level agreements (SLAs) define performance guarantees made by service providers, e.g., in terms of packet loss, delay, delay variation, and network availability. In this paper, we describe a new active measurement methodology to accurately monitor whether measured network path characteristics are in compliance with performance targets specified in SLAs. Specifically, (1) we describe a new methodology for estimating packet loss rate that significantly improves accuracy over existing approaches; (2) we introduce a new methodology for measuring mean delay along a path that improves accuracy over existing methodologies, and propose a method for obtaining confidence intervals on quantiles of the empirical delay distribution without making any assumption about the true distribution of delay; (3) we introduce a new methodology for measuring delay variation that is more robust than prior techniques; and (4) we extend existing work in network performance tomography to infer lower bounds on the quantiles of a distribution of performance measures along an unmeasured path given measurements from a subset of paths. We unify active measurements for these metrics in a discrete time-based tool called SLAM. The unified probe stream from SLAM consumes lower overall bandwidth than if individual streams are used to measure path properties. We demonstrate the accuracy and convergence properties of SLAM in a controlled laboratory environment using a range of background traffic scenarios and in one- and two-hop settings, and examine its accuracy improvements over existing standard techniques.
Detection and Localization of Network Black Holes
In Proceedings of IEEE Infocom, 2007
Cited by 15 (4 self)
Internet backbone networks are under constant flux, struggling to keep up with increasing demand. The pace of technology change often outstrips the deployment of associated fault monitoring capabilities that are built into today’s IP protocols and routers. Moreover, some of these new technologies cross networking layers, raising the potential for unanticipated interactions and service disruptions that the built-in monitoring systems cannot detect. In such instances, failures may cause data packets to be silently dropped inside the network without triggering any alarms or responses (e.g., the failure is not routed around). So-called “silent failures” or “black holes” represent a critical threat to today’s rapidly evolving networks. In this paper, we present a simple and effective method to detect and diagnose such silent failures. Our method uses active measurement between edge routers to raise alarms whenever end-to-end connectivity is disrupted, regardless of the cause. These alarms feed localization agents that employ spatial correlation techniques to isolate the root cause of failure. Using data from two real systems deployed on sections of a tier-I ISP network, we successfully detect and localize three known black holes. Further, we present simulation results demonstrating that our system accurately and precisely (both greater than 80% according to our metrics) localizes a variety of failure classes.
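The abstract above does not spell out its spatial correlation technique. As a rough illustration of the general idea only, the following Python sketch ranks links that appear on failed edge-to-edge paths but on no healthy path. The function name `localize`, the link labels, and the ranking rule are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def localize(failed_paths, healthy_paths):
    """Return the links most often shared by failed paths.

    Hypothetical simplification: a link seen on any healthy path is
    assumed to be working and is excluded from the candidates.
    """
    healthy_links = set()
    for path in healthy_paths:
        healthy_links.update(path)

    # Count, per candidate link, how many failed paths traverse it.
    counts = Counter()
    for path in failed_paths:
        for link in set(path):
            if link not in healthy_links:
                counts[link] += 1

    if not counts:
        return []
    best = max(counts.values())
    return sorted(link for link, c in counts.items() if c == best)


# Illustrative topology: two failed probes share the link "B-C".
failed = [["A-B", "B-C", "C-D"], ["E-B", "B-C", "C-F"]]
healthy = [["A-B", "B-G"], ["C-D", "D-H"]]
suspects = localize(failed, healthy)
```

A set-based rule like this assumes a single failure and reliable probe results; a deployed system would need to tolerate measurement noise and multiple simultaneous failures.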
The flexlab approach to realistic evaluation of networked systems
In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI’07), 2007
Cited by 11 (1 self)
Networked systems are often evaluated on overlay testbeds such as PlanetLab and emulation testbeds such as Emulab. Emulation testbeds give users great control over the host and network environments and offer easy reproducibility, but only artificial network conditions. Overlay testbeds provide real network conditions, but are not repeatable environments and provide less control over the experiment. We describe the motivation, design, and implementation of Flexlab, a new testbed with the strengths of both overlay and emulation testbeds. It enhances an emulation testbed by providing the ability to integrate a wide variety of network models, including those obtained from an overlay network. We present three models that demonstrate its usefulness, including “application-centric Internet modeling” that we specifically developed for Flexlab. Its key idea is to run the application within the emulation testbed and use its offered load to measure the overlay network. These measurements are used to shape the emulated network. Results indicate that for evaluation of applications running over Internet paths, Flexlab with this model can yield far more realistic results than either PlanetLab without resource reservations, or Emulab without topological information.
Every Microsecond Counts: Tracking Fine-Grain Latencies with a Lossy Difference Aggregator
Cited by 11 (5 self)
Many network applications have stringent end-to-end latency requirements, including VoIP and interactive video conferencing, automated trading, and high-performance computing, where even microsecond variations may be intolerable. The resulting fine-grain measurement demands cannot be met effectively by existing technologies, such as SNMP, NetFlow, or active probing. We propose instrumenting routers with a hash-based primitive that we call a Lossy Difference Aggregator (LDA) to measure latencies down to tens of microseconds and losses as infrequent as one in a million. Such measurement can be viewed abstractly as what we refer to as a coordinated streaming problem, which is fundamentally harder than standard streaming problems due to the need to coordinate values between nodes. We describe a compact data structure that efficiently computes the average and standard deviation of latency and loss rate in a coordinated streaming environment. Our theoretical results translate to an efficient hardware implementation at 40 Gbps using less than 1% of a typical 65-nm 400-MHz networking ASIC. When compared to Poisson-spaced active probing with similar overheads, our LDA mechanism delivers orders of magnitude smaller relative error; active probing requires 50–60 times as much bandwidth to deliver similar levels of accuracy.
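To make the idea of a hash-based aggregator concrete, here is a heavily simplified Python sketch, not the paper's data structure. It assumes synchronized clocks at both measurement points, a single bank of buckets, and CRC32 as a stand-in hash. The class name `SimpleLDA` and the function `estimate` are hypothetical. Each side accumulates a timestamp sum and a packet count per bucket. A bucket whose counts match on both sides saw no loss, so the difference of its timestamp sums is the total delay of its packets.

```python
import zlib


class SimpleLDA:
    """One bank of (timestamp sum, packet count) buckets; illustrative only."""

    def __init__(self, num_buckets=8):
        self.sums = [0.0] * num_buckets
        self.counts = [0] * num_buckets

    def record(self, packet_id: bytes, timestamp: float) -> None:
        # Both sides must hash a packet to the same bucket.
        bucket = zlib.crc32(packet_id) % len(self.sums)
        self.sums[bucket] += timestamp
        self.counts[bucket] += 1


def estimate(sender: SimpleLDA, receiver: SimpleLDA):
    """Return (average delay over loss-free buckets, number of lost packets)."""
    delay_total = 0.0
    packets = 0
    for s_sum, s_cnt, r_sum, r_cnt in zip(
        sender.sums, sender.counts, receiver.sums, receiver.counts
    ):
        # A count mismatch means this bucket lost a packet; skip it.
        if s_cnt == r_cnt and s_cnt > 0:
            delay_total += r_sum - s_sum
            packets += s_cnt
    lost = sum(sender.counts) - sum(receiver.counts)
    average = delay_total / packets if packets else None
    return average, lost


# Illustrative run: 100 packets, constant 0.5 time-unit delay, one drop.
tx, rx = SimpleLDA(), SimpleLDA()
for i in range(100):
    pid = f"pkt-{i}".encode()
    tx.record(pid, float(i))
    if i != 7:
        rx.record(pid, i + 0.5)
avg_delay, lost_packets = estimate(tx, rx)
```

The sketch shows why a lost packet invalidates only its own bucket rather than the whole aggregate. It does not compute the standard deviation mentioned in the abstract, and it omits whatever further machinery the full design uses.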
A machine learning approach to TCP throughput prediction
In ACM SIGMETRICS, 2007
Cited by 10 (2 self)
TCP throughput prediction is an important capability in wide area overlay and multi-homed networks where multiple paths may exist between data sources and receivers. In this paper we describe a new, lightweight method for TCP throughput prediction that can generate accurate forecasts for a broad range of file sizes and path conditions. Our method is based on Support Vector Regression modeling that uses a combination of prior file transfers and measurements of simple path properties. We calibrate and evaluate the capabilities of our throughput predictor in an extensive set of lab-based experiments where ground truth can be established for path properties using highly accurate passive measurements. We report the performance for our method in the ideal case of using our passive path property measurements over a range of test configurations. Our results show that for bulk transfers in heavy traffic, TCP throughput is predicted within 10% of the actual value 87% of the time, representing nearly a 3-fold improvement in accuracy over prior history-based methods. In the same lab environment, we assess our method using less accurate active probe measurements of path properties, and show that predictions can be made within 10% of the actual value nearly 50% of the time over a range of file sizes and traffic conditions. This result represents approximately a 60% improvement over history-based methods with a much lower impact on end-to-end paths. Finally, we implement our predictor in a tool called PathPerf and test it in experiments conducted on wide area paths. The results demonstrate that PathPerf predicts TCP throughput accurately over a variety of paths.
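The accuracy figures quoted above take the form "within 10% of the actual value X% of the time". One plausible reading of that metric is sketched below in Python. The function name `fraction_within` and the sample numbers are made up for illustration, and the paper may define relative error differently.

```python
def fraction_within(predicted, actual, tolerance=0.10):
    """Fraction of predictions whose relative error is at most `tolerance`.

    Assumes relative error is |predicted - actual| / |actual|.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)


# Made-up throughput values (e.g. in Mb/s); two of four are within 10%.
score = fraction_within([9.5, 12.0, 5.0, 20.0], [10.0, 10.0, 5.0, 25.0])
```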
Network Loss Inference with Second Order Statistics of End-to-End Flows
2007
Cited by 8 (2 self)
We address the problem of calculating link loss rates from end-to-end measurements. Contrary to existing works that use only the average end-to-end loss rates or strict temporal correlations between probes, we exploit second-order moments of end-to-end flows. We first prove that the variances of link loss rates can be uniquely calculated from the covariances of the measured end-to-end loss rates in any realistic topology. After calculating the link variances, we remove the un-congested links with small variances from the first-order moment equations to obtain a full rank linear system of equations, from which we can calculate precisely the loss rates of the remaining congested links. This operation is possible because losses due to congestion occur in bursts and hence the loss rates of congested links have high variances. On the contrary, most links on the Internet are un-congested, and hence the averages and variances of their loss rates are virtually zero. Our proposed solution uses only regular unicast probes and thus is applicable in today’s Internet. It is accurate and scalable, as shown in our simulations and experiments on PlanetLab.
A passive state-machine approach for accurate analysis of TCP out-of-sequence segments
ACM Computer Communication Review, 2006
Cited by 7 (1 self)
In this paper we describe a new tool being made available to the networking research community for passive analysis of TCP segment traces. The purpose of the tool is to provide more complete and accurate classification of out-of-sequence segments than those provided by prior tools. One of the crucial factors that limits the accuracy of prior tools is that these do not incorporate variations across TCP implementations (for different operating systems) that have different parameters (e.g., timer granularity, minimum RTO, duplicate ACK thresholds, etc.) or algorithms that influence what can be inferred about out-of-sequence segments. Our tool explicitly accounts for implementation-specific details in four prominent TCP stacks (Windows, Linux, FreeBSD/Mac OS-X, and Solaris). We validate our tool through several controlled experiments with instances of all four OS-specific implementations used in the analysis. We then run this tool on packet traces of ¡£ ¢ million Internet TCP connections collected from ¡ different locations and present the results. We also include comparisons with results from running selected prior tools on the same traces.

