Measurement Methods for Fast and Accurate Blackhole Identification with Binary Tomography
Cited by 2 (1 self)
Binary tomography, the process of identifying faulty network links through coordinated end-to-end probes, is a promising method for detecting failures that the network does not automatically mask (e.g., network “blackholes”). Because tomography is sensitive to the quality of its input, however, naïve end-to-end measurements can introduce inaccuracies. This paper develops two methods for generating inputs to binary tomography algorithms that improve their inference speed and accuracy. Failure confirmation is a per-path probing technique that distinguishes packet losses caused by congestion from persistent link or node failures. Aggregation strategies combine path measurements from unsynchronized monitors into a set of consistent observations. When used in conjunction with existing binary tomography algorithms, our methods identify all failures that are longer than two measurement cycles, while inducing relatively few false alarms. In two wide-area networks, our techniques decrease the number of alarms by as much as two orders of magnitude. Compared to the state of the art in binary tomography, our techniques increase the identification rate and avoid hundreds of false alarms.
California Fault Lines: Understanding the Causes and Impact of Network Failures
Cited by 2 (0 self)
Of the major factors affecting end-to-end service availability, network component failure is perhaps the least well understood. How often do failures occur, how long do they last, what are their causes, and how do they impact customers? Traditionally, answering questions such as these has required dedicated (and often expensive) instrumentation broadly deployed across a network. We propose an alternative approach: opportunistically mining “low-quality” data sources that are already available in modern network environments. We describe a methodology for recreating a succinct history of failure events in an IP network using a combination of structured data (router configurations and syslogs) and semi-structured data (email logs). Using this technique, we analyze over five years of failure events in a large regional network consisting of over 200 routers; to our knowledge, this is the largest study of its kind.
management
Networks continue to change to support new applications, improve reliability and performance, and reduce operational cost. These changes take the form of upgrades such as software or hardware upgrades, new network or service features, and network configuration changes. It is crucial to monitor the network when upgrades are made because they can have a significant impact on network performance and, if not monitored, may lead to unexpected consequences in operational networks. This can be achieved manually for a small number of devices, but does not scale to large networks with hundreds or thousands of routers and an extremely large number of different upgrades made on a regular basis. In this paper, we design and implement a novel infrastructure, MERCURY, for detecting the impact of network upgrades (or triggers) on performance. MERCURY extracts interesting triggers from a large number of network maintenance activities. It then identifies behavior changes in network performance caused by the triggers. It uses statistical rule mining and network configuration to identify commonality across the behavior changes. We systematically evaluate MERCURY using data collected at a large tier-1 ISP network. By comparing to operational practice, we show that MERCURY is able to capture the interesting triggers and the behavior changes induced by them. In some cases, MERCURY also discovers previously unknown network behaviors, demonstrating its effectiveness in identifying network conditions flying under the radar.
A Cooperative Network Monitoring Overlay
This paper proposes a flexible network monitoring overlay which uses cooperative interaction among measurement points to monitor the quality of network services. The proposed overlay model, which relies on the definition of representative measurement points, the avoidance of measurement redundancy and a simple measurement methodology as main design goals, is able to articulate intra- and inter-area measurements efficiently. The distributed nature of measurement control and data gives the model the autonomy, robustness and adaptiveness required to accommodate network topology evolution, routing changes or node failures. In addition to these characteristics, the avoidance of explicit addressing and routing at the overlay level, and the low overhead associated with the measurement process, constitute a step forward for deploying large-scale monitoring solutions. A Java prototype was also implemented to test the conceptual model design.

