Results 1 - 10 of 297
Data Streams: Algorithms and Applications, 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Abstract
-
Cited by 533 (22 self)
- Add to MetaCart
In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or a few passes over the data, space less than linear in the input size, or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [175].
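The survey covers many specific algorithms; as a minimal, hedged illustration of what working under these constraints looks like, the sketch below implements the standard Misra-Gries frequent-items summary, which makes one pass over the stream and keeps only k - 1 counters. It is a generic example of the genre, not code taken from the article.

```python
# Generic one-pass, small-space streaming sketch (Misra-Gries);
# an assumed example, not code from the survey.

def misra_gries(stream, k):
    """Approximate the items occurring more than len(stream)/k times,
    using at most k - 1 counters regardless of the stream's length."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Candidate heavy hitters among packet sources, found in a single pass.
stream = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
print(misra_gries(stream, k=3))   # {'10.0.0.1': 2}
```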
Analyzing Peer-to-Peer Traffic Across Large Networks
IEEE/ACM Transactions on Networking, 2002
"... Abstract—The use of peer-to-peer (P2P) applications is growing dramatically, particularly for sharing large video/audio files and software. In this paper, we analyze P2P traffic by measuring flowlevel information collected at multiple border routers across a large ISP network, and report our investi ..."
Abstract
-
Cited by 383 (3 self)
- Add to MetaCart
The use of peer-to-peer (P2P) applications is growing dramatically, particularly for sharing large video/audio files and software. In this paper, we analyze P2P traffic by measuring flow-level information collected at multiple border routers across a large ISP network, and report our investigation of three popular P2P systems: FastTrack, Gnutella, and Direct-Connect. We characterize the P2P traffic observed at a single ISP and its impact on the underlying network. We observe a very skewed distribution in the traffic across the network at different levels of spatial aggregation (IP, prefix, AS). All three P2P systems exhibit significant dynamics at short time scales, particularly at the IP address level. Still, the fraction of P2P traffic contributed by each prefix is more stable than the corresponding distribution of either Web traffic or overall traffic. The high volume and good stability properties of P2P traffic suggest that the P2P workload is a good candidate for being managed via application-specific layer-3 traffic engineering in an ISP's network. Index Terms: File sharing, peer-to-peer, P2P, traffic characterization, traffic measurement.
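As a hedged sketch of the spatial-aggregation step described above, the snippet below rolls invented flow records up from individual source IPs to /16 prefixes and reports each prefix's share of the bytes; skew shows up as a few prefixes carrying most of the traffic. The flow records and the prefix length are assumptions for illustration only.

```python
# Aggregate flow bytes from individual IPs up to /16 prefixes and
# compute each prefix's share of the traffic. Flow records are invented.
from collections import defaultdict
import ipaddress

flows = [  # (source IP, bytes) -- hypothetical flow-level records
    ("24.10.3.7", 120_000), ("24.10.9.2", 80_000),
    ("66.31.4.1", 5_000),   ("24.10.3.7", 300_000),
]

def share_by_prefix(flows, prefix_len=16):
    bytes_per_prefix = defaultdict(int)
    for ip, nbytes in flows:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        bytes_per_prefix[str(net)] += nbytes
    total = sum(bytes_per_prefix.values())
    # Sorted traffic shares: a skewed workload shows up as a few
    # prefixes accounting for most of the bytes.
    return sorted(((b / total, p) for p, b in bytes_per_prefix.items()),
                  reverse=True)

print(share_by_prefix(flows))
```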
Diagnosing Network-Wide Traffic Anomalies
In ACM SIGCOMM, 2004
"... Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of ..."
Abstract
-
Cited by 362 (17 self)
- Add to MetaCart
(Show Context)
Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data.
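The abstract does not name a detection technique; as one hedged illustration of extracting anomalous patterns from high-dimensional, multi-link data, the sketch below projects a time-by-links traffic matrix onto its top principal components and flags time bins with a large residual. The synthetic data and the choice of PCA are assumptions for illustration, not a statement of the paper's method.

```python
# Illustrative PCA residual check on a time-by-links traffic matrix.
import numpy as np

def residual_energy(X, k=2):
    """X: (time bins, links) matrix of link byte counts. Return, per time
    bin, the energy left after removing the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    normal = Xc @ Vt[:k].T @ Vt[:k]      # projection onto the "normal" subspace
    return np.linalg.norm(Xc - normal, axis=1)

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
loads = np.outer(np.sin(t), rng.normal(size=30)) \
      + np.outer(np.cos(t), rng.normal(size=30)) \
      + 0.2 * rng.normal(size=(200, 30))  # shared diurnal structure + noise
loads[120, :5] += 3.0                     # inject a spike on five links
print(residual_energy(loads).argmax())    # -> 120, the injected anomaly
```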
Mining Anomalies Using Traffic Feature Distributions
In ACM SIGCOMM, 2005
"... The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue tha ..."
Abstract
-
Cited by 322 (8 self)
- Add to MetaCart
(Show Context)
The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However, the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses and ports) observed in flow traces reveal both the presence and the structure of a wide range of anomalies. Using entropy as a summarization tool, we show that the analysis of feature distributions leads to significant advances on two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. We show that using feature distributions, anomalies naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify anomalies and to uncover new anomaly types. We validate our claims on data from two backbone networks (Abilene and Géant) and conclude that feature distributions show promise as a key element of a fairly general network anomaly diagnosis framework.
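As a hedged sketch of the entropy summarization mentioned above, the snippet below computes the normalized empirical entropy of one feature distribution (destination ports in a time bin): a port scan spreads probes over many ports and drives the value toward 1, while a single dominant service keeps it low. The sample bins are invented.

```python
# Normalized empirical entropy of a packet-feature distribution.
from collections import Counter
from math import log2

def normalized_entropy(values):
    counts = Counter(values)
    total = sum(counts.values())
    h = -sum((c / total) * log2(c / total) for c in counts.values())
    # Normalize by log2(N) so bins with different numbers of distinct
    # values are comparable (0 = all mass on one value, 1 = uniform).
    n = len(counts)
    return h / log2(n) if n > 1 else 0.0

normal_bin = [80] * 900 + [443] * 80 + [22] * 20   # web-dominated bin
scan_bin = list(range(1024, 2024))                 # one probe per port
print(normalized_entropy(normal_bin))              # ~0.34, low
print(normalized_entropy(scan_bin))                # 1.0, dispersed
```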
Trajectory Sampling for Direct Traffic Observation, 2001
"... Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the d ..."
Abstract
-
Cited by 248 (30 self)
- Add to MetaCart
Traffic measurement is a critical component for the control and engineering of communication networks. We argue that traffic measurement should make it possible to obtain the spatial flow of traffic through the domain, i.e., the paths followed by packets between any ingress and egress point of the domain. Most resource allocation and capacity planning tasks can benefit from such information. Also, traffic measurements should be obtained without a routing model and without knowledge of network state. This allows the traffic measurement process to be resilient to network failures and state uncertainty. We propose a method that allows the direct inference of traffic flows through a domain by observing the trajectories of a subset of all packets traversing the network. The key advantages of the method are that (i) it does not rely on routing state, (ii) its implementation cost is small, and (iii) the measurement reporting traffic is modest and can be controlled precisely. The key idea of the method is to sample packets based on a hash function computed over the packet content. Using the same hash function throughout the domain yields the same sample set of packets everywhere and enables us to reconstruct packet trajectories.
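A hedged sketch of the hash-based consistent sampling idea follows: each router hashes the hop-invariant part of a packet and samples it iff the hash falls below a common threshold, so every router selects the same packets and their reports can be stitched into trajectories. The field names, the SHA-1 choice, and the 1% threshold are illustrative assumptions, not details from the paper.

```python
# Consistent hash-based packet sampling: identical decisions domain-wide.
import hashlib

SAMPLING_FRACTION = 0.01   # sample roughly 1% of packets, domain-wide

def invariant_bytes(pkt):
    # Use only fields that do not change hop by hop (TTL and checksum
    # are excluded); here: addresses, ports, IP ID, and a payload prefix.
    return "|".join([pkt["src"], pkt["dst"], str(pkt["sport"]),
                     str(pkt["dport"]), str(pkt["ip_id"]),
                     pkt["payload"][:8]]).encode()

def sampled(pkt):
    digest = hashlib.sha1(invariant_bytes(pkt)).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64   # map hash to [0, 1)
    return value < SAMPLING_FRACTION

pkt = {"src": "10.1.1.1", "dst": "192.0.2.9", "sport": 5001,
       "dport": 80, "ip_id": 4711, "payload": "GET / HTTP/1.1"}
# Every router evaluating sampled(pkt) reaches the same decision,
# which is what makes trajectory reconstruction possible.
print(sampled(pkt))
```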
Optimizing OSPF/IS-IS Weights in a Changing World, 2002
"... A system of techniques is presented for optimizing OSPF/IS-IS weights for intradomain routing in a changing world, the goal being to avoid overloaded links. We address predicted periodic changes in traffic as well as problems arising from link failures and emerging hot-spots.
..."
Abstract
-
Cited by 216 (8 self)
- Add to MetaCart
A system of techniques is presented for optimizing OSPF/IS-IS weights for intradomain routing in a changing world, the goal being to avoid overloaded links. We address predicted periodic changes in traffic as well as problems arising from link failures and emerging hot-spots.
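As a hedged sketch of the evaluation step such techniques rely on, the snippet below routes a demand on the shortest paths induced by a weight setting and scores the setting by the worst link utilization; a search over weights would call this repeatedly. The three-node topology and the demand are invented, and real OSPF/IS-IS would additionally split traffic over equal-cost paths, which this omits.

```python
# Score an OSPF-style weight setting by the worst link utilization.
import heapq

def shortest_path(weights, src, dst):
    """Dijkstra over directed links given as {(u, v): weight}."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist, prev, done = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def max_utilization(weights, capacities, demands):
    """Route every demand on its shortest path and return the worst
    load-to-capacity ratio, the quantity weight tuning tries to keep low."""
    load = {link: 0.0 for link in capacities}
    for (src, dst), volume in demands.items():
        path = shortest_path(weights, src, dst)
        for link in zip(path, path[1:]):
            load[link] += volume
    return max(load[link] / capacities[link] for link in capacities)

# Invented example: the direct A->B link is thin, the detour via C has room.
capacities = {("A", "B"): 5, ("A", "C"): 10, ("C", "B"): 10}
demands = {("A", "B"): 8}
hop_count = {link: 1 for link in capacities}
tuned = {("A", "B"): 5, ("A", "C"): 1, ("C", "B"): 1}
print(max_utilization(hop_count, capacities, demands))  # 1.6, overloaded
print(max_utilization(tuned, capacities, demands))      # 0.8, detour via C
```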
BGP Routing Stability of Popular Destinations, 2002
"... The Border Gateway Protocol (BGP) plays a crucial role in the delivery of traffic in the Internet. Fluctuations in BGP routes cause degradation in user performance, increased processing load on routers, and changes in the distribution of traffic load over the network. Although earlier studies have r ..."
Abstract
-
Cited by 211 (26 self)
- Add to MetaCart
The Border Gateway Protocol (BGP) plays a crucial role in the delivery of traffic in the Internet. Fluctuations in BGP routes cause degradation in user performance, increased processing load on routers, and changes in the distribution of traffic load over the network. Although earlier studies have raised concern that BGP routes change quite often, previous work has not considered whether these routing fluctuations affect a significant portion of the traffic. This paper shows that the small number of popular destinations responsible for the bulk of Internet traffic have remarkably stable BGP routes. The vast majority of BGP instability stems from a small number of unpopular destinations. We draw these conclusions from a joint analysis of BGP update messages and flow-level traffic measurements from AT&T's IP backbone. In addition, we analyze the routing stability of destination prefixes corresponding to the NetRatings list of popular Web sites using the update messages collected by the RouteViews and RIPE-NCC servers. Our results suggest that operators can engineer their networks under the assumption that the BGP advertisements associated with most of the traffic are reasonably stable.
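As a hedged sketch of the kind of join underlying this analysis, the snippet below combines invented per-prefix traffic volumes with per-prefix BGP update counts and reports the fraction of bytes destined to prefixes whose routes rarely changed. The numbers and the update threshold are assumptions for illustration.

```python
# Join per-prefix traffic volumes with BGP update counts to measure how
# much traffic sits behind stable routes. All values are invented.
traffic_bytes = {"203.0.113.0/24": 9_000_000, "198.51.100.0/24": 800_000,
                 "192.0.2.0/24": 50_000}
bgp_updates = {"203.0.113.0/24": 2, "198.51.100.0/24": 1,
               "192.0.2.0/24": 340}

def stable_traffic_fraction(traffic, updates, max_updates=10):
    total = sum(traffic.values())
    stable = sum(v for prefix, v in traffic.items()
                 if updates.get(prefix, 0) <= max_updates)
    return stable / total

# Most bytes go to prefixes whose routes barely changed.
print(stable_traffic_fraction(traffic_bytes, bgp_updates))  # ~0.995
```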
Traffic Matrix Estimation: Existing Techniques and New Directions, 2002
"... Very few techniques have been proposed for estimating traffic matrices in the context of Internet traffic. Our work on POP-to-POP traffic matrices (TM) makes two contributions. The primary contribution is the outcome of a detailed comparative evaluation of the three existing techniques. We evaluate ..."
Abstract
-
Cited by 208 (14 self)
- Add to MetaCart
Very few techniques have been proposed for estimating traffic matrices in the context of Internet traffic. Our work on POP-to-POP traffic matrices (TM) makes two contributions. The primary contribution is the outcome of a detailed comparative evaluation of the three existing techniques. We evaluate these methods with respect to the estimation errors they yield, their sensitivity to the prior information required, and their sensitivity to the statistical assumptions they make. We study the impact of characteristics such as path length and the amount of link sharing on the estimation errors. Using actual data from a Tier-1 backbone, we assess the validity of the typical assumptions needed by the TM estimation techniques. The secondary contribution of our work is the proposal of a new direction for TM estimation based on using choice models to model POP fanouts. These models allow us to overcome some of the problems of existing methods because they can incorporate additional data and information about POPs, and they enable us to make a fundamentally different kind of modeling assumption. We validate this approach by illustrating that our modeling assumption matches actual Internet data well. Using two initial simple models, we provide a proof of concept showing that the incorporation of knowledge of POP features (such as total incoming bytes, number of customers, etc.) can reduce estimation errors. Our proposed approach can be used in conjunction with existing or future methods in that it can be used to generate good priors that serve as inputs to statistical inference techniques.
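As a hedged sketch of the fanout notion the abstract refers to, the snippet below computes each ingress POP's fanout (the fraction of its traffic headed to each egress POP) from one period's traffic matrix and applies it to the next period's ingress totals to form a simple estimate. The toy matrices are invented, and this is not the paper's full choice-model formulation.

```python
# Fanout-based traffic-matrix estimate from invented toy data.
import numpy as np

tm_yesterday = np.array([[0., 40., 60.],    # rows: ingress POP, cols: egress POP
                         [30., 0., 20.],
                         [10., 10., 0.]])
ingress_today = np.array([120., 40., 25.])  # bytes entering each POP today

# Fanout: each ingress POP's traffic split across egress POPs.
fanout = tm_yesterday / tm_yesterday.sum(axis=1, keepdims=True)
# Apply yesterday's fanouts to today's ingress totals.
tm_estimate = ingress_today[:, None] * fanout
print(tm_estimate)
```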
Fast Accurate Computation of Large-Scale IP Traffic Matrices from Link Loads
In ACM SIGMETRICS, 2003
"... A fundamental obstacle to developing sound methods for network and traffic engineering in operational IP networks today is the inability of network operators to measure the traffic matrix. A traffic matrix provides, for every ingress ¢ point into the network and egress £ point ..."
Abstract
-
Cited by 207 (29 self)
- Add to MetaCart
A fundamental obstacle to developing sound methods for network and traffic engineering in operational IP networks today is the inability of network operators to measure the traffic matrix. A traffic matrix provides, for every ingress point into the network and egress point ...
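The abstract is cut off above; as a hedged sketch of the general inference problem it refers to, observed link loads y relate to the unknown origin-destination demands x through the routing matrix A as y = A x, and because A is typically rank-deficient one common approach is to take the solution closest to a prior estimate (for example a gravity-style guess) that still reproduces the measured loads. The toy numbers below are invented, and the prior-plus-correction step is only one standard way to do this, not necessarily the paper's exact method.

```python
# Reconcile a prior traffic-matrix guess with observed link loads.
import numpy as np

# 3 OD demands, 2 measured links: link 1 carries OD0 and OD2,
# link 2 carries OD1 and OD2.
A = np.array([[1., 0., 1.],
              [0., 1., 1.]])
y = np.array([70., 50.])                 # observed link loads
x_prior = np.array([40., 30., 20.])      # prior (e.g. gravity-style) estimate

# Minimum-norm correction to the prior that exactly matches the link loads
# (np.linalg.lstsq returns the minimum-norm solution for this wide system).
correction, *_ = np.linalg.lstsq(A, y - A @ x_prior, rcond=None)
x_hat = x_prior + correction
print(x_hat, A @ x_hat)                  # A @ x_hat reproduces y
```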
Packet-Level Traffic Measurements from the Sprint IP Backbone
IEEE Network, 2003
"... Network traffic measurements provide essential data for networking research and network management. In this paper we describe a passive monitoring system designed to capture GPS synchronized packet level traffic measurements on OC-3, OC-12, and OC-48 links. Our system is deployed in four POPs in the ..."
Abstract
-
Cited by 196 (12 self)
- Add to MetaCart
(Show Context)
Network traffic measurements provide essential data for networking research and network management. In this paper we describe a passive monitoring system designed to capture GPS-synchronized packet-level traffic measurements on OC-3, OC-12, and OC-48 links. Our system is deployed in four POPs in the Sprint IP backbone. Measurement data is stored on a 10-terabyte SAN (Storage Area Network) and analyzed on a computing cluster. We present a set of results to both demonstrate the strength of the system and identify recent changes in Internet traffic characteristics. The results include traffic workload, analyses of TCP flow round-trip times, out-of-sequence packet rates, and packet delay. We also show that on some links web traffic is no longer the dominant component, having given way to file sharing and media streaming. On most links we monitored, TCP flows exhibit low out-of-sequence packet rates, and backbone delays are dominated by the speed of light.
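As a hedged sketch of one of the per-flow analyses mentioned above, the snippet below counts out-of-sequence packets in a one-way TCP trace by tracking the highest sequence number seen so far; the packet tuples are invented, and the sketch does not distinguish reordering from retransmission, which takes more care in practice.

```python
# Count packets that arrive behind data already seen in a one-way TCP trace.
def out_of_sequence_rate(packets):
    """packets: list of (seq, payload_len) in capture order for one flow."""
    highest_expected = None
    out_of_seq = 0
    for seq, length in packets:
        if highest_expected is not None and seq < highest_expected:
            out_of_seq += 1                  # arrived behind data already seen
        if highest_expected is None or seq + length > highest_expected:
            highest_expected = seq + length
    return out_of_seq / len(packets)

trace = [(0, 1460), (1460, 1460), (4380, 1460), (2920, 1460), (5840, 1460)]
print(out_of_sequence_rate(trace))           # 0.2: one packet arrived late
```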