Analyzing Peer-to-Peer Traffic Across Large Networks
- IEEE/ACM Transactions on Networking
, 2002
Cited by 267 (3 self)
Abstract: The use of peer-to-peer (P2P) applications is growing dramatically, particularly for sharing large video/audio files and software. In this paper, we analyze P2P traffic by measuring flow-level information collected at multiple border routers across a large ISP network, and report our investigation of three popular P2P systems: FastTrack, Gnutella, and Direct-Connect. We characterize the P2P traffic observed at a single ISP and its impact on the underlying network. We observe a very skewed distribution in the traffic across the network at different levels of spatial aggregation (IP, prefix, AS). All three P2P systems exhibit significant dynamics at short time scales, particularly at the IP address level. Still, the fraction of P2P traffic contributed by each prefix is more stable than the corresponding distribution of either Web traffic or overall traffic. The high volume and good stability properties of P2P traffic suggest that the P2P workload is a good candidate for being managed via application-specific layer-3 traffic engineering in an ISP’s network. Index Terms: File sharing, peer-to-peer, P2P, traffic characterization, traffic measurement.
New Directions in Traffic Measurement and Accounting
, 2001
Cited by 267 (10 self)
Accurate network traffic measurement is required for accounting, bandwidth provisioning, and detecting DoS attacks. However, keeping a counter to measure the traffic sent by each of a million concurrent flows is too expensive (using SRAM) or slow (using DRAM). The current state-of-the-art methods (e.g., Cisco NetFlow), which count periodically sampled packets, are slow, inaccurate, and memory-intensive. Our paper introduces a paradigm shift by concentrating on the problem of measuring only "heavy" flows, i.e., flows whose traffic is above some threshold such as 1% of the link. After showing that a number of simple solutions based on cached counters and classical sampling do not work, we describe two novel and scalable schemes for this purpose which take a constant number of memory references per packet and use a small amount of memory. Further, unlike NetFlow estimates, we have provable bounds on the accuracy of measured rates and the probability of false negatives. We also propose a new form of accounting called threshold accounting in which only flows above a threshold are charged by usage while the rest are charged a fixed fee. Threshold accounting generalizes the familiar notions of usage-based and duration-based pricing.
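The threshold accounting idea described in this abstract can be illustrated with a minimal sketch. The function name `threshold_bill`, the flat fee, and the per-megabyte price are hypothetical values chosen for illustration; the abstract does not give a pricing formula.

```python
def threshold_bill(flow_bytes, link_capacity_bytes, threshold_fraction=0.01,
                   flat_fee=5.0, price_per_mb=0.02):
    """Charge flows above the threshold by usage and all others a fixed fee.

    flow_bytes maps a flow identifier to the bytes it sent in the billing
    interval. The fee and price are made-up illustration values.
    """
    threshold = threshold_fraction * link_capacity_bytes
    bills = {}
    for flow_id, size in flow_bytes.items():
        if size > threshold:
            # Heavy flow: usage-based charge.
            bills[flow_id] = price_per_mb * size / 1_000_000
        else:
            # Small flow: fixed fee, no per-byte measurement needed.
            bills[flow_id] = flat_fee
    return bills


if __name__ == "__main__":
    usage = {"customer-a": 50_000_000, "customer-b": 1_000}
    print(threshold_bill(usage, link_capacity_bytes=1_000_000_000))
```

Setting `threshold_fraction` to 0 makes every flow with traffic usage-charged, while a value of 1 or more puts every flow on the fixed fee, which is one way to see how a single threshold spans the pricing schemes the abstract mentions.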
BGP Routing Stability of Popular Destinations
, 2002
Cited by 141 (18 self)
The Border Gateway Protocol (BGP) plays a crucial role in the delivery of traffic in the Internet. Fluctuations in BGP routes cause degradation in user performance, increased processing load on routers, and changes in the distribution of traffic load over the network. Although earlier studies have raised concern that BGP routes change quite often, previous work has not considered whether these routing fluctuations affect a significant portion of the traffic. This paper shows that the small number of popular destinations responsible for the bulk of Internet traffic have remarkably stable BGP routes. The vast majority of BGP instability stems from a small number of unpopular destinations. We draw these conclusions from a joint analysis of BGP update messages and flow-level traffic measurements from AT&T's IP backbone. In addition, we analyze the routing stability of destination prefixes corresponding to the NetRatings list of popular Web sites using the update messages collected by the RouteViews and RIPE-NCC servers. Our results suggest that operators can engineer their networks under the assumption that the BGP advertisements associated with most of the traffic are reasonably stable.
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
- ACM Transactions on Computer Systems
, 2003
Cited by 100 (7 self)
Accurate network traffic measurement is required for accounting, bandwidth provisioning and detecting DoS attacks. These applications see the traffic as a collection of flows they need to measure. As link speeds and the number of flows increase, keeping a counter for each flow is too expensive (using SRAM) or slow (using DRAM). The current state-of-the-art methods (Cisco’s sampled NetFlow), which count periodically sampled packets, are slow, inaccurate and resource-intensive. Previous work showed that at different granularities a small number of “heavy hitters” accounts for a large share of traffic. Our paper introduces a paradigm shift by concentrating the measurement process on large flows only, namely those above some threshold such as 0.1% of the link capacity. We propose two novel and scalable algorithms for identifying the large flows: sample and hold and multistage filters, which take a constant number of memory references per packet and use a small amount of memory. If M is the available memory, we show analytically that the errors of our new algorithms are proportional to 1/M; by contrast, the error of an algorithm based on classical sampling is proportional to 1/√M, thus providing much less accuracy for the same amount of memory. We also describe further optimizations such as early removal and conservative update that further improve the accuracy of our algorithms, as measured on real traffic traces, by an order of magnitude. Our schemes allow a new form of accounting called threshold accounting in which only flows above a threshold are charged by usage while the rest are charged a fixed fee. Threshold accounting generalizes usage-based and duration-based pricing.
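A rough sketch of how a multistage filter with conservative update might look is given below. The abstract only names the techniques, so the details here are assumptions: the class name `MultistageFilter`, the stage and bucket counts, the use of BLAKE2 as the per-stage hash, and the choice to start a flow's exact counter at the size of the packet that admitted it.

```python
import hashlib


class MultistageFilter:
    """Illustrative multistage filter; parameters are hypothetical defaults."""

    def __init__(self, stages=4, buckets=1024, threshold=1_000_000):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]
        self.flow_memory = {}  # flows that passed the filter -> bytes counted

    def _bucket(self, stage, flow_id):
        # One hash per stage, made independent by mixing in the stage number.
        data = f"{stage}:{flow_id}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def add_packet(self, flow_id, size):
        # Flows already in flow memory are counted exactly from here on.
        if flow_id in self.flow_memory:
            self.flow_memory[flow_id] += size
            return
        idx = [self._bucket(s, flow_id) for s in range(self.stages)]
        smallest = min(self.counters[s][i] for s, i in enumerate(idx))
        target = smallest + size
        for s, i in enumerate(idx):
            # Conservative update: raise a counter only as far as needed.
            self.counters[s][i] = max(self.counters[s][i], target)
        # After the update every counter is >= target, so the flow passes
        # all stages exactly when target reaches the threshold.
        if target >= self.threshold:
            self.flow_memory[flow_id] = size


if __name__ == "__main__":
    f = MultistageFilter(stages=3, buckets=16, threshold=1000)
    for _ in range(12):
        f.add_packet("elephant", 100)
    f.add_packet("mouse", 100)
    print(f.flow_memory)
```

Each packet touches one counter per stage, so the per-packet cost is constant in the number of flows. A flow that sends at least `threshold` bytes always ends up in `flow_memory`, while small flows enter only if they collide with heavy traffic in every stage.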
Guidelines for interdomain traffic engineering
- SIGCOMM Comput. Commun. Rev
, 2003
Cited by 69 (11 self)
Network operators must have control over the flow of traffic into, out of, and across their networks. However, the Border Gateway Protocol (BGP) does not facilitate common traffic engineering tasks, such as balancing load across multiple links to a neighboring AS or directing traffic to a different neighbor. Solving these problems is difficult because the number of possible changes to routing policies is too large to exhaustively test all possibilities, some changes in routing policy can have an unpredictable effect on the flow of traffic, and the BGP decision process implemented by router vendors limits an operator’s control over path selection. We propose fundamental objectives for interdomain traffic engineering and specific guidelines for achieving these objectives within the context of BGP. Using routing and traffic data from the AT&T backbone we show how certain BGP policy changes can move traffic in a predictable fashion, despite limited knowledge about the routing policies in neighboring AS’s. Then, we show how operators can gain greater flexibility by relaxing some steps in the BGP decision process and ensuring that neighboring AS’s send consistent advertisements at each peering location. Finally, we show that an operator can manipulate traffic efficiently by changing the routes for a small number of prefixes (or groups of related prefixes) that consistently receive a large amount of traffic.
Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution
, 2004
Cited by 56 (5 self)
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.
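The update path of the "lossy data structure" in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names and the BLAKE2 hash are hypothetical choices, each packet is represented just by its flow identifier, and the statistical inference step (Expectation Maximization) that recovers the flow size distribution from the counters is not shown.

```python
import hashlib
from collections import Counter


def build_counter_array(packet_flow_ids, num_counters):
    """Increment exactly one counter per packet, chosen by hashing the flow id."""
    counters = [0] * num_counters
    for flow_id in packet_flow_ids:
        digest = hashlib.blake2b(str(flow_id).encode(), digest_size=8).digest()
        counters[int.from_bytes(digest, "big") % num_counters] += 1
    return counters


def counter_value_histogram(counters):
    """Number of non-empty counters holding each value.

    This histogram is the kind of observation an estimator would start
    from; collisions mean it is not the true flow size distribution.
    """
    return Counter(value for value in counters if value > 0)


if __name__ == "__main__":
    packets = ["flow-a"] * 3 + ["flow-b"] * 2 + ["flow-c"]
    counters = build_counter_array(packets, num_counters=8)
    print(counter_value_histogram(counters))
```

With a single counter every flow collides and the array holds only the total packet count, which shows why the raw counter values cannot be read directly as flow sizes.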
Learn More, Sample Less: Control of Volume and Variance in Network Measurement
- IEEE Transactions on Information Theory
Cited by 43 (8 self)
... objects. We wish to estimate the sums $X(c) = \sum_{i : c_i = c} x_i$ of the sizes $x_i$ of objects of a given color $c$, from a sampled subset of objects. How should the sampling distribution be chosen in order to jointly control both the variance of the estimators $\hat{X}(c)$ and the number of samples taken? This problem is motivated by network measurement, in which the $x_i$ are the byte sizes of traffic flows reported by routers, and the $c_i$ are the common properties of the packets of the flow, e.g., source and destination IP address. In this paper we propose a sampling scheme that optimally controls the volume of the measurements, and the variance of unbiased usage estimates $\hat{X}(c)$, while retaining usage detail down to the finest level of granularity in the colors. We provide algorithms for dynamic control of sample volumes and evaluate them on flow data gathered from a commercial IP network. The algorithms are simple to implement and robust to variation in network conditions. The work reported here has been applied in the measurement infrastructure of the commercial IP network. Not employing sampling would have entailed an order of magnitude greater capital expenditure to accommodate the measurement traffic and its processing.
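One size-dependent rule of the kind this abstract describes is threshold sampling: keep an object of size x with probability min(1, x/z) and weight each kept object by the inverse of that probability. The abstract does not state the rule, so treat this sketch as an assumption; the function name `threshold_sample` and the parameter `z` are hypothetical.

```python
import random
from collections import defaultdict


def threshold_sample(flows, z, rng):
    """Estimate per-color byte totals from a size-dependent sample.

    flows is an iterable of (color, size) pairs with size >= 0, z is the
    sampling threshold, and rng is a random.Random instance.
    """
    estimates = defaultdict(float)
    for color, size in flows:
        p = min(1.0, size / z)  # large flows are always kept
        if rng.random() < p:
            # Inverse-probability weighting keeps the estimate unbiased;
            # for a sampled flow this equals max(size, z) up to rounding.
            estimates[color] += size / p
    return dict(estimates)


if __name__ == "__main__":
    flows = [("web", 10), ("web", 5), ("p2p", 100)]
    print(threshold_sample(flows, z=50, rng=random.Random(0)))
```

Raising `z` reduces the expected number of samples but increases the variance of the estimates for small flows, which is the volume versus variance trade-off the abstract refers to.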
Predicting Resource Usage and Estimation Accuracy in an IP Flow Measurement Collection Infrastructure
, 2003
Cited by 36 (5 self)
This paper describes a measurement infrastructure used to collect detailed IP traffic measurements from an IP backbone. Usage, i.e., bytes transmitted, is determined from raw NetFlow records generated by the backbone routers. The amount of raw data is immense. Two types of data sampling are used in order to manage data volumes: (i) (packet) sampled NetFlow in the routers; (ii) size-dependent sampling of NetFlow records. Furthermore, dropping of NetFlow records in transmission can be regarded as an uncontrolled form of sampling.
Online identification of hierarchical heavy hitters: Algorithms, evaluation, and applications
- In Proceedings of the 4th ACM SIGCOMM Internet Measurement Conference
, 2004
Cited by 33 (5 self)
In traffic monitoring, accounting, and network anomaly detection, it is often important to be able to detect high-volume traffic clusters in near real-time. Such heavy-hitter traffic clusters are often hierarchical (i.e., they may occur at different aggregation levels like ranges of IP addresses) and possibly multidimensional (i.e., they may involve the combination of different IP header fields like IP addresses, port numbers, and protocol). Without prior knowledge about the precise structures of such traffic clusters, a naive approach would require the monitoring system to examine all possible combinations of aggregates in order to detect the heavy hitters, which can be prohibitive in terms of computation resources. In this paper, we focus on online identification of 1-dimensional and 2-dimensional hierarchical heavy hitters (HHHs), arguably the two most important scenarios in traffic analysis. We show that the

