## Frequency estimation of internet packet streams with limited space (2002)

Venue: Proceedings of the 10th Annual European Symposium on Algorithms

Citations: 151 (1 self)

### BibTeX

@INPROCEEDINGS{Demaine02frequencyestimation,

author = {Erik D. Demaine and Alejandro López-Ortiz and J. Ian Munro},

title = {Frequency estimation of internet packet streams with limited space},

booktitle = {Proceedings of the 10th Annual European Symposium on Algorithms},

year = {2002},

pages = {348--360},

publisher = {Springer-Verlag}

}

### Abstract

We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes a second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address. We present an algorithm that deterministically finds (in particular) all categories having a frequency above $1/(m+1)$ using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case that packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly $1/\sqrt{nm}$ with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far off from the ideal of identifying all flows (probability $1/n$), and we prove that it is best possible up to a logarithmic factor. We show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.
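The deterministic guarantee stated in the abstract (every category with frequency above 1/(m + 1) is found using m counters) can be illustrated with a well-known counter-based scheme that generalizes the majority algorithm. The abstract does not spell out the algorithm, so the Python sketch below is an assumption about its general shape rather than the authors' implementation; the function name `frequent_candidates` and the dictionary used as the counter table are illustrative choices.

```python
def frequent_candidates(stream, m):
    """Return at most m candidate categories after one pass over the stream.

    Any category occurring more than n / (m + 1) times in a stream of
    length n is guaranteed to be among the returned keys. The stored
    counts are lower bounds on the true counts, not exact frequencies.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    counters = {}
    for category in stream:
        if category in counters:
            # Category is already monitored: count this packet.
            counters[category] += 1
        elif len(counters) < m:
            # A free counter exists: start monitoring the category.
            counters[category] = 1
        else:
            # All counters are busy: decrement every counter and
            # release those that reach zero. (This loop is O(m); it is
            # a simplification chosen here for clarity.)
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Usage: destination "a" makes up 5 of 10 packets, above 1/(2 + 1).
packets = ["a", "b", "a", "c", "a", "d", "a", "e", "a", "f"]
candidates = frequent_candidates(packets, m=2)
```

The result may also contain categories below the threshold, so a second pass (or a separate exact count) would be needed to confirm which candidates are truly frequent.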

### Citations

1597 | Space/time trade-offs in hash coding with allowable errors
- Bloom
- 1970
Citation Context ... counter value. Each group represents a collection of equal counters, consisting of two parts: (1) a doubly linked list of counters (in no particular order, because they all have the same value), and (2) the difference in value between these counters and the counters in the previous group, or, for the first group, the value itself. Each “counter” no longer needs to store a value, but rather stores it...

714 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context ...plemented in a small constant amount of worst-case time per packet. Related work. Some variants of this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detai...

330 | New directions in traffic measurement and accounting
- Estan, Varghese
- 2001
Citation Context ...ded. . . This feature will substantially decrease the CPU utilization needed to account for NetFlow packets. However, this sampling method is often unsatisfactory given the nature of Internet traffic [9, 23]. Moreover, in many cases, a small percentage of the packet categories account for a large percentage of the traffic. In general, because of the nature and characteristics of Internet traffic and inte...

275 | Finding frequent items in data streams
- Charikar, Chen, et al.
- 2004
Citation Context ...case time per packet. Related work. Some variants of this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detailed analysis of this algorithm. Vitter...

242 | Maintaining Stream Statistics over Sliding Windows
- Datar, Gionis, et al.
- 2002
Citation Context ... work. Some variants of this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detailed analysis of this algorithm. Vitter [26] shows how to sample in a small amo...

221 | Trajectory sampling for direct traffic observation
- Duffield, Grossglauser

217 | The nature of the beast: recent traffic measurements from an internet backbone
- Claffy, Miller, et al.
- 1998

217 | Packet classification on multiple fields
- Gupta, McKeown
- 1999
Citation Context ... work. Some variants of this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detailed analysis of this algorithm. Vitter [26] shows how to sample in a small amo...

213 | New Sampling-Based Summary Statistics for Improving Approximate Query Answers
- Gibbons, Matias
- 1998

197 | An Introduction to Probability Theory and Its Applications, 3rd edn
- Feller
- 1968

140 | Computing iceberg queries efficiently
- Fang, Shivakumar, et al.
- 1998
Citation Context ...plemented in a small constant amount of worst-case time per packet. Related work. Some variants of this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detai...

122 | Sampling-based estimation of the number of distinct values of an attribute
- Haas, Naughton, et al.
- 1995
Citation Context ...plemented in a small constant amount of worst-case time per packet. Related work. Some variants of this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detai...

115 | Random Sampling for Histogram Construction: How much is enough
- Chaudhuri, Motwani, et al.
- 1998

95 | A linear-time probabilistic counting algorithm for database applications
- Whang, Vander-Zanden, et al.
- 1990
Citation Context ... sample in a small amount of space and linear time in a single pass. A related problem is computing the spectra (approximate number of distinct values) of a stream which can be achieved in lg n space [16, 27]. Alon et al. show that the first five moments can be approximated in lg n space while surprisingly all other (higher) moments require linear space [1]. On the particular issue of estimating frequenci...

59 | Counting large numbers of events in small registers
- Morris
- 1978
Citation Context ... this problem have been previously considered in the context of one pass analysis of database streams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detailed analysis of this algorithm. Vitter [26] shows how to sample in a small amount of space ...
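The approximate-counting idea mentioned in this excerpt (counting up to n in about lg lg n bits) can be sketched as follows. This is a minimal illustration of the commonly described base-2 variant, written under the assumption that only an exponent is stored; the class name `MorrisCounter` and its methods are hypothetical and are not taken from the cited paper.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only an exponent.

    The exponent grows roughly like lg n, so storing it takes
    about lg lg n bits.
    """

    def __init__(self, rng=None):
        self.exponent = 0
        # An explicit random source can be supplied for reproducibility.
        self._rng = rng if rng is not None else random.Random()

    def increment(self):
        # Raise the exponent with probability 2^(-exponent).
        if self._rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # 2^exponent - 1 is an unbiased estimate of the number of increments.
        return 2 ** self.exponent - 1


# Usage: count 10,000 events while keeping only a small integer.
counter = MorrisCounter(random.Random(0))
for _ in range(10_000):
    counter.increment()
approximate_total = counter.estimate()
```

A single counter of this kind has high variance; averaging several independent counters is a common way to tighten the estimate.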

54 | An approximate L1-difference algorithm for massive data streams
- Feigenbaum, Kannan, et al.

41 | Approximate counting: a detailed analysis
- Flajolet
- 1985
Citation Context ...reams [1, 10, 20], query streams to a search engine [3], and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detailed analysis of this algorithm. Vitter [26] shows how to sample in a small amount of space and linear time in a single pass. A related problem is computing the spectra (approximate number...

30 | Testing and spot checking of data streams
- Feigenbaum, Kannan, et al.
- 2000

13 | Finding a majority among n votes: Solution to problem 81-5
- Fischer, Salzberg
- 1982
Citation Context ... without Randomization This section develops an algorithm for the most difficult model, the worst-case omniscient adversary. 3.1 Classic Majority Algorithm Our starting point is the elegant algorithm [13] for determining whether a value occurs a majority of the time in a stream, i.e., occurs more than n/2 times in a stream of length n. The basic model under which this algorithm was developed is that w...
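The majority problem described in this excerpt (does some value occur more than n/2 times?) is often solved with a single candidate and a single counter. The sketch below shows that widely taught formulation; it may differ in detail from the cited solution, and the function names are illustrative.

```python
def majority_candidate(stream):
    """Return the only value that could be a majority, or None if empty.

    One pass, one stored value, one counter. The result is merely a
    candidate: if no value exceeds n/2 occurrences, it is arbitrary.
    """
    candidate = None
    count = 0
    for value in stream:
        if count == 0:
            # No current candidate: adopt this value.
            candidate = value
            count = 1
        elif value == candidate:
            count += 1
        else:
            # A differing value cancels one occurrence of the candidate.
            count -= 1
    return candidate


def is_majority(values, candidate):
    """Second pass: confirm the candidate occurs more than n/2 times."""
    occurrences = sum(1 for value in values if value == candidate)
    return occurrences > len(values) / 2


# Usage: 1 occurs 3 times out of 5, so it is a true majority.
votes = [1, 2, 1, 3, 1]
winner = majority_candidate(votes)
confirmed = is_majority(votes, winner)
```

The verification pass matters in a streaming setting: without it, the algorithm cannot distinguish a true majority from a leftover candidate.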

10 | Probabilistic counting algorithms
- Flajolet, Martin
- 1985
Citation Context ... sample in a small amount of space and linear time in a single pass. A related problem is computing the spectra (approximate number of distinct values) of a stream which can be achieved in lg n space [16, 27]. Alon et al. show that the first five moments can be approximated in lg n space while surprisingly all other (higher) moments require linear space [1]. On the particular issue of estimating frequenci...

10 | Nonintrusive and accurate measurement of unidirectional delay and delay variation on the internet
- Graham, Donnelly, et al.
- 1998

10 | Probability and statistical inference
- Kalbfleisch
- 1985

3 | Optimum algorithms for two random sampling problems
- Vitter
- 1983
Citation Context ... and packet data streams [7, 9, 19, 21]. Morris [24] showed that it is possible to approximately count up to n using lg lg n bits, and Flajolet [15] gave a detailed analysis of this algorithm. Vitter [26] shows how to sample in a small amount of space and linear time in a single pass. A related problem is computing the spectra (approximate number of distinct values) of a stream which can be achieved i...
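Single-pass sampling in small space, as mentioned in this excerpt, is commonly illustrated with reservoir sampling. The sketch below is the textbook reservoir technique, offered as an assumption about the kind of method meant; it is not claimed to be the specific algorithm of the cited paper, and `reservoir_sample` is a hypothetical name.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items in one pass.

    Uses O(k) space and touches each item once, so the stream length
    does not need to be known in advance.
    """
    rng = rng if rng is not None else random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1),
            # evicting a uniformly chosen current member.
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir


# Usage: sample 5 packet ids out of 1,000 seen in a single pass.
sample = reservoir_sample(range(1000), 5, random.Random(1))
```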

2 | Controlling High Bandwidth Flows at the Congested Router
- Mahajan, Floyd
- 2001
Citation Context ...ded. . . This feature will substantially decrease the CPU utilization needed to account for NetFlow packets. However, this sampling method is often unsatisfactory given the nature of Internet traffic [9, 23]. Moreover, in many cases, a small percentage of the packet categories account for a large percentage of the traffic. In general, because of the nature and characteristics of Internet traffic and inte...

1 | Sampled NetFlow, http://www.cisco.com/univercd/cc/td/doc/product/software/ios120/120newft/120limit/120s/120s11/12s_sanf.htm
- Systems
- 2002
Citation Context ...le, routers from one of the largest vendors (Cisco) collect perfect statistics on low-bandwidth connections but rely on sampling for higher speeds. The following excerpt from the Cisco NetFlow manual [5] illustrates this: Forwarding rates on a Gigabit Switch Router. . . an order of magnitude greater than traditional platforms that support NetFlow. “Touching” every switched packet for NetFlow accounti...

1 | Randomized Algorithms, Camb
- Motwani, Raghavan
- 1995
Citation Context ...gory. Counters can be associatively indexed based on the monitored category. This indexing structure can be implemented in hardware by associative memory, or in software using dynamic perfect hashing [25]. In the latter case, our worst-case running times turn into with-high-probability running times. We believe that this model of computation captures essentially the entire spectrum of possible algorit...