Network Applications of Bloom Filters: A Survey
Internet Mathematics, 2002
Cited by 341 (12 self)
A Bloom filter is a simple, space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives, but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the goal of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
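As a minimal sketch of the basic structure the survey covers (not code from the paper), a Bloom filter is an m-bit array where each element sets k hashed positions. Deriving the k positions from two halves of one SHA-256 digest (double hashing) is an assumption made for brevity, and `false_positive_rate` evaluates the standard approximation (1 - e^(-kn/m))^k for n inserted elements.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter with k hash positions per element (sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                  # number of bits
        self.k = k                  # number of hash functions
        self.bits = bytearray(m)    # one byte per bit, for readability

    def _positions(self, item: str) -> list[int]:
        # Assumed double hashing: position_i = h1 + i * h2 (mod m).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(item))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^(-kn/m))^k after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

A negative answer is always correct; a positive answer is wrong with probability close to the formula's value, which is the trade-off that lets the filter use only a few bits per element.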
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems, 2003
Cited by 126 (8 self)
Accurate network traffic measurement is required for accounting, bandwidth provisioning and detecting DoS attacks. These applications see the traffic as a collection of flows they need to measure. As link speeds and the number of flows increase, keeping a counter for each flow is too expensive (using SRAM) or too slow (using DRAM). The current state-of-the-art methods (Cisco's sampled NetFlow), which count periodically sampled packets, are slow, inaccurate and resource-intensive. Previous work showed that at different granularities a small number of “heavy hitters” accounts for a large share of traffic. Our paper introduces a paradigm shift by concentrating the measurement process on large flows only: those above some threshold such as 0.1% of the link capacity. We propose two novel and scalable algorithms for identifying the large flows, sample and hold and multistage filters, which take a constant number of memory references per packet and use a small amount of memory. If M is the available memory, we show analytically that the errors of our new algorithms are proportional to 1/M; by contrast, the error of an algorithm based on classical sampling is proportional to 1/√M, thus providing much less accuracy for the same amount of memory. We also describe further optimizations such as early removal and conservative update that improve the accuracy of our algorithms, as measured on real traffic traces, by an order of magnitude. Our schemes allow a new form of accounting called threshold accounting, in which only flows above a threshold are charged by usage while the rest are charged a fixed fee. Threshold accounting generalizes usage-based and duration-based pricing.
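The multistage filter can be sketched as below, as an illustration under stated assumptions rather than the authors' implementation: d stages of counters, each indexed by its own hash of the flow ID (salted SHA-256 here, an assumption), a flow enters flow memory only once all of its counters reach the threshold, and `conservative=True` applies the conservative-update idea by raising each counter only as far as min + size. Class and method names are hypothetical.

```python
import hashlib


class MultistageFilter:
    """Parallel multistage filter with optional conservative update (sketch)."""

    def __init__(self, stages: int, buckets: int, threshold: int,
                 conservative: bool = True) -> None:
        self.counters = [[0] * buckets for _ in range(stages)]
        self.buckets = buckets
        self.threshold = threshold
        self.conservative = conservative
        self.flow_memory: dict[str, int] = {}   # flows identified as large

    def _index(self, stage: int, flow: str) -> int:
        # Independent hash per stage, obtained by salting with the stage number.
        digest = hashlib.sha256(f"{stage}:{flow}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets

    def packet(self, flow: str, size: int) -> None:
        if flow in self.flow_memory:            # already tracked exactly
            self.flow_memory[flow] += size
            return
        idx = [self._index(s, flow) for s in range(len(self.counters))]
        if self.conservative:
            # Raise each counter only as far as the smallest one needs to go.
            target = min(self.counters[s][i] for s, i in enumerate(idx)) + size
            for s, i in enumerate(idx):
                self.counters[s][i] = max(self.counters[s][i], target)
        else:
            for s, i in enumerate(idx):
                self.counters[s][i] += size
        # The flow passes only if every stage's counter reached the threshold.
        if all(self.counters[s][i] >= self.threshold for s, i in enumerate(idx)):
            self.flow_memory[flow] = size
```

Bytes sent before a flow passes the filter are not credited to its flow-memory entry in this sketch, so the reported count underestimates the flow; a small flow can only pass if it collides with large ones in every stage.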
Efficient Top-K Query Calculation in Distributed Networks
In PODC, 2004
Cited by 61 (0 self)
This paper presents a new algorithm to answer top-k queries (e.g., “find the k objects with the highest aggregate values”) in a distributed network. Existing algorithms such as the Threshold Algorithm [FLN01] consume an excessive amount of bandwidth when the number of nodes, m, is high. We propose a new algorithm called “Three-Phase Uniform Threshold” (TPUT). TPUT reduces network bandwidth consumption by pruning away ineligible objects, and terminates in three round trips regardless of data input. The paper presents two sets of results about TPUT. First, trace-driven simulations show that, depending on the size of the network, TPUT reduces network traffic by one to two orders of magnitude compared to existing algorithms. Second, TPUT is proven to be instance-optimal on data series that satisfy a lower bound on the slope of decreases in values. In particular, analysis shows that by using a pruning parameter α < 1, TPUT achieves a qualitative reduction in network traffic, for example, lowering the optimality ratio from O(m²) to O(m√m) for data series following a Zipf distribution.
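The three phases can be sketched as follows, assuming non-negative values, nodes modeled as in-memory dicts, and the plain uniform threshold T = τ1/m (the α pruning refinement mentioned above is not shown). Phase 1 gathers each node's local top-k to get a lower bound τ1; phase 2 collects every local value ≥ T and prunes any object whose upper bound falls below the new k-th partial sum τ2; phase 3 fetches exact totals for the survivors. The function name `tput_top_k` is hypothetical.

```python
import heapq


def tput_top_k(nodes: list[dict[str, float]], k: int) -> list[tuple[str, float]]:
    """Three-phase uniform-threshold top-k over m nodes (sketch, values >= 0)."""
    m = len(nodes)
    known: dict[str, dict[int, float]] = {}   # object -> {node index: value}

    def record(node_id: int, obj: str, value: float) -> None:
        known.setdefault(obj, {})[node_id] = value

    def kth_partial_sum() -> float:
        sums = sorted((sum(v.values()) for v in known.values()), reverse=True)
        return sums[k - 1] if len(sums) >= k else 0.0

    # Phase 1: every node reports its local top-k objects.
    for j, node in enumerate(nodes):
        for obj, value in heapq.nlargest(k, node.items(), key=lambda kv: kv[1]):
            record(j, obj, value)
    tau1 = kth_partial_sum()
    threshold = tau1 / m

    # Phase 2: every node reports all objects whose local value is >= T.
    for j, node in enumerate(nodes):
        for obj, value in node.items():
            if value >= threshold:
                record(j, obj, value)
    tau2 = kth_partial_sum()
    # A value not reported by node j is below T, so this bounds the true total.
    candidates = [
        obj for obj, vals in known.items()
        if sum(vals.values()) + threshold * (m - len(vals)) >= tau2
    ]

    # Phase 3: fetch exact values of the surviving candidates from all nodes.
    totals = {obj: sum(node.get(obj, 0.0) for node in nodes) for obj in candidates}
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
```

The pruning is safe because an object never reported by phase 2 has every local value below T, hence a total below m·T = τ1 ≤ τ2, while each true top-k object totals at least τ2.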
Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement
In Proc. IEEE INFOCOM, 2004
Cited by 56 (2 self)
Per-flow traffic measurement is critical for usage accounting, traffic engineering, and anomaly detection. Previous methodologies are either based on random sampling (e.g., Cisco's NetFlow), which is inaccurate, or only account for the "elephants". We introduce a novel technique for measuring per-flow traffic approximately, for all flows regardless of their sizes, at very high speed (say, OC-768). The core of this technique is a novel data structure called Space Code Bloom Filter (SCBF). An SCBF is an approximate representation of a multiset; each element in this multiset...
Bloom histogram: Path selectivity estimation for XML data with updates
In VLDB, 2004
Cited by 39 (0 self)
Cost-based XML query optimization calls for accurate estimation of the selectivity of path expressions. Some other interactive and Internet applications can also benefit from such estimations. While a number of estimation techniques have been proposed in the literature, almost none of them offers any guarantee on estimation accuracy within a given space limit. In addition, most of them assume that the XML data are more or less static, i.e., with few updates. In this paper, we present a framework for XML path selectivity estimation in a dynamic context. Specifically, we propose a novel data structure, the bloom histogram, to approximate the XML path frequency distribution within a small space budget and to estimate path selectivity accurately. We obtain an upper bound on its estimation error and discuss the trade-offs between accuracy and the space limit. To support efficient updates of bloom histograms when the underlying XML data change, a dynamic summary layer is used to keep exact or more detailed XML path information. We demonstrate through our extensive experiments that the new solution can ...
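A bloom histogram can be pictured with the simplified sketch below, which is not the paper's construction: paths are sorted by frequency and cut into equal-size buckets (equal-size bucketing is an assumption here; the paper's bucketing may differ), each bucket keeps a small Bloom filter of its paths plus their mean frequency, and a lookup returns the mean of the first bucket whose filter matches. The dynamic summary layer for updates is omitted, and the names and parameters (`m`, `k`, `num_buckets`) are illustrative.

```python
import hashlib


def _bits(path: str, m: int, k: int) -> list[int]:
    # Assumed double hashing over one SHA-256 digest.
    d = hashlib.sha256(path.encode()).digest()
    h1, h2 = int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


class BloomHistogram:
    """Buckets of paths with similar frequency; each bucket = (Bloom filter, mean)."""

    def __init__(self, freqs: dict[str, int], num_buckets: int,
                 m: int = 256, k: int = 4) -> None:
        self.m, self.k = m, k
        ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        size = -(-len(ranked) // num_buckets)          # ceiling division
        self.buckets: list[tuple[bytearray, float]] = []
        for start in range(0, len(ranked), size):
            chunk = ranked[start:start + size]
            bits = bytearray(m)
            for path, _ in chunk:
                for p in _bits(path, m, k):
                    bits[p] = 1
            mean = sum(f for _, f in chunk) / len(chunk)
            self.buckets.append((bits, mean))

    def estimate(self, path: str) -> float:
        # Mean frequency of the first bucket whose filter contains the path.
        positions = _bits(path, self.m, self.k)
        for bits, mean in self.buckets:
            if all(bits[p] for p in positions):
                return mean
        return 0.0
```

Grouping paths with similar frequencies keeps each bucket's mean close to its members' true counts, while the per-bucket Bloom filters replace an explicit path-to-bucket index.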
Implementing signatures for transactional memory
40th Intl. Symp. on Microarchitecture, 2007
Cited by 33 (6 self)
Transactional Memory (TM) systems must track the read and write sets (items read and written during a transaction) to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conflicts detected when none exists). This paper examines different organizations to achieve hardware-efficient and accurate TM signatures. First, we find that implementing each signature with a single k-hash-function Bloom filter (True Bloom signature) is inefficient, as it requires multi-ported SRAMs. Instead, we advocate using k single-hash-function Bloom filters in parallel (Parallel Bloom signature), using area-efficient single-ported SRAMs. Our formal analysis shows that both organizations perform equally well in theory, and our simulation-based evaluation shows this to hold approximately in practice. We also show that by choosing high-quality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler’s cuckoo hashing to implement Cuckoo-Bloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets.
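The Parallel Bloom organization can be sketched in software as k independent banks of m/k bits, each written and probed with a single hash function, so that in hardware each bank could be a single-ported SRAM. Python integers stand in for bit vectors and salted SHA-256 stands in for the hardware hash functions; both are assumptions, and the class name is hypothetical. `may_intersect` shows why signatures support conflict checks: if any bank pair shares no set bit, the two address sets are certainly disjoint.

```python
import hashlib


class ParallelBloomSignature:
    """k single-hash-function Bloom filters of m/k bits each (sketch)."""

    def __init__(self, m: int, k: int) -> None:
        assert m % k == 0, "m must split evenly into k banks"
        self.k = k
        self.bank_bits = m // k
        self.banks = [0] * k          # each bank is an int used as a bit vector

    def _index(self, bank: int, addr: int) -> int:
        # One independent hash per bank (salted SHA-256 as a stand-in).
        digest = hashlib.sha256(f"{bank}:{addr}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.bank_bits

    def insert(self, addr: int) -> None:
        for b in range(self.k):
            self.banks[b] |= 1 << self._index(b, addr)

    def member(self, addr: int) -> bool:
        # Each bank needs one lookup, so all k probes can proceed in parallel.
        return all((self.banks[b] >> self._index(b, addr)) & 1 for b in range(self.k))

    def may_intersect(self, other: "ParallelBloomSignature") -> bool:
        # Assumes both signatures share m and k. A shared address sets the same
        # bit in every bank of both, so an empty AND in any bank proves disjointness.
        return all(a & o for a, o in zip(self.banks, other.banks))

    def clear(self) -> None:
        # Signatures are reset when a transaction commits or aborts.
        self.banks = [0] * self.k
```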
An Improved Construction for Counting Bloom Filters
14th Annual European Symposium on Algorithms, LNCS 4168, 2006
Cited by 31 (3 self)
A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can change dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally.
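For reference, the standard counting Bloom filter that the dlCBF improves upon can be sketched as follows. This is the baseline CBF, not the d-left construction from the paper. Each of the m positions holds a small counter instead of a bit; the 4-bit saturating counters (maximum 15) are an assumption, chosen so that a rare overflow leaves the counter stuck rather than letting a later deletion cause a false negative.

```python
import hashlib


class CountingBloomFilter:
    """Standard counting Bloom filter: m small counters instead of m bits."""

    def __init__(self, m: int, k: int, counter_max: int = 15) -> None:
        self.m, self.k = m, k
        self.counter_max = counter_max      # 4-bit counters saturate at 15
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumed double hashing over one SHA-256 digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            if self.counters[p] < self.counter_max:   # saturate, never overflow
                self.counters[p] += 1

    def remove(self, item: str) -> None:
        # Only remove items that were inserted, or false negatives can appear.
        for p in self._positions(item):
            if 0 < self.counters[p] < self.counter_max:   # saturated counters stay
                self.counters[p] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))
```

The counters are what make the CBF several times larger than a plain Bloom filter, which is the overhead the dlCBF targets.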
Theory and network applications of dynamic bloom filters
In Proceedings of the 25th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2006
Cited by 28 (6 self)
A Bloom filter is a simple, space-efficient, randomized data structure for concisely representing a static data set in order to support approximate membership queries. It has great potential for distributed applications where systems need to share information about what resources they have. The space efficiency is achieved at the cost of a small probability of false positives in membership queries. However, for many applications the space savings and short lookup time consistently outweigh this drawback. In this paper, we introduce dynamic Bloom filters (DBF) to support concise representation and approximate membership queries of dynamic sets, and study the false positive probability and union algebra operations. We prove that DBF can keep the false positive probability at a low level by adjusting the number of standard Bloom filters used according to the actual size of the current dynamic set. The space complexity is also acceptable if the actual size of the dynamic set does not deviate too much from the predefined threshold. Furthermore, we present multi-dimension dynamic Bloom filters (MDDBF) to support concise representation and approximate membership queries of dynamic sets in multiple attribute dimensions, and study the false positive probability and union algebra operations through mathematical analysis and experimentation. We also explore the optimization approach and three network applications of Bloom filters, namely Bloom joins, informed search, and global index implementation. Our simulation shows that informed search based on Bloom filters can obtain higher recall and query success rate than the blind search protocol.
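The core DBF idea can be sketched under assumptions as follows: keep a list of standard Bloom filters, each with a fixed capacity, open a new one when the active filter fills up, and answer a query positively if any filter matches. If filter i has false positive rate f_i, the combined rate is 1 - Π(1 - f_i), which `false_positive_rate` evaluates with the usual per-filter approximation. Class and parameter names are hypothetical; MDDBF and the union operations are not shown.

```python
import hashlib
import math


class DynamicBloomFilter:
    """A growing list of fixed-capacity Bloom filters (sketch of the DBF idea)."""

    def __init__(self, m: int, k: int, capacity: int) -> None:
        self.m, self.k, self.capacity = m, k, capacity
        self.filters: list[bytearray] = []
        self.counts: list[int] = []          # items inserted into each filter

    def _positions(self, item: str) -> list[int]:
        # Assumed double hashing over one SHA-256 digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Open a new standard Bloom filter once the active one is full.
        if not self.filters or self.counts[-1] >= self.capacity:
            self.filters.append(bytearray(self.m))
            self.counts.append(0)
        for p in self._positions(item):
            self.filters[-1][p] = 1
        self.counts[-1] += 1

    def __contains__(self, item: str) -> bool:
        pos = self._positions(item)
        return any(all(f[p] for p in pos) for f in self.filters)

    def false_positive_rate(self) -> float:
        # 1 - prod(1 - f_i), with f_i = (1 - e^(-k n_i / m))^k for each filter.
        prob_no_fp = 1.0
        for n in self.counts:
            prob_no_fp *= 1.0 - (1.0 - math.exp(-self.k * n / self.m)) ** self.k
        return 1.0 - prob_no_fp
```

Because a new filter is added only as the set grows, the combined false positive rate tracks the actual set size instead of a size fixed in advance.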
Space-Code Bloom Filter for Efficient Traffic Flow Measurement
In Proceedings of the 2003 ACM SIGCOMM conference on Internet measurement, 2003
Cited by 25 (1 self)
Per-flow traffic measurement is critical for usage accounting, traffic engineering, and anomaly detection. Previous methodologies are either based on random sampling (e.g., Cisco's NetFlow), which is inaccurate, or only account for the "elephants". Our paper introduces a novel technique for measuring per-flow traffic approximately, for all flows regardless of their sizes, at very high speed (say, OC-192+). The core of this technique is a novel data structure called Space Code Bloom Filter (SCBF). An SCBF is an approximate representation of a multiset; each element in this multiset is a traffic flow and its multiplicity is the number of packets in the flow. SCBF employs a Maximum Likelihood Estimation (MLE) method to measure the multiplicity of an element in the multiset. Through parameter tuning, SCBF allows for a graceful trade-off between measurement accuracy and computational and storage complexity. SCBF also contributes to the foundation of data streaming by introducing a new paradigm called blind streaming. We evaluated the performance of SCBF through mathematical analysis and on packet traces gathered from a tier-1 ISP backbone. Our preliminary results demonstrate that SCBF achieves reasonable measurement accuracy with very low storage and computational complexity.
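A heavily simplified, single-resolution sketch of the SCBF idea follows; the paper's multi-resolution design and its MLE are not reproduced. The filter has l groups of k hash functions over one bit array, each packet of a flow sets the bits of one randomly chosen group, and a query counts how many groups θ fully match. Ignoring false positives, after n packets E[θ] = l(1 - (1 - 1/l)^n), and `estimate` inverts that expectation, a moment estimate standing in for the MLE. All names, parameters and the salted SHA-256 hashing are assumptions.

```python
import hashlib
import math
import random


class SpaceCodeBloomFilter:
    """Single-resolution SCBF sketch: l hash groups of k hashes over one bit array."""

    def __init__(self, m: int, groups: int, k: int, seed: int = 0) -> None:
        self.m, self.l, self.k = m, groups, k
        self.bits = bytearray(m)
        self.rng = random.Random(seed)

    def _positions(self, group: int, flow: str) -> list[int]:
        d = hashlib.sha256(f"{group}:{flow}".encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, flow: str) -> None:
        # Each packet sets the k bits of ONE randomly chosen group.
        g = self.rng.randrange(self.l)
        for p in self._positions(g, flow):
            self.bits[p] = 1

    def matched_groups(self, flow: str) -> int:
        return sum(all(self.bits[p] for p in self._positions(g, flow))
                   for g in range(self.l))

    def estimate(self, flow: str) -> float:
        # Invert E[theta] = l * (1 - (1 - 1/l)^n) for n (false positives ignored).
        theta = self.matched_groups(flow)
        if theta >= self.l:
            return math.inf                  # beyond this resolution's range
        return math.log(1 - theta / self.l) / math.log(1 - 1 / self.l)
```

The estimate saturates once most groups are matched, which is why the paper layers several filters at different sampling resolutions to cover large flows.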
Approximately detecting duplicates for streaming data using stable bloom filters
In SIGMOD, 2006
Cited by 20 (2 self)
Traditional duplicate elimination techniques are not applicable to many data stream applications. In general, precisely eliminating duplicates in an unbounded data stream is not feasible in many streaming scenarios. Therefore, we target approximate duplicate elimination in streaming environments given limited space. Based on a well-known bitmap sketch, we introduce a data structure, the Stable Bloom Filter (SBF), and a novel and simple algorithm. The basic idea is as follows: since there is no way to store the whole history of the stream, SBF continuously evicts stale elements so that it has room for more recent ones. After establishing some properties of SBF analytically, we show that a tight upper bound on the false positive rate is guaranteed. In our empirical study, we compare SBF to alternative methods. The results show that our method is superior in terms of both accuracy and time efficiency when a fixed small space and an acceptable false positive rate are given.
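The SBF update loop can be sketched as follows: each cell is a small counter (d bits, so Max = 2^d - 1), and for every arriving element the filter first reports whether all k of its cells are nonzero (a likely duplicate), then decrements P cells to evict stale information, then sets the element's k cells to Max. Decrementing P consecutive cells from a random offset is a simplification assumed here in place of choosing P cells independently; the class name, hashing and parameter values are illustrative.

```python
import hashlib
import random


class StableBloomFilter:
    """Stable Bloom filter: cells decay so old elements are gradually evicted."""

    def __init__(self, m: int, k: int, p: int, max_value: int = 3, seed: int = 0) -> None:
        self.m, self.k, self.p = m, k, p
        self.max_value = max_value           # 2**d - 1 for d-bit cells
        self.cells = [0] * m
        self.rng = random.Random(seed)

    def _positions(self, item: str) -> list[int]:
        # Assumed double hashing over one SHA-256 digest.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def seen(self, item: str) -> bool:
        """Report whether item looks like a duplicate, then record it."""
        pos = self._positions(item)
        duplicate = all(self.cells[i] > 0 for i in pos)
        # Evict stale information: decrement P cells (consecutive, random start).
        start = self.rng.randrange(self.m)
        for j in range(self.p):
            c = (start + j) % self.m
            if self.cells[c] > 0:
                self.cells[c] -= 1
        # Record the current element by setting its k cells to Max.
        for i in pos:
            self.cells[i] = self.max_value
        return duplicate
```

Because every insertion both sets k cells and clears some others, the fraction of nonzero cells settles to a stable level instead of filling up, which is what bounds the false positive rate on an unbounded stream.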