## Building a better Bloom filter (2005)

Venue: | In Proceedings of the 14th Annual European Symposium on Algorithms (ESA |

Citations: | 8 - 4 self |

### BibTeX

@TECHREPORT{Kirsch05buildinga,

author = {Adam Kirsch and Michael Mitzenmacher},

title = {Building a better Bloom filter},

institution = {In Proceedings of the 14th Annual European Symposium on Algorithms (ESA},

year = {2005}

}

### Abstract

A technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i\,h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for randomness in practice.
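The construction in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `DoubleHashBloomFilter`, its helper methods, and the choice to derive the two base hashes from a single SHA-256 digest are assumptions made for illustration. The index range $i = 0, \ldots, k-1$ and the reduction modulo the table size follow the citation context quoted under the Knuth entry below.

```python
import hashlib


class DoubleHashBloomFilter:
    """Hypothetical Bloom filter whose k probes come from two base hashes."""

    def __init__(self, m, k):
        self.m = m                 # number of bit positions in the filter
        self.k = k                 # number of simulated hash functions
        self.bits = bytearray(m)   # one byte per bit, for clarity only

    def _base_hashes(self, item):
        # Assumption: take h1 and h2 from two 64-bit slices of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _positions(self, item):
        h1, h2 = self._base_hashes(item)
        # g_i(x) = h1(x) + i * h2(x) mod m, for i = 0, ..., k - 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True if every probed bit is set (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))


bf = DoubleHashBloomFilter(m=1024, k=5)
bf.add(b"alice")
print(b"alice" in bf)  # True: an inserted item is always reported present
```

Only one digest is computed per operation regardless of `k`, which is the computational saving the abstract refers to. The false positive guarantee in the paper is stated for idealized random hash functions, so this sketch makes no claim about the exact rate obtained with SHA-256 slices.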

### Citations

696 | The art of computer programming, volume 3: Sorting and searching
- Knuth
- 1974
Citation Context ...following: two hash functions $h_1(x)$ and $h_2(x)$ can simulate more than two hash functions of the form $g_i(x) = h_1(x) + i\,h_2(x)$. (See, for example, Knuth’s discussion of open addressing with double hashing =-=[10]-=-.) In our context $i$ will range from 0 up to some number $k - 1$ to give $k$ hash functions, and the hash values are taken modulo the size of the relevant hash table. We demonstrate that this technique can ... |

687 | Summary cache: a scalable wide-area web cache sharing protocol
- Fan, Cao, et al.
Citation Context ...tions map each element in the universe to a random number uniform over the range. While the randomness of the hash functions is clearly an optimistic assumption, it appears to be suitable in practice =-=[8, 13]-=-. For each element $x \in S$, the bits $h_i(x)$ are set to 1 for $1 \le i \le k$. (A location can be set to 1 multiple times.) To check if an item $y$ is in $S$, we check whether all $h_i(y)$ are set to 1. If not, then c... |

536 | A Classical Introduction to Modern Number Theory, Second Edition - Ireland, Rosen - 1990 |

346 | Network applications of bloom filters: a survey
- Broder, Mitzenmacher
Citation Context ...oom filters allow false positives, the space savings often outweigh this drawback. The Bloom filter and its many variations have proven increasingly important for many applications (see, for example, =-=[3]-=-). For those who are not familiar with the data structure, we review it below in Section 2. In this paper, we show that applying a standard technique from the hashing literature can simplify the imple... |

294 | An improved data stream summary: The count-min sketch and its applications
- Cormode, Muthukrishnan
- 2004
Citation Context ... prime. Finally, we demonstrate the utility of this approach beyond the simple Bloom filter by showing how it can be used to reduce the number of hash functions required for the Count-Min sketches of =-=[4]-=-. ∗ Supported in part by an NSF Graduate Research Fellowship and NSF grants CCR-9983832 and CCR-0121154. † Supported in part by NSF grants CCR-9983832 and CCR-0121154. 2 Standard Bloom filters We be... |

281 | Probability and Computing: Randomized Algorithms and Probabilistic Analysis - Mitzenmacher, Upfal - 2005 |

197 | Compressed bloom filters
- Mitzenmacher
- 2002
Citation Context ...two concepts as we have defined them is unimportant in practice, since, as mentioned in Section 2, one can easily show that R is very close to Pr(F) with extremely high probability (see, for example, =-=[11]-=-). It turns out that this result generalizes very naturally to the framework presented in this paper, and so the practical difference between the two concepts is largely unimportant even in our very g... |

84 | Probability and Measure, third edition
- Billingsley
- 1995
Citation Context ...the sum of all $c_t$ for which $t \le T$ and $i_t = j$. We generally drop the $T$ when the meaning is clear. The Count-Min sketch consists of an array Count of width $w \stackrel{\text{def}}{=} \lceil e/\epsilon \rceil$ and depth $d \stackrel{\text{def}}{=} \lceil \ln 1/\delta \rceil$: Count=-=[1, 1]-=-, ..., Count[$d$, $w$]. Every entry of the array is initialized to 0. In addition, the Count-Min sketch uses $d$ hash functions chosen independently from a pairwise independent family $H : \{1, \ldots, n\} \to \{1,$... |

42 | Practical performance of Bloom filters and parallel free-text searching
- Ramakrishna
- 1989
Citation Context ...tions map each element in the universe to a random number uniform over the range. While the randomness of the hash functions is clearly an optimistic assumption, it appears to be suitable in practice =-=[8, 13]-=-. For each element $x \in S$, the bits $h_i(x)$ are set to 1 for $1 \le i \le k$. (A location can be set to 1 multiple times.) To check if an item $y$ is in $S$, we check whether all $h_i(y)$ are set to 1. If not, then c... |

17 | Fast and Accurate Bitstate Verification for SPIN
- Dillinger, Manolios
- 2004
Citation Context ...symptotic false positive probability. This leads to less computation and potentially less need for randomness in practice. This improvement was found empirically in the work of Dillinger and Manolios =-=[5, 6]-=-; here we provide a full theoretical analysis of this technique. After reviewing the Bloom filter data structure, we begin with a specific example, focusing on a useful but somewhat idealized Bloom fi... |

4 | On the False-Positive Rate of Bloom Filters. Submitted Under Review
- Bose, Guo, et al.
- 2004
Citation Context ... Then directly we have $\Pr(F \mid \neg E) = \sum_{a=k}^{n} \binom{n}{a} \left(\frac{k}{p+1}\right)^{a} \left(1 - \frac{k}{p+1}\right)^{n-a} \frac{k!\,S(a,k)}{k^{a}}$. One could proceed by taking the limit of this expression as $n \to \infty$ (see, for example, the discussion of =-=[2]-=-). Alternatively, we may note that for a standard Bloom filter, we have a similar problem. Assuming each of the $k$ hash values for an element $z \notin S$ are distinct (which occurs with high probability), i... |

3 | Balls and Bins: A Case Study in Negative Dependence. Random Structures and Algorithms
- Dubhashi, Ranjan
- 1998
Citation Context ... (since if $g_j(z) = g_j(i)$ for two values of $j$, we must have $h_1(z) = h_1(i)$ and $h_2(z) = h_2(i)$, and in this case $z$’s count is not included in any $B_{j,i}$). In this sense, the $B_{j,i}$’s are negatively dependent =-=[7]-=-. It follows that for any value $v$, $\Pr\left(\min_{j=0}^{d-1} B_{j,i} \ge v\right) \le \prod_{j=0}^{d-1} \Pr(B_{j,i} \ge v)$. In particular, we have that $\Pr\left(\min_{j=0}^{d-1} B_{j,i} \ge \epsilon\|\vec{a}\|/2\right) \le (2/\epsilon w)^{d}$, so $\Pr(\hat{a}_i \ge a_i + \epsilon\|\vec{a}\|) \le \Pr(A_i$... |