## Simple Summaries for Hashing with Multiple Choices

### BibTeX

@MISC{Kirsch_simplesummaries,

author = {Adam Kirsch and Michael Mitzenmacher},

title = {Simple Summaries for Hashing with Multiple Choices},

year = {}

}

### Abstract

In a multiple-choice hashing scheme, each item is stored in one of d ≥ 2 possible hash table buckets. The availability of these multiple choices allows for a substantial reduction in the maximum load of the buckets. However, a lookup may now require examining each of the d locations. For applications where this cost is undesirable, Song et al. propose keeping a summary that allows one to determine which of the d locations is appropriate for each item, where the summary may allow false positives for items not in the hash table. We propose alternative, simple constructions of such summaries that use less space for both the summary and the underlying hash table. Moreover, our constructions are easily analyzable and tunable.
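To make the lookup cost described above concrete, here is a minimal Python sketch of a d-choice hash table. It is an illustration only, not the construction from the paper: the class name `MultiChoiceTable`, the use of salted BLAKE2b to derive the d candidate buckets, and the least-loaded placement rule with ties going to the earliest choice are all assumptions made for this example. No summary is kept, so `lookup` reports how many of the d candidate buckets it had to probe.

```python
import hashlib


class MultiChoiceTable:
    """Toy d-choice hash table (hypothetical helper, not the paper's scheme)."""

    def __init__(self, num_buckets, d=2):
        self.num_buckets = num_buckets
        self.d = d
        self.buckets = [[] for _ in range(num_buckets)]

    def _candidates(self, item):
        # Assumption: derive d bucket indices by salting one hash function
        # with the choice index. Two candidates may coincide.
        out = []
        for i in range(self.d):
            digest = hashlib.blake2b(repr(item).encode(), salt=bytes([i])).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.num_buckets)
        return out

    def insert(self, item):
        cands = self._candidates(item)
        # Place the item in the least-loaded candidate bucket;
        # ties go to the earliest choice.
        choice = min(range(self.d), key=lambda i: len(self.buckets[cands[i]]))
        self.buckets[cands[choice]].append(item)
        return choice

    def lookup(self, item):
        # Without a summary, a lookup may have to probe all d candidates.
        probes = 0
        for b in self._candidates(item):
            probes += 1
            if item in self.buckets[b]:
                return True, probes
        return False, probes

    def max_load(self):
        return max(len(b) for b in self.buckets)


if __name__ == "__main__":
    n = 1000
    for d in (1, 2, 4):
        table = MultiChoiceTable(num_buckets=n, d=d)
        for k in range(n):
            table.insert(f"item-{k}")
        total_probes = sum(table.lookup(f"item-{k}")[1] for k in range(n))
        print(f"d={d}: max load {table.max_load()}, "
              f"average probes per lookup {total_probes / n:.2f}")
```

Running the demo shows the trade-off the abstract refers to: a larger d tends to lower the maximum load while raising the number of probes per lookup. A summary that records (possibly with false positives) which choice each item used is what would let a lookup go straight to one bucket.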

### Citations

1597 | Space/time trade-offs in hash coding with allowable errors
- Bloom
- 1970
Citation Context ... tradeoffs that can allow significant performance improvements for the corresponding summary. 2 Related Work There is a great deal of work on multiple-choice hashing schemes [17] and on Bloom filters [2, 5]. See those references for more background. As mentioned in the introduction, our starting point in this paper is the work of Song et al. [21], which introduces an approach for summarizing the locatio...

762 | Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
- Fan, Cao, et al.
- 1999
Citation Context ...tually in the hash table may yield false positives; otherwise, the summary could be no more efficient than a hash table. The small summary used by Song et al. [21] consists of a counting Bloom filter [11, 16]. We review this construction in detail in Section 3. In this paper, we suggest three alternative approaches for maintaining summaries for multiple-choice hashing schemes. The first is based on interp...

390 | Network applications of Bloom filters: A survey
- Broder, Mitzenmacher
- 2005
Citation Context ... tradeoffs that can allow significant performance improvements for the corresponding summary. 2 Related Work There is a great deal of work on multiple-choice hashing schemes [17] and on Bloom filters [2, 5]. See those references for more background. As mentioned in the introduction, our starting point in this paper is the work of Song et al. [21], which introduces an approach for summarizing the locatio...

306 | Probability and computing: Randomized algorithms and probabilistic analysis
- Mitzenmacher, Upfal
- 2005
Citation Context ...lematic.) These probabilities can be computed very easily. The probability of a failure can be calculated using standard probabilistic techniques, as it is just a special case of the birthday paradox [18]. The probability of a false positive, conditioned on no failure occurring (so all n items have distinct b-bit strings), is n/2^b. For concreteness, we describe two specific instances of this scheme....

262 | Balanced allocations
- Azar, Broder, et al.
Citation Context ...mum load (that is, the number of items in a bucket), as giving each item the choice between more than one bucket in the hash table often leads to a significant improvement in the balance of the items [1, 4, 14, 17]. These schemes can also be used to ensure that each bucket contains at most one item with high probability [3]. For these reasons, multiple-choice hashing schemes have been proposed for many applicat...

208 | The power of two choices in randomized load balancing
- Mitzenmacher
Citation Context ...mum load (that is, the number of items in a bucket), as giving each item the choice between more than one bucket in the hash table often leads to a significant improvement in the balance of the items [1, 4, 14, 17]. These schemes can also be used to ensure that each bucket contains at most one item with high probability [3]. For these reasons, multiple-choice hashing schemes have been proposed for many applicat...

208 | Compressed Bloom filters
- Mitzenmacher
- 2001
Citation Context ...tually in the hash table may yield false positives; otherwise, the summary could be no more efficient than a hash table. The small summary used by Song et al. [21] consists of a counting Bloom filter [11, 16]. We review this construction in detail in Section 3. In this paper, we suggest three alternative approaches for maintaining summaries for multiple-choice hashing schemes. The first is based on interp...

166 | Handbook of Algorithms and Data Structures
- Gonnet, Baeza-Yates
- 1991
Citation Context ...value (requiring log d = log log log n + O(1) bits) that indicates what hash function was used for the corresponding item. Searching for an item in the summary can now be done using interpolation search [12], which requires only O(log log n) operations on average. Insertions and deletions are trivial; simply add or remove the appropriate string. In this summary construction, a failure occurs if two items y...

90 | How useful is old information
- Mitzenmacher
Citation Context ...se reasons, multiple-choice hashing schemes have been proposed for many applications, including network routers [4], peer-to-peer applications [6], and standard load balancing of jobs across machines [9, 15]. Recently, in the context of routers, Song et al. [21] suggested that a drawback of multiple-choice schemes is that at the time of a lookup, one cannot know which of the d possible locations to check...

72 | The Bloomier filter: an efficient data structure for static support lookup tables
- Chazelle, Kilian, et al.
- 2004
Citation Context ... tables in Section 5. Finally, the problem of constructing summaries for multiple-choice hash tables seems closely connected with the work on a generalization of Bloom filters called Bloomier filters [8], which are designed to represent functions on a set. In the Bloomier filter problem setting, each item in a set has an associated value; items not in the set have a null value. The goal is then to de...

70 | Using multiple hash functions to improve IP lookups
- Broder, Mitzenmacher

65 | Fast hash table lookup using extended Bloom filter: an aid to network processing
- Song, Dharmapurikar, et al.
Citation Context ...osed for many applications, including network routers [4], peer-to-peer applications [6], and standard load balancing of jobs across machines [9, 15]. Recently, in the context of routers, Song et al. [21] suggested that a drawback of multiple-choice schemes is that at the time of a lookup, one cannot know which of the d possible locations to check for the item. The natural solution to this problem is ...

49 | Interpreting stale load information
- Dahlin
- 2000
Citation Context ...se reasons, multiple-choice hashing schemes have been proposed for many applications, including network routers [4], peer-to-peer applications [6], and standard load balancing of jobs across machines [9, 15]. Recently, in the context of routers, Song et al. [21] suggested that a drawback of multiple-choice schemes is that at the time of a lookup, one cannot know which of the d possible locations to check...

47 | Multilevel adaptive hashing
- Broder, Karlin
- 1990
Citation Context ...sh table often leads to a significant improvement in the balance of the items [1, 4, 14, 17]. These schemes can also be used to ensure that each bucket contains at most one item with high probability [3]. For these reasons, multiple-choice hashing schemes have been proposed for many applications, including network routers [4], peer-to-peer applications [6], and standard load balancing of jobs across ...

45 | Practical performance of Bloom filters and parallel free-text searching
- Ramakrishna
- 1989

40 | An Optimal Bloom Filter Replacement
- Pagh, Pagh, et al.
- 2005
Citation Context ... could be directly applied to give a (perhaps inefficient) solution. Limited results exist for Bloomier filters that have to cope with changing function values. However, lower bounds for such filters [8, 19] suggest that we must take advantage of the characteristics of our specific problem setting (for example, the skew of the distribution of the values) in order to guarantee good performance. 3 The Sche...

34 | Geometric generalizations of the power of two choices
- Byers, Considine, et al.
- 2004

17 | Fast and Accurate Bitstate Verification for SPIN
- Dillinger, Manolios
- 2004
Citation Context ..., the number of hashes required for a Bloom filter can be dramatically reduced using techniques related to double hashing, so that only two hash functions are required per (sufficiently large) filter [10, 13]. Because we are aiming for very small false positive probabilities, it is not immediately clear whether these techniques can be applied in this context; this remains an area for future work. 7.2 Dele...

16 | On the false-positive rate of Bloom filters
- Bose, Guo, et al.
- 2007
Citation Context ...n from the previous sub-table. This requires the combinatorial fact that when r items are placed randomly into s buckets, the distribution of the number of buckets that remain empty can be calculated [7]. Some care is required to ensure that the computation is efficient and that the memory requirement is reasonable, but the procedure is not difficult to implement. 5.2 Skew in Multilevel Hash Tables W...

12 | The Power of Two Choices: A Survey of Techniques and Results
- Mitzenmacher, Richa, et al.
- 2001
Citation Context ...mum load (that is, the number of items in a bucket), as giving each item the choice between more than one bucket in the hash table often leads to a significant improvement in the balance of the items [1, 4, 14, 17]. These schemes can also be used to ensure that each bucket contains at most one item with high probability [3]. For these reasons, multiple-choice hashing schemes have been proposed for many applicat...

8 | Building a better Bloom filter
- Kirsch, Mitzenmacher
- 2005
Citation Context ...han 1, and correspondingly there is very little probability for a failure for these items. [Footnote 1: Technically, we may wish to differentiate between the false positive probability and the false positive rate, as defined in [13], but the distinction is unimportant in practice. See [13] for an explanation.] A natural way to reduce the probability of a type 1 failure is to introduce more skew, specifically by m...

1 | How useful is old information? IEEE Transactions on Parallel and Distributed Systems, 11(1
- Mitzenmacher
- 2000