Results 1 
9 of
9
Backyard Cuckoo Hashing: Constant WorstCase Operations with a Succinct Representation
, 2010
"... The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constanttime operations in the worst case with high probability, and in terms of space consumption ..."
Abstract

Cited by 13 (5 self)
 Add to MetaCart
(Show Context)
The performance of a dynamic dictionary is measured mainly by its update time, lookup time, and space consumption. In terms of update time and lookup time there are known constructions that guarantee constanttime operations in the worst case with high probability, and in terms of space consumption there are known constructions that use essentially optimal space. In this paper we settle two fundamental open problems: • We construct the first dynamic dictionary that enjoys the best of both worlds: we present a twolevel variant of cuckoo hashing that stores n elements using (1+ϵ)n memory words, and guarantees constanttime operations in the worst case with high probability. Specifically, for any ϵ = Ω((log log n / log n) 1/2) and for any sequence of polynomially many operations, with high probability over the randomness of the initialization phase, all operations are performed in constant time which is independent of ϵ. The construction is based on augmenting cuckoo hashing with a “backyard ” that handles a large fraction of the elements, together with a deamortized perfect hashing scheme for eliminating the dependency on ϵ.
How to Approximate A Set Without Knowing Its Size In Advance
"... The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate at most ϵ. That is, the membership algorithm must be correct on each x ∈ S, and may ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate at most ϵ. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ϵ on each x / ∈ S. We study a wellmotivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results: • We show a superlinear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is wellknown that Θ(n log(1/ϵ)) bits of space are necessary and sufficient (Bloom ’70, Carter et al. ’78). However, when the size is not known in advance, we prove
Practical perfect hashing in nearly optimal space
 Information Systems
"... A hash function is a mapping from a key universe U to a range of integers, i.e., h: U↦→{0, 1,...,m−1}, where m is the range’s size. A perfect hash function for some set S ⊆ U is a hash function that is onetoone on S, where m≥S. A minimal perfect hash function for some set S ⊆ U is a perfect hash ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
A hash function is a mapping from a key universe U to a range of integers, i.e., h: U↦→{0, 1,...,m−1}, where m is the range’s size. A perfect hash function for some set S ⊆ U is a hash function that is onetoone on S, where m≥S. A minimal perfect hash function for some set S ⊆ U is a perfect hash function with a range of minimum size, i.e., m=S. This paper presents a construction for (minimal) perfect hash functions that combines theoretical analysis, practical performance, expected linear construction time and nearly optimal space consumption for the data structure. For n keys and m=n the space consumption ranges from 2.62n to 3.3n bits, and for m=1.23n it ranges from 1.95n to 2.7n bits. This is within a small constant factor from the theoretical lower bounds of 1.44n bits for m=n and 0.89n bits for m=1.23n. We combine several theoretical results into a practical solution that has turned perfect hashing into a very compact data structure to solve the membership problem when the key set S is static and known in advance. By taking into account the memory hierarchy we can construct (minimal) perfect hash functions for over a billion keys in 46 minutes using a commodity PC. An open source implementation of the algorithms is available
Balls and Bins: Smaller Hash Families and Faster Evaluation
, 2012
"... A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n / log log n) balls. In various applications, however, the assumption that a truly random hash functio ..."
Abstract
 Add to MetaCart
(Show Context)
A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n / log log n) balls. In various applications, however, the assumption that a truly random hash function is available is not always valid, and explicit functions are required. In this paper we study the size of families (or, equivalently, the description length of their functions) that guarantee a maximal load of O(log n / log log n) with high probability, as well as the evaluation time of their functions. Whereas such functions must be described using Ω(log n) bits, the best upper bound was formerly O(log 2 n / log log n) bits, which is attained by O(log n / log log n)wise independent functions. Traditional constructions of the latter offer an evaluation time of O(log n / log log n), which according to Siegel’s lower bound [FOCS ’89] can be reduced only at the cost of significantly increasing the description length. We construct two families that guarantee a maximal load of O(log n / log log n) with high probability. Our constructions are based on two different approaches, and exhibit different tradeoffs between the description length and the evaluation time. The first construction shows
Hebrew University
"... Abstract — The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate at most ɛ. That is, the membership algorithm must be correct on each x ∈ ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract — The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate at most ɛ. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ɛ on each x / ∈ S. We study a wellmotivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results: • We show a superlinear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is wellknown that Θ(n log(1/ɛ)) bits of space are necessary and sufficient (Bloom ’70, Carter et al. ’78). However, when the size is not known in advance, we prove that at least (1 − o(1))n log(1/ɛ) + Ω(n log log n) bits of space must be used. In particular, the average number of bits per element must depend on the size of the set. • We show that our space lower bound is tight, and can even be matched by a highly efficient data structure. We present a data structure that uses (1+o(1))n log(1/ɛ)+O(n log log n) bits of space for approximating any set of any size n, without having to know n in advance. Our data structure supports membership queries in constant time in the worst case with high probability, and supports insertions in expected amortized constant time. Moreover, it can be “deamortized” to support also insertions in constant time in the worst case with high probability by only increasing its space usage to O(n log(1/ɛ) + n log log n) bits. 1.
michaelm(at)eecs.harvard.edu
"... jthaler(at)fas.harvard.edu A dictionary (or map) is a keyvalue store that requires all keys be unique, and a multimap is a keyvalue store that allows for multiple values to be associated with the same key. We design hashingbased indexing schemes for dictionaries and multimaps that achieve worstc ..."
Abstract
 Add to MetaCart
(Show Context)
jthaler(at)fas.harvard.edu A dictionary (or map) is a keyvalue store that requires all keys be unique, and a multimap is a keyvalue store that allows for multiple values to be associated with the same key. We design hashingbased indexing schemes for dictionaries and multimaps that achieve worstcase optimal performance for lookups and updates, with a small or negligible probability the data structure will require a rehash operation, depending on whether we are working in the the externalmemory (I/O) model or one of the wellknown versions of the Random Access Machine (RAM) model. One of the main features of our constructions is that they are fully deamortized, meaning that their performance bounds hold without one having to tune their constructions with certain performance parameters, such as the constant factors in the exponents of failure probabilities or, in the case of the externalmemory model, the size of blocks or cache lines and the size of internal memory (i.e., our externalmemory algorithms are cache oblivious). Our solutions are based on a fully deamortized implementation of cuckoo hashing, which may be of independent interest. This hashing scheme uses two cuckoo hash tables, one “nested” inside the other, with one serving as a primary structure and the other serving as an auxiliary supporting queue/stash structure that is supersized with respect to traditional auxiliary structures but nevertheless adds negligible storage to our scheme. This auxiliary structure allows the success probability for cuckoo hashing to be very high, which is useful in cryptographic or dataintensive applications. ar X iv