Results 1 
9 of
9
Space Efficient Hash Tables With Worst Case Constant Access Time
 In STACS
, 2003
"... We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ffl) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ffl 0:03. We also describe variants of the data structure that allow the use of hash functions that can be evaluted in constant time.
Why simple hash functions work: Exploiting the entropy in a data stream
 In Proceedings of the 19th Annual ACMSIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract

Cited by 33 (6 self)
 Add to MetaCart
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2universal or O(1)wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
Fast moment estimation in data streams in optimal space
 In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC
, 2011
"... We give a spaceoptimal algorithm with update time O(log 2 (1/ε) log log(1/ε)) for (1 ± ε)approximating the pth frequency moment, 0 < p < 2, of a lengthn vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous spaceoptimal alg ..."
Abstract

Cited by 20 (6 self)
 Add to MetaCart
We give a spaceoptimal algorithm with update time O(log 2 (1/ε) log log(1/ε)) for (1 ± ε)approximating the pth frequency moment, 0 < p < 2, of a lengthn vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous spaceoptimal algorithm of [KaneNelsonWoodruff, SODA 2010], which had update time Ω(1/ε 2). 1
Efficient hashing with lookups in two memory accesses, in: 16th
 SODA, ACMSIAM
"... The study of hashing is closely related to the analysis of balls and bins. Azar et. al. [1] showed that instead of using a single hash function if we randomly hash a ball into two bins and place it in the smaller of the two, then this dramatically lowers the maximum load on bins. This leads to the c ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
The study of hashing is closely related to the analysis of balls and bins. Azar et. al. [1] showed that instead of using a single hash function if we randomly hash a ball into two bins and place it in the smaller of the two, then this dramatically lowers the maximum load on bins. This leads to the concept of twoway hashing where the largest bucket contains O(log log n) balls with high probability. The hash look up will now search in both the buckets an item hashes to. Since an item may be placed in one of two buckets, we could potentially move an item after it has been initially placed to reduce maximum load. Using this fact, we present a simple, practical hashing scheme that maintains a maximum load of 2, with high probability, while achieving high memory utilization. In fact, with n buckets, even if the space for two items are preallocated per bucket, as may be desirable in hardware implementations, more than n items can be stored giving a high memory utilization. Assuming truly random hash functions, we prove the following properties for our hashing scheme. • Each lookup takes two random memory accesses, and reads at most two items per access. • Each insert takes O(log n) time and up to log log n+ O(1) moves, with high probability, and constant time in expectation. • Maintains 83.75 % memory utilization, without requiring dynamic allocation during inserts. We also analyze the tradeoff between the number of moves performed during inserts and the maximum load on a bucket. By performing at most h moves, we can maintain a maximum load of O(hlogl((~og~og:n/h)). So, even by performing one move, we achieve a better bound than by performing no moves at all. 1
Strongly historyindependent hashing with applications
 In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science
, 2007
"... We present a strongly history independent (SHI) hash table that supports search in O(1) worstcase time, and insert and delete in O(1) expected time using O(n) data space. This matches the bounds for dynamic perfect hashing, and improves on the best previous results by Naor and Teague on history ind ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
We present a strongly history independent (SHI) hash table that supports search in O(1) worstcase time, and insert and delete in O(1) expected time using O(n) data space. This matches the bounds for dynamic perfect hashing, and improves on the best previous results by Naor and Teague on history independent hashing, which were either weakly history independent, or only supported insertion and search (no delete) each in O(1) expected time. The results can be used to construct many other SHI data structures. We show straightforward constructions for SHI ordered dictionaries: for n keys from {1,..., n k} searches take O(log log n) worstcase time and updates (insertions and deletions) O(log log n) expected time, and for keys in the comparison model searches take O(log n) worstcase time and updates O(log n) expected time. We also describe a SHI data structure for the ordermaintenance problem. It supports comparisons in O(1) worstcase time, and updates in O(1) expected time. All structures use O(n) data space. 1
HistoryIndependent Cuckoo Hashing
"... Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst case constant lookup time, and good memory utilization. Various experiments demonstrated that cuckoo hashing is highly suitable for modern computer architectures and distribute ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst case constant lookup time, and good memory utilization. Various experiments demonstrated that cuckoo hashing is highly suitable for modern computer architectures and distributed settings, and offers significant improvements compared to other schemes. In this work we construct a practical historyindependent dynamic dictionary based on cuckoo hashing. In a historyindependent data structure, the memory representation at any point in time yields no information on the specific sequence of insertions and deletions that led to its current content, other than the content itself. Such a property is significant when preventing unintended leakage of information, and was also found useful in several algorithmic settings. Our construction enjoys most of the attractive properties of cuckoo hashing. In particular, no dynamic memory allocation is required, updates are performed in expected amortized constant time, and membership queries are performed in worst case constant time. Moreover, with high probability, the lookup procedure queries only two memory entries which are independent and can be queried in parallel. The approach underlying our construction is to enforce a canonical memory representation on cuckoo hashing. That is, up to the initial randomness, each set of elements has a unique memory representation.
Fast Manhattan sketches in data streams
 In Proceedings of the 29th ACM SIGACTSIGMODSIGART Symposium on Principles of Database Systems (PODS
, 2010
"... The ℓ1distance, also known as the Manhattan or taxicab distance, between two vectors x, y in R n is Pn xi − yi. i=1 Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
The ℓ1distance, also known as the Manhattan or taxicab distance, between two vectors x, y in R n is Pn xi − yi. i=1 Approximating this distance is a fundamental primitive on massive databases, with applications to clustering, nearest neighbor search, network monitoring, regression, sampling, and support vector machines. We give the first 1pass streaming algorithm for this problem in the turnstile model with O ∗ (ε −2) space and O ∗ (1) update time. The O ∗ notation hides polylogarithmic factors in ε, n, and the precision required to store vector entries. All previous algorithms either required Ω(ε −3) space or Ω(ε −2) update time and/or could not work in the turnstile model (i.e., support an arbitrary number of updates to each coordinate). Our bounds are optimal up to O ∗ (1) factors.
Incremental hashing in state space search
 In Workshop ”New Results in Planning, Scheduling and Design
, 2004
"... Abstract. State memorization is essential for statespace search to avoid redundant expansions and hashing serves as a method to, address store and retrieve states efficiently. In this paper we introduce incremental state hashing to compute hash values in constant time. The method will be most effec ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Abstract. State memorization is essential for statespace search to avoid redundant expansions and hashing serves as a method to, address store and retrieve states efficiently. In this paper we introduce incremental state hashing to compute hash values in constant time. The method will be most effective in guided depthfirst search traversals of state space graphs, like in IDA*, where the computation of the set of successors and their heuristic estimates is extremely fast: heuristic values are often computed incrementally or retrieved from precomputed pattern database tables, and backtracking keeps the changes in the state representation vector during the exploration small. The approach quickly decides if a given state is not present in a hash table, and accelerates successful search. It can further accelerate perfect hashing for pattern storage and lookup. If, for a better coverage of the state space, partial search methods without collision resolving is used, we establish another benefit for incremental state hashing. We exemplify our considerations in the (n 2 − 1)Puzzle, in action planning, and conduct experiments in Atomix. 1