Results 1 - 10
of
11
Why simple hash functions work: Exploiting the entropy in a data stream
- In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or O(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
Linear probing with constant independence
- In STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
, 2007
"... Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing.
Strongly history-independent hashing with applications
- In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science
, 2007
"... We present a strongly history independent (SHI) hash table that supports search in O(1) worst-case time, and insert and delete in O(1) expected time using O(n) data space. This matches the bounds for dynamic perfect hashing, and improves on the best previous results by Naor and Teague on history ind ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
We present a strongly history independent (SHI) hash table that supports search in O(1) worst-case time, and insert and delete in O(1) expected time using O(n) data space. This matches the bounds for dynamic perfect hashing, and improves on the best previous results by Naor and Teague on history independent hashing, which were either weakly history independent, or only supported insertion and search (no delete) each in O(1) expected time. The results can be used to construct many other SHI data structures. We show straightforward constructions for SHI ordered dictionaries: for n keys from {1,..., n k} searches take O(log log n) worst-case time and updates (insertions and deletions) O(log log n) expected time, and for keys in the comparison model searches take O(log n) worst-case time and updates O(log n) expected time. We also describe a SHI data structure for the order-maintenance problem. It supports comparisons in O(1) worst-case time, and updates in O(1) expected time. All structures use O(n) data space. 1
Hash-Based Techniques for High-Speed Packet Processing
"... Abstract Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little bac ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Abstract Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary. 1
Tabulation Based 5-Universal Hashing and Linear Probing
"... Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables a ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation. Relatively this leads to even bigger gains since the direct methods for 5-universal hashing use degree 4 polynomials. Experimentally, we find that our method can gain up to an order of magnitude in speed over direct 5-universal hashing. Some of the most popular randomized algorithms have been proved to have the desired expected running time using
Hash-Based Data Structures for Extreme Conditions
, 2008
"... This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both an ..."
Abstract
- Add to MetaCart
This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially
The Power of Simple Tabulation Hashing Mihai Pǎtras¸cu AT&T Labs
, 2011
"... Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates ba ..."
Abstract
- Add to MetaCart
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC’77). Keys are viewed as consisting of c characters. We initialize c tables T1,..., Tc mapping characters to random hash codes. A key x = (x1,..., xq) is hashed to T1[x1] ⊕ · · · ⊕ Tc[xc], where ⊕ denotes xor. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing. An important target of the analysis of algorithms is to determine whether there exist practical schemes, which enjoy mathematical guarantees on performance. Hashing and hash tables are one of the most common inner loops in real-world computation, and are even built-in “unit cost ” operations in high level programming languages that offer associative arrays. Often,
LINEAR PROBING WITH 5-WISE INDEPENDENCE ∗
"... Abstract. Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analy ..."
Abstract
- Add to MetaCart
Abstract. Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a hash function with random and independent function values. Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation. Recently, Pǎtra¸scu and Thorup have shown that also 3- and 4-wise independent hash functions may give rise to logarithmic expected query time. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for hashing with linear probing.

