Linear probing with constant independence
In STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, 2007
Cited by 15 (2 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space- and time-efficient hash function that provably ensures good performance for linear probing.
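For readers unfamiliar with the scheme this abstract analyses, the following is a minimal sketch of a linear probing table that reports how many slots each operation inspects. It is not taken from the paper: the class name `LinearProbingTable`, the fixed capacity, and the pluggable `hash_fn` parameter are illustrative assumptions, and the quality of `hash_fn` (pairwise versus 5-wise independent) is exactly what the cited analysis is about.

```python
class LinearProbingTable:
    """Open-addressing table that resolves collisions by linear probing.

    Illustrative sketch only: fixed capacity, no deletion, no resizing.
    """

    def __init__(self, capacity, hash_fn):
        self.slots = [None] * capacity  # None marks an empty slot
        self.hash_fn = hash_fn          # hypothetical: maps a key to an integer
        self.size = 0

    def _probe(self, key):
        """Return (index, probes) for the slot holding key or the first empty slot."""
        capacity = len(self.slots)
        index = self.hash_fn(key) % capacity
        probes = 1
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % capacity  # move to the next consecutive slot
            probes += 1
        return index, probes

    def insert(self, key, value):
        """Store (key, value) and return the number of slots inspected."""
        index, probes = self._probe(key)
        if self.slots[index] is None:
            # Always keep one slot empty so that _probe terminates.
            if self.size + 1 >= len(self.slots):
                raise RuntimeError("table full; a real table would resize")
            self.size += 1
        self.slots[index] = (key, value)
        return probes

    def lookup(self, key):
        """Return (value or None, number of slots inspected)."""
        index, probes = self._probe(key)
        entry = self.slots[index]
        return (entry[1] if entry is not None else None), probes
```

With a deliberately poor hash function such as `lambda k: k` and capacity 8, the keys 0, 8, and 16 all start at slot 0, so their insertions inspect 1, 2, and 3 slots respectively. Runs of this kind are what drive the cost of an operation.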
String hashing for linear probing
In Proc. 20th SODA, 2009
Cited by 8 (3 self)
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable-length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision-free hashing (w.h.p.) into a simpler intermediate domain, and second do the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed-length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.
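The two-stage structure described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the construction from the paper: the first stage is a generic randomized polynomial string hash chosen here for simplicity, the second stage is the classic degree-4 polynomial over a prime field, and the prime `P`, the function names, and the final reduction modulo the table size are all assumptions made for the example.

```python
import random

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1, assumed here as the prime field


def make_first_stage(rng):
    """Stage 1 (assumed): map a string into the intermediate domain [0, P)."""
    base = rng.randrange(1, P)

    def stage(s):
        h = 0
        for byte in s.encode("utf-8"):
            # byte + 1 keeps strings that differ only in NUL bytes apart
            h = (h * base + byte + 1) % P
        return h

    return stage


def make_second_stage(rng, table_size):
    """Stage 2: degree-4 polynomial with random coefficients over GF(P)."""
    coeffs = [rng.randrange(P) for _ in range(5)]

    def stage(x):
        acc = 0
        for c in coeffs:  # Horner's rule
            acc = (acc * x + c) % P
        # Reducing modulo table_size is only approximately uniform.
        return acc % table_size

    return stage


def make_string_hash(rng, table_size):
    """Compose both stages into one function from strings to table slots."""
    first = make_first_stage(rng)
    second = make_second_stage(rng, table_size)
    return lambda s: second(first(s))
```

A call such as `make_string_hash(random.Random(1), 1024)` returns a function that maps any string to a slot in `range(1024)`. The abstract's point is that the first stage does not need to be collision free, which this sketch does not attempt to exploit.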
Cache-conscious collision resolution in string hash tables
In Proc. String Processing and Information Retrieval Symposium (SPIRE), 2005
Cited by 5 (4 self)
In-memory hash tables provide fast access to large numbers of strings, with less space overhead than sorted structures such as tries and binary trees. If chains are used for collision resolution, hash tables scale well, particularly if the pattern of access to the stored strings is skew. However, typical implementations of string hash tables, with lists of nodes, are not cache-efficient. In this paper we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. In all cases, the new structures give substantial savings in space at no cost in time. In the best case, the overhead space required for pointers is reduced by a factor of around 50, to less than two bits per string (with total space required, including 5.68 megabytes of strings, falling from 20.42 megabytes to 5.81 megabytes), while access times are also reduced.
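The "more drastic step" in this abstract, replacing each chain of nodes by one contiguous array of characters, can be illustrated with a small sketch. The layout below (a 2-byte big-endian length prefix before each string), the class name `ArrayBucketStringSet`, and the use of Python's built-in `hash` are assumptions made for the example; the paper's actual encoding may differ.

```python
class ArrayBucketStringSet:
    """String set whose buckets are contiguous byte arrays, not linked lists.

    Assumed layout per bucket: repeated [2-byte length][string bytes] records.
    """

    def __init__(self, num_buckets):
        self.buckets = [bytearray() for _ in range(num_buckets)]

    def _bucket(self, data):
        # Built-in hash is a stand-in for whatever string hash a real table uses.
        return self.buckets[hash(data) % len(self.buckets)]

    @staticmethod
    def _scan(bucket, data):
        """Walk the records in one bucket sequentially, looking for data."""
        pos = 0
        while pos < len(bucket):
            length = int.from_bytes(bucket[pos:pos + 2], "big")
            start = pos + 2
            if length == len(data) and bucket[start:start + length] == data:
                return True
            pos = start + length  # jump to the next record
        return False

    def add(self, s):
        """Insert s; return False if it was already present."""
        data = s.encode("utf-8")
        if len(data) > 0xFFFF:
            raise ValueError("string too long for the 2-byte length prefix")
        bucket = self._bucket(data)
        if self._scan(bucket, data):
            return False
        bucket += len(data).to_bytes(2, "big") + data  # append in place
        return True

    def __contains__(self, s):
        data = s.encode("utf-8")
        return self._scan(self._bucket(data), data)
```

In this layout the only per-string overhead is the length prefix, and a search touches one contiguous region of memory instead of following a pointer per node.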
On the k-independence required by linear probing and minwise independence
In Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP), 2010
Cited by 5 (1 self)
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC’07]. For (1 + ε)-approximate minwise independence, we show that Ω(lg(1/ε))-independent hash functions are required, matching an upper bound of [Indyk, SODA’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications.
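For context, the multiply-shift scheme named in this abstract is usually described as multiplying the key by a random odd word-sized integer and keeping the top bits of the low word. The sketch below follows that common description; which exact variant the paper analyses is not stated in this listing, so the word size `W` and the function name are assumptions.

```python
import random

W = 64  # assumed key width in bits


def make_multiply_shift(rng, out_bits):
    """Return h(x) = ((a * x) mod 2^W) >> (W - out_bits) for a random odd a."""
    if not 1 <= out_bits <= W:
        raise ValueError("out_bits must be between 1 and W")
    a = rng.randrange(1 << W) | 1  # random odd W-bit multiplier
    mask = (1 << W) - 1

    def h(x):
        # Keep the low W bits of the product, then take its top out_bits bits.
        return ((a * x) & mask) >> (W - out_bits)

    return h
```

The scheme is popular because it costs one multiplication and one shift per key. The abstract's claim is that this speed comes without the guarantees that linear probing and minwise hashing need.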
Fast and compact hash tables for integer keys
In Proc. 32nd Australasian Conf. Comput. Sci. (ACSC’09), 2009
Cited by 4 (1 self)
A hash table is a fundamental data structure in computer science that can offer rapid storage and retrieval of data. A leading implementation for string keys is the cache-conscious array hash table. Although fast with strings, there is currently no information in the research literature on its performance with integer keys. More importantly, we do not know how efficient an integer-based array hash table is compared to other hash tables that are designed for integers, such as bucketized cuckoo hashing. In this paper, we explain how to efficiently implement an array hash table for integers. We then demonstrate, through careful experimental evaluations, which hash table, whether it be a bucketized cuckoo hash table, an array hash table, or alternative hash table schemes such as linear probing, offers the best performance, with respect to time and space, for maintaining a large dictionary of integers in-memory, on a current cache-oriented processor.
Tabulation Based 5-Universal Hashing and Linear Probing
Cited by 3 (3 self)
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small precomputed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the precomputed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation. Relatively, this leads to even bigger gains since the direct methods for 5-universal hashing use degree 4 polynomials. Experimentally, we find that our method can gain up to an order of magnitude in speed over direct 5-universal hashing. Some of the most popular randomized algorithms have been proved to have the desired expected running time using
Redesigning the String Hash Table, Burst Trie, and BST to Exploit Cache
2011
Cited by 2 (1 self)
A key decision when developing in-memory computing applications is choice of a mechanism to store and retrieve strings. The most efficient current data structures for this task are the hash table with move-to-front chains and the burst trie, both of which use linked lists as a substructure, and variants of binary search tree. These data structures are computationally efficient, but typical implementations use large numbers of nodes and pointers to manage strings, which is not efficient in use of cache. In this article, we explore two alternatives to the standard representation: the simple expedient of including the string in its node, and, for linked lists, the more drastic step of replacing each list of nodes by a contiguous array of characters. Our experiments show that, for large sets of strings, the improvement is dramatic. For hashing, in the best case the total space overhead is reduced to less than 1 bit per string. For the burst trie, over 300MB of strings can be stored in a total of under 200MB of memory with significantly improved search time. These results, on a variety of data sets, show that cache-friendly variants of fundamental data structures can yield remarkable gains in performance.
LINEAR PROBING WITH 5-WISE INDEPENDENCE
Cited by 1 (1 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key, value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a hash function with random and independent function values. Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation. Recently, Pǎtraşcu and Thorup have shown that 3- and 4-wise independent hash functions may also give rise to logarithmic expected query time. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space- and time-efficient hash function that provably ensures good performance for hashing with linear probing.
TABULATION BASED 5-INDEPENDENT HASHING WITH APPLICATIONS TO LINEAR PROBING AND SECOND MOMENT ESTIMATION
In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree 4 polynomial over a prime field containing the key domain [n] = {0, ..., n − 1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c − 1 lookups in tables of size O(n^(1/c)). In experiments on different computers, our scheme gained factors 1.8 to 10 in speed over the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.
Tabulation Based 5-independent Hashing with Applications to Linear Probing and Second Moment Estimation
In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree 4 polynomial over a prime field containing the key domain [n] = {0, ..., n − 1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c − 1 lookups in tables of size O(n^(1/c)). In experiments on different computers, our scheme gained factors 1.8 to 10 in speed over the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs. 1 Introduction. We consider “k-independent hashing” in the classic framework of Carter and Wegman [32]. For any i ≥ 1, let [i] = {0, 1, ..., i − 1}. We consider “hash” functions from “keys” in [n] to “hash values” in [m]. A class H of hash functions is k-independent if for any distinct x_0, ..., x_{k−1} ∈ [n] and any possibly identical