Results 1–9 of 9
Linear probing with constant independence
 In STOC ’07: Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing
, 2007
Cited by 15 (2 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space- and time-efficient hash function that provably ensures good performance for linear probing.
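The probe sequence analyzed in this abstract can be sketched as a minimal open-addressing table. Python's built-in hash() stands in here for the hash family under discussion (it is not 5-wise independent), and the fixed capacity without resizing is an illustrative simplification:

```python
class LinearProbingTable:
    """Sketch of a hash table with linear probing (no resizing)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _probe(self, key):
        # Hash to a start location, then scan consecutive slots (with
        # wrap-around) until the key or an empty slot is found.
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity
        return i

    def insert(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def lookup(self, key):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else None
```

The cache friendliness the abstract mentions comes from the probe loop: consecutive slots lie in consecutive memory, so a long probe sequence still touches few cache lines.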
Algorithms and Experiments: The New (and Old) Methodology
 J. Univ. Comput. Sci
, 2001
Cited by 9 (4 self)
The last twenty years have seen enormous progress in the design of algorithms, but little of it has been put into practice. Because many recently developed algorithms are hard to characterize theoretically and have large running-time coefficients, the gap between theory and practice has widened over these years. Experimentation is indispensable in the assessment of heuristics for hard problems, in the characterization of asymptotic behavior of complex algorithms, and in the comparison of competing designs for tractable problems. Implementation, although perhaps not rigorous experimentation, was characteristic of early work in algorithms and data structures. Donald Knuth has throughout insisted on testing every algorithm and conducting analyses that can predict behavior on actual data; more recently, Jon Bentley has vividly illustrated the difficulty of implementation and the value of testing. Numerical analysts have long understood the need for standardized test suites to ensure robustness, precision and efficiency of numerical libraries. It is only recently, however, that the algorithms community has shown signs of returning to implementation and testing as an integral part of algorithm development. The emerging disciplines of experimental algorithmics and algorithm engineering have revived and are extending many of the approaches used by computing pioneers such as Floyd and Knuth, and are placing many of Bentley's observations on a formal basis. We reflect on these issues, looking back at the last thirty years of algorithm development and forward to new challenges: designing cache-aware algorithms, algorithms for mixed models of computation, algorithms for external memory, and algorithms for scientific research.
String hashing for linear probing
 In Proc. 20th SODA
, 2009
Cited by 8 (3 self)
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC ’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable-length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision-free hashing (w.h.p.) into a simpler intermediate domain, and second do the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed-length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.
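The two-stage scheme the abstract describes — first compress variable-length strings into an intermediate integer domain, then apply a highly independent function on that domain — can be sketched as follows. The prime p, table size m, the polynomial string hash, and the seed are all illustrative stand-ins, not the paper's exact constructions:

```python
import random

p = (1 << 61) - 1  # a Mersenne prime; the intermediate domain is [p]
m = 1024           # table size for the linear-probing array (illustrative)

def string_to_intermediate(s, seed=12345):
    """Stage 1: polynomial string hash into the intermediate domain [p].
    (A stand-in for the paper's first hash with O(1) expected collisions.)"""
    h = 0
    for byte in s.encode():
        h = (h * seed + byte) % p
    return h

# Stage 2: a random degree-4 polynomial over GF(p), the classic
# 5-independent construction, reduced to a table location in [m].
_coeffs = [random.randrange(p) for _ in range(5)]

def second_stage(x):
    h = 0
    for a in _coeffs:       # Horner evaluation of the degree-4 polynomial
        h = (h * x + a) % p
    return h % m

def table_location(s):
    """Composed hash: string -> intermediate domain -> table location."""
    return second_stage(string_to_intermediate(s))
```

The point of the paper is that stage 1 may be weaker (and its range n times smaller) than a collision-free hash, as long as each key has O(1) expected collisions there.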
On the k-independence required by linear probing and min-wise independence
 In Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP)
, 2010
Cited by 5 (1 self)
Abstract. We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al., STOC ’07]. For (1 + ε)-approximate min-wise independence, we show that Ω(lg 1/ε)-independent hash functions are required, matching an upper bound of [Indyk, SODA ’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications.
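The multiply-shift scheme the abstract refers to hashes a w-bit key x to an l-bit value as (a·x mod 2^w) >> (w − l) for a random odd multiplier a. A sketch, with w and l as illustrative choices:

```python
import random

w = 64   # key width in bits
l = 10   # output width: hash values in [2**10]

# Random odd w-bit multiplier (odd-ness is required by the scheme).
a = random.randrange(1, 1 << w) | 1

def multiply_shift(x):
    """Dietzfelbinger-style multiply-shift: (a*x mod 2^w) >> (w - l)."""
    return ((a * x) & ((1 << w) - 1)) >> (w - l)
```

It is only 2-independent, which is exactly why, per the abstract, it can fail badly in applications that need higher independence.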
Design and Analysis of Cache-Conscious Programs
, 1999
Cited by 2 (0 self)
algorithms are presented in some examples. This work is about experimental algorithmics, and the methodology is therefore based on experiments. All important theory is experimentally evaluated. All experiments are made by the author; when I refer to other experimental work, it is not in direct comparison. Preprocessing input data into some required data structure is considered a part of the algorithm; that is, the time for reading the input stream into some data structure is measured as part of a program's execution. The aim is the construction of an analytical model for predicting the behaviour of the memory hierarchy, and attempts are made to discriminate the "random noise" from the execution of programs. This noise is considered to partly, but heavily, depend on the pattern of memory references of a program. With knowledge of when memory references happen in the program and where the references are made in the different levels of the memory hierarchy, an analytical method is proposed...
LINEAR PROBING WITH 5-WISE INDEPENDENCE
Cited by 1 (1 self)
Abstract. Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key, value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a hash function with random and independent function values. Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation. Recently, Pătraşcu and Thorup have shown that 3- and 4-wise independent hash functions may also give rise to logarithmic expected query time. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space- and time-efficient hash function that provably ensures good performance for hashing with linear probing.
Efficiency issues in the RLF heuristic for graph coloring
Cited by 1 (0 self)
This paper presents an efficient implementation of the well-known Recursive Largest First (RLF) algorithm of Leighton to find heuristic solutions to the classical graph coloring problem. The main features under study are a lazy computation of the induced vertex degree and the use of efficient data structures. Computational experiments show that the lazy feature leads to a novel implementation that is faster than previous implementations on graphs with high density. Cache misses, instead, are the determining factor in the assessment of data structures.
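For reference, the RLF heuristic this paper optimizes can be sketched compactly. This version uses plain Python sets and omits both the paper's lazy degree computation and its cache-conscious data structures; `adj` (vertex → set of neighbours) and the tie-breaking of `max` are illustrative choices:

```python
def rlf_coloring(adj):
    """Recursive Largest First: build one colour class at a time.
    adj maps each vertex to its set of neighbours; returns vertex -> colour."""
    uncoloured = set(adj)
    colour = {}
    c = 0
    while uncoloured:
        # Seed the class with the uncoloured vertex of largest induced degree.
        v = max(uncoloured, key=lambda u: len(adj[u] & uncoloured))
        cls = {v}
        U = uncoloured - adj[v] - {v}   # still eligible for this colour class
        W = adj[v] & uncoloured         # excluded: adjacent to the class
        while U:
            # RLF rule: among eligible vertices, pick the one with the
            # most neighbours already excluded (in W).
            u = max(U, key=lambda x: len(adj[x] & W))
            cls.add(u)
            W |= adj[u] & U             # u's eligible neighbours become excluded
            U -= adj[u] | {u}
        for u in cls:
            colour[u] = c
        uncoloured -= cls
        c += 1
    return colour
```

The induced degree `len(adj[u] & uncoloured)` recomputed inside `max` is exactly the quantity the paper computes lazily to speed up dense graphs.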
Effect of cache lines in array-based hashing algorithms
Abstract — Hashing algorithms and their efficiency are modeled by their expected probe lengths. This value measures the number of algorithmic steps required to find the position of an item inside the table. This metric, however, carries an implicit assumption that all of these steps have a uniform cost. In this paper we show that this is not true on modern computers, and that caches and especially cache lines have a great impact on the performance and effectiveness of hashing algorithms that use array-based structures. Spatial locality of memory accesses plays a major role in the effectiveness of an algorithm. We show a novel model for evaluating hashing schemes; this model is based on the number of cache misses the algorithms suffer. This new approach is shown to model the real performance of hash tables more precisely than previous methods. To develop this model, a sketch of the proof of the expected probe length of linear probing is included.
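A toy illustration of why probe count and cache cost diverge, as the abstract argues: a run of k consecutive probes touches roughly 1 + k·entry_size/line_size distinct cache lines, far fewer than k. The sizes below are typical but illustrative, not taken from the paper:

```python
line_size = 64   # bytes per cache line (typical on current hardware)
entry_size = 8   # bytes per table entry, e.g. one 64-bit key

def lines_touched(start_slot, num_probes):
    """Distinct cache lines touched when linear probing scans slots
    start_slot .. start_slot + num_probes - 1 (ignoring wrap-around)."""
    first = (start_slot * entry_size) // line_size
    last = ((start_slot + num_probes - 1) * entry_size) // line_size
    return last - first + 1
```

With these numbers, eight consecutive probes can cost a single cache miss, which is why uniform per-probe cost misestimates array-based schemes.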
Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation
In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree-4 polynomial over a prime field containing the key domain [n] = {0, ..., n−1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c − 1 lookups in tables of size O(n^(1/c)). In experiments on different computers, our scheme gained factors of 1.8 to 10 in speed over the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.

1 Introduction. We consider "k-independent hashing" in the classic framework of Carter and Wegman [32]. For any i ≥ 1, let [i] = {0, 1, ..., i − 1}. We consider "hash" functions from "keys" in [n] to "hash values" in [m]. A class H of hash functions is k-independent if for any distinct x0, ..., xk−1 ∈ [n] and any possibly identical
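The tabulation idea underlying this scheme — split a key into c characters and XOR together one random-table lookup per character — can be sketched as below. Note this plain ("simple") tabulation is only 3-independent; the paper's 5-independent construction additionally looks up derived characters, which is omitted here. The widths are illustrative (32-bit keys, c = 2 characters of 16 bits):

```python
import random

c = 2           # characters per key
char_bits = 16  # 32-bit keys split into two 16-bit characters

# One table of random 32-bit words per character position.
tables = [[random.getrandbits(32) for _ in range(1 << char_bits)]
          for _ in range(c)]

def tab_hash(x):
    """Multiplication-free tabulation hash: XOR of per-character lookups."""
    h = 0
    for i in range(c):
        ch = (x >> (i * char_bits)) & ((1 << char_bits) - 1)
        h ^= tables[i][ch]
    return h
```

The speed claim in the abstract comes from replacing the polynomial method's field multiplications with these cache-resident table lookups.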