Results 1 -
8 of
8
Linear probing with constant independence
- In STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
, 2007
"... Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing.
Algorithms and Experiments: The New (and Old) Methodology
- J. Univ. Comput. Sci
, 2001
"... The last twenty years have seen enormous progress in the design of algorithms, but little of it has been put into practice. Because many recently developed algorithms are hard to characterize theoretically and have large running-time coefficients, the gap between theory and practice has widened over ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
The last twenty years have seen enormous progress in the design of algorithms, but little of it has been put into practice. Because many recently developed algorithms are hard to characterize theoretically and have large running-time coefficients, the gap between theory and practice has widened over these years. Experimentation is indispensable in the assessment of heuristics for hard problems, in the characterization of asymptotic behavior of complex algorithms, and in the comparison of competing designs for tractable problems. Implementation, although perhaps not rigorous experimentation, was characteristic of early work in algorithms and data structures. Donald Knuth has throughout insisted on testing every algorithm and conducting analyses that can predict behavior on actual data; more recently, Jon Bentley has vividly illustrated the difficulty of implementation and the value of testing. Numerical analysts have long understood the need for standardized test suites to ensure robustness, precision and efficiency of numerical libraries. It is only recently, however, that the algorithms community has shown signs of returning to implementation and testing as an integral part of algorithm development. The emerging disciplines of experimental algorithmics and algorithm engineering have revived and are extending many of the approaches used by computing pioneers such as Floyd and Knuth and are placing on a formal basis many of Bentley's observations. We reflect on these issues, looking back at the last thirty years of algorithm development and forward to new challenges: designing cache-aware algorithms, algorithms for mixed models of computation, algorithms for external memory, and algorithms for scientific research.
String hashing for linear probing
- In Proc. 20th SODA
, 2009
"... Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC’07, Pagh et al. presented data sets where t ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision free hashing (w.h.p.) into a simpler intermediate domain, and second do the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it is suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al. 1
Design and Analysis of Cache-Conscious Programs
, 1999
"... algorithms are presented in some examples. This work is about experimental algorithmics, and the methodology is therefore based on experiments. All important theory is experimentally evaluated. All experiments are made by the author; when I refer to other experimental work, it is not in direct compa ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
algorithms are presented in some examples. This work is about experimental algorithmics, and the methodology is therefore based on experiments. All important theory is experimentally evaluated. All experiments are made by the author; when I refer to other experimental work, it is not in direct comparison. Preprocessing input data to some required data structure is considered a part of the algorithm, that is, the time for reading the input stream to some data structure is measured as part of a program's execution. The aim is the construction of an analytical model for predicting the behaviour of the memory hierarchy, and attempts are made to discriminate the "random noise" from the execution of programs. This noise is considered to partly, but heavily, depend on the pattern of memory references of a program. With the knowledge of when memory references happen in the program and where the references are made in the different levels of the memory hierarchy, an analytical method is propos...
On the k-independence required by linear probing and minwise independence
- In Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP
, 2010
"... )-independent hash functions are required, matching an upper bound of [Indyk, SODA’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications. Abstract. We show that linear probing requires 5-independent hash functions for exp ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
)-independent hash functions are required, matching an upper bound of [Indyk, SODA’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications. Abstract. We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC’07]. For (1 + ε)-approximate minwise independence, we show that Ω(lg 1 ε 1
Effect of cache lines in array-based hashing algorithms
"... Abstract — Hashing algorithms and their efficiency is modeled with their expected probe lengths. This value measures the number of algorithmic steps required to find the position of an item inside the table. This metric, however, has an implicit assumption that all of these steps have a uniform cost ..."
Abstract
- Add to MetaCart
Abstract — Hashing algorithms and their efficiency is modeled with their expected probe lengths. This value measures the number of algorithmic steps required to find the position of an item inside the table. This metric, however, has an implicit assumption that all of these steps have a uniform cost. In this paper we show that this is not true on modern computers, and that caches and especially cache lines have a great impact on the performance and effectiveness of hashing algorithms that use array-based structures. Spatial locality of memory accesses plays a major role in the effectiveness of an algorithm. We show a novel model of evaluating hashing schemes; this model is based on the number of cache misses the algorithms suffer. This new approach is shown to model the real performance of hash tables more precisely than previous methods. For developing this model the sketch of proof of the expected probe length of linear probing is included.
LINEAR PROBING WITH 5-WISE INDEPENDENCE ∗
"... Abstract. Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analy ..."
Abstract
- Add to MetaCart
Abstract. Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a hash function with random and independent function values. Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation. Recently, Pǎtra¸scu and Thorup have shown that also 3- and 4-wise independent hash functions may give rise to logarithmic expected query time. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for hashing with linear probing.
S1-47–1 Efficiency issues in the RLF heuristic for graph coloring
"... This paper presents an efficient implementation of the well-known Recursive Largest First (RLF) algorithm of Leighton to find heuristic solutions to the classical graph coloring problem. The main features under study are a lazy computation of the induced vertex degree and the use of efficient data s ..."
Abstract
- Add to MetaCart
This paper presents an efficient implementation of the well-known Recursive Largest First (RLF) algorithm of Leighton to find heuristic solutions to the classical graph coloring problem. The main features under study are a lazy computation of the induced vertex degree and the use of efficient data structures. Computational experiments show that the lazy feature leads to a novel implementation that is faster than previous implementations on graphs with high density. Cache-misses are instead determinant for the assessment of data structures. 1

