Results 1–10 of 13
Linear probing with constant independence
In STOC ’07: Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, 2007
Abstract

Cited by 23 (2 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing.
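The construction the abstract refers to can be sketched as follows. A random degree-4 polynomial over a prime field is a standard 5-wise independent family; the prime, table size, and helper names below are illustrative choices for this sketch, not details taken from the paper.

```python
import random

# Sketch: a random degree-4 polynomial mod a prime is 5-wise independent.
P = (1 << 61) - 1  # a Mersenne prime, chosen here for convenience

def make_5wise_hash(table_size, rng=random):
    coeffs = [rng.randrange(P) for _ in range(5)]
    def h(x):
        acc = 0
        for c in coeffs:          # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc % table_size
    return h

def insert(table, h, key):
    i = h(key)
    while table[i] is not None:   # linear probing: scan to the next free slot
        i = (i + 1) % len(table)
    table[i] = key

def lookup(table, h, key):
    i = h(key)
    while table[i] is not None:
        if table[i] == key:
            return i
        i = (i + 1) % len(table)
    return None                   # hit an empty slot: key is absent
```

The paper's result is about this combination: with `h` drawn from a 5-wise independent family, the expected probe count of `insert`/`lookup` stays constant.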
String hashing for linear probing
In Proc. 20th SODA, 2009
Abstract

Cited by 13 (4 self)
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. When we get a key, we first hash it to a location. Next we probe consecutive locations until the key or an empty location is found. At STOC ’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable-length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is that we first do collision-free hashing (w.h.p.) into a simpler intermediate domain, and second do the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it suffices that each key has O(1) expected collisions under the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed-length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.
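The two-stage scheme the abstract describes can be sketched like this; the hash functions, seeds, and domain sizes below are illustrative stand-ins, not the paper's concrete constructions.

```python
import random

P = (1 << 61) - 1  # illustrative prime modulus

def make_string_stage(domain_size, seed=0):
    # Stage 1: collapse variable-length strings into a small integer
    # domain; a simple polynomial string hash stands in for the paper's
    # "O(1) expected collisions" first stage.
    a = random.Random(seed).randrange(1, P)
    def h1(s):
        acc = 0
        for b in s.encode():
            acc = (acc * a + b) % P   # Horner over the bytes
        return acc % domain_size
    return h1

def make_5indep_stage(table_size, seed=1):
    # Stage 2: 5-independent hash on the intermediate domain
    # (random degree-4 polynomial).
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(5)]
    def h2(x):
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % P
        return acc % table_size
    return h2

def make_hash(domain_size, table_size):
    h1, h2 = make_string_stage(domain_size), make_5indep_stage(table_size)
    return lambda s: h2(h1(s))        # the composed hash used for probing
```

The abstract's point is that `domain_size` can be much smaller than previously thought (a factor n smaller) while the composed hash still gives O(1) expected linear probes.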
On the k-independence required by linear probing and min-wise independence
In Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP), 2010
Abstract

Cited by 13 (4 self)
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC ’07]. For (1 + ε)-approximate min-wise independence, we show that Ω(lg 1/ε)-independent hash functions are required, matching an upper bound of [Indyk, SODA ’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications.
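For reference, the multiply-shift scheme mentioned in the abstract hashes a w-bit key to l bits with one multiplication by a random odd w-bit number followed by a right shift; the word size and output width below are illustrative.

```python
import random

W = 64  # word size assumed for this sketch

def make_multiply_shift(l, seed=0):
    # Random odd multiplier; h(x) = (a*x mod 2^W) >> (W - l)
    a = random.Random(seed).randrange(1 << W) | 1
    def h(x):
        return ((a * x) & ((1 << W) - 1)) >> (W - l)
    return h
```

The scheme is only 2-independent, which is exactly why it falls inside the lower bounds the paper proves for linear probing and min-wise independence.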
Algorithms and Experiments: The New (and Old) Methodology
J. Univ. Comput. Sci., 2001
Abstract

Cited by 10 (4 self)
The last twenty years have seen enormous progress in the design of algorithms, but little of it has been put into practice. Because many recently developed algorithms are hard to characterize theoretically and have large running-time coefficients, the gap between theory and practice has widened over these years. Experimentation is indispensable in the assessment of heuristics for hard problems, in the characterization of asymptotic behavior of complex algorithms, and in the comparison of competing designs for tractable problems. Implementation, although perhaps not rigorous experimentation, was characteristic of early work in algorithms and data structures. Donald Knuth has throughout insisted on testing every algorithm and conducting analyses that can predict behavior on actual data; more recently, Jon Bentley has vividly illustrated the difficulty of implementation and the value of testing. Numerical analysts have long understood the need for standardized test suites to ensure robustness, precision and efficiency of numerical libraries. It is only recently, however, that the algorithms community has shown signs of returning to implementation and testing as an integral part of algorithm development. The emerging disciplines of experimental algorithmics and algorithm engineering have revived and are extending many of the approaches used by computing pioneers such as Floyd and Knuth and are placing on a formal basis many of Bentley's observations. We reflect on these issues, looking back at the last thirty years of algorithm development and forward to new challenges: designing cache-aware algorithms, algorithms for mixed models of computation, algorithms for external memory, and algorithms for scientific research.
Design and Analysis of Cache-Conscious Programs
, 1999
Abstract

Cited by 3 (0 self)
algorithms are presented in some examples. This work is about experimental algorithmics, and the methodology is therefore based on experiments. All important theory is experimentally evaluated. All experiments are made by the author; when I refer to other experimental work, it is not in direct comparison. Preprocessing input data to some required data structure is considered a part of the algorithm, that is, the time for reading the input stream to some data structure is measured as part of a program's execution. The aim is the construction of an analytical model for predicting the behaviour of the memory hierarchy, and attempts are made to discriminate the "random noise" from the execution of programs. This noise is considered to partly, but heavily, depend on the pattern of memory references of a program. With the knowledge of when memory references happen in the program and where the references are made in the different levels of the memory hierarchy, an analytical method is propos...
Efficiency issues in the RLF heuristic for graph coloring
Abstract

Cited by 2 (0 self)
This paper presents an efficient implementation of the well-known Recursive Largest First (RLF) algorithm of Leighton to find heuristic solutions to the classical graph coloring problem. The main features under study are a lazy computation of the induced vertex degree and the use of efficient data structures. Computational experiments show that the lazy feature leads to a novel implementation that is faster than previous implementations on graphs with high density. Cache misses, by contrast, are the determining factor in the assessment of data structures.
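For orientation, Leighton's RLF can be sketched as below. This is a compact version without the lazy degree computation or the tuned data structures the paper contributes, and with simplified tie-breaking; each round greedily builds one color class as a maximal independent set.

```python
def rlf_coloring(adj):
    """adj: dict mapping each vertex to a set of its neighbors."""
    uncolored = set(adj)
    color = {}
    c = 0
    while uncolored:
        # Seed the class with a max-degree vertex in the uncolored subgraph.
        v = max(uncolored, key=lambda u: len(adj[u] & uncolored))
        cls = {v}
        forbidden = adj[v] & uncolored          # cannot join this class
        candidates = uncolored - cls - forbidden
        while candidates:
            # RLF heuristic: prefer the candidate with the most
            # neighbors already forbidden.
            u = max(candidates, key=lambda x: len(adj[x] & forbidden))
            cls.add(u)
            forbidden |= adj[u] & uncolored
            candidates -= adj[u] | {u}
        for u in cls:                            # assign the next color
            color[u] = c
        uncolored -= cls
        c += 1
    return color
```

Recomputing `len(adj[u] & uncolored)` on every selection is exactly the induced-degree work the paper's lazy computation avoids.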
LINEAR PROBING WITH 5-WISE INDEPENDENCE
Abstract

Cited by 2 (1 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key, value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a hash function with random and independent function values. Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation. Recently, Pǎtraşcu and Thorup have shown that 3- and 4-wise independent hash functions may also give rise to logarithmic expected query time. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for hashing with linear probing.
Effect of cache lines in array-based hashing algorithms
Abstract
Abstract — The efficiency of hashing algorithms is usually modeled by their expected probe lengths. This value measures the number of algorithmic steps required to find the position of an item inside the table. This metric, however, carries an implicit assumption that all of these steps have a uniform cost. In this paper we show that this is not true on modern computers, and that caches, and especially cache lines, have a great impact on the performance and effectiveness of hashing algorithms that use array-based structures. Spatial locality of memory accesses plays a major role in the effectiveness of an algorithm. We show a novel model for evaluating hashing schemes, based on the number of cache misses the algorithms suffer. This new approach is shown to model the real performance of hash tables more precisely than previous methods. In support of this model, a sketch of the proof of the expected probe length of linear probing is included.
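The core of such a cache-miss model can be illustrated with a toy calculation; the 64-byte line and 8-byte slot sizes are assumptions for this sketch, not values from the paper. A linear probe run of length k touches at most ⌈k / slots-per-line⌉ + 1 cache lines, so several probes often cost only one miss.

```python
# Toy cache-miss count for a linear probe sequence, assuming
# 64-byte cache lines and 8-byte table slots (illustrative values).
LINE_BYTES, SLOT_BYTES = 64, 8
SLOTS_PER_LINE = LINE_BYTES // SLOT_BYTES  # = 8

def cache_lines_touched(start_slot, probe_len, table_size):
    lines = set()
    for j in range(probe_len):
        slot = (start_slot + j) % table_size   # linear probing walk
        lines.add(slot // SLOTS_PER_LINE)      # which line holds this slot
    return len(lines)
```

Under this model, a probe length of 8 starting at a line boundary costs a single cache line, which is why probe length alone overstates the cost of linear probing.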
Combining Static Analysis and Simulation to Speed up Cache Performance Evaluation of Programs
Abstract
This paper presents a new method for cache performance evaluation of programs. The method is especially suitable for programs using dynamic memory allocation. For such programs, static analysis methods are fast, but often not sufficiently accurate. On the other hand, traditional memory simulations are accurate, but slow. The new method combines the benefits of static analysis and simulation. Often, data transfers between main memory and cache memory cause the dominating performance bottlenecks. We must locate the bottlenecks of a subject program to improve its performance. This can be done by simulating the cache behavior of the subject program. Typically, the subject program is augmented with commands for a memory simulator. The new method is based on integrated simulation, partial evaluation, and program slicing. Because of memory access patterns of typical programs, the augmented simulation code can be partially evaluated during compilation of the integrated simulation program. Program slicing is used to remove the computations of the subject program that are not needed in the cache simulation. The new method can reduce the time needed in cache performance evaluations without losing accuracy of the results.
How Caching Affects Hashing
Abstract
A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact lead to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed the name “hashing” itself is meant to convey this notion [12]. However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance tradeoffs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly “chaotic” forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, contrary to those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses. That is, linear probing is usually no better than, and in some cases performs far worse than, double hash functions that spread data more randomly through the table.

∗ We wish to thank Bernard Moret for suggesting this topic to us.
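The locality contrast at the heart of this comparison is visible directly in the probe sequences the two schemes generate; the table size and hash values below are illustrative.

```python
# Probe sequences for the two open-addressing schemes being compared:
# linear probing visits adjacent slots (good spatial locality), while
# double hashing jumps by a key-dependent stride (fewer collisions).
def linear_probes(h1, k, m):
    return [(h1 + j) % m for j in range(k)]

def double_hash_probes(h1, h2, k, m):
    # h2 should be nonzero and coprime with m so the sequence can
    # cover the whole table.
    return [(h1 + j * h2) % m for j in range(k)]
```

A linear sequence of 8 probes may sit inside a single cache line, while the double-hashing sequence of the same length typically touches a different line on every probe; the paper's finding is that the latter's shorter probe sequences usually win anyway.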