## Linear probing with constant independence (2007)

Venue: STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing

Citations: 14 (2 self)

### BibTeX

@INPROCEEDINGS{Pagh07linearprobing,

author = {Anna Pagh and Rasmus Pagh},

title = {Linear probing with constant independence},

booktitle = {STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing},

year = {2007},

pages = {318--327},

publisher = {ACM Press}

}

### Abstract

Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing.
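The positive result in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It pairs a linear probing table with a hash function drawn from a 5-wise independent family, built as a random polynomial of degree 4 over a prime field. The names `make_poly_hash`, `LinearProbingTable`, and `P` are hypothetical, as is the choice of the Mersenne prime 2^61 - 1 as modulus. Keys are assumed to be non-negative integers smaller than `P`, and the final reduction modulo the table size makes the hash values only approximately uniform.

```python
import random

# Assumption: a Mersenne prime larger than every key we intend to hash.
P = (1 << 61) - 1


def make_poly_hash(k, r, rng=random):
    """Return h(x) = ((a_0 + a_1*x + ... + a_{k-1}*x^(k-1)) mod P) mod r.

    A polynomial of degree k - 1 with random coefficients over the field Z_P
    is k-wise independent on keys in [0, P). The final `mod r` maps values
    into the table and is only approximately uniform.
    """
    coeffs = [rng.randrange(P) for _ in range(k)]  # coeffs[i] multiplies x^i

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + a) % P
        return acc % r

    return h


class LinearProbingTable:
    """Illustrative open-addressing set with linear probing (no deletions)."""

    def __init__(self, capacity, independence=5, rng=random):
        self.slots = [None] * capacity
        self.size = 0
        self.h = make_poly_hash(independence, capacity, rng)

    def _probe(self, key):
        # Visit h(key), h(key) + 1, ... cyclically, each slot at most once.
        i = self.h(key)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def insert(self, key):
        """Insert key; return False if it was already present."""
        for i in self._probe(key):
            if self.slots[i] is None:
                self.slots[i] = key
                self.size += 1
                return True
            if self.slots[i] == key:
                return False
        raise RuntimeError("table is full")

    def contains(self, key):
        """Scan from h(key) until the key or a vacant slot is found."""
        for i in self._probe(key):
            if self.slots[i] is None:
                return False
            if self.slots[i] == key:
                return True
        return False
```

Passing `independence=2` to the same class gives a pairwise independent polynomial, the kind of family for which the abstract reports that expected logarithmic cost per operation is possible. The sketch makes no attempt to reproduce that worst case.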

### Citations

2377 | The Art of Computer Programming - Knuth - 1968

1584 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963
Citation Context: ...an those for fully independent families. In the fully independent case, the probability that an interval of length q is fully loaded is less than $e^{q(1-\alpha+\ln\alpha)}$, according to Chernoff-Hoeffding bounds [3, 6]. Plugging this bound into the proof of Theorem 4 would give, e.g., $E(C^U_\alpha) < 1 + \frac{e^{1-\alpha+\ln\alpha}}{\ln 2 \cdot |1-\alpha+\ln\alpha|}$ (1). For α close to 1, a good upper bound on (1) is $1 + \frac{2}{\ln 2}(1-\alpha)^{-2}$. The con...

774 | A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations - Chernoff - 1952
Citation Context: ...an those for fully independent families. In the fully independent case, the probability that an interval of length q is fully loaded is less than $e^{q(1-\alpha+\ln\alpha)}$, according to Chernoff-Hoeffding bounds [3, 6]. Plugging this bound into the proof of Theorem 4 would give, e.g., $E(C^U_\alpha) < 1 + \frac{e^{1-\alpha+\ln\alpha}}{\ln 2 \cdot |1-\alpha+\ln\alpha|}$ (1). For α close to 1, a good upper bound on (1) is $1 + \frac{2}{\ln 2}(1-\alpha)^{-2}$. The con...

718 | Universal classes of hash functions - Carter, Wegman - 1979
Citation Context: ...he work that has since gone into understanding the properties of linear probing, is based on the assumption that h is a truly random function. In 1977, Carter and Wegman’s notion of universal hashing [2] initiated a new era in the design of hashing algorithms, where explicit and efficient ways of choosing hash functions replaced the unrealistic assumption of complete randomness. In their seminal pape...

358 | New hash functions and their use in authentication and set equality - Wegman, Carter - 1981
Citation Context: ... However, some interesting k-wise independent families have a slightly nonuniform distribution, and we will provide analysis for such families as well. 2.2 Hash function families. Carter and Wegman [15] observed that the family of degree k − 1 polynomials in any finite field is k-wise independent. Specifically, for any prime p we may use the field defined by arithmetic modulo p to get a family of fu...

220 | A fast and simple randomized parallel algorithm for the maximal independent set problem - Alon, Babai, et al. - 1986
Citation Context: ...ue given by $T(\alpha,\epsilon) = \min\left\{ \frac{5.2\alpha(1+\epsilon)^2}{(1-(1+\epsilon)\alpha)^2} + \frac{4}{9\alpha} - 1,\ \dots \right\}$. Remark that $T(\alpha,\epsilon) = O\left(\frac{\alpha(1+\epsilon)^2}{(1-\alpha)^2}\right)$. 2.2 Hash function families. Alon et al. [1] observed that the family of degree k − 1 polynomials in any finite field is k-wise independent. Specifically, for any prime p we may use the field defined by arithmetic modulo p to get a family of fu...

136 | Cuckoo hashing - Pagh, Rodler
Citation Context: ... searches is conducted, showing that the expected number of probes made during a search for a random element in the table is less than $1 + \frac{2}{1-\alpha}$. 1.3 Significance. Several recent experimental studies [1, 5, 9] have found linear probing to be the fastest hash table organization for moderate load factors (30-70%). While linear probing operations are known to require more instructions than those of other open...

107 | Sorting and Searching - Knuth - 1998

88 | A complexity theory of efficient parallel algorithms - Kruskal, Rudolph, et al. - 1988

61 | Tabulation based 4-universal hashing with applications to second moment estimation - Thorup, Zhang - 2004
Citation Context: ...imum load $\bar{\alpha}$ for this family is in the range [α; (1 + n/p)α]. By choosing p much larger than n we can make $\bar{\alpha}$ arbitrarily close to α. A recently proposed k-wise independent family of Thorup and Zhang [14] has uniformly distributed function values in [r], and thus $\bar{\alpha} = \alpha$. From a theoretical perspective (ignoring constant factors) it is inferior to Siegel’s highly independent family [12], since the eval...

47 | Space efficient hash tables with worst case constant access time - Fotakis, Pagh, et al. - 2003
Citation Context: ...ing probe sequence, we believe that it refers to linear probing. A potentially more practical method due to Dietzfelbinger (seemingly described in the literature only as a “personal communication” in [5]) can be used to achieve characteristics similar to those of linear probing, still using space $n^{\epsilon}$. This method splits the problem into many subproblems of roughly the same size, and simulates full r...

30 | Balanced allocation and dictionaries with tightly packed constant size bins - Dietzfelbinger, Weidling - 2005
Citation Context: ... it is mentioned here together with the double hashing probe sequence, we believe that it refers to linear probing. A potentially more practical method is the “split and share” technique described in [4]. It can be used to achieve characteristics similar to those of linear probing, still using space $n^{\epsilon}$, for any given $\epsilon > 0$. The idea is to split the set of keys into many subsets of roughly the same ...

26 | On universal classes of extremely random constant-time hash functions - Siegel
Citation Context: ...-wise independence is sufficient to achieve essentially the same performance as in the fully random case. (We use n to denote the number of keys inserted into the hash table.) Another paper by Siegel [12] shows that evaluation of a hash function from a O(log n)-wise independent family requires time Ω(log n) unless the space used to describe the function is $n^{\Omega(1)}$. A family of functions is given that ...

23 | Sorting and Searching, vol. 3 of The Art of Computer Programming - Knuth - 1973

18 | The Analysis of Closed Hashing under Limited Randomness (Extended Abstract) - Schmidt, Siegel - 1990
Citation Context: ...o [...] double hashing and open addressing.” 1.1 Previous results using limited randomness. The first analysis of linear probing relying only on limited randomness was given by Siegel and Schmidt in [11, 13]. Specifically, they show that O(log n)-wise independence is sufficient to achieve essentially the same performance as in the fully random case. (We use n to denote the number of keys inserted into th...

17 | Cuckoo hashing - Pagh, Rodler - 2004
Citation Context: ...nds are a factor Ω( full independence. (The exponent can be made arbitrarily close to zero by increasing the independence of the hash function.) 1.3. Significance. Several recent experimental studies [1, 4, 8] have found linear probing to be the fastest hash table organization for moderate load factors (30-70%). While linear probing operations are known to require more instructions than those of other open...

12 | Graph and hashing algorithms for modern architectures: Design and performance - Black, Martel, et al. - 1998
Citation Context: ... searches is conducted, showing that the expected number of probes made during a search for a random element in the table is less than $1 + \frac{2}{1-\alpha}$. 1.3 Significance. Several recent experimental studies [1, 5, 9] have found linear probing to be the fastest hash table organization for moderate load factors (30-70%). While linear probing operations are known to require more instructions than those of other open...

11 | Strongly history-independent hashing with applications - Blelloch, Golovin - 2007
Citation Context: ... CPU registers, can be used to give provably good expected performance. The work of the present paper has been built upon in designing hash tables with additional considerations. Blelloch and Golovin [2] described a linear probing hash table implementation that is strongly history independent. Thorup [15] studied how to get efficient compositions of hash functions for linear probing when the domain o...

10 | How caching affects hashing - Heileman, Luo - 2005
Citation Context: ... searches is conducted, showing that the expected number of probes made during a search for a random element in the table is less than $1 + \frac{2}{1-\alpha}$. 1.3 Significance. Several recent experimental studies [1, 5, 9] have found linear probing to be the fastest hash table organization for moderate load factors (30-70%). While linear probing operations are known to require more instructions than those of other open...

8 | String hashing for linear probing - Thorup - 2009
Citation Context: ...has been built upon in designing hash tables with additional considerations. Blelloch and Golovin [2] described a linear probing hash table implementation that is strongly history independent. Thorup [15] studied how to get efficient compositions of hash functions for linear probing when the domain of keys is complex, like the set of variable-length strings. 2. Preliminaries. 2.1. Notation and definit...

6 | Notes on “open” addressing - Knuth - 1963
Citation Context: ...n a greedy fashion (ensuring that no key x is moved beyond h(x)), until a vacant array position is encountered. Linear probing dates back to 1954, but was first analyzed by Knuth in a 1963 memorandum [7] now considered to be the birth of the area of analysis of algorithms [10]. Knuth’s analysis, as well as most of the work that has since gone into understanding the properties of linear probing, is ba...

6 | Closed hashing is computable and optimally randomizable with universal hash functions - Siegel, Schmidt - 1995
Citation Context: ...o [...] double hashing and open addressing.” 1.1 Previous results using limited randomness. The first analysis of linear probing relying only on limited randomness was given by Siegel and Schmidt in [11, 13]. Specifically, they show that O(log n)-wise independence is sufficient to achieve essentially the same performance as in the fully random case. (We use n to denote the number of keys inserted into th...

3 | Special issue on average case analysis of algorithms - Prodinger, Szpankowski (eds.) - 1998
Citation Context: ... is not present in the data structure. Linear probing dates back to 1954, but was first analyzed by Knuth in a 1963 memorandum [8] now considered to be the birth of the area of analysis of algorithms [10]. Knuth’s analysis, as well as most of the work that has since gone into understanding the properties of linear probing, is based on the assumption that h is a truly random function. In 1977, Carter a...

3 | Double hashing is computable and randomizable with universal hash functions, submitted - Schmidt, Siegel - 1995
Citation Context: ...e analysis to [...] double hashing and open addressing.” 1.1 Previous results using limited randomness. The first analysis of linear probing relying only on limited randomness was given by Siegel in [11, 12]. Specifically, he shows that O(log n)-wise independence is sufficient to achieve essentially the same performance as in the fully random case. However, another paper by Siegel [13] shows that evaluat...

1 | Special issue on average case analysis of algorithms - Prodinger, Szpankowski (eds.) - 1998
Citation Context: ... vacant array position is encountered. Linear probing dates back to 1954, but was first analyzed by Knuth in a 1963 memorandum [7] now considered to be the birth of the area of analysis of algorithms [10]. Knuth’s analysis, as well as most of the work that has since gone into understanding the properties of linear probing, is based on the assumption that h is a truly random function. In 1977, Carter a...

1 | Randomized algorithms and pseudorandom numbers - Karloff, Raghavan - 1993
Citation Context: ...andom accesses while computing the hash function value may destroy this advantage. According to our knowledge, the first paper in analysis of algorithms where exactly 5-wise independence appeared was [6]. They study a version of Quicksort that uses a 5-wise independent pseudorandom number generator. 1.2. Our results. We show in this paper that linear probing using a pairwise independent family may ha...