## Dispersing Hash Functions (2000)

### Other Repositories/Bibliography

Venue: Proceedings of the 4th International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM ’00), volume 8 of Proceedings in Informatics

Citations: 3 (3 self)

### BibTeX

@INPROCEEDINGS{Pagh00dispersinghash,
    author = {Rasmus Pagh},
    title = {Dispersing Hash Functions},
    booktitle = {Proceedings of the 4th International Workshop on Randomization and Approximation Techniques in Computer Science (RANDOM ’00), volume 8 of Proceedings in Informatics},
    year = {2000},
    pages = {53--67}
}

### Abstract

A new hashing primitive is introduced: dispersing hash functions. A family of hash functions F is dispersing if, for any set S of a certain size and random h ∈ F, the expected value of |S| − |h[S]| is not much larger than the expectation if h had been chosen at random from the set of all functions. We give tight, up to a logarithmic factor, upper and lower bounds on the size of dispersing families. Such families previously studied, for example universal families, are significantly larger than the smallest dispersing families, making them less suitable for derandomization. We present several applications of dispersing families to derandomization (fast element distinctness, set inclusion, and static dictionary initialization). Also, a tight relationship between dispersing families and extractors, which may be of independent interest, is exhibited. We also investigate the related issue of program size for hash functions which are nearly perfect. In particular, we exhibit a dramatic increase in program size for hash functions more dispersing than a random function.
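To make the quantity in the abstract concrete, the following is a minimal sketch (not code from the paper) that computes the expected value of |S| − |h[S]| by brute force for a tiny universe. The function names, the choice S = (0, 1, 2), and the "balanced" family (functions mapping exactly two of six universe elements to each of three range values) are illustrative assumptions. The closed form n − r(1 − (1 − 1/r)^n) is the standard expectation for a uniformly random function.

```python
from fractions import Fraction
from itertools import product


def collisions(h, S):
    """C(S, h) = |S| - |h[S]|, with h given as a tuple indexed by universe element."""
    return len(S) - len({h[x] for x in S})


def expected_collisions_random(n, r):
    """Exact expectation of C(S, h) for a uniformly random h and |S| = n, range size r."""
    return n - r * (1 - (1 - Fraction(1, r)) ** n)


def expected_collisions_family(family, S):
    """Exact expectation of C(S, h) when h is drawn uniformly from an explicit family."""
    return Fraction(sum(collisions(h, S) for h in family), len(family))


# Hypothetical tiny parameters: universe {0..5}, range {0,1,2}, S of size 3.
u, r = 6, 3
S = (0, 1, 2)

# All r^u functions from the universe to the range.
all_functions = list(product(range(r), repeat=u))

# Illustrative "balanced" family: every range value has exactly u/r preimages.
balanced = [h for h in all_functions if all(h.count(v) == u // r for v in range(r))]

print(expected_collisions_random(len(S), r))         # baseline for a random function
print(expected_collisions_family(all_functions, S))  # brute-force check of the baseline
print(expected_collisions_family(balanced, S))       # balanced family does slightly better
```

A family is dispersing, in the abstract's sense, when its expectation is not much larger than the first printed value. For these toy parameters the balanced family comes out below the random-function baseline, which illustrates how a family can be marginally "more dispersing than a random function".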

### Citations

667 |
Universal classes of hash functions
- Carter, Wegman
- 1979
Citation Context ... which are nearly perfect. In particular, we exhibit a dramatic increase in program size for hash functions more dispersing than a random function. 1 Introduction Universal families of hash functions [1], widely used in various areas of computer science (data structures, derandomization, cryptology), have the property, among other things, that any set S is dispersed by a random function from the fami... |

217 |
Storing a sparse table with O(1) worst case access time
- Fredman, Komlós, et al.
- 1984
Citation Context ...tions. In other words, storing a "near-perfect" hash function is nearly as expensive as storing a perfect one. 1.1 Related work The dispersion property of universal families was shown and first used in [6]. It has since found application in several papers [4, 9, 13]. Another formulation of the dispersion property of a family {h_i} is that E_i |h_i[S]| should be "large". The definition of a disper... |

143 |
Data Structures and Algorithms 1: Sorting and Searching
- Mehlhorn
- 1984
Citation Context ...o dispersers. It also covers the stronger notion of extractors, where the requirement is near-uniformity of the random variable h_i(x), for uniformly and independently chosen h_i and x ∈ S. Mehlhorn [10] has given tight bounds (up to a constant factor) on the number of bits needed to represent perfect and universal hash functions, i.e. determined the size of such families up to a polynomial (see also... |

139 |
Recent developments in explicit constructions of extractors
- Shaltiel
Citation Context ... As we will see in Section 4, extractors are closely related to dispersing hash function families. Construction of extractors and dispersers has been intensely researched in recent years. We refer to [20] and the references therein for an overview. The extractors considered in this paper are seeded extractors, and not the kind of deterministic extractors considered in some recent works on randomness e... |

136 | Should tables be sorted
- Yao
- 1981
Citation Context ...t et al. [4]. In an implicit dictionary, the elements of S are placed in an array of n words. A result of Yao states that without extra memory, a lookup requires log n table lookups in the worst case [19]. The question considered in [4] is how little extra memory is needed to enable constant worst case lookup time. The information stored outside the table in their construction is the description of (e... |

87 | Loss-less condensers, unbalanced expanders, and extractors
- Ta-Shma, Umans, et al.
- 2001
Citation Context ...nimum seed length of an O(1)-dispersing family, which is roughly O(log log u + log(r/n)). For constant ε (applicable when r = O(n)), the best known seed length is O(log log u + (log log n)^{2+o(1)}) [21]. In particular, when log log u ≥ (log log n)^{2+Ω(1)} this gives an O(1)-dispersing family of functions with r = O(n) that has size (log u)^{O(1)}, which is polynomial in the lower bound of Theorem 2. 4... |

78 | Extracting all the randomness and reducing the error in Trevisan’s extractors
- Raz, Reingold, et al.
- 1999
Citation Context ...e f'' ∈ F'' is O(log(2^{log(r) − t}) + log log u) = O(log(r/n) + log log u). 2 The best explicit extractor in current literature with the required parameters has seed length s = O((log log(u r/n))^3) [16]. 5 Applications The model of computation used for our applications is a unit cost RAM with word size w. We assume that the RAM has a special instruction which, given the parameters of a dispersing... |

56 |
Extracting randomness: How and why a survey
- Nisan
- 1996
Citation Context ...|S| or smaller (whereas we will be interested in |R| ≥ |S|), and "large" means greater than (1 − ε)|R|, for some choice of parameter ε (while we can only hope for some fraction of |S|). Nisan's survey [12] gives a good introduction to dispersers. It also covers the stronger notion of extractors, where the requirement is near-uniformity of the random variable h_i(x), for uniformly and independently cho... |

49 | Low redundancy in static dictionaries with constant query time
- Pagh
- 2001
Citation Context ...a dispersing one immediately gives an improved result. 1.1 Related work The dispersion property of universal families was shown and first used in [7]. It has since found application in several papers [6, 11, 16]. Mehlhorn [12] has given tight bounds (up to a constant factor) on the number of bits needed to represent universal hash functions, i.e., determined the size of such families up to a polynomial. Anot... |

45 |
On the size of separating systems and families of perfect hash functions
- Fredman, Komlós
- 1984
Citation Context ...has given tight bounds (up to a constant factor) on the number of bits needed to represent perfect and universal hash functions, i.e. determined the size of such families up to a polynomial (see also [5, 15]). 1.2 Notation In the following, S denotes a subset of U = {1, …, u}, |S| = n, and we consider functions from U to R = {1, …, r} where r ≥ n > 1 and u ≥ 2r. The set of all functions from U ... |

38 |
Storing a sparse table with O(1) worst case access time
- Fredman, Komlós, Szemerédi
- 1984
Citation Context ...he literature where replacing a universal family with a dispersing one immediately gives an improved result. 1.1 Related work The dispersion property of universal families was shown and first used in [7]. It has since found application in several papers [6, 11, 16]. Mehlhorn [12] has given tight bounds (up to a constant factor) on the number of bits needed to represent universal hash functions, i.e.,... |

34 | Extremal Combinatorics with Applications in Computer Science - Jukna - 2001 |

31 |
On the Program Size of Perfect and Universal Hash Functions
- Mehlhorn
- 1982
Citation Context ...iately gives an improved result. 1.1 Related work The dispersion property of universal families was shown and first used in [7]. It has since found application in several papers [6, 11, 16]. Mehlhorn [12] has given tight bounds (up to a constant factor) on the number of bits needed to represent universal hash functions, i.e., determined the size of such families up to a polynomial. Another way of stat... |

30 | Non-expansive hashing
- Linial, Sasson
- 1996
Citation Context ...unction is nearly as expensive as storing a perfect one. 1.1 Related work The dispersion property of universal families was shown and first used in [6]. It has since found application in several papers [4, 9, 13]. Another formulation of the dispersion property of a family {h_i} is that E_i |h_i[S]| should be "large". The definition of a disperser is similar to this in that one requires |∪_i h_i[S]| to... |

29 |
Deterministic sorting in O(n log log n) time and linear space
- Han
Citation Context ... for these problems require Ω(n log n) time, and performing a relational join is easily reduced to sorting. The currently best deterministic linear-space sorting algorithm runs in time O(n log log n) [9]. Using randomization and universal hashing, the time for relational joins can be improved to O(n), expected, using linear space. The number of random bits used for this is Ω(log n + log w). We now sh... |

27 |
Polynomial hash functions are reliable (extended abstract)
- Dietzfelbinger, Gil, et al.
Citation Context ... using O(n) words of memory. The best known deterministic algorithm runs in time O(n log n) [14]. Randomized algorithms running in time O(n), can be made to use as few as O(log n + log w) random bits [2]. Here, we see how to achieve another trade-off, namely expected time O(n log^ε n), for any constant ε > 0, using O(log w) random bits. Randomized universe reduction Picking random functions from a ( n... |

21 | Low redundancy in static dictionaries with O(1) lookup time
- Pagh
- 1999
Citation Context ...unction is nearly as expensive as storing a perfect one. 1.1 Related work The dispersion property of universal families was shown and first used in [6]. It has since found application in several papers [4, 9, 13]. Another formulation of the dispersion property of a family {h_i} is that E_i |h_i[S]| should be "large". The definition of a disperser is similar to this in that one requires |∪_i h_i[S]| to... |

21 | Error reduction for extractors
- Raz, Reingold, et al.
- 1999
Citation Context ... is (1 − Ω(1))-close to uniform. ✷ It should be noted that there is an efficient explicit way of converting an extractor with nontrivial constant error into an extractor with almost any smaller error [16]. Unfortunately, this conversion slightly weakens other parameters, so the problem of constructing optimal extractors cannot be said to be quite the same as that of constructing optimal dispersing fam... |

19 |
Faster deterministic sorting and priority queues in linear space
- Thorup
- 1998
Citation Context ... size O(log^{O(ε)} n). Running through the functions we find in time O(n log^{O(ε)} n) a function h such that |h[S]| ≥ n − n/log n (the size of h[S] can be found by sorting in time O(n (log log n)^2), see [18]). Now choose S_1 ⊆ S maximally such that h is 1-1 on S_1. We have reduced our problem to two subproblems: A dictionary for S \ S_1 (which has size at most n/log n) and a dictionary for h[S_1] (which ... |

18 | Implicit O(1) probe search
- Fiat, Naor
- 1993
Citation Context ...ts. However, the only requirement on the function is that it is ( n) ; n; O(n); 2 w )-dispersing, so we can reduce the extra memory to O(log w) bits (this result was also shown in a follow-up paper [3], using an entirely new construction). 6 Existentially dispersing families By the result of section 3.1, we cannot expect C(S; h) =2 (or better) when picking h at random function from some family. ... |

17 |
Non-oblivious hashing
- Fiat, Naor, et al.
- 1992
Citation Context ...unction is nearly as expensive as storing a perfect one. 1.1 Related work The dispersion property of universal families was shown and first used in [6]. It has since found application in several papers [4, 9, 13]. Another formulation of the dispersion property of a family {h_i} is that E_i |h_i[S]| should be "large". The definition of a disperser is similar to this in that one requires |∪_i h_i[S]| to... |

15 |
Derandomizing complexity classes
- Miltersen
- 2001
Citation Context ...w that for t ≤ log n and ε > 0 there exist (n, ε)-extractors with s = O(log(log(u)/ε)). Much research effort is currently directed towards explicit construction of such functions (see e.g. the surveys [11, 12]). 5 Theorem 11 Suppose r is a power of 2, E : U × {0,1}^s → {0,1}^t is an (⌊n/2⌋, ε)-extractor, where ε = O(n/r), F' ⊆ (U → {0,1}^s) is strongly (1 + ε)-universal, and F'' ⊆ (U → {0,1}^{log(r) − t... |

14 | Large deviation inequalities for sums of indicator variables
- Janson
- 1994
Citation Context ... value 1 iff h(s_i) ∈ {h(s_1), …, h(s_{i−1})}. Clearly C(S, h) = ∑_i X_i. The random variables X_1, …, X_n are not independent; however, they are negatively related: 2 Definition 1 (Janson [7]) Indicator random variables (I_i)_{i=0}^n are negatively related if for each j there exist random variables (J_{ij})_{i=0}^n with distribution equal to the conditional distribution of (I_i)_{i=0}^n given ... |

11 |
A Note on Universal Classes of Hash Functions
- Sarwate
- 1980
Citation Context ...using the family of all "evenly distributing" functions. This is analogous to universal hash functions, where it is also possible to improve marginally upon the performance of a truly random function [17]. Example 1 Consider the case n = r = 3, u = 6, where = 8/9. If we pick a function at random from those mapping two elements of U to each element in the range, the expected number of collisions is 3... |

11 |
Balls and bins: A study in negative dependence
- Dubhashi, Ranjan
- 1998
Citation Context .... (2) 2r We now turn to giving tail bounds. Let S = {s_1, …, s_n} and let B_{i,k} be the indicator random variable that is 1 if h(s_k) = i. These random variables are known to be negatively associated [3]. Note that C(S, h) = ∑_i B_i, where B_i := max(0, |{k | h(s_k) = i}| − 1). Since the B_i are non-decreasing functions of disjoint subsets of the B_{i,k}, the B_i are also negatively associated. As shown in [... |

11 |
The k-th prime is greater than k(ln k + lnln k − 1) for k > 2
- Dusart
- 1999
Citation Context ...ic fact: The number of primes in the interval (m/2; m], denoted π(m) − π(m/2), is at least m/(3 ln m). This can be shown, for example, by using the upper and lower bounds on π(m/2) and π(m) of Dusart [4], for m > 45: π(m) − π(m/2) ≥ (m/ ln m) − 4 3 Since |x − y| has at most |x − y| with probability at most ln u ln(m/2) m/ ln(m/2) ≥ m/(3 ln m) . 2 prime divisors larger than m/2 this implies that p div... |

9 | Faster deterministic dictionaries
- Pagh
- 2000
Citation Context ... S ⊆ U = {0,1}^w, |S| = n, allowing constant time lookup of elements (plus any associated information) and using O(n) words of memory. The best known deterministic algorithm runs in time O(n log n) [14]. Randomized algorithms running in time O(n), can be made to use as few as O(log n + log w) random bits [2]. Here, we see how to achieve another trade-off, namely expected time O(n log^ε n), for any co... |

9 |
Improved Bounds for Covering Complete Uniform Hypergraphs
- Radhakrishnan
- 1992
Citation Context ...has given tight bounds (up to a constant factor) on the number of bits needed to represent perfect and universal hash functions, i.e. determined the size of such families up to a polynomial (see also [5, 15]). 1.2 Notation In the following, S denotes a subset of U = {1, …, u}, |S| = n, and we consider functions from U to R = {1, …, r} where r ≥ n > 1 and u ≥ 2r. The set of all functions from U ... |

5 |
Implicit O(1) probe search
- Fiat, Naor
- 1993
Citation Context ...s. However, the only requirement on the function is that it is, say, (2, n, O(n), 2^w)-dispersing, so we can reduce the extra memory to O(log w) bits. (This result was also shown in a follow-up paper [5], using an entirely different nonconstructive argument.) 6 Open problems The most obvious open problem is to find explicit dispersing families of close to minimal size for a wider range of paramete... |

2 |
Constructing efficient dictionaries in close to sorting time
- Ružić
- 2008
Citation Context ...mation) and using O(n) words of memory. The best known deterministic algorithm runs in time O(n log n) [8]. In the case u = n^{O(1)} there exists a construction algorithm running in time O(n log log n) [19]. Randomized algorithms running in time O(n) can be made to use as few as O(log n + log w) random bits [2]. Here we will show how to achieve another trade-off, namely expected time O(n log log n) usin... |