Results 1-3 of 3
On the k-independence required by linear probing and minwise independence
In Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP), 2010
Abstract

Cited by 5 (1 self)
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al., STOC’07]. For (1 + ε)-approximate minwise independence, we show that Ω(lg(1/ε))-independent hash functions are required, matching an upper bound of [Indyk, SODA’99]. We also show that the multiply-shift scheme of Dietzfelbinger, most commonly used in practice, fails badly in both applications.
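The multiply-shift scheme named in this abstract is commonly stated as h_a(x) = ((a · x) mod 2^w) >> (w − l), where a is a random odd w-bit multiplier and l is the number of output bits. The following is a minimal sketch under that standard formulation; the function name `make_multiply_shift` and its parameters are illustrative and do not come from the paper.

```python
import random


def make_multiply_shift(w=64, l=16, rng=random):
    """Return h(x) = ((a * x) mod 2^w) >> (w - l) for a random odd w-bit a.

    w: word size in bits (assumed key width); l: number of output bits.
    """
    a = rng.getrandbits(w) | 1   # force the multiplier to be odd
    mask = (1 << w) - 1          # reduction mod 2^w

    def h(x):
        return ((a * x) & mask) >> (w - l)

    return h


if __name__ == "__main__":
    h = make_multiply_shift(w=64, l=10, rng=random.Random(1))
    print([h(x) for x in range(5)])  # five bucket indices in [0, 1024)
```

The scheme needs only one multiplication and one shift per key, which is why it is popular in practice; the abstract's point is that this speed comes without the independence that linear probing and minwise hashing need.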
Tabulation Based 5-independent Hashing with Applications to Linear Probing and Second Moment Estimation
Abstract
In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree-4 polynomial over a prime field containing the key domain [n] = {0, ..., n−1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c−1 lookups in tables of size O(n^{1/c}). In experiments on different computers, our scheme gained factors of 1.8 to 10 in speed over the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.
1 Introduction. We consider “k-independent hashing” in the classic framework of Carter and Wegman [32]. For any i ≥ 1, let [i] = {0, 1, ..., i − 1}. We consider “hash” functions from “keys” in [n] to “hash values” in [m]. A class H of hash functions is k-independent if for any distinct x_0, ..., x_{k−1} ∈ [n] and any possibly identical
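The classic polynomial method that this abstract uses as its baseline can be sketched as follows: pick k random coefficients in a prime field and evaluate the resulting degree-(k−1) polynomial at the key. This is a minimal illustration, not the paper's tabulation scheme. The prime 2^61 − 1, the final reduction modulo m, and the names `poly_eval` and `make_poly_hash` are assumptions made for the example.

```python
import random

# Assumed field size: a Mersenne prime larger than the key domain [n].
P = (1 << 61) - 1


def poly_eval(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... (mod p) with Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc


def make_poly_hash(k=5, m=1 << 16, rng=random):
    """Return a hash from keys in [P] to [m] built from a random
    degree-(k-1) polynomial over the field of size P."""
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        # The polynomial value is k-independent over [P]; reducing it
        # mod m adds a small bias because m does not divide P.
        return poly_eval(coeffs, x, P) % m

    return h


if __name__ == "__main__":
    h = make_poly_hash(k=5, m=64, rng=random.Random(7))
    print([h(x) for x in range(5)])  # five bucket indices in [0, 64)
```

With k = 5 each evaluation costs four multiplications and four modular reductions, which is the cost the tabulation-based scheme replaces with table lookups.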
The Power of Simple Tabulation Hashing
Mihai Pătrașcu (AT&T Labs), 2011
Abstract
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC’77). Keys are viewed as consisting of c characters. We initialize c tables T1, ..., Tc mapping characters to random hash codes. A key x = (x1, ..., xc) is hashed to T1[x1] ⊕ · · · ⊕ Tc[xc], where ⊕ denotes xor. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, minwise hashing for estimating set intersection, and cuckoo hashing. An important target of the analysis of algorithms is to determine whether there exist practical schemes that enjoy mathematical guarantees on performance. Hashing and hash tables are among the most common inner loops in real-world computation, and are even built-in “unit cost” operations in high-level programming languages that offer associative arrays. Often,
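The scheme described in this abstract, c random tables combined by xor, can be sketched in a few lines. This is a minimal illustration assuming 32-bit keys split into c = 4 characters of 8 bits each; the function name `make_simple_tabulation` and its parameters are hypothetical and chosen only for the example.

```python
import random


def make_simple_tabulation(c=4, char_bits=8, out_bits=32, rng=random):
    """Return h(x) = T1[x1] xor ... xor Tc[xc], where x1..xc are the
    c characters (char_bits bits each) of the key x."""
    table_size = 1 << char_bits
    # One table of random hash codes per character position.
    tables = [
        [rng.getrandbits(out_bits) for _ in range(table_size)]
        for _ in range(c)
    ]
    char_mask = table_size - 1

    def h(x):
        out = 0
        for i in range(c):
            char = (x >> (i * char_bits)) & char_mask  # i-th character
            out ^= tables[i][char]
        return out

    return h


if __name__ == "__main__":
    h = make_simple_tabulation(rng=random.Random(3))
    # Four keys forming a "rectangle" in two character positions:
    # their hash codes always xor to zero, so they are not independent.
    print(h(0x0000) ^ h(0x0001) ^ h(0x0100) ^ h(0x0101))  # prints 0
```

The final line shows concretely why the scheme is not 4-independent: every table entry involved appears an even number of times across the four keys, so the xor cancels regardless of the random table contents.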