Results 1–8 of 8
Linear probing with constant independence
In STOC ’07: Proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing, 2007
Abstract

Cited by 14 (2 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Carter and Wegman, in their seminal paper on universal hashing, already raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space- and time-efficient hash function that provably ensures good performance for linear probing.
Tabulation-Based 5-Universal Hashing and Linear Probing
Abstract

Cited by 3 (3 self)
Previously [SODA ’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small precomputed 4-universal tables. This led to a fivefold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the precomputed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation. Relatively, this leads to even bigger gains, since the direct methods for 5-universal hashing use degree 4 polynomials. Experimentally, we find that our method can gain up to an order of magnitude in speed over direct 5-universal hashing. Some of the most popular randomized algorithms have been proved to have the desired expected running time using …
LINEAR PROBING WITH 5-WISE INDEPENDENCE ∗
Abstract

Cited by 1 (1 self)
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key, value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space-consuming hash functions, or on the unrealistic assumption of free access to a hash function with random and independent function values. Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a 2-wise independent hash function may have expected logarithmic cost per operation. Recently, Pătrașcu and Thorup have shown that 3- and 4-wise independent hash functions may also give rise to logarithmic expected query time. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space- and time-efficient hash function that provably ensures good performance for hashing with linear probing.
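The data structure analyzed in this abstract can be sketched in a few lines. The following is a minimal illustration of hashing with linear probing, not the paper's implementation; the table size and the multiplicative hash used in the usage example are arbitrary choices.

```python
class LinearProbingTable:
    """Sketch of a hash table with linear probing for (key, value) pairs."""

    def __init__(self, capacity, hash_fn):
        self.capacity = capacity
        self.hash_fn = hash_fn          # e.g. a 5-wise independent hash
        self.slots = [None] * capacity  # each slot holds a (key, value) pair

    def _probe(self, key):
        # Scan slots h(key), h(key)+1, ... (mod capacity) until the key
        # itself or an empty slot is found.
        i = self.hash_fn(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity
        return i

    def insert(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def lookup(self, key):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else None
```

With a 5-wise independent choice of `hash_fn`, the abstract's result gives constant expected cost per operation; with only a 2-wise independent family, the cost can degrade to expected logarithmic.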
Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation ∗
Abstract
In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree 4 polynomial over a prime field containing the key domain [n] = {0, ..., n−1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c−1 lookups in tables of size O(n^(1/c)). In experiments on different computers, our scheme gained factors 1.8 to 10 in speed over the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.

1 Introduction. We consider “k-independent hashing” in the classic framework of Carter and Wegman [32]. For any i ≥ 1, let [i] = {0, 1, ..., i − 1}. We consider “hash” functions from “keys” in [n] to “hash values” in [m]. A class H of hash functions is k-independent if for any distinct x0, ..., xk−1 ∈ [n] and any possibly identical …
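For contrast with the table-based scheme this paper proposes, the classic polynomial construction it benchmarks against can be sketched as follows. This is a hedged illustration: the Mersenne prime `p` and output range `m` are example choices, not values from the paper.

```python
import random

def make_k_independent_hash(k, p=2**61 - 1, m=2**20):
    """Sketch of the classic k-independent hash: a random degree-(k-1)
    polynomial evaluated over a prime field, reduced to the range [m]."""
    coeffs = [random.randrange(p) for _ in range(k)]

    def h(x):
        # Horner evaluation of coeffs[0] + coeffs[1]*x + ... mod p
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % p
        return acc % m

    return h
```

Taking k = 5 yields the degree-4-polynomial scheme mentioned above; each evaluation costs four multiplications, which is exactly what the multiplication-free tabulation method avoids.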
Derandomization, Hashing and Expanders
Abstract
Regarding the complexity of computation, randomness is a significant resource besides time and space. Particularly from a theoretical viewpoint, it is a fundamental question whether the availability of random numbers gives any additional power. Most randomized algorithms are analyzed under the assumption that independent and unbiased random bits are accessible. However, truly random bits are scarce in reality. In practice, pseudorandom generators are used in place of random numbers; usually, even the seed of the generator does not come from a source of true randomness. While things mostly work well in practice, there are occasional problems with the use of weak pseudorandom generators. Further, randomized algorithms are not suited for applications where reliability is a key concern. Derandomization is the process of minimizing the use of random bits, either to small amounts or removing them altogether. We may identify two lines of work in this direction. There has been a lot of work in designing general tools for simulating randomness and making deterministic versions of randomized algorithms, …
The Power of Simple Tabulation Hashing
Mihai Pătrașcu, AT&T Labs, 2011
Abstract
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC ’77). Keys are viewed as consisting of c characters. We initialize c tables T1, ..., Tc mapping characters to random hash codes. A key x = (x1, ..., xc) is hashed to T1[x1] ⊕ · · · ⊕ Tc[xc], where ⊕ denotes xor. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing. An important target of the analysis of algorithms is to determine whether there exist practical schemes which enjoy mathematical guarantees on performance. Hashing and hash tables are one of the most common inner loops in real-world computation, and are even built-in “unit cost” operations in high-level programming languages that offer associative arrays. Often, …
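The scheme described in this abstract is short enough to write out directly. Below is a sketch with illustrative parameters (c = 4 eight-bit characters and 32-bit hash codes), not the authors' code.

```python
import random

def make_tabulation_hash(c=4, char_bits=8, hash_bits=32):
    """Simple tabulation: split a key into c characters, look each one up
    in its own table of random hash codes, and xor the results."""
    mask = (1 << char_bits) - 1
    tables = [[random.getrandbits(hash_bits) for _ in range(1 << char_bits)]
              for _ in range(c)]

    def h(x):
        out = 0
        for i in range(c):
            # Extract character i of the key and xor in its table entry.
            out ^= tables[i][(x >> (i * char_bits)) & mask]
        return out

    return h
```

Each evaluation is c table lookups and xors with no multiplications, which is why tabulation is fast in practice even though filling the tables costs O(c · 2^char_bits) random words up front.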
Constant-round secure two-party computation from a linear number of oblivious transfers
, 2013
Abstract
We construct a protocol for constant-round Two-Party Secure Function Evaluation in the standard model which improves previous protocols in several ways. We are able to reduce the number of calls to Oblivious Transfer by a factor proportional to the security parameter. In addition to being more efficient than previous instantiations, our protocol only requires black-box calls to OT and Commitment. This is achieved by the use of a faulty variant of the Cut-and-Choose OT. The concepts of Garbling Schemes, faulty Cut-and-Choose Oblivious Transfer, and Privacy Amplification are combined using the Cut-and-Choose paradigm to obtain the final protocol.
Large-Scale Learning with Less RAM via Randomization
Abstract
We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding. Compared to standard 32-bit float encodings, this reduces RAM usage by more than 50% during training and by up to 95% when making predictions from a fixed model, with almost no loss in accuracy. We also show that randomized counting can be used to implement per-coordinate learning rates, improving model quality with little additional RAM. We prove these memory-saving methods achieve regret guarantees similar to their exact variants. Empirical evaluation confirms excellent performance, dominating standard approaches across memory-versus-accuracy tradeoffs.
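The core trick this abstract relies on, unbiased randomized rounding onto a coarse grid, can be sketched as below. The grid resolution `eps` is an illustrative parameter, not a value from the paper.

```python
import random

def randomized_round(w, eps=1.0 / 256):
    """Round w to one of the two nearest multiples of eps so that the
    rounded value equals w in expectation (unbiased rounding)."""
    lo = (w // eps) * eps   # nearest grid point at or below w
    frac = (w - lo) / eps   # position between the two grid points, in [0, 1)
    return lo + eps if random.random() < frac else lo
```

Because E[randomized_round(w)] = w, coarsening the weights introduces variance but no systematic bias, which is what allows the exact methods' regret guarantees to carry over in modified form.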