Results 1 – 8 of 8
Less hashing, same performance: Building a better Bloom filter
In Proc. of the 14th Annual European Symposium on Algorithms (ESA 2006), 2006
"... ABSTRACT: A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + ih2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, on ..."
Abstract

Cited by 28 (3 self)
A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + i·h2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for ...
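The two-hash trick described in this abstract is straightforward to sketch. Below is a minimal, illustrative Bloom filter in which the k bit indices are derived as gi(x) = h1(x) + i·h2(x) mod m; deriving h1 and h2 from a single SHA-256 digest, and the specific parameters, are assumptions of this sketch rather than anything the paper prescribes.

```python
import hashlib

class TwoHashBloomFilter:
    """Bloom filter sketch using the g_i(x) = h1(x) + i*h2(x) trick.
    Assumes m is a power of two so that an odd h2 is coprime with m."""

    def __init__(self, m, k):
        self.m = m                          # number of bits
        self.k = k                          # number of simulated hash functions
        self.bits = bytearray((m + 7) // 8)

    def _h1_h2(self, item):
        # Derive two base hashes from one SHA-256 digest (an assumption of
        # this sketch; the paper only requires two random hash functions).
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big") % self.m
        h2 = int.from_bytes(d[8:16], "big") % self.m
        return h1, h2 | 1                   # force h2 odd => coprime with 2^j

    def _indices(self, item):
        h1, h2 = self._h1_h2(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(item))
```

As in any Bloom filter, membership tests have no false negatives; the paper's claim is that this two-hash simulation does not hurt the asymptotic false positive rate.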
On Universal Classes of Extremely Random Constant Time Hash Functions and Their Time-Space Tradeoff
"... A family of functions F that map [0; n] 7! [0; n], is said to be hwise independent if any h points in [0; n] have an image, for randomly selected f 2 F , that is uniformly distributed. This paper gives both probabilistic and explicit randomized constructions of n ffl wise independent functions, ..."
Abstract

Cited by 25 (0 self)
A family of functions F that map [0, n] → [0, n] is said to be h-wise independent if any h points in [0, n] have an image, for a randomly selected f ∈ F, that is uniformly distributed. This paper gives both probabilistic and explicit randomized constructions of n^ε-wise independent functions, ε < 1, that can be evaluated in constant time for the standard random access model of computation. Simple extensions give comparable behavior for larger domains. As a consequence, many probabilistic algorithms can for the first time be shown to achieve their expected asymptotic performance for a feasible model of computation. This paper also establishes a tight tradeoff in the number of random seeds that must be precomputed for a random function that runs in time T and is h-wise independent. Categories and Subject Descriptors: E.2 [Data Storage Representations]: Hash-table representations; F.1.2 [Modes of Computation]: Probabilistic computation; F.2.3 [Tradeoffs among Computational Measures]...
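For contrast with the constant-time constructions this abstract describes, the classical way to obtain h-wise independence is a random polynomial of degree h−1 over a prime field; its Θ(h) evaluation time is exactly the cost the paper's constructions avoid. The sketch below is illustrative, with assumed parameters (Mersenne prime p, range size m); reducing mod m introduces a small non-uniformity when m does not divide p.

```python
import random

def make_kwise_hash(k, p=2**61 - 1, m=1024, rng=random):
    """Sample one member of a k-wise independent family: a random
    degree-(k-1) polynomial over GF(p), reduced mod m. Evaluation
    costs Theta(k), unlike the constant-time families in the paper."""
    coeffs = [rng.randrange(p) for _ in range(k)]

    def h(x):
        # Horner evaluation of the polynomial at x, mod p, then mod m.
        acc = 0
        for c in coeffs:
            acc = (acc * x + c) % p
        return acc % m

    return h
```

A sampled function is deterministic once its coefficients are fixed, so repeated evaluations agree; independence holds over the random choice of coefficients.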
Toward a usable theory of Chernoff Bounds for heterogeneous and partially dependent random variables
1992
"... Let X be a sum of real valued random variables and have a bounded mean E[X]. The generic ChernoffHoeffding estimate for large deviations of X is: P rfX \GammaE[X ] ag min 0 e \Gamma(a+E[X]) E[e X ], which applies with a 0 to random variables with very small tails. At issue is how to use this ..."
Abstract

Cited by 6 (1 self)
Let X be a sum of real-valued random variables with a bounded mean E[X]. The generic Chernoff-Hoeffding estimate for large deviations of X is: Pr{X − E[X] ≥ a} ≤ min_{λ≥0} e^{−λ(a+E[X])} E[e^{λX}], which applies with a ≥ 0 to random variables with very small tails. At issue is how to use this method to attain sharp and useful estimates. We present a number of Chernoff-Hoeffding bounds for sums of random variables that may have a variety of dependence relationships and that may be heterogeneously distributed. AMS classifications: 60F10 Large deviations; 68Q25 Analysis of algorithms; 62E17 Approximations to distributions (nonasymptotic); 60E15 Inequalities. Key words: Hoeffding bounds, Chernoff bounds, dependent random variables, Bernoulli trials. This research was supported, in part, by grants NSF-CCR-8902221, NSF-CCR-8906949, and NSF-CCR-9204202.

1 Summary

In the analysis of probabilistic algorithms, some of the following problems may arise, possibly in complex combinations....
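The generic bound can be checked numerically. The sketch below evaluates min over a grid of λ of e^{−λ(a+E[X])} E[e^{λX}] for a binomial X, whose moment generating function is (1 − p + p·e^λ)^n, and compares it against the exact tail. The grid search over λ is an implementation convenience of this sketch, not part of the method.

```python
import math

def chernoff_tail_bound(n, p, a, lambdas=None):
    """Generic Chernoff-Hoeffding bound on P{X - E[X] >= a} for
    X ~ Binomial(n, p), minimized over a grid of lambda values.
    Uses E[e^{lX}] = (1 - p + p e^l)^n, the binomial MGF."""
    if lambdas is None:
        lambdas = [i / 100 for i in range(1, 500)]
    mean = n * p
    best = 1.0                              # lambda = 0 gives the trivial bound 1
    for l in lambdas:
        mgf = (1 - p + p * math.exp(l)) ** n
        best = min(best, math.exp(-l * (a + mean)) * mgf)
    return best

def exact_binomial_tail(n, p, a):
    """Exact P{X - E[X] >= a} by summing the binomial pmf."""
    mean = n * p
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1) if x - mean >= a)
```

Since the inequality holds for every λ ≥ 0, the grid minimum is always a valid upper bound on the exact tail, just possibly slightly looser than the true optimum over λ.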
Closed Hashing is Computable and Optimally Randomizable with Universal Hash Functions
"... Universal hash functions that exhibit c log nwise independence are shown to give a performance in double hashing, uniform hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1 1\Gammaff +O( 1 n ) for the insertion of the ffnth item into a ta ..."
Abstract

Cited by 6 (1 self)
Universal hash functions that exhibit c log n-wise independence are shown to give a performance in double hashing, uniform hashing, and virtually any reasonable generalization of double hashing that has an expected probe count of 1/(1−α) + O(1/n) for the insertion of the αn-th item into a table of size n, for any fixed α < 1. This performance is optimal. These results are derived from a novel formulation that overestimates the expected probe count by underestimating the presence of local items already inserted into the hash table, and from a very sharp analysis of the underlying stochastic structures formed by colliding items. Analogous bounds are attained for the expected r-th moment of the probe count, for any fixed r, and linear probing is also shown to achieve a performance with universal hash functions that is equivalent to the fully random case. Categories and Subject Descriptors: E.1 [Data]: Data Structures: arrays; tables; E.2 [Data]: Data Storage Representations: ha...
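The probe-count behavior is easy to observe empirically. The sketch below inserts αn keys into a prime-sized open-addressed table with double hashing and reports the average probe count; fully random choices stand in for the c log n-wise independent families the abstract analyzes. Note the distinction: the marginal cost of the αn-th insertion is the ~1/(1−α) quantity quoted above, while the average over all insertions up to load α tends to (1/α)·ln(1/(1−α)).

```python
import random

def double_hash_avg_probes(n, alpha, rng):
    """Insert alpha*n random keys into an open-addressed table of
    size n (n must be prime) using double hashing; return the
    average number of probes per insertion."""
    table = [None] * n
    probes = 0
    num_items = int(alpha * n)
    for key in range(num_items):
        h1 = rng.randrange(n)
        h2 = 1 + rng.randrange(n - 1)   # nonzero step; n prime => full cycle
        i = h1
        probes += 1
        while table[i] is not None:     # probe until an empty slot is found
            i = (i + h2) % n
            probes += 1
        table[i] = key
    return probes / num_items
```

For example, at α = 0.5 the average should land near 2·ln 2 ≈ 1.39 probes per insertion.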
Median Bounds and their Application
Journal of Algorithms, 1999
"... This paper addresses these issues in the following ways. First, a framework (Theorems 2.1 and 2.4) is presented for establishing median estimates. It is strong enough to prove, as simple corollaries, the two or three nontrivial median bounds (not so readily identified) in the literature. Second, se ..."
Abstract

Cited by 6 (0 self)
This paper addresses these issues in the following ways. First, a framework (Theorems 2.1 and 2.4) is presented for establishing median estimates. It is strong enough to prove, as simple corollaries, the two or three nontrivial median bounds (not so readily identified) in the literature. Second, several new median results are presented, which are all, apart from one, derived via this framework. Third, median estimates are shown to simplify the analysis of some probabilistic algorithms and processes. Applications include both divide-and-conquer calculations and tail bound estimates for monotone functions of weakly dependent random variables. In particular, a simple analysis is given for the log2 log2 n + O(1) probe cost for both successful and unsuccessful Interpolation Search, which is less than two probes worse than the best bound but much simpler. Median bounds are also used, for example, to attain a tail bound showing that n random numbers can be sorted in linear time with probability 1 − 2^{−cn}, for any fixed constant c. This result supports the design of a pipelined version of Ranade's Common PRAM emulation algorithm on an n·2^n butterfly network with only one column of 2^n processors, by showing that each processor can perform a sorting step that was previously distributed among n switches. The tenor of the majority of median estimates established in Section 2 is that whereas it may be difficult to prove that some explicit integral (or discrete sum) exceeds 1/2 by some tiny amount, it is often much easier to establish global shape-based characteristics of a function, such as the number of zeros in some interval, or an inequality of the form f < g, by taking a mix of derivatives and logarithmic derivatives to show that, say, f and g both begin at zer...
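The Interpolation Search mentioned above probes the position suggested by linear interpolation between the endpoint keys, which is what yields the log2 log2 n + O(1) expected probe cost on uniform data. A minimal sketch, with probe counting added purely for illustration:

```python
def interpolation_search(arr, target):
    """Interpolation search over a sorted list of integers.
    Returns (index or None, number of probes made)."""
    lo, hi = 0, len(arr) - 1
    probes = 0
    while lo <= hi and arr[lo] <= target <= arr[hi]:
        if arr[hi] == arr[lo]:
            pos = lo
        else:
            # Probe the position suggested by linear interpolation
            # between the keys at the two endpoints.
            pos = lo + (hi - lo) * (target - arr[lo]) // (arr[hi] - arr[lo])
        probes += 1
        if arr[pos] == target:
            return pos, probes
        if arr[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return None, probes
```

On perfectly uniform keys (e.g. 0..n−1) the very first interpolation is exact, so a successful search costs one probe; skewed keys need more.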
Double Hashing is Computable and Randomizable with Universal Hash Functions
"... Universal hash functions that exhibit c log nwise independence are shown to give a performance in double hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1/(1alpha) + epsilon for the insertion of the alpha nth item into a table of size n, f ..."
Abstract

Cited by 3 (1 self)
Universal hash functions that exhibit c log n-wise independence are shown to give a performance in double hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1/(1−α) + ε for the insertion of the αn-th item into a table of size n, for any fixed α < 1 and ε > 0. This performance is within ε of optimal. These results are derived from a novel formulation that overestimates the expected probe count by underestimating the presence of partial items already inserted into the hash table, and from a sharp analysis of the underlying stochastic structures formed by colliding items.
Hash-Based Data Structures for Extreme Conditions
2008
"... This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both an ..."
Abstract
This thesis is about the design and analysis of Bloom filter and multiple-choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query-intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple-choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially ...
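The multiple-choice idea in the last paragraph can be illustrated with the simplest "power of two choices" experiment: each item examines two random buckets and joins the less loaded one, which concentrates the maximum load around log2 log n rather than the ~log n / log log n of single-choice placement. Fully random choices and the parameters below are illustrative stand-ins for the hash functions a hardware design would use.

```python
import random

def two_choice_max_load(n_items, n_buckets, rng):
    """Throw n_items balls into n_buckets bins, each ball choosing
    the less loaded of two uniformly random bins; return the
    maximum bin load."""
    loads = [0] * n_buckets
    for _ in range(n_items):
        b1 = rng.randrange(n_buckets)
        b2 = rng.randrange(n_buckets)
        target = b1 if loads[b1] <= loads[b2] else b2
        loads[target] += 1
    return max(loads)
```

With 10,000 items in 10,000 buckets the maximum load is typically 3 or 4, versus roughly 6 to 8 for single-choice placement at the same scale.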