Less hashing, same performance: Building a better bloom filter
In Proc. of the 14th Annual European Symposium on Algorithms (ESA 2006), 2006
Cited by 30 (3 self)
ABSTRACT: A standard technique from the hashing literature is to use two hash functions $h_1(x)$ and $h_2(x)$ to simulate additional hash functions of the form $g_i(x) = h_1(x) + i\,h_2(x)$. We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for
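The construction in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class name `DoubleHashBloomFilter`, the use of SHA-256 to obtain the two base hashes, and the reduction modulo the filter size `m` are all choices made here for the example.

```python
import hashlib


class DoubleHashBloomFilter:
    """Bloom filter whose k probe positions come from only two base hashes.

    Assumption: h1 and h2 are taken from two 8-byte slices of one SHA-256
    digest; the abstract does not prescribe any particular hash function.
    """

    def __init__(self, m: int, k: int):
        self.m = m                  # number of bit positions
        self.k = k                  # number of simulated hash functions
        self.bits = bytearray(m)    # one byte per position, for clarity

    def _base_hashes(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _positions(self, item: str):
        # g_i(x) = h1(x) + i * h2(x), reduced modulo m, for i = 0 .. k-1
        h1, h2 = self._base_hashes(item)
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p] for p in self._positions(item))


if __name__ == "__main__":
    bf = DoubleHashBloomFilter(m=1024, k=5)
    bf.add("bloom")
    print("bloom" in bf)
```

Each lookup or insertion computes one digest and derives all k positions from it, which is where the saving over k independent hash computations would come from.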
On Universal Classes of Extremely Random Constant Time Hash Functions and Their Time-Space Tradeoff
Cited by 26 (0 self)
A family of functions F that map $[0, n] \to [0, n]$ is said to be h-wise independent if any h points in $[0, n]$ have an image, for randomly selected $f \in F$, that is uniformly distributed. This paper gives both probabilistic and explicit randomized constructions of $n^{\epsilon}$-wise independent functions, $\epsilon < 1$, that can be evaluated in constant time for the standard random access model of computation. Simple extensions give comparable behavior for larger domains. As a consequence, many probabilistic algorithms can for the first time be shown to achieve their expected asymptotic performance for a feasible model of computation. This paper also establishes a tight tradeoff in the number of random seeds that must be precomputed for a random function that runs in time T and is h-wise independent. Categories and Subject Descriptors: E.2 [Data Storage Representation]: Hash-table representation; F.1.2 [Modes of Computation]: Probabilistic Computation; F.2.3 [Tradeoffs among Computational Measures]...
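To make the definition of h-wise independence concrete, here is a sketch of the textbook polynomial family over a prime field: a random polynomial of degree h-1 modulo a prime p is h-wise independent on the domain {0, ..., p-1}. This is not the constant-time construction the abstract describes (evaluation here takes O(h) time), and the names `poly_hash` and `random_poly_hash` are hypothetical.

```python
import random


def poly_hash(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo the prime p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def random_poly_hash(h, p, rng):
    """Draw one member of the family: h random coefficients in [0, p)."""
    coeffs = [rng.randrange(p) for _ in range(h)]
    return lambda x: poly_hash(coeffs, x, p)


if __name__ == "__main__":
    f = random_poly_hash(h=4, p=101, rng=random.Random(0))
    print([f(x) for x in range(5)])
```

For p = 5 and h = 2, enumerating all 25 coefficient pairs shows that the images of any two distinct points take each of the 25 possible value pairs exactly once, which is the uniformity the definition asks for.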
Closed Hashing is Computable and Optimally Randomizable with Universal Hash Functions
Cited by 6 (1 self)
Universal hash functions that exhibit $c \log n$-wise independence are shown to give a performance in double hashing, uniform hashing and virtually any reasonable generalization of double hashing that has an expected probe count of $\frac{1}{1-\alpha} + O(\frac{1}{n})$ for the insertion of the $\alpha n$-th item into a table of size n, for any fixed $\alpha < 1$. This performance is optimal. These results are derived from a novel formulation that overestimates the expected probe count by underestimating the presence of local items already inserted into the hash table, and from a very sharp analysis of the underlying stochastic structures formed by colliding items. Analogous bounds are attained for the expected rth moment of the probe count, for any fixed r, and linear probing is also shown to achieve a performance with universal hash functions that is equivalent to the fully random case. Categories and Subject Descriptors: E.1 [Data]: Data Structures - arrays; tables; E.2 [Data]: Data Storage Representations - ha...
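The 1/(1 - alpha) figure can be checked empirically with a small simulation. The sketch below makes a simplifying assumption that differs from the abstract: it draws the start position and step of each item fully at random instead of computing them with universal hash functions of limited independence. The function names and the table size are illustrative only.

```python
import random


def insertion_probes(n: int, alpha: float, rng: random.Random) -> int:
    """Fill a table of prime size n to load alpha; return the probe count
    of the next insertion under double hashing."""
    table = [False] * n

    def insert() -> int:
        pos = rng.randrange(n)          # stands in for h1(x)
        step = rng.randrange(1, n)      # stands in for h2(x); nonzero, and n is
                                        # prime, so the sequence visits every slot
        probes = 1
        while table[pos]:
            pos = (pos + step) % n
            probes += 1
        table[pos] = True
        return probes

    for _ in range(int(alpha * n)):
        insert()
    return insert()


def mean_insertion_probes(n: int, alpha: float, trials: int, seed: int = 0) -> float:
    """Average the probe count of that insertion over independent trials."""
    rng = random.Random(seed)
    return sum(insertion_probes(n, alpha, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for alpha in (0.5, 0.75):
        measured = mean_insertion_probes(503, alpha, trials=500)
        print(alpha, round(measured, 2), "vs", round(1 / (1 - alpha), 2))
```

With a prime table size such as 503, the measured mean should land close to 1/(1 - alpha), for example close to 2 probes at half load.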
Toward a usable theory of Chernoff Bounds for heterogeneous and partially dependent random variables
1992
Cited by 6 (1 self)
Let X be a sum of real-valued random variables and have a bounded mean E[X]. The generic Chernoff-Hoeffding estimate for large deviations of X is: $\Pr\{X - E[X] \ge a\} \le \min_{\lambda \ge 0} e^{-\lambda(a + E[X])} E[e^{\lambda X}]$, which applies with $a \ge 0$ to random variables with very small tails. At issue is how to use this method to attain sharp and useful estimates. We present a number of Chernoff-Hoeffding bounds for sums of random variables that may have a variety of dependent relationships and that may be heterogeneously distributed. AMS classifications: 60F10, Large deviations; 68Q25, Analysis of algorithms; 62E17, Approximations to distributions (nonasymptotic); 60E15, Inequalities. Key words: Hoeffding bounds, Chernoff bounds, dependent random variables, Bernoulli trials. This research was supported, in part, by grants NSF-CCR-8902221, NSF-CCR-8906949, and NSF-CCR-9204202. 1 Summary In the analysis of probabilistic algorithms, some of the following problems may arise, possibly in complex combinations....
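The generic estimate quoted in this abstract follows from Markov's inequality applied to an exponential of X. The short derivation below is the standard argument, written with a free parameter $\lambda$ (the symbol is an assumption here, since the scraped text lost it):

```latex
\begin{align*}
\Pr\{X - E[X] \ge a\}
  &= \Pr\{ e^{\lambda X} \ge e^{\lambda (a + E[X])} \}
     && \text{for any } \lambda > 0 \\
  &\le e^{-\lambda (a + E[X])} \, E[e^{\lambda X}]
     && \text{(Markov's inequality)}
\end{align*}
```

Since the inequality holds for every $\lambda > 0$, and is trivially true at $\lambda = 0$ where the right-hand side equals 1, taking the minimum over $\lambda \ge 0$ gives the tightest bound of this form.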
bounds and their application
Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, 1999
Double Hashing is Computable and Randomizable with Universal Hash Functions
Cited by 3 (1 self)
Universal hash functions that exhibit $c \log n$-wise independence are shown to give a performance in double hashing and virtually any reasonable generalization of double hashing that has an expected probe count of $1/(1-\alpha) + \epsilon$ for the insertion of the $\alpha n$-th item into a table of size n, for any fixed $\alpha > 0$. This performance is within $\epsilon$ of optimal. These results are derived from a novel formulation that overestimates the expected probe count by underestimating the presence of partial items already inserted into the hash table, and from a sharp analysis of the underlying stochastic structures formed by colliding items.
Hash-Based Data Structures for Extreme Conditions
2008
This thesis is about the design and analysis of Bloom filter and multiple-choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query-intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple-choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially
unknown title
A novel extension to external double hashing providing significant reduction to both successful and unsuccessful search lengths is presented. The experimental and analytical results demonstrate the reductions possible. This method does not restrict the hashing table configuration parameters and utilizes very little additional storage space per bucket. The runtime performance for insertion is slightly greater than for ordinary external double hashing.
How Caching Affects Hashing ∗
A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact lead to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed the name “hashing” itself is meant to convey this notion [12]. However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance tradeoffs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly “chaotic” forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, contrary to those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses. That is, linear probing is usually no better than, and in some cases performs far worse than, double hash functions that spread data more randomly through the table. ∗ We wish to thank Bernard Moret for suggesting this topic to us.
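The collision side of this tradeoff is easy to reproduce. The sketch below counts probes only; it does not model cache behaviour, which is the paper's actual subject, and it assumes fully random start positions and steps in place of real hash functions. All names and parameters are illustrative.

```python
import random


def average_insert_probes(n, load, step_for, rng):
    """Insert int(load * n) random items into an open-addressed table of
    size n and return the mean number of probes per insertion."""
    table = [False] * n
    items = int(load * n)
    total = 0
    for _ in range(items):
        pos = rng.randrange(n)
        step = step_for(rng, n)
        probes = 1
        while table[pos]:
            pos = (pos + step) % n
            probes += 1
        table[pos] = True
        total += probes
    return total / items


def linear_step(rng, n):
    return 1                      # linear probing: always try the next slot


def double_step(rng, n):
    return rng.randrange(1, n)    # double hashing: per-item step, n prime


if __name__ == "__main__":
    n = 10007                     # prime table size
    for name, step in (("linear", linear_step), ("double", double_step)):
        avg = average_insert_probes(n, 0.9, step, random.Random(0))
        print(name, round(avg, 2))
```

At a load of 0.9, linear probing should need noticeably more probes per insertion than double hashing; whether the better locality of those extra probes pays for them is the empirical question the abstract addresses.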