Results 1–10 of 10
Space Efficient Hash Tables With Worst Case Constant Access Time
In STACS, 2003
"... We generalize Cuckoo Hashing [23] to dary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract

Cited by 47 (4 self)
We generalize Cuckoo Hashing [23] to d-ary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ε)n memory cells, for any constant ε > 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln(1/ε)) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ε)n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ε ≈ 0.03. We also describe variants of the data structure that allow the use of hash functions that can be evaluated in constant time.
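The d-ary scheme can be sketched in a few lines. This is an illustrative toy, not the paper's construction: the class name, the seeded-tuple hash functions, the random-walk eviction, and the fixed kick limit are all our assumptions; a full implementation would also rehash on failure.

```python
import random

class DaryCuckoo:
    """Toy d-ary cuckoo hash table: each key has d candidate cells;
    insertion evicts a random occupant when all candidates are full."""

    def __init__(self, n_cells, d=4, max_kicks=500):
        self.cells = [None] * n_cells
        self.d = d
        self.max_kicks = max_kicks
        # Per-function random seeds stand in for truly uniform hashing.
        self.seeds = [random.randrange(2**32) for _ in range(d)]

    def _slots(self, key):
        return [hash((s, key)) % len(self.cells) for s in self.seeds]

    def lookup(self, key):
        # At most d probes: worst case constant access time.
        return any(self.cells[i] == key for i in self._slots(key))

    def insert(self, key):
        for _ in range(self.max_kicks):
            slots = self._slots(key)
            for i in slots:
                if self.cells[i] is None:
                    self.cells[i] = key
                    return True
            # All d cells occupied: evict a random occupant, reinsert it.
            i = random.choice(slots)
            key, self.cells[i] = self.cells[i], key
        return False  # a real implementation would rehash here
```

With d = 4 and table size (1 + ε)n for modest ε, insertions at loads like 0.8 succeed comfortably; the paper's experiments push this much closer to full.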
On Universal Classes of Extremely Random Constant Time Hash Functions and Their Time-Space Tradeoff
"... A family of functions F that map [0; n] 7! [0; n], is said to be hwise independent if any h points in [0; n] have an image, for randomly selected f 2 F , that is uniformly distributed. This paper gives both probabilistic and explicit randomized constructions of n ffl wise independent functions, ..."
Abstract

Cited by 26 (0 self)
A family of functions F that map [0, n] → [0, n] is said to be h-wise independent if any h points in [0, n] have an image, for randomly selected f ∈ F, that is uniformly distributed. This paper gives both probabilistic and explicit randomized constructions of n^ε-wise independent functions, ε < 1, that can be evaluated in constant time for the standard random access model of computation. Simple extensions give comparable behavior for larger domains. As a consequence, many probabilistic algorithms can for the first time be shown to achieve their expected asymptotic performance for a feasible model of computation. This paper also establishes a tight tradeoff in the number of random seeds that must be precomputed for a random function that runs in time T and is h-wise independent. Categories and Subject Descriptors: E.2 [Data Storage Representation]: Hash-table representation; F.1.2 [Modes of Computation]: Probabilistic Computation; F.2.3 [Tradeoffs among Computational Measures]...
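For contrast with the constant-time families the paper constructs, the textbook h-wise independent family is a random degree-(h−1) polynomial over a prime field, which costs O(h) arithmetic per evaluation. A minimal sketch (the prime, the final range reduction, and the function names are our choices; reducing mod m is only approximately uniform):

```python
import random

def make_hwise_hash(h, p=2**61 - 1, m=None):
    """Classic h-wise independent family: a uniformly random polynomial
    of degree h-1 over the field Z_p.  Exact h-wise independence holds
    for outputs in [0, p); the optional `% m` reduction is only nearly
    uniform.  Evaluation takes O(h) time, which is precisely the cost
    the paper's constant-time constructions avoid."""
    coeffs = [random.randrange(p) for _ in range(h)]

    def f(x):
        acc = 0
        for c in coeffs:            # Horner's rule
            acc = (acc * x + c) % p
        return acc if m is None else acc % m
    return f
```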
Closed Hashing is Computable and Optimally Randomizable with Universal Hash Functions
"... Universal hash functions that exhibit c log nwise independence are shown to give a performance in double hashing, uniform hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1 1\Gammaff +O( 1 n ) for the insertion of the ffnth item into a ta ..."
Abstract

Cited by 6 (1 self)
Universal hash functions that exhibit c log n-wise independence are shown to give a performance in double hashing, uniform hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1/(1 − α) + O(1/n) for the insertion of the αn-th item into a table of size n, for any fixed α < 1. This performance is optimal. These results are derived from a novel formulation that overestimates the expected probe count by underestimating the presence of local items already inserted into the hash table, and from a very sharp analysis of the underlying stochastic structures formed by colliding items. Analogous bounds are attained for the expected r-th moment of the probe count, for any fixed r, and linear probing is also shown to achieve a performance with universal hash functions that is equivalent to the fully random case. Categories and Subject Descriptors: E.1 [Data]: Data Structures: arrays; tables; E.2 [Data]: Data Storage Representations; ha...
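The double hashing being analyzed is ordinary open addressing with a key-dependent step. A minimal sketch that also returns the probe count (the simple modular h1/h2 below are stand-ins for the c log n-wise independent families the analysis assumes; the table size should be prime so every step cycles through all cells):

```python
def insert_double_hash(table, key):
    """Open addressing with double hashing: probe sequence
    h1, h1 + h2, h1 + 2*h2, ... (mod m).  Returns probes used."""
    m = len(table)                 # should be prime
    h1 = key % m
    h2 = 1 + (key % (m - 1))       # nonzero step, coprime to prime m
    for probes in range(1, m + 1):
        i = (h1 + (probes - 1) * h2) % m
        if table[i] is None or table[i] == key:
            table[i] = key
            return probes
    raise RuntimeError("table full")
```

At load factor α the abstract's bound says the expected probe count for the next insertion is about 1/(1 − α): roughly 2 probes at half full, 10 probes at 90% full.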
Toward a usable theory of Chernoff Bounds for heterogeneous and partially dependent random variables
, 1992
"... Let X be a sum of real valued random variables and have a bounded mean E[X]. The generic ChernoffHoeffding estimate for large deviations of X is: P rfX \GammaE[X ] ag min 0 e \Gamma(a+E[X]) E[e X ], which applies with a 0 to random variables with very small tails. At issue is how to use this ..."
Abstract

Cited by 6 (1 self)
Let X be a sum of real valued random variables and have a bounded mean E[X]. The generic Chernoff-Hoeffding estimate for large deviations of X is: Pr{X − E[X] ≥ a} ≤ min_{λ ≥ 0} e^{−λ(a+E[X])} E[e^{λX}], which applies with a ≥ 0 to random variables with very small tails. At issue is how to use this method to attain sharp and useful estimates. We present a number of Chernoff-Hoeffding bounds for sums of random variables that may have a variety of dependence relationships and that may be heterogeneously distributed. AMS classifications: 60F10, Large deviations; 68Q25, Analysis of algorithms; 62E17, Approximations to distributions (nonasymptotic); 60E15, Inequalities. Key words: Hoeffding bounds, Chernoff bounds, dependent random variables, Bernoulli trials. This research was supported, in part, by grants NSF-CCR-8902221, NSF-CCR-8906949, and NSF-CCR-9204202. 1 Summary In the analysis of probabilistic algorithms, some of the following problems may arise, possibly in complex combinations....
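The generic estimate is easy to evaluate numerically. As an illustration (our specialization, not the paper's), take X to be a sum of n independent Bernoulli(p) variables, so E[e^{λX}] = (1 − p + p·e^λ)^n, and approximate the minimum over λ on a grid:

```python
import math

def chernoff_tail_bound(n, p, a, lam_grid=None):
    """Generic Chernoff-Hoeffding bound for X = sum of n Bernoulli(p):
        Pr{X - E[X] >= a} <= min_{lam >= 0} exp(-lam*(a + E[X])) * E[exp(lam*X)]
    with E[exp(lam*X)] = (1 - p + p*exp(lam))**n.  The minimizing lam
    is approximated on a grid rather than solved in closed form."""
    mean = n * p
    if lam_grid is None:
        lam_grid = [k / 100 for k in range(1, 301)]
    best = 1.0  # lam = 0 gives the trivial bound 1
    for lam in lam_grid:
        mgf = (1 - p + p * math.exp(lam)) ** n
        best = min(best, math.exp(-lam * (a + mean)) * mgf)
    return best
```

For n = 100 fair coins, the bound on Pr{X ≥ 70} comes out below 10^−3, while the bound on the milder deviation Pr{X ≥ 55} stays near 0.6, showing how sharply the estimate depends on a.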
Double Hashing is Computable and Randomizable with Universal Hash Functions
"... Universal hash functions that exhibit c log nwise independence are shown to give a performance in double hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1/(1alpha) + epsilon for the insertion of the alpha nth item into a table of size n, f ..."
Abstract

Cited by 3 (1 self)
Universal hash functions that exhibit c log n-wise independence are shown to give a performance in double hashing and virtually any reasonable generalization of double hashing that has an expected probe count of 1/(1 − α) + ε for the insertion of the αn-th item into a table of size n, for any fixed α < 1. This performance is within ε of optimal. These results are derived from a novel formulation that overestimates the expected probe count by underestimating the presence of partial items already inserted into the hash table, and from a sharp analysis of the underlying stochastic structures formed by colliding items.
unknown title
"... A novel extension to external double hashing providing significant reduction to both successful and unsuccessful search lengths is presented. The experimental and analytical results demonstrate the reductions possible. This method does not restrict the hashing table configuration parameters and util ..."
Abstract
A novel extension to external double hashing that provides significant reductions in both successful and unsuccessful search lengths is presented. The experimental and analytical results demonstrate the reductions possible. This method does not restrict the hash table configuration parameters and utilizes very little additional storage space per bucket. The runtime performance for insertion is slightly greater than for ordinary external double hashing.
On the Cell Probe Complexity of Membership and Perfect Hashing ∗
"... We study two fundamental static data structure problems, membership and perfect hashing, in Yao’s cell probe model. The first space and bit probe optimal worst case upper bound is given for the membership problem. We also give a new efficient membership scheme where the query algorithm makes just on ..."
Abstract
We study two fundamental static data structure problems, membership and perfect hashing, in Yao’s cell probe model. The first space and bit probe optimal worst case upper bound is given for the membership problem. We also give a new efficient membership scheme where the query algorithm makes just one adaptive choice, and probes a total of three words. A lower bound shows that two word probes generally do not suffice. For minimal perfect hashing we show a tight bit probe lower bound, and give a simple scheme achieving this performance, making just one adaptive choice. Linear range perfect hashing is shown to be implementable with the same number of bit probes, of which just one is adaptive. In contrast, we establish that for sufficiently sparse sets, nonadaptive perfect hashing needs exponentially more bit probes. This is the first such separation of adaptivity and nonadaptivity.
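To make the bit-probe accounting concrete: the trivial bitmap answers membership with a single nonadaptive bit probe, but spends one bit per element of the universe, far from optimal for sparse sets. This naive baseline (our illustration, not any scheme from the paper) is what the space-optimal constructions above improve on:

```python
class BitmapMembership:
    """Naive membership baseline: one nonadaptive bit probe per query,
    at the cost of u bits of space for universe [0, u).  The cell-probe
    schemes in the paper achieve near information-theoretic space for
    sparse sets at the cost of more (and adaptive) probes."""

    def __init__(self, universe_size, members):
        self.bits = bytearray((universe_size + 7) // 8)
        for x in members:
            self.bits[x >> 3] |= 1 << (x & 7)

    def contains(self, x):
        # Exactly one "bit probe": read one bit of the structure.
        return bool(self.bits[x >> 3] & (1 << (x & 7)))
```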
Hashing, randomness and dictionaries
, 2002
"... This thesis is centered around one of the most basic information retrieval problems, namely that of storing and accessing the elements of a set. Each element in the set has some associated information that is returned along with it. The problem is referred to as the dictionary problem, due to the si ..."
Abstract
This thesis is centered around one of the most basic information retrieval problems, namely that of storing and accessing the elements of a set. Each element in the set has some associated information that is returned along with it. The problem is referred to as the dictionary problem, due to the similarity to a bookshelf dictionary, which contains a set of words and has an explanation associated with each word. In the static version of the problem the set is fixed, whereas in the dynamic version, insertions and deletions of elements are possible. The approach taken is that of the theoretical algorithms community. We work (almost) exclusively with a model, a mathematical object that is meant to capture essential aspects of a real computer. The main model considered here (and in most of the literature on dictionaries) is a unit cost RAM with a word size that allows a set element to be stored in one word. We consider several variants of the dictionary problem, as well as some related problems. The problems are studied mainly from an upper bound perspective,
How Caching Affects Hashing ∗
"... A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an everwidening gap between the speed of CPUs and the latency of dy ..."
Abstract
A number of recent papers have considered the influence of modern computer memory hierarchies on the performance of hashing algorithms [1, 2, 3]. Motivation for these papers is drawn from recent technology trends that have produced an ever-widening gap between the speed of CPUs and the latency of dynamic random access memories. The result is an emerging computing folklore which contends that inferior hash functions, in terms of the number of collisions they produce, may in fact lead to superior performance because these collisions mainly occur in cache rather than main memory. This line of reasoning is the antithesis of that used to justify most of the improvements that have been proposed for open address hashing over the past forty years. Such improvements have generally sought to minimize collisions by spreading data elements more randomly through the hash table. Indeed the name “hashing” itself is meant to convey this notion [12]. However, the very act of spreading the data elements throughout the table negatively impacts their degree of spatial locality in computer memory, thereby increasing the likelihood of cache misses during long probe sequences. In this paper we study the performance tradeoffs that exist when implementing open address hash functions on contemporary computers. Experimental analyses are reported that make use of a variety of different hash functions, ranging from linear probing to highly “chaotic” forms of double hashing, using data sets that are justified through information-theoretic analyses. Our results, contrary to those in a number of recently published papers, show that the savings gained by reducing collisions (and therefore probe sequence lengths) usually compensate for any increase in cache misses. That is, linear probing is usually no better than, and in some cases performs far worse than, double hash functions that spread data more randomly through the table. ∗ We wish to thank Bernard Moret for suggesting this topic to us.
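One side of the tradeoff, probe sequence length, is easy to simulate. The sketch below (our toy, with simple modular hashes and a prime table size as assumptions) counts average probes per insertion for linear probing versus double hashing at high load; it deliberately measures only collisions, not the cache behavior that the paper shows can dominate on real hardware:

```python
import random

def avg_probes(m, n_keys, scheme, trials=20):
    """Average probes per insertion while filling to load n_keys/m.
    'linear' steps by 1 (cache friendly, but clusters grow long);
    'double' steps by a key-dependent offset (shorter probe sequences,
    poor spatial locality).  m should be prime for 'double'."""
    total = inserts = 0
    for t in range(trials):
        rng = random.Random(t)
        table = [None] * m
        for _ in range(n_keys):
            key = rng.randrange(10**9)
            step = 1 if scheme == 'linear' else 1 + key % (m - 1)
            i = key % m
            probes = 1
            while table[i] is not None:
                i = (i + step) % m
                probes += 1
            table[i] = key
            total += probes
            inserts += 1
    return total / inserts
```

At a load near 0.9 the probe counts clearly favor double hashing, which is exactly why the folklore conclusion, that linear probing wins anyway via cache locality, is surprising, and why the paper's contrary measurements matter.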
Contents
"... This is a preliminary version of the first part of our book on the theory of quadrature methods. Further parts will follow soon. The approximate evaluation of definite integrals is treated in every book on numerical analysis. Our intention is to complete these representations by showing that there i ..."
Abstract
This is a preliminary version of the first part of our book on the theory of quadrature methods. Further parts will follow soon. The approximate evaluation of definite integrals is treated in every book on numerical analysis. Our intention is to complete these representations by showing that there is a coherent theory with many interesting (solved and unsolved) problems as well as many elegant and deep results. For more information about the concepts, we refer to the first chapter. We emphasize the word “theory” in the title. To anyone merely interested in the treatment of a concrete integration problem, we recommend the books of Davis and Rabinowitz or Krommer and Ueberhuber.