Results 1 - 10
of
72
Space Efficient Hash Tables With Worst Case Constant Access Time
- In STACS
, 2003
"... We generalize Cuckoo Hashing [23] to d-ary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes ..."
Abstract
-
Cited by 34 (4 self)
- Add to MetaCart
We generalize Cuckoo Hashing [23] to d-ary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1 + ffl) n memory cells, for any constant ffl ? 0. Assuming uniform hashing, accessing or deleting table entries takes at most d = O(ln ffl ) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1 + ffl) n space, and supports satellite information. Experiments indicate that d = 4 choices suffice for ffl 0:03. We also describe variants of the data structure that allow the use of hash functions that can be evaluted in constant time.
Why simple hash functions work: Exploiting the entropy in a data stream
- In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2008
"... Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealiz ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Hashing is fundamental to many algorithms and data structures widely used in practice. For theoretical analysis of hashing, there have been two main approaches. First, one can assume that the hash function is truly random, mapping each data item independently and uniformly to the range. This idealized model is unrealistic because a truly random hash function requires an exponential number of bits to describe. Alternatively, one can provide rigorous bounds on performance when explicit families of hash functions are used, such as 2-universal or O(1)-wise independent families. For such families, performance guarantees are often noticeably weaker than for ideal hashing. In practice, however, it is commonly observed that weak hash functions, including 2-universal hash functions, perform as predicted by the idealized analysis for truly random hash functions. In this paper, we try to explain this phenomenon. We demonstrate that the strong performance of universal hash functions in practice can arise naturally from a combination of the randomness of the hash function and the data. Specifically, following the large body of literature on random sources and randomness extraction, we model the data as coming from a “block source, ” whereby
Implementing signatures for transactional memory
- 40th Intl. Symp. on Microarchitecture
, 2007
"... Transactional Memory (TM) systems must track the read and write sets—items read and written during a transaction—to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conf ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
Transactional Memory (TM) systems must track the read and write sets—items read and written during a transaction—to detect conflicts among concurrent transactions. Several TMs use signatures, which summarize unbounded read/write sets in bounded hardware at a performance cost of false positives (conflicts detected when none exists). This paper examines different organizations to achieve hardware-efficient and accurate TM signatures. First, we find that implementing each signature with a single k-hashfunction Bloom filter (True Bloom signature) is inefficient, as it requires multi-ported SRAMs. Instead, we advocate using k single-hash-function Bloom filters in parallel (Parallel Bloom signature), using area-efficient single-ported SRAMs. Our formal analysis shows that both organizations perform equally well in theory and our simulationbased evaluation shows this to hold approximately in practice. We also show that by choosing high-quality hash functions we can achieve signature designs noticeably more accurate than the previously proposed implementations. Finally, we adapt Pagh and Rodler’s cuckoo hashing to implement Cuckoo-Bloom signatures. While this representation does not support set intersection, it mitigates false positives for the common case of small read/write sets and performs like a Bloom filter for large sets. 1.
An Improved Construction for Counting Bloom Filters
- 14th Annual European Symposium on Algorithms, LNCS 4168
, 2006
"... Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-bas ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
Abstract. A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally. 1
Linear probing with constant independence
- In STOC ’07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
, 2007
"... Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free access to a truly random hash function. Already Carter and Wegman, in their seminal paper on universal hashing, raised the question of extending their analysis to linear probing. However, we show in this paper that linear probing using a pairwise independent family may have expected logarithmic cost per operation. On the positive side, we show that 5-wise independence is enough to ensure constant expected time per operation. This resolves the question of finding a space and time efficient hash function that provably ensures good performance for linear probing.
Cuckoo hashing: Further analysis
, 2003
"... We consider cuckoo hashing as proposed by Pagh and Rodler in 2001. We show that the expected construction time of the hash table is O(n) as long as the two open addressing tables are each of size at least (1 #)n,where#>0andn is the number of data points. Slightly improved bounds are obtained for ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
We consider cuckoo hashing as proposed by Pagh and Rodler in 2001. We show that the expected construction time of the hash table is O(n) as long as the two open addressing tables are each of size at least (1 #)n,where#>0andn is the number of data points. Slightly improved bounds are obtained for various probabilities and constraints. The analysis rests on simple properties of branching processes.
More Robust Hashing: Cuckoo Hashing with a Stash
- IN PROCEEDINGS OF THE 16TH ANNUAL EUROPEAN SYMPOSIUM ON ALGORITHMS (ESA
, 2008
"... Cuckoo hashing holds great potential as a high-performance hashing scheme for real applications. Up to this point, the greatest drawback of cuckoo hashing appears to be that there is a polynomially small but practically significant probability that a failure occurs during the insertion of an item, r ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
Cuckoo hashing holds great potential as a high-performance hashing scheme for real applications. Up to this point, the greatest drawback of cuckoo hashing appears to be that there is a polynomially small but practically significant probability that a failure occurs during the insertion of an item, requiring an expensive rehashing of all items in the table. In this paper, we show that this failure probability can be dramatically reduced by the addition of a very small constant-sized stash. We demonstrate both analytically and through simulations that stashes of size equivalent to only three or four items yield tremendous improvements, enhancing cuckoo hashing’s practical viability in both hardware and software. Our analysis naturally extends previous analyses of multiple cuckoo hashing variants, and the approach may prove useful in further related schemes.
A Sybil-Proof One-Hop DHT
- Workshop on Social NetworkSystems,Glasgow,Scotland
, 2008
"... Decentralized systems, such as structured overlays, are subject to the Sybil attack, in which an adversary creates many false identities to increase its influence. This paper describes a one-hop distributed hash table which uses the social links between users to strongly resist the Sybil attack. The ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
Decentralized systems, such as structured overlays, are subject to the Sybil attack, in which an adversary creates many false identities to increase its influence. This paper describes a one-hop distributed hash table which uses the social links between users to strongly resist the Sybil attack. The social network is assumed to be fast mixing, meaning that a random walk in the honest part of the network quickly approaches the uniform distribution. As in the related SybilLimit system [25], with a social network of n honest nodes and m honest edges, the protocol can tolerate up to o(n / log n) attack edges (social links from honest nodes to compromised nodes). The routing tables contain O ( √ m log m) entries per node and are constructed efficiently by a distributed protocol. This is the first sublinear solution to this problem. Preliminary simulation results are presented to demonstrate the approach’s effectiveness. 1.
Rapid mathematical programming
, 2004
"... This book was typeset with TEX using L ATEX and many further formatting packages. The pictures were prepared using pstricks, xfig, gnuplot and gmt. All numerals in this text are recycled. Für meine Eltern Preface Avoid reality at all costs — fortune(6) As the inclined reader will find out soon enoug ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
This book was typeset with TEX using L ATEX and many further formatting packages. The pictures were prepared using pstricks, xfig, gnuplot and gmt. All numerals in this text are recycled. Für meine Eltern Preface Avoid reality at all costs — fortune(6) As the inclined reader will find out soon enough, this thesis is not about deeply involved mathematics as a mean in itself, but about how to apply mathematics to solve real-world problems. We will show how to shape, forge, and yield our tool of choice to rapidly answer questions of concern to people outside the world of mathematics. But there is more to it. Our tool of choice is software. This is not unusual, since it has become standard practice in science to use software as part of experiments and sometimes even for proofs. But in order to call an experiment scientific it must be reproducible. Is this the case?

