Results 1 - 10 of 24
Hash-Based Techniques for High-Speed Packet Processing
Cited by 7 (1 self)
Hashing is an extremely useful technique for a variety of high-speed packet-processing applications in routers. In this chapter, we survey much of the recent work in this area, paying particular attention to the interaction between theoretical and applied research. We assume very little background in either the theory or applications of hashing, reviewing the fundamentals as necessary.
History-Independent Cuckoo Hashing
Cited by 6 (4 self)
Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst-case constant lookup time, and good memory utilization. Various experiments have demonstrated that cuckoo hashing is highly suitable for modern computer architectures and distributed settings, and offers significant improvements compared to other schemes. In this work we construct a practical history-independent dynamic dictionary based on cuckoo hashing. In a history-independent data structure, the memory representation at any point in time yields no information on the specific sequence of insertions and deletions that led to its current content, other than the content itself. Such a property is important for preventing unintended leakage of information, and has also been found useful in several algorithmic settings. Our construction enjoys most of the attractive properties of cuckoo hashing. In particular, no dynamic memory allocation is required, updates are performed in expected amortized constant time, and membership queries are performed in worst-case constant time. Moreover, with high probability, the lookup procedure queries only two memory entries, which are independent and can be queried in parallel. The approach underlying our construction is to enforce a canonical memory representation on cuckoo hashing. That is, up to the initial randomness, each set of elements has a unique memory representation.
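For readers new to cuckoo hashing, here is a minimal sketch of the plain two-table scheme the construction builds on, not the history-independent variant itself. A lookup reads exactly two cells, and an insertion evicts occupants back and forth between the tables, rehashing when the eviction chain gets too long. The hash family `((a*x + b) mod P) mod size` and the names `CuckooHash`, `max_kicks`, and `_place` are illustrative assumptions. In this plain version the final layout can depend on the order of insertions, which is exactly the kind of leakage a canonical representation rules out.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used by the stand-in hash family (assumption)


class CuckooHash:
    """Textbook two-table cuckoo hashing (NOT history independent)."""

    def __init__(self, size=128, seed=0, max_kicks=32):
        self.size, self.max_kicks = size, max_kicks
        self.rng = random.Random(seed)
        self._reset()

    def _reset(self):
        # Fresh functions h_i(x) = ((a*x + b) mod P) mod size and empty tables.
        self.params = [(self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(2)]
        self.tables = [[None] * self.size for _ in range(2)]

    def _h(self, i, key):
        a, b = self.params[i]
        return ((a * key + b) % P) % self.size

    def contains(self, key):
        # Worst-case constant time: exactly two cells, which could be read in parallel.
        return any(self.tables[i][self._h(i, key)] == key for i in range(2))

    def _place(self, key):
        """Kick items between tables; return None on success or the item left homeless."""
        i = 0
        for _ in range(self.max_kicks):
            slot = self.tables[i]
            pos = self._h(i, key)
            slot[pos], key = key, slot[pos]  # store key, pick up the previous occupant
            if key is None:
                return None
            i ^= 1  # the evicted item moves to its cell in the other table
        return key

    def insert(self, key):
        if self.contains(key):
            return
        homeless = self._place(key)
        if homeless is None:
            return
        # Eviction chain too long: rebuild everything with new hash functions.
        pending = [k for t in self.tables for k in t if k is not None] + [homeless]
        while True:
            self._reset()
            if all(self._place(k) is None for k in pending):
                return

    def __len__(self):
        return sum(k is not None for t in self.tables for k in t)
```

The table size is fixed here for brevity; a practical table would also grow when the load approaches one half.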
Dynamic external hashing: The limit of buffering
In Proc. ACM Symposium on Parallelism in Algorithms and Architectures, 2009
Cited by 6 (3 self)
Hash tables are one of the most fundamental data structures in computer science, in both theory and practice. They are especially useful in external memory, where their query performance approaches the ideal cost of just one disk access. Knuth [16] gave an elegant analysis showing that with some simple collision resolution strategies such as linear probing or chaining, the expected average number of disk I/Os of a lookup is merely $1 + 1/2^{\Omega(b)}$, where each I/O can read and/or write a disk block containing $b$ items. Inserting a new item into the hash table also costs $1 + 1/2^{\Omega(b)}$ I/Os, which is again almost the best one can do if the hash table is entirely stored on disk. However, this requirement is unrealistic, since any algorithm operating on an external hash table must have some internal memory (at least $\Omega(1)$ blocks) to work with. The availability of a small internal memory buffer can dramatically reduce the amortized insertion cost to $o(1)$ I/Os for many external memory data structures. In this paper we study the inherent query-insertion tradeoff of external hash tables in the presence of a memory buffer. In particular, we show that for any constant $c > 1$, if the expected average successful query cost is targeted at $1 + O(1/b^{c})$ I/Os, then it is not possible to support insertions in less than $1 - O(1/b^{(c-1)/6})$ I/Os amortized, which means that the memory buffer is essentially useless. In contrast, if the query cost is relaxed to $1 + O(1/b^{c})$ I/Os for any constant $c < 1$, there is a simple dynamic hash table with $o(1)$ insertion cost.
Categories and Subject Descriptors: F.2.3 [Analysis of algorithms and problem complexity]: Tradeoffs between complexity measures; E.2 [Data storage]: hash-table representations
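To see where the $1 + 1/2^{\Omega(b)}$ lookup cost comes from, here is a small Monte Carlo sketch of one simple scheme, blocked chaining with an idealized random hash function. The function name, the bucket count, and the 0.8 load factor are assumptions chosen for illustration, and the sketch does not model the memory buffer studied in the paper. At a fixed load factor, a bucket's item count concentrates around its mean as $b$ grows, so fewer items spill past their bucket's first block and the average cost approaches one I/O.

```python
import random


def avg_successful_lookup_ios(b, num_buckets=1000, load=0.8, seed=0):
    """Average I/Os of a successful lookup under blocked chaining.

    Assumed model: each bucket is a chain of disk blocks holding b items each,
    keys hash uniformly to buckets, and a lookup reads the chain from its first
    block until it reaches the block that contains the key.
    """
    rng = random.Random(seed)
    n = int(load * num_buckets * b)  # load factor times capacity of the primary blocks
    fill = [0] * num_buckets         # items stored so far in each bucket's chain
    total_ios = 0
    for _ in range(n):
        bucket = rng.randrange(num_buckets)  # idealized random hash function
        # This item becomes item number fill[bucket] (0-based) of the chain, so it
        # lives in block fill[bucket] // b and a lookup reads that many blocks plus one.
        total_ios += fill[bucket] // b + 1
        fill[bucket] += 1
    return total_ios / n


for b in (1, 4, 16, 64):
    print(b, round(avg_successful_lookup_ios(b), 4))
```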
Maximum matchings in random bipartite graphs and the space utilization of cuckoo hashtables
2009
Cited by 5 (0 self)
We study the following question in random graphs. We are given two disjoint sets L, R with |L| = n = αm and |R| = m. We construct a random graph G by allowing each x ∈ L to choose d random neighbours in R. The question concerns the size µ(G) of the largest matching in G. In the context of cuckoo hashing, one key question is when µ(G) = n whp. We answer this question exactly when d is at least three. We also establish a precise threshold for when Phase 1 of the Karp-Sipser greedy matching algorithm suffices to compute a maximum matching whp.
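Phase 1 of the Karp-Sipser algorithm can be sketched as follows: while some vertex has degree one, match it to its unique neighbour and delete both. Such a choice is always consistent with some maximum matching, so if no edges remain when Phase 1 stops, the matching it found is maximum. The graph builder mirrors the model above (each x ∈ L picks d distinct neighbours in R); the function names and the ('L', x) / ('R', y) vertex labels are assumptions for illustration.

```python
import random


def cuckoo_graph(n, m, d, seed=0):
    """Each left vertex ('L', x) picks d distinct random right vertices ('R', y)."""
    rng = random.Random(seed)
    adj = {('L', x): set() for x in range(n)}
    adj.update({('R', y): set() for y in range(m)})
    for x in range(n):
        for y in rng.sample(range(m), d):
            adj[('L', x)].add(('R', y))
            adj[('R', y)].add(('L', x))
    return adj


def karp_sipser_phase1(adj):
    """Apply the degree-1 rule until no degree-1 vertex remains.

    Returns (matching, remaining_edges); remaining_edges == 0 means the
    matching is maximum.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    matching = []
    queue = [v for v, ns in adj.items() if len(ns) == 1]
    while queue:
        v = queue.pop()
        if v not in adj or len(adj[v]) != 1:
            continue  # v was deleted or its degree changed since it was queued
        (u,) = adj[v]
        matching.append((v, u))
        for w in (v, u):  # delete both endpoints and update their neighbours
            for z in adj.pop(w):
                if z in adj:
                    adj[z].discard(w)
                    if len(adj[z]) == 1:
                        queue.append(z)
    remaining_edges = sum(len(ns) for ns in adj.values()) // 2
    return matching, remaining_edges
```

For a cuckoo table, `karp_sipser_phase1(cuckoo_graph(n, m, 3))` returning n pairs with zero remaining edges means every key can be given its own cell, i.e. µ(G) = n.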
String hashing for linear probing
In Proc. 20th SODA, 2009
Cited by 3 (2 self)
Linear probing is one of the most popular implementations of dynamic hash tables storing all keys in a single array. Given a key, we first hash it to a location. Next, we probe consecutive locations until the key or an empty location is found. At STOC’07, Pagh et al. presented data sets where the standard implementation of 2-universal hashing leads to an expected number of Ω(log n) probes. They also showed that with 5-universal hashing, the expected number of probes is constant. Unfortunately, we do not have 5-universal hashing for, say, variable-length strings. When we want to do such complex hashing from a complex domain, the generic standard solution is to first do collision-free hashing (w.h.p.) into a simpler intermediate domain, and then apply the complicated hash function on this intermediate domain. Our contribution is that for an expected constant number of linear probes, it suffices that each key has O(1) expected collisions with the first hash function, as long as the second hash function is 5-universal. This means that the intermediate domain can be n times smaller, and such a smaller intermediate domain typically means that the overall hash function can be made simpler and at least twice as fast. The same doubling of hashing speed for O(1) expected probes follows for most domains bigger than 32-bit integers, e.g., 64-bit integers and fixed-length strings. In addition, we study how the overhead from linear probing diminishes as the array gets larger, and what happens if strings are stored directly as intervals of the array. These cases were not considered by Pagh et al.
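The two-stage approach can be sketched as a linear-probing table whose hash first maps a string into an intermediate integer domain and then applies a 5-universal function there. Assumptions for illustration: stage one is a polynomial string hash modulo the prime P = 2^61 - 1 (which may rarely collide), and stage two is a random degree-4 polynomial over the same field, a standard 5-universal family; reducing its value modulo the table size makes it only approximately 5-universal. The class name `LinearProbingTable` is hypothetical, and stage one here maps into a large domain rather than the much smaller one the result permits.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; both hashing stages work modulo P


class LinearProbingTable:
    """Linear probing over strings with a two-stage hash (illustrative sketch)."""

    def __init__(self, capacity, seed=0):
        rng = random.Random(seed)
        self.capacity = capacity
        self.slots = [None] * capacity
        self.r = rng.randrange(1, P)                        # stage 1 evaluation point
        self.coeffs = [rng.randrange(P) for _ in range(5)]  # stage 2: degree-4 polynomial

    def _stage1(self, s):
        # Polynomial string hash into Z_P. Distinct strings collide only rarely;
        # the +1 keeps every coefficient nonzero so strings of different lengths differ.
        h = 0
        for byte in s.encode("utf-8"):
            h = (h * self.r + byte + 1) % P
        return h

    def _slot(self, s):
        x = self._stage1(s)
        # Stage 2: random degree-4 polynomial over Z_P (a 5-universal family),
        # evaluated with Horner's rule, then reduced to a table index.
        h = 0
        for c in self.coeffs:
            h = (h * x + c) % P
        return h % self.capacity

    def insert(self, s):
        i = self._slot(s)
        for _ in range(self.capacity):
            if self.slots[i] is None or self.slots[i] == s:
                self.slots[i] = s
                return
            i = (i + 1) % self.capacity  # probe the next consecutive location
        raise RuntimeError("table is full")

    def contains(self, s):
        i = self._slot(s)
        for _ in range(self.capacity):
            if self.slots[i] is None:
                return False  # an empty location ends the search
            if self.slots[i] == s:
                return True
            i = (i + 1) % self.capacity
        return False
```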
On the Cell Probe Complexity of Dynamic Membership
Cited by 3 (1 self)
We study the dynamic membership problem, one of the most fundamental data structure problems, in the cell probe model with an arbitrary cell size. We consider a cell probe model equipped with a cache that consists of at least a constant number of cells; reading or writing the cache is free of charge. For nearly all common data structures, it is known that with sufficiently large cells together with the cache, we can significantly lower the amortized update cost to o(1). In this paper, we show that this is not the case for the dynamic membership problem. Specifically, for any deterministic membership data structure under a random input sequence, if the expected average query cost is no more than 1+δ for some small constant δ, we prove that the expected amortized update cost must be at least Ω(1), namely, it does not benefit from large block writes (and a cache). The space the structure uses is irrelevant to this lower bound. We also extend this lower bound to randomized membership structures, by using a variant of Yao’s minimax principle. Finally, we show that the structure cannot do better even if it is allowed to answer a query mistakenly with a small constant probability.
Tabulation Based 5-Universal Hashing and Linear Probing
Cited by 2 (2 self)
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree-3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation. Relatively, this leads to even bigger gains, since the direct methods for 5-universal hashing use degree-4 polynomials. Experimentally, we find that our method can gain up to an order of magnitude in speed over direct 5-universal hashing. Some of the most popular randomized algorithms have been proved to have the desired expected running time using ...
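The speed of tabulation-based hashing comes from replacing polynomial arithmetic with a few lookups in small precomputed tables. The sketch below shows only the simplest member of this family, simple tabulation: a 32-bit key is split into four bytes and the hash is the XOR of one table entry per byte. Simple tabulation is 3-universal but not 4-universal, so this is not the 5-universal scheme described above, whose additional lookups are not reproduced here. The class name and the use of Python's `random` module to fill the tables (fully random entries, which are in particular 5-universal) are assumptions for illustration.

```python
import random


class SimpleTabulation:
    """h(x) = T0[x0] ^ T1[x1] ^ T2[x2] ^ T3[x3] for a 32-bit key x split into bytes."""

    def __init__(self, seed=0, out_bits=32):
        rng = random.Random(seed)
        # One precomputed table of random out_bits-bit values per key byte.
        self.tables = [[rng.getrandbits(out_bits) for _ in range(256)] for _ in range(4)]

    def __call__(self, x):
        if not 0 <= x < 1 << 32:
            raise ValueError("key must be a 32-bit unsigned integer")
        h = 0
        for i, table in enumerate(self.tables):
            h ^= table[(x >> (8 * i)) & 0xFF]  # look up byte i of the key
        return h
```

The keys 0x000, 0x001, 0x100, and 0x101 show why plain XOR of per-character lookups falls short of 4-universality: every table entry appears an even number of times in the XOR of their four hash values, so that XOR is always zero.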
An Analysis of Random-Walk Cuckoo Hashing
Cited by 2 (1 self)
In this paper, we provide a polylogarithmic bound that holds with high probability on the insertion time for cuckoo hashing under the random-walk insertion method. Cuckoo hashing provides a useful methodology for building practical, high-performance hash tables. The essential idea of cuckoo hashing is to combine the power of schemes that allow multiple hash locations for an item with the power to dynamically change the location of an item among its possible locations. Previous work on the case where the number of choices is larger than two has required a breadth-first search analysis, which is both inefficient in practice and currently has only a polynomial high-probability upper bound on the insertion time. Here we significantly advance the state of the art by proving a polylogarithmic bound on the more efficient random-walk method, where items repeatedly kick out random blocking items until a free location for an item is found.
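A minimal sketch of random-walk insertion, assuming a single table with d hash functions drawn from the stand-in family `((a*x + b) mod P) mod size`: if any of an item's d cells is free, it goes there; otherwise it evicts the occupant of a randomly chosen cell (avoiding the cell it was just evicted from), and the evicted item continues the walk. The class name, the `max_steps` cap, and raising on failure are illustrative assumptions; a real table would rehash instead, and after a failure this sketch has already dropped the item left homeless.

```python
import random

P = (1 << 61) - 1  # Mersenne prime for the stand-in hash family (assumption)


class RandomWalkCuckoo:
    """Single-table cuckoo hashing with d choices and random-walk insertion."""

    def __init__(self, size, d=3, seed=0, max_steps=500):
        self.rng = random.Random(seed)
        self.size, self.d, self.max_steps = size, d, max_steps
        self.table = [None] * size
        # d hash functions h_j(x) = ((a_j*x + b_j) mod P) mod size
        self.params = [(self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(d)]

    def _locations(self, key):
        return [((a * key + b) % P) % self.size for a, b in self.params]

    def contains(self, key):
        return any(self.table[i] == key for i in self._locations(key))

    def insert(self, key):
        """Insert key and return the number of evictions the walk needed."""
        if self.contains(key):
            return 0
        banned = None  # the cell the item in hand was just evicted from
        for steps in range(self.max_steps):
            locs = self._locations(key)
            for i in locs:
                if self.table[i] is None:  # a free choice ends the walk
                    self.table[i] = key
                    return steps
            # All d cells are full: evict a random occupant, without sending the
            # item straight back to the cell it just left.
            choices = [i for i in locs if i != banned] or locs
            victim = self.rng.choice(choices)
            self.table[victim], key = key, self.table[victim]
            banned = victim
        raise RuntimeError("walk too long; a real table would rehash here")
```

Recording the values returned by `insert` gives a direct empirical view of the insertion-time distribution that the bound above concerns.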
Some Open Questions Related to Cuckoo Hashing
Cited by 1 (1 self)
The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
Private Search in the Real World
Cited by 1 (0 self)
Encrypted search (performing queries on protected data) has been explored in the past; however, its inherent inefficiency has raised questions about its practicality. Here, we focus on improving its performance and extending its functionality enough to make it practical. We do this by optimizing the system, and by stepping back from the goal of achieving maximal privacy guarantees in an encrypted search scenario, instead treating efficiency and functionality as priorities. We design and analyze the privacy implications of two practical extensions applicable to any keyword-based private search system. We evaluate their efficiency by building them on top of a private search system called SADS. Additionally, we improve SADS’ performance, privacy guarantees, and functionality. The extended SADS system offers improved efficiency parameters that meet practical usability requirements in a relaxed adversarial model. We present the experimental results and evaluate the performance of the system. We also demonstrate analytically that our scheme can meet the basic needs of a major hospital complex’s admissions records. Overall, we achieve performance comparable to a simply configured MySQL database system.

