Optimal fast hashing
In 28th IEEE International Conference on Computer Communications (INFOCOM), 2009
Cited by 5 (4 self)
This paper is about designing optimal high-throughput hashing schemes that minimize the total number of memory accesses needed to build and access a hash table. Recent schemes often promote the use of multiple-choice hashing. However, such a choice also implies a significant increase in the number of memory accesses to the hash table, which translates into higher power consumption and lower throughput. In this paper, we propose to only use choice when needed. Given some target hash table overflow rate, we provide a lower bound on the total number of needed memory accesses. Then, we design and analyze schemes that provably achieve this lower bound over a large range of target overflow values. Further, for the multilevel hash table scheme, we prove that the optimum occurs when its subtable sizes decrease in a geometric way, thus formally confirming a heuristic rule-of-thumb.
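As a rough illustration of the multilevel idea in the abstract above, the following Python sketch builds subtables whose sizes shrink geometrically and probes a further level only when the previous one is occupied. It is not the paper's scheme: the class name, the `ratio` parameter, the salted use of Python's built-in `hash`, the overflow list, and the access counter are all assumptions made for the example, and keys are assumed to be distinct.

```python
class MultilevelHashTable:
    """Illustrative multilevel table: one probe per level, sizes shrink geometrically."""

    def __init__(self, first_size, levels, ratio=0.5):
        # Level i gets about first_size * ratio**i slots (at least one slot).
        self.sizes = [max(1, int(first_size * ratio ** i)) for i in range(levels)]
        self.tables = [[None] * size for size in self.sizes]
        self.overflow = []  # keys that found no free slot at any level
        self.accesses = 0   # memory accesses spent on insertions

    def _slot(self, level, key):
        # Hypothetical per-level hash function: salt the key with the level index.
        return hash((level, key)) % self.sizes[level]

    def insert(self, key):
        """Place key at the first level whose slot is free; return False on overflow."""
        for level, table in enumerate(self.tables):
            self.accesses += 1
            slot = self._slot(level, key)
            if table[slot] is None:
                table[slot] = key
                return True
        self.overflow.append(key)
        return False

    def contains(self, key):
        for level, table in enumerate(self.tables):
            if table[self._slot(level, key)] == key:
                return True
        return key in self.overflow


table = MultilevelHashTable(first_size=64, levels=4)
for k in range(1000, 1060):
    table.insert(k)
print(table.sizes, len(table.overflow), table.accesses)
```

Comparing `accesses` and `len(overflow)` for different values of `ratio` gives an informal feel for the trade-off between memory accesses and overflow rate that the abstract describes.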
Maximum matchings in random bipartite graphs and the space utilization of cuckoo hashtables
2009
Cited by 5 (0 self)
We study the following question in random graphs. We are given two disjoint sets L, R with |L| = n = αm and |R| = m. We construct a random graph G by allowing each x ∈ L to choose d random neighbours in R. The question is the size µ(G) of the largest matching in G. In the context of cuckoo hashing, one key question is when µ(G) = n whp. We answer this question exactly when d is at least three. We also establish a precise threshold for when Phase 1 of the Karp-Sipser greedy matching algorithm suffices to compute a maximum matching whp.
De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results
Cited by 4 (1 self)
Cuckoo hashing is a highly practical dynamic dictionary: it provides amortized constant insertion time, worst-case constant deletion time and lookup time, and good memory utilization. However, with a noticeable probability during the insertion of n elements some insertion requires Ω(log n) time. Whereas such an amortized guarantee may be suitable for some applications, in other applications (such as high-performance routing) this is highly undesirable. Kirsch and Mitzenmacher (Allerton ’07) proposed a de-amortization of cuckoo hashing using queueing techniques that preserve its attractive properties. They demonstrated a significant improvement to the worst-case performance of cuckoo hashing via experimental results, but left open the problem of constructing a scheme with provable properties. In this work we present a de-amortization of cuckoo hashing that provably guarantees constant worst-case operations. Specifically, for any sequence of polynomially many operations, with overwhelming probability over the randomness of the initialization phase, each operation is performed in constant time. In addition, we present a general approach for proving that the performance guarantees are preserved when using hash functions with limited independence.
Compact Data Structures with Fast Queries
2005
Cited by 4 (0 self)
Many applications dealing with large data structures can benefit from keeping them in compressed form. Compression has many benefits: it can allow a representation to fit in main memory rather than swapping out to disk, and it improves cache performance since it allows more data to fit into the cache. However, a data structure is only useful if it allows the application to perform fast queries (and updates) to the data.
Compact Dictionaries for Variable-Length Keys and Data, with Applications
2007
Cited by 4 (0 self)
We consider the problem of maintaining a dynamic dictionary T of keys and associated data for which both the keys and data are bit strings that can vary in length from zero up to the length w of a machine word. We present a data structure for this variable-bit-length dictionary problem that supports constant time lookup and expected amortized constant time insertion and deletion. It uses O(m + 3n − n log₂ n) bits, where n is the number of elements in T, and m is the total number of bits across all strings in T (keys and data). Our dictionary uses an array A[1 ... n] in which locations store variable-bit-length strings. We present a data structure for this variable-bit-length array problem that supports worst-case constant-time lookups and updates and uses O(m + n) bits, where m is the total number of bits across all strings stored in A. The motivation for these structures is to support applications for which it is helpful to efficiently store short, varying-length bit strings. We present several applications, including representations for semi-dynamic graphs, order queries on integer sets, cardinal trees with varying cardinality, and simplicial meshes of d dimensions. These results either generalize or simplify previous results.
An Analysis of Random-Walk Cuckoo Hashing
Cited by 2 (1 self)
In this paper, we provide a polylogarithmic bound that holds with high probability on the insertion time for cuckoo hashing under the random-walk insertion method. Cuckoo hashing provides a useful methodology for building practical, high-performance hash tables. The essential idea of cuckoo hashing is to combine the power of schemes that allow multiple hash locations for an item with the power to dynamically change the location of an item among its possible locations. Previous work on the case where the number of choices is larger than two has required a breadth-first search analysis, which is both inefficient in practice and currently has only a polynomial high probability upper bound on the insertion time. Here we significantly advance the state of the art by proving a polylogarithmic bound on the more efficient random-walk method, where items repeatedly kick out random blocking items until a free location for an item is found.
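The random-walk insertion rule described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction analyzed in the paper: the class name, the salted use of Python's built-in `hash` as the hash family, the `max_steps` cutoff, and the stash that catches an item left without a slot are all hypothetical choices made for the example.

```python
import random


class RandomWalkCuckoo:
    """Illustrative cuckoo table with `choices` candidate slots per key."""

    def __init__(self, capacity, choices=3, max_steps=500, seed=0):
        self.slots = [None] * capacity
        self.choices = choices
        self.max_steps = max_steps
        self.stash = []  # assumption: holds an item if the walk is cut off
        self.rng = random.Random(seed)

    def _locations(self, key):
        # Hypothetical hash family: salt the key with the choice index.
        return [hash((i, key)) % len(self.slots) for i in range(self.choices)]

    def lookup(self, key):
        return key in self.stash or any(
            self.slots[loc] == key for loc in self._locations(key)
        )

    def insert(self, key):
        """Insert key; return the number of evictions, or None if the walk was cut off."""
        if self.lookup(key):
            return 0
        current = key
        for step in range(self.max_steps):
            locs = self._locations(current)
            for loc in locs:
                if self.slots[loc] is None:
                    self.slots[loc] = current
                    return step
            # All candidate slots are full: kick out a random blocking item
            # and continue the walk with the evicted item.
            victim = self.rng.choice(locs)
            current, self.slots[victim] = self.slots[victim], current
        self.stash.append(current)
        return None


table = RandomWalkCuckoo(capacity=100)
evictions = [table.insert(k) for k in range(60)]
print(max(e for e in evictions if e is not None), len(table.stash))
```

A fuller implementation would usually avoid immediately evicting the item it has just placed; the sketch omits that refinement to stay short.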
Incremental hashing in state space search
In Workshop "New Results in Planning, Scheduling and Design", 2004
Cited by 1 (1 self)
State memorization is essential for state-space search to avoid redundant expansions, and hashing serves as a method to address, store, and retrieve states efficiently. In this paper we introduce incremental state hashing to compute hash values in constant time. The method will be most effective in guided depth-first search traversals of state space graphs, like in IDA*, where the computation of the set of successors and their heuristic estimates is extremely fast: heuristic values are often computed incrementally or retrieved from pre-computed pattern database tables, and backtracking keeps the changes in the state representation vector during the exploration small. The approach quickly decides if a given state is not present in a hash table, and accelerates successful search. It can further accelerate perfect hashing for pattern storage and look-up. If, for a better coverage of the state space, partial search methods without collision resolving are used, we establish another benefit for incremental state hashing. We exemplify our considerations in the (n² − 1)-Puzzle, in action planning, and conduct experiments in Atomix.
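To make the constant-time update concrete, here is a small Python sketch of one common way to hash a state vector incrementally: a weighted sum modulo a prime, updated from the difference caused by a move. The abstract does not spell out the hash function, so the modulus, the radix-based weights, and the function names below are assumptions for illustration only. The example uses the 8-puzzle, where a move swaps the blank with a neighbouring tile and therefore changes just two vector positions.

```python
MODULUS = 2 ** 61 - 1  # assumed modulus (a Mersenne prime)


def make_weights(length, radix):
    """Precompute radix**i mod MODULUS for every vector position i."""
    return [pow(radix, i, MODULUS) for i in range(length)]


def full_hash(state, weights):
    """Hash the whole state vector: O(length) work."""
    return sum(v * w for v, w in zip(state, weights)) % MODULUS


def swap_hash(h, state, weights, i, j):
    """Hash of the state after swapping positions i and j: O(1) work.

    Before the swap the two positions contribute s_i*w_i + s_j*w_j,
    afterwards s_j*w_i + s_i*w_j, so the difference is
    (s_j - s_i) * (w_i - w_j).
    """
    delta = (state[j] - state[i]) * (weights[i] - weights[j])
    return (h + delta) % MODULUS


# 8-puzzle state, blank (0) in the last cell.
state = [1, 2, 3, 4, 5, 6, 7, 8, 0]
weights = make_weights(len(state), radix=9)
h = full_hash(state, weights)

# Move the blank up: swap positions 8 and 5.
h_next = swap_hash(h, state, weights, 8, 5)
state[8], state[5] = state[5], state[8]
print(h_next == full_hash(state, weights))
```

The update touches two entries regardless of the vector length, which is what makes the hash cheap inside a depth-first traversal where each move changes only a small part of the state.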
Balanced Allocations: Balls-into-Bins Revisited and Chains-into-Bins
2008
Cited by 1 (0 self)
The study of balls-into-bins games or occupancy problems has a long history since these processes can be used to translate realistic problems into mathematical ones in a natural way. In general, the goal of a balls-into-bins game is to allocate a set of independent objects (tasks, jobs, balls) to a set of resources (servers, bins, urns) and, thereby, to minimize the maximum load. In this paper we show two results. First, we analyse the maximum load for the chains-into-bins problem where we have n bins and the balls are connected in n/ℓ chains of length ℓ. In this process, the balls of one chain have to be allocated to ℓ consecutive bins. We allow each chain d i.u.r. bin choices. The chain is allocated using the rule that the maximum load of any bin receiving a ball of that chain is minimized. We show that, for d ≥ 2, the maximum load is (ln ln(n/ℓ))/ln d + O(1) with probability 1 − O(1 / ln ln(n/ℓ)). This shows that the maximum load is decreasing with increasing chain length. Secondly, we analyse for which number of random choices d and which number of balls m < n, the maximum load of an off-line assignment can be upper bounded by one. This holds, for example, for m < 0.97677 · n and d = 4.
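The chains-into-bins allocation rule from the abstract above can be simulated with a short Python sketch. Several details are assumptions for the example rather than statements from the paper: bins are treated as a ring so that a chain starting near the end wraps around, ties between candidate start bins are broken in favour of the first one drawn, and the function name and parameters are hypothetical.

```python
import random


def chains_into_bins(n, chain_len, d, seed=0):
    """Allocate n // chain_len chains of length chain_len to n bins; return the loads."""
    rng = random.Random(seed)
    load = [0] * n

    def peak(start):
        # Largest current load among the bins this chain would occupy.
        return max(load[(start + k) % n] for k in range(chain_len))

    for _ in range(n // chain_len):
        # Each chain draws d start bins independently and uniformly at random.
        starts = [rng.randrange(n) for _ in range(d)]
        # Keep the start whose busiest bin is least loaded.
        best = min(starts, key=peak)
        for k in range(chain_len):
            load[(best + k) % n] += 1
    return load


for chain_len in (1, 4, 16):
    loads = chains_into_bins(n=4096, chain_len=chain_len, d=2)
    print(chain_len, max(loads))
```

Running the loop for several chain lengths gives an informal way to compare the observed maximum load against the (ln ln(n/ℓ))/ln d + O(1) behaviour stated in the abstract.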
Some Open Questions Related to Cuckoo Hashing
Cited by 1 (1 self)
The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
Building BLAST for Coprocessor Accelerators Using Macah
2008
The problem of detecting similarities between different genetic sequences is fundamental to many research pursuits in biology and genetics. BLAST (Basic Local Alignment and Search Tool) is the most commonly used tool for identifying and assessing the significance of such similarities. With the quantity of available genetic sequence data rapidly increasing, improving the performance of the BLAST algorithm is a problem of great interest. BLAST compares a single query sequence against a database of known sequences, employing a heuristic algorithm that consists of three stages arranged in a pipeline, such that the output of one stage feeds into the input of the next stage. Several recent studies have successfully investigated the use of Field-Programmable Gate Arrays (FPGAs) to accelerate the execution of the BLAST algorithm, focusing on the first and second stages, which account for the vast majority of the algorithm's execution time. While these results are encouraging, translating algorithms like BLAST that contain somewhat complex and unpredictable control flow and data access patterns into versions suitable for implementation on coprocessor accelerators like FPGAs turns out to be quite difficult using currently available tools. Such architectures are usually programmed using Hardware Description Languages (HDLs), which are significantly more difficult to learn and use than standard programming languages. In this paper, an accelerated version of the BLAST algorithm is presented, written in a new language called Macah, which is designed to make the task of programming coprocessor accelerators easier for programmers familiar with the widely known C language.

