Results

**11 - 19**of**19**### Hash-Based Data Structures for Extreme Conditions

, 2008

"... This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both an ..."

Abstract
- Add to MetaCart

This thesis is about the design and analysis of Bloom filter and multiple choice hash table variants for application settings with extreme resource requirements. We employ a very flexible methodology, combining theoretical, numerical, and empirical techniques to obtain constructions that are both analyzable and practical. First, we show that a wide class of Bloom filter variants can be effectively implemented using very easily computable combinations of only two fully random hash functions. From a theoretical perspective, these results show that Bloom filters and related data structures can often be substantially derandomized with essentially no loss in performance. From a practical perspective, this derandomization allows for a significant speedup in certain query intensive applications. The rest of this work focuses on designing space-efficient, open-addressed, multiple choice hash tables for implementation in high-performance router hardware. Using multiple hash functions conserves space, but requires every hash table operation to consider multiple hash buckets, forcing a tradeoff between the slow speed of examining these buckets serially

### The Power of Simple Tabulation Hashing Mihai Pǎtras¸cu AT&T Labs

, 2011

"... Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates ba ..."

Abstract
- Add to MetaCart

Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC’77). Keys are viewed as consisting of c characters. We initialize c tables T1,..., Tc mapping characters to random hash codes. A key x = (x1,..., xq) is hashed to T1[x1] ⊕ · · · ⊕ Tc[xc], where ⊕ denotes xor. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing. An important target of the analysis of algorithms is to determine whether there exist practical schemes, which enjoy mathematical guarantees on performance. Hashing and hash tables are one of the most common inner loops in real-world computation, and are even built-in “unit cost ” operations in high level programming languages that offer associative arrays. Often,

### Cache-Oblivious Dictionaries and Multimaps with Negligible Failure Probability

"... Abstract. A dictionary (or map) is a key-value store that requires all keys be unique, and a multimap is a key-value store that allows for multiple values to be associated with the same key. We design hashing-based indexing schemes for dictionaries and multimaps that achieve worst-case optimal perfo ..."

Abstract
- Add to MetaCart

Abstract. A dictionary (or map) is a key-value store that requires all keys be unique, and a multimap is a key-value store that allows for multiple values to be associated with the same key. We design hashing-based indexing schemes for dictionaries and multimaps that achieve worst-case optimal performance for lookups and updates, with minimal space overhead and sub-polynomial probability that the data structure will require a rehash operation. Our dictionary structure is designed for the Random Access Machine (RAM) model, while our multimap implementation is designed for the cache-oblivious external memory (I/O) model. The failure probabilities for our structures are sub-polynomial, which can be useful in cryptographic or data-intensive applications. 1

### Hardness Preserving Reductions via Cuckoo Hashing

, 2012

"... A common method for increasing the usability and uplifting the security of pseudorandom function families (PRFs) is to “hash ” the inputs into a smaller domain before applying the PRF. This approach, known as “Levin’s trick”, is used to achieve “PRF domain extension ” (using a short, e.g., fixed, in ..."

Abstract
- Add to MetaCart

A common method for increasing the usability and uplifting the security of pseudorandom function families (PRFs) is to “hash ” the inputs into a smaller domain before applying the PRF. This approach, known as “Levin’s trick”, is used to achieve “PRF domain extension ” (using a short, e.g., fixed, input length PRF to get a variable-length PRF), and more recently to transform non-adaptive PRFs to adaptive ones. Such reductions, however, are vulnerable to a “birthday attack”: after √ |U | queries to the resulting PRF, where U being the hash function range, a collision (i.e., two distinct inputs have the same hash value) happens with high probability. As a consequence, the resulting PRF is insecure against an attacker making this number of queries. In this work we show how to go beyond the birthday attack barrier, by replacing the above simple hashing approach with a variant of cuckoo hashing — a hashing paradigm typically used for resolving hash collisions in a table, by using two hash functions and two tables, and cleverly assigning each element into one of the two tables. We use this approach to obtain: (i) A domain extension method that requires just two calls to the original PRF, can withstand as many queries as the original domain size and has a distinguishing probability that is exponentially small in the non cryptographic work. (ii) A security-preserving reduction from non-adaptive to adaptive PRFs. 1

### On Dynamic Range Reporting in One Dimension Christian Worm Mortensen ∗ IT U. Copenhagen

, 2005

"... We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM wi ..."

Abstract
- Add to MetaCart

We consider the problem of maintaining a dynamic set of integers and answering queries of the form: report a point (equivalently, all points) in a given interval. Range searching is a natural and fundamental variant of integer search, and can be solved using predecessor search. However, for a RAM with w-bit words, we show how to perform updates in O(lg w) time and answer queries in O(lg lg w) time. The update time is identical to the van Emde Boas structure, but the query time is exponentially faster. Existing lower bounds show that achieving our query time for predecessor search requires doubly-exponentially slower updates. We present some arguments supporting the conjecture that our solution is optimal. Our solution is based on a new and interesting recursion idea which is “more extreme” that the van Emde Boas recursion. Whereas van Emde Boas uses a simple recursion (repeated halving) on each path in a trie, we use a nontrivial, van Emde Boas-like recursion on every such path. Despite this, our algorithm is quite clean when seen from the right angle. To achieve linear space for our data structure, we solve a problem which is of independent interest. We develop the first scheme for dynamic perfect hashing requiring sublinear space. This gives a dynamic Bloomier filter (an approximate storage scheme for sparse vectors) which uses low space. We strengthen previous lower bounds to show that these results are optimal. 1

### Tabulation Based 5-independent Hashing with Applications to Linear Probing and Second Moment Estimation ∗

"... In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function ..."

Abstract
- Add to MetaCart

In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree 4 polynomial over a prime field containing the key domain[n] = {0,...,n−1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c−1 lookups in tables of size O(n 1/c). In experiments on different computers, our scheme gained factors 1.8 to 10 in speed over the polynomial method. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs. 1 Introduction. We consider “k-independent hashing ” in the classic framework of Carter and Wegman [32]. For any i ≥ 1, let [i] = {0,1,...,i − 1}. We consider “hash ” functions from “keys ” in [n] to “hash values ” in [m]. A class H of hash functions is k-independent if for any distinct x0,...,xk−1 ∈ [n] and any possibly identical

### Pseudo-random graphs and bit probe schemes with one-sided error

, 2011

"... We study probabilistic bit-probe schemes for the membership problem. Given a set A of at most n elements from the universe of size m we organize such a structure that queries of type “x ∈ A? ” can be answered very quickly. H. Buhrman, P.B. Miltersen, J. Radhakrishnan, and S. Venkatesh proposed a bit ..."

Abstract
- Add to MetaCart

We study probabilistic bit-probe schemes for the membership problem. Given a set A of at most n elements from the universe of size m we organize such a structure that queries of type “x ∈ A? ” can be answered very quickly. H. Buhrman, P.B. Miltersen, J. Radhakrishnan, and S. Venkatesh proposed a bit-probe scheme based on expanders. Their scheme needs space of O(n log m) bits. The scheme has a randomized algorithm processing queries; it needs to read only one randomly chosen bit from the memory to answer a query. For every x the answer is correct with high probability (with two-sided errors). In this paper we show that for the same problem there exists a bitprobe scheme with one-sided error that needs space of O(n log 2 m + poly(log m)) bits. The difference with the model of Buhrman, Miltersen, Radhakrishnan, and Venkatesh is that we consider a bit-probe scheme

### urn:nbn:de:gbv:ilm1-2009000150 Dedicated to Life i Summary

"... The contribution of this thesis is a mathematical analysis a high performance hashing scheme called cuckoo hashing when combined with two very simple and efficient classes of functions that we refer to as the multiplicative class and the linear class, respectively. We prove that cuckoo hashing tends ..."

Abstract
- Add to MetaCart

The contribution of this thesis is a mathematical analysis a high performance hashing scheme called cuckoo hashing when combined with two very simple and efficient classes of functions that we refer to as the multiplicative class and the linear class, respectively. We prove that cuckoo hashing tends to work badly with these classes. In order to show this, we investigate how the inner structure of such functions influences the behavior of the cuckoo scheme when a set S of keys is inserted into initially empty tables. Cuckoo Hashing uses two tables of size m each. It is known that the insertion of an arbitrary set S of size n = (1 − δ)m for an arbitrary constant δ ∈ (0, 1) (which yields a load factor n/(2m) of up to 1/2) fails with probability O(1/n) if the hash functions are chosen from an Ω(log n)-wise independent class. This leads to the result of expected amortized constant time for a single insertion. In contrast to this we prove lower bounds of the following kind: If S is a uniformly random chosen set of size n = m/2 (leading to a load factor of only 1/4 (!)) then the insertion of S fails with probability Ω(1), or even with probability 1 − o(1), if