Results 1–10 of 16
Tight thresholds for cuckoo hashing via XORSAT, 2010
"... We settle the question of tight thresholds for offline cuckoo hashing. The problem can be stated as follows: we have n keys to be hashed into m buckets each capable of holding a single key. Each key has k ≥ 3 (distinct) associated buckets chosen uniformly at random and independently of the choices ..."
Abstract

Cited by 20 (1 self)
We settle the question of tight thresholds for offline cuckoo hashing. The problem can be stated as follows: we have n keys to be hashed into m buckets, each capable of holding a single key. Each key has k ≥ 3 (distinct) associated buckets chosen uniformly at random and independently of the choices of other keys. A hash table can be constructed successfully if each key can be placed into one of its buckets. We seek thresholds c_k such that, as n goes to infinity, if n/m ≤ c for some c < c_k then a hash table can be constructed successfully with high probability, and if n/m ≥ c for some c > c_k then a hash table cannot be constructed successfully with high probability. Here we are considering the offline version of the problem, where all keys and hash values are given, so the problem is equivalent to previous models of multiple-choice hashing. We find the thresholds for all values of k > 2 by showing that they are in fact the same as the previously known thresholds for the random k-XORSAT problem. We then extend these results to the setting where keys can have differing numbers of choices, and provide evidence, in the form of an algorithm, for a conjecture extending this result to cuckoo hash tables that store multiple keys in a bucket.
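Offline feasibility here is exactly a bipartite matching question: each key must be assigned to one of its k candidate buckets, one key per bucket. A minimal sketch (not the paper's method; a plain augmenting-path matching, with `choices[i]` holding key i's candidate buckets):

```python
import random

def can_place_all(choices, m):
    """Offline cuckoo-hashing feasibility: can every key be assigned to one
    of its candidate buckets, one key per bucket? This is bipartite matching,
    solved here with simple augmenting paths (Kuhn's algorithm)."""
    owner = [None] * m                      # owner[b] = key currently in bucket b

    def try_place(key, seen):
        for b in choices[key]:
            if b not in seen:
                seen.add(b)
                # take a free bucket, or evict its owner if it can move elsewhere
                if owner[b] is None or try_place(owner[b], seen):
                    owner[b] = key
                    return True
        return False

    return all(try_place(key, set()) for key in range(len(choices)))

# random instance: n keys, m buckets, k = 3 distinct choices per key
random.seed(0)
n, m, k = 80, 100, 3
choices = [random.sample(range(m), k) for _ in range(n)]
print(can_place_all(choices, m))  # load 0.8 < c_3 ≈ 0.918, so almost surely True
```

Below the threshold c_3 ≈ 0.918, random instances like the one above are placeable with high probability; above it, they almost surely are not.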
Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses
"... A minimal perfect hash function maps a set S of n keys into the set { 0, 1,..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of ..."
Abstract

Cited by 19 (8 self)
A minimal perfect hash function maps a set S of n keys into the set {0, 1, ..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicographical ordering of the keys. A monotone minimal perfect hash function can be seen as a very weak form of index that provides ranking just on the set S (and answers randomly outside of S). Our goal is to minimise the description size of the hash function: we show that, for a set S of n elements out of a universe of 2^w elements, O(n log log w) bits are sufficient to hash monotonically with evaluation time O(log w). Alternatively, we can get space O(n log w) bits with O(1) query time. Both of these data structures improve on a straightforward construction with O(n log w) space and O(log w) query time. As a consequence, it is possible to search a sorted table with O(1) accesses to the table (using an additional O(n log log w) bits). Our results are based on a structure (of independent interest) that represents a trie in a very compact way, but admits errors. As a further application of the same structure, we show how to compute the predecessor (in the sorted order of S) of an arbitrary element, using O(1) accesses in expectation and an index of O(n log w) bits, improving on the trivial result of O(nw) bits. This implies an efficient index for searching a blocked memory.
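Behaviorally, a monotone minimal perfect hash function is just rank restricted to S. The toy sketch below illustrates only the interface; unlike the structures in the paper, it stores S explicitly and so takes space proportional to the keys themselves rather than O(n log log w) bits:

```python
class ToyMonotoneMPH:
    """Toy monotone minimal perfect hash: maps each key of a sorted set S to
    its rank in {0, ..., n-1}. Real constructions avoid storing S; this
    version keeps a full dict purely to illustrate the contract."""
    def __init__(self, sorted_keys):
        self.rank = {k: i for i, k in enumerate(sorted_keys)}

    def __call__(self, key):
        # A true MMPHF may answer arbitrarily outside S; here it raises.
        return self.rank[key]

h = ToyMonotoneMPH(["apple", "banana", "cherry"])
print(h("banana"))  # → 1
```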
Theory and Practice of Monotone Minimal Perfect Hashing
"... Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, orderpreserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given orde ..."
Abstract

Cited by 13 (7 self)
Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys: however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along the way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practice, and provide a balance between access speed, ease of construction, and space usage.
Bloomier filters: A second look
Algorithms - ESA 2008, 16th Annual European Symposium
"... Abstract. A Bloom filter is a space efficient structure for storing static sets, where the space efficiency is gained at the expense of a small probability of falsepositives. A Bloomier filter generalizes a Bloom filter to compactly store a function with a static support. In this article we give a ..."
Abstract

Cited by 10 (0 self)
A Bloom filter is a space-efficient structure for storing static sets, where the space efficiency is gained at the expense of a small probability of false positives. A Bloomier filter generalizes a Bloom filter to compactly store a function with a static support. In this article we give a simple construction of a Bloomier filter. The construction is linear in space and requires constant time to evaluate. The creation of our Bloomier filter takes linear time, which is faster than the existing construction. We show how one can improve the space utilization further at the cost of increasing the time for creating the data structure.
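One standard way to realize a Bloomier-style filter (close in spirit to linear-time constructions, though not necessarily the one in this article) stores each key's value as the XOR of three pseudo-randomly chosen table cells, solving the resulting equations by hypergraph peeling. A hedged sketch, with cell positions derived from a seeded `random.Random` for reproducibility:

```python
import random

def build_bloomier(items, m=None, seed="demo"):
    """Toy Bloomier-style filter: each key hashes to 3 distinct table cells,
    and the XOR of those cells equals the stored value. Built by peeling the
    random 3-uniform hypergraph; w.h.p. this succeeds when m > ~1.23 * n
    (the generous floor of 64 cells just makes the toy robust for tiny inputs)."""
    keys = list(items)                      # list of (key, int_value) pairs
    if m is None:
        m = max(64, int(1.3 * len(keys)) + 1)

    def positions(key):
        # 3 distinct cells per key; str seeds are stable across runs
        return random.Random(f"{seed}:{key}").sample(range(m), 3)

    touching = [set() for _ in range(m)]    # which keys touch each cell
    for i, (k, _) in enumerate(keys):
        for p in positions(k):
            touching[p].add(i)

    order = []                              # peel order: (key index, its free cell)
    stack = [p for p in range(m) if len(touching[p]) == 1]
    while stack:
        p = stack.pop()
        if len(touching[p]) != 1:           # stale entry: cell already emptied
            continue
        i = next(iter(touching[p]))
        order.append((i, p))
        for q in positions(keys[i][0]):
            touching[q].discard(i)
            if len(touching[q]) == 1:
                stack.append(q)
    if len(order) != len(keys):
        raise RuntimeError("peeling failed; retry with a different seed")

    table = [0] * m
    for i, p in reversed(order):            # reverse peel order keeps equations intact
        k, v = keys[i]
        acc = v
        for q in positions(k):
            if q != p:
                acc ^= table[q]
        table[p] = acc
    return table, positions

def query(table, positions, key):
    a, b, c = positions(key)
    return table[a] ^ table[b] ^ table[c]

table, pos = build_bloomier([("cat", 7), ("dog", 3), ("emu", 12)])
print(query(table, pos, "dog"))  # 3 whenever construction succeeded
```

Querying a key outside the support returns an arbitrary value, which is exactly the Bloomier-filter contract.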
Some Open Questions Related to Cuckoo Hashing
"... Abstract. The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research. 1 ..."
Abstract

Cited by 4 (1 self)
The purpose of this brief note is to describe recent work in the area of cuckoo hashing, including a clear description of several open problems, with the hope of spurring further research.
Don’t Rush into a Union: Take Time to Find Your Roots, 2011
"... We present a new threshold phenomenon in data structure lower bounds where slightly reduced update times lead to exploding query times. Consider incremental connectivity, letting tU be the time to insert an edge and tq be the query time. For tU = Ω(tq), the problem is equivalent to the wellundersto ..."
Abstract

Cited by 2 (0 self)
We present a new threshold phenomenon in data structure lower bounds where slightly reduced update times lead to exploding query times. Consider incremental connectivity, letting t_u be the time to insert an edge and t_q be the query time. For t_u = Ω(t_q), the problem is equivalent to the well-understood union–find problem: INSERTEDGE(s, t) can be implemented by UNION(FIND(s), FIND(t)). This gives worst-case time t_u = t_q = O(lg n / lg lg n) and amortized t_u = t_q = O(α(n)). By contrast, we show that if t_u = o(lg n / lg lg n), the query time explodes to t_q ≥ n^{1−o(1)}. In other words, if the data structure doesn’t have time to find the roots of each disjoint set (tree) during edge insertion, there is no effective way to organize the information! For amortized complexity, we demonstrate a new inverse-Ackermann-type tradeoff in the regime t_u = o(t_q). A similar lower bound is given for fully dynamic connectivity, where an update time of o(lg n) forces the query time to be n^{1−o(1)}. This lower bound allows for amortization and Las Vegas randomization, and comes close to the known O(lg n · (lg lg n)^{O(1)}) upper bound.
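The union-find upper bounds referenced above are achieved by the classic structure with union by rank and path compression; a compact sketch:

```python
class UnionFind:
    """Union-find with path compression (here, path halving) and union by
    rank: amortized O(alpha(n)) per operation, the amortized upper bound
    the abstract contrasts against."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:   # attach the shallower tree
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

# INSERTEDGE(s, t) is UNION(FIND(s), FIND(t)); connectivity is a find comparison
uf = UnionFind(5)
uf.union(0, 1); uf.union(1, 2)
print(uf.find(0) == uf.find(2), uf.find(0) == uf.find(4))  # → True False
```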
Fast Prefix Search in Little Space, with Applications
"... Abstract. It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution. Traditionally, prefix search is solved by data stru ..."
Abstract

Cited by 1 (1 self)
It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allow faster query resolution. Traditionally, prefix search is solved by data structures that are also dictionaries, in that they actually contain the strings in S. For very large collections stored in slow-access memory, we propose much more compact data structures that support weak prefix searches, which return the ranks of matching strings provided that some string in S starts with the given prefix. In fact, we show that our most space-efficient data structure is asymptotically space-optimal. Previously, data structures such as String B-trees (and more complicated cache-oblivious string data structures) have implicitly supported weak prefix queries, but they all have query time that grows logarithmically with the size of the string collection. In contrast, our data structures are simple, naturally cache-efficient, and have query time that depends only on the length of the prefix, all the way down to constant query time for strings that fit in one machine word. We give several applications of weak prefix searches, including exact prefix counting and approximate counting of tuples matching conjunctive prefix conditions.
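The "weak" contract is easy to state in code: under the promise that some string in S has the queried prefix, return the rank of the first such string, with unspecified output otherwise. The sketch below uses a plain sorted list and `bisect`, so unlike the paper's compact structures it stores S itself; it only illustrates the semantics:

```python
import bisect

def weak_prefix_rank(sorted_strings, prefix):
    """Weak prefix search: return the rank of the first string starting with
    `prefix`, assuming some string in the set does. Any string >= prefix that
    lacks the prefix sorts after all strings that have it, so the leftmost
    insertion point is the answer. Without the promise, the result is an
    arbitrary rank, mirroring the 'weak' contract in the abstract."""
    return bisect.bisect_left(sorted_strings, prefix)

S = ["foo", "foobar", "fox", "zebra"]
print(weak_prefix_rank(S, "foo"))   # → 0
print(weak_prefix_rank(S, "fox"))   # → 2
```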
Balls and Bins: Smaller Hash Families and Faster Evaluation, 2012
"... A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n / log log n) balls. In various applications, however, the assumption that a truly random hash functio ..."
Abstract
 Add to MetaCart
A fundamental fact in the analysis of randomized algorithms is that when n balls are hashed into n bins independently and uniformly at random, with high probability each bin contains at most O(log n / log log n) balls. In various applications, however, the assumption that a truly random hash function is available is not always valid, and explicit functions are required. In this paper we study the size of families (or, equivalently, the description length of their functions) that guarantee a maximal load of O(log n / log log n) with high probability, as well as the evaluation time of their functions. Whereas such functions must be described using Ω(log n) bits, the best upper bound was formerly O(log^2 n / log log n) bits, which is attained by O(log n / log log n)-wise independent functions. Traditional constructions of the latter offer an evaluation time of O(log n / log log n), which according to Siegel’s lower bound [FOCS ’89] can be reduced only at the cost of significantly increasing the description length. We construct two families that guarantee a maximal load of O(log n / log log n) with high probability. Our constructions are based on two different approaches, and exhibit different trade-offs between the description length and the evaluation time. The first construction shows
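The Θ(log n / log log n) maximum-load behavior under truly random hashing is easy to observe empirically; a small simulation (illustrating the benchmark the paper's explicit families must match, not the constructions themselves):

```python
import math
import random
from collections import Counter

def max_load(n, seed=0):
    """Throw n balls into n bins uniformly at random and report the fullest
    bin; with high probability the answer is Theta(log n / log log n)."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())

n = 100_000
# observed max load vs. the asymptotic scale log n / log log n
print(max_load(n), round(math.log(n) / math.log(math.log(n)), 2))
```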
Theory and Practice of Monotone Minimal Perfect Hashing
"... supported by the MIUR PRIN projects “Mathematical aspects and forthcoming applications of automata and formal languages ” and “Grafi del web e ranking”, and by a Yahoo! Faculty Grant. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provi ..."
Abstract
 Add to MetaCart
Supported by the MIUR PRIN projects “Mathematical aspects and forthcoming applications of automata and formal languages” and “Grafi del web e ranking”, and by a Yahoo! Faculty Grant. Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and
Memory Efficient Sanitization of a Deduplicated Storage System
"... Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization could require erasing all unreferenced blocks. This is particularly challengin ..."
Abstract
 Add to MetaCart
(Show Context)
Sanitization is the process of securely erasing sensitive data from a storage system, effectively restoring the system to a state as if the sensitive data had never been stored. Depending on the threat model, sanitization could require erasing all unreferenced blocks. This is particularly challenging in deduplicated storage systems, because each piece of data on the physical media could be referred to by multiple namespace objects. For large storage systems, where available memory is a small fraction of storage capacity, standard techniques for tracking data references will not fit in memory, and we discuss multiple sanitization techniques that trade off I/O and memory requirements. We have three key contributions. First, we provide an understanding of the threat model and what is required to sanitize a deduplicated storage system as compared to a device. Second, we have designed a memory-efficient algorithm using perfect hashing that requires only 2.54 to 2.87 bits per reference (98% savings) while minimizing the amount of I/O. Third, we present a complete sanitization design for EMC Data Domain.
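The reference-tracking step is essentially mark-and-sweep over chunk fingerprints. A toy sketch with an in-memory set standing in for the paper's compact perfect-hash structure (all names here are illustrative, not the paper's API):

```python
def sanitize(stored_chunks, live_references):
    """Toy mark-and-sweep sanitization for deduplicated storage: keep only
    chunks referenced by some namespace object, erase the rest. The paper
    replaces this in-memory `live` set with a perfect-hash-based structure
    needing only ~2.5-2.9 bits per reference."""
    live = set()
    for refs in live_references:            # mark phase: walk every namespace
        live.update(refs)
    erased = [fp for fp in stored_chunks if fp not in live]
    remaining = {fp: data for fp, data in stored_chunks.items() if fp in live}
    return remaining, erased                # sweep: erase unreferenced chunks

chunks = {"a": b"secret", "b": b"keep", "c": b"keep2"}
namespaces = [["b"], ["b", "c"]]            # chunk "a" is unreferenced
remaining, erased = sanitize(chunks, namespaces)
print(sorted(remaining), erased)  # → ['b', 'c'] ['a']
```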