Cuckoo hashing
Journal of Algorithms, 2001
Abstract
We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee. Key Words: data structures, dictionaries, information retrieval, searching, hashing, experiments
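The displacement idea in the abstract can be sketched as a toy implementation. This is a hedged sketch, not the paper's exact construction: Python's salted built-in hash stands in for the paper's hash families, and the capacity, kick limit, and rehash policy are arbitrary simplifications.

```python
import random

class CuckooDict:
    """Toy cuckoo hash table.  A key x lives either at t1[h1(x)] or
    t2[h2(x)], so a lookup probes at most two cells: worst-case
    constant lookup time, the property described in the abstract."""

    def __init__(self, capacity=16):
        self.m = capacity          # size of each of the two tables
        self._reseed()
        self.t1 = [None] * self.m
        self.t2 = [None] * self.m

    def _reseed(self):
        self.s1 = random.randrange(1 << 30)
        self.s2 = random.randrange(1 << 30)

    def _h1(self, key):
        return hash((self.s1, key)) % self.m

    def _h2(self, key):
        return hash((self.s2, key)) % self.m

    def get(self, key, default=None):
        # at most two probes, one per table
        for cell in (self.t1[self._h1(key)], self.t2[self._h2(key)]):
            if cell is not None and cell[0] == key:
                return cell[1]
        return default

    def put(self, key, value):
        for t, h in ((self.t1, self._h1), (self.t2, self._h2)):
            i = h(key)
            if t[i] is not None and t[i][0] == key:
                t[i] = (key, value)        # overwrite in place
                return
        item = (key, value)
        for _ in range(32):                # bounded chain of displacements
            i = self._h1(item[0])
            item, self.t1[i] = self.t1[i], item   # evict occupant, if any
            if item is None:
                return
            j = self._h2(item[0])
            item, self.t2[j] = self.t2[j], item
            if item is None:
                return
        self._rehash(item)                 # chain too long: grow and reseed

    def _rehash(self, pending):
        items = [c for t in (self.t1, self.t2) for c in t if c is not None]
        items.append(pending)
        self.m *= 2
        self._reseed()
        self.t1 = [None] * self.m
        self.t2 = [None] * self.m
        for k, v in items:
            self.put(k, v)
```

An insertion that finds both of its cells occupied kicks the occupant to its alternate cell, which may cascade; a too-long chain triggers a grow-and-rehash, matching the "keys can be moved back in their probe sequences" description.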
Timespace tradeoffs for predecessor search
In Proc. 38th ACM Symposium on Theory of Computing, 2006
Abstract
We develop a new technique for proving cell-probe lower bounds for static data structures. Previous lower bounds used a reduction to communication games, which was known not to be tight by counting arguments. We give the first lower bound for an explicit problem which breaks this communication complexity barrier. In addition, our bounds give the first separation between polynomial and near-linear space. Such a separation is inherently impossible by communication complexity. Using our lower bound technique and new upper bound constructions, we obtain tight bounds for searching predecessors among a static set of integers. Given a set Y of n integers of ℓ bits each, the goal is to efficiently find predecessor(x) = max {y ∈ Y : y ≤ x}. For this purpose, we represent Y on a RAM with word length w using S words of space. Defining a = lg(S/n) + lg w, we show that the optimal search time is, up to constant factors, min { log_w n, lg((ℓ − lg n)/a), lg(ℓ/a) / lg((a/lg n) · lg(ℓ/a)), lg(ℓ/a) / lg(lg(ℓ/a) / lg(lg n / a)) }.
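The query being bounded can be stated concretely. As a baseline only (not the paper's optimal cell-probe structure), a sorted array answers it with O(log n) comparisons:

```python
import bisect

def predecessor(sorted_ys, x):
    """predecessor(x) = max {y in Y : y <= x}, or None if no y <= x.
    Binary search over a sorted array: the simple baseline against
    which the cell-probe upper and lower bounds are measured."""
    i = bisect.bisect_right(sorted_ys, x)
    return sorted_ys[i - 1] if i else None
```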
Fast moment estimation in data streams in optimal space
In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), 2011
Abstract
We give a space-optimal algorithm with update time O(log²(1/ε) · log log(1/ε)) for (1 ± ε)-approximating the p-th frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous space-optimal algorithm of [Kane–Nelson–Woodruff, SODA 2010], which had update time Ω(1/ε²).
On the exact space complexity of sketching and streaming small norms
In SODA, 2010
Abstract
We settle the one-pass space complexity of (1 ± ε)-approximating the Lp norm, for real p with 1 ≤ p ≤ 2, of a length-n vector updated in a length-m stream with updates to its coordinates. We assume the updates are integers in the range [−M, M]. In particular, we show the space required is Θ(ε⁻² log(mM) + log log n) bits. Our result also holds for 0 < p < 1; although Lp is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM ’06] and [Li, SODA ’08]. This improvement comes from showing an improved derandomization of the Lp sketch of Indyk by using k-wise independence for small k, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan’s PRG. Our lower bound improves upon previous work of [Alon–Matias–Szegedy, JCSS ’99] and [Woodruff, SODA ’04], and is based on showing a direct sum property for the one-way communication complexity of the gap-Hamming problem.
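The Lp sketch of Indyk mentioned above can be illustrated for p = 1: each counter is a dot product of the turnstile-updated vector with standard Cauchy variables, and the median of the counters' absolute values estimates the L1 norm. This demo is a simplification — it draws fully independent Cauchy coefficients, whereas the derandomization in the paper replaces them with k-wise independent ones; the counter count k=401 and seed are arbitrary.

```python
import math, random, statistics

def l1_sketch_demo(updates, k=401, seed=7):
    """p-stable sketch for p = 1: counters[i] accumulates the dot
    product of the stream vector with i.i.d. standard Cauchy variables.
    Because sum_j x_j * C_j is distributed as ||x||_1 * Cauchy and the
    median of |Cauchy| is 1, the median |counter| estimates ||x||_1."""
    rng = random.Random(seed)
    cauchy = {}              # lazily drawn coefficient per (counter, coordinate)
    counters = [0.0] * k
    for j, delta in updates:         # turnstile stream of (coordinate, increment)
        for i in range(k):
            c = cauchy.setdefault(
                (i, j), math.tan(math.pi * (rng.random() - 0.5)))
            counters[i] += delta * c
    return statistics.median(abs(s) for s in counters)
```

With more counters the median concentrates; the space-optimal algorithms above achieve the same guarantee with exponentially less randomness per coordinate.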
Balanced families of perfect hash functions and their applications
Proc. ICALP, 2007
Abstract
The construction of perfect hash functions is a well-studied topic. In this paper, this concept is generalized with the following definition. We say that a family of functions from [n] to [k] is a δ-balanced (n, k)-family of perfect hash functions if for every S ⊆ [n], |S| = k, the number of functions that are 1-1 on S is between T/δ and δT for some constant T > 0. The standard definition of a family of perfect hash functions requires that there will be at least one function that is 1-1 on S, for each S of size k. In the new notion of balanced families, we require the number of 1-1 functions to be almost the same (taking δ to be close to 1) for every such S. Our main result is that for any constant δ > 1, a δ-balanced (n, k)-family of perfect hash functions of size 2^{O(k log log k)} log n can be constructed in time 2^{O(k log log k)} n log n. Using the technique of color-coding we can apply our explicit constructions to devise approximation algorithms for various counting problems in graphs. In particular, we exhibit a deterministic polynomial time algorithm for approximating both the number of simple paths of length k and the number of simple cycles of size k, for any k ≤ O(log n / log log log n), in a graph with n vertices. The approximation is up to any fixed desirable relative error.
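The color-coding step the abstract relies on can be sketched for a single coloring. This is the standard dynamic program over color subsets, not the paper's explicit family: it counts "colorful" paths (all k vertices differently colored), which are necessarily simple.

```python
def count_colorful_paths(adj, k, coloring):
    """Count directed traversals of simple paths on k vertices whose
    vertex colors are all distinct, via DP over (endpoint, color set).
    adj: adjacency lists; coloring: a color in [k] per vertex."""
    n = len(adj)
    # dp[(v, S)] = number of colorful paths ending at v using exactly colors S
    dp = {(v, frozenset([coloring[v]])): 1 for v in range(n)}
    for _ in range(k - 1):
        ndp = {}
        for (v, S), cnt in dp.items():
            for u in adj[v]:
                if coloring[u] not in S:       # keep the path colorful
                    key = (u, S | {coloring[u]})
                    ndp[key] = ndp.get(key, 0) + cnt
        dp = ndp
    return sum(dp.values())
```

Under a uniformly random coloring with k colors, a fixed simple k-vertex path is colorful with probability k!/kᵏ, so averaging count · kᵏ/k! over colorings estimates the number of simple paths; the δ-balanced family in the paper makes that average deterministic up to a 1 ± o(1) factor.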
Faster Deterministic Dictionaries
In 11th Annual ACM Symposium on Discrete Algorithms (SODA), 1999
Abstract
We consider static dictionaries over the universe U = {0, 1}^w on a unit-cost RAM with word size w. Construction of a static dictionary with linear space consumption and constant lookup time can be done in linear expected time by a randomized algorithm. In contrast, the best previous deterministic algorithm for constructing such a dictionary with n elements runs in time O(n^{1+ε}) for ε > 0. This paper narrows the gap between deterministic and randomized algorithms exponentially, from a factor of n^ε to an O(log n) factor. The algorithm is weakly nonuniform, i.e. it requires certain precomputed constants dependent on w. A byproduct of the result is a lookup time vs. insertion time tradeoff for dynamic dictionaries, which is optimal for a certain class of deterministic hashing schemes.
Hash and displace: Efficient evaluation of minimal perfect hash functions
In Workshop on Algorithms and Data Structures, 1999
Abstract
A new way of constructing (minimal) perfect hash functions is described. The technique considerably reduces the overhead associated with resolving buckets in two-level hashing schemes. Evaluating a hash function requires just one multiplication and a few additions apart from primitive bit operations. The number of accesses to memory is two, one of which is to a fixed location. This improves the probe performance of previous minimal perfect hashing schemes, and is shown to be optimal. The hash function description (“program”) for a set of size n occupies O(n) words, and can be constructed in expected O(n) time.
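A small randomized construction in the hash-and-displace spirit can be sketched as follows. This is a hedged illustration, not the paper's scheme: Python's salted built-in hash stands in for the multiplier-based functions, displacements are realized by reseeding a second hash rather than by modular shifts, and the bucket count and retry limits are arbitrary.

```python
import random

def build_mphf(keys, max_attempts=20):
    """Minimal perfect hash in the hash-and-displace spirit: a first
    hash g throws the distinct keys into buckets; each bucket b then
    receives a displacement value d[b] chosen so that f(x, d[g(x)])
    maps the whole key set 1-1 onto [0, n).  Evaluating the result
    reads d[] once plus the target slot: two memory accesses."""
    n = len(keys)
    for _ in range(max_attempts):          # fresh seed on failure
        seed = random.randrange(1 << 30)
        g = lambda x: hash((seed, "g", x)) % n
        f = lambda x, disp: hash((seed, "f", disp, x)) % n
        buckets = {}
        for k in keys:
            buckets.setdefault(g(k), []).append(k)
        d = [0] * n
        used = [False] * n
        ok = True
        # place the largest buckets first: they are the hardest to fit
        for b in sorted(buckets, key=lambda b: -len(buckets[b])):
            for disp in range(2000):
                slots = [f(k, disp) for k in buckets[b]]
                if len(set(slots)) == len(slots) and not any(used[s] for s in slots):
                    d[b] = disp
                    for s in slots:
                        used[s] = True
                    break
            else:
                ok = False
                break
        if ok:
            return lambda x: f(x, d[g(x)])
    raise RuntimeError("construction failed; retry with more attempts")
```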
Optimal cache-oblivious implicit dictionaries
In Proceedings of the 30th International Colloquium on Automata, Languages and Programming, 2003
Abstract
We consider the issues of implicitness and cache-obliviousness in the classical dictionary problem for n distinct keys over an unbounded and ordered universe. One finding in this paper is that of closing the long-standing open problem about the existence of an optimal implicit dictionary over an unbounded universe. Another finding is motivated by the antithetic features of implicit and cache-oblivious models in data structures. We show how to blend their best qualities, achieving O(log n) time and O(log_B n) block transfers for searching and for amortized updating, while using just n memory cells like sorted arrays and heaps. As a result, we avoid space wasting and provide fast data access at any level of the memory hierarchy.
A linear lower bound on index size for text retrieval
J. Algorithms, 2003
Abstract
Most information-retrieval systems preprocess the data to produce an auxiliary index structure. Empirically, it has been observed that there is a tradeoff between query response time and the size of the index. When indexing a large corpus, such as the web, the size of the index is an important consideration. In this case it would be ideal to produce an index that is substantially smaller than the text. In this work we prove a linear worst-case lower bound on the size of any index that reports the location (if any) of a substring in the text in time proportional to the length of the pattern. In other words, an index supporting linear-time substring searches requires about as much space as the original text. Here “time” is measured in the number of bit probes to the text; an arbitrary amount of computation may be done on an arbitrary amount of the index. Our lower bound applies to inverted word indices as well.
Algorithmic complexity of protein identification: Combinatorics of weighted strings
Discrete Applied Mathematics, Special Issue on Combinatorics of Searching, Sorting, and Coding (2002), 2004
Abstract
We investigate a problem from computational biology: Given a constant-size alphabet A with a weight function μ : A → ℕ, find an efficient data structure and query algorithm solving the following problem: For a weight M ∈ ℕ and a string σ over A, decide whether σ contains a substring with weight M (ONE STRING MASS FINDING PROBLEM). If the answer is yes, then we may in addition require a witness, i.e. indices i ≤ j such that the substring starting at position i and ending at position j has weight M. We allow preprocessing of the string, and measure efficiency in two parameters: storage space required for the preprocessed data, and running time of the query algorithm for given M. We are interested in data structures and algorithms requiring subquadratic storage space and sublinear query time, where we measure the input size as the length of the input string. We present two efficient algorithms: LOOKUP solves the problem with O(n) space and O(n log log n / log n) query time; INTERVAL solves the problem for binary alphabets with O(n) space in O(log n) time. We sketch a third algorithm, CLUSTER, which can be adjusted for a space-time tradeoff but for which we do not yet have a resource analysis. We introduce a function on weighted strings which is closely related to the analysis of algorithms for the ONE STRING MASS FINDING PROBLEM: the number of different submasses of a weighted string. We present several properties of this function, including upper and lower bounds. Finally, we introduce two more general variants of the problem and sketch how algorithms may be extended for these variants.
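Because letter weights are positive, a single query can always be answered with a two-pointer scan and no preprocessing; this O(n)-per-query baseline is what the paper's preprocessed structures (sublinear query time) improve on. A minimal sketch, assuming the string has already been mapped to its per-letter weights:

```python
def find_submass(weights, M):
    """Witness search for the One String Mass Finding Problem: return
    (i, j) such that weights[i..j] sums to M, or None.  Positive letter
    weights make the window sum monotone in both endpoints, so a
    sliding window finds a witness in O(n) time and O(1) extra space."""
    i, total = 0, 0
    for j, w in enumerate(weights):
        total += w
        while total > M and i <= j:    # shrink from the left while too heavy
            total -= weights[i]
            i += 1
        if total == M and i <= j:
            return (i, j)
    return None
```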