Results 1-10 of 22
Succinct Data Structures for Retrieval and Approximate Membership
Cited by 19 (6 self)
Abstract. The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function f: U → {0, 1}^r that has specified values on the elements of a given set S ⊆ U, |S| = n, but may have any value on elements outside S. All known methods (e.g., those based on perfect hash functions) induce a space overhead of Θ(n) bits over the optimum, regardless of the evaluation time. We show that for any k, query time O(k) can be achieved using space that is within a factor 1 + e^{−k} of optimal, asymptotically for large n. The time to construct the data structure is O(n), expected. If we allow logarithmic evaluation time, the additive overhead can be reduced to O(log log n) bits w.h.p. A general reduction transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Thus we obtain space bounds arbitrarily close to the lower bound for this problem as well. The evaluation procedures of our data structures are extremely simple. For the results stated above we assume free access to fully random hash functions. This assumption can be justified using space o(n) to simulate full randomness on a RAM.
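The reduction from retrieval to approximate membership can be illustrated with a small sketch: store an r-bit fingerprint of each key in a retrieval structure, and answer a query by comparing the retrieved value with the query's own fingerprint. Everything named here (`fingerprint`, `ToyRetrieval`, `ApproxMembership`) is a hypothetical illustration. In particular, `ToyRetrieval` is a dict-backed stand-in that only mimics the retrieval interface (correct on S, arbitrary elsewhere); it is not succinct and is not the construction from the paper.

```python
import hashlib


def fingerprint(key: str, r: int, salt: str = "fp") -> int:
    """Hypothetical r-bit fingerprint (r <= 64), standing in for a random hash."""
    digest = hashlib.sha256((salt + ":" + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << r)


class ToyRetrieval:
    """Stand-in retrieval structure: exact on S, arbitrary r-bit value outside S.

    Assumption: a plain dict is used for clarity. A real retrieval structure
    would not store the keys at all, which is where the space savings come from.
    """

    def __init__(self, values: dict, r: int):
        self._values = dict(values)
        self._r = r

    def retrieve(self, key: str) -> int:
        if key in self._values:
            return self._values[key]
        # Outside S any answer is allowed; return an unrelated pseudo-random value.
        return fingerprint(key, self._r, salt="junk")


class ApproxMembership:
    """Approximate membership built on top of a retrieval structure."""

    def __init__(self, keys, r: int):
        self._r = r
        # Associate each key in S with its own r-bit fingerprint.
        self._store = ToyRetrieval({k: fingerprint(k, r) for k in keys}, r)

    def might_contain(self, key: str) -> bool:
        # Keys in S always match; other keys match with probability about 2^-r.
        return self._store.retrieve(key) == fingerprint(key, self._r)
```

With r fingerprint bits, members are never rejected, and a non-member is accepted only when an arbitrary r-bit value happens to equal its fingerprint, which gives a false positive rate of roughly 2^{−r}.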
Counting Inversions, Offline Orthogonal Range Counting, and Related Problems
Cited by 17 (4 self)
We give an O(n √(lg n))-time algorithm for counting the number of inversions in a permutation on n elements. This improves a longstanding previous bound of O(n lg n / lg lg n) that followed from Dietz’s data structure [WADS’89], and answers a question of Andersson and Petersson [SODA’95]. As Dietz’s result is known to be optimal for the related dynamic rank problem, our result demonstrates a significant improvement in the offline setting. Our new technique is quite simple: we perform a “vertical partitioning” of a trie (akin to van Emde Boas trees), and use ideas from external memory. However, the technique finds numerous applications: for example, we obtain
• in d dimensions, an algorithm to answer n offline orthogonal range counting queries in time O(n lg^{d−2+1/d} n);
• an improved construction time for online data structures for orthogonal range counting;
• an improved update time for the partial sums problem;
• faster Word RAM algorithms for finding the maximum depth in an arrangement of axis-aligned rectangles, and for the slope selection problem.
As a bonus, we also give a simple (1 + ε)-approximation algorithm for counting inversions that runs in linear time, improving the previous O(n lg lg n) bound by Andersson and Petersson.
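To make the problem statement concrete, here is the classical merge-sort baseline for counting inversions in O(n lg n) time. It is a textbook method shown only for reference; it is not the O(n √(lg n)) algorithm described in the abstract, and the function name `count_inversions` is an illustrative choice.

```python
def count_inversions(values):
    """Return the number of pairs i < j with values[i] > values[j]."""

    def sort_and_count(seq):
        if len(seq) <= 1:
            return list(seq), 0
        mid = len(seq) // 2
        left, inv_left = sort_and_count(seq[:mid])
        right, inv_right = sort_and_count(seq[mid:])

        merged, cross = [], 0
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] precedes every remaining element of left,
                # so it forms an inversion with each of them.
                merged.append(right[j])
                j += 1
                cross += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv_left + inv_right + cross

    return sort_and_count(list(values))[1]
```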
Stream-based randomised language models for SMT
In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009
Cited by 17 (1 self)
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.
Lower Bound Techniques for Data Structures
2008
Cited by 8 (0 self)
We describe new techniques for proving lower bounds on data-structure problems, with the following broad consequences:
• the first Ω(lg n) lower bound for any dynamic problem, improving on a bound that had been standing since 1989;
• for static data structures, the first separation between linear and polynomial space. Specifically, for some problems that have constant query time when polynomial space is allowed, we can show Ω(lg n / lg lg n) bounds when the space is O(n · polylog n).
Using these techniques, we analyze a variety of central data-structure problems, and obtain improved lower bounds for the following:
• the partial-sums problem (a fundamental application of augmented binary search trees);
• the predecessor problem (which is equivalent to IP lookup in Internet routers);
• dynamic trees and dynamic connectivity;
• orthogonal range stabbing;
• orthogonal range counting, and orthogonal range reporting;
• the partial match problem (searching with wildcards);
• (1 + ε)-approximate near neighbor on the hypercube;
• approximate nearest neighbor in the ℓ∞ metric.
Our new techniques lead to surprisingly non-technical proofs. For several problems, we obtain simpler proofs for bounds that were already known.
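For readers unfamiliar with the partial-sums problem listed above, the following sketch shows a classical upper bound: a Fenwick (binary indexed) tree that supports point updates and prefix sums in O(lg n) time each. It is included only to make the problem concrete and is not taken from the thesis; the class and method names are illustrative assumptions.

```python
class FenwickTree:
    """Maintain an array of n numbers under point updates and prefix-sum queries."""

    def __init__(self, n: int):
        self._n = n
        self._tree = [0] * (n + 1)  # 1-based internal layout

    def update(self, index: int, delta: int) -> None:
        """Add delta to the element at 0-based position index."""
        i = index + 1
        while i <= self._n:
            self._tree[i] += delta
            i += i & -i  # jump to the next node covering this position

    def prefix_sum(self, count: int) -> int:
        """Return the sum of the first count elements."""
        total = 0
        i = count
        while i > 0:
            total += self._tree[i]
            i -= i & -i  # strip the lowest set bit
        return total
```

Each operation touches at most one node per bit of the index, which is where the logarithmic cost comes from.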
Substring range reporting
2011
Cited by 7 (2 self)
We revisit various string indexing problems with range reporting features, namely, position-restricted substring searching, indexing substrings with gaps, and indexing substrings with intervals. We obtain the following main results.
• We give efficient reductions for each of the above problems to a new problem, which we call substring range reporting. Hence, we unify the previous work by showing that we may restrict our attention to a single problem rather than studying each of the above problems individually.
• We show how to solve substring range reporting with optimal query time and little space. Combined with our reductions this leads to significantly improved time-space tradeoffs for the above problems. In particular, for each problem we obtain the first solutions with optimal time query and O(n log^{O(1)} n) space, where n is the length of the indexed string.
• We show that our techniques for substring range reporting generalize to substring range counting and substring range emptiness variants. We also obtain non-trivial time-space tradeoffs for these problems.
Our bounds for substring range reporting are based on a novel combination of suffix trees and range reporting data structures. The reductions are simple and general and may apply to other combinations of string indexing with range reporting.
A dynamic data structure for flexible molecular maintenance and informatics
In SIAM/ACM GDSPM09, Accepted
Cited by 6 (4 self)
We present the “Dynamic Packing Grid” (DPG) data structure along with details of our implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies. DPG can efficiently maintain the molecular surface (e.g., van der Waals surface and the solvent contact surface) under insertion/deletion/movement (i.e., updates) of atoms or groups of atoms. DPG also permits the fast estimation of important molecular properties (e.g., surface area, volume, polarization energy, etc.) that are needed for computing binding affinities in drug design or in molecular dynamics calculations. DPG can additionally be utilized in efficiently maintaining multiple “rigid” domains of dynamic flexible molecules. In DPG, each update takes only O(log w) time w.h.p. on a RAM with w-bit words, i.e., O(1) time in practice, and hence is extremely fast. DPG’s queries include the reporting of all atoms within O(r_max) distance from any given atom center or point in 3-space in O(log log w) (= O(1)) time w.h.p., where r_max is the radius of the largest atom in the molecule. It can also answer whether a given atom is exposed or buried under the surface within the same time bound, and can return the entire molecular surface in O(m) worst-case time, where m is the number of atoms on the surface. The data structure uses space linear in the number of atoms in the molecule.
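The kind of query described above (report all atoms near a point, under insertions, deletions, and movements) can be illustrated with a plain hashed uniform grid. This is a simplified sketch under stated assumptions, not the DPG structure itself: the class `AtomGrid`, its methods, and the `cell_size` parameter are hypothetical, and a Python dict stands in for whatever hashing scheme the paper uses to obtain its w.h.p. bounds.

```python
import math
from collections import defaultdict


class AtomGrid:
    """Hashed uniform grid over 3-space holding atom centers (illustrative only)."""

    def __init__(self, cell_size: float):
        self._cell = cell_size
        self._cells = defaultdict(set)  # cell coordinates -> ids of atoms inside
        self._atoms = {}                # atom id -> center (x, y, z)

    def _key(self, point):
        return tuple(math.floor(c / self._cell) for c in point)

    def insert(self, atom_id, center):
        self._atoms[atom_id] = center
        self._cells[self._key(center)].add(atom_id)

    def delete(self, atom_id):
        key = self._key(self._atoms.pop(atom_id))
        self._cells[key].discard(atom_id)
        if not self._cells[key]:
            del self._cells[key]  # keep space proportional to the number of atoms

    def move(self, atom_id, new_center):
        self.delete(atom_id)
        self.insert(atom_id, new_center)

    def within(self, point, radius: float):
        """Return ids of atoms whose centers lie within radius of point."""
        reach = math.ceil(radius / self._cell)
        cx, cy, cz = self._key(point)
        found = set()
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for atom_id in self._cells.get((cx + dx, cy + dy, cz + dz), ()):
                        if math.dist(point, self._atoms[atom_id]) <= radius:
                            found.add(atom_id)
        return found
```

Choosing `cell_size` on the order of the largest atom radius keeps the number of inspected cells constant for queries of radius O(r_max), which mirrors the intuition behind the constant-time query claim.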
The limits of buffering: A tight lower bound for dynamic membership in the external memory model
In Proc. ACM Symposium on Theory of Computing, 2010
Cited by 6 (2 self)
We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant < 1), then the query cost must be Ω(log_{b log n}(n/m)), where n is the number of elements in the dictionary. In contrast, when the update time is allowed to be 1 + o(1), then a bit vector or hash table gives query time O(1). Thus, this is a threshold phenomenon for data structures. This lower bound answers a folklore conjecture of the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy between two alternatives: (i) make the amortized update time at least 1 (so the data structure does not buffer, and we lose one of the main potential advantages of the cache), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random and there are no deletions; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. All of the lower bounds we prove hold regardless of the space consumption of the data structure, while the upper bounds only need linear space. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems such as 1D-range counting, predecessor, rank-select, and many others, are all the same
On dynamic bit-probe complexity
2005
Cited by 3 (0 self)
This work presents several advances in the understanding of dynamic data structures in the bit-probe model:
• We improve the lower bound record for dynamic language membership problems to Ω((lg n / lg lg n)^2). Surpassing Ω(lg n) was listed as the first open problem in a survey by Miltersen.
• We prove a bound of Ω(lg n / lg lg lg n) for maintaining partial sums in Z/2Z. Previously, the known bounds were Ω(lg n / lg lg n) and O(lg n).
• We prove a surprising and tight upper bound of O(lg n / lg lg n) for the greater-than problem, and several predecessor-type problems. We use this to obtain the same upper bound for dynamic word and prefix problems in group-free monoids.
We also obtain new lower bounds for the partial-sums problem in the cell-probe and external-memory models. Our lower bounds are based on a surprising improvement of the classic chronogram technique of Fredman and Saks [1989], which makes it possible to prove logarithmic lower bounds by this approach. Before the work of M. Pǎtrascu and Demaine [2004], this was the only known technique for dynamic lower bounds, and surpassing Ω(lg n / lg lg n) was a central open problem in cell-probe complexity.
How to Approximate A Set Without Knowing Its Size In Advance
Cited by 2 (0 self)
The dynamic approximate membership problem asks to represent a set S of size n, whose elements are provided in an online fashion, supporting membership queries without false negatives and with a false positive rate at most ϵ. That is, the membership algorithm must be correct on each x ∈ S, and may err with probability at most ϵ on each x ∉ S. We study a well-motivated, yet insufficiently explored, variant of this problem where the size n of the set is not known in advance. Existing optimal approximate membership data structures require that the size is known in advance, but in many practical scenarios this is not a realistic assumption. Moreover, even if the eventual size n of the set is known in advance, it is desirable to have the smallest possible space usage also when the current number of inserted elements is smaller than n. Our contribution consists of the following results:
• We show a superlinear gap between the space complexity when the size is known in advance and the space complexity when the size is not known in advance. When the size is known in advance, it is well-known that Θ(n log(1/ϵ)) bits of space are necessary and sufficient (Bloom ’70, Carter et al. ’78). However, when the size is not known in advance, we prove
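The known-size baseline referred to above can be sketched with a standard Bloom filter, which must be told the expected number of elements up front in order to size its bit array. This is a minimal illustration, not the construction from the paper. The sizing formulas m = ⌈n ln(1/ϵ) / (ln 2)^2⌉ bits and k ≈ (m/n) ln 2 hash functions are the usual textbook choices; the class name, the SHA-256-based double hashing, and the parameter names are assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Textbook Bloom filter sized for a set whose size is known in advance."""

    def __init__(self, expected_n: int, epsilon: float):
        # Standard sizing: about 1.44 * n * log2(1/epsilon) bits in total.
        self.m = max(1, math.ceil(-expected_n * math.log(epsilon) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_n * math.log(2)))
        self._bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Assumption: derive k positions from two 64-bit hashes (double hashing).
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # No false negatives: every bit set by add() is still set.
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))
```

Because `m` is fixed at construction time, inserting many more than `expected_n` keys drives the false positive rate above ϵ, which is the practical difficulty the unknown-size variant addresses.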
Dynamic Data Structures for Document Collections and Graphs
Cited by 1 (0 self)
In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.