Results 1-10 of 38
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
 In Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing
, 2000
Abstract

Cited by 189 (17 self)
Abstract. The proliferation of online text, such as that found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in $n \lg |\Sigma|$ bits by encoding each symbol with $\lg |\Sigma|$ bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require $\Omega(n \lg n)$ additional bits of space in the worst case. For example, in the standard unit-cost RAM, suffix trees and suffix arrays need $\Omega(n)$ memory words, each of $\Omega(\lg n)$ bits. These indexes are larger than the text itself by a multiplicative factor of $\Omega(\lg_{|\Sigma|} n)$, which is significant when Σ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in $O(m \lg |\Sigma|)$ time or in $O(m + \lg n)$ time, plus an output-sensitive cost $O(occ)$ for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast $O(m / \lg_{|\Sigma|} n + \lg^{\epsilon}_{|\Sigma|} n)$ search time in the worst case, for any constant ...
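For orientation, the following is a minimal sketch of the kind of plain, uncompressed suffix array the abstract contrasts against: it stores one integer per text position (the $\Omega(n \lg n)$-bit cost) and answers a pattern query with two binary searches in $O(m \lg n)$ character comparisons. It is not the compressed index of the paper; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the quadratic construction is chosen only for brevity.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix they begin.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern in text."""
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them sorted,
    # so the matches form one contiguous block of the suffix array.
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, `find_occurrences("banana", build_suffix_array("banana"), "ana")` reports positions 1 and 3; the `occ` term of the bounds corresponds to copying out the block `sa[start:lo]`.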
New data structures for orthogonal range searching
 In Proc. 41st IEEE Symposium on Foundations of Computer Science
, 2000
Lower bounds for orthogonal range searching: I. The reporting case
 Journal of the ACM
, 1990
Abstract

Cited by 65 (4 self)
Abstract. We establish lower bounds on the complexity of orthogonal range reporting in the static case. Given a collection of n points in d-space and a box $[a_1, b_1] \times \cdots \times [a_d, b_d]$, report every point whose ith coordinate lies in $[a_i, b_i]$, for each $i = 1, \ldots, d$. The collection of points is fixed once and for all and can be preprocessed. The box, on the other hand, constitutes a query that must be answered online. It is shown that on a pointer machine a query time of $O(k + \mathrm{polylog}(n))$, where k is the number of points to be reported, can only be achieved at the expense of $\Omega(n (\log n / \log\log n)^{d-1})$ storage. Interestingly, these bounds are optimal in the pointer machine model, but they can be improved (ever so slightly) on a random access machine. In a companion paper, we address the related problem of adding up weights assigned to the points in the query box.
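To make the storage/query trade-off concrete for $d = 2$, here is a sketch of a textbook static 2-D range tree (a standard structure, not the construction from this paper): each point is copied into one node per tree level, giving $O(n \log n)$ storage, and a query touches $O(\log n)$ canonical nodes with one binary search each, giving $O(\log^2 n + k)$ time. The class name `RangeTree2D` and the dictionary-keyed node layout are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right


class RangeTree2D:
    """Static 2-D range tree: O(n log n) storage, O(log^2 n + k) reporting."""

    def __init__(self, points):
        self.pts = sorted(points)               # x-order (ties broken by y)
        self.xs = [p[0] for p in self.pts]
        self.by_y = {}   # (lo, hi) -> points of slice pts[lo:hi], sorted by y
        self.ykeys = {}  # (lo, hi) -> y-coordinates of by_y[(lo, hi)]
        if self.pts:
            self._build(0, len(self.pts))

    def _build(self, lo, hi):
        # Each point appears once per tree level: O(n log n) copies in total.
        node = sorted(self.pts[lo:hi], key=lambda p: (p[1], p[0]))
        self.by_y[(lo, hi)] = node
        self.ykeys[(lo, hi)] = [p[1] for p in node]
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self._build(lo, mid)
            self._build(mid, hi)

    def query(self, x1, x2, y1, y2):
        """Report all points with x1 <= x <= x2 and y1 <= y <= y2."""
        i = bisect_left(self.xs, x1)
        j = bisect_right(self.xs, x2)
        out = []
        if self.pts:
            self._collect(0, len(self.pts), i, j, y1, y2, out)
        return sorted(out)  # sorting only for a deterministic result (adds O(k log k))

    def _collect(self, lo, hi, i, j, y1, y2, out):
        if j <= lo or hi <= i:
            return  # node's x-slice is disjoint from the query's x-slice
        if i <= lo and hi <= j:
            # Canonical node: one binary search on its y-sorted list.
            ys = self.ykeys[(lo, hi)]
            a, b = bisect_left(ys, y1), bisect_right(ys, y2)
            out.extend(self.by_y[(lo, hi)][a:b])
            return
        mid = (lo + hi) // 2
        self._collect(lo, mid, i, j, y1, y2, out)
        self._collect(mid, hi, i, j, y1, y2, out)
```

Its $O(n \log n)$ storage sits just above the $\Omega(n \log n / \log\log n)$ bound the abstract gives for $d = 2$ with polylogarithmic query time.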
Excluded Middle Vantage Point Forests for Nearest Neighbor Search
 In DIMACS Implementation Challenge, ALENEX'99
, 1999
Abstract

Cited by 40 (1 self)
The excluded middle vantage point forest is a new data structure that supports worst-case sublinear time searches in a metric space for nearest neighbors within a fixed radius of arbitrary queries. Worst-case performance depends on the dataset but is not affected by the distribution of queries. Our analysis predicts vp-forest performance in simple settings such as $L_p$ spaces with uniform random datasets, and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality in the context of our methods and k-d trees as well. In our idealized setting the dataset is organized into a forest of $O(N^{1-\rho})$ trees, each of depth $O(\log N)$. Here $\rho$ may be viewed as depending on $\delta$, the distance function, and on the dataset. The radius of interest $\delta$ is an input to the organization process and the result is a linear space data structure specialized to answer queries within this distance. Searches then require $O(N^{1-\rho} \log N)$ time, or...
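The "excluded middle" idea can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: the vantage point is simply the first point of each subset, splits use the median distance $\mu$, and leaves hold at most four points. Points whose distance to the vantage point lies within $\delta$ of $\mu$ are removed from the current tree and used to build the next tree of the forest. By the triangle inequality, a query of radius $\delta$ then needs to follow only one child per node. All function names are hypothetical.

```python
import math
from statistics import median


def build_tree(points, delta, dist, leaf_size=4):
    """Build one excluded-middle vp-tree; return (tree, excluded_points)."""
    excluded = []

    def build(pts):
        if len(pts) <= leaf_size:
            return ("leaf", pts)
        v, rest = pts[0], pts[1:]  # assumption: first point as vantage point
        ds = [dist(v, p) for p in rest]
        mu = median(ds)
        left = [p for p, d in zip(rest, ds) if d < mu - delta]
        right = [p for p, d in zip(rest, ds) if d > mu + delta]
        # The "middle" shell of width 2*delta is excluded from this tree.
        excluded.extend(p for p, d in zip(rest, ds) if mu - delta <= d <= mu + delta)
        return ("node", v, mu, build(left), build(right))

    return build(list(points)), excluded


def build_forest(points, delta, dist):
    forest, pts = [], list(points)
    while pts:  # each round stores at least one point, so this terminates
        tree, pts = build_tree(pts, delta, dist)
        forest.append(tree)
    return forest


def search(forest, q, delta, dist):
    """Return all points within distance delta of q: one root-to-leaf path per tree."""
    hits = []
    for tree in forest:
        node = tree
        while node[0] == "node":
            _, v, mu, left, right = node
            d = dist(q, v)
            if d <= delta:
                hits.append(v)
            # If d <= mu, every right point x has dist(x, q) > mu + delta - d >= delta;
            # if d > mu, every left point x has dist(x, q) > d - (mu - delta) > delta.
            node = left if d <= mu else right
        hits.extend(p for p in node[1] if dist(q, p) <= delta)
    return hits
```

Usage: `forest = build_forest(pts, 1.5, math.dist)` followed by `search(forest, (4.2, 5.1), 1.5, math.dist)`. The sketch shows only why a single path per tree suffices; it does not reproduce the paper's $O(N^{1-\rho})$ forest-size analysis.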
Efficient Data Structures for Range Searching on a Grid
, 1987
Abstract

Cited by 36 (0 self)
We consider the 2-dimensional range searching problem in the case where all points lie on an integer grid. A new data structure is presented that solves range queries on a $U \times U$ grid in $O(k + \log\log U)$ time using $O(n \log n)$ storage, where n is the number of points and k the number of reported answers. Although the query ...
Optimal Dynamic Range Searching in Non-Replicating Index Structures
 In Proc. International Conference on Database Theory, LNCS 1540
, 1997
Abstract

Cited by 26 (2 self)
We consider the problem of dynamic range searching in tree structures that do not replicate data. We propose a new dynamic structure, called the O-tree, that achieves a query time complexity of $O(n^{(d-1)/d})$ on n d-dimensional points and an amortized insertion/deletion time complexity of $O(\log n)$. We show that this structure is optimal when data is not replicated. In addition to optimal query and insertion/deletion times, the O-tree also supports exact match queries in worst-case logarithmic time. 1 Introduction Given a set S of d-dimensional points, a range query q is specified by d one-dimensional intervals $[q^s_i, q^e_i]$, one for each dimension i, and retrieves all points $p = (p_1, p_2, \ldots, p_d)$ in S such that $\forall i \in \{1, \ldots, d\} : q^s_i \le p_i \le q^e_i$. This type of searching in multidimensional space has important applications in geographic information systems, image databases, and computer graphics. Several structures such as the range trees [3], P-range trees [29...
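As a point of comparison, a balanced k-d tree is a classic non-replicating structure (each point is stored in exactly one node) whose worst-case range query cost is known to be $O(n^{1-1/d} + k)$, the same $n^{(d-1)/d}$ term as above. The sketch below is that static k-d tree, not the O-tree, and it does not attempt the $O(\log n)$ amortized updates the abstract claims; the query test is exactly the condition $q^s_i \le p_i \le q^e_i$ for all $i$. Function names are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Balanced k-d tree; every point is stored in exactly one node (no replication)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the d coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def range_query(node, lo, hi, out=None):
    """Report every p with lo[i] <= p[i] <= hi[i] for all dimensions i."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis, left, right = node
    if all(a <= x <= b for a, x, b in zip(lo, p, hi)):
        out.append(p)
    # Left subtree holds p[axis]-coordinates <= the split value, right holds >=,
    # so a side is visited only if the query interval can reach it.
    if lo[axis] <= p[axis]:
        range_query(left, lo, hi, out)
    if hi[axis] >= p[axis]:
        range_query(right, lo, hi, out)
    return out
```

For instance, on the points `(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)`, the box `lo=(3, 1), hi=(8, 5)` reports `(5, 4)`, `(7, 2)`, and `(8, 1)`.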
Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search (Extended Abstract)
 In Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00)
, 1999
Abstract

Cited by 25 (1 self)
We consider the problem of nearest neighbor search in the Euclidean hypercube $[-1, +1]^d$ with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction R of the maximum interpoint distance in this space, i.e., within distance $2R\sqrt{d}$ of the query. We introduce the idea of aggressive pruning and give a family of practical algorithms, an idealized analysis, and a description of experiments. Our main result is that search complexity, measured in terms of d-dimensional inner product operations, is (i) strongly sublinear with respect to the data set size n for moderate R, and (ii) asymptotically, and as a practical matter, independent of dimension. Given a random data set, a random query within distance $2R\sqrt{d}$ of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs...
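The radius $2R\sqrt{d}$ follows from the maximum interpoint distance of the cube, which is attained between opposite corners:

```latex
\max_{x, y \in [-1,+1]^d} \lVert x - y \rVert_2
  = \lVert (1, \ldots, 1) - (-1, \ldots, -1) \rVert_2
  = \sqrt{\textstyle\sum_{i=1}^{d} 2^2}
  = 2\sqrt{d},
\qquad \text{so a fraction } R \text{ of it is } 2R\sqrt{d}.
```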
Hierarchical representations of collections of small rectangles
 ACM Computing Surveys
, 1988
Abstract

Cited by 24 (1 self)
A tutorial survey is presented of hierarchical data structures for representing collections of small rectangles. Rectangles are often used as an approximation of shapes for which they serve as the minimum rectilinear enclosing object. They arise in applications in cartography as well as in very-large-scale integration (VLSI) design rule checking. The different data structures are discussed in terms of how they support the execution of queries involving proximity relations. The focus is on intersection and subset queries. Several types of representations are described. Some are designed for use with the plane-sweep paradigm, which works well for static collections of rectangles. Others are oriented toward dynamic collections. In this case, one representation reduces each rectangle to a point in a higher-dimensional space and treats the problem as one involving point data. The other representation is area based; that is, it depends on the physical extent of each rectangle.
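The point-based representation can be illustrated with the common "corner" transformation, which is one possible choice among those such surveys discuss: the rectangle $[x_1, x_2] \times [y_1, y_2]$ becomes the 4-D point $(x_1, x_2, y_1, y_2)$, and both intersection and subset queries turn into 4-D range queries on those points. In the sketch below a linear scan stands in for whatever 4-D point structure would be used in practice; `Rect`, `intersecting`, and `contained_in` are hypothetical names.

```python
from typing import NamedTuple

INF = float("inf")


class Rect(NamedTuple):
    """Axis-aligned rectangle [x1, x2] x [y1, y2] with closed sides."""
    x1: float
    x2: float
    y1: float
    y2: float


def to_point(r):
    # Corner transformation: the rectangle becomes a single point in 4-D space.
    return (r.x1, r.x2, r.y1, r.y2)


def in_box(p, lo, hi):
    """True if lo[i] <= p[i] <= hi[i] in every coordinate (a 4-D range test)."""
    return all(a <= c <= b for a, c, b in zip(lo, p, hi))


def intersecting(rects, q):
    # r meets q  <=>  r.x1 <= q.x2, r.x2 >= q.x1, r.y1 <= q.y2, r.y2 >= q.y1
    lo, hi = (-INF, q.x1, -INF, q.y1), (q.x2, INF, q.y2, INF)
    return [r for r in rects if in_box(to_point(r), lo, hi)]


def contained_in(rects, q):
    # r is a subset of q  <=>  q.x1 <= r.x1, r.x2 <= q.x2, q.y1 <= r.y1, r.y2 <= q.y2
    lo, hi = (q.x1, -INF, q.y1, -INF), (INF, q.x2, INF, q.y2)
    return [r for r in rects if in_box(to_point(r), lo, hi)]
```

Note that the intersection query box is unbounded on half of its sides, which is one reason such transformed point sets can be awkward for some point structures.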
Computing partial sums in multidimensional arrays
 In Proc. of the ACM Symp. on Computational Geometry
, 1989
Abstract

Cited by 20 (0 self)
1 Introduction The central theme of this paper is the complexity of the partial-sum problem: Given a d-dimensional array A with n entries in a semigroup and a d-rectangle $q = [a_1, b_1] \times \cdots \times [a_d, b_d]$, compute the sum $\sigma(A, q) = \sum_{(k_1, \ldots, k_d) \in q} A[k_1, \ldots, k_d]$ ...
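When the entries form a group (for example, integers under addition) rather than just a semigroup, a well-known baseline answers $\sigma(A, q)$ in constant time from a table of prefix sums by inclusion-exclusion over $2^d$ corners. The 2-D sketch below shows that baseline; it relies on subtraction, which a general semigroup (such as numbers under `max`) does not provide, so it does not apply directly to the paper's setting. Function names and the 0-based, inclusive index convention are assumptions.

```python
def prefix_table(A):
    """P[i][j] = sum of A[r][c] over r < i, c < j (extra zero row and column)."""
    rows = len(A)
    cols = len(A[0]) if A else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion: add the two overlapping prefixes, remove their overlap.
            P[i + 1][j + 1] = A[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def box_sum(P, a1, b1, a2, b2):
    """sigma(A, q) for q = [a1, b1] x [a2, b2], inclusive 0-based indices, in O(1)."""
    return P[b1 + 1][b2 + 1] - P[a1][b2 + 1] - P[b1 + 1][a2] + P[a1][a2]
```

With `A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `box_sum(prefix_table(A), 0, 1, 1, 2)` sums rows 0 to 1 and columns 1 to 2, i.e. 2 + 3 + 5 + 6 = 16. In d dimensions the same idea needs $2^d$ table lookups per query.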
Data Structures for Dynamic Queries: An Analytical and Experimental Evaluation
 Proc. of the Workshop on Advanced Visual Interfaces. NY: ACM
, 1994
Abstract

Cited by 18 (5 self)
Dynamic Queries is a querying technique for doing range search on multi-key data sets. It is a direct manipulation mechanism where the query is formulated using graphical widgets and the results are displayed graphically in real time. This paper evaluates four data structures used to organize data in high-speed storage for dynamic queries: the multilist, the grid file, the k-d tree, and the quadtree. The effect of factors such as the size, distribution, and dimensionality of the data on the storage overhead and the speed of search is explored. A way of estimating the storage and search overheads using analytical models is presented, and these models are verified against empirical data. Results indicate that multilists are suitable for small (few thousand points) data sets irrespective of the data distribution. For large data sets, grid files are excellent for uniformly distributed data, and trees are good for skewed data distributions. There was no significant difference in performance bet...
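The intuition behind the uniform-versus-skewed finding can be seen in a deliberately simplified fixed-grid bucketing scheme. This is not a full grid file, which adapts its cell boundaries and shares buckets through a directory; here the cell size is a fixed, hypothetical parameter. A range query visits only cells overlapping the query rectangle and filters their points exactly. With uniform data each cell holds roughly the same number of points, while skewed data overloads a few cells, which is where tree structures tend to do better.

```python
import math
from collections import defaultdict


class FixedGrid:
    """2-D points bucketed into square cells of side `cell` (simplified grid file)."""

    def __init__(self, points, cell):
        self.cell = cell
        self.buckets = defaultdict(list)
        for p in points:
            self.buckets[self._key(p)].append(p)

    def _key(self, p):
        # Cell index of a point; floor() keeps negative coordinates correct.
        return (math.floor(p[0] / self.cell), math.floor(p[1] / self.cell))

    def query(self, x1, x2, y1, y2):
        """Report points with x1 <= x <= x2 and y1 <= y <= y2."""
        kx1, ky1 = self._key((x1, y1))
        kx2, ky2 = self._key((x2, y2))
        out = []
        for i in range(kx1, kx2 + 1):
            for j in range(ky1, ky2 + 1):
                # dict.get does not create empty buckets for unvisited cells.
                for p in self.buckets.get((i, j), ()):
                    if x1 <= p[0] <= x2 and y1 <= p[1] <= y2:
                        out.append(p)
        return out
```

The per-query cost is the number of overlapping cells plus the points inside them, so the choice of `cell` trades directory size against bucket occupancy.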