Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
- in Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing
, 2000
Cited by 172 (15 self)
Abstract. The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg |Σ| bits by encoding each symbol with lg |Σ| bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require Ω(n lg n) additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need Ω(n) memory words, each of Ω(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Ω(lg_{|Σ|} n), which is significant when Σ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in O(m lg |Σ|) time or in O(m + lg n) time, plus an output-sensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m / lg_{|Σ|} n + lg^ɛ_{|Σ|} n) search time in the worst case, for any constant
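To make the baseline in this abstract concrete, here is a minimal sketch of searching with a plain (uncompressed) suffix array. It is not the compressed index the paper introduces. The function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is deliberately naive, and the search is the simple binary search that costs O(m lg n) comparisons plus the occurrences reported, rather than the O(m + lg n) variant mentioned above.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order.

    Naive construction by direct sorting; fine for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text, using suffix array sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

The array stores one position per text symbol, which is the Ω(n) words of Ω(lg n) bits each that the abstract identifies as the space overhead.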
Lower bounds for orthogonal range searching: I. the reporting case
- Journal of the ACM
, 1990
Cited by 57 (4 self)
Abstract. We establish lower bounds on the complexity of orthogonal range reporting in the static case. Given a collection of n points in d-space and a box [a_1, b_1] × ... × [a_d, b_d], report every point whose ith coordinate lies in [a_i, b_i], for each i = 1, ..., d. The collection of points is fixed once and for all and can be preprocessed. The box, on the other hand, constitutes a query that must be answered on-line. It is shown that on a pointer machine a query time of O(k + polylog(n)), where k is the number of points to be reported, can only be achieved at the expense of Ω(n (log n / log log n)^{d-1}) storage. Interestingly, these bounds are optimal in the pointer machine model, but they can be improved (ever so slightly) on a random access machine. In a companion paper, we address the related problem of adding up weights assigned to the points in the query box.
New data structures for orthogonal range searching
- In Proc. 41st IEEE Symposium on Foundations of Computer Science
, 2000
Excluded Middle Vantage Point Forests for Nearest Neighbor Search
- In DIMACS Implementation Challenge, ALENEX'99
, 1999
Cited by 37 (1 self)
The excluded middle vantage point forest is a new data structure that supports worst case sublinear time searches in a metric space for nearest neighbors within a fixed radius of arbitrary queries. Worst case performance depends on the dataset but is not affected by the distribution of queries. Our analysis predicts vp-forest performance in simple settings such as L_p spaces with uniform random datasets, and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality in the context of our methods and kd-trees as well. In our idealized setting the dataset is organized into a forest of O(N^{1-ρ}) trees, each of depth O(log N). Here ρ may be viewed as depending on τ, the distance function, and on the dataset. The radius of interest is an input to the organization process and the result is a linear space data structure specialized to answer queries within this distance. Searches then require O(N^{1-ρ} log N) time, or...
Efficient Data Structures for Range Searching on a Grid
, 1987
Cited by 30 (0 self)
We consider the 2-dimensional range searching problem in the case where all points lie on an integer grid. A new data structure is presented that solves range queries on a U × U grid in O(k + log log U) time using O(n log n) storage, where n is the number of points and k the number of reported answers. Although the query
Optimal Dynamic Range Searching in Non-replicating Index Structures
- In Proc. International Conference on Database Theory, LNCS 1540
, 1997
Cited by 25 (2 self)
We consider the problem of dynamic range searching in tree structures that do not replicate data. We propose a new dynamic structure, called the O-tree, that achieves a query time complexity of O(n^{(d-1)/d}) on n d-dimensional points and an amortized insertion/deletion time complexity of O(log n). We show that this structure is optimal when data is not replicated. In addition to optimal query and insertion/deletion times, the O-tree also supports exact match queries in worst-case logarithmic time.
1 Introduction
Given a set S of d-dimensional points, a range query q is specified by d 1-dimensional intervals [q_i^s, q_i^e], one for each dimension i, and retrieves all points p = (p_1, p_2, ..., p_d) in S such that ⟨∀i ∈ {1, ..., d} : q_i^s ≤ p_i ≤ q_i^e⟩. This type of searching in multidimensional space has important applications in geographic information systems, image databases, and computer graphics. Several structures such as the range trees [3], P-range trees [29...
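The range query defined in this introduction can be stated directly as code. The sketch below is a linear scan that only illustrates the query semantics (every coordinate must fall inside its interval); it is not the O-tree, and the names `in_range` and `range_query` are hypothetical.

```python
def in_range(point, query):
    """True if every coordinate p_i satisfies q_i^s <= p_i <= q_i^e.

    point is a tuple of d coordinates; query is a list of d (start, end) pairs.
    """
    return all(start <= x <= end for x, (start, end) in zip(point, query))


def range_query(points, query):
    """Report all points inside the d-dimensional query box by scanning the set."""
    return [p for p in points if in_range(p, query)]


if __name__ == "__main__":
    points = [(1, 5), (3, 3), (4, 8), (7, 2)]
    box = [(2, 6), (1, 4)]  # 2 <= x <= 6 and 1 <= y <= 4
    print(range_query(points, box))  # [(3, 3)]
```

A scan inspects all n points on every query; an index structure aims to answer the same question while touching far fewer of them.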
Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search (Extended Abstract)
- In Proc. 11th ACM-SIAM Symposium on Discrete Algorithms (SODA'00)
, 1999
Cited by 24 (1 self)
We consider the problem of nearest neighbor search in the Euclidean hypercube [-1, +1]^d with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction R of the maximum interpoint distance in this space, i.e. within distance 2R√d of the query. We introduce the idea of aggressive pruning and give a family of practical algorithms, an idealized analysis, and describe experiments. Our main result is that search complexity, measured in terms of d-dimensional inner product operations, is i) strongly sublinear with respect to the data set size n for moderate R, ii) asymptotically, and as a practical matter, independent of dimension. Given a random data set, a random query within distance 2R√d of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs...
Hierarchical representations of collections of small rectangles
- ACM Computing Surveys
, 1988
Cited by 23 (1 self)
A tutorial survey is presented of hierarchical data structures for representing collections of small rectangles. Rectangles are often used as an approximation of shapes for which they serve as the minimum rectilinear enclosing object. They arise in applications in cartography as well as very large-scale integration (VLSI) design rule checking. The different data structures are discussed in terms of how they support the execution of queries involving proximity relations. The focus is on intersection and subset queries. Several types of representations are described. Some are designed for use with the plane-sweep paradigm, which works well for static collections of rectangles. Others are oriented toward dynamic collections. In this case, one representation reduces each rectangle to a point in a higher multidimensional space and treats the problem as one involving point data. The other representation is area based; that is, it depends on the physical extent of each rectangle.
Computing partial sums in multidimensional arrays
- In Proc. of the ACM Symp. on Computational Geometry
, 1989
Cited by 18 (0 self)
1 Introduction
The central theme of this paper is the complexity of the partial-sum problem: Given a d-dimensional array A with n entries in a semigroup and a d-rectangle q = [a_1, b_1] × ··· × [a_d, b_d], compute the sum σ(A, q) = Σ ...
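For intuition about the partial-sum problem, the sketch below handles the special case d = 2 with integer entries, using a precomputed table of prefix sums. This is an assumption-laden simplification: it relies on subtraction, which a general semigroup does not provide, so it shows what is being computed rather than the setting the paper analyzes. The names `build_prefix_sums` and `rectangle_sum` are hypothetical.

```python
def build_prefix_sums(A):
    """Return table P where P[i][j] is the sum of A[0..i-1][0..j-1].

    Assumes A is a rectangular list of lists of numbers (an additive group).
    """
    rows = len(A)
    cols = len(A[0]) if A else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion over the three neighboring prefix sums.
            P[i + 1][j + 1] = A[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def rectangle_sum(P, a1, b1, a2, b2):
    """Sum of A over rows a1..b1 and columns a2..b2 (0-indexed, inclusive)."""
    return P[b1 + 1][b2 + 1] - P[a1][b2 + 1] - P[b1 + 1][a2] + P[a1][a2]


if __name__ == "__main__":
    A = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
    P = build_prefix_sums(A)
    print(rectangle_sum(P, 1, 2, 1, 2))  # 5 + 6 + 8 + 9 = 28
```

With subtraction available, each query costs a constant number of table lookups after linear preprocessing. Without it, that shortcut is not available, which is what makes the semigroup version a question of complexity.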
Data Structures for Dynamic Queries: An Analytical and Experimental Evaluation
- Proc. of the Workshop on Advanced Visual Interfaces. NY: ACM
, 1994
Cited by 17 (4 self)
Dynamic Queries is a querying technique for doing range search on multi-key data sets. It is a direct manipulation mechanism where the query is formulated using graphical widgets and the results are displayed graphically in real time. This paper evaluates four data structures, the multilist, the grid file, the k-d tree, and the quad tree, used to organize data in high speed storage for dynamic queries. The effect of factors like size, distribution and dimensionality of data on the storage overhead and the speed of search is explored. A way of estimating the storage and the search overheads using analytical models is presented. These models are verified against empirical data. Results indicate that multilists are suitable for small (few thousand points) data sets irrespective of the data distribution. For large data sets the grid files are excellent for uniformly distributed data, and trees are good for skewed data distributions. There was no significant difference in performance bet...

