Results 1 - 10 of 20
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Cited by 1019 (40 self)
The nearest neighbor problem is the following: Given a set of n points $P = \{p_1, \ldots, p_n\}$ in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point $q \in X$. We focus on the particularly interesting case of the d-dimensional Euclidean space where $X = \mathbb{R}^d$ under some $l_p$ norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point $p \in P$ that is an $\epsilon$-approximate nearest neighbor of the query q in that for all $p' \in P$, $d(p, q) \le (1 + \epsilon)\, d(p', q)$. We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
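The approximation condition stated in this abstract (a returned point may be at most a factor of 1 + eps farther from the query than the true nearest neighbor) can be checked against the brute-force scan the abstract mentions. The following is a minimal sketch, assuming Euclidean distance; the helper names `dist`, `nearest`, and `is_eps_approximate` are hypothetical and do not come from the paper.

```python
import math

def dist(p, q):
    """Euclidean (l_2) distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(points, q):
    """Brute-force exact nearest neighbor: compare q with every data point."""
    return min(points, key=lambda p: dist(p, q))

def is_eps_approximate(p, points, q, eps):
    """True if d(p, q) <= (1 + eps) * d(p', q) holds for every p' in points.

    It is enough to compare against the exact nearest neighbor, because
    that point minimizes d(p', q).
    """
    best = dist(nearest(points, q), q)
    return dist(p, q) <= (1 + eps) * best

# Hypothetical sample data, for illustration only.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 0.9)
```

With this data the exact nearest neighbor of `q` is `(1.0, 1.0)`; the point `(0.0, 0.0)` is roughly nine times farther away, so it qualifies as an eps-approximate answer only for a large eps.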
Similarity search in high dimensions via hashing
, 1999
Cited by 622 (13 self)
The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the "curse of dimensionality." That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should suffice for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points
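The abstract is cut off before it describes the hash functions, so the following is only a generic illustration of hashing-based similarity search, not the paper's scheme. It assumes binary vectors compared by Hamming distance and hash functions that sample a few random coordinates; the class `HammingLSH` and every parameter name are hypothetical.

```python
import random
from collections import defaultdict

def hamming(u, v):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(a != b for a, b in zip(u, v))

def make_hash(dim, k, rng):
    """Pick k random coordinates; a vector hashes to its bits at those coordinates."""
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda v: tuple(v[i] for i in coords)

class HammingLSH:
    """Several hash tables; similar vectors are likely to share a bucket in at least one."""

    def __init__(self, dim, k=4, tables=8, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        self.hashes = [make_hash(dim, k, rng) for _ in range(tables)]
        self.tables = [defaultdict(list) for _ in range(tables)]

    def add(self, v):
        """Store the bit tuple v in its bucket in every table."""
        for h, table in zip(self.hashes, self.tables):
            table[h(v)].append(v)

    def query(self, q):
        """Return the closest vector that collides with q in any table, or None."""
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(q), []))
        if not candidates:
            return None
        # Only the colliding candidates are compared, not the whole data set.
        return min(candidates, key=lambda v: hamming(v, q))

# Hypothetical sample data, for illustration only.
data = [(0, 0, 0, 0, 1, 1, 1, 1), (1, 1, 1, 1, 0, 0, 0, 0), (0, 1, 0, 1, 0, 1, 0, 1)]
index = HammingLSH(dim=8)
for v in data:
    index.add(v)
```

A query for a stored vector always finds it, because it lands in the same bucket in every table. For other queries the result is approximate and may be `None` when nothing collides; more tables raise the chance of a hit at the cost of space.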
The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation
, 2000
Cited by 107 (0 self)
We propose a novel index structure, the A-tree (Approximation tree), for similarity search of high-dimensional data. The basic idea of the A-tree is the introduction of Virtual Bounding Rectangles (VBRs), which contain and approximate MBRs and data objects. VBRs can be represented rather compactly, and thus affect the tree configuration both quantitatively and qualitatively. Firstly, since tree nodes can hold a large number of VBR entries, the fanout of nodes becomes large, which leads to fast search. More importantly, we have a free hand in arranging MBRs and VBRs in tree nodes. In the A-tree, nodes contain entries of an MBR and its children VBRs. Therefore, by fetching a node of an A-tree, we can obtain the exact position of a parent MBR and the approximate positions of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the A-tree outperforms the SR-tree and the VA-File over the whole range of dimensionality up to 64 dimensions, which is the highest dimensionality in our experiments. The A-tree achieves 77.3% (77.7%, resp.) savings in page accesses compared to the SR-tree (the VA-File, resp.) for 64-dimensional real data.
SP-GiST: An extensible database index for supporting space partitioning trees
 J. Intell. Inf. Syst
Cited by 28 (9 self)
Abstract. Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the kD-tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a kD-tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST tuning parameters. Keywords: space-partitioning trees, spatial databases, extensible index, generalized search trees, clustering
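To make "recursively divide the space into partitions" concrete, here is a minimal sketch of one member of that class, a point-region quadtree over a square. It is not SP-GiST's interface, which the abstract does not spell out; the class `QuadNode`, its `capacity` parameter, and the rule of stopping at cells of size 1 are assumptions made for illustration.

```python
class QuadNode:
    """Quadtree node covering the square [x, x + size) x [y, y + size)."""

    def __init__(self, x, y, size, capacity=1):
        self.x, self.y, self.size, self.capacity = x, y, size, capacity
        self.points = []      # points stored here while the node is a leaf
        self.children = None  # four sub-squares once the node has split

    def _child_for(self, p):
        """Select the quadrant of this node that contains point p."""
        half = self.size / 2
        col = int(p[0] >= self.x + half)
        row = int(p[1] >= self.y + half)
        return self.children[2 * row + col]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Split an overfull leaf into four equal quadrants; cells of size 1
        # are never split, which bounds the recursion (an assumption).
        if len(self.points) > self.capacity and self.size > 1:
            half = self.size / 2
            self.children = [
                QuadNode(self.x + dx * half, self.y + dy * half, half, self.capacity)
                for dy in (0, 1) for dx in (0, 1)
            ]
            pending, self.points = self.points, []
            for pt in pending:
                self._child_for(pt).insert(pt)

    def contains(self, p):
        if self.children is not None:
            return self._child_for(p).contains(p)
        return p in self.points

    def height(self):
        """Length of the longest root-to-leaf path below this node."""
        if self.children is None:
            return 0
        return 1 + max(child.height() for child in self.children)

# Hypothetical sample data, for illustration only.
root = QuadNode(0, 0, 8)
for pt in [(1, 1), (6, 6), (1, 6)]:
    root.insert(pt)
```

Points that fall close together force repeated splits along a single path while other quadrants stay shallow. That is the unbalanced shape the abstract refers to, and it is why grouping tree nodes into disk pages matters for such trees.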
High Dimensional Similarity Search With Space Filling Curves
 In Proceedings of the 17th International Conference on Data Engineering
, 2000
Cited by 25 (1 self)
We present a new approach for approximate nearest neighbor queries for sets of high dimensional points under any $L_t$ metric, t = 1, 2, 3, ... The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to (d + 1) B-trees, where d is the dimensionality of the data, sorted according to their position along a space filling curve. This is done in a way that allows us to guarantee that a neighbor within an $O(d^{1+1/t})$ factor of the exact nearest can be returned with at most $(d + 1) \log_p n$ page accesses, where p is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in $O(d \log_p n)$ page accesses and generalizes easily to find approximate knea...
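The idea of sorting shifted copies of the data along a space filling curve and inspecting the query's neighbors in that order can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the abstract: a Z-order (Morton) curve, non-negative integer coordinates, sorted Python lists standing in for B-trees, and arbitrarily chosen shift amounts. The function names are hypothetical.

```python
import bisect

def morton_key(point, bits=8):
    """Interleave the bits of non-negative integer coordinates (Z-order key)."""
    key = 0
    d = len(point)
    for b in range(bits):
        for i, c in enumerate(point):
            key |= ((c >> b) & 1) << (b * d + i)
    return key

def build(points, shifts, bits=8):
    """One list of (key, point) per shift, sorted by curve position.

    Each shift s is added to every coordinate before the key is computed;
    coordinate + shift must stay below 2**bits.
    """
    return [
        sorted((morton_key(tuple(c + s for c in p), bits), p) for p in points)
        for s in shifts
    ]

def query(index, shifts, q, bits=8, window=1):
    """Inspect the curve neighbors of q in every shifted copy; return the closest found."""
    best = None
    for s, entries in zip(shifts, index):
        k = morton_key(tuple(c + s for c in q), bits)
        pos = bisect.bisect_left(entries, (k,))
        # Look at `window` entries on either side of q's position on the curve.
        for _, p in entries[max(0, pos - window):pos + window]:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))  # squared L_2 distance
            if best is None or d2 < best[0]:
                best = (d2, p)
    return best[1] if best else None

# Hypothetical sample data, for illustration only.
points = [(5, 5), (7, 8), (20, 3), (64, 64)]
shifts = [0, 21, 42]
index = build(points, shifts)
```

Each lookup is a binary search plus a constant number of comparisons per shifted copy, mirroring the logarithmic page-access bound quoted in the abstract. The several shifts are there because points that are close in space can be far apart along any single curve ordering.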
Developing a DataBlade for a New Index
, 1999
Cited by 22 (2 self)
In order to better support current and new applications, the major DBMS vendors are stepping beyond uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike. This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree-based index. This effort represents a stress test of the perhaps currently most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.
PK-TREE: A SPATIAL INDEX STRUCTURE FOR HIGH DIMENSIONAL POINT DATA
Cited by 17 (1 self)
In this chapter we present the PK-tree, which is an index structure for high dimensional point data. The proposed indexing structure can be viewed as combining aspects of the PR-quad or k-d tree but where unnecessary nodes are eliminated. The unnecessary nodes are typically the result of skew in the point distribution, and we show that by eliminating these nodes the performance of the resulting index is robust to skewed data distributions. The index structure is formally defined and efficiently updatable, and bounds on the number of nodes and the mean height of the tree can be proved. Bounds on the expected height of the tree can be given under certain mild constraints on the spatial distribution of points. Empirical evidence both on real data sets and generated data sets shows that the PK-tree outperforms the recently proposed spatial indexes based on the R-tree, such as the SR-tree and X-tree, by a wide margin. It is also significant that the relative performance advantage of the PK-tree grows with the dimensionality of the data set.
Space-Partitioning Trees in PostgreSQL: Realization and Performance
 In Proc. of the 22nd International Conference on Data Engineering (ICDE’06)
, 2006
Cited by 5 (1 self)
Many evolving database applications warrant the use of non-traditional indexing mechanisms beyond B+-trees and hash tables. SP-GiST is an extensible indexing framework that broadens the class of supported indexes to include disk-based versions of a wide variety of space-partitioning trees, e.g., disk-based trie variants, quadtree variants, and kd-trees. This paper presents a serious attempt at implementing and realizing SP-GiST-based indexes inside PostgreSQL. Several index types are realized inside PostgreSQL facilitated by rapid SP-GiST instantiations. Challenges, experiences, and performance issues are addressed in the paper. Performance comparisons are conducted from within PostgreSQL to compare update and search performances of SP-GiST-based indexes against the B+-tree and the R-tree for string, point, and line segment data sets. Interesting results that highlight the potential performance gains of SP-GiST-based indexes are presented in the paper.
The PN-tree: A parallel and distributed multidimensional index, Distributed and Parallel Databases 17 (2
, 2005
Cited by 5 (0 self)
Abstract. Multidimensional indexing is concerned with the indexing of multi-attributed records, where queries can be applied on some or all of the attributes. Indexing multi-attributed records is referred to by the term multidimensional indexing because each record is viewed as a point in a multidimensional space with a number of dimensions that is equal to the number of attributes. The values of the point coordinates along each dimension are equivalent to the values of the corresponding attributes. In this paper, the PN-tree, a new index structure for multidimensional spaces, is presented. This index structure is an efficient structure for indexing multidimensional points and is parallel by nature. Moreover, the proposed index structure does not lose its efficiency if it is serially processed or if it is processed using a small number of processors. The PN-tree can take advantage of as many processors as the dimensionality of the space. The PN-tree makes use of B+-trees that have been developed and tested over years in many DBMSs. The PN-tree is compared to the Hybrid tree that is known for its superiority among various index structures. Experimental results show that parallel processing of the PN-tree significantly reduces the number of disk accesses involved in the search operation. Even in its serial case, the PN-tree outperforms the Hybrid tree for large database sizes.
A framework for supporting the class of space-partitioning trees
, 2001
Cited by 2 (2 self)
Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the kD-tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a kD-tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST tuning parameters. Keywords: SP-GiST, space-partitioning trees, GiST, spatial tree indexes, access methods, clustering.